# More topics on the exponential distribution

This is a continuation of two previous posts on the exponential distribution (an introduction and a post on the connection with the Poisson process). This post presents more properties that are not discussed in the two previous posts. Of course, a serious and in-depth discussion of the exponential distribution can fill volumes. The goal here is quite modest – to present a few more properties related to the memoryless property of the exponential distribution.

_______________________________________________________________________________________________

The Failure Rate

A previous post discusses the Poisson process and its relation to the exponential distribution. Now we present another way of looking at both notions. Suppose a counting process counts the occurrences of a type of random event. Suppose that an event means a termination of a system, be it biological or manufactured. Furthermore, suppose that the terminations occur according to a Poisson process at a constant rate $\lambda$ per unit time. Then what is the meaning of the rate $\lambda$? It is the rate of termination (dying). It is usually called the failure rate (or hazard rate or force of mortality). The meaning of the constant rate $\lambda$ is that the rate of dying is the same regardless of the location on the time scale (i.e. regardless of how long a life has lived). This means that the lifetime (the time until death) of such a system has no memory. Since the exponential distribution is the only continuous distribution with the memoryless property, the time until the next termination inherent in the Poisson process in question must be an exponential random variable with rate $\lambda$ or mean $\frac{1}{\lambda}$. So the notion of failure rate function (or hazard rate function) runs through the notions of exponential distribution and Poisson process and further illustrates the memoryless property.

Consider a continuous random variable $X$ that only takes on the positive real numbers. Suppose $F$ and $f$ are the CDF and density function of $X$, respectively. The survival function is $S=1-F$. The failure rate (or hazard rate) $\mu(t)$ is defined as:

$\displaystyle \mu(t)=\frac{f(t)}{1-F(t)}=\frac{f(t)}{S(t)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

The function $\mu(t)$ can be interpreted as the rate of failure at the next instant given that the life has survived to time $t$. Suppose that the lifetime distribution is exponential. Because of the memoryless property, the remaining lifetime of a $t$-year old has the same distribution as the lifetime of a new item. It is then intuitively clear that the failure rate must be constant. Indeed, it is straightforward to show that if the lifetime $X$ is an exponential random variable with rate parameter $\lambda$, i.e. the density is $f(t)=\lambda e^{-\lambda t}$, then the failure rate is $\mu(t)=\lambda$. This is why the parameter $\lambda$ in the density function $f(t)=\lambda e^{-\lambda t}$ is called the rate parameter.
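As a quick numerical illustration of the constant failure rate, the following minimal sketch evaluates definition $(1)$ for an exponential lifetime at several times. The function name `exponential_hazard` and the rate value 0.5 are illustrative choices, not part of the discussion above.

```python
import math

def exponential_hazard(t, lam):
    """Failure rate f(t) / S(t) for an exponential lifetime with rate lam."""
    density = lam * math.exp(-lam * t)   # f(t)
    survival = math.exp(-lam * t)        # S(t) = 1 - F(t)
    return density / survival

lam = 0.5  # illustrative rate (an assumption for this sketch)
hazards = [exponential_hazard(t, lam) for t in (0.1, 1.0, 5.0, 20.0)]
print(hazards)  # each entry agrees with lam up to floating-point rounding
```

The ratio does not depend on $t$ because the factor $e^{-\lambda t}$ cancels, which is the one-line algebraic proof that $\mu(t)=\lambda$.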

On the other hand, the failure rate or hazard rate is not constant for other lifetime distributions. However, the hazard rate function $\mu(t)$ uniquely determines the distributional quantities such as the CDF and the survival function. The definition $(1)$ shows that the failure rate is derived using the CDF or survival function. We show that the failure rate is enough information to derive the CDF or the survival function. From the definition, we have:

$\displaystyle \mu(t)=\frac{-\frac{d}{dt} S(t)}{S(t)} \ \ \ \text{or} \ \ \ -\mu(t)=\frac{\frac{d}{dt} S(t)}{S(t)}$

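Integrating both sides from $0$ to $t$, and using $S(0)=1$ (since $X$ takes on only positive values), produces the following:

$\displaystyle - \int_0^t \ \mu(x) \ dx=\int_0^t \ \frac{\frac{d}{dx} S(x)}{S(x)} \ dx=\text{ln} S(t)-\text{ln} S(0)=\text{ln} S(t)$

Exponentiating each side produces the following: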

$\displaystyle S(t)=e^{-\int_0^t \ \mu(x) \ dx} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$

Once the survival function is obtained from $\mu(t)$, the CDF and the density function can be obtained. Interestingly, the derivation in $(2)$ can give another proof of the fact that the exponential distribution is the only one with the memoryless property. If $X$ is memoryless, then the failure rate must be constant. If $\mu(x)$ is constant, then by $(2)$, $S(t)$ is the exponential survival function. Of course, the other way is clear: if $X$ is exponential, then $X$ is memoryless.

The preceding discussion shows that having a constant failure rate is another way to characterize the exponential distribution, in particular the memoryless property of the exponential distribution. Before moving on to the next topic, consider another example of a failure rate function: $\mu(t)=\frac{\alpha}{\beta} (\frac{t}{\beta})^{\alpha-1}$ where both $\alpha$ and $\beta$ are positive constants. This is the Weibull hazard rate function. By $(2)$, the derived survival function is $S(t)=e^{-(\frac{t}{\beta})^\alpha}$, which is the survival function for the Weibull distribution. This distribution is an excellent model choice for describing the life of manufactured objects. See here for an introduction to the Weibull distribution.
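The passage from a hazard rate to a survival function in $(2)$ can be checked numerically. The sketch below integrates the Weibull hazard rate with a simple midpoint rule and compares the result with the closed-form survival function. The helper names and the parameter values $\alpha=2$, $\beta=3$, $t=4$ are assumptions made only for illustration.

```python
import math

def weibull_hazard(t, alpha, beta):
    """Weibull hazard rate (alpha / beta) * (t / beta) ** (alpha - 1)."""
    return (alpha / beta) * (t / beta) ** (alpha - 1)

def survival_from_hazard(hazard, t, steps=10_000):
    """Approximate S(t) = exp(-integral_0^t hazard(x) dx) with the midpoint rule."""
    width = t / steps
    integral = sum(hazard((i + 0.5) * width) for i in range(steps)) * width
    return math.exp(-integral)

alpha, beta, t = 2.0, 3.0, 4.0  # illustrative values (assumptions)
numeric = survival_from_hazard(lambda x: weibull_hazard(x, alpha, beta), t)
closed_form = math.exp(-((t / beta) ** alpha))
print(numeric, closed_form)  # the two values should agree closely
```

Setting $\alpha=1$ makes the hazard rate the constant $\frac{1}{\beta}$, and the same routine then recovers the exponential survival function.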

_______________________________________________________________________________________________

The Minimum Statistic

The following result is about the minimum of independent exponential distributions.

Suppose that $X_1,X_2,\cdots,X_n$ are independent exponential random variables with rates $\lambda_1,\lambda_2,\cdots,\lambda_n$, respectively. Then the minimum of the sample, denoted by $Y=\text{min}(X_1,X_2,\cdots,X_n)$, is also an exponential random variable with rate $\lambda_1+\lambda_2 +\cdots+\lambda_n. \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)$

For the minimum to be $> y$, all sample items must be $> y$. Since the $X_i$ are independent, $P[Y> y]$ is the product of the individual survival probabilities:

$\displaystyle \begin{aligned} P[Y> y]&=P[X_1> y] \times P[X_2> y] \times \cdots \times P[X_n> y] \\&=e^{-\lambda_1 y} \times e^{-\lambda_2 y} \times \cdots \times e^{-\lambda_n y}\\&=e^{-(\lambda_1+\lambda_2+\cdots+\lambda_n) y} \end{aligned}$

This means that $Y=\text{min}(X_1,X_2,\cdots,X_n)$ has an exponential distribution with rate $\lambda_1+\lambda_2+\cdots+\lambda_n$. As a result, the smallest item of a sample of independent exponential observations also follows an exponential distribution with rate being the sum of all the individual exponential rates.
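Result $(3)$ can be sanity-checked by simulation. The following sketch draws many samples of independent exponential variables, records the minimum of each sample, and compares the average minimum with $\frac{1}{\lambda_1+\cdots+\lambda_n}$. The rates, the number of trials and the seed are arbitrary illustrative choices.

```python
import random

def simulate_min_mean(rates, trials=200_000, seed=0):
    """Monte Carlo estimate of E[min(X_1, ..., X_n)] for independent exponentials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # random.expovariate takes the rate parameter, not the mean
        total += min(rng.expovariate(lam) for lam in rates)
    return total / trials

rates = [1.0, 2.0, 3.0]  # illustrative rates (assumptions)
estimate = simulate_min_mean(rates)
theory = 1.0 / sum(rates)
print(estimate, theory)  # the estimate should be close to 1/6
```

A simulation only supports the result rather than proving it; the survival function calculation above is the actual argument.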

Example 1
Suppose that $X_1,X_2,X_3$ are independent exponential random variables with rates $\lambda_i$ for $i=1,2,3$. Calculate $E[\text{min}(X_1,X_2,X_3)]$ and $E[\text{max}(X_1,X_2,X_3)]$.

Let $X=\text{min}(X_1,X_2,X_3)$ and $Y=\text{max}(X_1,X_2,X_3)$. By $(3)$, $X$ has an exponential distribution with rate $\lambda_1+\lambda_2+\lambda_3$. As a result, $E[X]=\frac{1}{\lambda_1+\lambda_2+\lambda_3}$. Finding the expected value of the maximum requires more calculation. First calculate $P[Y \le y]$, using the fact that the maximum is $\le y$ exactly when all three variables are $\le y$.

$\displaystyle \begin{aligned} P[Y \le y]&=P[X_1 \le y] \times P[X_2 \le y] \times P[X_3 \le y] \\&=(1-e^{-\lambda_1 y}) \times (1-e^{-\lambda_2 y}) \times (1-e^{-\lambda_3 y}) \\&=1-e^{-\lambda_1 y}-e^{-\lambda_2 y}-e^{-\lambda_3 y}+e^{-(\lambda_1+\lambda_2) y}+e^{-(\lambda_1+\lambda_3) y} \\& \ \ \ +e^{-(\lambda_2+\lambda_3) y}-e^{-(\lambda_1+\lambda_2+\lambda_3) y} \end{aligned}$

Differentiate $F_Y(y)=P[Y \le y]$ to obtain the density function $f_Y(y)$.

$\displaystyle \begin{aligned} f_Y(y)&=\lambda_1 e^{-\lambda_1 y}+\lambda_2 e^{-\lambda_2 y}+ \lambda_3 e^{-\lambda_3 y} \\& \ \ \ -(\lambda_1+\lambda_2) e^{-(\lambda_1+\lambda_2) y}-(\lambda_1+\lambda_3) e^{-(\lambda_1+\lambda_3) y}-(\lambda_2+\lambda_3)e^{-(\lambda_2+\lambda_3) y} \\& \ \ \ \ +(\lambda_1+\lambda_2+\lambda_3)e^{-(\lambda_1+\lambda_2+\lambda_3) y} \end{aligned}$

Each term in the density function $f_Y(y)$ is by itself an exponential density. Thus the mean of the maximum is:

$\displaystyle E[Y]=\frac{1}{\lambda_1}+\frac{1}{\lambda_2}+ \frac{1}{\lambda_3} -\frac{1}{\lambda_1+\lambda_2}-\frac{1}{\lambda_1+\lambda_3}-\frac{1}{\lambda_2+\lambda_3}+\frac{1}{\lambda_1+\lambda_2+\lambda_3}$

To make sense of the numbers, let $\lambda_1=\lambda_2=\lambda_3=1$. Then $E[X]=\frac{1}{3}=\frac{2}{6}$ and $E[Y]=\frac{11}{6}$. In this case, the expected value of the maximum is 5.5 times larger than the expected value of the minimum. For $\lambda_1=1$, $\lambda_2=2$ and $\lambda_3=3$, $E[X]=\frac{1}{6}$ and $E[Y]=\frac{73}{60}=\frac{7.3}{6}$. In the second case, the expected value of the maximum is 7.3 times larger than the expected value of the minimum. $\square$
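The arithmetic in Example 1 can be reproduced exactly with rational numbers. The sketch below sums $\pm\frac{1}{\text{sum of rates}}$ over all nonempty subsets of the rates, which for three rates is exactly the formula for $E[Y]$ above. Extending the same alternating sum to any number of rates is a generalization not stated in the example. The function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def expected_min(rates):
    """E[min] = 1 / (sum of all rates)."""
    return Fraction(1, sum(rates))

def expected_max(rates):
    """E[max] as an alternating sum of 1/(sum of rates) over nonempty subsets."""
    total = Fraction(0)
    for size in range(1, len(rates) + 1):
        sign = 1 if size % 2 == 1 else -1  # + for odd-sized subsets, - for even
        for subset in combinations(rates, size):
            total += sign * Fraction(1, sum(subset))
    return total

print(expected_max([1, 1, 1]))  # 11/6
print(expected_max([1, 2, 3]))  # 73/60
print(expected_max([1, 2, 3]) / expected_min([1, 2, 3]))  # 73/10
```

Integer rates are used so that every intermediate value stays an exact fraction.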

_______________________________________________________________________________________________

Ranking Independent Exponential Distributions

In this section, $X_1,X_2,\cdots,X_n$ are independent exponential random variables where the rate of $X_i$ is $\lambda_i$ for $i=1,2,\cdots,n$. What is the probability $P[X_{j(1)}<X_{j(2)}<\cdots<X_{j(k)}]$? Here the subscripts $j(1),\cdots,j(k)$ are distinct integers from $\left\{1,2,\cdots,n \right\}$. For example, for a sample size of 2, what are the probabilities $P[X_1<X_2]$ and $P[X_2<X_1]$? For a sample size of 3, what are $P[X_1<X_2<X_3]$ and $P[X_2<X_1<X_3]$? We begin with the case of ranking two independent exponential random variables.

Ranking $X_1$ and $X_2$.

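$\displaystyle P[X_1<X_2]=\frac{\lambda_1}{\lambda_1+\lambda_2} \ \ \ \ \ \ \ \ \ \ \ \ \ (4)$

where $\lambda_1$ is the rate of $X_1$ and $\lambda_2$ is the rate of $X_2$. Note that this probability is the ratio of the rate of the smaller exponential random variable over the total rate. The probability $P[X_1<X_2]$ can be computed by evaluating the following integral:

$\displaystyle P[X_1<X_2]=\int_0^\infty \int_x^\infty \ \lambda_1 e^{-\lambda_1 x} \ \lambda_2 e^{-\lambda_2 y} \ dy \ dx$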

The natural next step is to rank three or more exponential random variables. Ranking three variables would require a triple integral and ranking more variables would require a larger multiple integral. Instead, we rely on a useful fact about the minimum statistic. First, another basic result.

When one of the variables is the minimum:

$\displaystyle P[X_i=\text{min}(X_1,\cdots,X_n)]=\frac{\lambda_i}{\lambda_1+\lambda_2+\cdots+\lambda_n} \ \ \ \ \ \ \ \ \ \ \ \ \ (5)$

The above says that the probability that the $i$th random variable is the smallest is simply the ratio of the rate of the $i$th variable over the total rate. This follows from $(4)$ since we are ranking the two exponential variables $X_i$ and $\text{min}(X_1,\cdots,X_{i-1},X_{i+1},\cdots,X_n)$.
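Result $(5)$ lends itself to a simulation check. The sketch below repeatedly draws one value from each exponential variable, records which index attains the minimum, and compares the observed frequencies with $\frac{\lambda_i}{\lambda_1+\cdots+\lambda_n}$. The rates, trial count and seed are illustrative assumptions.

```python
import random
from collections import Counter

def simulate_argmin_frequencies(rates, trials=200_000, seed=1):
    """Estimate P[X_i = min(X_1, ..., X_n)] for each index i by simulation."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        draws = [rng.expovariate(lam) for lam in rates]
        counts[draws.index(min(draws))] += 1  # index of the smallest draw
    return [counts[i] / trials for i in range(len(rates))]

rates = [1.0, 2.0, 3.0]  # illustrative rates (assumptions)
observed = simulate_argmin_frequencies(rates)
expected = [lam / sum(rates) for lam in rates]
print(observed)
print(expected)  # 1/6, 1/3 and 1/2 for these rates
```

The variable with the largest rate is the most likely to be the minimum, since a larger rate means a shorter expected lifetime.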

We now consider the following theorem.

Theorem 1
Let $X_1,X_2,\cdots,X_n$ be independent exponential random variables. Then the minimum statistic $\text{min}(X_1,X_2,\cdots,X_n)$ and the rank ordering of $X_i$ are independent.

The theorem basically says that the probability of a ranking is not dependent on the location of the minimum statistic. For example, if we know that the minimum is more than 3, what is the probability of $X_1<X_2<X_3$? The theorem is saying that conditioning on $\text{min}(X_1,X_2,X_3)>3$ makes no difference to the probability of the ranking. Let $Y=\text{min}(X_1,X_2,\cdots,X_n)$. The following establishes the theorem.

$\displaystyle P[X_{j(1)}<X_{j(2)}<\cdots<X_{j(n)} \ | \ Y>t]$

$\displaystyle =P[X_{j(1)}-t<X_{j(2)}-t<\cdots<X_{j(n)}-t \ | \ Y>t]$

$\displaystyle =P[X_{j(1)}<X_{j(2)}<\cdots<X_{j(n)}]$

The key to the proof is the step from the second line to the third line. Assume that each $X_i$ is the lifetime of a machine. When $\text{min}(X_1,X_2,\cdots,X_n)>t$, all the lifetimes satisfy $X_i>t$. By the memoryless property, the remaining lifetimes $X_i-t$ are independent and exponential with the original rates. In other words, each $X_i-t$ has the same exponential distribution as $X_i$. Consequently, the second line equals the third line. To make it easier to see the step from the second line to the third line, think of the two-dimensional case with $X_1$ and $X_2$. $\square$
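For the two-dimensional case just mentioned, the claim can also be verified by direct integration, as a supplement to the memoryless argument. With $Y=\text{min}(X_1,X_2)$, the event $X_1<X_2$ together with $Y>t$ means that $X_1>t$ and $X_2>X_1$, so the calculation can be sketched as follows:

```latex
\begin{aligned}
P[X_1<X_2,\ Y>t]
  &= \int_t^\infty \lambda_1 e^{-\lambda_1 x} \, P[X_2>x] \, dx \\
  &= \int_t^\infty \lambda_1 e^{-(\lambda_1+\lambda_2) x} \, dx \\
  &= \frac{\lambda_1}{\lambda_1+\lambda_2} \, e^{-(\lambda_1+\lambda_2) t} \\[4pt]
P[X_1<X_2 \mid Y>t]
  &= \frac{P[X_1<X_2,\ Y>t]}{P[Y>t]}
   = \frac{\frac{\lambda_1}{\lambda_1+\lambda_2} \, e^{-(\lambda_1+\lambda_2) t}}{e^{-(\lambda_1+\lambda_2) t}}
   = \frac{\lambda_1}{\lambda_1+\lambda_2}
\end{aligned}
```

The conditional probability does not involve $t$ and equals the ratio of the rate of $X_1$ to the total rate, which is the unconditional probability that $X_1$ is the smaller of the two.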

The following is a consequence of Theorem 1.

Corollary 2
Let $X_1,X_2,\cdots,X_n$ be independent exponential random variables. Then the event $X_i=\text{min}(X_1,X_2,\cdots,X_n)$ and the rank ordering of $X_1,\cdots,X_{i-1},X_{i+1},\cdots,X_n$ are independent.

Another way to state the corollary is that the knowledge that $X_i$ is the smallest in the sample has no effect on the ranking of the variables other than $X_i$. This is the consequence of Theorem 1. To see why, let $t=X_i=\text{min}(X_1,X_2,\cdots,X_n)$.

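$\displaystyle P[X_{j(1)}<X_{j(2)}<\cdots<X_{j(n-1)} \ | \ t=X_i=\text{min}(X_1,\cdots,X_n)]$

$\displaystyle =P[X_{j(1)}<X_{j(2)}<\cdots<X_{j(n-1)} \ | \ \text{min}(X_1,\cdots,X_{i-1},X_{i+1},\cdots,X_n)>t]$

$\displaystyle =P[X_{j(1)}<X_{j(2)}<\cdots<X_{j(n-1)}]$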

We now present examples demonstrating how these ideas are used.

Example 2
Suppose that a bank has three tellers for serving its customers. The random variables $X_1,X_2,X_3$ are independent exponential random variables where $X_i$ is the time spent by teller $i$ serving a customer. The rate parameter of $X_i$ is $\lambda_i$ where $i=1,2,3$. If all three tellers are busy serving customers, what is $P[X_1<X_2<X_3]$? If the bank has 4 tellers instead, then what is $P[X_1<X_2<X_3<X_4]$?

The answer is given by the following:

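$\displaystyle \begin{aligned} P[X_1<X_2<X_3]&=P[X_1=\text{min}(X_1,X_2,X_3)] \times P[X_2<X_3 \ | \ X_1=\text{min}(X_1,X_2,X_3)] \\&=\frac{\lambda_1}{\lambda_1+\lambda_2+\lambda_3} \times P[X_2<X_3] \\&=\frac{\lambda_1}{\lambda_1+\lambda_2+\lambda_3} \times \frac{\lambda_2}{\lambda_2+\lambda_3} \end{aligned}$

The derivation uses Corollary 2 and the idea in $(5)$. The same idea can be used for $P[X_1<X_2<X_3<X_4]$.

$\displaystyle \begin{aligned} P[X_1<X_2<X_3<X_4]&=P[X_1=\text{min}(X_1,X_2,X_3,X_4)] \times P[X_2<X_3<X_4 \ | \ X_1=\text{min}(X_1,X_2,X_3,X_4)] \\&=\frac{\lambda_1}{\lambda_1+\lambda_2+\lambda_3+\lambda_4} \times P[X_2<X_3<X_4] \\&=\frac{\lambda_1}{\lambda_1+\lambda_2+\lambda_3+\lambda_4} \times \frac{\lambda_2}{\lambda_2+\lambda_3+\lambda_4} \times \frac{\lambda_3}{\lambda_3+\lambda_4} \end{aligned}$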

The above applies the independence result twice, the first time on $X_1,X_2,X_3,X_4$, the second time on $X_2,X_3,X_4$. This approach is much preferred over direct calculation, which would involve multiple integrals that are tedious and error prone. $\square$
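The repeated use of Corollary 2 and $(5)$ in Example 2 amounts to a simple loop: take the rate of the variable ranked smallest over the total of the remaining rates, remove that variable, and repeat. The following sketch, with the hypothetical helper name `ranking_probability`, computes the probability of a complete ranking of all variables in the sample, using exact fractions and integer rates chosen for illustration.

```python
from fractions import Fraction
from itertools import permutations

def ranking_probability(ordered_rates):
    """P[X_(1) < X_(2) < ... < X_(n)], with rates listed from smallest variable to largest."""
    prob = Fraction(1)
    remaining = sum(ordered_rates)  # total rate of the variables not yet ranked
    for lam in ordered_rates[:-1]:
        prob *= Fraction(lam, remaining)  # chance this variable is the minimum of the rest
        remaining -= lam
    return prob

print(ranking_probability([1, 2, 3]))     # 1/15
print(ranking_probability([1, 2, 3, 4]))  # 1/105

# The probabilities of all six rankings of three variables add up to 1.
print(sum(ranking_probability(list(p)) for p in permutations([1, 2, 3])))  # 1
```

The last factor is omitted from the loop because once a single variable remains, its position in the ranking is certain.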

Example 3
As in Example 2, a bank has three tellers for serving its customers. The service times of the three tellers are independent exponential random variables. The mean service time for teller $i$ is $\frac{1}{\lambda_i}$ minutes where $i=1,2,3$. You walk into the bank and find that all three tellers are busy serving customers. You are the only customer waiting for an available teller. Calculate the expected amount of time you spend at the bank.

Let $T$ be the total time you spend in the bank, which is $T=W+S$ where $W$ is the waiting time for a teller to become free and $S$ is the service time of the teller helping you. When you walk into the bank, the tellers are already busy. Let $X_i$ be the remaining service time for teller $i$, $i=1,2,3$. By the memoryless property, $X_i$ is exponential with the original mean $\frac{1}{\lambda_i}$. As a result, the rate parameter of $X_i$ is $\lambda_i$.

The waiting time $W$ is simply $\text{min}(X_1,X_2,X_3)$. Thus $E[W]=\frac{1}{\lambda_1+\lambda_2+\lambda_3}$. To find $E[S]$, we need to consider three cases, depending on which teller finishes serving the current customer first.

$\displaystyle \begin{aligned} E[S]&=E[S \ | \ X_1=\text{min}(X_1,X_2,X_3)] \times P[X_1=\text{min}(X_1,X_2,X_3)] \\& \ \ + E[S \ | \ X_2=\text{min}(X_1,X_2,X_3)] \times P[X_2=\text{min}(X_1,X_2,X_3)] \\& \ \ + E[S \ | \ X_3=\text{min}(X_1,X_2,X_3)] \times P[X_3=\text{min}(X_1,X_2,X_3)]\end{aligned}$

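If teller $i$ is the first to become free, then you are served by teller $i$ and your expected service time is $\frac{1}{\lambda_i}$. The probability that teller $i$ finishes first is given by $(5)$. Finishing the calculation,

$\displaystyle \begin{aligned} E[S]&=\frac{1}{\lambda_1} \times \frac{\lambda_1}{\lambda_1+\lambda_2+\lambda_3} + \frac{1}{\lambda_2} \times \frac{\lambda_2}{\lambda_1+\lambda_2+\lambda_3} + \frac{1}{\lambda_3} \times \frac{\lambda_3}{\lambda_1+\lambda_2+\lambda_3} \\&=\frac{3}{\lambda_1+\lambda_2+\lambda_3} \end{aligned}$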

$\displaystyle E[T]=\frac{1}{\lambda_1+\lambda_2+\lambda_3}+\frac{3}{\lambda_1+\lambda_2+\lambda_3}=\frac{4}{\lambda_1+\lambda_2+\lambda_3}$
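The answer in Example 3 can be checked by simulating the visit to the bank. The sketch below draws the remaining service times of the three busy tellers, waits for the first teller to become free, then draws a fresh service time from that teller. The rates, trial count and seed are illustrative assumptions.

```python
import random

def simulate_time_in_bank(rates, trials=200_000, seed=2):
    """Monte Carlo estimate of E[T] = E[W] + E[S] when all tellers are busy on arrival."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Remaining service times; exponential with the original rates by memorylessness
        remaining = [rng.expovariate(lam) for lam in rates]
        wait = min(remaining)                     # W: time until a teller is free
        teller = remaining.index(wait)            # the teller who finishes first
        service = rng.expovariate(rates[teller])  # S: your own service time
        total += wait + service
    return total / trials

rates = [1.0, 2.0, 3.0]  # illustrative rates (assumptions)
estimate = simulate_time_in_bank(rates)
theory = 4.0 / sum(rates)
print(estimate, theory)  # the estimate should be close to 2/3
```

With all rates equal to $\lambda$, the formula reduces to $\frac{4}{3 \lambda}$: one third of a mean service time spent waiting plus one full mean service time.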

_______________________________________________________________________________________________
$\copyright \ 2016 - \text{Dan Ma}$