More topics on the exponential distribution

This is a continuation of two previous posts on the exponential distribution (an introduction and a post on the connection with the Poisson process). This post presents more properties that are not discussed in the two previous posts. Of course, a serious and in-depth discussion of the exponential distribution can fill volumes. The goal here is quite modest – to present a few more properties related to the memoryless property of the exponential distribution.

_______________________________________________________________________________________________

The Failure Rate

A previous post discusses the Poisson process and its relation to the exponential distribution. Now we present another way of looking at both notions. Suppose a counting process counts the occurrences of a type of random events. Suppose that each event is the termination of a system, be it biological or manufactured. Furthermore suppose that the terminations occur according to a Poisson process at a constant rate \lambda per unit time. Then what is the meaning of the rate \lambda? It is the rate of termination (dying). It is usually called the failure rate (or hazard rate or force of mortality). The meaning of the constant rate \lambda is that the rate of dying is the same regardless of the location on the time scale (i.e. regardless of how long a life has lived). This means that the lifetime (the time until death) of such a system has no memory. Since the exponential distribution is the only continuous distribution with the memoryless property, the time until the next termination inherent in the Poisson process in question must be an exponential random variable with rate \lambda, or mean \frac{1}{\lambda}. So the notion of the failure rate function (or hazard rate function) runs through the notions of the exponential distribution and the Poisson process and further illustrates the memoryless property.

Consider a continuous random variable X that takes on only positive real values. Suppose F and f are the CDF and density function of X, respectively. The survival function is S=1-F. The failure rate (or hazard rate) \mu(t) is defined as:

    \displaystyle \mu(t)=\frac{f(t)}{1-F(t)}=\frac{f(t)}{S(t)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

The function \mu(t) can be interpreted as the rate of failure at the next instant given that the life has survived to time t. Suppose that the lifetime distribution is exponential. Because of the memoryless property, the remaining lifetime of a t-year old is the same as the lifetime distribution for a new item. It is then intuitively clear that the failure rate must be constant. Indeed, it is straightforward to show that if the lifetime X is an exponential random variable with rate parameter \lambda, i.e. the density is f(t)=\lambda e^{-\lambda t}, then the failure rate is \mu(t)=\lambda. This is why the parameter \lambda in the density function f(t)=\lambda e^{-\lambda t} is called the rate parameter.
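
As a quick numerical check, the following minimal Python sketch (the rate \lambda=0.5 is an arbitrary illustrative choice) evaluates \mu(t)=f(t)/S(t) at several values of t and confirms that the ratio is the constant \lambda at every t:

    import math

    lam = 0.5  # illustrative rate parameter lambda

    def f(t):
        """Exponential density f(t) = lambda * exp(-lambda*t)."""
        return lam * math.exp(-lam * t)

    def S(t):
        """Exponential survival function S(t) = exp(-lambda*t)."""
        return math.exp(-lam * t)

    # The failure rate f(t)/S(t) prints as 0.5 regardless of t.
    for t in [0.1, 1.0, 5.0, 20.0]:
        print(t, f(t) / S(t))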

On the other hand, the failure rate or hazard rate is not constant for other lifetime distributions. However, the hazard rate function \mu(t) uniquely determines the distributional quantities such as the CDF and the survival function. The definition (1) shows that the failure rate is derived using the CDF or survival function. We show that the failure rate is enough information to derive the CDF or the survival function. From the definition, we have:

    \displaystyle \mu(t)=\frac{-\frac{d}{dt} S(t)}{S(t)} \ \ \  \text{or} \ \ \ -\mu(t)=\frac{\frac{d}{dt} S(t)}{S(t)}

Integrating both sides from 0 to t produces the following (note that S(0)=1, so that \text{ln} S(0)=0):

    \displaystyle - \int_0^t \ \mu(x) \ dx=\int_0^t \ \frac{\frac{d}{dx} S(x)}{S(x)} \ dx=\text{ln} S(t)

Exponentiating both sides produces the following:

    \displaystyle S(t)=e^{-\int_0^t \ \mu(x) \ dx} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

Once the survival function is obtained from \mu(t), the CDF and the density function can be obtained. Interestingly, the derivation in (2) can give another proof of the fact that the exponential distribution is the only one with the memoryless property. If X is memoryless, then the failure rate must be constant. If \mu(x) is constant, then by (2), S(t) is the exponential survival function. Of course, the other way is clear: if X is exponential, then X is memoryless.

The preceding discussion shows that having a constant failure rate is another way to characterize the exponential distribution, in particular the memoryless property of the exponential distribution. Before moving on to the next topic, here is another example of a failure rate function: \mu(t)=\frac{\alpha}{\beta} (\frac{t}{\beta})^{\alpha-1} where both \alpha and \beta are positive constants. This is the Weibull hazard rate function. The derived survival function is S(t)=e^{-(\frac{t}{\beta})^\alpha}, which is the survival function for the Weibull distribution. This distribution is an excellent model choice for describing the life of manufactured objects. See here for an introduction to the Weibull distribution.
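
To see formula (2) in action numerically, here is a minimal Python sketch (the shape \alpha=2 and scale \beta=3 are hypothetical values chosen for illustration) that integrates the Weibull hazard rate with a midpoint Riemann sum and compares the result of (2) with the closed-form Weibull survival function:

    import math

    alpha, beta = 2.0, 3.0  # illustrative Weibull shape and scale parameters

    def mu(t):
        """Weibull hazard rate (alpha/beta) * (t/beta)**(alpha - 1)."""
        return (alpha / beta) * (t / beta) ** (alpha - 1)

    def survival_from_hazard(t, steps=10000):
        """Apply formula (2): S(t) = exp(-integral of mu from 0 to t),
        approximating the integral with a midpoint Riemann sum."""
        h = t / steps
        integral = sum(mu((i + 0.5) * h) for i in range(steps)) * h
        return math.exp(-integral)

    def weibull_survival(t):
        """Closed-form Weibull survival function."""
        return math.exp(-((t / beta) ** alpha))

    # The two columns agree closely at every t.
    for t in [0.5, 1.0, 2.0, 4.0]:
        print(t, survival_from_hazard(t), weibull_survival(t))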

_______________________________________________________________________________________________

The Minimum Statistic

The following result is about the minimum of independent exponential distributions.

    Suppose that X_1,X_2,\cdots,X_n are independent exponential random variables with rates \lambda_1,\lambda_2,\cdots,\lambda_n, respectively. Then the minimum of the sample, denoted by Y=\text{min}(X_1,X_2,\cdots,X_n), is also an exponential random variable with rate \lambda_1+\lambda_2 +\cdots+\lambda_n. \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)

For the minimum to be > y, all sample items must be > y. Thus P[Y> y] is:

    \displaystyle \begin{aligned} P[Y> y]&=P[X_1> y] \times P[X_2> y] \times \cdots \times P[X_n> y] \\&=e^{-\lambda_1 y} \times e^{-\lambda_2 y} \times \cdots \times e^{-\lambda_n y}\\&=e^{-(\lambda_1+\lambda_2+\cdots+\lambda_n) y}  \end{aligned}

This means that Y=\text{min}(X_1,X_2,\cdots,X_n) has an exponential distribution with rate \lambda_1+\lambda_2+\cdots+\lambda_n. As a result, the smallest item of a sample of independent exponential observations also follows an exponential distribution with rate being the sum of all the individual exponential rates.
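
Result (3) is easy to verify by simulation. The following is a minimal Python sketch (the rates 1, 2 and 3 are illustrative choices) that draws the minimum of three independent exponential observations many times and compares the sample mean with the theoretical mean \frac{1}{\lambda_1+\lambda_2+\lambda_3}:

    import random

    rates = [1.0, 2.0, 3.0]  # illustrative rates lambda_1, lambda_2, lambda_3
    n_trials = 200_000

    total = 0.0
    for _ in range(n_trials):
        # Y = min(X_1, X_2, X_3), each X_i exponential with the given rate
        total += min(random.expovariate(lam) for lam in rates)

    print("simulated mean of the minimum:    ", total / n_trials)
    print("theoretical mean 1/(sum of rates):", 1 / sum(rates))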

Example 1
Suppose that X_1,X_2,X_3 are independent exponential random variables with rates \lambda_i for i=1,2,3. Calculate E[\text{min}(X_1,X_2,X_3)] and E[\text{max}(X_1,X_2,X_3)].

Let X=\text{min}(X_1,X_2,X_3) and Y=\text{max}(X_1,X_2,X_3). Then X has an exponential distribution with rate \lambda_1+\lambda_2+\lambda_3. As a result, E[X]=\frac{1}{\lambda_1+\lambda_2+\lambda_3}. Finding the expected value of the maximum requires calculation. First calculate P[Y \le y].

    \displaystyle \begin{aligned} P[Y \le y]&=P[X_1 \le y] \times P[X_2 \le y] \times  P[X_3 \le y] \\&=(1-e^{-\lambda_1 y}) \times (1-e^{-\lambda_2 y}) \times (1-e^{-\lambda_3 y}) \\&=1-e^{-\lambda_1 y}-e^{-\lambda_2 y}-e^{-\lambda_3 y}+e^{-(\lambda_1+\lambda_2) y}+e^{-(\lambda_1+\lambda_3) y} \\& \ \ \ +e^{-(\lambda_2+\lambda_3) y}-e^{-(\lambda_1+\lambda_2+\lambda_3) y}  \end{aligned}

Differentiate F_Y(y)=P[Y \le y] to obtain the density function f_Y(y).

    \displaystyle \begin{aligned} f_Y(y)&=\lambda_1 e^{-\lambda_1 y}+\lambda_2 e^{-\lambda_2 y}+ \lambda_3 e^{-\lambda_3 y} \\& \ \ \ -(\lambda_1+\lambda_2) e^{-(\lambda_1+\lambda_2) y}-(\lambda_1+\lambda_3) e^{-(\lambda_1+\lambda_3) y}-(\lambda_2+\lambda_3)e^{-(\lambda_2+\lambda_3) y} \\& \ \ \ \ +(\lambda_1+\lambda_2+\lambda_3)e^{-(\lambda_1+\lambda_2+\lambda_3) y}  \end{aligned}

Each term in the density function f_Y(y) is, apart from its sign, an exponential density. Integrating y \ f_Y(y) term by term, the mean of the maximum is:

    \displaystyle E[Y]=\frac{1}{\lambda_1}+\frac{1}{\lambda_2}+ \frac{1}{\lambda_3} -\frac{1}{\lambda_1+\lambda_2}-\frac{1}{\lambda_1+\lambda_3}-\frac{1}{\lambda_2+\lambda_3}+\frac{1}{\lambda_1+\lambda_2+\lambda_3}

To make sense of the numbers, let \lambda_1=\lambda_2=\lambda_3=1. Then E[X]=\frac{1}{3}=\frac{2}{6} and E[Y]=\frac{11}{6}. In this case, the expected value of the maximum is 5.5 times the expected value of the minimum. For \lambda_1=1, \lambda_2=2 and \lambda_3=3, E[X]=\frac{1}{6} and E[Y]=\frac{73}{60}=\frac{7.3}{6}. In the second case, the expected value of the maximum is 7.3 times the expected value of the minimum. \square
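
The answers can also be confirmed by simulation. This minimal Python sketch handles the equal-rates case \lambda_1=\lambda_2=\lambda_3=1; the simulated means should be close to \frac{1}{3} and \frac{11}{6}:

    import random

    rates = [1.0, 1.0, 1.0]  # the equal-rates case of Example 1
    n_trials = 200_000

    sum_min = sum_max = 0.0
    for _ in range(n_trials):
        draws = [random.expovariate(lam) for lam in rates]
        sum_min += min(draws)
        sum_max += max(draws)

    # Expect roughly 1/3 = 0.3333 and 11/6 = 1.8333.
    print("E[min] ~", sum_min / n_trials)
    print("E[max] ~", sum_max / n_trials)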

_______________________________________________________________________________________________

Ranking Independent Exponential Distributions

In this section, X_1,X_2,\cdots,X_n are independent exponential random variables where the rate of X_i is \lambda_i for i=1,2,\cdots,n. What is the probability P[X_{j(1)} <X_{j(2)}<\cdots<X_{j(k)}]? Here the subscripts j(1),\cdots,j(k) are distinct integers from \left\{1,2,\cdots,n \right\}. For example, for a sample size of 2, what are the probabilities P[X_1<X_2] and P[X_2<X_1]? For a sample size of 3, what are P[X_1<X_2<X_3] and P[X_2<X_1<X_3]? First, consider the case of ranking two independent exponential random variables.

    Ranking X_1 and X_2.

      \displaystyle P[X_1<X_2]=\frac{\lambda_1}{\lambda_1+\lambda_2} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4)

    where \lambda_1 is the rate of X_1 and \lambda_2 is the rate of X_2. Note that this probability is the ratio of the rate of the variable required to be smaller over the total rate. The probability P[X_1<X_2] can be computed by evaluating the following integral:

      \displaystyle P[X_1<X_2]=\int_0^\infty \int_x^\infty \ \lambda_1 e^{-\lambda_1 \ x} \ \lambda_2 e^{-\lambda_2 \ y} \ dy \ dx
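
For those who prefer not to carry out the double integral by hand, the following sketch (assuming the SymPy library is available) evaluates it symbolically and recovers (4):

    import sympy as sp

    x, y = sp.symbols('x y', positive=True)
    lam1, lam2 = sp.symbols('lambda_1 lambda_2', positive=True)

    # Inner integral: P[X_2 > x], integrating the density of X_2 from x to infinity.
    inner = sp.integrate(lam2 * sp.exp(-lam2 * y), (y, x, sp.oo))

    # Outer integral against the density of X_1.
    prob = sp.integrate(lam1 * sp.exp(-lam1 * x) * inner, (x, 0, sp.oo))

    print(sp.simplify(prob))  # prints lambda_1/(lambda_1 + lambda_2)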

The natural next step is to rank three or more exponential random variables. Ranking three variables would require a triple integral and ranking more variables would require a larger multiple integral. Instead, we rely on a useful fact about the minimum statistic. First, another basic result.

    When one of the variables is the minimum:

      \displaystyle P[X_i=\text{min}(X_1,\cdots,X_n)]=\frac{\lambda_i}{\lambda_1+\lambda_2+\cdots+\lambda_n} \ \ \ \ \ \ \ \ \ \ \ \ \ (5)

    The above says that the probability that the ith random variable is the smallest is simply the ratio of the rate of the ith variable over the total rate. This follows from (4) since we are ranking the two exponential variables X_i and \text{min}(X_1,\cdots,X_{i-1},X_{i+1},\cdots,X_n).
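
Result (5) can also be checked by simulation. The following minimal Python sketch (illustrative rates 1, 2 and 3) counts how often each X_i is the smallest and compares the frequencies with the ratios \frac{\lambda_i}{\lambda_1+\lambda_2+\lambda_3}:

    import random

    rates = [1.0, 2.0, 3.0]  # illustrative rates lambda_1, lambda_2, lambda_3
    n_trials = 200_000
    wins = [0, 0, 0]  # wins[i] counts how often X_{i+1} is the smallest

    for _ in range(n_trials):
        draws = [random.expovariate(lam) for lam in rates]
        wins[draws.index(min(draws))] += 1

    for i, lam in enumerate(rates):
        print(f"P[X_{i+1} is min]: simulated {wins[i] / n_trials:.4f},",
              f"theoretical {lam / sum(rates):.4f}")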

We now consider the following theorem.

Theorem 1
Let X_1,X_2,\cdots,X_n be independent exponential random variables. Then the minimum statistic \text{min}(X_1,X_2,\cdots,X_n) and the rank ordering of X_1,X_2,\cdots,X_n are independent.

The theorem basically says that the probability of a ranking does not depend on the location of the minimum statistic. For example, if we know that the minimum is more than 3, what is the probability of X_1<X_2<X_3? The theorem is saying that conditioning on \text{min}(X_1,X_2,X_3)>3 makes no difference to the probability of the ranking. The following establishes the theorem.

    \displaystyle P[X_{j(1)}<X_{j(2)}<\cdots<X_{j(k)} \ |\ \text{min}(X_1,X_2,\cdots,X_n)>t]

    \displaystyle =P[X_{j(1)}-t<X_{j(2)}-t<\cdots<X_{j(k)}-t \ |\ \text{min}(X_1,X_2,\cdots,X_n)>t]

    \displaystyle =P[X_{j(1)}<X_{j(2)}<\cdots<X_{j(k)}]

The key to the proof is the step from the second line to the third line. Assume that each X_i is the lifetime of a machine. When \text{min}(X_1,X_2,\cdots,X_n)>t, all the lifetimes satisfy X_i>t. By the memoryless property, the remaining lifetimes X_i-t are independent and exponential with the original rates. In other words, each X_i-t has the same exponential distribution as X_i. Consequently, the second line equals the third line. To make it easier to see this step, think of the two-dimensional case with X_1 and X_2. \square
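
Theorem 1 can be illustrated numerically. The following minimal Python sketch (illustrative rates and an arbitrary threshold t=0.3) estimates P[X_1<X_2<X_3] both unconditionally and conditioned on \text{min}(X_1,X_2,X_3)>t; the two estimates should be close, as the theorem predicts:

    import random

    rates = [1.0, 2.0, 3.0]  # illustrative rates lambda_1, lambda_2, lambda_3
    n_trials = 500_000
    t = 0.3  # arbitrary conditioning threshold for the minimum

    count_rank = 0      # trials with X_1 < X_2 < X_3
    count_cond = 0      # trials with X_1 < X_2 < X_3 among those with min > t
    count_min_gt_t = 0  # trials with min(X_1, X_2, X_3) > t

    for _ in range(n_trials):
        x1, x2, x3 = (random.expovariate(lam) for lam in rates)
        ranked = x1 < x2 < x3
        count_rank += ranked
        if min(x1, x2, x3) > t:
            count_min_gt_t += 1
            count_cond += ranked

    # Both printed values should be near (1/6) * (2/5) = 1/15.
    print("P[X1<X2<X3]         ~", count_rank / n_trials)
    print("P[X1<X2<X3 | min>t] ~", count_cond / count_min_gt_t)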

The following is a consequence of Theorem 1.

Corollary 2
Let X_1,X_2,\cdots,X_n be independent exponential random variables. Then the event X_i=\text{min}(X_1,X_2,\cdots,X_n) and the rank ordering of X_1,\cdots,X_{i-1},X_{i+1},\cdots,X_n are independent.

Another way to state the corollary is that the knowledge that X_i is the smallest in the sample has no effect on the ranking of the variables other than X_i. This is a consequence of Theorem 1. To see why, let t=X_i=\text{min}(X_1,X_2,\cdots,X_n).

    \displaystyle P[X_{j(1)}<X_{j(2)}<\cdots<X_{j(k)} \ |\ t=X_i=\text{min}(X_1,X_2,\cdots,X_n)]

    \displaystyle =P[X_{j(1)}<X_{j(2)}<\cdots<X_{j(k)} \ |\ \text{min}(X_1,\cdots,X_{i-1},X_{i+1},\cdots,X_n)>t]

    \displaystyle =P[X_{j(1)}<X_{j(2)}<\cdots<X_{j(k)}] \square
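
Corollary 2 admits a similar numerical check. The following minimal Python sketch (illustrative rates) estimates P[X_2<X_3] both unconditionally and conditioned on the event that X_1 is the minimum:

    import random

    rates = [1.0, 2.0, 3.0]  # illustrative rates lambda_1, lambda_2, lambda_3
    n_trials = 500_000

    count_rank = 0    # trials with X_2 < X_3
    count_cond = 0    # trials with X_2 < X_3 among those where X_1 is the minimum
    count_x1_min = 0  # trials where X_1 is the minimum

    for _ in range(n_trials):
        x1, x2, x3 = (random.expovariate(lam) for lam in rates)
        count_rank += x2 < x3
        if x1 < x2 and x1 < x3:
            count_x1_min += 1
            count_cond += x2 < x3

    # Both printed values should be near lambda_2/(lambda_2 + lambda_3) = 2/5.
    print("P[X2<X3]             ~", count_rank / n_trials)
    print("P[X2<X3 | X1 is min] ~", count_cond / count_x1_min)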

We now present examples demonstrating how these ideas are used.

Example 2
Suppose that a bank has three tellers for serving its customers. The random variables X_1,X_2,X_3 are independent exponential random variables where X_i is the time spent by teller i serving a customer. The rate parameter of X_i is \lambda_i where i=1,2,3. If all three tellers are busy serving customers, what is P[X_1<X_2<X_3]? If the bank has 4 tellers instead, then what is P[X_1<X_2<X_3<X_4]?

The answer is given by the following:

    \displaystyle \begin{aligned} P[X_1<X_2<X_3]&=P[X_1=\text{min}(X_1,X_2,X_3)] \\& \ \ \ \times P[X_2<X_3 | X_1=\text{min}(X_1,X_2,X_3)] \\&\text{ } \\&=P[X_1=\text{min}(X_1,X_2,X_3)] \times P[X_2<X_3] \\&\text{ } \\&=\frac{\lambda_1}{\lambda_1+\lambda_2+\lambda_3} \times \frac{\lambda_2}{\lambda_2+\lambda_3}  \end{aligned}

The derivation uses Corollary 2 and the idea in (5). The same idea can be used for P[X_1<X_2<X_3<X_4].

    \displaystyle \begin{aligned} P[X_1<X_2<X_3<X_4]&=P[X_1=\text{min}(X_1,X_2,X_3,X_4)] \\& \ \ \times P[X_2<X_3<X_4 | X_1=\text{min}(X_1,X_2,X_3,X_4)] \\&\text{ } \\&=P[X_1=\text{min}(X_1,X_2,X_3,X_4)] \times P[X_2<X_3<X_4] \\&\text{ } \\&=P[X_1=\text{min}(X_1,X_2,X_3,X_4)] \\& \ \ \ \times P[X_2=\text{min}(X_2,X_3,X_4)] \times P[X_3<X_4] \\&\text{ } \\&=\frac{\lambda_1}{\lambda_1+\lambda_2+\lambda_3+\lambda_4} \times \frac{\lambda_2}{\lambda_2+\lambda_3+\lambda_4} \times \frac{\lambda_3}{\lambda_3+\lambda_4} \end{aligned}

The above applies the independence result twice, the first time on X_1,X_2,X_3,X_4 and the second time on X_2,X_3,X_4. This approach is much preferred over direct calculation, which would involve tedious and error-prone multiple integration. \square
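
The product formula derived above can be verified by simulation. The following minimal Python sketch (illustrative rates 1, 2, 3 and 4 for the four tellers) compares the simulated frequency of the event X_1<X_2<X_3<X_4 with the product formula:

    import random

    rates = [1.0, 2.0, 3.0, 4.0]  # illustrative rates for the four tellers
    n_trials = 500_000

    count = 0
    for _ in range(n_trials):
        x = [random.expovariate(lam) for lam in rates]
        count += x[0] < x[1] < x[2] < x[3]

    l1, l2, l3, l4 = rates
    # Product formula from the derivation above.
    formula = l1 / (l1 + l2 + l3 + l4) * l2 / (l2 + l3 + l4) * l3 / (l3 + l4)

    print("simulated:", count / n_trials)
    print("formula:  ", formula)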

Example 3
As in Example 2, a bank has three tellers for serving its customers. The service times of the three tellers are independent exponential random variables. The mean service time for teller i is \frac{1}{\lambda_i} minutes where i=1,2,3. You walk into the bank and find that all three tellers are busy serving customers. You are the only customer waiting for an available teller. Calculate the expected amount of time you spend at the bank.

Let T be the total time you spend in the bank, which is T=W+S where W is the waiting time for a teller to become free and S is the service time of the teller helping you. When you walk into the bank, the tellers are already busy. Let X_i be the remaining service time for teller i, i=1,2,3. By the memoryless property, X_i is exponential with the original mean \frac{1}{\lambda_i}. As a result, the rate parameter of X_i is \lambda_i.

The waiting time W is simply \text{min}(X_1,X_2,X_3). Thus E[W]=\frac{1}{\lambda_1+\lambda_2+\lambda_3}. To find E[S], we need to consider three cases, depending on which teller finishes serving the current customer first.

    \displaystyle \begin{aligned} E[S]&=E[S | X_1=\text{min}(X_1,X_2,X_3)] \times P[X_1=\text{min}(X_1,X_2,X_3)] \\& \ \ + E[S | X_2=\text{min}(X_1,X_2,X_3)] \times P[X_2=\text{min}(X_1,X_2,X_3)]  \\& \ \ + E[S | X_3=\text{min}(X_1,X_2,X_3)] \times P[X_3=\text{min}(X_1,X_2,X_3)]\end{aligned}

Finishing the calculation,

    \displaystyle \begin{aligned} E[S]&=\frac{1}{\lambda_1} \times \frac{\lambda_1}{\lambda_1+\lambda_2+\lambda_3}  + \frac{1}{\lambda_2} \times \frac{\lambda_2}{\lambda_1+\lambda_2+\lambda_3}  + \frac{1}{\lambda_3} \times \frac{\lambda_3}{\lambda_1+\lambda_2+\lambda_3}   \\&=\frac{3}{\lambda_1+\lambda_2+\lambda_3} \end{aligned}

    \displaystyle E[T]=\frac{1}{\lambda_1+\lambda_2+\lambda_3}+\frac{3}{\lambda_1+\lambda_2+\lambda_3}=\frac{4}{\lambda_1+\lambda_2+\lambda_3} \ \ \square
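
As a final check, the following minimal Python sketch (illustrative rates) simulates the bank scenario directly; by the memoryless property, the remaining service times of the busy tellers are drawn as fresh exponential random variables, and the simulated E[T] should be close to \frac{4}{\lambda_1+\lambda_2+\lambda_3}:

    import random

    rates = [1.0, 2.0, 3.0]  # illustrative rates for the three tellers
    n_trials = 200_000

    total_time = 0.0
    for _ in range(n_trials):
        # Remaining service times of the three busy tellers; by the
        # memoryless property these are exponential with the original rates.
        remaining = [random.expovariate(lam) for lam in rates]
        wait = min(remaining)                         # W = min(X_1, X_2, X_3)
        teller = remaining.index(wait)                # the teller that frees up first
        service = random.expovariate(rates[teller])   # S, your own service time
        total_time += wait + service

    print("simulated E[T]:   ", total_time / n_trials)
    print("theoretical 4/sum:", 4 / sum(rates))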

_______________________________________________________________________________________________
\copyright \ 2016 - \text{Dan Ma}
