# More topics on the exponential distribution

This is a continuation of two previous posts on the exponential distribution (an introduction and a post on the connection with the Poisson process). This post presents more properties that are not discussed in the two previous posts. Of course, a serious and in-depth discussion of the exponential distribution can fill volumes. The goal here is quite modest – to present a few more properties related to the memoryless property of the exponential distribution.

_______________________________________________________________________________________________

The Failure Rate

A previous post discusses the Poisson process and its relation to the exponential distribution. Now we present another way of looking at both notions. Suppose a counting process counts the occurrences of a type of random events. Suppose that an event means a termination of a system, be it biological or manufactured. Furthermore suppose that the terminations occur according to a Poisson process at a constant rate $\lambda$ per unit time. Then what is the meaning of the rate $\lambda$? It is the rate of termination (dying). It is usually called the failure rate (or hazard rate or force of mortality). The meaning of the constant rate $\lambda$ is that the rate of dying is the same regardless of the location of the time scale (i.e. regardless how long a life has lived). This means that the lifetime (the time until death) of such a system has no memory. Since the exponential distribution is the only continuous distribution with the memoryless property, the time until the next termination inherent in the Poisson process in question must be an exponential random variable with rate $\lambda$ or mean $\frac{1}{\lambda}$. So the notion of failure rate function (or hazard rate function) runs through the notions of exponential and Poisson process and further illustrates the memoryless property.

Consider a continuous random variable $X$ that only takes on the positive real numbers. Suppose $F$ and $f$ are the CDF and density function of $X$, respectively. The survival function is $S=1-F$. The failure rate (or hazard rate) $\mu(t)$ is defined as:

$\displaystyle \mu(t)=\frac{f(t)}{1-F(t)}=\frac{f(t)}{S(t)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

The function $\mu(t)$ can be interpreted as the rate of failure at the next instant given that the life has survived to time $t$. Suppose that the lifetime distribution is exponential. Because of the memoryless proeprty, the remaining lifetime of a $t$-year old is the same as the lifetime distribution for a new item. It is then intuitively clear that the failure rate must be constant. Indeed, it is straightforward to show that if the lifetime $X$ is an exponential random variable with rate parameter $\lambda$, i.e. the density is $f(t)=\lambda e^{-\lambda t}$, then the failure rate is $\mu(t)=\lambda$. This is why the parameter $\lambda$ in the density function $f(t)=\lambda e^{-\lambda t}$ is called the rate parameter.

On the other hand, the failure rate or hazard rate is not constant for other lifetime distributions. However, the hazard rate function $\mu(t)$ uniquely determines the distributional quantities such as the CDF and the survival function. The definition $(1)$ shows that the failure rate is derived using the CDF or survival function. We show that the failure rate is enough information to derive the CDF or the survival function. From the definition, we have:

$\displaystyle \mu(t)=\frac{-\frac{d}{dt} S(t)}{S(t)} \ \ \ \text{or} \ \ \ -\mu(t)=\frac{\frac{d}{dt} S(t)}{S(t)}$

Integrate both sides produces the following:

$\displaystyle - \int_0^t \ \mu(x) \ dx=\int_0^t \ \frac{\frac{d}{dt} S(t)}{S(t)} \ dx=\text{ln} S(t)$

Exponentiate on each side produces the following:

$\displaystyle S(t)=e^{-\int_0^t \ \mu(x) \ dx} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$

Once the survival function is obtained from $\mu(t)$, the CDF and the density function can be obtained. Interestingly, the derivation in $(2)$ can give another proof of the fact that the exponential distribution is the only one with the memoryless property. If $X$ is memoryless, then the failure rate must be constant. If $\mu(x)$ is constant, then by $(2)$, $S(t)$ is the exponential survival function. Of course, the other way is clear: if $X$ is exponential, then $X$ is memoryless.

The preceding discussion shows that having a constant failure rate is another way to characterize the exponential distribution, in particular the memoryless property of the exponential distribution. Before moving onto the next topics, another example of a failure rate function is $\mu(t)=\frac{\alpha}{\beta} (\frac{t}{\beta})^{\alpha-1}$ where both $\alpha$ and $\beta$ are positive constants. This is the Weibull hazard rate function. The derived survival function is $S(t)=e^{-(\frac{t}{\beta})^\alpha}$, which is the survival function for the Weibull distribution. This distribution is an excellent model choice for describing the life of manufactured objects. See here for an introduction to the Weibull distribution.

_______________________________________________________________________________________________

The Minimum Statistic

The following result is about the minimum of independent exponential distributions.

Suppose that $X_1,X_2,\cdots,X_n$ are independent exponential random variables with rates $\lambda_1,\lambda_2,\cdots,\lambda_n$, respectively. Then the minimum of the sample, denoted by $Y=\text{min}(X_1,X_2,\cdots,X_n)$, is also an exponential random variable with rate $\lambda_1+\lambda_2 +\cdots+\lambda_n. \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)$

For the minimum to be $> y$, all sample items must be $> y$, Thus $P[Y> y]$ is:

\displaystyle \begin{aligned} P[Y> y]&=P[X_1> y] \times P[X_2> y] \times \cdots \times P[X_n> y] \\&=e^{-\lambda_1 y} \times e^{-\lambda_2 y} \times \cdots \times e^{-\lambda_n y}\\&=e^{-(\lambda_1+\lambda_2+\cdots+\lambda_n) y} \end{aligned}

This means that $Y=\text{min}(X_1,X_2,\cdots,X_n)$ has an exponential distribution with rate $\lambda_1+\lambda_2+\cdots+\lambda_n$. As a result, the smallest item of a sample of independent exponential observations also follows an exponential distribution with rate being the sum of all the individual exponential rates.

Example 1
Suppose that $X_1,X_2,X_3$ are independent exponential random variables with rates $\lambda_i$ for $i=1,2,3$. Calculate $E[\text{min}(X_1,X_2,X_3)]$ and $E[\text{max}(X_1,X_2,X_3)]$.

Let $X=\text{min}(X_1,X_2,X_3)$ and $Y=\text{max}(X_1,X_2,X_3)$. Then $X$ is an exponential distribution with rate $\lambda_1+\lambda_2+\lambda_3$. As a result, $E[X]=\frac{1}{\lambda_1+\lambda_2+\lambda_3}$. Finding the expected value of the maximum requires calculation. First calculate $P[Y \le y]$.

\displaystyle \begin{aligned} P[Y \le y]&=P[X_1 \le y] \times P[X_2 \le y] \times P[X_3 \le y] \\&=(1-e^{-\lambda_1 y}) \times (1-e^{-\lambda_2 y}) \times (1-e^{-\lambda_3 y}) \\&=1-e^{-\lambda_1 y}-e^{-\lambda_2 y}-e^{-\lambda_3 y}+e^{-(\lambda_1+\lambda_2) y}+e^{-(\lambda_1+\lambda_3) y} \\& \ \ \ +e^{-(\lambda_2+\lambda_3) y}-e^{-(\lambda_1+\lambda_2+\cdots+\lambda_n) y} \end{aligned}

Differentiate $F_Y(y)=P[Y \le y]$ to obtain the density function $f_Y(y)$.

\displaystyle \begin{aligned} f_Y(y)&=\lambda_1 e^{-\lambda_1 y}+\lambda_2 e^{-\lambda_2 y}+ \lambda_3 e^{-\lambda_3 y} \\& \ \ \ -(\lambda_1+\lambda_2) e^{-(\lambda_1+\lambda_2) y}-(\lambda_1+\lambda_3) e^{-(\lambda_1+\lambda_3) y}-(\lambda_2+\lambda_3)e^{-(\lambda_2+\lambda_3) y} \\& \ \ \ \ +(\lambda_1+\lambda_2+\cdots+\lambda_n)e^{-(\lambda_1+\lambda_2+\cdots+\lambda_n) y} \end{aligned}

Each term in the density function $f_Y(y)$ is by itself an exponential density. Thus the mean of the maximum is:

$\displaystyle E[Y]=\frac{1}{\lambda_1}+\frac{1}{\lambda_2}+ \frac{1}{\lambda_3} -\frac{1}{\lambda_1+\lambda_2}-\frac{1}{\lambda_1+\lambda_3}-\frac{1}{\lambda_2+\lambda_3}+\frac{1}{\lambda_1+\lambda_2+\lambda_3}$

To make sense of the numbers, let $\lambda_1=\lambda_2=\lambda_3=1$. Then $E[X]=\frac{1}{3}=\frac{2}{6}$ and $E[Y]=\frac{11}{6}$. In this case, the expected value of the maximum is 5.5 times larger than the expected value of the minimum. For $\lambda_1=1$, $\lambda_2=2$ and $\lambda_3=3$, $E[X]=\frac{1}{6}$ and $E[Y]=\frac{73}{60}=\frac{7.3}{6}$. In the second case, the expected value of the maximum is 7.3 times larger than the expected value of the minimum. $\square$

_______________________________________________________________________________________________

Ranking Independent Exponential Distributions

In this section, $X_1,X_2,\cdots,X_n$ are independent exponential random variables where the rate of $X_i$ is $\lambda_i$ for $i=1,2,\cdots,n$. What is the probability of $P[X_{j(1)} ? Here the subscripts $j(1),\cdots,j(k)$ are distinct integers from $\left\{1,2,\cdots,n \right\}$. For example, for sample size of 2, what are the probabilities $P[X_1 and $P[X_2? For sample size of 3, what are $P[X_1 and $P[X_2? First, the case of ranking two independent exponential random variables.

Ranking $X_1$ and $X_2$.

$\displaystyle P[X_1

where $\lambda_1$ is the rate of $X_1$ and $\lambda_2$ is the rate of $X_2$. Note that this probability is the ratio of the rate of the smaller exponential random variable over the total rate. The probability $P[X_1 can be computed by evaluating the following integral:

$\displaystyle P[X_1

The natural next step is to rank three or more exponential random variables. Ranking three variables would require a triple integral and ranking more variables would require a larger multiple integral. Instead, we rely on a useful fact about the minimum statistic. First, another basic result.

When one of the variables is the minimum:

$\displaystyle P[X_i=\text{min}(X_1,\cdots,X_n)]=\frac{\lambda_i}{\lambda_1+\lambda_2+\cdots+\lambda_n} \ \ \ \ \ \ \ \ \ \ \ \ \ (5)$

The above says that the probability that the $i$th random variable is the smallest is simply the ratio of the rate of the $i$th variable over the total rate. This follows from $(4)$ since we are ranking the two exponential variables $X_i$ and $\text{min}(X_1,\cdots,X_{i-1},X_{i+1},\cdots,X_n)$.

We now consider the following theorem.

Theorem 1
Let $X_1,X_2,\cdots,X_n$ be independent exponential random variables. Then the minimum statistic $\text{min}(X_1,X_2,\cdots,X_n)$ and the rank ordering of $X_i$ are independent.

The theorem basically says that the probability of a ranking is not dependent on the location of the minimum statistic. For example, if we know that the minimum is more than 3, what is the probability of $X_1? The theorem is saying that conditioning on $\text{min}(X_1,X_2,X_3)>3$ makes no difference on the probability of the ranking. Let $Y=\text{min}(X_1,X_2,\cdots,X_n)$. The following establishes the theorem.

$\displaystyle P[X_{j(1)}t]$

$\displaystyle =P[X_{j(1)}-tt]$

$\displaystyle =P[X_{j(1)}

The key to the proof is the step to go from the second line to the third line. Assume that each $X_i$ is the lifetime of a machine. When $\text{min}(X_1,X_2,\cdots,X_n)>t$, all the lifetimes $X_i>t$. By the memoryless property, the remaining lifetimes $X_i-t$ are independent and exponential with the original rates. In other words, each $X_i-t$ has the same exponential distribution as $X_i$. Consequently, the second line equals to the third line. To make it easier to see the step from the second line to the third line, think of the two dimensional case with $X_1$ and $X_2$. $\square$

The following is a consequence of Theorem 1.

Corollary 2
Let $X_1,X_2,\cdots,X_n$ be independent exponential random variables. Then the event $X_i=\text{min}(X_1,X_2,\cdots,X_n)$ and the rank ordering of $X_1,\cdots,X_{i-1},X_{i+1}\cdots,X_n$ are independent.

Another way to state the corollary is that the knowledge that $X_i$ is the smallest in the sample has no effect on the ranking of the variables other than $X_i$. This is the consequence of Theorem 1. To see why, let $t=X_i=\text{min}(X_1,X_2,\cdots,X_n)$.

$\displaystyle P[X_{j(1)}

$\displaystyle =P[X_{j(1)}t]$

$\displaystyle =P[X_{j(1)}

We now present examples demonstrating how these ideas are used.

Example 2
Suppose that a bank has three tellers for serving its customers. The random variable $X_1,X_2,X_3$ are independent exponential random variables where $X_i$ is the time spent by teller $i$ serving a customer. The rate parameter of $X_i$ is $\lambda_i$ where $i=1,2,3$. If all three tellers are busy serving customers, what is $P[X_1? If the bank has 4 tellers instead, then what is $P[X_1?

The answer is given by the following:

\displaystyle \begin{aligned} P[X_1

The derivation uses Corollary 2 and the idea in $(5)$. The same idea can be used for $P[X_1.

\displaystyle \begin{aligned} P[X_1

The above applies the independence result twice, the first time on $X_1,X_2,X_3,X_4$, the second time on $X_2,X_3,X_4$. This approach is much preferred over direction calculation, which would involve integral calculation that is tedious and error prone. $\square$

Example 3
As in Example 2, a bank has three tellers for serving its customers. The service times of the three tellers are independent exponential random variables. The mean service time for teller $i$ is $\frac{1}{\lambda_i}$ minutes where $i=1,2,3$. You walk into the bank and find that all three tellers are busy serving customers. You are the only customer waiting for an available teller. Calculate the expected amount of time you spend at the bank.

Let $T$ be the total time you spend in the bank, which is $T=W+S$ where $W$ is the waiting time for a teller to become free and $S$ is the service time of the teller helping you. When you walk into the bank, the tellers are already busy. Let $X_i$ be the remaining service time for teller $i$, $i=1,2,3$. By the memoryless property, $X_i$ is exponential with the original mean $\frac{1}{\lambda_i}$. As a result, the rate parameter of $X_i$ is $\lambda_i$.

The waiting time $W$ is simply $\text{min}(X_1,X_2,X_3)$. Thus $E[W]=\frac{1}{\lambda_1+\lambda_2+\lambda_3}$. To find $E[S]$, we need to consider three cases, depending on which teller finishes serving the current customer first.

\displaystyle \begin{aligned} E[S]&=E[S | X_1=\text{min}(X_1,X_2,X_3)] \times P[X_1=\text{min}(X_1,X_2,X_3)] \\& \ \ + E[S | X_2=\text{min}(X_1,X_2,X_3)] \times P[X_2=\text{min}(X_1,X_2,X_3)] \\& \ \ + E[S | X_3=\text{min}(X_1,X_2,X_3)] \times P[X_3=\text{min}(X_1,X_2,X_3)]\end{aligned}

Finishing the calculation,

\displaystyle \begin{aligned} E[S]&=\frac{1}{\lambda_1} \times \frac{\lambda_1}{\lambda_1+\lambda_2+\lambda_3} + \frac{1}{\lambda_2} \times \frac{\lambda_2}{\lambda_1+\lambda_2+\lambda_3} + \frac{1}{\lambda_3} \times \frac{\lambda_3}{\lambda_1+\lambda_2+\lambda_3} \\&=\frac{3}{\lambda_1+\lambda_2+\lambda_3} \end{aligned}

$\displaystyle E[T]=\frac{1}{\lambda_1+\lambda_2+\lambda_3}+\frac{3}{\lambda_1+\lambda_2+\lambda_3}=\frac{4}{\lambda_1+\lambda_2+\lambda_3}$

_______________________________________________________________________________________________
$\copyright \ 2016 - \text{Dan Ma}$

# The exponential distribution and the Poisson process

This post is a continuation of the previous post on the exponential distribution. The previous post discusses the basic mathematical properties of the exponential distribution including the memoryless property. This post looks at the exponential distribution from another angle by focusing on the intimate relation with the Poisson process.

_______________________________________________________________________________________________

The Poisson Process

A previous post shows that a sub family of the gamma distribution that includes the exponential distribution is derived from a Poisson process. This post gives another discussion on the Poisson process to draw out the intimate connection between the exponential distribution and the Poisson process.

Suppose a type of random events occur at the rate of $\lambda$ events in a time interval of length 1. As the random events occur, we wish to count the occurrences. Starting at time 0, let $N(t)$ be the number of events that occur by time $t$. A counting process is the collection of all the random variables $N(t)$. That is, we are interested in the collection $\left\{N(t): t \ge 0 \right\}$. More specifically, we are interested in a counting process that satisfies the following three axioms:

• The probability of having more than one occurrence in a short time interval is essentially zero.
• The probability of the occurrence of a random event in a short time interval is proportional to the length of the time interval and not on where the time interval is located. More specifically, the probability of the occurrence of the random event in a short interval of length $h$ is approximately $\lambda \ h$.
• The numbers of random events occurring in non-overlapping time intervals are independent.

Any counting process that satisfies the above three axioms is called a Poisson process with the rate parameter $\lambda$. Of special interest are the counting random variables $N(t)$, which is the number of random events that occur in the interval $[0,t]$ and $N(s+t)-N(t)$, which is the number of events that occur in the interval $[s,s+t]$. There are also continuous variables that are of interest. For example, the time until the occurrence of the first event, denoted by $S_1$, and in general, the time until the occurrence of the $n$th event, denoted by $S_n$. We are also interested in the interarrival times, i.e. the time between the occurrences of two consecutive events. These are notated by $T_1, T_2, T_3,\cdots$ where $T_n$ is the time between the occurrence of the $(n-1)$st event and the occurrence of the $n$th event. Of course, $S_n=T_1+T_2+\cdots+T_n$.

_______________________________________________________________________________________________

Some Basic Results

Given a Poisson process $\left\{N(t): t \ge 0 \right\}$ with rate parameter $\lambda>0$, we discuss the following basic results:

Counting Variables

• The discrete random variable $N(t)$ is a Poisson random variable with mean $\lambda \ t$.
• The discrete random variable $N(s+t)-N(s)$ is a Poisson random variable with mean $\lambda \ t$.

Waiting Time Variables

• The interarrival times, $T_n$, where $n=1,2,3,\cdots$, are independent and identically distributed exponential random variables with rate parameter $\lambda$ (or mean $\frac{1}{\lambda}$).
• The waiting time until the $n$th event has a gamma distribution with shape parameter $n$ and rate parameter $\lambda$.

The result that $N(t)$ is a Poisson random variable is a consequence of the fact that the Poisson distribution is the limit of the binomial distribution. To see this, let’s say we have a Poisson process with rate $\lambda>0$. Note thar $\lambda$ is the rate of occurrence of the event per unit time interval. Then subdivide the interval $[0,t]$ into $n$ subintervals of equal length. When $n$ is sufficiently large, we can assume that there can be only at most one event occurring in a subinterval (using the first two axioms in the Poisson process). Each subinterval is then like a Bermoulli trial (either 0 events or 1 event occurring in the subinterval). The probability of having exactly one event occurring in a subinterval is approximately $\lambda (\frac{t}{n})=\frac{\lambda t}{n}$. By the third criterion in the Poisson process, the $n$ subintervals are independent Bernoulli trials. Thus the total number of events occurring in these $n$ subintervals is a Binomial random variable with $n$ trials and with probability of success in each trial being $\frac{\lambda t}{n}$. It can be shown mathematically that when $n \rightarrow \infty$, the binomial distributions converge to the Poisson distribution with mean $\lambda \ t$. This fact is shown here and here. It follows that $N(t)$ has a Poisson distribution with mean $\lambda \ t$.

To show that the increment $N(s+t)-N(s)$ is a Poisson distribution, we simply count the events in the Poisson process starting at time $s$. It is clear that the resulting counting process is also a Poisson process with rate $\lambda$. We can use the same subdivision argument to derive the fact that $N(s+t)-N(s)$ is a Poisson random variable with mean $\lambda \ t$. The subdividing is of course on the interval $[s,s+t]$.

Based on the preceding discussion, given a Poisson process with rate parameter $\lambda$, the number of occurrences of the random events in any interval of length $t$ has a Poisson distribution with mean $\lambda \ t$. Thus in a Poisson process, the number of events that occur in any interval of the same length has the same distribution. Any counting process that satisfies this property is said to possess stationary increments. On the other hand, any counting process that satisfies the third criteria in the Poisson process (the numbers of occurrences of events in disjoint intervals are independent) is said to have independent increments. Thus a Poisson process possesses independent increments and stationary increments.

Here is an interesting observation as a result of the possession of independent increments and stationary increments in a Poisson process. By independent increments, the process from any point forward is independent of what had previously occurred. By stationary increments, from any point forward, the occurrences of events follow the same distribution as in the previous phase. Starting with a Poisson process, if we count the events from some point forward (calling the new point as time zero), the resulting counting process is probabilistically the same as the original process. The probabilistic behavior of the new process from some point on is not dependent on history. The probability statements we can make about the new process from some point on can be made using the same parameter as the original process. In other words, a Poisson process has no memory.

We now discuss the continuous random variables derived from a Poisson process. The time until the first change, $T_1$, has an exponential distribution with mean $\frac{1}{\lambda}$. To see this, for $T_1>t$ to happen, there must be no events occurring in the interval $[0,t]$. Thus, $P[T_1>t]$ is identical to $P[T_1>t]=P[N(t)=0]=e^{-\lambda t}$. This means that $T_1$ has an exponential distribution with rate $\lambda$. What about $T_2$ and $T_3$ and so on? After the first event had occurred, we can reset the counting process to count the events starting at time $T_1$. Then the time until the next occurrence is also an exponential random variable with rate $\lambda$. Consequently, all the interarrival times $T_n$ are exponential random variables with the same rate $\lambda$. Furthermore, by the discussion in the preceding paragraph, the exponential interarrival times $T_1, T_2, T_3,\cdots$ are independent.

As a consequence of the $T_n$ being independent exponential random variables, the waiting time until the $n$th change $S_n$ is a gamma random variable with shape parameter $n$ and rate parameter $\lambda$. Note that $S_n=T_1+\cdots+T_n$ and that independent sum of $n$ identical exponential distribution has a gamma distribution with parameters $n$ and $\lambda$, which is the identical exponential rate parameter.

The connection between exponential/gamma and the Poisson process provides an expression of the CDF and survival function for the gamma distribution when the shape parameter is an integer. Specifically, the following shows the survival function and CDF of the waiting time $S_n$ as well as the density.

$\displaystyle P(S_n>t)=\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!}$

$\displaystyle P(S_n \le t)=1-\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!}=\sum \limits_{k=n}^{\infty} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!}$

$\displaystyle f_{S_n}(t)=\frac{1}{(n-1)!} \ \lambda^n \ t^{n-1} \ e^{-\lambda t}$

The key in establishing the survival $\displaystyle P(S_n>t)$ is that the waiting time $S_n$ is intimately related to $N(t)$, which has a Poisson distribution with mean $\lambda t$. For $S_n>t$ to happen, there can be at most $n$ events occurring prior to time $t$, i.e. $N(t) \le t$.

_______________________________________________________________________________________________

Reversing the Poisson Process

The preceding discussion shows that a Poisson process has independent exponential waiting times between any two consecutive events and gamma waiting time between any two events. Starting with a collection $\left\{N(t): t \ge 0 \right\}$ of Poisson counting random variables that satisfies the three axioms described above, it can be shown that the sequence of interarrival times are independent exponential random variables with the same rate parameter as in the given Poisson process. Interestingly, the process can also be reversed, i.e. given a sequence of independent and identically distributed exponential distributions, each with rate $\lambda$, a Poisson process can be generated.

To see this, let $T_1,T_2,T_3,\cdots$ be a sequence of independent and identically distributed exponential random variables with rate parameter $\lambda$. Mathematically, the $T_n$ are just independent and identically distributed exponential random variables. Now think of them as the interarrival times between consecutive events. So the first event occurs at time $S_1=T_1$ and the second event occurs at time $S_2=T_1+T_2$ and so on. In general, the $n$th event occurs at time $S_n=T_1+\cdots+T_n$. More specifically, the counting process is $\left\{N(t): t \ge 0 \right\}$ where $N(t)$ is defined below:

$N(t)=\text{max} \left\{n: S_n=T_1+\cdots+T_n \le t \right\}$

For $N(t)=n$ to happen, it must be true that $S_n=T_1+\cdots+T_n \le t$ and $S_{n+1}=T_1+\cdots+T_n+T_{n+1} > t$. The probability $P[N(t)=n]$ is then

\displaystyle \begin{aligned}P[N(t)=n]&=P[S_{n+1} > t \text{ and } S_n \le t]\\&=P[S_{n+1} > t \text{ and } (\text{ not } S_n > t)] \\&=P[S_{n+1} > t]-P[S_n > t] \\&=\sum \limits_{k=0}^{n} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!}-\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!} \\&=\frac{e^{-\lambda t} \ (\lambda t)^n}{n!} \end{aligned}

The above derivation shows that the counting variable $N(t)$ is a Poisson random variable with mean $\lambda t$. The derivation uses the gamma survival function derived earlier. On the other hand, because of the memoryless property, $T_1-s,T_2-s,T_3-s,\cdots$ are also independent exponential random variables with the same rate $\lambda$. If the counting of events starts at a time $s$ rather than at time 0, the counting would be based on $T_k-s,T_{k+1}-s,T_{k+3}-s,\cdots$ for some $k$. By the same argument, $N(s+t)-N(s)$ would be Poisson with mean $\lambda t$.

We have just established that the resulting counting process from independent exponential interarrival times $T_n$ has stationary increments. The resulting counting process has independent increments too. This is because the interarrival times $T_n$ are independent and that the interarrival times $T_n$ are also memoryless.

From a mathematical point of view, a sequence of independent and identically distributed exponential random variables leads to a Poisson counting process. On the other hand, if random events occur in such a manner that the times between two consecutive events are independent and identically and exponentially distributed, then such a process is a Poisson process. This characterization gives another way work with Poisson processes.

_______________________________________________________________________________________________

Examples

Example 1
Suppose that the time until the next departure of a bus at a certain bus station is exponentially distributed with mean 10 minutes. Assume that the times in between consecutive departures at this bus station are independent.

• Tom and his friend Mike are to take a bus trip together. Tom arrives at the bus station at 12:00 PM and is the first one to arrive. Mike arrives at the bus stop at 12:30 PM. They will board the first bus to depart after the arrival of Mike. What is the the probability that zero buses depart from this bus station while Tom is waiting for Mike?
• Answer the same question for one bus, and two buses?
• What is the probability that there are at least three buses leaving the station while Tom is waiting?

Because the inter-departure times are independent and exponential with the same mean, the random events (bus departures) occur according to a Poisson process with rate $\frac{1}{10}$ per minute, or 1 bus per 10 minutes. The number of bus departures in a 30-minute period is a Poisson random variable with mean 3 (per 30 minutes). Let $N$ be the number of buses leaving the bus station between 12:00 PM and 12:30 PM. Thus the answers are:

$P[N=0]=e^{-3}=0.05$

$P[N=1]=e^{-3} \times 3=0.149$

$P[N=2]=e^{-3} \times \frac{3^2}{2}=0.224$

$P[N \ge 3]=1-P[n=0]-P[n=1]-P[n=2]=0.5768$ $\square$

Example 2
Taxi arrives at a certain street corner according to a Poisson process at the rate of two taxi for every 15 minutes. Suppose that you are waiting for a taxi at this street corner and you are third in line. Assume that the people waiting for taxi do not know each other and each one will have his own taxi. What is the probability that you will board a taxi within 30 minutes?

The number of arrivals of taxi in a 30-minute period has a Poisson distribution with a mean of 4 (per 30 minutes). If there are at least 3 taxi arriving, then you are fine. Let $N$ be the number of arrivals of taxi in a 30-minute period. The answer is

\displaystyle \begin{aligned}P[N \ge 3]&=1-P[N=0] -P[N=1]-P[N=2]\\&=1-e^{-4}-e^{-4} \times 4-e^{-4} \times \frac{4^2}{2} \\&=1-0.2381 \\&=0.7619 \end{aligned}

_______________________________________________________________________________________________

Remarks

In this post, we present a view of the exponential distribution through the view point of the Poisson process. Any counting process that satisfies the three axioms of a Poisson process has independent and exponential waiting time between any two consecutive events and gamma waiting time between any two events. On the other hand, given a sequence of independent and identically distributed exponential interarrival times, a Poisson process can be derived. The memoryless property of the exponential distribution plays a central role in the interplay between Poisson and exponential.

_______________________________________________________________________________________________
$\copyright \ 2016 - \text{Dan Ma}$

# The exponential distribution

This post focuses on the mathematical properties of the exponential distribution. Since the exponential distribution is a special case of the gamma distribution, the starting point of the discussion is on the properties that are inherited from the gamma distribution. The discussion then switches to other intrinsic properties of the exponential distribution, e.g. the memoryless property. The next post discusses the intimate relation with the Poisson process. Additional topics on exponential distribution are discussed in this post.

The exponential distribution is highly mathematically tractable. For the exponential distribution with mean $\frac{1}{\beta}$ (or rate parameter $\beta$), the density function is $f(x)=\frac{1}{\beta} \ e^{-\frac{x}{\beta}}$. Thus for the exponential distribution, many distributional items have expression in closed form. The exponential distribution can certainly be introduced by performing calculation using the density function. For the sake of completeness, the distribution is introduced as a special case of the gamma distribution.

_______________________________________________________________________________________________

The Gamma Perspective

The exponential distribution is a special case of the gamma distribution. Recall that the gamma distribution has two parameters, the shape parameter $\alpha$ and the rate parameter $\beta$. Another parametrization uses the scale parameter $\theta$, which is $\theta=1 / \beta$. For now we stick with the rate parameter $\beta$ because of the connection with the Poisson process discussed below. The exponential distribution is simply a gamma distribution when $\alpha=1$. Immediately the properties of the gamma distribution discussed in the previous post can be transferred here. Suppose that $X$ is a random variable that follows the exponential distribution with rate parameter $\beta$, equivalently with mean $1 / \beta$. Immediately we know the following.

\displaystyle \begin{array}{lllll} \text{ } &\text{ } & \text{Definition} & \text{ } & \text{Exponential Distribution} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{PDF} &\text{ } & \text{ } & \text{ } & \displaystyle \begin{array}{ll} \displaystyle f_X(y)=\beta \ e^{-\beta x} & \ \ \displaystyle x>0 \\ \text{ } & \text{ } \end{array} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{CDF} &\text{ } & \displaystyle F_X(x)=\int_0^x \ f_X(t) \ dt & \text{ } & \displaystyle \begin{aligned} F_X(x)&=1-e^{-\beta \ x} \end{aligned} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Survival Function} &\text{ } & \displaystyle S_X(x)=\int_x^\infty \ f_X(t) \ dt & \text{ } & \displaystyle \begin{aligned} S_X(x)&=e^{-\beta \ x} \end{aligned} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Mean} &\text{ } & \displaystyle E(X)=\int_0^\infty x \ f_X(t) \ dt & \text{ } & \displaystyle \frac{1}{\beta} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Higher Moments} &\text{ } & \displaystyle E(X^k)=\int_0^\infty x^k \ f_X(t) \ dt & \text{ } & \displaystyle \left\{ \begin{array}{ll} \displaystyle \frac{\Gamma(1+k)}{\beta^k} &\ k>-1 \\ \text{ } & \text{ } \\ \displaystyle \frac{k!}{\beta^k} &\ k \text{ is a positive integer} \end{array} \right. \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Variance} &\text{ } & E(X^2)-E(X)^2 & \text{ } & \displaystyle \frac{1}{\beta^2} \\ \text{ } &\text{ } & \text{ } & \text{ } & \text{ } \\ \end{array}

$\displaystyle \begin{array}{lllll} \text{ } &\text{ } & \text{ } & \text{ } & \text{ } \\ \text{Mode} \ \ &\text{ } & \text{ } & \text{ } & \text{always } 0 \\ \text{ } &\text{ } & \text{ } & \text{ } & \text{ } \\ \text{MGF} \ \ &\text{ } & M_X(t)=E[e^{tX}] \ \ \ \ \ \ \ \ & \text{ } & \displaystyle \begin{array}{ll} \displaystyle\frac{\beta}{\beta- t} & \ \ \ \ \ \displaystyle t<\beta \end{array} \\ \text{ } &\text{ } & \text{ } & \text{ } & \text{ } \\ \text{CV} \ \ &\text{ } & \displaystyle \frac{\sqrt{Var(X)}}{E(X)}=\frac{\sigma}{\mu} \ \ \ \ \ \ \ \ & \text{ } & 1 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Skewness} \ \ &\text{ } & \displaystyle E\biggl[\biggl(\frac{X-\mu}{\sigma}\biggr)^3\biggr] \ \ \ \ \ \ \ \ & \text{ } & 2 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Kurtosis} \ \ &\text{ } & \displaystyle E\biggl[\biggl(\frac{X-\mu}{\sigma}\biggr)^4\biggr] \ \ \ \ \ \ \ \ & \text{ } & 9 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Excess Kurtosis} \ \ &\text{ } & \displaystyle E\biggl[\biggl(\frac{X-\mu}{\sigma}\biggr)^4\biggr]-3 \ \ \ \ \ \ \ \ & \text{ } & 6 \end{array}$

The above items are obtained by plugging $\alpha=1$ into the results in the post on gamma distribution. It is clear that exponential distribution is very mathematically tractable. The CDF has a closed form. The moments have a closed form. As a result, it is possible to derive many more properties than the ones shown here.

_______________________________________________________________________________________________

The Memoryless Property

No discussion on the exponential distribution is complete without the mentioning of the memoryless property. Suppose a random variable $X$ has support on the interval $(0, \infty)$. The random variable $X$ is said to have the memoryless property (or the “no memory” property) if

$\displaystyle P(X > u+t \ | \ X > t)=P(X > u) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

holds for all positive real numbers $t$ and $u$. To get an appreciation of this property, let’s think of $X$ as the lifetime (in years) of some machine or some device. The statement in $(1)$ says that if the device has lived at least $t$ years, then the probability that the device will live at least $u$ more years is the same as the probability that a brand new device lives to age $u$ years. It is as if the device does not remember that it has already been in use for $t$ years. If the lifetime of a device satisfies this property, it does not matter if you buy an old one or a new one. Both old and new have the same probability of living an additional $u$ years. In other words, old device is as good as new.

Since the following is true,

\displaystyle \begin{aligned} P(X > u+t \ | \ X > t)&=\frac{P(X > u+t \text{ and } X > t)}{P(X > t)} \\&=\frac{P(X > u+t)}{P(X > t)} \end{aligned}

the memoryless property $(1)$ is equivalent to the following:

$\displaystyle P(X > u+t )=P(X > u) \times P(X > t) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$

The property $(2)$ says that the survival function of this distribution is a multiplicative function. The exponential distribution satisfies this property, i.e.

$\displaystyle P(X > u+t )=e^{-\beta (u+t)}=e^{-\beta u} \ e^{-\beta t}=P(X > u) \times P(X > t)$

On the other hand, any continuous function that satisfies the multiplicative property $(2)$ must be an exponential function (see the argument at the end of the post). Thus there is only one continuous probability distribution that possesses the memoryless property. The memoryless property can also be stated in a different but equivalent way as follows:

The conditional random variable $X-t \ | \ X > t$ is distributed identically as the unconditional random variable $X. \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)$

The statement $(1)$ can be rewritten $\displaystyle P(X-t > u \ | \ X > t)=P(X > u)$. This means the conditional variable $X-t \ | \ X > t$ shares the same survival function as the original random variable $X$, hence sharing the same cumulative distribution and the same density function and so on.

The memoryless property, anyone of the three statements, is a striking property. Once again, thinking of $X$ as the lifetime of a system or a device. The conditional random variable in statement $(3)$ is the remaining lifetime for a device at age $t$. Statement $(3)$ says that it does not matter how old the device is, the remaining useful life is still governed by the same probability distribution for a brand new device, i.e. it is just as likely for an old device to live $u$ more years as for a new device, and that an old device is expected to live as long as a new device.

_______________________________________________________________________________________________

Examples

Let’s look a few examples.

Example 1
Suppose that a bank has two tellers serving retailed customers that walk into the bank lobby. A queue is set up for customers to wait for one of the tellers. The time between the arrivals of two consecutive customers in the queue has an exponential distribution with mean 3 minutes. The time each teller spent with a customer has an exponential distribution with mean 5 minutes. Assume that the service times for the two tellers are independent. At 12:00 PM, both tellers are busy and a customer has just arrived in the queue.

1. What is the probability that the next arrival in the queue will come before 12:05 PM? Between 12:05 and 12:10? After 12:10 PM?
2. If no additional customers arrive before 12:05 PM, what is the probability that the next arrival will come within the next 5 minutes?
3. If both tellers are busy serving customers at 1:00 PM, what is the probability that neither teller will finish the service before 1:05 PM? Before 1:10 PM?

Let $X$ be the waiting time between two consecutive customers and $Y$ be the service time of a bank teller, both in minutes. The answers for Part 1 are

$P[X \le 5]=1-e^{-\frac{5}{3}}=0.81$

$P[5 < X \le 10]=e^{-\frac{5}{3}}-e^{-\frac{10}{3}}=0.1532$

$P[X > 10]=e^{-\frac{10}{3}}=0.0357$

Part 2 involves the memoryless property. The answer is:

\displaystyle \begin{aligned} P[X \le 10 |X > 5]&=1-P[X > 10 |X > 5] \\&=1-P[X > 5] \\&=P[X \le 5] \\&=1-e^{-\frac{5}{3}}=0.81 \end{aligned}

Part 3 also involves the memoryless property. It does not matter how long each server has spent with the customer prior to 1:00 PM, the remaining service time after 1:00 PM still has the same exponential service time distribution. The answers are:

\displaystyle \begin{aligned} P[\text{both service times more than 5 min}]&=P[Y > 5] \times P[Y > 5] \\&=e^{-\frac{5}{5}} \times e^{-\frac{5}{5}} \\&=e^{-2} \\&=0.1353 \end{aligned}

\displaystyle \begin{aligned} P[\text{both service times more than 10 min}]&=P[Y > 10] \times P[Y > 10] \\&=e^{-\frac{10}{5}} \times e^{-\frac{10}{5}} \\&=e^{-4} \\&=0.0183 \end{aligned}

Example 2
Suppose that times between fatal auto accidents on a stretch of busy highway have an exponential distribution with a mean of 20 days. Suppose that an accident occurred on July 1. What is the probability that another fatal accident occurred in the same month? If the month of July were accident-free in this stretch of highway except for the accident on July 1, what is the probability that there will be another fatal accident in the following month (August)?

Let $X$ be the time in days from July 1 to the next fatal accident on this stretch of highway. Then $X$ is exponentially distributed with a mean of 20 days. The probability that another fatal accident will occur in the month of July is $P[X \le 31]$, which is

$P[X \le 31]=1-e^{-\frac{31}{20}}=1-e^{-1.55}=0.7878$.

Note that the month of July has 31 days and the month of August has 31 days. If the month of July is accident-free except for the accident on July 1, then the probability that an accident occurs in August is:

\displaystyle \begin{aligned} P[X \le 62 |X > 31]&=1-P[X > 62 |X > 31] \\&=1-P[X > 31] \\&=P[X \le 31] \\&=1-e^{-\frac{31}{20}}=1-e^{-1.55}=0.7878 \end{aligned}

In setting $P[X > 62 |X > 31]=P[X > 31]$, the memoryless property is used in the above derivation . If the occurrence of fatal accidents is a random event and furthermore, if the time between two successive accidents is exponentially distributed, then there is no “memory” in the waiting of the next fatal accident. Having a full month of no accidents has no bearing on when the next fatal accident occurs. $\square$

Example 3
Suppose that the amount of damage in an automobile accident follows an exponential distribution with mean 2000. An insurance coverage is available to cover such damages subject to a deductible of 1000. That is, if the damage amount is less than the deductible, the insurance pays nothing. If the damage amount is greater than the deductible, the policy pays the damage amount in excess of the deductible. Determine the mean, variance and standard deviation of the insurance payment per accident.

For clarity, the example is first discussed using $\beta$ and $d$, where $\frac{1}{\beta}$ is the mean of the exponential damage and $d$ is the deductible. Let $X$ be the amount of the damage of an auto accident. Let $Y$ be the amount paid by the insurance policy per accident. Then the following is the rule for determining the amount of payment.

$\displaystyle Y = \left\{ \begin{array}{ll} \displaystyle 0 & \ \ \ \ X \le d \\ \text{ } & \text{ } \\ \displaystyle X-d &\ \ \ \ d < X \end{array} \right.$

With this payment rule, $E[Y]$, $Var[Y]$ and $\sigma_Y$ can be worked out based on the exponential random variable $X$ for the damage amount as follows:

$\displaystyle E[Y]=\int_{d}^\infty (x-d) \ \beta \ e^{-\beta x} \ dx$

$\displaystyle E[Y^2]=\int_{d}^\infty (x-d)^2 \ \beta \ e^{-\beta x} \ dx$

$Var[Y]=E[Y^2]-E[Y]^2$

$\sigma_Y=\sqrt{Var[Y]}$

Because the exponential distribution is mathematically very tractable, the mean $E[Y]$ and the variance $Var[Y]$ are very doable. Indeed the above integrals are excellent exercise for working with exponential distribution. We would like to demonstrate a different approach. Because of the memoryless property, there is no need to calculate the above integrals.

The insurance payment $Y$ is a mixture. Specifically it can be one of two possibilities. With probability $P[X \le d]=1-e^{-\beta d}$, $Y=0$. With probability $P[X > d]=e^{-\beta d}$, $Y=X-d |X > d$. Because $X$ is an exponential random variable, $Y=X-d |X > d$ is distributed identically as the original damage amount $X$. Thus the mean of $Y$, $E[Y]$, is the weighted average of $E[Y|X \le d]$ and $E[Y|X > d]$. Likewise $E[Y^2]$ is also a weighted average. The following shows how to calculate the first two moments of $Y$.

\displaystyle \begin{aligned} E[Y]&=0 \times P[X \le d]+E[X-d |X > d] \times P[X > d] \\&=0 \times (1-e^{-\beta d})+E[X] \times e^{-\beta d} \\&=0 \times (1-e^{-\beta d})+\frac{1}{\beta} \times e^{-\beta d} \\&=\frac{1}{\beta} \times e^{-\beta d} \end{aligned}

\displaystyle \begin{aligned} E[Y^2]&=0 \times P[X \le d]+E[(X-d)^2 |X > d] \times P[X > d] \\&=0 \times (1-e^{-\beta d})+E[X^2] \times e^{-\beta d} \\&=0 \times (1-e^{-\beta d})+\frac{2}{\beta^2} \times e^{-\beta d} \\&=\frac{2}{\beta^2} \times e^{-\beta d} \end{aligned}

The second moment of the random variable $X$ is $E[X^2]=Var[X]+E[X]^2=\frac{1}{\beta^2}+\frac{1}{\beta^2}=\frac{2}{\beta^2}$. The variance of the insurance payment $Y$ is

$\displaystyle Var[Y]=\frac{2 e^{-\beta d}}{\beta^2}-\biggl( \frac{e^{-\beta d}}{\beta} \biggr)^2$

In this example, $\frac{1}{\beta}=2000$ and $d=1000$. We have:

$\displaystyle E[Y]=2000 \ e^{-\frac{1000}{2000}}=2000 \ e^{-0.5}=1213.06$

$\displaystyle Var[Y]=2 \ (2000^2) \ e^{-0.5}-(2000 e^{-0.5})^2=3380727.513$

$\sigma_Y=\sqrt{Var[Y]}=1838.675$

Using the memoryless property and the fact that the insurance $Y$ is a mixture requires less calculation. If the damage amount $X$ is not exponential, then we may have to resort to the direct calculation by doing the above integrals. $\square$

_______________________________________________________________________________________________

The Unique Distribution with the Memoryless Property

Now we show that exponential distribution is the only one with the memoryless property. First establish the fact that any right continuous function defined on $(0,\infty)$ satisfying the functional relation $g(s+t)=g(s) \ g(t)$ must be an exponential function. The statement that $g$ is a right continuous function means that if $x_n \rightarrow x$ and $x for all $n$, then $g(x_n) \rightarrow g(x)$.

Let $g$ be a right continuous function that is defined on $(0,\infty)$ such that it satisfies the functional relation $g(s+t)=g(s) \ g(t)$. First, establish the following:

$g(\frac{m}{n})=g(1)^{\frac{m}{n}}$ for any positive integers $m$ and $n$.

Note that for any positive integer $n$, $g(\frac{2}{n})=g(\frac{1}{n}+\frac{1}{n})=g(\frac{1}{n}) \ g(\frac{1}{n})=g(\frac{1}{n})^2$. It follows that for any positive integer $m$, $g(\frac{m}{n})=g(\frac{1}{n})^m$. On the other hand, $g(\frac{1}{n})=g(1)^\frac{1}{n}$. To see this, note that $g(1)=g(\frac{1}{n}+\cdots+\frac{1}{n})=g(\frac{1}{n})^n$. Raising both sides to $\frac{1}{n}$ gives the claim. Combining these two claims give the fact stated above. We have established the fact that $g(r)=g(1)^r$ for any positive rational number $r$.

Next, we show that $g(x)=g(1)^x$ for any $x$. To see this, let $r_j$ be a sequence of rational numbers converging to $x$ from the right. Then $g(r_j) \rightarrow g(x)$. By the above fact, $g(r_j)=g(1)^{r_j}$ for all $j$. On the other hand, $g(1)^{r_j} \rightarrow g(1)^{x}$. The same sequence $g(r_j)=g(1)^{r_j}$ converges to both $g(x)$ and $g(1)^{x}$. Thus $g(x)=g(1)^{x}$. This means that $g$ is an exponential function with base $g(1)$. If the natural log constant $e$ is desired as a base, $g(x)=e^{a x}$ where $a=\text{ln}(g(1))$.

Suppose that $X$ is memoryless. Let $S(x)$ be the survival function of the random variable $X$, i.e. $S(x)=P[X>x]$. Then by property $(2)$, $S(s+t)=S(s) \ S(t)$. So the survival function $S(x)$ satisfies the functional relation $g(s+t)=g(s) \ g(t)$. The survival function $S(x)$ is always right continuous. By the fact that any right continuous function satisfying the functional relation $g(s+t)=g(s) \ g(t)$ must be an exponential function, the survival function $S(x)$ must be an exponential function. Thus $X$ is an exponential random variable. This establishes the fact that among the continuous distributions, the exponential distribution is the only one with the memoryless property.

_______________________________________________________________________________________________
$\copyright \ 2016 - \text{Dan Ma}$