# The exponential distribution

This post focuses on the mathematical properties of the exponential distribution. Since the exponential distribution is a special case of the gamma distribution, the starting point of the discussion is on the properties that are inherited from the gamma distribution. The discussion then switches to other intrinsic properties of the exponential distribution, e.g. the memoryless property. The next post discusses the intimate relation with the Poisson process. Additional topics on exponential distribution are discussed in this post.

The exponential distribution is highly mathematically tractable. For the exponential distribution with mean $\frac{1}{\beta}$ (or rate parameter $\beta$), the density function is $f(x)=\beta \ e^{-\beta x}$. Thus for the exponential distribution, many distributional quantities have closed-form expressions. The exponential distribution can certainly be introduced by performing calculations using the density function. For the sake of completeness, the distribution is introduced as a special case of the gamma distribution.

_______________________________________________________________________________________________

The Gamma Perspective

The exponential distribution is a special case of the gamma distribution. Recall that the gamma distribution has two parameters, the shape parameter $\alpha$ and the rate parameter $\beta$. Another parametrization uses the scale parameter $\theta$, which is $\theta=1 / \beta$. For now we stick with the rate parameter $\beta$ because of the connection with the Poisson process discussed below. The exponential distribution is simply a gamma distribution when $\alpha=1$. Immediately the properties of the gamma distribution discussed in the previous post can be transferred here. Suppose that $X$ is a random variable that follows the exponential distribution with rate parameter $\beta$, equivalently with mean $1 / \beta$. Immediately we know the following.

$\displaystyle \begin{array}{lllll} \text{ } &\text{ } & \text{Definition} & \text{ } & \text{Exponential Distribution} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{PDF} &\text{ } & \text{ } & \text{ } & \displaystyle \begin{array}{ll} \displaystyle f_X(x)=\beta \ e^{-\beta x} & \ \ \displaystyle x>0 \\ \text{ } & \text{ } \end{array} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{CDF} &\text{ } & \displaystyle F_X(x)=\int_0^x \ f_X(t) \ dt & \text{ } & \displaystyle \begin{aligned} F_X(x)&=1-e^{-\beta \ x} \end{aligned} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Survival Function} &\text{ } & \displaystyle S_X(x)=\int_x^\infty \ f_X(t) \ dt & \text{ } & \displaystyle \begin{aligned} S_X(x)&=e^{-\beta \ x} \end{aligned} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Mean} &\text{ } & \displaystyle E(X)=\int_0^\infty x \ f_X(x) \ dx & \text{ } & \displaystyle \frac{1}{\beta} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Higher Moments} &\text{ } & \displaystyle E(X^k)=\int_0^\infty x^k \ f_X(x) \ dx & \text{ } & \displaystyle \left\{ \begin{array}{ll} \displaystyle \frac{\Gamma(1+k)}{\beta^k} &\ k>-1 \\ \text{ } & \text{ } \\ \displaystyle \frac{k!}{\beta^k} &\ k \text{ is a positive integer} \end{array} \right. \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Variance} &\text{ } & E(X^2)-E(X)^2 & \text{ } & \displaystyle \frac{1}{\beta^2} \\ \text{ } &\text{ } & \text{ } & \text{ } & \text{ } \\ \end{array}$

$\displaystyle \begin{array}{lllll} \text{ } &\text{ } & \text{ } & \text{ } & \text{ } \\ \text{Mode} \ \ &\text{ } & \text{ } & \text{ } & \text{always } 0 \\ \text{ } &\text{ } & \text{ } & \text{ } & \text{ } \\ \text{MGF} \ \ &\text{ } & M_X(t)=E[e^{tX}] \ \ \ \ \ \ \ \ & \text{ } & \displaystyle \begin{array}{ll} \displaystyle\frac{\beta}{\beta- t} & \ \ \ \ \ \displaystyle t<\beta \end{array} \\ \text{ } &\text{ } & \text{ } & \text{ } & \text{ } \\ \text{CV} \ \ &\text{ } & \displaystyle \frac{\sqrt{Var(X)}}{E(X)}=\frac{\sigma}{\mu} \ \ \ \ \ \ \ \ & \text{ } & 1 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Skewness} \ \ &\text{ } & \displaystyle E\biggl[\biggl(\frac{X-\mu}{\sigma}\biggr)^3\biggr] \ \ \ \ \ \ \ \ & \text{ } & 2 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Kurtosis} \ \ &\text{ } & \displaystyle E\biggl[\biggl(\frac{X-\mu}{\sigma}\biggr)^4\biggr] \ \ \ \ \ \ \ \ & \text{ } & 9 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Excess Kurtosis} \ \ &\text{ } & \displaystyle E\biggl[\biggl(\frac{X-\mu}{\sigma}\biggr)^4\biggr]-3 \ \ \ \ \ \ \ \ & \text{ } & 6 \end{array}$

The above items are obtained by plugging $\alpha=1$ into the results in the post on the gamma distribution. It is clear that the exponential distribution is very mathematically tractable. The CDF has a closed form. The moments have a closed form. As a result, it is possible to derive many more properties than the ones shown here.
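As a quick check on the table entries, the following is a minimal Python sketch (not part of the original post) that starts from the closed-form raw moments $E(X^k)=k!/\beta^k$ and derives the variance, CV, skewness and kurtosis from them. The function names and the rate $\beta=0.5$ are illustrative assumptions.

```python
import math


def raw_moment(k: int, beta: float) -> float:
    """E[X^k] = k! / beta^k for a positive integer k (rate parametrization)."""
    return math.factorial(k) / beta ** k


def summary(beta: float) -> dict:
    """Derive the summary measures in the table from the first four raw moments."""
    m1, m2, m3, m4 = (raw_moment(k, beta) for k in range(1, 5))
    var = m2 - m1 ** 2
    sd = math.sqrt(var)
    # Third and fourth central moments written in terms of raw moments.
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return {
        "mean": m1,
        "variance": var,
        "cv": sd / m1,
        "skewness": mu3 / sd ** 3,
        "kurtosis": mu4 / var ** 2,
    }


# beta = 0.5 is an arbitrary rate chosen for illustration (mean 2).
stats = summary(beta=0.5)
print(stats)
```

The CV, skewness and kurtosis do not depend on the rate chosen, which agrees with the constants 1, 2 and 9 in the table.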

_______________________________________________________________________________________________

The Memoryless Property

No discussion of the exponential distribution is complete without mentioning the memoryless property. Suppose a random variable $X$ has support on the interval $(0, \infty)$. The random variable $X$ is said to have the memoryless property (or the “no memory” property) if

$\displaystyle P(X > u+t \ | \ X > t)=P(X > u) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

holds for all positive real numbers $t$ and $u$. To get an appreciation of this property, let’s think of $X$ as the lifetime (in years) of some machine or some device. The statement in $(1)$ says that if the device has lived at least $t$ years, then the probability that the device will live at least $u$ more years is the same as the probability that a brand new device lives to age $u$ years. It is as if the device does not remember that it has already been in use for $t$ years. If the lifetime of a device satisfies this property, it does not matter if you buy an old one or a new one. Both old and new have the same probability of living an additional $u$ years. In other words, an old device is as good as new.

Since the following is true,

$\displaystyle \begin{aligned} P(X > u+t \ | \ X > t)&=\frac{P(X > u+t \text{ and } X > t)}{P(X > t)} \\&=\frac{P(X > u+t)}{P(X > t)} \end{aligned}$

the memoryless property $(1)$ is equivalent to the following:

$\displaystyle P(X > u+t )=P(X > u) \times P(X > t) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$

The property $(2)$ says that the survival function of this distribution is a multiplicative function. The exponential distribution satisfies this property, i.e.

$\displaystyle P(X > u+t )=e^{-\beta (u+t)}=e^{-\beta u} \ e^{-\beta t}=P(X > u) \times P(X > t)$

On the other hand, any continuous function that satisfies the multiplicative property $(2)$ must be an exponential function (see the argument at the end of the post). Thus the exponential distribution is the only continuous probability distribution that possesses the memoryless property. The memoryless property can also be stated in a different but equivalent way as follows:

The conditional random variable $X-t \ | \ X > t$ is distributed identically as the unconditional random variable $X. \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)$

The statement $(1)$ can be rewritten $\displaystyle P(X-t > u \ | \ X > t)=P(X > u)$. This means the conditional variable $X-t \ | \ X > t$ shares the same survival function as the original random variable $X$, hence sharing the same cumulative distribution and the same density function and so on.

The memoryless property, in any one of the three statements, is a striking property. Once again, think of $X$ as the lifetime of a system or a device. The conditional random variable in statement $(3)$ is the remaining lifetime of a device at age $t$. Statement $(3)$ says that no matter how old the device is, its remaining useful life is governed by the same probability distribution as the lifetime of a brand new device. That is, an old device is just as likely to live $u$ more years as a new device, and its expected remaining lifetime equals the expected lifetime of a new device.
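Statement $(1)$ can also be checked numerically. The following is a minimal Python sketch, not part of the original discussion: it compares $P(X > u+t \ | \ X > t)$ with $P(X > u)$, once exactly through the survival function and once by simulation. The rate, the values of $t$ and $u$, the sample size, the seed and the function names are all illustrative assumptions.

```python
import math
import random


def survival(x: float, beta: float) -> float:
    """S(x) = P(X > x) = exp(-beta * x) for x > 0."""
    return math.exp(-beta * x)


def conditional_survival_exact(u: float, t: float, beta: float) -> float:
    """P(X > u + t | X > t) computed as S(u + t) / S(t)."""
    return survival(u + t, beta) / survival(t, beta)


def conditional_survival_simulated(u, t, beta, n=200_000, seed=0):
    """Estimate P(X > u + t | X > t) from simulated exponential lifetimes."""
    rng = random.Random(seed)
    draws = [rng.expovariate(beta) for _ in range(n)]  # rate parametrization
    survivors = [x for x in draws if x > t]             # devices still alive at age t
    return sum(x > u + t for x in survivors) / len(survivors)


# Illustrative values: rate 0.5 (mean 2 years), device already 3 years old,
# probability of lasting 2 more years.
BETA, T, U = 0.5, 3.0, 2.0
exact = conditional_survival_exact(U, T, BETA)
simulated = conditional_survival_simulated(U, T, BETA)
print(exact, simulated, survival(U, BETA))
```

The exact conditional probability agrees with $P(X > u)$ up to floating-point rounding, and the simulated estimate should be close to it, with the gap shrinking as the sample size grows.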

_______________________________________________________________________________________________

Examples

Let’s look at a few examples.

Example 1
Suppose that a bank has two tellers serving retail customers who walk into the bank lobby. A queue is set up for customers to wait for one of the tellers. The time between the arrivals of two consecutive customers in the queue has an exponential distribution with mean 3 minutes. The time each teller spends with a customer has an exponential distribution with mean 5 minutes. Assume that the service times for the two tellers are independent. At 12:00 PM, both tellers are busy and a customer has just arrived in the queue.

1. What is the probability that the next arrival in the queue will come before 12:05 PM? Between 12:05 and 12:10? After 12:10 PM?
2. If no additional customers arrive before 12:05 PM, what is the probability that the next arrival will come within the next 5 minutes?
3. If both tellers are busy serving customers at 1:00 PM, what is the probability that neither teller will finish the service before 1:05 PM? Before 1:10 PM?

Let $X$ be the waiting time between two consecutive customers and $Y$ be the service time of a bank teller, both in minutes. The answers for Part 1 are

$P[X \le 5]=1-e^{-\frac{5}{3}}=0.81$

$P[5 < X \le 10]=e^{-\frac{5}{3}}-e^{-\frac{10}{3}}=0.1532$

$P[X > 10]=e^{-\frac{10}{3}}=0.0357$

Part 2 involves the memoryless property. The answer is:

$\displaystyle \begin{aligned} P[X \le 10 |X > 5]&=1-P[X > 10 |X > 5] \\&=1-P[X > 5] \\&=P[X \le 5] \\&=1-e^{-\frac{5}{3}}=0.81 \end{aligned}$

Part 3 also involves the memoryless property. No matter how long each teller has spent with the customer prior to 1:00 PM, the remaining service time after 1:00 PM still has the same exponential service time distribution. Since the two service times are independent, the answers are:

$\displaystyle \begin{aligned} P[\text{both service times more than 5 min}]&=P[Y > 5] \times P[Y > 5] \\&=e^{-\frac{5}{5}} \times e^{-\frac{5}{5}} \\&=e^{-2} \\&=0.1353 \end{aligned}$

$\displaystyle \begin{aligned} P[\text{both service times more than 10 min}]&=P[Y > 10] \times P[Y > 10] \\&=e^{-\frac{10}{5}} \times e^{-\frac{10}{5}} \\&=e^{-4} \\&=0.0183 \end{aligned}$
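The arithmetic in Example 1 can be reproduced with a few lines of Python. This is a minimal sketch using only the survival function $e^{-x/\text{mean}}$; the variable and function names are illustrative assumptions. Part 2 is computed directly from the definition of conditional probability, so it also serves as a check on the memoryless shortcut.

```python
import math

ARRIVAL_MEAN = 3.0  # mean minutes between consecutive arrivals
SERVICE_MEAN = 5.0  # mean minutes a teller spends with a customer


def survival(x: float, mean: float) -> float:
    """P(T > x) for an exponential time T with the given mean."""
    return math.exp(-x / mean)


# Part 1: next arrival before 12:05, between 12:05 and 12:10, after 12:10.
p_before_1205 = 1 - survival(5, ARRIVAL_MEAN)
p_between = survival(5, ARRIVAL_MEAN) - survival(10, ARRIVAL_MEAN)
p_after_1210 = survival(10, ARRIVAL_MEAN)

# Part 2: P[X <= 10 | X > 5], from the definition of conditional probability.
p_part2 = 1 - survival(10, ARRIVAL_MEAN) / survival(5, ARRIVAL_MEAN)

# Part 3: both independent remaining service times exceed 5 (then 10) minutes.
p_neither_5 = survival(5, SERVICE_MEAN) ** 2
p_neither_10 = survival(10, SERVICE_MEAN) ** 2

for name, value in [
    ("before 12:05", p_before_1205),
    ("12:05 to 12:10", p_between),
    ("after 12:10", p_after_1210),
    ("part 2", p_part2),
    ("neither done by 1:05", p_neither_5),
    ("neither done by 1:10", p_neither_10),
]:
    print(f"{name}: {value:.4f}")
```

The three probabilities in Part 1 add up to 1, and the conditional probability in Part 2 coincides with the unconditional probability of an arrival within 5 minutes.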

Example 2
Suppose that times between fatal auto accidents on a stretch of busy highway have an exponential distribution with a mean of 20 days. Suppose that an accident occurred on July 1. What is the probability that another fatal accident will occur in the same month? If the month of July were accident-free in this stretch of highway except for the accident on July 1, what is the probability that there will be another fatal accident in the following month (August)?

Let $X$ be the time in days from July 1 to the next fatal accident on this stretch of highway. Then $X$ is exponentially distributed with a mean of 20 days. The probability that another fatal accident will occur in the month of July is $P[X \le 31]$, which is

$P[X \le 31]=1-e^{-\frac{31}{20}}=1-e^{-1.55}=0.7878$.

Note that the month of July has 31 days and the month of August has 31 days. If the month of July is accident-free except for the accident on July 1, then the probability that an accident occurs in August is:

$\displaystyle \begin{aligned} P[X \le 62 |X > 31]&=1-P[X > 62 |X > 31] \\&=1-P[X > 31] \\&=P[X \le 31] \\&=1-e^{-\frac{31}{20}}=1-e^{-1.55}=0.7878 \end{aligned}$

The memoryless property is used in the above derivation in setting $P[X > 62 |X > 31]=P[X > 31]$. If the occurrence of fatal accidents is a random event and furthermore, if the time between two successive accidents is exponentially distributed, then there is no “memory” in the waiting for the next fatal accident. Having a full month of no accidents has no bearing on when the next fatal accident occurs. $\square$
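As a check on Example 2, the following minimal Python sketch computes the August probability directly from the CDF, without invoking the memoryless property, and compares it with the July probability. The names are illustrative assumptions; the 31-day windows follow the convention used in the example.

```python
import math

MEAN_DAYS = 20.0  # mean time between fatal accidents, in days


def cdf(x: float, mean: float) -> float:
    """P(X <= x) for an exponential waiting time with the given mean."""
    return 1 - math.exp(-x / mean)


# Probability of another accident within the 31-day window used for July.
p_july = cdf(31, MEAN_DAYS)

# P[X <= 62 | X > 31] from the definition of conditional probability.
p_august_given_clear_july = (cdf(62, MEAN_DAYS) - cdf(31, MEAN_DAYS)) / (
    1 - cdf(31, MEAN_DAYS)
)

print(f"July: {p_july:.4f}")
print(f"August given an accident-free July: {p_august_given_clear_july:.4f}")
```

Both computations give the same value, as the memoryless property predicts.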

Example 3
Suppose that the amount of damage in an automobile accident follows an exponential distribution with mean 2000. An insurance coverage is available to cover such damages subject to a deductible of 1000. That is, if the damage amount is less than the deductible, the insurance pays nothing. If the damage amount is greater than the deductible, the policy pays the damage amount in excess of the deductible. Determine the mean, variance and standard deviation of the insurance payment per accident.

For clarity, the example is first discussed using $\beta$ and $d$, where $\frac{1}{\beta}$ is the mean of the exponential damage and $d$ is the deductible. Let $X$ be the amount of the damage of an auto accident. Let $Y$ be the amount paid by the insurance policy per accident. Then the following is the rule for determining the amount of payment.

$\displaystyle Y = \left\{ \begin{array}{ll} \displaystyle 0 & \ \ \ \ X \le d \\ \text{ } & \text{ } \\ \displaystyle X-d &\ \ \ \ d < X \end{array} \right.$

With this payment rule, $E[Y]$, $Var[Y]$ and $\sigma_Y$ can be worked out based on the exponential random variable $X$ for the damage amount as follows:

$\displaystyle E[Y]=\int_{d}^\infty (x-d) \ \beta \ e^{-\beta x} \ dx$

$\displaystyle E[Y^2]=\int_{d}^\infty (x-d)^2 \ \beta \ e^{-\beta x} \ dx$

$Var[Y]=E[Y^2]-E[Y]^2$

$\sigma_Y=\sqrt{Var[Y]}$

Because the exponential distribution is mathematically very tractable, the mean $E[Y]$ and the variance $Var[Y]$ are very doable. Indeed the above integrals are excellent exercises for working with the exponential distribution. We would like to demonstrate a different approach. Because of the memoryless property, there is no need to calculate the above integrals.

The insurance payment $Y$ is a mixture. Specifically it can be one of two possibilities. With probability $P[X \le d]=1-e^{-\beta d}$, $Y=0$. With probability $P[X > d]=e^{-\beta d}$, $Y=X-d |X > d$. Because $X$ is an exponential random variable, $Y=X-d |X > d$ is distributed identically as the original damage amount $X$. Thus the mean of $Y$, $E[Y]$, is the weighted average of $E[Y|X \le d]$ and $E[Y|X > d]$. Likewise $E[Y^2]$ is also a weighted average. The following shows how to calculate the first two moments of $Y$.

$\displaystyle \begin{aligned} E[Y]&=0 \times P[X \le d]+E[X-d |X > d] \times P[X > d] \\&=0 \times (1-e^{-\beta d})+E[X] \times e^{-\beta d} \\&=0 \times (1-e^{-\beta d})+\frac{1}{\beta} \times e^{-\beta d} \\&=\frac{1}{\beta} \times e^{-\beta d} \end{aligned}$

$\displaystyle \begin{aligned} E[Y^2]&=0 \times P[X \le d]+E[(X-d)^2 |X > d] \times P[X > d] \\&=0 \times (1-e^{-\beta d})+E[X^2] \times e^{-\beta d} \\&=0 \times (1-e^{-\beta d})+\frac{2}{\beta^2} \times e^{-\beta d} \\&=\frac{2}{\beta^2} \times e^{-\beta d} \end{aligned}$

The second moment of the random variable $X$ is $E[X^2]=Var[X]+E[X]^2=\frac{1}{\beta^2}+\frac{1}{\beta^2}=\frac{2}{\beta^2}$. The variance of the insurance payment $Y$ is

$\displaystyle Var[Y]=\frac{2 e^{-\beta d}}{\beta^2}-\biggl( \frac{e^{-\beta d}}{\beta} \biggr)^2$

In this example, $\frac{1}{\beta}=2000$ and $d=1000$. We have:

$\displaystyle E[Y]=2000 \ e^{-\frac{1000}{2000}}=2000 \ e^{-0.5}=1213.06$

$\displaystyle Var[Y]=2 \ (2000^2) \ e^{-0.5}-(2000 e^{-0.5})^2=3380727.513$

$\sigma_Y=\sqrt{Var[Y]}=1838.675$

Using the memoryless property and the fact that the insurance payment $Y$ is a mixture requires less calculation. If the damage amount $X$ is not exponential, then we may have to resort to the direct calculation by doing the above integrals. $\square$
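The two routes in Example 3 can be compared numerically. The following minimal Python sketch evaluates the closed-form expressions obtained from the memoryless argument and, separately, approximates the integrals for $E[Y]$ and $E[Y^2]$ with a midpoint rule. The function names, the truncation point of the integral and the number of steps are illustrative assumptions, not part of the original example.

```python
import math


def payment_moments_closed_form(beta: float, d: float):
    """Mean, variance and standard deviation of Y via the mixture argument."""
    p_exceed = math.exp(-beta * d)        # P[X > d]
    mean = p_exceed / beta                # E[Y]   = e^{-beta d} / beta
    second = 2 * p_exceed / beta ** 2     # E[Y^2] = 2 e^{-beta d} / beta^2
    var = second - mean ** 2
    return mean, var, math.sqrt(var)


def payment_moment_numeric(k: int, beta: float, d: float,
                           upper_multiple: float = 60.0,
                           steps: int = 200_000) -> float:
    """Midpoint-rule approximation of the integral of (x-d)^k * beta * e^{-beta x}
    over (d, infinity), truncated at d + upper_multiple / beta."""
    h = (upper_multiple / beta) / steps
    total = 0.0
    for i in range(steps):
        x = d + (i + 0.5) * h
        total += (x - d) ** k * beta * math.exp(-beta * x)
    return total * h


BETA = 1 / 2000   # rate for mean damage 2000
DEDUCTIBLE = 1000

mean_y, var_y, sd_y = payment_moments_closed_form(BETA, DEDUCTIBLE)
numeric_mean = payment_moment_numeric(1, BETA, DEDUCTIBLE)
numeric_second = payment_moment_numeric(2, BETA, DEDUCTIBLE)

print(f"closed form: mean={mean_y:.2f}, var={var_y:.3f}, sd={sd_y:.3f}")
print(f"numeric:     mean={numeric_mean:.2f}, "
      f"var={numeric_second - numeric_mean ** 2:.3f}")
```

The numerical integrals should agree with the closed-form values up to the discretization error of the midpoint rule, which illustrates that the mixture argument and the direct integration lead to the same answer.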

_______________________________________________________________________________________________

The Unique Distribution with the Memoryless Property

Now we show that the exponential distribution is the only one with the memoryless property. First establish the fact that any right continuous function defined on $(0,\infty)$ satisfying the functional relation $g(s+t)=g(s) \ g(t)$ must be an exponential function. The statement that $g$ is a right continuous function means that if $x_n \rightarrow x$ and $x < x_n$ for all $n$, then $g(x_n) \rightarrow g(x)$.

Let $g$ be a right continuous function that is defined on $(0,\infty)$ such that it satisfies the functional relation $g(s+t)=g(s) \ g(t)$. First, establish the following:

$g(\frac{m}{n})=g(1)^{\frac{m}{n}}$ for any positive integers $m$ and $n$.

Note that for any positive integer $n$, $g(\frac{2}{n})=g(\frac{1}{n}+\frac{1}{n})=g(\frac{1}{n}) \ g(\frac{1}{n})=g(\frac{1}{n})^2$. It follows that for any positive integer $m$, $g(\frac{m}{n})=g(\frac{1}{n})^m$. On the other hand, $g(\frac{1}{n})=g(1)^\frac{1}{n}$. To see this, note that $g(1)=g(\frac{1}{n}+\cdots+\frac{1}{n})=g(\frac{1}{n})^n$. Since $g(x)=g(\frac{x}{2})^2 \ge 0$ for every $x>0$, the $n$th root is well defined, and raising both sides to the power $\frac{1}{n}$ gives the claim. Combining these two claims gives the fact stated above. We have established the fact that $g(r)=g(1)^r$ for any positive rational number $r$.

Next, we show that $g(x)=g(1)^x$ for any $x>0$. To see this, let $r_j$ be a sequence of rational numbers converging to $x$ from the right. Then $g(r_j) \rightarrow g(x)$ by right continuity. By the above fact, $g(r_j)=g(1)^{r_j}$ for all $j$. On the other hand, $g(1)^{r_j} \rightarrow g(1)^{x}$. The same sequence $g(r_j)=g(1)^{r_j}$ converges to both $g(x)$ and $g(1)^{x}$. Thus $g(x)=g(1)^{x}$. This means that $g$ is an exponential function with base $g(1)$. If the natural log constant $e$ is desired as a base, then, provided $g(1)>0$, $g(x)=e^{a x}$ where $a=\text{ln}(g(1))$.

Suppose that $X$ is memoryless. Let $S(x)$ be the survival function of the random variable $X$, i.e. $S(x)=P[X>x]$. Then by property $(2)$, $S(s+t)=S(s) \ S(t)$. So the survival function $S(x)$ satisfies the functional relation $g(s+t)=g(s) \ g(t)$. The survival function $S(x)$ is always right continuous. By the fact that any right continuous function satisfying the functional relation $g(s+t)=g(s) \ g(t)$ must be an exponential function, the survival function $S(x)$ must be an exponential function. Thus $X$ is an exponential random variable. This establishes the fact that among the continuous distributions, the exponential distribution is the only one with the memoryless property.
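As a worked summary of the last step, the rate parameter can be read off from the survival function. The identification $\beta=-\text{ln}(S(1))$ is our labeling, and it assumes $0<S(1)<1$, which is what one expects for a random variable with support on $(0, \infty)$:

$\displaystyle \begin{aligned} S(x)&=S(1)^x \\&=e^{x \ \text{ln}(S(1))} \\&=e^{-\beta x} \ \ \ \ \ \ \text{where } \beta=-\text{ln}(S(1))>0 \end{aligned}$

This matches the survival function $S_X(x)=e^{-\beta \ x}$ listed in the table in the section on the gamma perspective.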

_______________________________________________________________________________________________
$\copyright \ 2016 - \text{Dan Ma}$