More topics on the exponential distribution

This is a continuation of two previous posts on the exponential distribution (an introduction and a post on the connection with the Poisson process). This post presents more properties that are not discussed in the two previous posts. Of course, a serious and in-depth discussion of the exponential distribution can fill volumes. The goal here is quite modest – to present a few more properties related to the memoryless property of the exponential distribution.

_______________________________________________________________________________________________

The Failure Rate

A previous post discusses the Poisson process and its relation to the exponential distribution. Now we present another way of looking at both notions. Suppose a counting process counts the occurrences of a type of random events. Suppose that an event means a termination of a system, be it biological or manufactured. Furthermore suppose that the terminations occur according to a Poisson process at a constant rate \lambda per unit time. Then what is the meaning of the rate \lambda? It is the rate of termination (dying). It is usually called the failure rate (or hazard rate or force of mortality). The meaning of the constant rate \lambda is that the rate of dying is the same regardless of the location on the time scale (i.e. regardless of how long a life has lived). This means that the lifetime (the time until death) of such a system has no memory. Since the exponential distribution is the only continuous distribution with the memoryless property, the time until the next termination inherent in the Poisson process in question must be an exponential random variable with rate \lambda or mean \frac{1}{\lambda}. So the notion of the failure rate function (or hazard rate function) ties together the exponential distribution and the Poisson process and further illustrates the memoryless property.

Consider a continuous random variable X that only takes on the positive real numbers. Suppose F and f are the CDF and density function of X, respectively. The survival function is S=1-F. The failure rate (or hazard rate) \mu(t) is defined as:

    \displaystyle \mu(t)=\frac{f(t)}{1-F(t)}=\frac{f(t)}{S(t)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

The function \mu(t) can be interpreted as the rate of failure at the next instant given that the life has survived to time t. Suppose that the lifetime distribution is exponential. Because of the memoryless property, the remaining lifetime of a t-year old is the same as the lifetime distribution for a new item. It is then intuitively clear that the failure rate must be constant. Indeed, it is straightforward to show that if the lifetime X is an exponential random variable with rate parameter \lambda, i.e. the density is f(t)=\lambda e^{-\lambda t}, then the failure rate is \mu(t)=\lambda. This is why the parameter \lambda in the density function f(t)=\lambda e^{-\lambda t} is called the rate parameter.

On the other hand, the failure rate or hazard rate is not constant for other lifetime distributions. However, the hazard rate function \mu(t) uniquely determines the distributional quantities such as the CDF and the survival function. The definition (1) shows that the failure rate is derived using the CDF or survival function. We show that the failure rate is enough information to derive the CDF or the survival function. From the definition, we have:

    \displaystyle \mu(t)=\frac{-\frac{d}{dt} S(t)}{S(t)} \ \ \  \text{or} \ \ \ -\mu(t)=\frac{\frac{d}{dt} S(t)}{S(t)}

Integrating both sides produces the following:

    \displaystyle - \int_0^t \ \mu(x) \ dx=\int_0^t \ \frac{\frac{d}{dx} S(x)}{S(x)} \ dx=\text{ln} S(t)

Exponentiating both sides produces the following (note that \text{ln} S(0)=0 since S(0)=1):

    \displaystyle S(t)=e^{-\int_0^t \ \mu(x) \ dx} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

Once the survival function is obtained from \mu(t), the CDF and the density function can be obtained. Interestingly, the derivation in (2) can give another proof of the fact that the exponential distribution is the only one with the memoryless property. If X is memoryless, then the failure rate must be constant. If \mu(x) is constant, then by (2), S(t) is the exponential survival function. Of course, the other way is clear: if X is exponential, then X is memoryless.

The preceding discussion shows that having a constant failure rate is another way to characterize the exponential distribution, in particular the memoryless property of the exponential distribution. Before moving on to the next topic, another example of a failure rate function is \mu(t)=\frac{\alpha}{\beta} (\frac{t}{\beta})^{\alpha-1} where both \alpha and \beta are positive constants. This is the Weibull hazard rate function. The derived survival function is S(t)=e^{-(\frac{t}{\beta})^\alpha}, which is the survival function for the Weibull distribution. This distribution is an excellent model choice for describing the life of manufactured objects. See here for an introduction to the Weibull distribution.
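
As a quick numerical illustration (an add-on, not part of the original derivation), the following sketch applies formula (2) to the Weibull hazard rate and compares the result with the closed-form Weibull survival function. The parameter values \alpha=2, \beta=5 and the time point t=4 are arbitrary choices for the check, and numpy/scipy are assumed to be available.

    import numpy as np
    from scipy.integrate import quad

    # Weibull hazard rate: mu(t) = (alpha/beta) * (t/beta)^(alpha - 1)
    alpha, beta = 2.0, 5.0                      # arbitrary shape and scale parameters
    mu = lambda x: (alpha / beta) * (x / beta) ** (alpha - 1)

    t = 4.0                                     # evaluate the survival function at t = 4

    # Formula (2): S(t) = exp( - integral of mu(x) dx from 0 to t )
    cumulative_hazard, _ = quad(mu, 0, t)
    S_from_hazard = np.exp(-cumulative_hazard)

    # Closed-form Weibull survival function: S(t) = exp( -(t/beta)^alpha )
    S_closed_form = np.exp(-(t / beta) ** alpha)

    print(S_from_hazard, S_closed_form)         # both approximately 0.5273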

_______________________________________________________________________________________________

The Minimum Statistic

The following result is about the minimum of independent exponential distributions.

    Suppose that X_1,X_2,\cdots,X_n are independent exponential random variables with rates \lambda_1,\lambda_2,\cdots,\lambda_n, respectively. Then the minimum of the sample, denoted by Y=\text{min}(X_1,X_2,\cdots,X_n), is also an exponential random variable with rate \lambda_1+\lambda_2 +\cdots+\lambda_n. \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)

For the minimum to be greater than y, all sample items must be greater than y. Thus P[Y> y] is:

    \displaystyle \begin{aligned} P[Y> y]&=P[X_1> y] \times P[X_2> y] \times \cdots \times P[X_n> y] \\&=e^{-\lambda_1 y} \times e^{-\lambda_2 y} \times \cdots \times e^{-\lambda_n y}\\&=e^{-(\lambda_1+\lambda_2+\cdots+\lambda_n) y}  \end{aligned}

This means that Y=\text{min}(X_1,X_2,\cdots,X_n) has an exponential distribution with rate \lambda_1+\lambda_2+\cdots+\lambda_n. As a result, the smallest item of a sample of independent exponential observations also follows an exponential distribution with rate being the sum of all the individual exponential rates.

Example 1
Suppose that X_1,X_2,X_3 are independent exponential random variables with rates \lambda_i for i=1,2,3. Calculate E[\text{min}(X_1,X_2,X_3)] and E[\text{max}(X_1,X_2,X_3)].

Let X=\text{min}(X_1,X_2,X_3) and Y=\text{max}(X_1,X_2,X_3). Then X has an exponential distribution with rate \lambda_1+\lambda_2+\lambda_3. As a result, E[X]=\frac{1}{\lambda_1+\lambda_2+\lambda_3}. Finding the expected value of the maximum requires more calculation. First calculate P[Y \le y].

    \displaystyle \begin{aligned} P[Y \le y]&=P[X_1 \le y] \times P[X_2 \le y] \times  P[X_3 \le y] \\&=(1-e^{-\lambda_1 y}) \times (1-e^{-\lambda_2 y}) \times (1-e^{-\lambda_3 y}) \\&=1-e^{-\lambda_1 y}-e^{-\lambda_2 y}-e^{-\lambda_3 y}+e^{-(\lambda_1+\lambda_2) y}+e^{-(\lambda_1+\lambda_3) y} \\& \ \ \ +e^{-(\lambda_2+\lambda_3) y}-e^{-(\lambda_1+\lambda_2+\lambda_3) y}  \end{aligned}

Differentiate F_Y(y)=P[Y \le y] to obtain the density function f_Y(y).

    \displaystyle \begin{aligned} f_Y(y)&=\lambda_1 e^{-\lambda_1 y}+\lambda_2 e^{-\lambda_2 y}+ \lambda_3 e^{-\lambda_3 y} \\& \ \ \ -(\lambda_1+\lambda_2) e^{-(\lambda_1+\lambda_2) y}-(\lambda_1+\lambda_3) e^{-(\lambda_1+\lambda_3) y}-(\lambda_2+\lambda_3)e^{-(\lambda_2+\lambda_3) y} \\& \ \ \ \ +(\lambda_1+\lambda_2+\lambda_3)e^{-(\lambda_1+\lambda_2+\lambda_3) y}  \end{aligned}

Each term in the density function f_Y(y) is, up to sign, an exponential density. Integrating term by term, the mean of the maximum is:

    \displaystyle E[Y]=\frac{1}{\lambda_1}+\frac{1}{\lambda_2}+ \frac{1}{\lambda_3} -\frac{1}{\lambda_1+\lambda_2}-\frac{1}{\lambda_1+\lambda_3}-\frac{1}{\lambda_2+\lambda_3}+\frac{1}{\lambda_1+\lambda_2+\lambda_3}

To make sense of the numbers, let \lambda_1=\lambda_2=\lambda_3=1. Then E[X]=\frac{1}{3}=\frac{2}{6} and E[Y]=\frac{11}{6}. In this case, the expected value of the maximum is 5.5 times the expected value of the minimum. For \lambda_1=1, \lambda_2=2 and \lambda_3=3, E[X]=\frac{1}{6} and E[Y]=\frac{73}{60}=\frac{7.3}{6}. In the second case, the expected value of the maximum is 7.3 times the expected value of the minimum. \square
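
A short simulation can confirm these expected values. The following is a minimal sketch (an add-on for checking, not part of the original example) using the second set of rates 1, 2 and 3; numpy is assumed.

    import numpy as np

    rng = np.random.default_rng(seed=0)
    rates = np.array([1.0, 2.0, 3.0])            # lambda_1, lambda_2, lambda_3
    n_sims = 1_000_000

    # Each row is one sample (X_1, X_2, X_3); numpy parametrizes by scale = 1/rate
    samples = rng.exponential(scale=1.0 / rates, size=(n_sims, 3))

    print(samples.min(axis=1).mean())    # approximately 1/6   = 0.1667
    print(samples.max(axis=1).mean())    # approximately 73/60 = 1.2167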

_______________________________________________________________________________________________

Ranking Independent Exponential Distributions

In this section, X_1,X_2,\cdots,X_n are independent exponential random variables where the rate of X_i is \lambda_i for i=1,2,\cdots,n. What is the probability P[X_{j(1)} <X_{j(2)}<\cdots<X_{j(k)}]? Here the subscripts j(1),\cdots,j(k) are distinct integers from \left\{1,2,\cdots,n \right\}. For example, for a sample size of 2, what are the probabilities P[X_1<X_2] and P[X_2<X_1]? For a sample size of 3, what are P[X_1<X_2<X_3] and P[X_2<X_1<X_3]? First, consider the case of ranking two independent exponential random variables.

    Ranking X_1 and X_2.

      \displaystyle P[X_1<X_2]=\frac{\lambda_1}{\lambda_1+\lambda_2} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4)

    where \lambda_1 is the rate of X_1 and \lambda_2 is the rate of X_2. Note that this probability is the ratio of the rate of the smaller exponential random variable over the total rate. The probability P[X_1<X_2] can be computed by evaluating the following integral:

      \displaystyle P[X_1<X_2]=\int_0^\infty \int_x^\infty \ \lambda_1 e^{-\lambda_1 \ x} \ \lambda_2 e^{-\lambda_2 \ y} \ dy \ dx
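
If one prefers not to evaluate the integral by hand, it can be checked numerically. The following sketch (not part of the original text) compares the double integral with the closed form in (4); the rates 2 and 3 are arbitrary choices, and scipy is assumed.

    import numpy as np
    from scipy.integrate import dblquad

    lam1, lam2 = 2.0, 3.0    # arbitrary rates chosen for the check

    # dblquad expects the integrand as f(y, x); y runs from x to infinity, x from 0 to infinity
    integrand = lambda y, x: lam1 * np.exp(-lam1 * x) * lam2 * np.exp(-lam2 * y)

    prob, _ = dblquad(integrand, 0, np.inf, lambda x: x, lambda x: np.inf)

    print(prob)                    # approximately 0.4
    print(lam1 / (lam1 + lam2))    # 0.4, the closed form in (4)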

The natural next step is to rank three or more exponential random variables. Ranking three variables would require a triple integral and ranking more variables would require a larger multiple integral. Instead, we rely on a useful fact about the minimum statistic. First, another basic result.

    When one of the variables is the minimum:

      \displaystyle P[X_i=\text{min}(X_1,\cdots,X_n)]=\frac{\lambda_i}{\lambda_1+\lambda_2+\cdots+\lambda_n} \ \ \ \ \ \ \ \ \ \ \ \ \ (5)

    The above says that the probability that the ith random variable is the smallest is simply the ratio of the rate of the ith variable over the total rate. This follows from (4) since we are ranking the two exponential variables X_i and \text{min}(X_1,\cdots,X_{i-1},X_{i+1},\cdots,X_n).

We now consider the following theorem.

Theorem 1
Let X_1,X_2,\cdots,X_n be independent exponential random variables. Then the minimum statistic \text{min}(X_1,X_2,\cdots,X_n) and the rank ordering of X_i are independent.

The theorem basically says that the probability of a ranking is not dependent on the location of the minimum statistic. For example, if we know that the minimum is more than 3, what is the probability of X_1<X_2<X_3? The theorem is saying that conditioning on \text{min}(X_1,X_2,X_3)>3 makes no difference on the probability of the ranking. Let Y=\text{min}(X_1,X_2,\cdots,X_n). The following establishes the theorem.

    \displaystyle P[X_{j(1)}<X_{j(2)}<\cdots<X_{j(k)} \ |\ \text{min}(X_1,X_2,\cdots,X_n)>t]

    \displaystyle =P[X_{j(1)}-t<X_{j(2)}-t<\cdots<X_{j(k)}-t \ |\ \text{min}(X_1,X_2,\cdots,X_n)>t]

    \displaystyle =P[X_{j(1)}<X_{j(2)}<\cdots<X_{j(k)}]

The key to the proof is the step to go from the second line to the third line. Assume that each X_i is the lifetime of a machine. When \text{min}(X_1,X_2,\cdots,X_n)>t, all the lifetimes X_i>t. By the memoryless property, the remaining lifetimes X_i-t are independent and exponential with the original rates. In other words, each X_i-t has the same exponential distribution as X_i. Consequently, the second line equals the third line. To make it easier to see the step from the second line to the third line, think of the two-dimensional case with X_1 and X_2. \square

The following is a consequence of Theorem 1.

Corollary 2
Let X_1,X_2,\cdots,X_n be independent exponential random variables. Then the event X_i=\text{min}(X_1,X_2,\cdots,X_n) and the rank ordering of X_1,\cdots,X_{i-1},X_{i+1}\cdots,X_n are independent.

Another way to state the corollary is that the knowledge that X_i is the smallest in the sample has no effect on the ranking of the variables other than X_i. This is a consequence of Theorem 1. To see why, let t=X_i=\text{min}(X_1,X_2,\cdots,X_n).

    \displaystyle P[X_{j(1)}<X_{j(2)}<\cdots<X_{j(k)} \ |\ t=X_i=\text{min}(X_1,X_2,\cdots,X_n)]

    \displaystyle =P[X_{j(1)}<X_{j(2)}<\cdots<X_{j(k)} \ |\ \text{min}(X_1,\cdots,X_{i-1},X_{i+1}\cdots,X_n)>t]

    \displaystyle =P[X_{j(1)}<X_{j(2)}<\cdots<X_{j(k)}] \square

We now present examples demonstrating how these ideas are used.

Example 2
Suppose that a bank has three tellers for serving its customers. The random variables X_1,X_2,X_3 are independent exponential random variables where X_i is the time spent by teller i serving a customer. The rate parameter of X_i is \lambda_i where i=1,2,3. If all three tellers are busy serving customers, what is P[X_1<X_2<X_3]? If the bank has 4 tellers instead, then what is P[X_1<X_2<X_3<X_4]?

The answer is given by the following:

    \displaystyle \begin{aligned} P[X_1<X_2<X_3]&=P[X_1=\text{min}(X_1,X_2,X_3)] \\& \ \ \ \times P[X_2<X_3 | X_1=\text{min}(X_1,X_2,X_3)] \\&\text{ } \\&=P[X_1=\text{min}(X_1,X_2,X_3)] \times P[X_2<X_3] \\&\text{ } \\&=\frac{\lambda_1}{\lambda_1+\lambda_2+\lambda_3} \times \frac{\lambda_2}{\lambda_2+\lambda_3}  \end{aligned}

The derivation uses Corollary 2 and the idea in (5). The same idea can be used for P[X_1<X_2<X_3<X_4].

    \displaystyle \begin{aligned} P[X_1<X_2<X_3<X_4]&=P[X_1=\text{min}(X_1,X_2,X_3,X_4)] \\& \ \ \times P[X_2<X_3<X_4 | X_1=\text{min}(X_1,X_2,X_3,X_4)] \\&\text{ } \\&=P[X_1=\text{min}(X_1,X_2,X_3,X_4)] \times P[X_2<X_3<X_4] \\&\text{ } \\&=P[X_1=\text{min}(X_1,X_2,X_3,X_4)] \\& \ \ \ \times P[X_2=\text{min}(X_2,X_3,X_4)] \times P[X_3<X_4] \\&\text{ } \\&=\frac{\lambda_1}{\lambda_1+\lambda_2+\lambda_3+\lambda_4} \times \frac{\lambda_2}{\lambda_2+\lambda_3+\lambda_4} \times \frac{\lambda_3}{\lambda_3+\lambda_4} \end{aligned}

The above applies the independence result twice, the first time on X_1,X_2,X_3,X_4 and the second time on X_2,X_3,X_4. This approach is much preferred over direct calculation, which would involve multiple integration that is tedious and error prone. \square
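
As a sanity check (not in the original example), the four-teller probability can be verified by simulation. The sketch below uses the arbitrary rates 1, 2, 3 and 4, for which the product formula gives \frac{1}{10} \times \frac{2}{9} \times \frac{3}{7} \approx 0.0095; numpy is assumed.

    import numpy as np

    rng = np.random.default_rng(seed=0)
    rates = np.array([1.0, 2.0, 3.0, 4.0])     # lambda_1, ..., lambda_4 (arbitrary)
    n_sims = 1_000_000

    samples = rng.exponential(scale=1.0 / rates, size=(n_sims, 4))

    # Empirical probability of the ordering X_1 < X_2 < X_3 < X_4
    empirical = np.mean(
        (samples[:, 0] < samples[:, 1])
        & (samples[:, 1] < samples[:, 2])
        & (samples[:, 2] < samples[:, 3])
    )

    # Product formula derived above
    l1, l2, l3, l4 = rates
    formula = (l1 / (l1 + l2 + l3 + l4)) * (l2 / (l2 + l3 + l4)) * (l3 / (l3 + l4))

    print(empirical, formula)   # both approximately 0.0095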

Example 3
As in Example 2, a bank has three tellers for serving its customers. The service times of the three tellers are independent exponential random variables. The mean service time for teller i is \frac{1}{\lambda_i} minutes where i=1,2,3. You walk into the bank and find that all three tellers are busy serving customers. You are the only customer waiting for an available teller. Calculate the expected amount of time you spend at the bank.

Let T be the total time you spend in the bank, which is T=W+S where W is the waiting time for a teller to become free and S is the service time of the teller helping you. When you walk into the bank, the tellers are already busy. Let X_i be the remaining service time for teller i, i=1,2,3. By the memoryless property, X_i is exponential with the original mean \frac{1}{\lambda_i}. As a result, the rate parameter of X_i is \lambda_i.

The waiting time W is simply \text{min}(X_1,X_2,X_3). Thus E[W]=\frac{1}{\lambda_1+\lambda_2+\lambda_3}. To find E[S], we need to consider three cases, depending on which teller finishes serving the current customer first.

    \displaystyle \begin{aligned} E[S]&=E[S | X_1=\text{min}(X_1,X_2,X_3)] \times P[X_1=\text{min}(X_1,X_2,X_3)] \\& \ \ + E[S | X_2=\text{min}(X_1,X_2,X_3)] \times P[X_2=\text{min}(X_1,X_2,X_3)]  \\& \ \ + E[S | X_3=\text{min}(X_1,X_2,X_3)] \times P[X_3=\text{min}(X_1,X_2,X_3)]\end{aligned}

Finishing the calculation,

    \displaystyle \begin{aligned} E[S]&=\frac{1}{\lambda_1} \times \frac{\lambda_1}{\lambda_1+\lambda_2+\lambda_3}  + \frac{1}{\lambda_2} \times \frac{\lambda_2}{\lambda_1+\lambda_2+\lambda_3}  + \frac{1}{\lambda_3} \times \frac{\lambda_3}{\lambda_1+\lambda_2+\lambda_3}   \\&=\frac{3}{\lambda_1+\lambda_2+\lambda_3} \end{aligned}

    \displaystyle E[T]=\frac{1}{\lambda_1+\lambda_2+\lambda_3}+\frac{3}{\lambda_1+\lambda_2+\lambda_3}=\frac{4}{\lambda_1+\lambda_2+\lambda_3}
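
The answer E[T]=\frac{4}{\lambda_1+\lambda_2+\lambda_3} can also be confirmed by simulation. The following is a minimal sketch (an add-on, not part of the original example) with the arbitrary rates 1, 2 and 3, so that the expected total time should be about 4/6 \approx 0.667; numpy is assumed.

    import numpy as np

    rng = np.random.default_rng(seed=0)
    rates = np.array([1.0, 2.0, 3.0])   # arbitrary rates for the three tellers
    n_sims = 1_000_000

    # Remaining service times of the three busy tellers (fresh exponentials by memorylessness)
    remaining = rng.exponential(scale=1.0 / rates, size=(n_sims, 3))

    wait = remaining.min(axis=1)              # W = min(X_1, X_2, X_3)
    first_free = remaining.argmin(axis=1)     # index of the teller who frees up first

    # Your own service time, drawn from that teller's exponential distribution
    service = rng.exponential(scale=1.0 / rates[first_free])

    total = wait + service
    print(total.mean())                 # approximately 4/(1+2+3) = 0.6667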

_______________________________________________________________________________________________
\copyright \ 2016 - \text{Dan Ma}

The exponential distribution and the Poisson process

This post is a continuation of the previous post on the exponential distribution. The previous post discusses the basic mathematical properties of the exponential distribution including the memoryless property. This post looks at the exponential distribution from another angle by focusing on the intimate relation with the Poisson process.

_______________________________________________________________________________________________

The Poisson Process

A previous post shows that a subfamily of the gamma distribution that includes the exponential distribution is derived from a Poisson process. This post gives another discussion on the Poisson process to draw out the intimate connection between the exponential distribution and the Poisson process.

Suppose that random events of a certain type occur at the rate of \lambda events per time interval of length 1. As the random events occur, we wish to count the occurrences. Starting at time 0, let N(t) be the number of events that occur by time t. A counting process is the collection of all the random variables N(t). That is, we are interested in the collection \left\{N(t): t \ge 0 \right\}. More specifically, we are interested in a counting process that satisfies the following three axioms:

  • The probability of having more than one occurrence in a short time interval is essentially zero.
  • The probability of the occurrence of a random event in a short time interval is proportional to the length of the time interval and does not depend on where the time interval is located. More specifically, the probability of the occurrence of the random event in a short interval of length h is approximately \lambda \ h.
  • The numbers of random events occurring in non-overlapping time intervals are independent.

Any counting process that satisfies the above three axioms is called a Poisson process with the rate parameter \lambda. Of special interest are the counting random variables N(t), which is the number of random events that occur in the interval [0,t], and N(s+t)-N(s), which is the number of events that occur in the interval [s,s+t]. There are also continuous variables that are of interest. For example, the time until the occurrence of the first event, denoted by S_1, and in general, the time until the occurrence of the nth event, denoted by S_n. We are also interested in the interarrival times, i.e. the time between the occurrences of two consecutive events. These are denoted by T_1, T_2, T_3,\cdots where T_n is the time between the occurrence of the (n-1)st event and the occurrence of the nth event. Of course, S_n=T_1+T_2+\cdots+T_n.

_______________________________________________________________________________________________

Some Basic Results

Given a Poisson process \left\{N(t): t \ge 0 \right\} with rate parameter \lambda>0, we discuss the following basic results:

    Counting Variables

  • The discrete random variable N(t) is a Poisson random variable with mean \lambda \ t.
  • The discrete random variable N(s+t)-N(s) is a Poisson random variable with mean \lambda \ t.

    Waiting Time Variables

  • The interarrival times, T_n, where n=1,2,3,\cdots, are independent and identically distributed exponential random variables with rate parameter \lambda (or mean \frac{1}{\lambda}).
  • The waiting time until the nth event has a gamma distribution with shape parameter n and rate parameter \lambda.

The result that N(t) is a Poisson random variable is a consequence of the fact that the Poisson distribution is the limit of the binomial distribution. To see this, let’s say we have a Poisson process with rate \lambda>0. Note that \lambda is the rate of occurrence of the event per unit time interval. Then subdivide the interval [0,t] into n subintervals of equal length. When n is sufficiently large, we can assume that there can be at most one event occurring in a subinterval (using the first two axioms of the Poisson process). Each subinterval is then like a Bernoulli trial (either 0 events or 1 event occurring in the subinterval). The probability of having exactly one event occurring in a subinterval is approximately \lambda (\frac{t}{n})=\frac{\lambda t}{n}. By the third axiom of the Poisson process, the n subintervals are independent Bernoulli trials. Thus the total number of events occurring in these n subintervals is a binomial random variable with n trials and with probability of success in each trial being \frac{\lambda t}{n}. It can be shown mathematically that when n \rightarrow \infty, the binomial distributions converge to the Poisson distribution with mean \lambda \ t. This fact is shown here and here. It follows that N(t) has a Poisson distribution with mean \lambda \ t.

To show that the increment N(s+t)-N(s) is a Poisson distribution, we simply count the events in the Poisson process starting at time s. It is clear that the resulting counting process is also a Poisson process with rate \lambda. We can use the same subdivision argument to derive the fact that N(s+t)-N(s) is a Poisson random variable with mean \lambda \ t. The subdividing is of course on the interval [s,s+t].

Based on the preceding discussion, given a Poisson process with rate parameter \lambda, the number of occurrences of the random events in any interval of length t has a Poisson distribution with mean \lambda \ t. Thus in a Poisson process, the number of events that occur in any interval of the same length has the same distribution. Any counting process that satisfies this property is said to possess stationary increments. On the other hand, any counting process that satisfies the third axiom of the Poisson process (the numbers of occurrences of events in disjoint intervals are independent) is said to have independent increments. Thus a Poisson process possesses independent increments and stationary increments.

Here is an interesting observation as a result of the possession of independent increments and stationary increments in a Poisson process. By independent increments, the process from any point forward is independent of what had previously occurred. By stationary increments, from any point forward, the occurrences of events follow the same distribution as in the previous phase. Starting with a Poisson process, if we count the events from some point forward (calling the new point time zero), the resulting counting process is probabilistically the same as the original process. The probabilistic behavior of the new process from some point on is not dependent on history. The probability statements we can make about the new process from some point on can be made using the same parameter as the original process. In other words, a Poisson process has no memory.

We now discuss the continuous random variables derived from a Poisson process. The time until the first event, T_1, has an exponential distribution with mean \frac{1}{\lambda}. To see this, for T_1>t to happen, there must be no events occurring in the interval [0,t]. Thus P[T_1>t]=P[N(t)=0]=e^{-\lambda t}. This means that T_1 has an exponential distribution with rate \lambda. What about T_2 and T_3 and so on? After the first event had occurred, we can reset the counting process to count the events starting at time T_1. Then the time until the next occurrence is also an exponential random variable with rate \lambda. Consequently, all the interarrival times T_n are exponential random variables with the same rate \lambda. Furthermore, by the discussion in the preceding paragraph, the exponential interarrival times T_1, T_2, T_3,\cdots are independent.

As a consequence of the T_n being independent exponential random variables, the waiting time until the nth event, S_n, is a gamma random variable with shape parameter n and rate parameter \lambda. Note that S_n=T_1+\cdots+T_n and that an independent sum of n identically distributed exponential random variables has a gamma distribution with shape parameter n and rate parameter \lambda, the common exponential rate.

The connection between exponential/gamma and the Poisson process provides an expression of the CDF and survival function for the gamma distribution when the shape parameter is an integer. Specifically, the following shows the survival function and CDF of the waiting time S_n as well as the density.

    \displaystyle P(S_n>t)=\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!}

    \displaystyle P(S_n \le t)=1-\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!}=\sum \limits_{k=n}^{\infty} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!}

    \displaystyle f_{S_n}(t)=\frac{1}{(n-1)!} \ \lambda^n \ t^{n-1} \ e^{-\lambda t}

The key in establishing the survival function \displaystyle P(S_n>t) is that the waiting time S_n is intimately related to N(t), which has a Poisson distribution with mean \lambda t. For S_n>t to happen, there can be at most n-1 events occurring prior to time t, i.e. N(t) \le n-1.
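
The identity between the gamma survival function and the Poisson CDF is easy to check numerically. The following sketch (a verification only, with arbitrarily chosen values \lambda=2, n=5 and t=3) is not part of the derivation; scipy is assumed.

    from scipy.stats import gamma, poisson

    lam, n, t = 2.0, 5, 3.0      # arbitrary rate, integer shape, and time point

    # Survival function of the waiting time S_n ~ gamma(shape = n, rate = lam)
    gamma_survival = gamma.sf(t, a=n, scale=1.0 / lam)

    # P[N(t) <= n-1] where N(t) ~ Poisson(lam * t)
    poisson_cdf = poisson.cdf(n - 1, mu=lam * t)

    print(gamma_survival, poisson_cdf)   # both approximately 0.2851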

_______________________________________________________________________________________________

Reversing the Poisson Process

The preceding discussion shows that a Poisson process has independent exponential waiting times between any two consecutive events and gamma waiting times between any two events. Starting with a collection \left\{N(t): t \ge 0 \right\} of Poisson counting random variables that satisfies the three axioms described above, it can be shown that the interarrival times form a sequence of independent exponential random variables with the same rate parameter as the given Poisson process. Interestingly, the process can also be reversed, i.e. given a sequence of independent and identically distributed exponential random variables, each with rate \lambda, a Poisson process can be generated.

To see this, let T_1,T_2,T_3,\cdots be a sequence of independent and identically distributed exponential random variables with rate parameter \lambda. Mathematically, the T_n are just independent and identically distributed exponential random variables. Now think of them as the interarrival times between consecutive events. So the first event occurs at time S_1=T_1 and the second event occurs at time S_2=T_1+T_2 and so on. In general, the nth event occurs at time S_n=T_1+\cdots+T_n. More specifically, the counting process is \left\{N(t): t \ge 0 \right\} where N(t) is defined below:

    N(t)=\text{max} \left\{n: S_n=T_1+\cdots+T_n \le t \right\}

For N(t)=n to happen, it must be true that S_n=T_1+\cdots+T_n \le t and S_{n+1}=T_1+\cdots+T_n+T_{n+1} > t. The probability P[N(t)=n] is then

    \displaystyle \begin{aligned}P[N(t)=n]&=P[S_{n+1} > t \text{ and } S_n \le t]\\&=P[S_{n+1} > t \text{ and } (\text{ not } S_n > t)] \\&=P[S_{n+1} > t]-P[S_n > t] \\&=\sum \limits_{k=0}^{n} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!}-\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!} \\&=\frac{e^{-\lambda t} \ (\lambda t)^n}{n!} \end{aligned}

The above derivation shows that the counting variable N(t) is a Poisson random variable with mean \lambda t. The derivation uses the gamma survival function derived earlier. Now suppose that the counting of events starts at time s rather than at time 0, and let k be the index such that the kth event is the first one to occur after time s. The time from s until this kth event is S_k-s, which, by the memoryless property, is exponential with rate \lambda; the subsequent interarrival times T_{k+1},T_{k+2},\cdots are unaffected and remain independent exponential random variables with the same rate \lambda. By the same argument as above, N(s+t)-N(s) would be Poisson with mean \lambda t.

We have just established that the resulting counting process from independent exponential interarrival times T_n has stationary increments. The resulting counting process has independent increments too. This is because the interarrival times T_n are independent and memoryless.

From a mathematical point of view, a sequence of independent and identically distributed exponential random variables leads to a Poisson counting process. On the other hand, if random events occur in such a manner that the times between two consecutive events are independent and identically distributed exponential random variables, then such a process is a Poisson process. This characterization gives another way to work with Poisson processes.
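
The reversed construction is easy to simulate: generate i.i.d. exponential interarrival times, add them up, and count how many arrivals land in [0,t]. The sketch below (an illustrative check, with arbitrary \lambda=1.5 and t=4) verifies that the resulting count behaves like a Poisson random variable with mean \lambda t; numpy is assumed.

    import numpy as np

    rng = np.random.default_rng(seed=0)
    lam, t, n_sims = 1.5, 4.0, 200_000     # arbitrary rate and observation window

    counts = np.empty(n_sims, dtype=int)
    for i in range(n_sims):
        # Add up exponential interarrival times until the arrival time exceeds t
        arrival, n_events = 0.0, 0
        while True:
            arrival += rng.exponential(scale=1.0 / lam)
            if arrival > t:
                break
            n_events += 1
        counts[i] = n_events

    print(counts.mean(), lam * t)          # sample mean is approximately lam*t = 6
    print(counts.var(), lam * t)           # for a Poisson count the variance is also about 6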

_______________________________________________________________________________________________

Examples

Example 1
Suppose that the time until the next departure of a bus at a certain bus station is exponentially distributed with mean 10 minutes. Assume that the times in between consecutive departures at this bus station are independent.

  • Tom and his friend Mike are to take a bus trip together. Tom arrives at the bus station at 12:00 PM and is the first one to arrive. Mike arrives at the bus station at 12:30 PM. They will board the first bus to depart after the arrival of Mike. What is the probability that zero buses depart from this bus station while Tom is waiting for Mike?
  • Answer the same question for one bus and for two buses.
  • What is the probability that there are at least three buses leaving the station while Tom is waiting?

Because the inter-departure times are independent and exponential with the same mean, the random events (bus departures) occur according to a Poisson process with rate \frac{1}{10} per minute, or 1 bus per 10 minutes. The number of bus departures in a 30-minute period is a Poisson random variable with mean 3 (per 30 minutes). Let N be the number of buses leaving the bus station between 12:00 PM and 12:30 PM. Thus the answers are:

    P[N=0]=e^{-3}=0.05

    P[N=1]=e^{-3} \times 3=0.149

    P[N=2]=e^{-3} \times \frac{3^2}{2}=0.224

    P[N \ge 3]=1-P[N=0]-P[N=1]-P[N=2]=0.5768 \square

Example 2
Taxis arrive at a certain street corner according to a Poisson process at the rate of two taxis per 15 minutes. Suppose that you are waiting for a taxi at this street corner and you are third in line. Assume that the people waiting for taxis do not know each other and that each person will take a separate taxi. What is the probability that you will board a taxi within 30 minutes?

The number of taxi arrivals in a 30-minute period has a Poisson distribution with a mean of 4 (per 30 minutes). If at least 3 taxis arrive in that period, you will board a taxi within 30 minutes. Let N be the number of taxi arrivals in a 30-minute period. The answer is

    \displaystyle \begin{aligned}P[N \ge 3]&=1-P[N=0] -P[N=1]-P[N=2]\\&=1-e^{-4}-e^{-4} \times 4-e^{-4} \times \frac{4^2}{2} \\&=1-0.2381 \\&=0.7619 \end{aligned}
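
For a quick check, the numbers in both examples can be reproduced with the Poisson distribution functions of a standard library (scipy is assumed here); this sketch is only a verification of the arithmetic.

    from scipy.stats import poisson

    # Example 1: bus departures in 30 minutes, Poisson with mean 3
    print(poisson.pmf(0, 3), poisson.pmf(1, 3), poisson.pmf(2, 3))  # 0.0498, 0.1494, 0.2240
    print(poisson.sf(2, 3))                                         # P[N >= 3] = 0.5768

    # Example 2: taxi arrivals in 30 minutes, Poisson with mean 4
    print(poisson.sf(2, 4))                                         # P[N >= 3] = 0.7619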

_______________________________________________________________________________________________

Remarks

In this post, we present a view of the exponential distribution from the viewpoint of the Poisson process. Any counting process that satisfies the three axioms of a Poisson process has independent exponential waiting times between any two consecutive events and gamma waiting times between any two events. On the other hand, given a sequence of independent and identically distributed exponential interarrival times, a Poisson process can be derived. The memoryless property of the exponential distribution plays a central role in the interplay between Poisson and exponential.

_______________________________________________________________________________________________
\copyright \ 2016 - \text{Dan Ma}

The gamma distribution from the point of view of a Poisson process

In the previous post, the gamma distribution is defined from the gamma function. This post shows that the gamma distribution can arise from a Poisson process.

_______________________________________________________________________________________________

The Poisson Process

Consider an experiment in which events that are of interest occur at random in a time interval. The goal here is to derive two families of random variables, one continuous and one discrete. Starting at time 0, record the time of the occurrence of the first event. Then record the time at which the second random event occurs and so on (these are the continuous random variables). Out of these measurements, we can derive discrete random variables by counting the number of random events in a fixed time interval.

The recording of times of the occurrences of the random events is like placing markings on a time line to denote the arrivals of the random events. We are interested in counting the number of markings in a fixed interval. We are also interested in measuring the length from the starting point to the first marking and to the second marking and so on. Because of this interpretation, the random process discussed here can also describe random events occurring along a spatial interval, i.e. intervals in terms of distance or volume or other spatial measurements.

A Poisson process is a random process as described above that satisfies several criteria. We show that in a Poisson process, the number of occurrences of random events in a fixed time interval follows a Poisson distribution and the time until the nth random event follows a gamma distribution.

A good example of a Poisson process is the well known experiment in radioactivity conducted by Rutherford and Geiger in 1910. In this experiment, \alpha-particles were emitted from a polonium source and the number of \alpha-particles was counted during an interval of 7.5 seconds (2,608 such time intervals were observed). In these 2,608 intervals, a total of 10,097 particles were observed. Thus the mean count per period of 7.5 seconds is 10097 / 2608 = 3.87.

In the Rutherford and Geiger experiment in 1910, a random event is the observation of an \alpha-particle. The random events occur at an average of 3.87 per unit time interval (7.5 seconds).

One of the criteria in a Poisson process is that in a very short time interval, the chance of having more than one random event is essentially zero. So either one random event will occur or none will occur in a very short time interval. Considering the occurrence of a random event as a success, there is either a success or a failure in a very short time interval. So a very short time interval in a Poisson process can be regarded as a Bernoulli trial.

The second criterion is that the experiment remains constant over time. Specifically this means that the probability of a random event occurring in a given subinterval is proportional to the length of that subinterval and does not depend on where the subinterval is located within the original interval. Any counting process that satisfies this criterion is said to possess stationary increments. For example, in the 1910 radioactivity study, \alpha-particles were emitted at the rate of \lambda= 3.87 per 7.5 seconds. So the probability of one \alpha-particle being emitted from the radioactive source in a one-second interval is 3.87/7.5 = 0.516. Then the probability of observing one \alpha-particle in a half-second interval is 0.516/2 = 0.258. For a quarter-second interval, the probability is 0.258/2 = 0.129. So if we observe half as long, it will be half as likely to observe the occurrence of a random event. On the other hand, it does not matter where the quarter-second subinterval is located, whether at the beginning or toward the end of the original interval of 7.5 seconds.

The third criterion is that non-overlapping subintervals are mutually independent in the sense that what happens in one subinterval (i.e. the occurrence or non-occurrence of a random event) will have no influence on the occurrence of a random event in another subinterval. Any counting process that satisfies this criterion is said to possess independent increments. In the Rutherford and Geiger experiment, the observation of one particle in one half-second period does not imply that a particle will necessarily be observed in the next half-second.

To summarize, the following are the three criteria of a Poisson process:

    Suppose that on average \lambda random events occur in a time interval of length 1.

    1. The probability of having more than one random event occurring in a very short time interval is essentially zero.
    2. For a very short subinterval of length \frac{1}{n} where n is a sufficiently large integer, the probability of a random event occurring in this subinterval is \frac{\lambda}{n}.
    3. The numbers of random events occurring in non-overlapping time intervals are independent.

_______________________________________________________________________________________________

The Poisson Distribution

We are now ready to derive the Poisson distribution from a Poisson random process.

Consider random events generated in a Poisson process and let Y be the number of random events observed in a unit time interval. Break up the unit time interval into n non-overlapping subintervals of equal size where n is a large integer. Each subinterval can have one or no random event. The probability of one random event in a subinterval is \lambda/n. The subintervals are independent. In other words, the three criteria of a Poisson process described above ensure that the n subintervals are independent Bernoulli trials. As a result, the number of events occurring in these n subintervals has a binomial distribution with n trials and probability of success \lambda/n. This binomial distribution is an approximation of the random variable Y, and the approximation becomes more and more refined as n increases. The resulting limit is a Poisson distribution, which coincides with the distribution of Y. The fact that the Poisson distribution is the limiting case of the binomial distribution is discussed here and here.

It follows that Y follows the Poisson distribution with mean \lambda. The following is the probability function.

    \displaystyle P(Y=y)=\frac{e^{-\lambda} \ \lambda^y}{y!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ y=0,1,2,\cdots

In the 1910 radioactivity study, the number of \alpha-particles observed in a 7.5-second period has a Poisson distribution with the mean of \lambda=3.87 particles per 7.5 seconds.

Sometimes it may be necessary to count the random events not in a unit time interval but in a smaller or larger time interval of length t. In a sense, the new unit time is t and the new average rate of the Poisson process is then \lambda t. The same subdivision argument with ever more granular binomial distributions then leads to a Poisson distribution. Let Y_t be the number of occurrences of the random events in a time interval of length t. Then Y_t follows a Poisson distribution with mean \lambda t. The following is the probability function.

    \displaystyle P(Y_t=y)=\frac{e^{-\lambda t} \ (\lambda t)^y}{y!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ y=0,1,2,\cdots

In the 1910 radioactivity study, the number of \alpha-particles observed in a 3.75-second period has a Poisson distribution with mean 3.87/2=1.935 particles per 3.75 seconds.
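
The convergence of the binomial subdivision to the Poisson distribution can also be seen numerically. The sketch below (a quick check, not part of the original discussion) uses the 1910 study's rate of 3.87 per 7.5-second interval and compares the binomial probability of observing 3 particles with the Poisson limit as the number of subintervals grows; scipy is assumed.

    from scipy.stats import binom, poisson

    lam = 3.87          # mean count per 7.5-second interval in the 1910 study

    # Subdivide the interval into n Bernoulli trials with success probability lam/n
    for n in (10, 100, 1000, 10000):
        print(n, binom.pmf(3, n, lam / n))   # P(3 events) under the binomial approximation

    print(poisson.pmf(3, lam))               # Poisson limit, approximately 0.2015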

_______________________________________________________________________________________________

The Gamma Distribution as Derived from a Poisson Process

With the Poisson process and Poisson distribution properly set up and defined, we can now derive the gamma distribution. As before, we work with a Poisson process in which the random events arrive at an average rate of \lambda per unit time. Let W_1 be the waiting time until the occurrence of the first random event, W_2 be the waiting time until the occurrence of the second random event and so on. First examine the random variable W_1.

Consider the probability P(W_1>t). The event W_1>t means that the first random event takes place after time t. This means that there must be no occurrence of the random event in question from time 0 to time t. It follows that

    P(W_1>t)=P(Y_t=0)=e^{-\lambda t}

As a result, P(W_1 \le t)=1-e^{-\lambda t}, which is the cumulative distribution of W_1. Taking the derivative, the probability density function of W_1 is f_{W_1}(t)=\lambda e^{-\lambda t}. This is the density function of the exponential distribution with mean \frac{1}{\lambda}. Recall that \lambda is the rate of the Poisson process, i.e. the random events arrive at the mean rate of \lambda per unit time. Then the mean time between two consecutive events is \frac{1}{\lambda}.

Consider the probability P(W_2>t). The event W_2>t means that the second random event takes place after time t. This means that there can be at most one occurrence of the random event in question from time 0 to time t. It follows that

    P(W_2>t)=P(Y_t=0)+P(Y_t=1)=e^{-\lambda t}+\lambda t \ e^{-\lambda t}

As a result, P(W_2 \le t)=1-e^{-\lambda t}-\lambda t \ e^{-\lambda t}, which is the cdf of the waiting time W_2. Taking the derivative, the probability density function of W_2 is f_{W_2}(t)=\lambda^2 \ t \ e^{-\lambda t}. This is the density function of the gamma distribution with shape parameter 2 and rate parameter \lambda.

By the same reasoning, the waiting time until the n^{th} random event, W_n, follows a gamma distribution with shape parameter n and rate parameter \lambda. The survival function, cdf and the density function are:

    \displaystyle P(W_n>t)=\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!}

    \displaystyle P(W_n \le t)=1-\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!}=\sum \limits_{k=n}^{\infty} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!}

    \displaystyle f_{W_n}(t)=\frac{1}{(n-1)!} \ \lambda^n \ t^{n-1} \ e^{-\lambda t}

The survival function P(W_n>t) is identical to P(Y_t \le n-1). The equivalence is through the translation: the event W_n>t is equivalent to the event that there can be at most n-1 random events occurring from time 0 to time t.

Example 1
Let’s have a quick example of calculation for the gamma distribution. In the study by Rutherford and Geiger in 1910, the average rate of arrivals of \alpha-particles is 3.87 per 7.5-second period, giving the average rate of 0.516 particles per second. On average, it takes 0.516^{-1} = 1.94 seconds to wait for the next particle. The probability that it takes more than 3 seconds of waiting time for the first particle to arrive is e^{-0.516 (3)} = 0.213.

How long would it take to wait for the second particle? On average it would take 2 \cdot 0.516^{-1} = 3.88 seconds. The probability that it takes more than 5 seconds of waiting time for the second particle to arrive is

    e^{-0.516 (5)}+0.516 \cdot 5 e^{-0.516 (5)}=3.58 \cdot e^{-2.58} = 0.271
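
These two calculations can be reproduced with standard distribution functions; the following sketch (a verification of the arithmetic only, with scipy assumed) uses the gamma distribution with integer shape and the equivalent Poisson statement.

    from scipy.stats import gamma, poisson

    rate = 3.87 / 7.5     # 0.516 alpha-particles per second

    # Waiting time for the first particle: exponential (gamma with shape 1)
    print(gamma.sf(3, a=1, scale=1.0 / rate))     # approximately 0.213

    # Waiting time for the second particle: gamma with shape 2
    print(gamma.sf(5, a=2, scale=1.0 / rate))     # approximately 0.271

    # Equivalent Poisson statement: at most 1 arrival in the first 5 seconds
    print(poisson.cdf(1, rate * 5))               # also approximately 0.271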

_______________________________________________________________________________________________

Remarks

The above discussion shows that the gamma distribution arises naturally from a Poisson process, a random experiment that satisfies three assumptions that deal with independence and uniformity in time. The gamma distribution derived from a Poisson process has two parameters n and \lambda where n is a positive integer and is the shape parameter and \lambda is the rate parameter. If the random variable W follows this distribution, its pdf is:

    \displaystyle f_W(w)=\frac{1}{(n-1)!} \ \lambda^n \ w^{n-1} \ e^{-\lambda w} \ \ \ \ \ \ \ \ \ \ \ \ w>0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

The above pdf can be interpreted as the density function for the waiting time until the arrival of the nth random event in a Poisson process with an average rate of arrivals at \lambda per unit time. The density function may be derived from an actual Poisson process or it may be just describing some random quantity that has nothing to do with any Poisson process. But the Poisson process interpretation is still useful. One advantage of the Poisson interpretation is that the survival function and the cdf would have an expression in closed form.

    \displaystyle P(W>w)=\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda w} \ (\lambda w)^k}{k!} \ \ \ \ \ \ \ \ \ \ \ \ w>0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

    \displaystyle \begin{aligned} P(W \le w)&=1-\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda w} \ (\lambda w)^k}{k!} \\&=\sum \limits_{k=n}^{\infty} \frac{e^{-\lambda w} \ (\lambda w)^k}{k!} \ \ \ \ \ \ \ \ \ \ \ \ w>0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3) \end{aligned}

In the Poisson process interpretation, P(W>w) is the probability that the nth random event occurs after time w. This means that in the interval (0, w), there are at most n-1 random events. Thus the gamma survival function is identical to the cdf of a Poisson distribution. Even when W is simply a model of some random quantity that has nothing to do with a Poisson process, such interpretation can still be used to derive the survival function and the cdf of such a gamma distribution.

The gamma distribution described in the density function (1) has a shape parameter that is a positive integer. This special case of the gamma distribution sometimes goes by the name Erlang distribution and is important in queuing theory.

In general the shape parameter does not have to be an integer; it can be any positive real number. For the more general gamma distribution, see the previous post.

_______________________________________________________________________________________________
\copyright \ 2016 - \text{Dan Ma}