# The hyperexponential and hypoexponential distributions

This post continues with the discussion on the exponential distribution. The previous posts on the exponential distribution are an introduction, a post on the relation with the Poisson process and a post on more properties. This post discusses the hyperexponential distribution and the hypoexponential distribution.

Suppose $X_1,X_2,\cdots,X_n$ are independent exponential random variables. If they are identically distributed where the common mean is $\frac{1}{\lambda}$ (the common rate is $\lambda$), then the sum $X_1+X_2+\cdots + X_n$ follows the gamma distribution with shape parameter $n$ and rate parameter $\lambda$. Since the shape parameter in this case is an integer $n$, the gamma distribution is also called the Erlang distribution, a sub family of the gamma family.

What if the exponential $X_1,X_2,\cdots,X_n$ are not identically distributed? Suppose that the rate parameter of $X_i$ is $\lambda_i$ such that $\lambda_i \ne \lambda_j$ for $i \ne j$. Then the sum $X_1+X_2+\cdots + X_n$ is said to follow a hypoexponential distribution.

To contrast, a hyperexponential distribution is also a “sum” of independent exponential random variables $X_1,X_2,\cdots,X_n$. The notion of “sum” is not the arithmetic sum but rather is the notion of a mixture. The random variable $X$ can be one of the $n$ independent exponential random variables $X_1,X_2,\cdots,X_n$ such that $X$ is $X_i$ with probability $p_i$ with $p_1+\cdots+p_n=1$. Such a random variable $X$ is said to follow a hyperexponential distribution.

The Erlang distribution, the hypoexponential distribution and the hyperexponential distribution are special cases of phase-type distributions that are useful in queuing theory. These distributions are models for interarrival times or service times in queuing systems. They are obtained by breaking down the total time into a number of phases, each having an exponential distribution, where the parameters of the exponential distributions may be identical or may be different. Furthermore, the phases may be in series or in parallel (or both). The phases are in series means that the phases are run sequentially, one at a time. The Erlang distribution is one example of a phase-type distribution where the $n$ phases are in series and that the phases have the same parameter for the expoenential distributions. The hypoexponential distribution is an example of a phase-type distribution where the $n$ phases are in series and that the phases have distinct exponential parameters. The hyperexponential distribution is an example of a phase-type distribution where the phases are in parallel, which means that the system randomly selects one of the phases to process each time according to specified probabilities.

The Erlang distribution is a special case of the gamma distribution. For basic properties of the Erlang distribution, see the previous posts on the gamma distribution, starting with this post. The remainder of the post focuses on some basic properties of the hyper and hypo exponential distributions.

_______________________________________________________________________________________________

The Hyperexponential Distribution

The hyperexponential distribution is the mixture of a set of independent exponential distributions. To generate a hyperexponential distribution, let $X_1,X_2,\cdots,X_n$ be independent exponential random variables with rates parameters $\lambda_1, \lambda_2,\cdots,\lambda_n$, and with weights $p_1,p_2,\cdots,p_n$, respectively. Then $X$ is a hyperexponential random variable if $X$ is $X_i$ with probability $p_i$. Thus the hyperexponential distribution has a variable number of parameters. There are $2 \times n$ many parameters, $n$ for the exponential rate parameters and $n$ for their respective weights. We use the same notations in the discussion of the hyperexponential distribution that follows.

Basic facts about mixture distributions can give a great deal of information on the hyperexponential distribution. For example, many distributional quantities of a mixture distribution are obtained by taking weighted averages of the corresponding quantities of the individual components. Specifically, the density function, the cumulative distribution function (CDF), the survival function, along with the mean, the higher moments can all be obtained by taking weighted averages. However, the the variance of a mixture is not the weighted average of the individual variances. The following shows the probability density function $f_X(x)$, the survival function $S_X(x)$ and the cumulative distribution function $F_X(x)$ of the hyperexponential distribution.

$\displaystyle f_X(x)=p_1 \cdot (\lambda_1 e^{-\lambda_1 \ x})+p_2 \cdot (\lambda_2 e^{-\lambda_2 \ x})+\cdots+p_n \cdot (\lambda_n e^{-\lambda_n \ x}) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

$\displaystyle S_X(x)=p_1 \cdot e^{-\lambda_1 \ x}+p_2 \cdot e^{-\lambda_2 \ x}+\cdots+p_n \cdot e^{-\lambda_n \ x} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$

$\displaystyle F_X(x)=p_1 \cdot (1- e^{-\lambda_1 \ x})+p_2 \cdot (1- e^{-\lambda_2 \ x})+\cdots+p_n \cdot (1- e^{-\lambda_n \ x}) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)$

The above three distributional quantities are the weighted averages of the corresponding quantities of the individual exponential distributions. The hyperexponential distribution is sometimes called a finite mixture since there are a finite number of components in the weighted average. The raw moments of the hyperexponential distribution are also weighted averages of the corresponding exponential raw moments. The following shows the mean and the higher moments $E[X^k]$ where $k$ is any positive integer.

$\displaystyle E[X]=p_1 \cdot \biggl(\frac{1}{\lambda_1} \biggr)+p_2 \cdot \biggl(\frac{1}{\lambda_2} \biggr)+\cdots+p_n \cdot \biggl(\frac{1}{\lambda_n} \biggr) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4)$

$\displaystyle E[X^k]=p_1 \cdot \biggl(\frac{k!}{\lambda_1^k} \biggr)+p_2 \cdot \biggl(\frac{k!}{\lambda_2^k} \biggr)+\cdots+p_n \cdot \biggl(\frac{k!}{\lambda_n^k} \biggr) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (5)$

Interestingly, the hyperexponential variance is not the weighted average of the individual variances.

$\displaystyle Var[X] \ne \sum \limits_{i=1}^n \ p_i \biggl(\frac{1}{\lambda_i^2} \biggr) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (6)$

The following gives the correct variance of the hyperexponential distribution.

$\displaystyle Var[X]=E[X^2]-E[X]^2=\sum \limits_{i=1}^n \ p_i \biggl(\frac{2}{\lambda_i^2} \biggr)-\biggl(\sum \limits_{i=1}^n \ p_i \frac{1}{\lambda_i} \biggr)^2 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (7)$

It is instructive to rearrange the above variance as follows:

\displaystyle \begin{aligned} Var[X]&=\sum \limits_{i=1}^n \ p_i \biggl(\frac{2}{\lambda_i^2} \biggr)-\biggl(\sum \limits_{i=1}^n \ p_i \frac{1}{\lambda_i} \biggr)^2 \\&=\sum \limits_{i=1}^n \ p_i \biggl(\frac{1}{\lambda_i^2} \biggr)+\biggl[ \sum \limits_{i=1}^n \ p_i \biggl(\frac{1}{\lambda_i^2} \biggr)-\biggl(\sum \limits_{i=1}^n \ p_i \frac{1}{\lambda_i} \biggr)^2 \biggr] \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (8) \end{aligned}

Compare $(8)$ with the right hand side of $(6)$. The difference is the content within the squared brackets in $(8)$, which represents the added uncertainty in a mixture. Note that the content within the squared brackets is a positive value and is the variance of the random variable that has value $\frac{1}{\lambda_i}$ with probability $p_i$. In other words, the content inside the squared brackets is the variance of the conditional exponential means (see the random variable $W$ described below).

The derivation in $(8)$ shows that the hyperexponential variance is more than the weighted average of the individual exponential variances. This is because the mixing of the exponential random variables has added uncertainty. There are two levels of uncertainty in a mixture – the uncertain in each component (e.g. in each exponential distribution) and the uncertainty in not knowing which component will be realized, which is the added uncertainty. The content inside the squared brackets quantifies the added uncertainty. See here for a more detailed discussion on the variance of a mixture.

The coefficient of variation of a probability distribution is the ratio of its standard deviation to its mean (we only consider the case where the mean is positive). It is sometimes called the relative standard deviation and is a standardized measure dispersion of a probability distribution. For the exponential distribution, the coefficient of variation is always one. For the hyperexponential distribution, the coefficient of variation is always more than one. To see why, consider the random variable $W$ as defined below (note that this is the same random variable that is used above to explain the added undertainty in a mixture).

$\displaystyle W = \left\{ \begin{array}{cc} \displaystyle \frac{1}{\lambda_1} & \ \ \ \ \text{probability } p_1 \\ \text{ } & \text{ } \\ \displaystyle \frac{1}{\lambda_2} &\ \ \ \ \text{probability } p_2 \\ \text{ } & \text{ } \\ \cdots & \cdots \\ \text{ } & \text{ } \\ \displaystyle \frac{1}{\lambda_n} &\ \ \ \ \text{probability } p_n \end{array} \right.$

The variance of $W$ is given by the following:

$\displaystyle Var[W]=\sum \limits_{i=1}^n \ p_i \biggl(\frac{1}{\lambda_i^2} \biggr)-\biggl(\sum \limits_{i=1}^n \ p_i \frac{1}{\lambda_i} \biggr)^2$

We now rearrange $Var[W]$ to show that the coefficient of variation of the hyperexponential distribution is more than one.

\displaystyle \begin{aligned} &\sum \limits_{i=1}^n \ p_i \biggl(\frac{1}{\lambda_i^2} \biggr)-\biggl(\sum \limits_{i=1}^n \ p_i \frac{1}{\lambda_i} \biggr)^2 > 0 \\&\text{ } \\&\sum \limits_{i=1}^n \ p_i \biggl(\frac{2}{\lambda_i^2} \biggr)-2 \times \biggl(\sum \limits_{i=1}^n \ p_i \frac{1}{\lambda_i} \biggr)^2 > 0 \\&\text{ } \\&\sum \limits_{i=1}^n \ p_i \biggl(\frac{2}{\lambda_i^2} \biggr)-\biggl(\sum \limits_{i=1}^n \ p_i \frac{1}{\lambda_i} \biggr)^2 > \biggl(\sum \limits_{i=1}^n \ p_i \frac{1}{\lambda_i} \biggr)^2 \\&\text{ } \\&Var[X] > E[X]^2 \\&\text{ } \\& \sigma_X > E[X] \\&\text{ } \\& \frac{\sigma_X}{E[X]} > 1 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (9) \end{aligned}

The variance $Var[W]$ is identical to the content within the squared brackets in $(8)$. The above derivation shows that the variance of the hyperexponential distribution is always greater than the mean, hence making the coefficient of variation always more than one.

The following is the failure rate of the hyperexponential distribution.

$\displaystyle \mu_X(t)=\frac{f_X(t)}{S_X(t)}=\frac{\sum \limits_{i=1}^n \ p_i \cdot \lambda_i e^{-\lambda_i \ t}}{\sum \limits_{i=1}^n \ p_i \cdot e^{-\lambda_i \ t}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (10)$

The failure rate (also called the hazard rate) can be interpreted as the rate of failure at the instant right after the life has survived to age $t$. If the lifetime of a system is modeled by an exponential distribution, then the failure rate is constant, which is another way to state the memoryless property of the exponential distribution. Since the exponential distribution is the only continuous distribution with the memoryless property, the failure rate in $(10)$ is obviously not constant. However, as the life goes up in age, the failure is approaching a constant failure rate as shown by the following:

$\displaystyle \lim \limits_{t \rightarrow \infty} \ \mu_X(t)=\lambda_k \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (11)$

where $\lambda_k=\text{min}(\lambda_1,\cdots,\lambda_n)$. That is, as the life (or system or process) goes up in age, the hyperexponential failure rate approaches that of the exponential component with the smallest failure rate (the longest lifetime). This makes intuitive sense. As the system or process survives to a high age, it is more likely the hyperexponential random variable is the exponential component that lasts the longest.

To see why $(11)$ is true, multiple both the numerator and the denominator in $(10)$ by $e^{\lambda_k t}$. Then the terms with $i=k$ become $p_k \lambda_k$ (in the numerator) and $p_k$ (in the denominator). As $t \rightarrow \infty$, the terms with $i \ne k$ go to zero. As a result, the ratio goes to $\lambda_k$.

Though not discussed here, several distributional quantities that are based on moments can be calculated, properties such as skewness and kurtosis, using the moments obtained in $(5)$.

_______________________________________________________________________________________________

The Hypoexponential Density Function

The remainder of the post discusses the basic properties of the hypoexponential distribution. We first examine the probability density function of a hypoexponential distribution. Since such a distribution is an independent sum, the concept of convolution can be used. The following example show how it is done for the hypoexponential distribution with 2 phases. It turns out that the density function in general has a nice form.

Example 1
Suppose the rate parameters are $\lambda_1$ and $\lambda_2$ for the two independent exponential random variables $X_1$ and $X_2$ where $\lambda_1 \ne \lambda_2$. Determine the density function of the independent sum $X=X_1+X_2$.

The integral for the density function is:

\displaystyle \begin{aligned} f_{X_1+X_2}(x)&=\int_0^x \ f_{X_1}(t) \ f_{X_2}(x-t) \ dt \\&=\int_0^x \ \lambda_1 e^{-\lambda_1 \ t} \ \lambda_2 e^{-\lambda_2 \ (x-t)} \ dt \\&=\lambda_1 \ \lambda_2 \ e^{-\lambda_2 \ x} \ \int_0^x \ e^{-(\lambda_1-\lambda_2) \ t} \ dt \\&=\lambda_1 \ \lambda_2 \ e^{-\lambda_2 \ x} \ \frac{1}{\lambda_1-\lambda_2} \ (1-e^{-(\lambda_1-\lambda_2) \ x}) \\&=\frac{\lambda_1}{\lambda_1-\lambda_2} \ (\lambda_2 \ e^{-\lambda_2 \ x}) + \frac{\lambda_2}{\lambda_2-\lambda_1} \ (\lambda_1 \ e^{-\lambda_1 \ x}) \end{aligned}

Note that the density $f_{X_1+X_2}(x)$ appears to be a weighted average of the two individual exponential density functions, even though one of the weights is negative. $\square$

Example 2
Suppose that the hypoexponential distribution has three phases. The parameters of the three phases are $\lambda_1$, $\lambda_2$ and $\lambda_3$. Determine the density of the independent sum $X=X_1+X_2+X_3$.

The idea is to apply the convolution of the density obtained in Example 1 and the exponential density $f_{X_3}(x)=\lambda_3 \ e^{-\lambda_3 \ x}$. The convolution integral is:

$\displaystyle f_{X_1+X_2+X_3}(x)=\int_0^x \ f_{X_1+X_2}(t) \ f_{X_3}(x-t) \ dt$

The result of the integral is:

\displaystyle \begin{aligned} f_{X_1+X_2+X_3}(x)&=(\lambda_1 \ e^{-\lambda_1 \ x}) \ \biggl( \frac{\lambda_2}{\lambda_2-\lambda_1} \times \frac{\lambda_3}{\lambda_3-\lambda_1} \biggr) \\& \ \ \ +(\lambda_2 \ e^{-\lambda_2 \ x}) \ \biggl( \frac{\lambda_1}{\lambda_1-\lambda_2} \times \frac{\lambda_3}{\lambda_3-\lambda_2} \biggr) \\& \ \ \ +(\lambda_3 \ e^{-\lambda_3 \ x}) \ \biggl( \frac{\lambda_1}{\lambda_1-\lambda_3} \times \frac{\lambda_2}{\lambda_2-\lambda_3} \biggr) \end{aligned}

The density function $f_{X_1+X_2+X_3}(x)$ also appears to be a weighted average of the individual exponential density functions with some of the weights being negative. $\square$

Now, the general form of the density function for a hypoexponential density function.

$\displaystyle f_{X_1+X_2+\cdots+X_{n}}(x)=\sum \limits_{i=1}^n \ W_{n,i} \ (\lambda_i \ e^{-\lambda_i \ x}) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (12)$

where the weight $W_{n,i}$ is defined as:

$\displaystyle W_{n,i}=\prod \limits_{j \ne i} \frac{\lambda_j}{\lambda_j-\lambda_i}$

The density function in the general case of $n$ phases can be established by an induction proof.

_______________________________________________________________________________________________

More Hypoexponential Properties

As before, the hypoexponential random variable is the sum $X=X_1+X_2+\cdots+X_n$ where $X_1,X_2,\cdots,X_n$ are independent exponential random variables with $\lambda_i$ being the rate parameter for $X_i$. Since it is an independent sum, the mean and variance are easily obtained.

$\displaystyle E[X]=\frac{1}{\lambda_1}+\frac{1}{\lambda_2}+\cdots+\frac{1}{\lambda_2} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (13)$

$\displaystyle Var[X]=\frac{1}{\lambda_1^2}+\frac{1}{\lambda_2^2}+\cdots+\frac{1}{\lambda_2^2} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (14)$

The coefficient of variation is the ratio of the standard deviation to the mean. For the exponential distribution, the coefficient of variation is always 1. For the hypoexponential distribution, the coefficient of variation is always less than 1. The following is the square of the coefficient of variation for a hypoexponential random variable.

$\displaystyle \text{CV}^2=\frac{Var[X]}{E[X]^2}=\frac{\frac{1}{\lambda_1^2}+\frac{1}{\lambda_2^2}+\cdots+\frac{1}{\lambda_2^2}}{\biggl(\frac{1}{\lambda_1}+\frac{1}{\lambda_2}+\cdots+\frac{1}{\lambda_2}\biggr)^2}<1 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (15)$

$\displaystyle \text{CV}=\frac{\sigma_X}{E[X]}<1 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (16)$

Note that the quantity in the denominator in $(15)$ is always larger than the numerator, showing that $\text{CV}^2<1$. As a result, $\text{CV}<1$. In contrast, the coefficient of variation of the hyperexponential distribution is always greater than 1 (see $(9)$).

The survival function for the hypoexponential distribution has the following form:

$\displaystyle S_X(x)=S_{X_1+X_2+\cdots+X_{n}}(x)=\sum \limits_{i=1}^n \ W_{n,i} \ e^{-\lambda_i \ x} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (17)$

where the weights $W_{n,i}$ are defined in $(12)$. Then the CDF is $F_X(x)=1-S_X(x)$. As an example, the survival function for the hypoexponential distribution in Example 1 is

$\displaystyle S_X(x)=\frac{\lambda_1}{\lambda_1-\lambda_2} \ (e^{-\lambda_2 \ x}) + \frac{\lambda_2}{\lambda_2-\lambda_1} \ (e^{-\lambda_1 \ x})$

With the density function in $(12)$, several distributional quantities that are based on moments can be calculated, properties such as skewness and kurtosis.

The following is the failure rate of the hypoexponential distribution.

$\displaystyle \mu_X(t)=\frac{f_{X}(t)}{S_X(t)}=\frac{\sum \limits_{i=1}^n \ W_{n,i} \ (\lambda_i \ e^{-\lambda_i \ x})}{\sum \limits_{i=1}^n \ W_{n,i} \ e^{-\lambda_i \ x}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (18)$

The hypoexponential failure rate is obviously not a constant rate since only the exponential distribution has constant failure rate. However, as the system reaches high ages, the failure rate approaches that of the smallest exponential rate parameters that define the hypoexponential distribution. The same observation is made above in $(11)$, that is,

$\displaystyle \lim \limits_{t \rightarrow \infty} \mu_X(t)=\lambda_k \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (19)$

where $\lambda_k=\text{min}(\lambda_1,\cdots,\lambda_n)$. Suppose that the lifetime of a life or system is modeled by a hypoexponential distribution. Then when the system is in a high age, the failure rate is approximately the smallest rate parameter of the exponential distributions defining the hypoexponential lifetime. In other words, as the hypoexponentially distributed lifetime becomes large, the remaining lifetime of the system is approximately the exponential lifetime with a rate parameter equal to the smallest of the exponential rate parameters. The proof of $(19)$ is similar to the proof for $(11)$. Simply multiply the numerator and denominator of $(3)$ by $e^{\lambda_k}$. Then the terms with $i=k$ become constants. As $t \rightarrow \infty$, the terms with $i \ne k$ go to zero and the ratio goes to $\lambda_k$.

_______________________________________________________________________________________________
$\copyright \ 2016 - \text{Dan Ma}$