A catalog of parametric severity models

Various parametric continuous probability models have been presented and discussed in this blog. The number of parameters in these models ranges from one to two, and in a small number of cases three. They are all potential candidates for severity models in insurance applications and in other actuarial applications. This post highlights these models. The list presented here is not exhaustive; it is only a brief catalog. There are other models that are also suitable for actuarial applications but are not accounted for here. Nevertheless, the list is a good place to begin. This post also serves as a navigation device (the table shown below contains links to the blog posts).

A Catalog

Many of the models highlighted here are related to the gamma distribution either directly or indirectly. So the catalog starts with the gamma distribution at the top and then branches out to the other related models. Mathematically, the gamma distribution is a two-parameter continuous distribution defined using the gamma function. The gamma sub family includes the exponential distribution, the Erlang distribution and the chi-squared distribution. These are gamma distributions with certain restrictions on one or both of the gamma parameters. Other distributions are obtained by raising a distribution to a power. Still others are obtained by mixing distributions.

Here’s a listing of the models. Click on the links to find out more about the distributions.

Derived From
Gamma function
Gamma sub families
Independent sum of gamma
Exponentiation
Raising to a power:
- Raising exponential to a positive power
- Raising exponential to a power
- Raising gamma to a power
- Raising Pareto to a power
Burr sub families
Mixture
Others

The above table categorizes the distributions according to how they are mathematically derived. For example, the gamma distribution is derived from the gamma function. The Pareto distribution is mathematically an exponential-gamma mixture. The Burr distribution is a transformed Pareto distribution, i.e. obtained by raising a Pareto distribution to a positive power. Even though these distributions can be defined simply by giving the PDF and CDF, knowing their mathematical origins informs us of the specific mathematical properties of the distributions. Organizing according to the mathematical origin gives us a concise summary of the models.


From a mathematical standpoint, the gamma distribution is defined using the gamma function.

$\displaystyle \Gamma(\alpha)=\int_0^\infty t^{\alpha-1} \ e^{-t} \ dt$

In the above integral, the argument $\alpha$ is a positive number. The expression $t^{\alpha-1} \ e^{-t}$ in the integrand is always positive. The area between the curve $t^{\alpha-1} \ e^{-t}$ and the x-axis is $\Gamma(\alpha)$. When this expression is normalized, i.e. divided by $\Gamma(\alpha)$, it becomes a density function.

$\displaystyle f(t)=\frac{1}{\Gamma(\alpha)} \ t^{\alpha-1} \ e^{-t}$

The above function $f(t)$ is defined over all positive $t$, and its integral over all positive $t$ is 1. Thus $f(t)$ is a density function. It has only one parameter, $\alpha$, which is the shape parameter. Adding a scale parameter $\theta$ makes it a two-parameter distribution. The result is called the gamma distribution. The following is the density function.

$\displaystyle f(x)=\frac{1}{\Gamma(\alpha)} \ \biggl(\frac{1}{\theta}\biggr)^\alpha \ x^{\alpha-1} \ e^{-\frac{x}{\theta}} \ \ \ \ \ \ \ x>0$

Both parameters $\alpha$ and $\theta$ are positive real numbers. The first parameter $\alpha$ is the shape parameter and $\theta$ is the scale parameter.
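To make the definition concrete, the following is a minimal Python sketch (the helper name `gamma_pdf` is ours) that implements this density and checks numerically, with a simple midpoint rule, that it integrates to 1:

```python
import math

def gamma_pdf(x, alpha, theta):
    """Density of the gamma distribution with shape alpha and scale theta."""
    return (1 / math.gamma(alpha)) * (1 / theta)**alpha * x**(alpha - 1) * math.exp(-x / theta)

# Midpoint-rule check that the density integrates to (approximately) 1.
alpha, theta = 2.5, 3.0
n, upper = 200000, 200.0
h = upper / n
total = sum(gamma_pdf((i + 0.5) * h, alpha, theta) for i in range(n)) * h
print(round(total, 4))  # close to 1
```

With $\alpha=1$ the density reduces to the exponential density $e^{-x/\theta}/\theta$, consistent with the sub family relationships described below.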

As mentioned above, many of the distributions listed in the above table are related to the gamma distribution. Some of the distributions are sub families of gamma. For example, when $\alpha$ is a positive integer, the resulting distributions are called Erlang distributions (important in queuing theory). When $\alpha=1$, the result is the exponential distribution. When $\alpha=\frac{k}{2}$ and $\theta=2$ where $k$ is a positive integer, the results are the chi-squared distributions (the parameter $k$ is referred to as the degrees of freedom). The chi-squared distribution plays an important role in statistics.

Taking the independent sum of $n$ independent and identically distributed exponential random variables produces the Erlang distribution, a sub family of the gamma distribution. Taking the independent sum of $n$ exponential random variables with pairwise distinct means produces the hypoexponential distributions. On the other hand, a mixture of $n$ independent exponential random variables produces the hyperexponential distribution.

The Pareto distribution (Pareto Type II Lomax) is a mixture of exponential distributions with gamma mixing weights. Despite the connection with the gamma distribution, the Pareto distribution is a heavy tailed distribution. Thus the Pareto distribution is suitable for modeling extreme losses, e.g. rare but potentially catastrophic losses.

As mentioned earlier, raising a Pareto distribution to a positive power generates the Burr distribution. Restricting the parameters in a Burr distribution in a certain way produces the paralogistic distribution. The table indicates the relationships in a concise way. For details, go to the blog posts for more information.

Tail Weight

Another informative way to categorize the distributions listed in the table is by tail weight. At first glance, all the distributions may look similar; for example, all the distributions in the table are right skewed. Upon closer look, some of the distributions put more weight (probability) on the larger values. Hence some of the models are more suitable for modeling phenomena with significantly higher probabilities of large or extreme values.

When a distribution puts significantly more probability on larger values, the distribution is said to be a heavy tailed distribution (or said to have a larger tail weight). In general, tail weight is a relative concept. For example, we say model A has a larger tail weight than model B (or model A has a heavier tail than model B). However, there are several ways to check the tail weight of a given distribution. Here are four criteria.

Tail Weight Measure: What to Look For
1. Existence of moments: the existence of more positive moments indicates a lighter tailed distribution.
2. Hazard rate function: an increasing hazard rate function indicates a lighter tailed distribution.
3. Mean excess loss function: an increasing mean excess loss function indicates a heavier tailed distribution.
4. Speed of decay of the survival function: a survival function that decays rapidly to zero (as compared to another distribution) indicates a lighter tailed distribution.

Existence of moments
For a positive real number $k$, the moment $E(X^k)$ is defined by the integral $\int_0^\infty x^k \ f(x) \ dx$ where $f(x)$ is the density function of the distribution in question. If the distribution puts significantly more probability on the larger values in the right tail, this integral may not exist (may not converge) for some $k$. Thus the existence of the moments $E(X^k)$ for all positive $k$ is an indication that the distribution is a light tailed distribution.

In the above table, the only distributions for which all positive moments exist are gamma (including all gamma sub families such as exponential), Weibull, lognormal, hyperexponential, hypoexponential and beta. Such distributions are considered light tailed distributions.

If positive moments exist only up to a certain value of $k$, that is an indication that the distribution has a heavy right tail. All the other distributions in the table are considered heavy tailed as compared to gamma, Weibull and lognormal. Consider a Pareto distribution with shape parameter $\alpha$ and scale parameter $\theta$. Note that the existence of the Pareto higher moments $E(X^k)$ is capped by the shape parameter $\alpha$. If the Pareto distribution is to model a random loss, and if the mean is infinite (when $\alpha \le 1$), the risk is uninsurable! On the other hand, when $\alpha \le 2$, the Pareto variance does not exist. This shows that for a heavy tailed distribution, the variance may not be a good measure of risk.
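The cap on Pareto moments can be illustrated numerically. The following is a small Python sketch (the helper `lomax_moment` is our own) using the closed form $E(X^k)=\theta^k \, \Gamma(k+1) \Gamma(\alpha-k)/\Gamma(\alpha)$ for a Pareto Type II (Lomax) distribution, which is valid for $-1<k<\alpha$:

```python
import math

def lomax_moment(k, alpha, theta):
    """E(X^k) for a Pareto Type II (Lomax) distribution with shape alpha
    and scale theta; the moment exists only when k < alpha."""
    if k >= alpha:
        return math.inf
    return theta**k * math.gamma(k + 1) * math.gamma(alpha - k) / math.gamma(alpha)

# With alpha = 3 the mean and variance exist, but E(X^3) does not.
print(lomax_moment(1, 3, 2))   # mean = theta/(alpha - 1) = 1.0
print(lomax_moment(3, 3, 2))   # inf -- capped by the shape parameter
```

For $k=2$ the formula gives $E(X^2)=2\theta^2/[(\alpha-1)(\alpha-2)]$, matching the usual Lomax second moment.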

Hazard rate function
The hazard rate function $h(x)$ of a random variable $X$ is defined as the ratio of the density function and the survival function.

$\displaystyle h(x)=\frac{f(x)}{S(x)}$

The hazard rate is called the force of mortality in a life contingency context and can be interpreted as the rate that a person aged $x$ will die in the next instant. The hazard rate is called the failure rate in reliability theory and can be interpreted as the rate that a machine will fail at the next instant given that it has been functioning for $x$ units of time.

Another indication of heavy tail weight is that the distribution has a decreasing hazard rate function. On the other hand, a distribution with an increasing hazard rate function is a light tailed distribution. If the hazard rate function is decreasing (over time if the random variable is a time variable), then the population dies off at a decreasing rate, hence a heavier tail for the distribution in question.

The Pareto distribution is a heavy tailed distribution since the hazard rate is $h(x)=\alpha/x$ (Pareto Type I) and $h(x)=\alpha/(x+\theta)$ (Pareto Type II Lomax). Both hazard rates are decreasing functions.

The Weibull distribution is a flexible model in that when its shape parameter is $0<\tau<1$, the Weibull hazard rate is decreasing and when $\tau>1$, the hazard rate is increasing. When $\tau=1$, Weibull is the exponential distribution, which has a constant hazard rate.
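The three Weibull regimes can be checked directly from the definition $h(x)=f(x)/S(x)$. The following is a minimal Python sketch (the helper name is ours):

```python
import math

def weibull_hazard(x, tau, theta=1.0):
    """Hazard rate h(x) = f(x)/S(x) of the Weibull distribution;
    it simplifies to (tau/theta) * (x/theta)**(tau - 1)."""
    sf = math.exp(-(x / theta)**tau)
    pdf = (tau / theta) * (x / theta)**(tau - 1) * sf
    return pdf / sf

# tau < 1: decreasing hazard (heavier tail); tau > 1: increasing hazard;
# tau = 1: constant hazard (the exponential distribution).
print(weibull_hazard(1.0, 0.5) > weibull_hazard(2.0, 0.5))   # True
print(weibull_hazard(1.0, 2.0) < weibull_hazard(2.0, 2.0))   # True
```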

The point about decreasing hazard rate as an indication of a heavy tailed distribution has a connection with the fourth criterion. The idea is that a decreasing hazard rate means that the survival function decays to zero slowly. This point is due to the fact that the hazard rate function generates the survival function through the following.

$\displaystyle S(x)=e^{\displaystyle -\int_0^x h(t) \ dt}$

Thus if the hazard rate function is decreasing in $x$, then the survival function will decay more slowly to zero. To see this, let $H(x)=\int_0^x h(t) \ dt$, which is called the cumulative hazard rate function. As indicated above, $S(x)=e^{-H(x)}$. If $h(x)$ is decreasing in $x$, $H(x)$ has a lower rate of increase and consequently $S(x)=e^{-H(x)}$ has a slower rate of decrease to zero.

In contrast, the exponential distribution has a constant hazard rate function, making it a medium tailed distribution. As explained above, any distribution having an increasing hazard rate function is a light tailed distribution.

The mean excess loss function
The mean excess loss is the conditional expectation $e_X(d)=E(X-d \lvert X>d)$. If the random variable $X$ represents insurance losses, mean excess loss is the expected loss in excess of a threshold conditional on the event that the threshold has been exceeded. Suppose that the threshold $d$ is an ordinary deductible that is part of an insurance coverage. Then $e_X(d)$ is the expected payment made by the insurer in the event that the loss exceeds the deductible.

Whenever $e_X(d)$ is an increasing function of the deductible $d$, the loss $X$ is a heavy tailed distribution. If the mean excess loss function is a decreasing function of $d$, then the loss $X$ is a lighter tailed distribution.

The Pareto distribution can also be classified as a heavy tailed distribution based on an increasing mean excess loss function. For a Pareto distribution (Type I) with shape parameter $\alpha$ and scale parameter $\theta$, the mean excess loss is $e_X(d)=d/(\alpha-1)$. For a Pareto Type II (Lomax) distribution, the mean excess loss is $e_X(d)=(d+\theta)/(\alpha-1)$. Both are increasing functions of the deductible $d$! This means that the larger the deductible, the larger the expected claim if such a large loss occurs! If the underlying distribution for a random loss is Pareto, it is a catastrophic risk situation.

In general, an increasing mean excess loss function is an indication of a heavy tailed distribution. On the other hand, a decreasing mean excess loss function indicates a light tailed distribution. The exponential distribution has a constant mean excess loss function and is considered a medium tailed distribution.
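The mean excess loss can also be computed numerically as $e_X(d)=\int_d^\infty S(x) \, dx / S(d)$. The following Python sketch (helper names are ours) does this for the Lomax distribution with $\alpha=2$ and $\theta=2$, where the exact value is $(d+\theta)/(\alpha-1)=d+2$, and confirms that the function increases in $d$:

```python
import math

def lomax_sf(x, alpha, theta):
    """Survival function of the Pareto Type II (Lomax) distribution."""
    return (theta / (x + theta))**alpha

def mean_excess(sf, d, width=20000.0, n=200000):
    """e(d) = E(X - d | X > d), computed by a midpoint rule on (d, d + width)
    divided by sf(d); the small tail beyond d + width is ignored."""
    h = width / n
    area = sum(sf(d + (i + 0.5) * h) for i in range(n)) * h
    return area / sf(d)

sf = lambda x: lomax_sf(x, 2, 2)
print(round(mean_excess(sf, 3.0), 2))   # close to the exact value d + 2 = 5
```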

Speed of decay of the survival function to zero
The survival function $S(x)=P(X>x)$ captures the probability of the tail of a distribution. If the survival function of a distribution decays slowly to zero (equivalently, the CDF goes slowly to one), that is another indication that the distribution is heavy tailed. This point is touched on in the discussion of the hazard rate function.

The following is a comparison of a Pareto Type II survival function and an exponential survival function. The Pareto survival function has parameters $\alpha=2$ and $\theta=2$. The two survival functions are set to have the same 75th percentile, which is $x=2$.

$\displaystyle \begin{array}{llllllll} \text{ } &x &\text{ } & \text{Pareto } S_X(x) & \text{ } & \text{Exponential } S_Y(x) & \text{ } & \displaystyle \frac{S_X(x)}{S_Y(x)} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{ } &2 &\text{ } & 0.25 & \text{ } & 0.25 & \text{ } & 1 \\ \text{ } &10 &\text{ } & 0.027777778 & \text{ } & 0.000976563 & \text{ } & 28 \\ \text{ } &20 &\text{ } & 0.008264463 & \text{ } & 9.54 \times 10^{-7} & \text{ } & 8666 \\ \text{ } &30 &\text{ } & 0.00390625 & \text{ } & 9.31 \times 10^{-10} & \text{ } & 4194304 \\ \text{ } &40 &\text{ } & 0.002267574 & \text{ } & 9.09 \times 10^{-13} & \text{ } & 2.49 \times 10^{9} \\ \text{ } &60 &\text{ } & 0.001040583 & \text{ } & 8.67 \times 10^{-19} & \text{ } & 1.20 \times 10^{15} \\ \text{ } &80 &\text{ } & 0.000594884 & \text{ } & 8.27 \times 10^{-25} & \text{ } & 7.19 \times 10^{20} \\ \text{ } &100 &\text{ } & 0.000384468 & \text{ } & 7.89 \times 10^{-31} & \text{ } & 4.87 \times 10^{26} \\ \text{ } &120 &\text{ } & 0.000268745 & \text{ } & 7.52 \times 10^{-37} & \text{ } & 3.57 \times 10^{32} \\ \text{ } &140 &\text{ } & 0.000198373 & \text{ } & 7.17 \times 10^{-43} & \text{ } & 2.76 \times 10^{38} \\ \text{ } &160 &\text{ } & 0.000152416 & \text{ } & 6.84 \times 10^{-49} & \text{ } & 2.23 \times 10^{44} \\ \text{ } &180 &\text{ } & 0.000120758 & \text{ } & 6.53 \times 10^{-55} & \text{ } & 1.85 \times 10^{50} \\ \text{ } & \text{ } \\ \end{array}$

Note that at the large values, the Pareto right tail retains much more probability. This is also confirmed by the ratio of the two survival functions, with the ratio approaching infinity. Using an exponential distribution to model a Pareto random phenomenon would be a severe modeling error even though the exponential distribution may be a good model for describing the loss up to the 75th percentile (in the above comparison). It is the large right tail that is problematic (and catastrophic)!
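The rows of the above table are straightforward to reproduce. The following Python sketch (function names are ours) uses the Pareto survival function with $\alpha=2$, $\theta=2$ and calibrates the exponential rate so that both distributions satisfy $S(2)=0.25$:

```python
import math

alpha, theta = 2, 2
lam = math.log(4) / 2          # solves exp(-2 * lam) = 0.25

def pareto_sf(x):
    """Pareto Type II (Lomax) survival function."""
    return (theta / (x + theta))**alpha

def exp_sf(x):
    """Exponential survival function with the same 75th percentile."""
    return math.exp(-lam * x)

for x in [2, 10, 20, 30]:
    print(x, pareto_sf(x), exp_sf(x), pareto_sf(x) / exp_sf(x))
```

The printed ratios grow without bound, matching the last column of the table.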

Since the Pareto survival function and the exponential survival function have closed forms, we can also look at their ratio.

$\displaystyle \frac{\text{pareto survival}}{\text{exponential survival}}=\frac{\displaystyle \frac{\theta^\alpha}{(x+\theta)^\alpha}}{e^{-\lambda x}}=\frac{\theta^\alpha e^{\lambda x}}{(x+\theta)^\alpha} \longrightarrow \infty \ \text{ as } x \longrightarrow \infty$

In the above ratio, the numerator has an exponential function with a positive quantity in the exponent, while the denominator has a polynomial in $x$. This ratio goes to infinity as $x \rightarrow \infty$.

In general, whenever the ratio of two survival functions diverges to infinity, it is an indication that the distribution in the numerator of the ratio has a heavier tail. When the ratio goes to infinity, the survival function in the numerator is said to decay slowly to zero as compared to the denominator.

It is important to examine the tail behavior of a distribution when considering it as a candidate for a model. The four criteria discussed here provide a crucial way to classify parametric models according to the tail weight.


$\copyright$ 2017 – Dan Ma

Transformed exponential distributions

The processes of creating distributions from existing ones are an important topic in the study of probability models. Such processes expand the tool kit in the modeling process. Two examples: new distributions can be generated by taking an independent sum of old ones or by mixing distributions (the result would be called a mixture). Another way to generate distributions is by raising a distribution to a power, which is the subject of this post. Start with a random variable $X$ (the base distribution). Then raising it to a constant power generates a new distribution. In this post, the base distribution is the exponential distribution. The next post discusses transforming the gamma distribution.

______________________________________________________________________________

Raising to a Power

Let $X$ be a random variable. Let $\tau$ be a nonzero constant. The new distribution is generated when $X$ is raised to the power of $1 / \tau$. Thus the random variable $Y=X^{1 / \tau}$ is the subject of the discussion in this post.

When $\tau >0$, the distribution for $Y=X^{1 / \tau}$ is called transformed. When $\tau=-1$, the distribution for $Y=X^{1 / \tau}$ is called inverse. When $\tau <0$ and $\tau \ne -1$, the distribution for $Y=X^{1 / \tau}$ is called inverse transformed.

If the base distribution is exponential, then raising it to $1 / \tau$ would produce a transformed exponential distribution for the case of $\tau >0$, an inverse exponential distribution for the case of $\tau=-1$ and an inverse transformed exponential distribution for the case $\tau <0$ with $\tau \ne -1$. If the base distribution is a gamma distribution, the three new distributions would be transformed gamma distribution, inverse gamma distribution and inverse transformed gamma distribution.

For the case of inverse transformed, we make the random variable $Y=X^{-1 / \tau}$ by letting $\tau >0$. The following summarizes the definition.

Name of Distribution Parameter $\tau$ Random Variable
Transformed $\tau >0$ $Y=X^{1 / \tau}$
Inverse $\tau=-1$ $Y=X^{1 / \tau}$
Inverse Transformed $\tau >0$ $Y=X^{-1 / \tau}$

______________________________________________________________________________

Transforming Exponential

The “transformed” distributions discussed here have two parameters, $\tau$ and $\theta$ ($\tau=1$ for inverse exponential). The parameter $\tau$ is the shape parameter, which comes from the exponent $1 / \tau$. The scale parameter $\theta$ is added after raising the base distribution to a power.

Let $X$ be the random variable for the base exponential distribution. The following shows the information on the base exponential distribution.

Base Exponential
Density Function $f_X(x)=e^{-x} \ \ \ \ \ \ \ \ \ \ x>0$
CDF $F_X(x)=1-e^{-x} \ \ \ \ x>0$
Survival Function $S_X(x)=e^{-x} \ \ \ \ \ \ \ \ \ \ x>0$

Note that the above density function and CDF do not have the scale parameter. Once the base distribution is raised to a power, the scale parameter will be added to the newly created distribution.

The following gives the CDF and the density function of the transformed exponential distribution. The density function is obtained by taking the derivative of the CDF.

\displaystyle \begin{aligned} F_Y(y)&=P(Y \le y) \\&=P(X^{1 / \tau} \le y) \\&=P(X \le y^\tau)\\&=F_X(y^\tau) \\&=1-e^{- y^\tau} \ \ \ \ \ \ \ \ \ \ \ \ \ y>0 \end{aligned}

$\displaystyle f_Y(y)=\tau \ y^{\tau-1} \ e^{- y^\tau} \ \ \ \ \ \ \ \ \ y>0$

The following gives the CDF and the density function of the inverse exponential distribution.

\displaystyle \begin{aligned} F_Y(y)&=P(Y \le y) \\&=P(X^{-1} \le y) \\&=P(X \ge 1/y)\\&=S_X(1/y) \\&=e^{- 1/y} \ \ \ \ \ \ \ \ \ \ \ \ \ y>0 \end{aligned}

$\displaystyle f_Y(y)=\frac{1}{y^2} \ e^{- 1/y} \ \ \ \ \ \ \ \ \ y>0$

The following gives the CDF and the density function of the inverse transformed exponential distribution.

\displaystyle \begin{aligned} F_Y(y)&=P(Y \le y) \\&=P(X^{- 1 / \tau} \le y) \\&=P(X \ge y^{- \tau})\\&=S_X(y^{- \tau}) \\&=e^{- y^{- \tau}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ y>0 \end{aligned}

$\displaystyle f_Y(y)= \frac{\tau}{y^{\tau+1}} \ e^{- 1/y^\tau} \ \ \ \ \ \ \ \ \ y>0$

The above derivation does not involve the scale parameter. Now it is added to the results.

Transformed Distribution
Transformed Exponential
CDF $F_Y(y)=1-e^{- (y/\theta)^\tau}$ $y>0$
Survival Function $S_Y(y)=e^{- (y/\theta)^\tau}$ $y>0$
Density Function $f_Y(y)=(\tau / \theta) \ (y/\theta)^{\tau-1} \ e^{- (y/\theta)^\tau}$ $y>0$
Inverse Exponential
CDF $F_Y(y)=e^{- \theta/y}$ $y>0$
Survival Function $S_Y(y)=1-e^{- \theta/y}$ $y>0$
Density Function $f_Y(y)=\frac{\theta}{y^2} \ e^{- \theta/y}$ $y>0$
Inverse Transformed Exponential
CDF $F_Y(y)=e^{- (\theta/y)^{\tau}}$ $y>0$
Survival Function $S_Y(y)=1-e^{- (\theta/y)^{\tau}}$ $y>0$
Density Function $f_Y(y)=\tau ( \theta / y )^\tau \ (1/y) \ e^{- (\theta/y)^{\tau}}$ $y>0$

The transformed exponential distribution and the inverse transformed distribution have two parameters $\tau$ and $\theta$. The inverse exponential distribution has only one parameter $\theta$. The parameter $\theta$ is the scale parameter. The parameter $\tau$, when there is one, is the shape parameter and it comes from the exponent when the exponential is raised to a power.

The above transformation starts with the exponential distribution with mean 1 (without the scale parameter) and the scale parameter $\theta$ is added back in at the end. We can also accomplish the same result by starting with an exponential variable $X$ with mean (scale parameter) $\theta^\tau$. Then raising $X$ to $1/\tau$, -1, and $-1/\tau$ would generate the three distributions described in the above table. In this process, the scale parameter $\theta$ is baked into the base distribution. This makes it easier to obtain the moments of the “transformed” exponential distributions since the moments would be derived from exponential moments.
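The derivation can be checked numerically. The following Python sketch (function names are ours) confirms that composing the base exponential CDF, with mean $\theta^\tau$, with the map $y \mapsto y^\tau$ reproduces the transformed exponential CDF in the table:

```python
import math

def exp_cdf(x, mean):
    """CDF of an exponential distribution with the given mean."""
    return 1 - math.exp(-x / mean)

def transformed_exp_cdf(y, tau, theta):
    """CDF of the transformed exponential: 1 - exp(-(y/theta)^tau)."""
    return 1 - math.exp(-(y / theta)**tau)

# If X is exponential with mean theta^tau and Y = X^(1/tau),
# then P(Y <= y) = P(X <= y^tau) should match the formula above.
tau, theta = 2.5, 3.0
for y in [0.5, 1.0, 2.0, 5.0]:
    assert abs(exp_cdf(y**tau, theta**tau) - transformed_exp_cdf(y, tau, theta)) < 1e-12
print("CDFs agree")
```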
______________________________________________________________________________

Connection with Weibull Distribution

Compare the density function for the transformed exponential distribution with the density of the Weibull distribution discussed here. Note that the two are identical. Thus raising an exponential distribution to $1 / \tau$ where $\tau >0$ produces a Weibull distribution.

On the other hand, raising a Weibull distribution to $-1$ produces an inverse Weibull distribution (by definition). Let $F_X(x)=1-e^{- x^\tau}$ be the CDF of the base Weibull distribution where $\tau >0$. Let’s find the CDF of $Y=X^{-1}$ and then add the scale parameter.

$\displaystyle F_Y(y)=P(Y \le y)=P(X \ge 1 / y)=e^{- (1/y)^\tau}$

$\displaystyle F_Y(y)=e^{- (\theta/y)^\tau}$ (scale parameter added)

Note that the CDF of the inverse Weibull distribution is identical to the one for the inverse transformed exponential distribution. Thus the transformed exponential distribution is identical to a Weibull distribution, and the inverse transformed exponential distribution is identical to an inverse Weibull distribution.

Since the Weibull distribution is the same as the transformed exponential distribution, the previous post on the Weibull distribution can inform us on the transformed exponential distribution. For example, assuming that the Weibull distribution (or transformed exponential) is a model for the time until death of a life, varying the shape parameter $\tau$ yields different mortality patterns. The following are two graphics from the previous post.

Figure 1

Figure 2

Figure 1 shows the Weibull density functions for different values of the shape parameter (the scale parameter $\theta$ is fixed at 1). The curve for $\tau=1$ is the exponential density curve. It is clear that the green density curve ($\tau=2$) approaches the x-axis at a faster rate than the other two curves and thus has a lighter tail than the other two density curves. In general, the Weibull (transformed exponential) distribution with shape parameter $\tau >1$ has a lighter tail than the Weibull with shape parameter $0<\tau <1$.

Figure 2 shows the failure rates for the Weibull (transformed exponential) distributions with the same three values of $\tau$. Note that the failure rate for $\tau=0.5$ (blue) decreases over time and the failure rate for $\tau=2$ increases over time. The failure rate for $\tau=1$ is constant since it is the exponential distribution.

What is being displayed in Figure 2 describes a general pattern. When the shape parameter is $0<\tau<1$, the failure rate decreases as time increases and the Weibull (transformed exponential) distribution is a model for infant mortality, or early-life failures. Hence these Weibull distributions have a thicker tail as shown in Figure 1.

When the shape parameter is $\tau >1$, the failure rate increases as time increases and the Weibull (transformed exponential) distribution is a model for wear-out failures. As times go by, the lives are fatigued and “die off.” Hence these Weibull distributions have a lighter tail as shown in Figure 1.

When $\tau=1$, the resulting Weibull (transformed exponential) distribution is exponential. The failure rate is constant and it is a model for random failures (failures that are independent of age).

Thus the transformed exponential family has a great deal of flexibility for modeling the failures of objects (machines, devices).

______________________________________________________________________________

Moments and Other Distributional Quantities

The moments for the three “transformed” exponential distributions are based on the gamma function. The two inverse distributions have limited moments. Since the transformed exponential distribution is identical to Weibull, its moments are identical to that of the Weibull distribution. The moments of the “transformed” exponential distributions are $E(Y)=E(X^{1 / \tau})$ where $X$ has an exponential distribution with mean (scale parameter) $\theta^\tau$. See here for the information on exponential moments. The following shows the moments of the “transformed” exponential distributions.

Name of Distribution Moment
Transformed Exponential $E(Y^k)=\theta^k \Gamma(1+k/\tau)$ $k >- \tau$
Inverse Exponential $E(Y^k)=\theta^k \Gamma(1-k)$ $k <1$
Inverse Transformed Exponential $E(Y^k)=\theta^k \Gamma(1-k/\tau)$ $k <\tau$

The function $\Gamma(\cdot)$ is the gamma function. The transformed exponential moment $E(Y^k)$ exists for all $k >- \tau$. The moments are limited for the other two distributions. The first moment $E(Y)$ does not exist for the inverse exponential distribution. The inverse transformed exponential moment $E(Y^k)$ exists only for $k<\tau$. Thus the inverse transformed exponential mean exists only if the shape parameter $\tau$ is larger than 1, and the variance exists only if $\tau$ is larger than 2.

The distributional quantities that are based on moments can be calculated (e.g. variance, skewness and kurtosis) when the moments are available. For all three "transformed" exponential distributions, percentiles are easily computed since the CDFs contain only one instance of the unknown $y$. The following gives the mode of the three distributions.

Name of Distribution Mode
Transformed Exponential $\displaystyle \theta \biggl(\frac{\tau-1}{\tau} \biggr)^{1/\tau}$ for $\tau >1$, else 0
Inverse Exponential $\theta / 2$
Inverse Transformed Exponential $\displaystyle \theta \biggl(\frac{\tau}{\tau+1} \biggr)^{1/\tau}$
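The percentile calculation can be made concrete. Since each CDF contains only one instance of $y$, each is easy to invert. The following Python sketch (function names are ours) gives the quantile functions of the three distributions:

```python
import math

def te_quantile(p, tau, theta):
    """Quantile of the transformed exponential: invert 1 - exp(-(y/theta)^tau) = p."""
    return theta * (-math.log(1 - p))**(1 / tau)

def ie_quantile(p, theta):
    """Quantile of the inverse exponential: invert exp(-theta/y) = p."""
    return theta / (-math.log(p))

def ite_quantile(p, tau, theta):
    """Quantile of the inverse transformed exponential: invert exp(-(theta/y)^tau) = p."""
    return theta * (-math.log(p))**(-1 / tau)

# The median of the transformed exponential is theta * (ln 2)^(1/tau).
print(te_quantile(0.5, 2.0, 3.0))   # equals 3 * sqrt(ln 2)
print(ie_quantile(0.5, 3.0))        # equals 3 / ln 2
```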


$\copyright$ 2017 – Dan Ma

The Weibull distribution

Mathematically, the Weibull distribution has a simple definition. It is mathematically tractable. It is also a versatile model. The Weibull distribution is widely used in life data analysis, particularly in reliability engineering. In addition to the analysis of fatigue data, the Weibull distribution can also be applied to other engineering problems, e.g. for modeling the so called weakest link model. This post gives an introduction to the Weibull distribution.

_______________________________________________________________________________________________

Defining the Weibull Distribution

A random variable $Y$ is said to follow a Weibull distribution if $Y$ has the following density function

$\displaystyle f(y)=\frac{\tau}{\lambda} \ \biggl( \frac{y}{\lambda} \biggr)^{\tau-1} \text{exp} \biggl[- \biggl(\frac{y}{\lambda}\biggr)^\tau \ \biggr] \ \ \ ;y>0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

where $\tau>0$ and $\lambda>0$ are some fixed constants. The notation $\text{exp}[x]$ refers to the exponential function $e^{x}$. As defined here, the Weibull distribution is a two-parameter distribution with $\tau$ being the shape parameter and $\lambda$ being the scale parameter. The following is the cumulative distribution function (CDF).

$\displaystyle F(y)=1- \text{exp} \biggl[- \biggl(\frac{y}{\lambda}\biggr)^\tau \ \biggr] \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$

_______________________________________________________________________________________________

Connection with the Exponential Distribution

The Weibull distribution can also arise naturally from transforming an exponential random variable. A better way to view the Weibull distribution is through the lens of the exponential distribution. Taking an observation from an exponential distribution and raising it to a positive power will result in a Weibull observation. Specifically, the random variable $\displaystyle Y=X^{\frac{1}{\tau}}$ has the same CDF as in $(2)$ if $X$ is an exponential random variable with mean $\lambda^\tau$. To see this, consider the following:

\displaystyle \begin{aligned} P[Y \le y]&=P[X^{\frac{1}{\tau}} \le y] \\&=P[X \le y^{\tau}] \\&=1-e^{-\frac{y^{\tau}}{\lambda^\tau}} \\&=1-e^{-(\frac{y}{\lambda})^\tau} \end{aligned}

_______________________________________________________________________________________________

Basic Properties

The idea of the Weibull distribution as a power of an exponential distribution simplifies certain calculation on the Weibull distribution. For example, a raw moment of the Weibull distribution is simply another raw moment of the exponential distribution. For an exponential random variable $X$ with mean $\theta$, the raw moments $E[X^k]$ are (details can be found here):

$\displaystyle E[X^k]=\left\{ \begin{array}{ll} \displaystyle \Gamma(1+k) \ \theta^k &\ k>-1 \\ \text{ } & \text{ } \\ \displaystyle k! \ \theta^k &\ k \text{ is a positive integer} \end{array} \right.$

where $\Gamma(\cdot)$ is the gamma function. To see how the gamma function can be evaluated in Microsoft Excel, see the last section of this post.

For the Weibull random variable $Y$ with parameters $\tau$ and $\lambda$, i.e. $Y=X^{1 / \tau}$ where $X$ is the exponential random variable with mean $\lambda^\tau$, the following shows the mean and higher moments.

$\displaystyle E[Y]=E[X^{\frac{1}{\tau}}]=\Gamma \biggl(1+\frac{1}{\tau} \biggr) \ \lambda \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)$

$\displaystyle E[Y^k]=E[X^{\frac{k}{\tau}}]=\Gamma \biggl(1+\frac{k}{\tau} \biggr) \ \lambda^k \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4)$
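The moment formula can be verified numerically. The following is a minimal Python sketch (the helper names are ours) that checks the mean in $(3)$ against direct midpoint-rule integration of $y \, f(y)$:

```python
import math

def weibull_mean(tau, lam):
    """E[Y] = Gamma(1 + 1/tau) * lambda, via the exponential moment E[X^(1/tau)]."""
    return math.gamma(1 + 1 / tau) * lam

tau, lam = 2.0, 1.5

def pdf(y):
    """Weibull density with shape tau and scale lam."""
    return (tau / lam) * (y / lam)**(tau - 1) * math.exp(-(y / lam)**tau)

# Direct numerical integration of y * f(y) over (0, 60); the tail beyond is negligible.
n, upper = 200000, 60.0
h = upper / n
numeric = sum((i + 0.5) * h * pdf((i + 0.5) * h) for i in range(n)) * h
print(weibull_mean(tau, lam), numeric)  # the two values agree closely
```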

With the moments established, several other distributional quantities that are based on moments can also be established. The following shows the variance, skewness and kurtosis.

$\displaystyle Var[Y]=E[Y^2]-E[Y]^2=\biggl[ \Gamma \biggl(1+\frac{2}{\tau} \biggr)-\Gamma \biggl(1+\frac{1}{\tau} \biggr)^2 \biggr] \ \lambda^2 \ \ \ \ \ \ \ \ \ \ \ \ \ (5)$

\displaystyle \begin{aligned} \gamma_1&=\frac{E[(Y-\mu)^3]}{\sigma^3} \\&=\frac{E[Y^3]-3 \ \mu \ E[Y^2]+2 \ \mu^3}{\sigma^3} \\&\displaystyle =\frac{\Gamma \biggl(1+\frac{3}{\tau} \biggr) \lambda^3-3 \ \mu \ \Gamma \biggl(1+\frac{2}{\tau} \biggr) \lambda^2+2 \ \mu^3}{(\sigma^2)^{\frac{3}{2}}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (6) \end{aligned}

\displaystyle \begin{aligned} \gamma_2&=\frac{E[(Y-\mu)^4]}{\sigma^4} \\&=\frac{E[Y^4]-4 \ \mu \ E[Y^3]+6 \ \mu^2 \ E[Y^2]-3 \ \mu^4}{\sigma^4} \\&\displaystyle =\frac{\Gamma \biggl(1+\frac{4}{\tau} \biggr) \lambda^4-4 \ \mu \ \Gamma \biggl(1+\frac{3}{\tau} \biggr) \lambda^3+6 \ \mu^2 \ \Gamma \biggl(1+\frac{2}{\tau} \biggr) \lambda^2-3 \ \mu^4}{\sigma^4} \ \ \ \ \ (7) \end{aligned}

The notation $\gamma_1$ here denotes the skewness and $\gamma_2$ denotes the kurtosis. The excess kurtosis is $\gamma_2-3$. In some sources, the notation $\gamma_2$ is used to denote the excess kurtosis. Of course $\mu$ and $\sigma^2$ are the mean and variance, respectively.
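The moment formulas $(3)$ through $(6)$ are straightforward to evaluate numerically. The following Python sketch uses the standard library's math.gamma; the values $\tau=2$ and $\lambda=1$ are illustrative choices, not taken from the text.

```python
import math

# A sketch evaluating formulas (3)-(6); tau = 2, lam = 1 are illustrative.
def weibull_moment(k, tau, lam):
    """k-th raw moment from (4): E[Y^k] = Gamma(1 + k/tau) * lam**k."""
    return math.gamma(1 + k / tau) * lam**k

tau, lam = 2.0, 1.0
mu = weibull_moment(1, tau, lam)                 # mean, formula (3)
var = weibull_moment(2, tau, lam) - mu**2        # variance, formula (5)
sigma = math.sqrt(var)
skew = (weibull_moment(3, tau, lam)              # skewness, formula (6)
        - 3 * mu * weibull_moment(2, tau, lam)
        + 2 * mu**3) / sigma**3
```

For $\tau=2$ and $\lambda=1$ these evaluate to a mean of about 0.8862, a variance of about 0.2146 and a skewness of about 0.6311 (the value quoted for the green curve in Example 1 below).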

Another calculation that is easily accessible for the Weibull distribution is that of the percentiles. It is easy to solve for $y$ in the CDF in $(2)$. For example, to find the median, set the CDF equal to 0.5 and solve for $y$, producing the following.

$\displaystyle \text{median}=\lambda \ \biggl(\text{ln}(2) \biggr)^{\frac{1}{\tau}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (8)$
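More generally, solving $F(y)=p$ in the CDF $(2)$ gives the $100p$-th percentile $y=\lambda \ (-\text{ln}(1-p))^{1/\tau}$. A short Python sketch (the parameter values are illustrative) shows that setting $p=0.5$ recovers $(8)$:

```python
import math

# A sketch; tau = 2 and lam = 1 are illustrative values.
def weibull_quantile(p, tau, lam):
    """Solve F(y) = p in the CDF (2): y = lam * (-ln(1 - p))**(1/tau)."""
    return lam * (-math.log(1.0 - p)) ** (1.0 / tau)

# Setting p = 0.5 recovers the median formula (8): lam * (ln 2)**(1/tau).
tau, lam = 2.0, 1.0
median = weibull_quantile(0.5, tau, lam)
```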

Another basic and important property to examine is the failure rate. The failure rate of a distribution is the ratio of the density function to its survival function. The following is the failure rate of the Weibull distribution.

$\displaystyle \mu(t)=\frac{f(t)}{1-F(t)}=\frac{\tau}{\lambda} \ \biggl( \frac{t}{\lambda} \biggr)^{\tau-1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (9)$

See here for a discussion of the failure rate in conjunction with the exponential distribution. Suppose that the distribution in question is a lifetime distribution (time until termination or death). Then the failure rate $\mu(t)$ can be interpreted as the rate of failure at the next instant given that the life has survived to time $t$.
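The failure rate $(9)$ can be checked numerically against the defining ratio $f(t)/(1-F(t))$. A Python sketch follows; the parameter values are illustrative.

```python
import math

# A sketch; tau = 2, lam = 1 and t = 1.5 are illustrative values.
def weibull_pdf(t, tau, lam):
    """Density from (1): (tau/lam) * (t/lam)**(tau-1) * exp(-(t/lam)**tau)."""
    return (tau / lam) * (t / lam) ** (tau - 1) * math.exp(-((t / lam) ** tau))

def weibull_sf(t, tau, lam):
    """Survival function 1 - F(t)."""
    return math.exp(-((t / lam) ** tau))

def failure_rate(t, tau, lam):
    """Formula (9): (tau/lam) * (t/lam)**(tau - 1)."""
    return (tau / lam) * (t / lam) ** (tau - 1)

tau, lam, t = 2.0, 1.0, 1.5
ratio = weibull_pdf(t, tau, lam) / weibull_sf(t, tau, lam)
direct = failure_rate(t, tau, lam)   # the two agree, confirming (9)
```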

_______________________________________________________________________________________________

When the Parameters Vary

The discussion in the previous section might give the impression that all Weibull distributions behave in the same way. We now look at examples showing that as $\tau$ (the shape parameter) and/or $\lambda$ (the scale parameter) varies, the distribution exhibits markedly different behavior. Note that when $\tau=1$, the Weibull distribution reduces to the exponential distribution.

Example 1
The following diagram shows the PDFs of the Weibull distribution with $\tau=0.5$, $\tau=1$ and $\tau=2$ where $\lambda=1$ in all three cases.

Figure 1

Figure 1 shows the effect of the shape parameter taking on different values while the scale parameter is kept fixed. The effect on the skewness is very pronounced. All three density curves are right skewed. The PDF with $\tau=0.5$ (the blue curve) has a very strong right skew. The PDF with $\tau=1$ (the red curve) is exponential and has, by comparison, a much smaller skewness. The PDF with $\tau=2$ (the green curve) looks almost symmetric, though there is still a small but clear right skew. This observation is borne out by calculation: the skewness coefficients are $\gamma_1=6.62$ (blue curve), $\gamma_1=2$ (red curve) and $\gamma_1=0.6311$ (green curve).

Another clear effect of the shape parameter is the thickness of the tail (in this case the right tail). Figure 1 suggests that the PDF with $\tau=0.5$ (the blue curve) is higher than the other two density curves on the interval $x>2$. As a result, the blue curve has more probability mass in the right tail and thus a thicker tail compared to the other two PDFs. For a numerical confirmation, the following table compares the probabilities in the right tail.

$\displaystyle \begin{array}{lllllll} \text{ } &\text{ } & \tau=0.5 & \text{ } & \tau=1 & \text{ } & \tau=2 \\ \text{ } & \text{ } &\text{Blue Curve} & \text{ } & \text{Red Curve} & \text{ }& \text{Green Curve} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ P[X>2] &\text{ } & 0.2431 & \text{ } & 0.1353 & \text{ } & 0.0183\\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ P[X>3] &\text{ } & 0.1769 & \text{ } & 0.0498 & \text{ } & 0.0003355 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ P[X>4] &\text{ } & 0.1353 & \text{ } & 0.0183 & \text{ } & \displaystyle 1.125 \times 10^{-7} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ P[X>5] &\text{ } & 0.1069 & \text{ } & 0.00674 & \text{ } & 1.389 \times 10^{-11} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ P[X>10] &\text{ } & 0.0423 & \text{ } & 0.0000454 & \text{ } & 3.72 \times 10^{-44} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ P[X>25] &\text{ } & 0.00673 & \text{ } & 1.39 \times 10^{-11} & \text{ } & 3.68 \times 10^{-272} \end{array}$

It is clear from the above table that the Weibull distribution with the blue curve assigns more probabilities to the higher values. The mean of the distribution for the blue curve is 2. The right tail $x>10$ (over 5 times the mean) contains 4.23% of the probability mass (a small probability for sure but not negligible). The right tail $x>25$ (over 12.5 times the mean) still has a small probability of 0.00673 that cannot be totally ignored. On the other hand, the Weibull distribution for the green curve has a light tail. The mean of the distribution for the green curve is about 0.89. At $x>4$ (over 4.5 times its mean), the tail probability is already negligible at $1.125 \times 10^{-7}$. At $x>10$ (over 11 times its mean), the tail probability is $3.72 \times 10^{-44}$, practically zero.
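The entries in the table above can be reproduced directly from the survival function $S(x)=e^{-(x/\lambda)^\tau}$ with $\lambda=1$. For example, the first row in Python:

```python
import math

# Reproducing the first row of the table: P[X > 2] for the three curves.
def tail_prob(x, tau, lam=1.0):
    """Survival function S(x) = exp(-(x/lam)**tau)."""
    return math.exp(-((x / lam) ** tau))

p_blue = tail_prob(2, 0.5)    # tau = 0.5, approx. 0.2431
p_red = tail_prob(2, 1.0)     # tau = 1,   approx. 0.1353
p_green = tail_prob(2, 2.0)   # tau = 2,   approx. 0.0183
```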

Let’s compare the density curves in Figure 1 with their failure rates. The following figure shows the failure rates for these three Weibull distributions.

Figure 2

According to the definition in $(9)$, the following shows the failure rate function for the three Weibull distributions.

$\mu(y)=\frac{0.5}{y^{0.5}} \ \ \ \ \ y>0 \ \ \ \ \ \ \ \tau=0.5 \ \ \text{(blue curve)}$

$\mu(y)=1 \ \ \ \ \ \ \ \ y>0 \ \ \ \ \ \ \ \tau=1 \ \ \text{(red curve)}$

$\mu(y)=2y \ \ \ \ \ \ y>0 \ \ \ \ \ \ \ \tau=2 \ \ \text{(green curve)}$

The blue curve in Figure 1 ($\tau=0.5$) has a decreasing failure rate as shown in Figure 2. The failure rate function is constant for the case of $\tau=1$ (the exponential case). It is an increasing function for the case of $\tau=2$. This comparison shows that the Weibull distribution is particularly useful for engineers and researchers who study the reliability of machines and devices. If the engineers believe that the failure rate is constant, then an exponential model is appropriate. If they believe that the failure rate increases with time or age, then a Weibull distribution with shape parameter $\tau>1$ is more appropriate. If the engineers believe that the failure rate decreases with time, then a Weibull distribution with shape parameter $\tau<1$ is more appropriate.

Example 2
We now compare Weibull distributions with various values for $\lambda$ (scale parameter) while keeping the shape parameter $\tau$ fixed. The following shows the density curves for the Weibull distributions with $\lambda=1, 2, 3$ while keeping $\tau=2$.

Figure 3

The effect of the scale parameter $\lambda$ is to compress or stretch out the standard Weibull density curve, i.e. the one with $\lambda=1$. For example, the density curve for $\lambda=2$ is obtained by stretching out the density curve for $\lambda=1$. The overall shape is maintained while the density curve is being stretched or compressed. According to $(3)$, the mean of the transformed distribution is increased (stretching) or decreased (compressing); for example, as $\lambda$ is increased from 1 to 2, the mean doubles. As the density curve is stretched, the resulting distribution is more spread out and the peak of the density curve decreases. The overall effect of changing the scale parameter is essentially a change of scale on the x-axis.

The next example is a computational exercise.

Example 3
The time until failure (in hours) of a semiconductor device has a Weibull distribution with shape parameter $\tau=2.2$ and scale parameter $\lambda=400$.

• Give the density function and the survival function.
• Determine the probability that the device will last at least 500 hours.
• Determine the probability that the device will last at least 600 hours given that it has been running for over 500 hours.
• Find the mean and standard deviation of the time until failure.
• Determine the failure rate function of the Weibull time until failure.

To obtain the density function, the survival function and the failure rate, follow the relationships in $(1)$, $(2)$ and $(9)$.

$\displaystyle f(y)=\frac{2.2}{400} \ \biggl( \frac{y}{400} \biggr)^{1.2} \text{exp} \biggl[- \biggl(\frac{y}{400}\biggr)^{2.2} \ \biggr]$

$\displaystyle S(y)=\text{exp} \biggl[- \biggl(\frac{y}{400}\biggr)^{2.2} \biggr]$

$\displaystyle \mu(y)=\frac{2.2}{400} \ \biggl( \frac{y}{400} \biggr)^{1.2}$

Note that the Weibull failure rate is the ratio of the density function to the survival function. In this case, the failure rate is an increasing function of $y$. Since $Y$ measures time, this is a model for machines that wear out over time (see the next section).

The probability that the device will last over 500 hours is $e^{-(\frac{500}{400})^{2.2}}=0.1952$. The unconditional probability that the device will last over 600 hours is $e^{-(\frac{600}{400})^{2.2}}=0.0872$. The conditional probability that the device will last more than 600 hours given that it has lasted more than 500 hours is the ratio $\frac{S(600)}{S(500)}=0.4465$.

To find the mean and variance, we need to evaluate the gamma function. Using Excel, we obtain the following two values of the gamma function (as shown here):

$\Gamma(1+\frac{1}{2.2})=0.88562476$

$\Gamma(1+\frac{2}{2.2})=0.964912489$

The mean and standard deviation of the time until failure are:

$E[Y]=\Gamma(1+\frac{1}{2.2}) \times 400=354.2499042$

$E[Y^2]=\Gamma(1+\frac{2}{2.2}) \times 400^2=154385.9982$

$Var[Y]=E[Y^2]-E[Y]^2=28893.00363$

$\sigma_Y=\sqrt{Var[Y]}=169.9794212$
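The calculations in Example 3 can be reproduced with a few lines of Python, where math.gamma plays the role of the Excel gamma-function evaluation:

```python
import math

# Example 3 worked numerically: tau = 2.2, lam = 400 (hours).
tau, lam = 2.2, 400.0

def survival(y):
    """Survival function S(y) = exp(-(y/lam)**tau)."""
    return math.exp(-((y / lam) ** tau))

p500 = survival(500)    # P[Y > 500], approx. 0.1952
p600 = survival(600)    # P[Y > 600], approx. 0.0872
cond = p600 / p500      # P[Y > 600 | Y > 500], approx. 0.4465

mean = math.gamma(1 + 1 / tau) * lam        # approx. 354.25
second = math.gamma(1 + 2 / tau) * lam**2   # E[Y^2]
sd = math.sqrt(second - mean**2)            # approx. 169.98
```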

_______________________________________________________________________________________________

The Weibull Failure Rates

Looking at the failure rate function indicated in $(9)$ and looking at Figure 2, it is clear that when the shape parameter $0<\tau<1$, the failure rate decreases with time (if the distribution is a model for the time until death of a life). When the shape parameter $\tau=1$, the failure rate is constant. When the shape parameter $\tau>1$, the failure rate increases with time. As a result, the Weibull family of distributions has a great deal of flexibility for modeling the failures of objects (machines, devices).

When the shape parameter $0<\tau<1$, the failure rate decreases with time. Such a Weibull distribution is a model for infant mortality, or early-life failures. When the shape parameter $\tau=1$ (or near 1), the failure rate is constant or near constant. The resulting Weibull distribution (an exponential model) is a model for random failures (failures that are independent of age). When the shape parameter $\tau>1$, the failure rate increases with time. The resulting Weibull distribution is a model for wear-out failures.

In some applications, it may be necessary to model each phase of a lifetime separately, e.g. the early phase with a Weibull distribution with $0<\tau<1$, the useful phase with a Weibull distribution with $\tau$ close to 1 and the wear-out phase with a Weibull distribution with $\tau>1$. The resulting failure rate curve resembles a bathtub curve. The following is an idealized bathtub curve.

Figure 4

The blue part of the bathtub curve is the early phase of the lifetime, which is characterized by decreasing failure rate. This is the early-life period in which the defective products die off and are taken out of the study. The next period is the useful-life period, the red part of the curve, in which failures are random and independent of age. In this phase, the failure rate is constant or near constant. The green part of the bathtub curve is characterized by increasing failure rates, which is the wear-out phase of the lifetime being studied.

_______________________________________________________________________________________________

Another attractive feature of the Weibull model is that it can be used to describe the so-called weakest link model. Consider a machine or device that has multiple components. Suppose that the device dies or fails when any one of the components fails. The lifetime of such a machine or device is the time to the first failure. Such a lifetime model is called the weakest link model. It can be shown that under these conditions a Weibull distribution is a good model for the distribution of the lifetime of such a machine or device.

If the times until failure of the individual components are independent and identically distributed Weibull random variables, then the minimum of the Weibull random variables is also a Weibull random variable. To see this, let $X_1,X_2,\cdots,X_n$ be independent and identically distributed Weibull random variables. Let $\tau$ and $\lambda$ be the parameters for the common Weibull distribution. Let $Y=\text{min}(X_1,X_2,\cdots,X_n)$. The following gives the survival function of $Y$.

\displaystyle \begin{aligned} P[Y >y]&=P[\text{all } X_i >y] \\&=\biggl(e^{-(\frac{y}{\lambda})^\tau} \biggr)^n \\&=e^{-n (\frac{y}{\lambda})^\tau} \\&=e^{-(\frac{y}{\lambda_1})^\tau} \end{aligned}

where $\lambda_1=\frac{\lambda}{t}$ and $t=n^{\frac{1}{\tau}}$. This shows that $Y$ has a Weibull distribution with shape parameter $\tau$ (as before) and scale parameter $\lambda_1$. Under the condition that the times to failure for the multiple components are identically Weibull distributed, the lifetime of the device is also a Weibull model.
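This weakest-link result can be confirmed by simulation. The following Python sketch (the choices $\tau=2$, $\lambda=3$ and $n=5$ are illustrative) compares the empirical survival probability of the minimum of $n$ draws with the Weibull survival function using scale parameter $\lambda_1=\lambda/n^{1/\tau}$.

```python
import math
import random

# A simulation sketch; tau = 2, lam = 3 and n = 5 components are illustrative.
tau, lam, n = 2.0, 3.0, 5
lam_min = lam / n ** (1.0 / tau)   # scale parameter of the minimum

rng = random.Random(1)

def weibull_draw(tau, lam):
    """Inverse-CDF sampling: y = lam * (-ln(1 - U))**(1/tau)."""
    return lam * (-math.log(1.0 - rng.random())) ** (1.0 / tau)

trials = 100_000
y = 1.0
# Empirical P[min of n draws > y] versus the Weibull survival
# function with the same tau and scale lam_min.
emp = sum(min(weibull_draw(tau, lam) for _ in range(n)) > y
          for _ in range(trials)) / trials
theo = math.exp(-((y / lam_min) ** tau))
```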

_______________________________________________________________________________________________

The Moment Generating Function

The post concludes with a comment on the moment generating function of the Weibull distribution. Note that relationship $(4)$ indicates that all positive moments of the Weibull distribution exist. A natural question is whether the moment generating function (MGF) exists. It turns out that the MGF does not always exist for the Weibull distribution. The MGF exists whenever the shape parameter $\tau \ge 1$. However, the MGF cannot be expressed in terms of any familiar functions. Instead, the Weibull MGF can be expressed as a power series.

$\displaystyle M(t)=\sum \limits_{n=0}^\infty \ \frac{(\lambda t)^n}{n!} \ \Gamma \biggl(1+\frac{n}{\tau} \biggr) \ \ \ \ \ \tau \ge 1$

To see this, start with the power series for $e^x$ (here $X$ denotes the Weibull random variable).

$\displaystyle e^X=\sum \limits_{n=0}^\infty \ \frac{X^n}{n!}$

$\displaystyle e^{t X}=\sum \limits_{n=0}^\infty \ \frac{(t X)^n}{n!}$

$\displaystyle M(t)=E[e^{t X}]=\sum \limits_{n=0}^\infty \ \frac{t^n}{n!} \ E[X^n]$

$\displaystyle M(t)=\sum \limits_{n=0}^\infty \ \frac{(\lambda t)^n}{n!} \ \Gamma \biggl(1+\frac{n}{\tau} \biggr)$

Since the positive moments exist for the Weibull distribution, the higher moments from $(4)$ are plugged into the power series. It can be shown that the last series converges when $\tau \ge 1$.

When $0<\tau<1$, the power series does not converge. For a specific example, let $\tau=0.5$. Then the ratio of the $(n+1)$st term to the $n$th term simplifies to $2 \lambda t (2n+1)$, which goes to infinity as $n \rightarrow \infty$. Thus the Weibull distribution with shape parameter $\tau=0.5$ is an example of a distribution for which all the positive moments exist but the MGF does not.
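The divergence can be seen numerically. In the case $\tau=0.5$, the gamma factor $\Gamma(1+2n)$ equals $(2n)!$ and the ratio of consecutive series terms equals $2 \lambda t (2n+1)$, which grows without bound. A Python sketch (with illustrative values $\lambda=1$ and $t=0.1$):

```python
import math

# A sketch with illustrative values lam = 1 and t = 0.1.
lam, t = 1.0, 0.1

def series_term(n, tau):
    """n-th term of the MGF series: (lam*t)**n / n! * Gamma(1 + n/tau)."""
    return (lam * t) ** n / math.factorial(n) * math.gamma(1 + n / tau)

# For tau = 0.5, Gamma(1 + 2n) = (2n)! and the ratio of consecutive
# terms is 2*lam*t*(2n+1), which grows without bound: the series diverges.
terms = [series_term(n, 0.5) for n in range(30)]
ratios = [terms[n + 1] / terms[n] for n in range(29)]
```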

_______________________________________________________________________________________________

Further Information

Further information can be found here and here.

_______________________________________________________________________________________________
$\copyright \ 2016 - \text{Dan Ma}$