A catalog of parametric severity models

Various parametric continuous probability models have been presented and discussed in this blog. The number of parameters in these models ranges from one to two, and in a small number of cases three. They are all potential candidates for models of severity in insurance applications and in other actuarial applications. This post highlights these models. The list presented here is not exhaustive; it is only a brief catalog. There are other models that are also suitable for actuarial applications but are not accounted for here. However, the list is a good place to begin. This post also serves as a navigation device (the table shown below contains links to the blog posts).

A Catalog

Many of the models highlighted here are related to the gamma distribution either directly or indirectly. So the catalog starts with the gamma distribution at the top and then branches out to the other related models. Mathematically, the gamma distribution is a two-parameter continuous distribution defined using the gamma function. The gamma sub family includes the exponential distribution, the Erlang distribution and the chi-squared distribution. These are gamma distributions with certain restrictions on one or both of the gamma parameters. Other distributions are obtained by raising a distribution to a power. Others are obtained by mixing distributions.

Here’s a listing of the models. Click on the links to find out more about the distributions.

……Derived From ………………….Model
Gamma function
Gamma sub families
Independent sum of gamma
Exponentiation
Raising to a power Raising exponential to a positive power

Raising exponential to a power

Raising gamma to a power

Raising Pareto to a power

Burr sub families
Mixture
Others

The above table categorizes the distributions according to how they are mathematically derived. For example, the gamma distribution is derived from the gamma function. The Pareto distribution is mathematically an exponential-gamma mixture. The Burr distribution is a transformed Pareto distribution, i.e. obtained by raising a Pareto distribution to a positive power. Even though these distributions can be defined simply by giving the PDF and CDF, knowing their mathematical origins informs us of the specific mathematical properties of the distributions. Organizing according to the mathematical origin gives us a concise summary of the models.

Further Comments on the Table

From a mathematical standpoint, the gamma distribution is defined using the gamma function.

    \displaystyle \Gamma(\alpha)=\int_0^\infty t^{\alpha-1} \ e^{-t} \ dt

In the above integral, the argument \alpha is a positive number. The expression t^{\alpha-1} \ e^{-t} in the integrand is always positive. The area between the curve t^{\alpha-1} \ e^{-t} and the horizontal axis (the t-axis) is \Gamma(\alpha). When this expression is normalized, i.e. divided by \Gamma(\alpha), it becomes a density function.

    \displaystyle f(t)=\frac{1}{\Gamma(\alpha)} \ t^{\alpha-1} \ e^{-t}

The above function f(t) is defined over all positive t. The integral of f(t) over all positive t is 1. Thus f(t) is a density function. It has only one parameter, \alpha, which is the shape parameter. Adding the scale parameter \theta makes it a two-parameter distribution. The result is called the gamma distribution. The following is the density function.

    \displaystyle f(x)=\frac{1}{\Gamma(\alpha)} \ \biggl(\frac{1}{\theta}\biggr)^\alpha \ x^{\alpha-1} \ e^{-\frac{x}{\theta}} \ \ \ \ \ \ \ x>0

Both parameters \alpha and \theta are positive real numbers. The first parameter \alpha is the shape parameter and \theta is the scale parameter.

As mentioned above, many of the distributions listed in the above table are related to the gamma distribution. Some of the distributions are sub families of gamma. For example, when \alpha is a positive integer, the resulting distributions are called Erlang distributions (important in queuing theory). When \alpha=1, the results are the exponential distributions. When \alpha=\frac{k}{2} and \theta=2 where k is a positive integer, the results are the chi-squared distributions (the parameter k is referred to as the degrees of freedom). The chi-squared distribution plays an important role in statistics.
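The sub family relationships above can be checked numerically. The following is a minimal Python sketch; the function names and the evaluation points are illustrative choices, and the chi-squared density is written in its standard textbook form rather than taken from this post.

```python
import math

def gamma_pdf(x, alpha, theta):
    """Gamma density with shape alpha and scale theta, for x > 0."""
    return x ** (alpha - 1) * math.exp(-x / theta) / (math.gamma(alpha) * theta ** alpha)

def exponential_pdf(x, theta):
    """Exponential density with mean theta (the alpha = 1 case)."""
    return math.exp(-x / theta) / theta

def chi_squared_pdf(x, k):
    """Chi-squared density with k degrees of freedom (standard form)."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

x = 1.7  # arbitrary evaluation point
print(gamma_pdf(x, 1, 3.0), exponential_pdf(x, 3.0))   # the two values agree
print(gamma_pdf(x, 5 / 2, 2), chi_squared_pdf(x, 5))   # the two values agree
```

Each printed pair agrees up to floating point rounding, which is what the restrictions \alpha=1 and (\alpha=\frac{k}{2}, \theta=2) predict.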

Taking the sum of n independent and identically distributed exponential random variables produces the Erlang distribution, a sub family of the gamma distribution. Taking the sum of n independent exponential random variables with pairwise distinct means produces the hypoexponential distribution. On the other hand, the mixture of n independent exponential random variables produces the hyperexponential distribution.

The Pareto distribution (Pareto Type II Lomax) is the mixture of exponential distributions with gamma mixing weights. Despite the connection with the gamma distribution, the Pareto distribution is a heavy tailed distribution. Thus the Pareto distribution is suitable for modeling extreme losses, e.g. in modeling rare but potentially catastrophic losses.

As mentioned earlier, raising a Pareto distribution to a positive power generates the Burr distribution. Restricting the parameters in a Burr distribution in a certain way produces the paralogistic distribution. The table indicates the relationships in a concise way. For details, go into the blog posts to get more information.

Tail Weight

Another informative way to categorize the distributions listed in the table is through looking at the tail weight. At first glance, all the distributions may look similar. For example, the distributions in the table are right skewed distributions. Upon closer look, some of the distributions put more weights (probabilities) on the larger values. Hence some of the models are more suitable for models of phenomena with significantly higher probabilities of large or extreme values.

When a distribution puts significantly more probability on larger values, the distribution is said to be a heavy tailed distribution (or said to have a larger tail weight). In general, tail weight is a relative concept. For example, we say model A has a larger tail weight than model B (or model A has a heavier tail than model B). However, there are several ways to check the tail weight of a given distribution. Here are the four criteria.

Tail weight measure, followed by what to look for:

  1. Existence of moments: the existence of more positive moments indicates a lighter tailed distribution.
  2. Hazard rate function: an increasing hazard rate function indicates a lighter tailed distribution.
  3. Mean excess loss function: an increasing mean excess loss function indicates a heavier tailed distribution.
  4. Speed of decay of survival function: a survival function that decays rapidly to zero (as compared to another distribution) indicates a lighter tailed distribution.

Existence of moments
For a positive real number k, the moment E(X^k) is defined by the integral \int_0^\infty x^k \ f(x) \ dx where f(x) is the density function of the distribution in question. If the distribution puts significantly more probabilities in the larger values in the right tail, this integral may not exist (may not converge) for some k. Thus the existence of moments E(X^k) for all positive k is an indication that the distribution is a light tailed distribution.

In the above table, the only distributions for which all positive moments exist are gamma (including all gamma sub families such as exponential), Weibull, lognormal, hyperexponential, hypoexponential and beta. Such distributions are considered light tailed distributions.

When the positive moments E(X^k) exist only up to a certain value of k, it is an indication that the distribution has a heavy right tail. All the other distributions in the table are considered heavy tailed distributions as compared to gamma, Weibull and lognormal. Consider a Pareto distribution with shape parameter \alpha and scale parameter \theta. Note that the existence of the Pareto higher moments E(X^k) is capped by the shape parameter \alpha: the moment E(X^k) exists only for k<\alpha. If the Pareto distribution is to model a random loss, and if the mean is infinite (when \alpha \le 1), the risk is uninsurable! On the other hand, when \alpha \le 2, the Pareto variance does not exist. This shows that for a heavy tailed distribution, the variance may not be a good measure of risk.
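To make the cap on the Pareto moments concrete, here is a small Python sketch for the Pareto Type II (Lomax) case. It assumes the standard moment formula E(X^k)=\theta^k \Gamma(k+1) \Gamma(\alpha-k) / \Gamma(\alpha) for 0<k<\alpha, which is not derived in this post; the function name and parameter values are illustrative.

```python
import math

def pareto2_moment(k, alpha, theta):
    """E(X^k) for Pareto Type II (Lomax) with k > 0; infinite when k >= alpha."""
    if k >= alpha:
        return math.inf
    return theta ** k * math.gamma(k + 1) * math.gamma(alpha - k) / math.gamma(alpha)

for alpha in (0.8, 1.5, 2.0, 3.0):
    mean = pareto2_moment(1, alpha, theta=10.0)
    second = pareto2_moment(2, alpha, theta=10.0)
    print(alpha, mean, second)
```

With \alpha=0.8 both moments are infinite, with \alpha=1.5 or \alpha=2 the mean is finite but the second moment (hence the variance) is not, and only with \alpha=3 are both finite.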

Hazard rate function
The hazard rate function h(x) of a random variable X is defined as the ratio of the density function and the survival function.

    \displaystyle h(x)=\frac{f(x)}{S(x)}

The hazard rate is called the force of mortality in a life contingency context and can be interpreted as the rate that a person aged x will die in the next instant. The hazard rate is called the failure rate in reliability theory and can be interpreted as the rate that a machine will fail at the next instant given that it has been functioning for x units of time.

Another indication of heavy tail weight is that the distribution has a decreasing hazard rate function. On the other hand, a distribution with an increasing hazard rate function is a light tailed distribution. If the hazard rate function is decreasing (over time if the random variable is a time variable), then the population dies off at a decreasing rate, hence a heavier tail for the distribution in question.

The Pareto distribution is a heavy tailed distribution since the hazard rate is h(x)=\alpha/x (Pareto Type I) and h(x)=\alpha/(x+\theta) (Pareto Type II Lomax). Both hazard rates are decreasing functions of x.

The Weibull distribution is a flexible model in that when its shape parameter is 0<\tau<1, the Weibull hazard rate is decreasing and when \tau>1, the hazard rate is increasing. When \tau=1, Weibull is the exponential distribution, which has a constant hazard rate.
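The hazard rate behavior described above can be tabulated directly. The sketch below assumes the Weibull survival function has the form S(x)=e^{-(x/\theta)^\tau}, so that h(x)=(\tau/\theta)(x/\theta)^{\tau-1}; this parameterization, the helper `trend` and the grid of x values are assumptions made for illustration.

```python
def pareto2_hazard(x, alpha, theta):
    """Hazard rate of Pareto Type II (Lomax): alpha / (x + theta)."""
    return alpha / (x + theta)

def weibull_hazard(x, tau, theta):
    """Hazard rate of a Weibull with S(x) = exp(-(x/theta)**tau) (assumed form)."""
    return (tau / theta) * (x / theta) ** (tau - 1)

def trend(values):
    """Classify a sequence as 'increasing', 'decreasing', 'constant' or 'mixed'."""
    diffs = [b - a for a, b in zip(values, values[1:])]
    if all(abs(d) < 1e-12 for d in diffs):
        return "constant"
    if all(d > 0 for d in diffs):
        return "increasing"
    if all(d < 0 for d in diffs):
        return "decreasing"
    return "mixed"

grid = [0.5 * i for i in range(1, 41)]  # x = 0.5, 1.0, ..., 20.0
print("Pareto II:", trend([pareto2_hazard(x, 2.0, 2.0) for x in grid]))
for tau in (0.5, 1.0, 2.0):
    print("Weibull tau =", tau, trend([weibull_hazard(x, tau, 1.0) for x in grid]))
```

On this grid the Pareto hazard rate is decreasing, while the Weibull hazard rate is decreasing, constant or increasing according to whether \tau is below, equal to or above 1.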

The point about decreasing hazard rate as an indication of a heavy tailed distribution has a connection with the fourth criterion. The idea is that a decreasing hazard rate means that the survival function decays to zero slowly. This point is due to the fact that the hazard rate function generates the survival function through the following.

    \displaystyle S(x)=e^{\displaystyle -\int_0^x h(t) \ dt}

Thus if the hazard rate function is decreasing in x, then the survival function will decay more slowly to zero. To see this, let H(x)=\int_0^x h(t) \ dt, which is called the cumulative hazard rate function. As indicated above, S(x)=e^{-H(x)}. If h(x) is decreasing in x, H(x) has a lower rate of increase and consequently S(x)=e^{-H(x)} has a slower rate of decrease to zero.
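The relation S(x)=e^{-H(x)} can be illustrated by integrating a hazard rate numerically. The following Python sketch uses a simple midpoint rule; the parameter values (\alpha=2, \theta=2 for Pareto Type II and rate 0.5 for the exponential) and the function names are illustrative assumptions.

```python
import math

def survival_from_hazard(hazard, x, n=10_000):
    """Return exp(-H(x)), with H(x) = integral of hazard over [0, x] by the midpoint rule."""
    step = x / n
    cumulative_hazard = step * sum(hazard((i + 0.5) * step) for i in range(n))
    return math.exp(-cumulative_hazard)

alpha, theta, lam = 2.0, 2.0, 0.5  # illustrative parameter values

def pareto_hazard(t):
    return alpha / (t + theta)   # decreasing hazard rate

def exponential_hazard(t):
    return lam                   # constant hazard rate

for x in (5.0, 20.0, 50.0):
    s_pareto = survival_from_hazard(pareto_hazard, x)
    s_expon = survival_from_hazard(exponential_hazard, x)
    # Compare with the closed forms (theta/(x+theta))**alpha and exp(-lam*x).
    print(x, s_pareto, (theta / (x + theta)) ** alpha, s_expon, math.exp(-lam * x))
```

The numerically recovered survival functions match the closed forms, and the one built from the decreasing hazard rate stays far above the one built from the constant hazard rate as x grows.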

In contrast, the exponential distribution has a constant hazard rate function, making it a medium tailed distribution. As explained above, any distribution having an increasing hazard rate function is a light tailed distribution.

The mean excess loss function
The mean excess loss is the conditional expectation e_X(d)=E(X-d \lvert X>d). If the random variable X represents insurance losses, mean excess loss is the expected loss in excess of a threshold conditional on the event that the threshold has been exceeded. Suppose that the threshold d is an ordinary deductible that is part of an insurance coverage. Then e_X(d) is the expected payment made by the insurer in the event that the loss exceeds the deductible.

Whenever e_X(d) is an increasing function of the deductible d, the loss X is a heavy tailed distribution. If the mean excess loss function is a decreasing function of d, then the loss X is a lighter tailed distribution.

The Pareto distribution can also be classified as a heavy tailed distribution based on an increasing mean excess loss function. For a Pareto distribution (Type I) with shape parameter \alpha and scale parameter \theta, the mean excess loss is e_X(d)=d/(\alpha-1), assuming \alpha>1. The mean excess loss for Pareto Type II Lomax is e_X(d)=(d+\theta)/(\alpha-1). They are both increasing functions of the deductible d! This means that the larger the deductible, the larger the expected claim if such a large loss occurs! If the underlying distribution for a random loss is Pareto, it is a catastrophic risk situation.

In general, an increasing mean excess loss function is an indication of a heavy tailed distribution. On the other hand, a decreasing mean excess loss function indicates a light tailed distribution. The exponential distribution has a constant mean excess loss function and is considered a medium tailed distribution.
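The mean excess loss can also be computed numerically from the survival function, using e_X(d)=\frac{1}{S(d)} \int_0^\infty S(d+y) \ dy. The following Python sketch does this with a change of variable and a midpoint rule; the integration scheme, the function names and the parameter values are assumptions made for illustration, not a method taken from this post.

```python
import math

def mean_excess(survival, d, n=20_000):
    """Approximate e(d) = (1/S(d)) * integral_0^inf S(d + y) dy.

    The substitution y = u / (1 - u) maps (0, inf) to (0, 1); the midpoint
    rule is then applied on (0, 1).
    """
    step = 1.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * step
        y = u / (1.0 - u)
        total += survival(d + y) / (1.0 - u) ** 2
    return total * step / survival(d)

alpha, theta, lam = 3.0, 2.0, 0.5  # illustrative parameter values

def pareto2_sf(x):
    return (theta / (x + theta)) ** alpha

def exponential_sf(x):
    return math.exp(-lam * x)

for d in (0.0, 5.0, 10.0):
    print(d, mean_excess(pareto2_sf, d), (d + theta) / (alpha - 1),
          mean_excess(exponential_sf, d), 1 / lam)
```

The Pareto Type II values track (d+\theta)/(\alpha-1) and grow with the deductible, while the exponential values stay at the constant 1/\lambda.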

Speed of decay of the survival function to zero
The survival function S(x)=P(X>x) captures the probability of the tail of a distribution. If the survival function of a distribution decays slowly to zero (equivalently the cdf goes slowly to one), it is another indication that the distribution is heavy tailed. This point was touched on in the discussion of the hazard rate function.

The following is a comparison of a Pareto Type II survival function and an exponential survival function. The Pareto survival function has parameters (\alpha=2 and \theta=2). The two survival functions are set to have the same 75th percentile, which is x=2. The following table is a comparison of the two survival functions.

    \displaystyle \begin{array}{llllllll} \text{ } &x &\text{ } & \text{Pareto } S_X(x) & \text{ } & \text{Exponential } S_Y(x) & \text{ } & \displaystyle \frac{S_X(x)}{S_Y(x)} \\  \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\  \text{ } &2 &\text{ } & 0.25 & \text{ } & 0.25 & \text{ } & 1  \\    \text{ } &10 &\text{ } & 0.027777778 & \text{ } & 0.000976563 & \text{ } & 28  \\  \text{ } &20 &\text{ } & 0.008264463 & \text{ } & 9.54 \times 10^{-7} & \text{ } & 8666  \\   \text{ } &30 &\text{ } & 0.00390625 & \text{ } & 9.31 \times 10^{-10} & \text{ } & 4194304  \\  \text{ } &40 &\text{ } & 0.002267574 & \text{ } & 9.09 \times 10^{-13} & \text{ } & 2.49 \times 10^{9}  \\  \text{ } &60 &\text{ } & 0.001040583 & \text{ } & 8.67 \times 10^{-19} & \text{ } & 1.20 \times 10^{15}  \\  \text{ } &80 &\text{ } & 0.000594884 & \text{ } & 8.27 \times 10^{-25} & \text{ } & 7.19 \times 10^{20}  \\  \text{ } &100 &\text{ } & 0.000384468 & \text{ } & 7.89 \times 10^{-31} & \text{ } & 4.87 \times 10^{26}  \\  \text{ } &120 &\text{ } & 0.000268745 & \text{ } & 7.52 \times 10^{-37} & \text{ } & 3.57 \times 10^{32}  \\  \text{ } &140 &\text{ } & 0.000198373 & \text{ } & 7.17 \times 10^{-43} & \text{ } & 2.76 \times 10^{38}  \\  \text{ } &160 &\text{ } & 0.000152416 & \text{ } & 6.84 \times 10^{-49} & \text{ } & 2.23 \times 10^{44}  \\  \text{ } &180 &\text{ } & 0.000120758 & \text{ } & 6.53 \times 10^{-55} & \text{ } & 1.85 \times 10^{50}  \\  \text{ } & \text{ } \\    \end{array}

Note that at the large values, the Pareto right tail retains much more probability. This is also confirmed by the ratio of the two survival functions, with the ratio approaching infinity. Using an exponential distribution to model a Pareto random phenomenon would be a severe modeling error even though the exponential distribution may be a good model for describing the loss up to the 75th percentile (in the above comparison). It is the large right tail that is problematic (and catastrophic)!
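The comparison table can be reproduced with a few lines of Python. The sketch below uses the Pareto Type II parameters \alpha=2 and \theta=2 from the text and solves e^{-2 \lambda}=0.25 for the exponential rate; the function and variable names are illustrative.

```python
import math

ALPHA, THETA = 2.0, 2.0        # Pareto Type II parameters from the text
RATE = math.log(4.0) / 2.0     # exponential rate chosen so that S_Y(2) = 0.25

def pareto_sf(x):
    """Pareto Type II survival function."""
    return (THETA / (x + THETA)) ** ALPHA

def exponential_sf(x):
    """Exponential survival function with the matched 75th percentile."""
    return math.exp(-RATE * x)

print(f"{'x':>4} {'Pareto':>14} {'Exponential':>14} {'Ratio':>12}")
for x in (2, 10, 20, 30, 40, 60, 80, 100, 120, 140, 160, 180):
    p, e = pareto_sf(x), exponential_sf(x)
    print(f"{x:>4} {p:>14.9f} {e:>14.3e} {p / e:>12.3e}")
```

Matching the two distributions at a single percentile says nothing about how they behave beyond it, which is why the ratio column grows so quickly.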

Since the Pareto survival function and the exponential survival function have closed forms, we can also look at their ratio.

    \displaystyle \frac{\text{pareto survival}}{\text{exponential survival}}=\frac{\displaystyle \frac{\theta^\alpha}{(x+\theta)^\alpha}}{e^{-\lambda x}}=\frac{\theta^\alpha e^{\lambda x}}{(x+\theta)^\alpha} \longrightarrow \infty \ \text{ as } x \longrightarrow \infty

In the above ratio, the numerator has an exponential function with a positive quantity in the exponent, while the denominator has a polynomial in x. This ratio goes to infinity as x \rightarrow \infty.

In general, whenever the ratio of two survival functions diverges to infinity, it is an indication that the distribution in the numerator of the ratio has a heavier tail. When the ratio goes to infinity, the survival function in the numerator is said to decay slowly to zero as compared to the denominator.

It is important to examine the tail behavior of a distribution when considering it as a candidate for a model. The four criteria discussed here provide a crucial way to classify parametric models according to the tail weight.

Daniel Ma

\copyright 2017 – Dan Ma

Examples of mixtures

The notion of mixtures is discussed in this previous post. Many probability distributions useful for actuarial modeling are mixture distributions. The previous post touches on some examples – negative binomial distribution (a Poisson-Gamma mixture), Pareto distribution (an exponential-gamma mixture) and the normal-normal mixture. In this post we present additional examples. We discuss the following examples.

  1. Poisson-Gamma mixture = Negative Binomial.
  2. Normal-Normal mixture = Normal.
  3. Exponential-Gamma mixture = Pareto.
  4. Exponential-Inverse Gamma mixture = Pareto.
  5. Gamma-Gamma mixture = Generalized Pareto.
  6. Weibull-Exponential mixture = Loglogistic.
  7. Gamma-Geometric mixture = Exponential.
  8. Normal-Gamma mixture = Student t.

The first three examples are discussed in the previous post. We discuss the remaining examples in this post.

The Pareto Family

Examples 3 and 4 show that Pareto distributions are mixtures of exponential distributions with either gamma or inverse gamma mixing weights. In Example 3, X \lvert \Theta is an exponential distribution with \Theta being a rate parameter. When \Theta follows a gamma distribution, the resulting mixture is a (Type II Lomax) Pareto distribution. In Example 4, X \lvert \Theta is an exponential distribution with \Theta being a scale parameter. When \Theta follows an inverse gamma distribution, the resulting mixture is also a (Type II Lomax) Pareto distribution.

As a mixture, Example 5 is like Example 3, except that it is a gamma-gamma mixture resulting in a generalized Pareto distribution. Example 3 has been discussed in the previous post. We now discuss Example 4 and Example 5.

Example 4. Suppose that X \lvert \Theta has an exponential distribution where \Theta is a scale parameter.
Further suppose that the random parameter \Theta follows an inverse gamma distribution with parameters \alpha and \beta. Then the unconditional distribution for X is a (Type II Lomax) Pareto distribution with shape parameter \alpha and scale parameter \beta.

The following gives the cumulative distribution function (CDF) and survival function of the conditional random variable X \lvert \Theta.

    F(x \lvert \Theta)=1-e^{- x/\Theta}

    S(x \lvert \Theta)=e^{- x/\Theta}

The random parameter \Theta follows an inverse gamma distribution with parameters \alpha and \beta. The following is the pdf of \Theta:

    \displaystyle g(\theta)=\frac{1}{\Gamma(\alpha)} \ \biggl[\frac{\beta}{\theta}\biggr]^\alpha \ \frac{1}{\theta} \ e^{-\frac{\beta}{ \theta}} \ \ \ \ \ \theta>0

We show that the unconditional survival function for X is the survival function for the Pareto distribution with parameters \alpha (shape parameter) and \beta (scale parameter).

    \displaystyle \begin{aligned} S(x)&=\int_0^\infty S(x \lvert \theta) \ g(\theta) \ d \theta \\&=\int_0^\infty e^{- x/\theta} \ \frac{1}{\Gamma(\alpha)} \ \biggl[\frac{\beta}{\theta}\biggr]^\alpha \ \frac{1}{\theta} \ e^{-\beta / \theta} \ d \theta \\&=\int_0^\infty \frac{1}{\Gamma(\alpha)} \ \biggl[\frac{\beta}{\theta}\biggr]^\alpha \ \frac{1}{\theta} \ e^{-(x+\beta) / \theta} \ d \theta \\&=\frac{\beta^\alpha}{(x+\beta)^\alpha} \ \int_0^\infty \frac{1}{\Gamma(\alpha)} \ \biggl[\frac{x+\beta}{\theta}\biggr]^\alpha \ \frac{1}{\theta} \ e^{-(x+\beta) / \theta} \ d \theta \\&=\biggl(\frac{\beta}{x+\beta} \biggr)^\alpha \end{aligned}

Note that the integrand in the last integral is a density function for an inverse gamma distribution (with shape parameter \alpha and scale parameter x+\beta). Thus the integral is 1 and can be eliminated. The result that remains is the survival function for a Pareto distribution with parameters \alpha and \beta. The following gives the CDF and density function of this Pareto distribution.

    \displaystyle F(x)=1-\biggl(\frac{\beta}{x+\beta} \biggr)^\alpha

    \displaystyle f(x)=\frac{\alpha \ \beta^{\alpha}}{(x+\beta)^{\alpha+1}}

See here for further information on the Pareto Type II Lomax distribution.
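Example 4 can also be checked by simulation: draw \Theta from the inverse gamma distribution, then draw X from the exponential distribution with mean \Theta, and compare the empirical survival function with the Pareto survival function. The following Python sketch uses only the standard library; the parameter values, sample size and seed are arbitrary choices for illustration.

```python
import random

def sample_mixture(alpha, beta, rng):
    """Draw Theta ~ inverse gamma(alpha, beta), then X | Theta ~ exponential with mean Theta."""
    gamma_draw = rng.gammavariate(alpha, 1.0)   # gamma with shape alpha and scale 1
    theta = beta / gamma_draw                   # inverse gamma with shape alpha and scale beta
    return theta * rng.expovariate(1.0)         # exponential with mean theta

def empirical_survival(x, alpha, beta, n=100_000, seed=2017):
    """Fraction of simulated mixture draws that exceed x."""
    rng = random.Random(seed)
    exceed = sum(1 for _ in range(n) if sample_mixture(alpha, beta, rng) > x)
    return exceed / n

def pareto_survival(x, alpha, beta):
    """Survival function derived in Example 4."""
    return (beta / (x + beta)) ** alpha

alpha, beta = 3.0, 2.0  # illustrative parameter values
for x in (0.5, 1.0, 5.0):
    print(x, empirical_survival(x, alpha, beta), pareto_survival(x, alpha, beta))
```

The empirical and theoretical values should agree up to sampling error, which is roughly of order 1/\sqrt{n}.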

Example 5. Suppose that X \lvert \Theta has a gamma distribution with shape parameter k (a known constant) and rate parameter \Theta. Further suppose that the random parameter \Theta follows a gamma distribution with shape parameter \alpha and rate parameter \beta. Then the unconditional distribution for X is a generalized Pareto distribution with parameters \alpha, \beta and k.

Conditional on \Theta=\theta, the following is the density function of X.

    \displaystyle f(x \lvert \theta)=\frac{1}{\Gamma(k)} \ \theta^k \ x^{k-1} \ e^{-\theta x}  \ \ \ \ \ x>0

The following is the density function of the random parameter \Theta.

    \displaystyle g(\theta)=\frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ \theta^{\alpha-1} \ e^{-\beta \theta} \ \ \ \ \ \ \theta>0

The following gives the unconditional density function for X.

    \displaystyle \begin{aligned} f(x)&=\int_0^\infty  f(x \lvert \theta) \ g(\theta) \ d \theta \\&=\int_0^\infty  \frac{1}{\Gamma(k)} \ \theta^k \ x^{k-1} \ e^{-\theta x} \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ \theta^{\alpha-1} \ e^{-\beta \theta} \ d \theta \\&=\int_0^\infty \frac{1}{\Gamma(k)} \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ x^{k-1} \ \theta^{\alpha+k-1} \ e^{-(x+\beta) \theta} \ d \theta \\&= \frac{1}{\Gamma(k)} \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ x^{k-1} \frac{\Gamma(\alpha+k)}{(x+\beta)^{\alpha+k}} \int_0^\infty \frac{1}{\Gamma(\alpha+k)} \ (x+\beta)^{\alpha+k} \ \theta^{\alpha+k-1} \ e^{-(x+\beta) \theta} \ d \theta \\&=\frac{\Gamma(\alpha+k)}{\Gamma(\alpha) \ \Gamma(k)} \ \frac{\beta^\alpha \ x^{k-1}}{(x+\beta)^{\alpha+k}} \end{aligned}

Any distribution that has a density function described above is said to be a generalized Pareto distribution with the parameters \alpha, \beta and k. Its CDF cannot be written in closed form but can be expressed using the incomplete beta function.

    \displaystyle \begin{aligned} F(x)&=\int_0^x  \frac{\Gamma(\alpha+k)}{\Gamma(\alpha) \ \Gamma(k)} \ \frac{\beta^\alpha \ t^{k-1}}{(t+\beta)^{\alpha+k}} \ dt \\&=\int_0^x  \frac{\Gamma(\alpha+k)}{\Gamma(\alpha) \ \Gamma(k)} \ \biggl(\frac{t}{t+\beta} \biggr)^{k-1} \ \biggl(\frac{\beta}{t+\beta} \biggr)^{\alpha-1} \ \frac{\beta}{(t+\beta)^2} \ dt \\&=\frac{\Gamma(\alpha+k)}{\Gamma(\alpha) \ \Gamma(k)} \ \int_0^{\frac{x}{x+\beta}} u^{k-1} \ (1-u)^{\alpha-1} \ du, \ \ \ u=\frac{t}{t+\beta} \\&=\frac{\Gamma(\alpha+k)}{\Gamma(\alpha) \ \Gamma(k)} \ \int_0^{w} t^{k-1} \ (1-t)^{\alpha-1} \ dt, \ \ \ w=\frac{x}{x+\beta}   \end{aligned}

The moments can be easily derived for the generalized Pareto distribution but on a limited basis. Since it is a mixture distribution, each unconditional moment is the weighted average of the corresponding conditional moments.

    \displaystyle \begin{aligned} E(X^w)&=\int_0^\infty  E(X^w \lvert \theta) \ g(\theta) \ d \theta \\&=\int_0^\infty  \frac{\Gamma(k+w)}{\theta^w \Gamma(k)} \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ \theta^{\alpha-1} \ e^{-\beta \theta} \ d \theta \\&=\frac{\beta^w \ \Gamma(k+w) \ \Gamma(\alpha-w)}{\Gamma(k) \ \Gamma(\alpha)} \int_0^\infty \frac{1}{\Gamma(\alpha-w)} \ \beta^{\alpha-w} \ \theta^{\alpha-w-1} \ e^{-\beta \theta} \ d \theta \\&=\frac{\beta^w \ \Gamma(k+w) \ \Gamma(\alpha-w)}{\Gamma(k) \ \Gamma(\alpha)} \ \ \ \ -k<w<\alpha   \end{aligned}

Note that E(X) has a simple expression E(X)=\frac{k \beta}{\alpha-1} when 1<\alpha.
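The moment formula is straightforward to evaluate with the gamma function. The following Python sketch is one way to do it; the function name and the parameter values are illustrative.

```python
import math

def gen_pareto_moment(w, alpha, beta, k):
    """E(X^w) for the generalized Pareto distribution; requires -k < w < alpha."""
    if not (-k < w < alpha):
        raise ValueError("moment exists only for -k < w < alpha")
    return (beta ** w * math.gamma(k + w) * math.gamma(alpha - w)
            / (math.gamma(k) * math.gamma(alpha)))

alpha, beta, k = 4.0, 3.0, 2.5  # illustrative parameter values
mean = gen_pareto_moment(1, alpha, beta, k)
second = gen_pareto_moment(2, alpha, beta, k)
print(mean, k * beta / (alpha - 1))   # both expressions give the mean
print(second - mean ** 2)             # variance, which requires alpha > 2
```

Raising an error outside the range -k<w<\alpha is a design choice of this sketch; it mirrors the restriction that the arguments of the gamma function must be positive.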

When the parameter k=1, the conditional distribution for X \lvert \Theta is an exponential distribution. Then the situation reverts back to Example 3, leading to a Pareto distribution. Thus the Pareto distribution is a special case of the generalized Pareto distribution. Both the Pareto distribution and the generalized Pareto distribution have thicker and longer tails than the original conditional gamma distribution.

It turns out that the F distribution is also a special case of the generalized Pareto distribution. The F distribution with r_1 and r_2 degrees of freedom is the generalized Pareto distribution with parameters k=r_1/2, \alpha=r_2/2 and \beta=r_2/r_1. As a result, the following is the density function.

    \displaystyle \begin{aligned} h(x)&=\frac{\Gamma(r_1/2 + r_2/2)}{\Gamma(r_1/2) \ \Gamma(r_2/2)} \ \frac{(r_2/r_1)^{r_2/2} \ x^{r_1/2-1}}{(x+r_2/r_1)^{r_1/2+r_2/2}} \\&=\frac{\Gamma(r_1/2 + r_2/2)}{\Gamma(r_1/2) \ \Gamma(r_2/2)} \ \frac{(r_1/r_2)^{r_1/2} \ x^{r_1/2-1}}{(1+(r_1/r_2)x)^{r_1/2+r_2/2}}  \ \ \ \ 0<x<\infty   \end{aligned}

Another way to generate the F distribution is from taking a ratio of two chi-squared distributions (see Theorem 9 in this previous post). Of course, there is no need to use the explicit form of the density function of the F distribution. In a statistical application, the F distribution is accessed using tables or software.
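As a quick numerical check of the claim that the F distribution is a generalized Pareto distribution, the following Python sketch evaluates the generalized Pareto density with k=r_1/2, \alpha=r_2/2 and \beta=r_2/r_1 and compares it with the F density written in the second form shown above. The function names and the degrees of freedom used are illustrative.

```python
import math

def gen_pareto_pdf(x, alpha, beta, k):
    """Generalized Pareto density from Example 5."""
    const = math.gamma(alpha + k) / (math.gamma(alpha) * math.gamma(k))
    return const * beta ** alpha * x ** (k - 1) / (x + beta) ** (alpha + k)

def f_pdf(x, r1, r2):
    """F density with r1 and r2 degrees of freedom (second form of the display above)."""
    const = math.gamma((r1 + r2) / 2) / (math.gamma(r1 / 2) * math.gamma(r2 / 2))
    ratio = r1 / r2
    return const * ratio ** (r1 / 2) * x ** (r1 / 2 - 1) / (1 + ratio * x) ** ((r1 + r2) / 2)

r1, r2 = 5, 8  # illustrative degrees of freedom
for x in (0.3, 1.0, 2.5):
    print(x, gen_pareto_pdf(x, alpha=r2 / 2, beta=r2 / r1, k=r1 / 2), f_pdf(x, r1, r2))
```

The two columns agree up to rounding, since the second form is obtained from the first by dividing the numerator and the denominator by (r_2/r_1)^{r_1/2+r_2/2}.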

The Loglogistic Distribution

The loglogistic distribution can be derived as a mixture of Weibull distributions with exponential mixing weights.

Example 6. Suppose that X \lvert \Lambda has a Weibull distribution with shape parameter \gamma (a known constant) and a parameter \Lambda such that the CDF of X \lvert \Lambda is F(x \lvert \Lambda)=1-e^{-\Lambda \ x^\gamma}. Further suppose that the random parameter \Lambda follows an exponential distribution with rate parameter \theta^{\gamma}. Then the unconditional distribution for X is a loglogistic distribution with shape parameter \gamma and scale parameter \theta.

The following gives the conditional survival function for X \lvert \Lambda and the exponential mixing weight.

    \displaystyle S(x \lvert \lambda)=e^{-\lambda \ x^\gamma}

    \displaystyle g(\lambda)=\theta^\gamma \ e^{-\theta^\gamma \ \lambda}

The following gives the unconditional survival function and CDF of X as well as the PDF.

    \displaystyle \begin{aligned} S(x)&=\int_0^\infty S(x \lvert \lambda) \ g(\lambda) \ d \lambda \\&=\int_0^\infty e^{-\lambda \ x^\gamma} \ \theta^\gamma \ e^{-\theta^\gamma \ \lambda} \ d \lambda \\&=\int_0^\infty  \theta^\gamma \ e^{-(x^\gamma+\theta^\gamma) \ \lambda} \ d \lambda \\&=\frac{\theta^\gamma}{(x^\gamma+\theta^\gamma)} \int_0^\infty   (x^\gamma+\theta^\gamma) \ e^{-(x^\gamma+\theta^\gamma) \ \lambda} \ d \lambda \\&=\frac{\theta^\gamma}{x^\gamma+\theta^\gamma} \end{aligned}

    \displaystyle \begin{aligned} F(x)&=1-S(x)=1-\frac{\theta^\gamma}{x^\gamma+\theta^\gamma} =\frac{x^\gamma}{x^\gamma+\theta^\gamma} =\frac{(x/\theta)^\gamma}{1+(x/\theta)^\gamma} \end{aligned}

    \displaystyle f(x)=\frac{d}{dx} \biggl( \frac{x^\gamma}{x^\gamma+\theta^\gamma} \biggr)=\frac{\gamma \ (x/\theta)^\gamma}{x [1+(x/\theta)^\gamma]^2}

Any distribution that has any one of the above three distributional quantities is said to be a loglogistic distribution with shape parameter \gamma and scale parameter \theta.

One interesting point about the loglogistic distribution is that an inverse loglogistic distribution is another loglogistic distribution. Suppose that X has a loglogistic distribution with shape parameter \gamma and scale parameter \theta. Let Y=\frac{1}{X}. Then Y has a loglogistic distribution with shape parameter \gamma and scale parameter \theta^{-1}.

    \displaystyle \begin{aligned} P[Y \le y]&=P[\frac{1}{X} \le y] =P[X \ge y^{-1}] =\frac{\theta^\gamma}{y^{-\gamma}+\theta^\gamma} \\&=\frac{\theta^\gamma \ y^\gamma}{1+\theta^\gamma \ y^\gamma} \\&=\frac{y^\gamma}{(\theta^{-1})^\gamma+y^\gamma} \end{aligned}

The above is the CDF of the loglogistic distribution with the desired parameters. Thus there is no need to specially call out the inverse loglogistic distribution.
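The inverse property can be confirmed numerically by computing P[1/X \le y] from the distribution of X and comparing it with the loglogistic CDF that has scale parameter \theta^{-1}. A minimal Python sketch, with illustrative function names and parameter values:

```python
def loglogistic_cdf(x, gamma, theta):
    """CDF of the loglogistic distribution with shape gamma and scale theta."""
    return x ** gamma / (x ** gamma + theta ** gamma)

def reciprocal_cdf(y, gamma, theta):
    """P[1/X <= y] = P[X >= 1/y] when X is loglogistic with shape gamma and scale theta."""
    return 1.0 - loglogistic_cdf(1.0 / y, gamma, theta)

gamma, theta = 2.5, 4.0  # illustrative parameter values
for y in (0.1, 0.25, 1.0, 3.0):
    print(y, reciprocal_cdf(y, gamma, theta), loglogistic_cdf(y, gamma, 1.0 / theta))
```

The two columns coincide, so Y=1/X is loglogistic with the same shape parameter and the reciprocal scale parameter.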

In order to find the mean and higher moments of the loglogistic distribution, we take the approach of identifying the conditional Weibull moments and then weighting these moments by the exponential mixing weights. Note that the parameter \Lambda in the conditional CDF F(x \lvert \Lambda)=1-e^{-\Lambda \ x^\gamma} is not a scale parameter. The Weibull distribution in this conditional CDF is equivalent to a Weibull distribution with shape parameter \gamma and scale parameter \Lambda^{-1/\gamma}. According to formula (4) in this previous post, the kth moment of this Weibull distribution is

    \displaystyle E[ (X \lvert \Lambda)^k]=\Gamma \biggl(1+\frac{k}{\gamma} \biggr) \Lambda^{-k/\gamma}

The following gives the unconditional kth moment of the Weibull-exponential mixture.

    \displaystyle \begin{aligned} E[X^k]&=\int_0^\infty E[ (X \lvert \Lambda)^k] \ g(\lambda) \ d \lambda \\&=\int_0^\infty \Gamma \biggl(1+\frac{k}{\gamma} \biggr) \lambda^{-k/\gamma} \ \theta^\gamma \ e^{-\theta^\gamma \ \lambda} \ d \lambda\\&=\Gamma \biggl(1+\frac{k}{\gamma} \biggr) \ \theta^\gamma \int_0^\infty  \lambda^{-k/\gamma} \ e^{-\theta^\gamma \ \lambda} \ d \lambda \\&=\theta^k \ \Gamma \biggl(1+\frac{k}{\gamma} \biggr)  \int_0^\infty  t^{-k/\gamma} \ e^{-t} \ dt \ \ \text{ where } t=\theta^\gamma \lambda \\&=\theta^k \ \Gamma \biggl(1+\frac{k}{\gamma} \biggr) \int_0^\infty  t^{[(\gamma-k)/\gamma]-1} \ e^{-t} \ dt   \\&=\theta^k \ \Gamma \biggl(1+\frac{k}{\gamma} \biggr) \ \Gamma \biggl(1-\frac{k}{\gamma} \biggr) \ \ \ \ -\gamma<k<\gamma  \end{aligned}

The range -\gamma<k<\gamma follows from the fact that the arguments of the gamma function must be positive. Thus the kth moments of the loglogistic distribution are limited by its shape parameter \gamma. If \gamma=1, then E(X) does not exist. For a larger \gamma, more positive integer moments exist, but always only finitely many. This is an indication that the loglogistic distribution has a thick (right) tail. This is not surprising since mixture distributions (loglogistic in this case) tend to have thicker tails than the conditional distributions (Weibull in this case). The thicker tail is a result of the uncertainty in the random parameter in the conditional distribution (the Weibull \Lambda in this case).
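The following Python sketch evaluates the loglogistic moment formula and shows how the shape parameter limits the number of finite moments. Returning infinity outside the range -\gamma<k<\gamma is a convention chosen for this sketch; the function name and parameter values are illustrative.

```python
import math

def loglogistic_moment(k, gamma, theta):
    """E(X^k) for the loglogistic distribution; infinite unless -gamma < k < gamma."""
    if not (-gamma < k < gamma):
        return math.inf
    return theta ** k * math.gamma(1 + k / gamma) * math.gamma(1 - k / gamma)

theta = 3.0  # illustrative scale parameter
for gamma in (1.0, 1.5, 2.5, 4.0):
    print(gamma, [loglogistic_moment(k, gamma, theta) for k in (1, 2, 3)])
```

For example, with \gamma=2 and \theta=3 the mean is 3 \ \Gamma(1.5) \ \Gamma(0.5)=3 \pi/2, while the second moment is infinite.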

Another Way to Obtain Exponential Distribution

We now consider Example 7. The following is a precise statement of the gamma-geometric mixture.

Example 7. Suppose that X \lvert \alpha has a gamma distribution with shape parameter \alpha that is a positive integer and rate parameter \beta (a known constant). Further suppose that the random parameter \alpha follows a geometric distribution with probability function P[Y=\alpha]=p (1-p)^{\alpha-1} where \alpha=1,2,3,\cdots. Then the unconditional distribution for X is an exponential distribution with rate parameter \beta p.

The conditional gamma distribution has an uncertain shape parameter \alpha that can take on positive integers. The parameter \alpha follows a geometric distribution. Here are the ingredients that go into the mixture.

    \displaystyle f(x \lvert \alpha)=\frac{1}{(\alpha-1)!} \ \beta^\alpha \ x^{\alpha-1} \ e^{-\beta x}

    P[Y=\alpha]=p (1-p)^{\alpha-1}

The following is the unconditional probability density function of X.

    \displaystyle \begin{aligned} f(x)&=\sum \limits_{\alpha=1}^\infty f(x \lvert \alpha) \ P[Y=\alpha] \\&=\sum \limits_{\alpha=1}^\infty \frac{1}{(\alpha-1)!} \ \beta^\alpha \ x^{\alpha-1} \ e^{-\beta x} \ p (1-p)^{\alpha-1} \\&=\beta p \ e^{-\beta x} \sum \limits_{\alpha=1}^\infty \frac{[\beta(1-p) x]^{\alpha-1}}{(\alpha-1)!} \\&=\beta p \ e^{-\beta x} \sum \limits_{\alpha=0}^\infty \frac{[\beta(1-p) x]^{\alpha}}{(\alpha)!} \\&=\beta p \ e^{-\beta x} \ e^{\beta(1-p) x} \\&=\beta p \ e^{-\beta p x} \end{aligned}

The above density function is that of an exponential distribution with rate parameter \beta p.
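The series in Example 7 can be summed numerically and compared with the exponential density. The Python sketch below truncates the sum at a fixed number of terms and builds each term from the previous one; the truncation point, the function names and the parameter values are assumptions made for illustration.

```python
import math

def mixture_pdf(x, beta, p, terms=200):
    """Truncated sum over alpha of the gamma(alpha, rate beta) density times P[Y = alpha]."""
    total = 0.0
    # Term for alpha = 1: the density beta * exp(-beta * x) times the probability p.
    term = beta * math.exp(-beta * x) * p
    for alpha in range(1, terms + 1):
        total += term
        # Ratio of the term for alpha + 1 to the term for alpha.
        term *= beta * x * (1 - p) / alpha
    return total

def exponential_pdf(x, rate):
    """Exponential density with the given rate parameter."""
    return rate * math.exp(-rate * x)

beta, p = 1.5, 0.3  # illustrative parameter values
for x in (0.5, 2.0, 10.0):
    print(x, mixture_pdf(x, beta, p), exponential_pdf(x, beta * p))
```

The truncated sum agrees with \beta p \ e^{-\beta p x} to many digits because the terms decay factorially.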

Student t Distribution

Example 2 (discussed in the previous post) involves a normal distribution with a random mean. Example 8 involves a normal distribution with mean 0 and an uncertain variance, whose reciprocal follows a gamma distribution such that the two gamma parameters are related to a common parameter r, which will be the degrees of freedom of the student t distribution. The following is a precise description of the normal-gamma mixture.

Example 8. Suppose that X \lvert \Lambda has a normal distribution with mean 0 and variance 1/\Lambda. Further suppose that the random parameter \Lambda follows a gamma distribution with shape parameter \alpha and scale parameter \theta such that 2 \alpha=\frac{2}{\theta}=r is a positive integer. Then the unconditional distribution for X is a student t distribution with r degrees of freedom.

The following gives the ingredients of the normal-gamma mixture. The first item is the conditional density function of X given \Lambda. The second is the density function of the mixing weight \Lambda.

    \displaystyle f(x \lvert \lambda)=\frac{1}{\sqrt{1/\lambda} \ \sqrt{2 \pi}} \ e^{-(\lambda/2) \  x^2}=\sqrt{\frac{\lambda}{2 \pi}} \ e^{-(\lambda/2) \  x^2}

    \displaystyle g(\lambda)=\frac{1}{\Gamma(\alpha)} \biggl( \frac{1}{\theta} \biggr)^\alpha \ \lambda^{\alpha-1} \ e^{-\lambda/\theta}

The following calculation derives the unconditional density function of X.

    \displaystyle \begin{aligned} f(x)&=\int_{0}^\infty f(x \lvert \lambda) \ g(\lambda) \ d \lambda \\&=\int_{0}^\infty \sqrt{\frac{\lambda}{2 \pi}} \ e^{-(\lambda/2) \  x^2} \ \frac{1}{\Gamma(\alpha)} \biggl( \frac{1}{\theta} \biggr)^\alpha \ \lambda^{\alpha-1} \ e^{-\lambda/\theta} \ d \lambda \\&=\frac{1}{\Gamma(\alpha)} \ \biggl( \frac{1}{\theta} \biggr)^\alpha \ \frac{1}{\sqrt{2 \pi}} \int_0^\infty \lambda^{\alpha+\frac{1}{2}-1} e^{-(\frac{x^2}{2}+\frac{1}{\theta} ) \lambda} \ d \lambda \\&=\frac{\Gamma(\alpha+\frac{1}{2})}{\Gamma(\alpha)} \ \biggl( \frac{1}{\theta} \biggr)^\alpha \ \frac{1}{\sqrt{2 \pi}} \ \biggl(\frac{2 \theta}{\theta x^2+2} \biggr)^{\alpha+\frac{1}{2}} \\& \times \int_0^\infty \frac{1}{\Gamma(\alpha+\frac{1}{2})} \ \biggl(\frac{\theta x^2+2}{2 \theta} \biggr)^{\alpha+\frac{1}{2}} \lambda^{\alpha+\frac{1}{2}-1} e^{-\frac{\theta x^2+2}{2 \theta} \lambda} \ d \lambda \\&=\frac{\Gamma(\alpha+\frac{1}{2})}{\Gamma(\alpha)} \ \biggl( \frac{1}{\theta} \biggr)^\alpha \ \frac{1}{\sqrt{2 \pi}} \ \biggl(\frac{2 \theta}{\theta x^2+2} \biggr)^{\alpha+\frac{1}{2}} \ \ \ \ \ -\infty<x<\infty \end{aligned}

The above density function is in terms of the two parameters \alpha and \theta. In the assumptions, the two parameters are related to a common parameter r such that \alpha=\frac{r}{2} and \theta=\frac{2}{r}. The following derivation converts to the common r.

    \displaystyle \begin{aligned} f(x)&=\frac{\Gamma(\frac{r}{2}+\frac{1}{2})}{\Gamma(\frac{r}{2})} \ \biggl( \frac{r}{2} \biggr)^{\frac{r}{2}} \ \frac{1}{\sqrt{2 \pi}} \ \biggl(\frac{2 \frac{2}{r}}{\frac{2}{r} x^2+2} \biggr)^{\frac{r}{2}+\frac{1}{2}} \\&=\frac{\Gamma(\frac{r}{2}+\frac{1}{2})}{\Gamma(\frac{r}{2})} \ \frac{r^{r/2}}{2^{r/2}} \ \frac{1}{2^{1/2} \sqrt{\pi}} \ \biggl(\frac{2/r}{x^2/r+1} \biggr)^{(r+1)/2} \\&=\frac{\Gamma \biggl(\displaystyle \frac{r+1}{2} \biggr)}{\Gamma \biggl(\displaystyle \frac{r}{2} \biggr)} \ \frac{1}{\sqrt{\pi r}} \ \frac{1 \ \ \ \ \ }{\biggl(1+\displaystyle \frac{x^2}{r} \biggr)^{(r+1)/2}} \ \ \ \ \ -\infty<x<\infty \end{aligned}

The above density function is that of a student t distribution with r degrees of freedom. Of course, in performing tests of significance, the t distribution is accessed by using tables or software. A usual textbook definition of the student t distribution is the ratio of a standard normal random variable to the square root of an independent chi-squared random variable divided by its degrees of freedom (see Theorem 6 in this previous post).
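
The mixture integral can also be checked numerically. The Python sketch below approximates the integral of f(x \lvert \lambda) \ g(\lambda) with a simple midpoint rule and compares it with the closed-form t density derived above. The step count and the truncation of the upper limit at 60 are assumptions made for illustration; they are adequate for small values of r but are not tuned for general use.

```python
import math

def normal_gamma_mixture_density(x, r, steps=20000, upper=60.0):
    """Midpoint-rule approximation of the mixture integral over lambda in (0, upper)."""
    alpha = r / 2.0   # gamma shape parameter
    theta = 2.0 / r   # gamma scale parameter
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        # conditional normal density with mean 0 and variance 1 / lam
        cond = math.sqrt(lam / (2.0 * math.pi)) * math.exp(-0.5 * lam * x * x)
        # log of the gamma density of the mixing weight
        log_g = ((alpha - 1.0) * math.log(lam) - lam / theta
                 - math.lgamma(alpha) - alpha * math.log(theta))
        total += cond * math.exp(log_g) * h
    return total

def student_t_density(x, r):
    """Closed-form density of the t distribution with r degrees of freedom."""
    c = math.gamma((r + 1) / 2.0) / (math.gamma(r / 2.0) * math.sqrt(math.pi * r))
    return c * (1.0 + x * x / r) ** (-(r + 1) / 2.0)

# Hypothetical check at x = 1 with r = 5 degrees of freedom
print(normal_gamma_mixture_density(1.0, 5), student_t_density(1.0, 5))
```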


\copyright 2017 – Dan Ma

Transformed Pareto distribution

One way to generate new probability distributions from old ones is to raise a distribution to a power. Two previous posts are devoted to this topic: raising an exponential distribution to a power and raising a gamma distribution to a power. Many familiar and useful models can be generated in this fashion. For example, the Weibull distribution is generated by raising an exponential distribution to a positive power. This post discusses the raising of a Pareto distribution to a power, which generates the Burr distribution and the inverse Burr distribution.

Raising to a Power

Let X be a random variable. Let \tau be a positive constant. The random variables Y=X^{1/\tau}, Y=X^{-1} and Y=X^{-1/\tau} are called transformed, inverse and inverse transformed, respectively.

Let f_X(x), F_X(x) and S_X(x)=1-F_X(x) be the probability density function (PDF), the cumulative distribution function (CDF) and the survival function of the random variable X (the base distribution). The goal is to express the CDFs of the “transformed” variables in terms of the base CDF F_X(x). The following table shows how.

Name of Distribution Random Variable CDF
Transformed Y=X^{1 / \tau}, \ \tau >0 F_Y(y)=F_X(y^\tau)
Inverse Y=X^{-1} F_Y(y)=1-F_X(y^{-1})
Inverse Transformed Y=X^{-1 / \tau}, \ \tau >0 F_Y(y)=1-F_X(y^{-\tau})

If the CDF of the base distribution, as represented by the random variable X, is known, then the CDF of the “transformed” distribution can be derived using F_X(x) as shown in this table. Thus the CDF, in many cases, is a good entry point of the transformed distribution.
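
The three rows of the table translate directly into code. The following Python sketch builds each "transformed" CDF from a base CDF supplied as a function. The helper names are hypothetical, and the sketch assumes a continuous base distribution supported on the positive real line, with y > 0.

```python
import math

def transformed_cdf(base_cdf, tau):
    """CDF of Y = X**(1/tau), tau > 0: F_Y(y) = F_X(y**tau)."""
    return lambda y: base_cdf(y ** tau)

def inverse_variable_cdf(base_cdf):
    """CDF of Y = 1/X: F_Y(y) = 1 - F_X(1/y)."""
    return lambda y: 1.0 - base_cdf(1.0 / y)

def inverse_transformed_cdf(base_cdf, tau):
    """CDF of Y = X**(-1/tau), tau > 0: F_Y(y) = 1 - F_X(y**(-tau))."""
    return lambda y: 1.0 - base_cdf(y ** (-tau))

def exponential_cdf(x):
    """Base distribution used in the example: exponential with rate 1."""
    return 1.0 - math.exp(-x)

# Raising a unit exponential to the power 1/2 gives a Weibull CDF 1 - exp(-y^2)
weibull_cdf = transformed_cdf(exponential_cdf, 2.0)
print(weibull_cdf(1.5))
```

Passing a Pareto CDF as the base in place of the exponential CDF gives the Burr and inverse Burr CDFs discussed below.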

Pareto Information

Before the transformation, we first list out the information on the Pareto distribution. The Pareto distribution of interest here is the Type II Lomax distribution (discussed here). The following table gives several distributional quantities for a Pareto distribution with shape parameter \alpha and scale parameter \theta.

Pareto Type II Lomax
Survival Function S(x)=\displaystyle  \biggl( \frac{\theta}{x+\theta} \biggr)^\alpha  \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x >0
Cumulative Distribution Function F(x)=1-\displaystyle  \biggl( \frac{\theta}{x+\theta} \biggr)^\alpha \ \ \ \ \ \ \ \ \ \ \ \ \ x >0
Probability Density Function \displaystyle f(x)=\frac{\alpha \ \theta^\alpha}{(x+\theta)^{\alpha+1}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \  x >0
Mean \displaystyle E(X)=\frac{\theta}{\alpha-1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \alpha>1
Median \displaystyle \theta \ 2^{\frac{1}{\alpha}}-\theta
Mode 0
Variance \displaystyle Var(X)=\frac{\theta^2 \ \alpha}{(\alpha-1)^2 \ (\alpha-2)} \ \ \ \ \ \ \ \alpha>2
Higher Moments \displaystyle E(X^k)=\frac{k! \ \theta^k}{(\alpha-1) \cdots (\alpha-k)} \ \ \ \ \ \ \alpha>k \ \ \ k is integer
Higher Moments \displaystyle E(X^k)=\frac{\theta^k \ \Gamma(k+1) \Gamma(\alpha-k)}{\Gamma(\alpha)} \ \ \ \ -1<k<\alpha

The higher moments in the general case use \Gamma(\cdot), which is the gamma function.

The Distributions Derived from Pareto

Let X be a random variable that has a Pareto distribution (as described in the table in the preceding section). Assume that X has a shape parameter \alpha and scale parameter \theta. Let \tau be a positive number. When raising X to the power 1/\tau, the resulting distribution is a transformed Pareto distribution and is also called a Burr distribution, which then is a distribution with three parameters – \alpha, \theta and \tau.

When raising X to the power -1/\tau, the resulting distribution is an inverse transformed Pareto distribution and it is also called an inverse Burr distribution. When raising X to the power -1, the resulting distribution is an inverse Pareto distribution (it does not have a special name other than inverse Pareto).

The paralogistic family of distributions is created from the Burr distribution by collapsing two of the parameters into one. Let \alpha, \theta and \tau be the parameters of a Burr distribution. By equating \tau=\alpha, the resulting distribution is a paralogistic distribution. By equating \tau=\alpha in the corresponding inverse Burr distribution, the resulting distribution is an inverse paralogistic distribution.

Transformed Pareto = Burr

There are two ways to create the transformed Pareto distribution. One is to start with a base Pareto with shape parameter \alpha and scale parameter 1 and then raise it to 1/\tau. The scale parameter \theta is added at the end. Another way is to start with a base Pareto distribution with shape parameter \alpha and scale parameter \theta^\tau and then raise it to the power 1/\tau. Both ways would generate the same CDF. We take the latter approach since it generates both the CDF and moments quite conveniently.

Let X be a Pareto distribution with shape parameter \alpha and scale parameter \theta^\tau. The following table gives the distribution information on Y=X^{1/\tau}.

Burr Distribution
CDF F_Y(y)=\displaystyle  1-\biggl( \frac{1}{(y/\theta )^\tau+1} \biggr)^\alpha \ \ \ \ \ \ y >0
Survival Function S_Y(y)=\displaystyle \biggl( \frac{1}{(y/\theta )^\tau+1} \biggr)^\alpha \ \ \ \ \ \ y >0
Probability Density Function \displaystyle f_Y(y)=\frac{\alpha \ \tau \ (y/\theta)^\tau}{y \ [(y/\theta)^\tau+1 ]^{\alpha+1}} \ \ \ \ \ \ y >0
Mean \displaystyle E(Y)=\frac{\theta \ \Gamma(1/\tau+1) \Gamma(\alpha-1/\tau)}{\Gamma(\alpha)} \ \ \ \ \ \ 1 <\alpha \ \tau
Median \displaystyle \theta \ (2^{1/\alpha}-1)^{1/\tau}
Mode \displaystyle \theta \ \biggl(\frac{\tau-1}{\alpha \tau+1} \biggr)^{1/\tau} \ \ \ \ \ \ \tau >1 \text{ (else 0)}
Higher Moments \displaystyle E(Y^k)=\frac{\theta^k \ \Gamma(k/\tau+1) \Gamma(\alpha-k/\tau)}{\Gamma(\alpha)} \ \ \ \ \ \ -\tau<k <\alpha \ \tau

The distribution displayed in the above table is a three-parameter distribution. It is called the Burr distribution with parameters \alpha (shape), \theta (scale) and \tau (power).

To obtain the moments, note that E(Y^k)=E(X^{k/\tau}), which is derived using the Pareto moments. The Burr CDF has a closed form that is relatively easy to compute. Thus percentiles are very accessible. The moments rely on the gamma function and are usually calculated by software.
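
To illustrate how accessible the percentiles and moments are, here is a minimal Python sketch of the Burr CDF, its inverse (the percentile function) and the moment formula from the table. The function names and the parameter values in the example are hypothetical.

```python
import math

def burr_cdf(y, alpha, theta, tau):
    """Burr CDF: 1 - (1 / ((y/theta)^tau + 1))^alpha, for y > 0."""
    return 1.0 - (1.0 / ((y / theta) ** tau + 1.0)) ** alpha

def burr_quantile(u, alpha, theta, tau):
    """Solve F_Y(y) = u for y, where 0 < u < 1."""
    return theta * ((1.0 - u) ** (-1.0 / alpha) - 1.0) ** (1.0 / tau)

def burr_moment(k, alpha, theta, tau):
    """E(Y^k) from the gamma-function formula; needs -tau < k < alpha * tau."""
    if not (-tau < k < alpha * tau):
        raise ValueError("E(Y^k) exists only for -tau < k < alpha * tau")
    return (theta ** k * math.gamma(k / tau + 1.0)
            * math.gamma(alpha - k / tau) / math.gamma(alpha))

# Hypothetical parameter values
alpha, theta, tau = 3.0, 10.0, 2.0
median = burr_quantile(0.5, alpha, theta, tau)
mean = burr_moment(1, alpha, theta, tau)
print(median, mean)
```

With \tau=1 the Burr distribution reduces to the base Pareto distribution, so burr_moment(1, alpha, theta, 1.0) returns the Pareto mean \theta/(\alpha-1).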

Inverse Transformed Pareto = Inverse Burr

One way to generate the inverse transformed Pareto distribution is to raise a Pareto distribution with shape parameter \alpha and scale parameter 1 to the power -1/\tau and then add the scale parameter. Another way is to raise a Pareto distribution with shape parameter \alpha and scale parameter \theta^{-\tau} to the power -1/\tau. Both ways derive the same CDF. As in the preceding case, we take the latter approach.

Let X be a Pareto distribution with shape parameter \alpha and scale parameter \theta^{-\tau}. The following table gives the distribution information on Y=X^{-1/\tau}.

Inverse Burr Distribution
CDF F_Y(y)=\displaystyle  \biggl( \frac{(y/\theta)^\tau}{(y/\theta )^\tau+1} \biggr)^\alpha \ \ \ \ \ \ y >0
Survival Function S_Y(y)=\displaystyle 1-\biggl( \frac{(y/\theta)^\tau}{(y/\theta )^\tau+1} \biggr)^\alpha \ \ \ \ \ \ y >0
Probability Density Function \displaystyle f_Y(y)=\frac{\alpha \ \tau \ (y/\theta)^{\tau \alpha}}{y \ [1+(y/\theta)^\tau]^{\alpha+1}} \ \ \ \ \ \ y >0
Mean \displaystyle E(Y)=\frac{\theta \ \Gamma(1-1/\tau) \Gamma(\alpha+1/\tau)}{\Gamma(\alpha)} \ \ \ \ \ \ 1 <\tau
Median \displaystyle \theta \ \biggl[\frac{1}{ 2^{1/\alpha}-1} \biggr]^{1/\tau}
Mode \displaystyle \theta \ \biggl(\frac{\alpha \tau-1}{\tau+1} \biggr)^{1/\tau} \ \ \ \ \ \ \alpha \tau >1 \text{ (else 0)}
Higher Moments \displaystyle E(Y^k)=\frac{\theta^k \ \Gamma(1-k/\tau) \Gamma(\alpha+k/\tau)}{\Gamma(\alpha)} \ \ \ \ \ \ -\alpha \tau<k <\tau

The distribution displayed in the above table is a three-parameter distribution. It is called the Inverse Burr distribution with parameters \alpha (shape), \theta (scale) and \tau (power).

Note that the moments of both the Burr and inverse Burr distributions are limited: the Burr moments are limited by the product of the parameters \alpha and \tau, and the inverse Burr moments are limited by the parameter \tau. This is not surprising since the base Pareto distribution has limited moments. This is one indication that all of these distributions have a heavy right tail.
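
The inverse Burr quantities can be sketched in Python in the same way. The function names below are hypothetical. The moment function raises an error outside the range -\alpha \tau<k<\tau, which makes the restriction on the moments explicit.

```python
import math

def inverse_burr_cdf(y, alpha, theta, tau):
    """Inverse Burr CDF: ((y/theta)^tau / ((y/theta)^tau + 1))^alpha, for y > 0."""
    w = (y / theta) ** tau
    return (w / (w + 1.0)) ** alpha

def inverse_burr_quantile(u, alpha, theta, tau):
    """Solve F_Y(y) = u for y, where 0 < u < 1."""
    return theta * (u ** (-1.0 / alpha) - 1.0) ** (-1.0 / tau)

def inverse_burr_moment(k, alpha, theta, tau):
    """E(Y^k) from the gamma-function formula; needs -alpha * tau < k < tau."""
    if not (-alpha * tau < k < tau):
        raise ValueError("E(Y^k) exists only for -alpha * tau < k < tau")
    return (theta ** k * math.gamma(1.0 - k / tau)
            * math.gamma(alpha + k / tau) / math.gamma(alpha))

# Hypothetical parameter values
alpha, theta, tau = 2.0, 5.0, 3.0
print(inverse_burr_quantile(0.5, alpha, theta, tau))  # median
print(inverse_burr_moment(1, alpha, theta, tau))      # mean exists since tau > 1
```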

The Paralogistic Family

With the facts of the Burr distribution and the inverse Burr distribution established, paralogistic and inverse paralogistic distributions can now be obtained. A paralogistic distribution is simply a Burr distribution with \tau=\alpha. An inverse paralogistic distribution is simply an inverse Burr distribution with \tau=\alpha. In the above tables for Burr and inverse Burr, replacing \tau by \alpha gives the following two tables.

Paralogistic Distribution
CDF F_Y(y)=\displaystyle  1-\biggl( \frac{1}{(y/\theta )^\alpha+1} \biggr)^\alpha \ \ \ \ \ \ y >0
Survival Function S_Y(y)=\displaystyle \biggl( \frac{1}{(y/\theta )^\alpha+1} \biggr)^\alpha \ \ \ \ \ \ y >0
Probability Density Function \displaystyle f_Y(y)=\frac{\alpha^2 \ (y/\theta)^\alpha}{y \ [(y/\theta)^\alpha+1 ]^{\alpha+1}} \ \ \ \ \ \ y >0
Mean \displaystyle E(Y)=\frac{\theta \ \Gamma(1/\alpha+1) \Gamma(\alpha-1/\alpha)}{\Gamma(\alpha)} \ \ \ \ \ \ 1 <\alpha^2
Median \displaystyle \theta \ (2^{1/\alpha}-1)^{1/\alpha}
Mode \displaystyle \theta \ \biggl(\frac{\alpha-1}{\alpha^2+1} \biggr)^{1/\alpha} \ \ \ \ \ \ \alpha >1 \text{ (else 0)}
Higher Moments \displaystyle E(Y^k)=\frac{\theta^k \ \Gamma(k/\alpha+1) \Gamma(\alpha-k/\alpha)}{\Gamma(\alpha)} \ \ \ \ \ \ -\alpha<k <\alpha^2

Inverse Paralogistic Distribution
CDF F_Y(y)=\displaystyle  \biggl( \frac{(y/\theta)^\alpha}{(y/\theta )^\alpha+1} \biggr)^\alpha \ \ \ \ \ \ y >0
Survival Function S_Y(y)=\displaystyle 1-\biggl( \frac{(y/\theta)^\alpha}{(y/\theta )^\alpha+1} \biggr)^\alpha \ \ \ \ \ \ y >0
Probability Density Function \displaystyle f_Y(y)=\frac{\alpha^2 \ (y/\theta)^{\alpha^2}}{y \ [1+(y/\theta)^\alpha]^{\alpha+1}} \ \ \ \ \ \ y >0
Mean \displaystyle E(Y)=\frac{\theta \ \Gamma(1-1/\alpha) \Gamma(\alpha+1/\alpha)}{\Gamma(\alpha)} \ \ \ \ \ \ 1 <\alpha
Median \displaystyle \theta \ \biggl[\frac{1}{ 2^{1/\alpha}-1} \biggr]^{1/\alpha}
Mode \displaystyle \theta \ (\alpha-1)^{1/\alpha} \ \ \ \ \ \ \alpha >1 \text{ (else 0)}
Higher Moments \displaystyle E(Y^k)=\frac{\theta^k \ \Gamma(1-k/\alpha) \Gamma(\alpha+k/\alpha)}{\Gamma(\alpha)} \ \ \ \ \ \ -\alpha^2<k <\alpha
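
One way to sanity-check the mode entries in these two tables is to compare them with a brute-force search over the density. The Python sketch below does this on a grid. The function names, the grid limits and the parameter values are illustrative assumptions.

```python
def paralogistic_pdf(y, alpha, theta):
    """Paralogistic density (Burr density with tau = alpha), for y > 0."""
    w = (y / theta) ** alpha
    return alpha ** 2 * w / (y * (w + 1.0) ** (alpha + 1.0))

def inverse_paralogistic_pdf(y, alpha, theta):
    """Inverse paralogistic density (inverse Burr density with tau = alpha)."""
    w = (y / theta) ** alpha
    return alpha ** 2 * w ** alpha / (y * (1.0 + w) ** (alpha + 1.0))

def paralogistic_mode(alpha, theta):
    """Mode from the table; 0 when alpha <= 1."""
    if alpha <= 1.0:
        return 0.0
    return theta * ((alpha - 1.0) / (alpha ** 2 + 1.0)) ** (1.0 / alpha)

def inverse_paralogistic_mode(alpha, theta):
    """Mode from the table; 0 when alpha <= 1."""
    if alpha <= 1.0:
        return 0.0
    return theta * (alpha - 1.0) ** (1.0 / alpha)

def grid_argmax(pdf, lo, hi, n=20000):
    """Grid point in [lo, hi] at which the density is largest."""
    step = (hi - lo) / n
    points = [lo + i * step for i in range(n + 1)]
    return max(points, key=pdf)

# Hypothetical parameter values
alpha, theta = 2.5, 4.0
print(paralogistic_mode(alpha, theta),
      grid_argmax(lambda y: paralogistic_pdf(y, alpha, theta), 0.001, 20.0))
print(inverse_paralogistic_mode(alpha, theta),
      grid_argmax(lambda y: inverse_paralogistic_pdf(y, alpha, theta), 0.001, 20.0))
```

In each printed pair the closed-form mode and the grid maximizer should agree up to the grid spacing.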

Inverse Pareto Distribution

The distribution that has not been discussed is the inverse Pareto. Again, we have the option of deriving it by raising a base Pareto with just the shape parameter to the power -1 and then adding the scale parameter. We take the approach of raising a base Pareto distribution with shape parameter \alpha and scale parameter \theta^{-1} to the power -1. Both approaches lead to the same CDF.

Inverse Pareto Distribution
CDF F_Y(y)=\displaystyle  \biggl( \frac{y}{\theta+y} \biggr)^\alpha \ \ \ \ \ \ y >0
Survival Function S_Y(y)=\displaystyle 1-\biggl( \frac{y}{\theta+y} \biggr)^\alpha \ \ \ \ \ \ y >0
Probability Density Function \displaystyle f_Y(y)=\frac{\alpha \ \theta \ y^{\alpha-1}}{[\theta+y ]^{\alpha+1}} \ \ \ \ \ \ y >0
Median \displaystyle \frac{\theta}{2^{1/\alpha}-1}
Mode \displaystyle \theta \ \frac{\alpha-1}{2} \ \ \ \ \ \ \alpha >1 \text{ (else 0)}
Higher Moments \displaystyle E(Y^k)=\frac{\theta^k \ \Gamma(1-k) \Gamma(\alpha+k)}{\Gamma(\alpha)} \ \ \ \ \ \ -\alpha<k <1

The distribution described in the above table is an inverse Pareto distribution with parameters \alpha (shape) and \theta (scale). Note that the moments are even more limited than those of the Burr and inverse Burr distributions. For inverse Pareto, even the mean E(Y) is nonexistent.
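
The construction can be illustrated by simulation. The Python sketch below draws a base Pareto variable X with shape \alpha and scale \theta^{-1} by inverting its survival function, takes Y=1/X, and compares the sample median with \theta/(2^{1/\alpha}-1) from the table. The sample size, seed and parameter values are arbitrary choices made for illustration.

```python
import random
import statistics

def simulate_inverse_pareto(n, alpha, theta, seed=0):
    """Draw n values of Y = 1/X, where X is Pareto (Lomax) with shape alpha, scale 1/theta."""
    rng = random.Random(seed)
    sample = []
    while len(sample) < n:
        u = rng.random()
        if u == 0.0:
            continue  # 0.0 raised to a negative power is undefined
        # invert S_X(x) = (s / (x + s))^alpha = u with scale s = 1/theta
        x = (1.0 / theta) * (u ** (-1.0 / alpha) - 1.0)
        if x <= 0.0:
            continue  # guard against rounding when u is extremely close to 1
        sample.append(1.0 / x)
    return sample

def inverse_pareto_median(alpha, theta):
    """Median from the table."""
    return theta / (2.0 ** (1.0 / alpha) - 1.0)

# Hypothetical parameter values
alpha, theta = 2.0, 10.0
sample = simulate_inverse_pareto(50000, alpha, theta)
print(statistics.median(sample), inverse_pareto_median(alpha, theta))
```

The sample mean is deliberately not reported: since E(Y) does not exist, the sample mean does not settle down as the sample size grows.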

Remarks

The Burr and paralogistic families of distributions are derived from the Pareto family (Pareto Type II Lomax). The Pareto connection helps put Burr and paralogistic distributions in perspective. The Pareto distribution itself can be generated as a mixture of exponential distributions with gamma mixing weight (see here). Thus from basic building blocks (exponential and gamma), vast families of distributions can be created, expanding the toolkit for modeling. The distributions discussed here are listed in the appendix found at this link.


Pareto Distribution

The Pareto distribution is a power law probability distribution. It was named after the Italian civil engineer, economist and sociologist Vilfredo Pareto, who was the first to discover that income follows what is now called the Pareto distribution, and who was also known for the 80/20 rule, according to which 20% of all the people receive 80% of all income. This post is a discussion of the mathematical properties of this distribution and its applications.

_______________________________________________________________________________________________

Pareto Distribution of Type I

There are several types of the Pareto distribution. Let’s start with Type I. The random variable X is said to follow a Type I Pareto distribution if the following is the survival function,

    \displaystyle  S(x)=P(X>x)=\left\{ \begin{array}{ll}                     \displaystyle  \biggl( \frac{x_m}{x} \biggr)^\alpha &\ x \ge x_m \\           \text{ } & \text{ } \\           \displaystyle  1 &\ x<x_m           \end{array} \right.

where x_m and \alpha are both positive parameters. The support of the distribution is the interval [x_m,\infty). The parameter x_m is a scale parameter and \alpha is a shape parameter. The parameter \alpha is also known as the tail index. When the Pareto distribution is used as a model of wealth or income, \alpha is also known as the Pareto index, which is a measure of the breadth of the wealth distribution.

The following table lists out the cumulative distribution function (CDF) and the probability density function (PDF).

__________________________________________________________________________________________
Pareto Type I – Probability Functions

Survival Function S(x)=\displaystyle  \biggl( \frac{x_m}{x} \biggr)^\alpha \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x \ge x_m>0
Cumulative Distribution Function F(x)=1-\displaystyle  \biggl( \frac{x_m}{x} \biggr)^\alpha \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x \ge x_m>0
Probability Density Function \displaystyle f(x)=\frac{\alpha \ x_m^\alpha}{x^{\alpha+1}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \  x \ge x_m>0

__________________________________________________________________________________________

The following figure shows the graphs of the PDFs for the shape parameters \alpha=1,2,3.

Figure 1 – Pareto PDFs (Type I)

All the density curves in Figure 1 are skewed to the right and have a long tail. However, some tails are thicker than others. It is noticeable that the curve with a higher value of \alpha approaches the x-axis faster, and hence has a lighter tail compared to the density curve with a lower value of \alpha. The role of \alpha is discussed further below. The following table lists out several more Pareto distributional quantities.

__________________________________________________________________________________________
Pareto Type I – Additional Distributional Quantities

Mean \displaystyle E(X)=\frac{\alpha \ x_m}{\alpha-1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \alpha>1
Median \displaystyle x_m \ 2^{\frac{1}{\alpha}}
Mode x_m
Variance \displaystyle Var(X)=\frac{x_m^2 \ \alpha}{(\alpha-1)^2 \ (\alpha-2)} \ \ \ \ \ \ \ \alpha>2
Higher Moments \displaystyle E(X^k)=\frac{\alpha \ x_m^k}{\alpha-k} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \alpha>k
Skewness \displaystyle \frac{2(1+\alpha)}{\alpha-3} \ \sqrt{\frac{\alpha-2}{\alpha}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \alpha>3
Excess Kurtosis \displaystyle \frac{6(\alpha^3+\alpha^2-6 \alpha-2)}{\alpha (\alpha-3) (\alpha-4)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \alpha>4

__________________________________________________________________________________________

Given the survival function, it is straightforward to derive the CDF and the PDF. The mean and higher moments can also be derived by evaluating the integral \int_{x_m}^\infty x^k f(x) dx. Once the moments are obtained, other quantities that depend on moments can be derived (e.g. variance, skewness and excess kurtosis). The following gives the definition for these distributional quantities.

Distributional Quantity Definition
Moments \displaystyle E(X^k)=\int_{x_m}^\infty x^k f(x) \ dx
Variance Var(X)=E(X^2)-E(X)^2
Skewness \displaystyle \gamma_1=\frac{E[(X-\mu)^3]}{\sigma^3}=\frac{E(X^3)-3 \mu \sigma^2 - \mu^3}{(\sigma^2)^{\frac{3}{2}}}
Kurtosis \displaystyle \frac{E[(X-\mu)^4]}{\sigma^4}
Excess Kurtosis Kurtosis – 3

In these definitions, \mu and \sigma are the mean and standard deviation of a given distribution, respectively. Then \sigma^2 is the variance. The skewness of a distribution is the ratio of the third central moment E[(X-\mu)^3] to the cube of the standard deviation. The kurtosis is the ratio of the fourth central moment E[(X-\mu)^4] to the square of the variance. This previous post has a detailed discussion on the skewness.
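
The definitions above can be applied mechanically to the Pareto Type I moments. The following Python sketch computes the variance, skewness and excess kurtosis from the raw moments E(X^k) and places them next to the closed-form entries in the earlier table. The function names are hypothetical, and the example assumes \alpha>4 so that the first four moments exist.

```python
import math

def pareto1_raw_moment(k, alpha, x_m):
    """E(X^k) = alpha * x_m^k / (alpha - k), valid only when alpha > k."""
    if alpha <= k:
        raise ValueError("E(X^k) exists only when alpha > k")
    return alpha * x_m ** k / (alpha - k)

def pareto1_summary(alpha, x_m):
    """Variance, skewness and excess kurtosis built from raw moments (alpha > 4)."""
    m1, m2, m3, m4 = (pareto1_raw_moment(k, alpha, x_m) for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    third_central = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    fourth_central = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    skewness = third_central / var ** 1.5
    excess_kurtosis = fourth_central / var ** 2 - 3.0
    return var, skewness, excess_kurtosis

def pareto1_closed_forms(alpha, x_m):
    """The same three quantities taken directly from the table."""
    var = x_m ** 2 * alpha / ((alpha - 1.0) ** 2 * (alpha - 2.0))
    skewness = 2.0 * (1.0 + alpha) / (alpha - 3.0) * math.sqrt((alpha - 2.0) / alpha)
    excess_kurtosis = (6.0 * (alpha ** 3 + alpha ** 2 - 6.0 * alpha - 2.0)
                       / (alpha * (alpha - 3.0) * (alpha - 4.0)))
    return var, skewness, excess_kurtosis

# Hypothetical parameter values
print(pareto1_summary(5.0, 1.0))
print(pareto1_closed_forms(5.0, 1.0))
```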

_______________________________________________________________________________________________

A Closer Look at the Shape Parameter

The above tables show that the Pareto distribution is mathematically tractable, especially when it comes to the calculation of moments. Another observation is that the mean and other moments do not always exist. This stems from the fact that when the shape parameter \alpha is too small, the integral for the moment E(X^k) may not converge.

The mean exists only when the shape parameter \alpha is greater than 1. The variance exists only when the shape parameter \alpha is greater than 2. In general, the kth moment exists only when the shape parameter \alpha is greater than k. The larger the shape parameter \alpha, the more moments can be calculated. All kth moments where k<\alpha can be calculated. However, the kth moment does not exist for any k \ge \alpha.
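
To see where the condition comes from, the moment integral can be evaluated directly from the Type I density. The following worked calculation is a sketch added for illustration.

    \displaystyle \begin{aligned} E(X^k)&=\int_{x_m}^\infty x^k \ \frac{\alpha \ x_m^\alpha}{x^{\alpha+1}} \ dx \\&=\alpha \ x_m^\alpha \int_{x_m}^\infty x^{k-\alpha-1} \ dx \\&=\alpha \ x_m^\alpha \ \biggl[ \frac{x^{k-\alpha}}{k-\alpha} \biggr]_{x_m}^\infty \\&=\frac{\alpha \ x_m^k}{\alpha-k} \ \ \ \ \ \ \ k<\alpha \end{aligned}

When k<\alpha, the exponent k-\alpha is negative, so the upper limit contributes zero. When k>\alpha, the term x^{k-\alpha} grows without bound. When k=\alpha, the integrand is a constant multiple of 1/x, whose antiderivative is a logarithm. In the last two cases the integral diverges.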

Having moments that do not exist is a sign that the distribution has a heavy tail. Let’s examine the graphs of Pareto survival functions and CDFs.

Figure 2 – Pareto Survival Functions (Type I)

Figure 2 shows the survival function S(x)=P(X>x) for three values of the shape parameter \alpha where x>1 (the scale parameter is 1). The following figure shows the corresponding cumulative distributions F(x)=1-S(x)=P(X \le x).

Figure 3 – Pareto CDFs (Type I)

The survival function S(x) is the probability of the right tail (x,\infty). On the other hand, the CDF F(x) is the probability put on the initial interval (x_m,x]. The sum of the two is 1. One thing that stands out in Figure 2 is that the larger the \alpha, the faster the survival curve approaches zero and thus the less probability is placed on the right tail. In other words, more probability is attached to the lower values and thus the integral for the moments is more likely to converge when \alpha is larger. This explains why more of the moments exist for a Pareto distribution with a larger \alpha. Thus the kth moment exists for higher values of k when \alpha is larger, confirming the earlier observation.

Another thing to point out in Figure 2 is that the distribution with the larger \alpha has a lighter right tail and the one with a smaller \alpha has a heavier right tail. So within the Pareto family, a lower \alpha means a distribution with a heavier tail and a larger \alpha means a lighter tail.

A comparison with other families of distributions is also instructive. All moments exist for the gamma distributions (including exponential distributions) and for the lognormal distribution as well as the normal distribution. Moment generating functions also exist for the gamma and normal distributions; the lognormal distribution is an exception, since its moment generating function does not exist even though all of its moments do. In contrast, not all moments exist for a Pareto distribution, so its moment generating function cannot exist either (otherwise all moments would exist). These are signs that the Pareto distributions are heavy tailed distributions. For a more in-depth discussion of the tail weight of the Pareto family, see this blog post in an affiliated blog. The Pareto distribution discussed there is of Pareto Type II.

When the Pareto model is used as a model of lifetime of systems (machines or devices), a larger value of the shape parameter \alpha would mean that fewer “lives” survive to old ages, or equivalently that more lives die off at relatively young ages (as discussed above, this means a lighter right tail). If the Pareto model is used as a model of income or wealth of individuals, then a higher \alpha would mean a smaller proportion of the people are in the higher income brackets (or more people in the lower income ranges). Thus the shape parameter \alpha is called the Pareto index, which is a measure of the breadth of income/wealth. The higher this measure, the less inequality in income.

_______________________________________________________________________________________________

Log-Linear Model

We now discuss the motivation behind the Pareto survival function. The Pareto distribution is a power law distribution. It is a model that can describe phenomena that behave in a log-linear fashion. Let’s revisit the original reasoning for using the Pareto survival function as a model of income.

Let N(x) be the number of people with income greater than x. Suppose that x_0 is the minimum income in the population in question. Then N(x_0) is the size of the entire population. Pareto proposed that N(x) can be modeled in a log-linear fashion:

    \log N(x)=\log C - \alpha \log x

where log is logarithm to the base e, C is a constant and \alpha is a positive parameter. If this relation holds, it would hold at the minimum income level x_0.

    \log N(x_0)=\log C - \alpha \log x_0

Subtracting the second relation from the first gives the following:

    \displaystyle \log \biggl[ \frac{N(x)}{N(x_0)} \biggr]=- \alpha \log \biggl[ \frac{x}{x_0} \biggr]

Exponentiating both sides (raising e to each side) gives the following:

    \displaystyle \frac{N(x)}{N(x_0)}=\biggl( \frac{x}{x_0} \biggr)^{- \alpha}

    \displaystyle \frac{N(x)}{N(x_0)}=\biggl( \frac{x_0}{x} \biggr)^{\alpha}

Note that the left hand side of the last equation is the proportion of the people having income greater than x, which of course is the survival function described at the beginning.
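
The log-linear relation can be illustrated with simulated data. The Python sketch below draws incomes from a Pareto Type I distribution, counts how many incomes are at or above each of several thresholds, and fits a least-squares line to the points (\log x, \log N(x)). The slope should be close to -\alpha. The sample size, thresholds and seed are arbitrary choices for illustration, and the simulated incomes are not real data.

```python
import math
import random

def simulate_pareto1(n, alpha, x_0, seed=0):
    """Draw n Pareto Type I incomes with minimum x_0 by inverting the survival function."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], so the negative power is always defined
    return [x_0 * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

def loglog_slope(incomes, thresholds):
    """Least-squares slope of log N(x) against log x over the given thresholds."""
    xs, ys = [], []
    for t in thresholds:
        # incomes are continuous, so counting >= t versus > t makes no practical difference
        count = sum(1 for v in incomes if v >= t)
        if count > 0:
            xs.append(math.log(t))
            ys.append(math.log(count))
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical setting: alpha = 2, minimum income x_0 = 1
incomes = simulate_pareto1(20000, 2.0, 1.0)
slope = loglog_slope(incomes, [1.0, 1.5, 2.0, 3.0, 5.0, 8.0])
print(slope)  # should be near -2
```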

_______________________________________________________________________________________________

The Hierarchy of Pareto Distributions

The Pareto survival function discussed above is of Type I. The following lists out the survival functions of the other types.

__________________________________________________________________________________________
Pareto Distributions

Pareto Type Survival Function Support Parameters
Type I \displaystyle  \biggl[ \frac{x_m}{x} \biggr]^\alpha x>x_m \alpha>0, x_m>0
Lomax \displaystyle  \biggl[ \frac{x_m}{x+x_m} \biggr]^\alpha x>0 \alpha>0, x_m>0
Type II \displaystyle  \biggl[\frac{x_m}{(x-\mu)+x_m} \biggr]^\alpha x>\mu \mu, \alpha>0, x_m>0
Type III \displaystyle  \biggl[\frac{(x_m)^{\frac{1}{\gamma}}  }{(x-\mu)^{\frac{1}{\gamma}}+(x_m)^{\frac{1}{\gamma}}} \biggr] x>\mu \mu, \gamma>0, x_m>0
Type IV \displaystyle  \biggl[\frac{(x_m)^{\frac{1}{\gamma}}  }{(x-\mu)^{\frac{1}{\gamma}}+(x_m)^{\frac{1}{\gamma}}} \biggr]^\alpha x>\mu \mu, \alpha>0, \gamma>0, x_m>0

__________________________________________________________________________________________

The lower types are special cases of the higher types. For example, Lomax is Type I shifted to the left by the amount x_m. Type II with \mu=0 becomes Lomax. Type III with \gamma=1 becomes Type II with \alpha=1. Type IV with \alpha=1 becomes Type III.
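
The hierarchy can be expressed in code by writing the Type IV survival function once and obtaining the other types by fixing parameters. The Python sketch below follows the table. The function names are hypothetical. The last helper uses the additional fact that Type II with \mu=x_m reduces to Type I, which is consistent with Lomax being Type I shifted to the left by x_m.

```python
def pareto4_survival(x, alpha, gamma, x_m, mu=0.0):
    """Type IV survival function from the table; equals 1 for x <= mu."""
    if x <= mu:
        return 1.0
    a = x_m ** (1.0 / gamma)
    return (a / ((x - mu) ** (1.0 / gamma) + a)) ** alpha

def pareto3_survival(x, gamma, x_m, mu=0.0):
    """Type III is Type IV with alpha = 1."""
    return pareto4_survival(x, 1.0, gamma, x_m, mu)

def pareto2_survival(x, alpha, x_m, mu=0.0):
    """Type II is Type IV with gamma = 1."""
    return pareto4_survival(x, alpha, 1.0, x_m, mu)

def lomax_survival(x, alpha, x_m):
    """Lomax is Type II with mu = 0."""
    return pareto2_survival(x, alpha, x_m, 0.0)

def pareto1_survival(x, alpha, x_m):
    """Type I is Type II with mu = x_m (equivalently, Lomax shifted right by x_m)."""
    return pareto2_survival(x, alpha, x_m, x_m)

# Hypothetical parameter values
print(pareto1_survival(3.0, 2.0, 1.5), lomax_survival(3.0, 2.0, 1.5))
```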

We discuss Type II Lomax in the next section. For the other types, see the Pareto Wikipedia entry.

_______________________________________________________________________________________________

Pareto Type II Lomax

The Pareto distribution of Lomax type is the result of shifting Type I to the left by the amount x_m, the scale parameter in Pareto Type I. As a result, the support is now the entire positive x-axis. Some of the mathematical properties of the Lomax Type can be derived by making the appropriate shifting. For the sake of completeness, the following table lists out some of the basic distributional quantities. The scale parameter x_m is renamed \theta.

__________________________________________________________________________________________
Pareto Type II Lomax – Distributional Quantities

Survival Function S(x)=\displaystyle  \biggl( \frac{\theta}{x+\theta} \biggr)^\alpha  \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x >0
Cumulative Distribution Function F(x)=1-\displaystyle  \biggl( \frac{\theta}{x+\theta} \biggr)^\alpha \ \ \ \ \ \ \ \ \ \ \ \ \ x >0
Probability Density Function \displaystyle f(x)=\frac{\alpha \ \theta^\alpha}{(x+\theta)^{\alpha+1}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \  x >0
Mean \displaystyle E(X)=\frac{\theta}{\alpha-1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \alpha>1
Median \displaystyle \theta \ 2^{\frac{1}{\alpha}}-\theta
Mode 0
Variance \displaystyle Var(X)=\frac{\theta^2 \ \alpha}{(\alpha-1)^2 \ (\alpha-2)} \ \ \ \ \ \ \ \alpha>2
Higher Moments \displaystyle E(X^k)=\frac{k! \ \theta^k}{(\alpha-1) \cdots (\alpha-k)} \ \ \ \ \ \ \alpha>k \ \ \ k is integer

To help see the shifting, let Y be a Pareto Type I random variable with shape parameter \alpha and scale parameter \theta. Then X=Y-\theta is a Pareto Type II Lomax random variable. Immediately, E(X)=E(Y)-\theta, which simplifies to \frac{\theta}{\alpha-1}. On the other hand, shifting by a constant does not change the variance. If S_Y(x) and S_X(x) represent the survival functions for Y and X, respectively, then S_X(x)=S_Y(x+\theta). The same can be said about the CDFs and PDFs.

Another interesting fact about the Pareto Lomax type is that it is the mixture of exponential distributions with gamma mixing weight. An insurance interpretation is a good motivation. Suppose that the loss arising from an insured randomly selected from a large group of insureds follows an exponential distribution with the following probability density function:

    f_{X \lvert \Lambda}(x \lvert \lambda)= \lambda \ e^{-\lambda x} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x>0

The above density function is from an exponential distribution. However, it is a conditional one since the parameter \Lambda=\lambda is uncertain. Since the density function f_{X \lvert \Lambda}(x \lvert \lambda) is that of an exponential distribution, the mean claim cost for this insured is \frac{1}{\lambda}. So the parameter \Lambda=\lambda reflects the risk characteristics of the insured. Suppose this is a large pool of insureds. Then there is uncertainty in the parameter \Lambda=\lambda. It is more appropriate to regard \Lambda as a random variable in order to capture the wide range of risk characteristics across the individuals in the population. As a result, the pdf indicated above is not an unconditional pdf, but, rather, a conditional pdf of X. Suppose that the uncertain parameter \Lambda follows a gamma distribution with shape parameter \alpha and rate parameter \theta with the following PDF.

    \displaystyle g(\lambda)=\frac{\theta^\alpha}{\Gamma(\alpha)} \ \lambda^{\alpha-1} \ e^{-\theta \lambda} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \lambda>0

where \Gamma(\cdot) is the gamma function. The gamma distribution has been written about extensively in this blog. Here is a post on the gamma function and here is an introduction to the gamma distribution.

The unconditional density function of X is then the weighted average of the conditional density f_{X \lvert \Lambda}(x \lvert \lambda) weighted by the above gamma density function.

    \displaystyle \begin{aligned} f_X(x)&=\int_0^\infty f_{X \lvert \Lambda}(x \lvert \lambda) \ g(\lambda) \ d \lambda \\&=\int_0^\infty \biggl( \lambda \ e^{-\lambda x} \biggr) \ \biggl( \frac{\theta^\alpha}{\Gamma(\alpha)} \ \lambda^{\alpha-1} \ e^{-\theta \lambda} \biggr) \ d \lambda \\&=\int_0^\infty \frac{\theta^\alpha}{\Gamma(\alpha)} \lambda^\alpha e^{-(\theta+x) \lambda} \ d \lambda \\&=\frac{\theta^\alpha}{\Gamma(\alpha)} \frac{\Gamma(\alpha+1)}{(\theta+x)^{\alpha+1}} \int_0^\infty \frac{(\theta+x)^{\alpha+1}}{\Gamma(\alpha+1)} \ \lambda^{\alpha+1-1} \ e^{-(\theta+x) \lambda} \ d \lambda \\&=\frac{\alpha \theta^{\alpha}}{(\theta+x)^{\alpha+1}} \end{aligned}

The above derivation shows that the unconditional density function of X is a Pareto Lomax density function. Thus if each individual insured in a large pool of insureds has an exponential claim cost distribution where the rate parameter \Lambda is distributed according to a gamma distribution, then the unconditional claim cost for a randomly selected insured is distributed according to a Pareto Lomax distribution. Mathematically speaking, the Pareto Lomax distribution is a mixture of exponential distributions with gamma mixing weights.
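
The insurance interpretation lends itself to a simulation. In the Python sketch below, each simulated insured first receives a rate \lambda drawn from the gamma distribution (shape \alpha and rate \theta, matching the density g(\lambda) above), and then a loss drawn from the exponential distribution with that rate. The empirical survival function of the losses is compared with the Lomax survival function. The sample size, seed and parameter values are arbitrary choices. Note that Python's random.gammavariate takes a scale parameter, so the rate \theta enters as 1/\theta.

```python
import random

def simulate_mixture(n, alpha, theta, seed=0):
    """Draw n losses: lambda ~ gamma(shape alpha, rate theta), then X ~ exponential(lambda)."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n):
        lam = rng.gammavariate(alpha, 1.0 / theta)  # second argument is the scale 1/theta
        losses.append(rng.expovariate(lam))
    return losses

def lomax_survival(x, alpha, theta):
    """Pareto Lomax survival function (theta / (x + theta))^alpha."""
    return (theta / (x + theta)) ** alpha

def empirical_survival(sample, x):
    """Fraction of the sample exceeding x."""
    return sum(1 for v in sample if v > x) / len(sample)

# Hypothetical parameter values
alpha, theta = 3.0, 2.0
losses = simulate_mixture(50000, alpha, theta)
for x in (0.5, 1.0, 3.0):
    print(x, empirical_survival(losses, x), lomax_survival(x, alpha, theta))
```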

In the above discussion, we comment that the Pareto Type I distribution has a heavy tail as compared to other distributions. One of the telltale signs is that not all moments exist in a Pareto distribution. The Pareto Lomax distribution is also a heavy tailed distribution. This blog post in an affiliated blog has a detailed discussion. The discussion in that blog post examines Pareto Lomax as a heavy tailed distribution from four perspectives: existence of moments, speed of decay of the survival function to zero, hazard rate function, and mean excess loss function. Another blog post discusses the Pareto Lomax distribution as a mixture of exponential distributions with gamma mixing weights.

_______________________________________________________________________________________________

Remarks

The Pareto distribution is positively skewed and has a heavy tail on the right. It is a natural model for phenomena in which a small fraction of the outcomes accounts for most of the total, as in the 80/20 rule. It was originally applied as a model to describe the income and wealth of a country. In insurance applications, heavy-tailed distributions such as Pareto are essential tools for modeling extreme loss, especially for the more risky types of insurance such as medical malpractice insurance. In financial applications, the study of heavy-tailed distributions provides information about the potential for financial fiasco or financial ruin.

For more information on the mathematical aspects of the Pareto distribution, refer to the text by Johnson and Kotz. For an actuarial perspective, refer to the text Loss Models.

_______________________________________________________________________________________________

Reference

  1. Johnson N. L., Kotz S., Continuous Univariate Distributions – I, Houghton Mifflin Company, Boston, 1970.
  2. Klugman S. A., Panjer H. H., Willmot G. E., Loss Models, From Data to Decisions, Fourth Edition, Wiley-Interscience, a John Wiley & Sons, Inc., New York, 2012.

_______________________________________________________________________________________________