Examples of mixtures

The notion of mixtures is discussed in this previous post. Many probability distributions useful for actuarial modeling are mixture distributions. The previous post touches on some examples – the negative binomial distribution (a Poisson-gamma mixture), the Pareto distribution (an exponential-gamma mixture) and the normal-normal mixture. In this post we present additional examples. The full list of examples discussed in these two posts is the following.

  1. Poisson-Gamma mixture = Negative Binomial.
  2. Normal-Normal mixture = Normal.
  3. Exponential-Gamma mixture = Pareto.
  4. Exponential-Inverse Gamma mixture = Pareto.
  5. Gamma-Gamma mixture = Generalized Pareto.
  6. Weibull-Exponential mixture = Loglogistic.
  7. Gamma-Geometric mixture = Exponential.
  8. Normal-Gamma mixture = Student t.

The first three examples are discussed in the previous post. We discuss the remaining examples in this post.

The Pareto Family

Examples 3 and 4 show that Pareto distributions are mixtures of exponential distributions with either gamma or inverse gamma mixing weights. In Example 3, X \lvert \Theta is an exponential distribution with \Theta being a rate parameter. When \Theta follows a gamma distribution, the resulting mixture is a (Type II Lomax) Pareto distribution. In Example 4, X \lvert \Theta is an exponential distribution with \Theta being a scale parameter. When \Theta follows an inverse gamma distribution, the resulting mixture is also a (Type II Lomax) Pareto distribution.

As a mixture, Example 5 is like Example 3, except that it is a gamma-gamma mixture resulting in a generalized Pareto distribution. Example 3 has been discussed in the previous post. We now discuss Example 4 and Example 5.

Example 4. Suppose that X \lvert \Theta has an exponential distribution where \Theta is a scale parameter.
Further suppose that the random parameter \Theta follows an inverse gamma distribution with parameters \alpha and \beta. Then the unconditional distribution for X is a (Type II Lomax) Pareto distribution with shape parameter \alpha and scale parameter \beta.

The following gives the cumulative distribution function (CDF) and survival function of the conditional random variable X \lvert \Theta.

    F(x \lvert \Theta)=1-e^{- x/\Theta}

    S(x \lvert \Theta)=e^{- x/\Theta}

The random parameter \Theta follows an inverse gamma distribution with parameters \alpha and \beta. The following is the pdf of \Theta:

    \displaystyle g(\theta)=\frac{1}{\Gamma(\alpha)} \ \biggl[\frac{\beta}{\theta}\biggr]^\alpha \ \frac{1}{\theta} \ e^{-\frac{\beta}{ \theta}} \ \ \ \ \ \theta>0

We show that the unconditional survival function for X is the survival function for the Pareto distribution with parameters \alpha (shape parameter) and \beta (scale parameter).

    \displaystyle \begin{aligned} S(x)&=\int_0^\infty S(x \lvert \theta) \ g(\theta) \ d \theta \\&=\int_0^\infty e^{- x/\theta} \ \frac{1}{\Gamma(\alpha)} \ \biggl[\frac{\beta}{\theta}\biggr]^\alpha \ \frac{1}{\theta} \ e^{-\beta / \theta} \ d \theta \\&=\int_0^\infty \frac{1}{\Gamma(\alpha)} \ \biggl[\frac{\beta}{\theta}\biggr]^\alpha \ \frac{1}{\theta} \ e^{-(x+\beta) / \theta} \ d \theta \\&=\frac{\beta^\alpha}{(x+\beta)^\alpha} \ \int_0^\infty \frac{1}{\Gamma(\alpha)} \ \biggl[\frac{x+\beta}{\theta}\biggr]^\alpha \ \frac{1}{\theta} \ e^{-(x+\beta) / \theta} \ d \theta \\&=\biggl(\frac{\beta}{x+\beta} \biggr)^\alpha \end{aligned}

Note that the integrand in the last integral is a density function for an inverse gamma distribution. Thus the integral is 1 and can be eliminated. The result that remains is the survival function for a Pareto distribution with parameters \alpha and \beta. The following gives the CDF and density function of this Pareto distribution.

    \displaystyle F(x)=1-\biggl(\frac{\beta}{x+\beta} \biggr)^\alpha

    \displaystyle f(x)=\frac{\alpha \ \beta^{\alpha}}{(x+\beta)^{\alpha+1}}

See here for further information on the Pareto Type II Lomax distribution.
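To make Example 4 concrete, here is a minimal simulation sketch in Python (assuming NumPy and SciPy are available; the values \alpha=3 and \beta=2 are chosen only for illustration). It draws \Theta from the inverse gamma distribution, draws X \lvert \Theta from the exponential distribution with scale \Theta, and compares the empirical survival function of X with the Pareto survival function derived above.

    import numpy as np
    from scipy import stats

    alpha, beta = 3.0, 2.0                   # illustrative parameters only
    rng = np.random.default_rng(0)
    n = 200_000

    # Theta ~ inverse gamma with shape alpha and scale beta
    theta = stats.invgamma.rvs(a=alpha, scale=beta, size=n, random_state=rng)

    # conditional on Theta, X is exponential with Theta as a scale parameter
    x = rng.exponential(scale=theta)

    # compare the empirical survival function with S(x) = (beta/(x+beta))^alpha
    for t in (1, 2, 5, 10):
        print(t, round(np.mean(x > t), 4), round((beta / (t + beta)) ** alpha, 4))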

Example 5. Suppose that X \lvert \Theta has a gamma distribution with shape parameter k (a known constant) and rate parameter \Theta. Further suppose that the random parameter \Theta follows a gamma distribution with shape parameter \alpha and rate parameter \beta. Then the unconditional distribution for X is a generalized Pareto distribution with parameters \alpha, \beta and k.

Conditional on \Theta=\theta, the following is the density function of X.

    \displaystyle f(x \lvert \theta)=\frac{1}{\Gamma(k)} \ \theta^k \ x^{k-1} \ e^{-\theta x}  \ \ \ \ \ x>0

The following is the density function of the random parameter \Theta.

    \displaystyle g(\theta)=\frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ \theta^{\alpha-1} \ e^{-\beta \theta} \ \ \ \ \ \ \theta>0

The following gives the unconditional density function for X.

    \displaystyle \begin{aligned} f(x)&=\int_0^\infty  f(x \lvert \theta) \ g(\theta) \ d \theta \\&=\int_0^\infty  \frac{1}{\Gamma(k)} \ \theta^k \ x^{k-1} \ e^{-\theta x} \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ \theta^{\alpha-1} \ e^{-\beta \theta} \ d \theta \\&=\int_0^\infty \frac{1}{\Gamma(k)} \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ x^{k-1} \ \theta^{\alpha+k-1} \ e^{-(x+\beta) \theta} \ d \theta \\&= \frac{1}{\Gamma(k)} \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ x^{k-1} \frac{\Gamma(\alpha+k)}{(x+\beta)^{\alpha+k}} \int_0^\infty \frac{1}{\Gamma(\alpha+k)} \ (x+\beta)^{\alpha+k} \ \theta^{\alpha+k-1} \ e^{-(x+\beta) \theta} \ d \theta \\&=\frac{\Gamma(\alpha+k)}{\Gamma(\alpha) \ \Gamma(k)} \ \frac{\beta^\alpha \ x^{k-1}}{(x+\beta)^{\alpha+k}} \end{aligned}

Any distribution that has a density function described above is said to be a generalized Pareto distribution with the parameters \alpha, \beta and k. Its CDF cannot be written in closed form but can be expressed using the incomplete beta function.

    \displaystyle \begin{aligned} F(x)&=\int_0^x  \frac{\Gamma(\alpha+k)}{\Gamma(\alpha) \ \Gamma(k)} \ \frac{\beta^\alpha \ t^{k-1}}{(t+\beta)^{\alpha+k}} \ dt \\&=\int_0^x  \frac{\Gamma(\alpha+k)}{\Gamma(\alpha) \ \Gamma(k)} \ \biggl(\frac{t}{t+\beta} \biggr)^{k-1} \ \biggl(\frac{\beta}{t+\beta} \biggr)^{\alpha-1} \ \frac{\beta}{(t+\beta)^2} \ dt \\&=\frac{\Gamma(\alpha+k)}{\Gamma(\alpha) \ \Gamma(k)} \ \int_0^{\frac{x}{x+\beta}} u^{k-1} \ (1-u)^{\alpha-1} \ du, \ \ \ u=\frac{t}{t+\beta} \\&=\frac{\Gamma(\alpha+k)}{\Gamma(\alpha) \ \Gamma(k)} \ \int_0^{w} t^{k-1} \ (1-t)^{\alpha-1} \ dt, \ \ \ w=\frac{x}{x+\beta}   \end{aligned}

The moments of the generalized Pareto distribution can be easily derived, but only on a limited basis. Since it is a mixture distribution, the unconditional moments are the weighted averages of the conditional gamma moments.

    \displaystyle \begin{aligned} E(X^w)&=\int_0^\infty  E(X^w \lvert \theta) \ g(\theta) \ d \theta \\&=\int_0^\infty  \frac{\Gamma(k+w)}{\theta^w \Gamma(k)} \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ \theta^{\alpha-1} \ e^{-\beta \theta} \ d \theta \\&=\frac{\beta^w \ \Gamma(k+w) \ \Gamma(\alpha-w)}{\Gamma(k) \ \Gamma(\alpha)} \int_0^\infty \frac{1}{\Gamma(\alpha-w)} \ \beta^{\alpha-w} \ \theta^{\alpha-w-1} \ e^{-\beta \theta} \ d \theta \\&=\frac{\beta^w \ \Gamma(k+w) \ \Gamma(\alpha-w)}{\Gamma(k) \ \Gamma(\alpha)} \ \ \ \ -k<w<\alpha   \end{aligned}

Note that E(X) has a simple expression E(X)=\frac{k \beta}{\alpha-1} when 1<\alpha.

When the parameter k=1, the conditional distribution for X \lvert \Theta is an exponential distribution. Then the situation reverts back to Example 3, leading to a Pareto distribution. Thus the Pareto distribution is a special case of the generalized Pareto distribution. Both the Pareto distribution and the generalized Pareto distribution have thicker and longer tails than the original conditional gamma distribution.

It turns out that the F distribution is also a special case of the generalized Pareto distribution. The F distribution with r_1 and r_2 degrees of freedom is the generalized Pareto distribution with parameters k=r_1/2, \alpha=r_2/2 and \beta=r_2/r_1. As a result, the following is the density function.

    \displaystyle \begin{aligned} h(x)&=\frac{\Gamma(r_1/2 + r_2/2)}{\Gamma(r_1/2) \ \Gamma(r_2/2)} \ \frac{(r_2/r_1)^{r_2/2} \ x^{r_1/2-1}}{(x+r_2/r_1)^{r_1/2+r_2/2}} \\&=\frac{\Gamma(r_1/2 + r_2/2)}{\Gamma(r_1/2) \ \Gamma(r_2/2)} \ \frac{(r_1/r_2)^{r_1/2} \ x^{r_1/2-1}}{(1+(r_1/r_2)x)^{r_1/2+r_2/2}}  \ \ \ \ 0<x<\infty   \end{aligned}

Another way to generate the F distribution is as the ratio of two independent chi-squared random variables, each divided by its degrees of freedom (see Theorem 9 in this previous post). Of course, there is no need to use the explicit form of the density function of the F distribution. In a statistical application, the F distribution is accessed using tables or software.
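The following is a small numerical check of this connection (a sketch assuming SciPy; the degrees of freedom r_1=4 and r_2=6 are arbitrary). It evaluates the generalized Pareto density with k=r_1/2, \alpha=r_2/2 and \beta=r_2/r_1 and compares it with the F density computed by scipy.stats.

    import numpy as np
    from scipy import stats
    from scipy.special import gammaln

    def gen_pareto_pdf(x, alpha, beta, k):
        # density derived in Example 5, with the constant computed on the log scale
        logc = gammaln(alpha + k) - gammaln(alpha) - gammaln(k)
        return np.exp(logc) * beta**alpha * x**(k - 1) / (x + beta)**(alpha + k)

    r1, r2 = 4, 6                            # arbitrary degrees of freedom
    x = np.linspace(0.1, 5.0, 50)
    f1 = gen_pareto_pdf(x, alpha=r2 / 2, beta=r2 / r1, k=r1 / 2)
    f2 = stats.f.pdf(x, dfn=r1, dfd=r2)
    print(np.max(np.abs(f1 - f2)))           # essentially zero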

The Loglogistic Distribution

The loglogistic distribution can be derived as a mixture of Weibull distributions with exponential mixing weights.

Example 6. Suppose that X \lvert \Lambda has a Weibull distribution with shape parameter \gamma (a known constant) and a parameter \Lambda such that the CDF of X \lvert \Lambda is F(x \lvert \Lambda)=1-e^{-\Lambda \ x^\gamma}. Further suppose that the random parameter \Lambda follows an exponential distribution with rate parameter \theta^{\gamma}. Then the unconditional distribution for X is a loglogistic distribution with shape parameter \gamma and scale parameter \theta.

The following gives the conditional survival function for X \lvert \Lambda and the exponential mixing weight.

    \displaystyle S(x \lvert \lambda)=e^{-\lambda \ x^\gamma}

    \displaystyle g(\lambda)=\theta^\gamma \ e^{-\theta^\gamma \ \lambda}

The following gives the unconditional survival function and CDF of X as well as the PDF.

    \displaystyle \begin{aligned} S(x)&=\int_0^\infty S(x \lvert \lambda) \ g(\lambda) \ d \lambda \\&=\int_0^\infty e^{-\lambda \ x^\gamma} \ \theta^\gamma \ e^{-\theta^\gamma \ \lambda} \ d \lambda \\&=\int_0^\infty  \theta^\gamma \ e^{-(x^\gamma+\theta^\gamma) \ \lambda} \ d \lambda \\&=\frac{\theta^\gamma}{(x^\gamma+\theta^\gamma)} \int_0^\infty   (x^\gamma+\theta^\gamma) \ e^{-(x^\gamma+\theta^\gamma) \ \lambda} \ d \lambda \\&=\frac{\theta^\gamma}{x^\gamma+\theta^\gamma} \end{aligned}

    \displaystyle \begin{aligned} F(x)&=1-S(x)=1-\frac{\theta^\gamma}{x^\gamma+\theta^\gamma} =\frac{x^\gamma}{x^\gamma+\theta^\gamma} =\frac{(x/\theta)^\gamma}{1+(x/\theta)^\gamma} \end{aligned}

    \displaystyle f(x)=\frac{d}{dx} \biggl( \frac{x^\gamma}{x^\gamma+\theta^\gamma} \biggr)=\frac{\gamma \ (x/\theta)^\gamma}{x [1+(x/\theta)^\gamma]^2}

Any distribution that has any one of the above three distributional quantities is said to be a loglogistic distribution with shape parameter \gamma and scale parameter \theta.
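Here is a minimal simulation sketch of Example 6 (assuming NumPy and SciPy; \gamma=2.5 and \theta=4 are illustrative values only). It draws \Lambda from the exponential mixing distribution, draws X \lvert \Lambda from the conditional Weibull distribution by inverting its CDF, and compares the empirical CDF with the loglogistic CDF. In SciPy the loglogistic distribution is called fisk.

    import numpy as np
    from scipy import stats

    gam, theta = 2.5, 4.0                    # illustrative shape and scale
    rng = np.random.default_rng(1)
    n = 200_000

    # Lambda ~ exponential with rate theta^gam, i.e. scale theta^(-gam)
    lam = rng.exponential(scale=theta ** (-gam), size=n)

    # conditional on Lambda, F(x | Lambda) = 1 - exp(-Lambda x^gam);
    # inverting the CDF gives X = (E / Lambda)^(1/gam) with E standard exponential
    x = (rng.exponential(size=n) / lam) ** (1 / gam)

    # compare the empirical CDF with the loglogistic CDF (x/theta)^gam / (1 + (x/theta)^gam)
    for t in (1, 2, 5, 10):
        print(t, round(np.mean(x <= t), 4), round(stats.fisk.cdf(t, c=gam, scale=theta), 4))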

One interesting point about the loglogistic distribution is that the reciprocal of a loglogistic random variable is another loglogistic random variable. Suppose that X has a loglogistic distribution with shape parameter \gamma and scale parameter \theta. Let Y=\frac{1}{X}. Then Y has a loglogistic distribution with shape parameter \gamma and scale parameter \theta^{-1}.

    \displaystyle \begin{aligned} P[Y \le y]&=P[\frac{1}{X} \le y] =P[X \ge y^{-1}] =\frac{\theta^\gamma}{y^{-\gamma}+\theta^\gamma} \\&=\frac{\theta^\gamma \ y^\gamma}{1+\theta^\gamma \ y^\gamma} \\&=\frac{y^\gamma}{(\theta^{-1})^\gamma+y^\gamma} \end{aligned}

The above is the CDF of a loglogistic distribution with the desired parameters. Thus there is no need to specially call out the inverse loglogistic distribution.

In order to find the mean and higher moments of the loglogistic distribution, we identify the conditional Weibull moments and then weight these moments by the exponential mixing weight. Note that the parameter \Lambda in the conditional CDF F(x \lvert \Lambda)=1-e^{-\Lambda \ x^\gamma} is not a scale parameter. The Weibull distribution in this conditional CDF is equivalent to a Weibull distribution with shape parameter \gamma and scale parameter \Lambda^{-1/\gamma}. According to formula (4) in this previous post, the kth moment of this Weibull distribution is

    \displaystyle E[ (X \lvert \Lambda)^k]=\Gamma \biggl(1+\frac{k}{\gamma} \biggr) \Lambda^{-k/\gamma}

The following gives the unconditional kth moment of the Weibull-exponential mixture.

    \displaystyle \begin{aligned} E[X^k]&=\int_0^\infty E[ (X \lvert \Lambda)^k] \ g(\lambda) \ d \lambda \\&=\int_0^\infty \Gamma \biggl(1+\frac{k}{\gamma} \biggr) \lambda^{-k/\gamma} \ \theta^\gamma \ e^{-\theta^\gamma \ \lambda} \ d \lambda\\&=\Gamma \biggl(1+\frac{k}{\gamma} \biggr) \ \theta^\gamma \int_0^\infty  \lambda^{-k/\gamma} \ e^{-\theta^\gamma \ \lambda} \ d \lambda \\&=\theta^k \ \Gamma \biggl(1+\frac{k}{\gamma} \biggr)  \int_0^\infty  t^{-k/\gamma} \ e^{-t} \ dt \ \ \text{ where } t=\theta^\gamma \lambda \\&=\theta^k \ \Gamma \biggl(1+\frac{k}{\gamma} \biggr) \int_0^\infty  t^{[(\gamma-k)/\gamma]-1} \ e^{-t} \ dt   \\&=\theta^k \ \Gamma \biggl(1+\frac{k}{\gamma} \biggr) \ \Gamma \biggl(1-\frac{k}{\gamma} \biggr) \ \ \ \ -\gamma<k<\gamma  \end{aligned}

The range -\gamma<k<\gamma follows from the fact that the arguments of the gamma function must be positive. Thus the kth moments of the loglogistic distribution are limited by its shape parameter \gamma. If \gamma=1, then E(X) does not exist. For a larger \gamma, more moments exist, but always only a finite number of moments. This is an indication that the loglogistic distribution has a thick (right) tail. This is not surprising since mixture distributions (loglogistic in this case) tend to have thicker tails than the conditional distributions (Weibull in this case). The thicker tail is a result of the uncertainty in the random parameter of the conditional distribution (the Weibull parameter \Lambda in this case).
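As a quick check of the moment formula, the following sketch (assuming SciPy; \gamma=3, \theta=2 and k=1,2 are arbitrary choices) compares \theta^k \Gamma(1+k/\gamma) \Gamma(1-k/\gamma) with a direct numerical integration against the loglogistic density.

    import numpy as np
    from scipy import integrate
    from scipy.special import gamma as G

    gam, theta = 3.0, 2.0                    # arbitrary illustrative values

    def loglogistic_pdf(x):
        u = (x / theta) ** gam
        return gam * u / (x * (1 + u) ** 2)

    for k in (1, 2):                         # these moments exist since -gam < k < gam
        formula = theta**k * G(1 + k / gam) * G(1 - k / gam)
        numeric, _ = integrate.quad(lambda x: x**k * loglogistic_pdf(x), 0, np.inf)
        print(k, round(formula, 4), round(numeric, 4))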

Another Way to Obtain Exponential Distribution

We now consider Example 7. The following is a precise statement of the gamma-geometric mixture.

Example 7. Suppose that X \lvert \alpha has a gamma distribution with shape parameter \alpha that is a positive integer and rate parameter \beta (a known constant). Further suppose that the random parameter \alpha follows a geometric distribution with probability function P[Y=\alpha]=p (1-p)^{\alpha-1} where \alpha=1,2,3,\cdots. Then the unconditional distribution for X is an exponential distribution with rate parameter \beta p.

The conditional gamma distribution has an uncertain shape parameter \alpha that can take on positive integers. The parameter \alpha follows a geometric distribution. Here are the ingredients that go into the mixture.

    \displaystyle f(x \lvert \alpha)=\frac{1}{(\alpha-1)!} \ \beta^\alpha \ x^{\alpha-1} \ e^{-\beta x}

    P[Y=\alpha]=p (1-p)^{\alpha-1}

The following is the unconditional probability density function of X.

    \displaystyle \begin{aligned} f(x)&=\sum \limits_{\alpha=1}^\infty f(x \lvert \alpha) \ P[Y=\alpha] \\&=\sum \limits_{\alpha=1}^\infty \frac{1}{(\alpha-1)!} \ \beta^\alpha \ x^{\alpha-1} \ e^{-\beta x} \ p (1-p)^{\alpha-1} \\&=\beta p \ e^{-\beta x} \sum \limits_{\alpha=1}^\infty \frac{[\beta(1-p) x]^{\alpha-1}}{(\alpha-1)!} \\&=\beta p \ e^{-\beta x} \sum \limits_{\alpha=0}^\infty \frac{[\beta(1-p) x]^{\alpha}}{(\alpha)!} \\&=\beta p \ e^{-\beta x} \ e^{\beta(1-p) x} \\&=\beta p \ e^{-\beta p x} \end{aligned}

The above density function is that of an exponential distribution with rate parameter \beta p.
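The following is a small simulation sketch of Example 7 (assuming NumPy; \beta=1.5 and p=0.3 are illustrative). It draws the geometric shape parameter, draws X from the conditional gamma distribution, and checks the mean and a tail probability against the exponential distribution with rate \beta p.

    import numpy as np

    beta, p = 1.5, 0.3                       # illustrative rate and geometric parameter
    rng = np.random.default_rng(2)
    n = 200_000

    # alpha takes values 1, 2, 3, ... with P[alpha] = p (1-p)^(alpha-1)
    alpha = rng.geometric(p, size=n)

    # conditional on alpha, X is gamma with integer shape alpha and rate beta
    x = rng.gamma(shape=alpha, scale=1 / beta)

    # the mixture should be exponential with rate beta*p
    print(round(x.mean(), 4), round(1 / (beta * p), 4))                    # means agree
    print(round(np.mean(x > 2.0), 4), round(np.exp(-beta * p * 2.0), 4))   # tail probabilities agree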

Student t Distribution

Example 2 (the normal-normal mixture discussed in the previous post) involves a normal distribution with a random mean. Example 8 involves a normal distribution with mean 0 and an uncertain variance, which follows a gamma distribution such that the two gamma parameters are related to a common parameter r, which will be the degrees of freedom of the Student t distribution. The following is a precise description of the normal-gamma mixture.

Example 8. Suppose that X \lvert \Lambda has a normal distribution with mean 0 and variance 1/\Lambda. Further suppose that the random parameter \Lambda follows a gamma distribution with shape parameter \alpha and scale parameter \theta such that 2 \alpha=\frac{2}{\theta}=r is a positive integer. Then the unconditional distribution for X is a Student t distribution with r degrees of freedom.

The following gives the ingredients of the normal-gamma mixture. The first item is the conditional density function of X given \Lambda. The second is the density function of the mixing weight \Lambda.

    \displaystyle f(x \lvert \lambda)=\frac{1}{\sqrt{1/\lambda} \ \sqrt{2 \pi}} \ e^{-(\lambda/2) \  x^2}=\sqrt{\frac{\lambda}{2 \pi}} \ e^{-(\lambda/2) \  x^2}

    \displaystyle g(\lambda)=\frac{1}{\Gamma(\alpha)} \biggl( \frac{1}{\theta} \biggr)^\alpha \ \lambda^{\alpha-1} \ e^{-\lambda/\theta}

The following calculation derives the unconditional density function of X.

    \displaystyle \begin{aligned} f(x)&=\int_{0}^\infty f(x \lvert \lambda) \ g(\lambda) \ d \lambda \\&=\int_{0}^\infty \sqrt{\frac{\lambda}{2 \pi}} \ e^{-(\lambda/2) \  x^2} \ \frac{1}{\Gamma(\alpha)} \biggl( \frac{1}{\theta} \biggr)^\alpha \ \lambda^{\alpha-1} \ e^{-\lambda/\theta} \ d \lambda \\&=\frac{1}{\Gamma(\alpha)} \ \biggl( \frac{1}{\theta} \biggr)^\alpha \ \frac{1}{\sqrt{2 \pi}} \int_0^\infty \lambda^{\alpha+\frac{1}{2}-1} e^{-(\frac{x^2}{2}+\frac{1}{\theta} ) \lambda} \ d \lambda \\&=\frac{\Gamma(\alpha+\frac{1}{2})}{\Gamma(\alpha)} \ \biggl( \frac{1}{\theta} \biggr)^\alpha \ \frac{1}{\sqrt{2 \pi}} \ \biggl(\frac{2 \theta}{\theta x^2+2} \biggr)^{\alpha+\frac{1}{2}} \\& \times \int_0^\infty \frac{1}{\Gamma(\alpha+\frac{1}{2})} \ \biggl(\frac{\theta x^2+2}{2 \theta} \biggr)^{\alpha+\frac{1}{2}} \lambda^{\alpha+\frac{1}{2}-1} e^{-\frac{\theta x^2+2}{2 \theta} \lambda} \ d \lambda \\&=\frac{\Gamma(\alpha+\frac{1}{2})}{\Gamma(\alpha)} \ \biggl( \frac{1}{\theta} \biggr)^\alpha \ \frac{1}{\sqrt{2 \pi}} \ \biggl(\frac{2 \theta}{\theta x^2+2} \biggr)^{\alpha+\frac{1}{2}} \ \ \ \ \ -\infty<x<\infty \end{aligned}

The above density function is in terms of the two parameters \alpha and \theta. In the assumptions, the two parameters are related to a common parameter r such that \alpha=\frac{r}{2} and \theta=\frac{2}{r}. The following derivation converts to the common r.

    \displaystyle \begin{aligned} f(x)&=\frac{\Gamma(\frac{r}{2}+\frac{1}{2})}{\Gamma(\frac{r}{2})} \ \biggl( \frac{r}{2} \biggr)^{\frac{r}{2}} \ \frac{1}{\sqrt{2 \pi}} \ \biggl(\frac{2 \frac{2}{r}}{\frac{2}{r} x^2+2} \biggr)^{\frac{r}{2}+\frac{1}{2}} \\&=\frac{\Gamma(\frac{r}{2}+\frac{1}{2})}{\Gamma(\frac{r}{2})} \ \frac{r^{r/2}}{2^{r/2}} \ \frac{1}{2^{1/2} \sqrt{\pi}} \ \biggl(\frac{2/r}{x^2/r+1} \biggr)^{(r+1)/2} \\&=\frac{\Gamma \biggl(\displaystyle \frac{r+1}{2} \biggr)}{\Gamma \biggl(\displaystyle \frac{r}{2} \biggr)} \ \frac{1}{\sqrt{\pi r}} \ \frac{1 \ \ \ \ \ }{\biggl(1+\displaystyle \frac{x^2}{r} \biggr)^{(r+1)/2}} \ \ \ \ \ -\infty<x<\infty \end{aligned}

The above density function is that of a Student t distribution with r degrees of freedom. Of course, in performing tests of significance, the t distribution is accessed by using tables or software. A usual textbook definition of the Student t distribution is the ratio of a standard normal random variable to the square root of an independent chi-squared random variable divided by its degrees of freedom (see Theorem 6 in this previous post).
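The following simulation sketch (assuming NumPy and SciPy; r=5 is an arbitrary choice of degrees of freedom) draws \Lambda from the gamma mixing distribution with \alpha=r/2 and \theta=2/r, draws X \lvert \Lambda from the normal distribution with variance 1/\Lambda, and compares the empirical CDF with the Student t CDF.

    import numpy as np
    from scipy import stats

    r = 5                                    # illustrative degrees of freedom
    rng = np.random.default_rng(3)
    n = 200_000

    # Lambda ~ gamma with shape r/2 and scale 2/r, so that 2*alpha = 2/theta = r
    lam = rng.gamma(shape=r / 2, scale=2 / r, size=n)

    # conditional on Lambda, X ~ normal with mean 0 and variance 1/Lambda
    x = rng.normal(loc=0.0, scale=1 / np.sqrt(lam))

    # compare with the Student t distribution with r degrees of freedom
    for t in (0.5, 1.0, 2.0):
        print(t, round(np.mean(x <= t), 4), round(stats.t.cdf(t, df=r), 4))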

\text{ }

\text{ }

\text{ }

\copyright 2017 – Dan Ma

Mixing probability distributions

This post discusses another way to generate new distributions from old, that of mixing distributions. The resulting distributions are called mixture distributions.

What is a Mixture?

First, let’s start with a continuous mixture. Suppose that X is a continuous random variable with probability density function (pdf) f_{X \lvert \Theta}(x \lvert \theta) where \theta is a parameter in the pdf. There may be other parameters in the distribution but they are not relevant at the moment (e.g. these other parameters may be known constants). Suppose that the parameter \theta is an uncertain quantity and is a random variable with pdf h_\Theta(\theta) (if \Theta is a continuous random variable) or with probability function P(\Theta=\theta) (if \Theta is a discrete random variable). Then taking the weighted average of f_{X \lvert \Theta}(x \lvert \theta) with h_\Theta(\theta) or P(\Theta=\theta) as weight produces a mixture distribution. The following would be the pdf of the resulting mixture distribution.

    \displaystyle (1a) \ \ \ \ \ f_X(x)=\int_{-\infty}^\infty f_{X \lvert \Theta}(x \lvert \theta) \ h_\Theta(\theta) \ d \theta

    \displaystyle (1b) \ \ \ \ \ f_X(x)=\sum \limits_{\theta} \biggl(f_{X \lvert \Theta}(x \lvert \theta) \ P(\Theta=\theta) \biggr)

Thus a continuous random variable X is said to be a mixture (or has a mixture distribution) if its probability density function f_X(x) is a weighted average of a family of pdfs f_{X \lvert \Theta}(x \lvert \theta) where the weight is the density function or probability function of the random parameter \Theta. The random variable \Theta is said to be the mixing random variable and its pdf or probability function is called the mixing weight.

Another definition of mixture distribution is that the cumulative distribution function (cdf) of the random variable X is the weighted average of a family of cumulative distribution functions indexed by the mixing random variable \Theta.

    \displaystyle (2a) \ \ \ \ \ F_X(x)=\int_{-\infty}^\infty F_{X \lvert \Theta}(x \lvert \theta) \ h_\Theta(\theta) \ d \theta

    \displaystyle (2b) \ \ \ \ \ F_X(x)=\sum \limits_{\theta} \biggl(F_{X \lvert \Theta}(x \lvert \theta) \ P(\Theta=\theta) \biggr)

The idea of discrete mixture is similar. A discrete random variable X is said to be a mixture if its probability function P(X=x) or cumulative distribution function P(X \le x) is a weighted average of a family of probability functions or cumulative distributions indexed by the mixing random variable \Theta. The mixing weight can be discrete or continuous. The following shows the probability function and the cdf of a discrete mixture distribution.

    \displaystyle (3a) \ \ \ \ \ P(X=x)=\int_{-\infty}^\infty P(X=x \lvert \Theta=\theta) \ h_\Theta(\theta) \ d \theta

    \displaystyle (3b) \ \ \ \ \ P(X \le x)=\int_{-\infty}^\infty P(X \le x \lvert \Theta=\theta) \ h_\Theta(\theta) \ d \theta

    \text{ }

    \displaystyle (4a) \ \ \ \ \ P(X=x)=\sum \limits_{\theta} \biggl(P(X=x \lvert \Theta=\theta) \ P(\Theta=\theta) \biggr)

    \displaystyle (4b) \ \ \ \ \ P(X \le x)=\sum \limits_{\theta} \biggl(P(X \le x \lvert \Theta=\theta) \ P(\Theta=\theta) \biggr)

When the mixture distribution is a weighted average of finitely many distributions, it is called an n-point mixture where n is the number of distributions. Suppose that there are n distributions with pdfs

    f_1(x),f_2(x),\cdots,f_n(x) (continuous case)

or probability functions

    P(X_1=x),P(X_2=x),\cdots,P(X_n=x) (discrete case)

with mixing probabilities p_1,p_2,\cdots,p_n where the sum of the p_i is 1. Then the following gives the pdf or the probability function of the mixture distribution.

    \displaystyle (5a) \ \ \ \ \ f_X(x)=\sum \limits_{j=1}^n p_j \ f_j(x)

    \displaystyle (5b) \ \ \ \ \ P(X=x)=\sum \limits_{j=1}^n p_j \ P(X_j=x)

The cdf for the n-point mixture is similarly obtained by weighting the respective conditional cdfs as in (4b).

Distributional Quantities

Once the pdf (or probability function) or cdf of a mixture is established, the other distributional quantities can be derived from the pdf or cdf. Some of the distributional quantities can be obtained by taking weighted average of the corresponding conditional counterparts. For example, the following gives the survival function and moments of a mixture distribution. We assume that the mixing weight is continuous. For discrete mixing weight, simply replace the integral with summation.

    \displaystyle (6a) \ \ \ \ \ S_X(x)=\int_{-\infty}^\infty S_{X \lvert \Theta}(x \lvert \theta) \ h_\Theta(\theta) \ d \theta

    \displaystyle (6b) \ \ \ \ \ E(X)=\int_{-\infty}^\infty E(X \lvert \theta) \ h_\Theta(\theta) \ d \theta

    \displaystyle (6c) \ \ \ \ \ E(X^k)=\int_{-\infty}^\infty E(X^k \lvert \theta) \ h_\Theta(\theta) \ d \theta

Once the moments are obtained, all distributional quantities that are based on moments can be evaluated, calculations such as variance, skewness, and kurtosis. Note that these quantities are not the weighted average of the conditional quantities. For example, variance of a mixture is not the weighted average of the variance of the conditional distributions. In fact, the variance of a mixture has two components.

    \displaystyle (7) \ \ \ \ \ Var(X)=E[Var(X \lvert \Theta)]+Var[E(X \lvert \Theta)]

The relationship in (7) is called the law of total variance, which is the proper way of computing the unconditional variance Var(X). The first component E[Var(X \lvert \Theta)] is called the expected value of conditional variances, which is the weighted average of the conditional variances. The second component Var[E(X \lvert \Theta)] is called the variance of the conditional means, which represents the additional variance as a result of the uncertainty in the parameter \Theta. If there is a great deal of variation among the conditional mean E(X \lvert \Theta), the variation will be reflected in Var(X) through the second component Var[E(X \lvert \Theta)]. This will be further illustrated in the examples below.

Motivation

Some of the examples discussed below have gamma distribution as mixing weights. See here for basic information on gamma distribution.

A natural interpretation of a mixture is that the uncertain parameter \Theta in the conditional random variable X \lvert \Theta describes an individual in a large population. For example, the parameter \Theta describes a certain characteristic that varies across the units in the population. In this section, we describe the idea of mixture in an insurance setting. The example is to mix Poisson distributions with a gamma distribution as mixing weight. We will see that the resulting mixture is a negative binomial distribution, which is more dispersed than the conditional Poisson distributions.

Consider a large group of insured drivers for auto collision coverage. Suppose that the claim frequency in a year for an insured driver has a Poisson distribution with mean \theta. The conditional probability function for the number of claims in a year for an insured driver is:

    \displaystyle P(X=x \lvert \Theta=\theta)=\frac{e^{-\theta} \ \theta^x}{x!}  \ \ \ \ \ \ x=0,1,2,3,\cdots where \theta>0

The mean number of claims in a year for an insured driver is \theta. The parameter \theta reflects the risk characteristics of an insured driver. Since the population of insured drivers is large, there is uncertainty in the parameter \theta. Thus it is more appropriate to regard \theta as a random variable in order to capture the wide range of risk characteristics across the individuals in the population. As a result, the above probability function is not unconditional, but, rather, a conditional probability function of X.

What about the marginal (unconditional) probability function of X? Suppose that \Theta has a gamma distribution with the following pdf:

    \displaystyle h_{\Theta}(\theta)=\frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ \theta^{\alpha-1} \ e^{-\beta \theta}

where \alpha>0 and \beta>0 are known parameters of the gamma distribution. Then the unconditional probability function of X is the weighted average of the conditional Poisson probability functions.

    \displaystyle \begin{aligned} P(X=x)&=\int_0^\infty P(X=x \lvert \Theta=\theta) \ h_{\Theta}(\theta) \ d \theta \\&=\int_0^\infty \frac{e^{-\theta} \ \theta^x}{x!} \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ \theta^{\alpha-1} \ e^{-\beta \theta} \ d \theta  \\&= \frac{\beta^\alpha}{x! \ \Gamma(\alpha)} \int_0^\infty \theta^{x+\alpha-1} \ e^{-(\beta+1) \theta} \ d \theta  \\&=\frac{\beta^\alpha}{x! \ \Gamma(\alpha)} \ \frac{\Gamma(x+\alpha)}{(\beta+1)^{x+\alpha}} \int_0^\infty \frac{1}{\Gamma(x+\alpha)} \ (\beta+1)^{x+\alpha} \ \theta^{x+\alpha-1} \ e^{-(\beta+1) \theta} \ d \theta \\&=\frac{\beta^\alpha}{x! \ \Gamma(\alpha)} \ \frac{\Gamma(x+\alpha)}{(\beta+1)^{x+\alpha}} \\&=\frac{\Gamma(x+\alpha)}{x! \ \Gamma(\alpha)} \ \biggl(\frac{\beta}{\beta+1} \biggr)^\alpha \biggl(\frac{1}{\beta+1} \biggr)^x \ \ x=0,1,2,\cdots \end{aligned}

Note that the integral in the 4th step is 1 since the integrand is a gamma density function. The probability function at the last step is that of a negative binomial distribution. If the parameter \alpha is a positive integer, then the following gives the probability function of X after simplifying the expression with gamma function.

    \displaystyle  P(X=x)=\left\{ \begin{array}{ll}                     \displaystyle  \biggl(\frac{\beta}{\beta+1} \biggr)^\alpha &\ x=0 \\           \text{ } & \text{ } \\           \displaystyle  \frac{(x-1+\alpha) \cdots (1+\alpha) \alpha}{x!} \ \biggl(\frac{\beta}{\beta+1} \biggr)^\alpha \biggl(\frac{1}{\beta+1} \biggr)^x &\ x=1,2,\cdots           \end{array} \right.

This probability function can be further simplified as the following:

    \displaystyle P(X=x)=\binom{x+\alpha-1}{x} \biggl(\frac{\beta}{\beta+1} \biggr)^\alpha \biggl(\frac{1}{\beta+1} \biggr)^x

where x=0,1,2,\cdots. This is one form of a negative binomial distribution. The mean is E(X)=\frac{\alpha}{\beta} and the variance is Var(X)=\frac{\alpha}{\beta} (1+\frac{1}{\beta}). The variance of the negative binomial distribution is greater than the mean. In a Poisson distribution, the mean equals the variance. Thus the unconditional claim frequency X is more dispersed than its conditional distributions. This is a characteristic of mixture distributions. The uncertainty in the parameter variable \Theta has the effect of increasing the unconditional variance of the mixture distribution of X. Recall that the variance of a mixture distribution has two components, the weighted average of the conditional variances and the variance of the conditional means. The second component represents the additional variance introduced by the uncertainty in the parameter \Theta.
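The Poisson-gamma mixture can be checked numerically. Below is a minimal sketch (assuming NumPy and SciPy; \alpha=2 and \beta=0.5 are illustrative). It simulates the mixture and compares the empirical frequencies with the negative binomial probability function, and it also illustrates the law of total variance in (7): the simulated variance should be close to \frac{\alpha}{\beta} (1+\frac{1}{\beta}) rather than to the weighted average \frac{\alpha}{\beta} of the conditional Poisson variances.

    import numpy as np
    from scipy import stats

    alpha, beta = 2.0, 0.5                   # illustrative gamma parameters
    rng = np.random.default_rng(4)
    n = 200_000

    # Theta ~ gamma with shape alpha and rate beta (scale 1/beta)
    theta = rng.gamma(shape=alpha, scale=1 / beta, size=n)

    # conditional on Theta, the claim count is Poisson with mean Theta
    counts = rng.poisson(theta)

    # scipy parameterization: nbinom(n=alpha, p=beta/(beta+1))
    for x in range(5):
        print(x, round(np.mean(counts == x), 4),
              round(stats.nbinom.pmf(x, n=alpha, p=beta / (beta + 1)), 4))

    # law of total variance: Var(X) = alpha/beta + alpha/beta^2
    print(round(counts.var(), 3), alpha / beta * (1 + 1 / beta))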

More Examples

We now further illustrate the notion of mixture with a few more examples. Many familiar distributions are mixture distributions. The negative binomial distribution is a mixture of Poisson distributions with gamma mixing weight as discussed above. The Pareto distribution, more specifically Pareto Type II Lomax, is a mixture of exponential distributions with gamma mixing weight (see Example 2 below). Example 3 discusses the normal-normal mixture. Example 1 demonstrates numerical calculation involving a finite mixture.

Example 1
Suppose that the size of an auto collision claim from a large group of insured drivers is a mixture of three exponential distributions with means 5, 8 and 10 and respective weights 0.75, 0.15 and 0.10. Discuss the mixture distribution.

The pdf and cdf are the weighted averages of the respective exponential quantities.

    \displaystyle \begin{aligned} f_X(x)&=0.75 \ (0.2 e^{-0.2x} )+0.15 \ (0.125 e^{-0.125x} )+0.10 (0.10 e^{-0.10x}) \\&\text{ } \\&=0.15 \ e^{-0.2x} +0.01875 \ e^{-0.125x}+0.01 \ e^{-0.10x} \end{aligned}

    \displaystyle \begin{aligned} F_X(x)&=0.75 \ (1- e^{-0.2x} )+0.15 \ (1- e^{-0.125x} )+0.10 (1- e^{-0.10x}) \\&\text{ } \\&=1-0.75 \ e^{-0.2x} -0.15 \ e^{-0.125x}-0.10 \ e^{-0.10x} \end{aligned}

    \displaystyle S_X(x)=0.75 \ e^{-0.2x} +0.15 \ e^{-0.125x}+0.10 \ e^{-0.10x}

For a randomly selected claim from this population of insured drivers, what is the probability that it exceeds 10? The answer is S_X(10)=0.1813. The pdf and cdf of the mixture will allow us to derive other distributional quantities such as moments and then use the moments to derive skewness and kurtosis. The moments of the exponential distribution have a closed form. The moments of the mixture distribution are then simply the weighted averages of the exponential moments.

    \displaystyle E(X^k)=0.75 \ [5^k \ k!]+0.15 \ [8^k \ k!]+0.10 \ [10^k \ k!]

where k is a positive integer. The following evaluate the first four moments.

    \displaystyle E(X)=0.75 \ 5+0.15 \ 8+0.10 \ 10=5.95

    \displaystyle E(X^2)=0.75 \ (5^2 \ 2!)+0.15 \ (8^2 \ 2!)+0.10 \ (10^2 \ 2!)=76.7

    \displaystyle E(X^3)=0.75 \ (5^3 \ 3!)+0.15 \ (8^3 \ 3!)+0.10 \ (10^3 \ 3!)=1623.3

    \displaystyle E(X^4)=0.75 \ (5^4 \ 4!)+0.15 \ (8^4 \ 4!)+0.10 \ (10^4 \ 4!)=49995.6

The variance of X is Var(X)=76.7-5.95^2=41.2975. The three conditional exponential variances are 25, 64 and 100. The weighted average of these would be 38.35. Because of the uncertainty resulting from not knowing which exponential distribution the claim is from, the unconditional variance is larger than 38.35.

The skewness of a distribution is the third standardized moment and the kurtosis is the fourth standardized moment. Each of them can be expressed in terms of the raw moments up to the third or fourth raw moment.

    \displaystyle \gamma=E\biggl[\biggl( \frac{X-\mu}{\sigma} \biggr)^3\biggr]=\frac{E(X^3)-3 \mu \sigma^2-\mu^3}{(\sigma^2)^{1.5}}

    \displaystyle \text{Kurt}[X]=E\biggl[\biggl( \frac{X-\mu}{\sigma} \biggr)^4\biggr]=\frac{E(X^4)-4 \mu E(X^3)+6 \mu^2 E(X^2)-3 \mu^4}{\sigma^4}

Note that \mu=E(X) and \sigma^2=Var(X). The expressions on the right hand side are in terms of the raw moments E(X^k) up to k=4. Plugging in the raw moments produces the skewness \gamma=2.5453 and kurtosis \text{Kurt}[X]=14.0097. The excess kurtosis is then 11.0097 (subtracting 3 from the kurtosis).

The skewness and excess kurtosis of an exponential distribution are always 2 and 6, respectively. One takeaway is that the skewness and kurtosis of a mixture are not the weighted averages of the conditional counterparts. In this particular case, the mixture is more skewed than the individual exponential distributions. Kurtosis is a measure of whether a distribution is heavy-tailed or light-tailed relative to a normal distribution (the kurtosis of a normal distribution is 3). Since the excess kurtosis of this mixture (11.0097) exceeds even that of an exponential distribution (6), the mixture distribution is considered heavy tailed and to have a higher likelihood of outliers.
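The calculations in Example 1 can be reproduced with a short script (a sketch assuming NumPy; it uses the raw moment formula E(X^k)=\theta^k \ k! for an exponential distribution with mean \theta, as in the expression above).

    import numpy as np
    from math import factorial

    weights = np.array([0.75, 0.15, 0.10])
    means = np.array([5.0, 8.0, 10.0])

    # raw moments of the mixture are weighted averages of the exponential moments
    def raw_moment(k):
        return float(np.sum(weights * means**k * factorial(k)))

    m1, m2, m3, m4 = (raw_moment(k) for k in (1, 2, 3, 4))
    var = m2 - m1**2
    skew = (m3 - 3 * m1 * var - m1**3) / var**1.5
    kurt = (m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4) / var**2
    s10 = float(np.sum(weights * np.exp(-10 / means)))

    print(m1, m2, var)                       # 5.95, 76.7, 41.2975
    print(round(s10, 4))                     # 0.1813
    print(round(skew, 4), round(kurt, 4))    # 2.5453, 14.0097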

Example 2 (Exponential-Gamma Mixture)
The Pareto distribution (Type II Lomax) is a mixture of exponential distributions with gamma mixing weight. Suppose X has the exponential pdf f_{X \lvert \Theta}(x \lvert \theta)=\theta \ e^{-\theta x}, where x>0, conditional on the parameter \Theta. Suppose that \Theta has a gamma distribution with the following pdf:

    \displaystyle h_{\Theta}(\theta)=\frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ \theta^{\alpha-1} \ e^{-\beta \theta}

Then the following gives the unconditional pdf of the random variable X.

    \displaystyle \begin{aligned} f_X(x)&=\int_0^\infty f_{X \lvert \Theta}(x \lvert \theta) \  h_{\Theta}(\theta) \ d \theta \\&=\int_0^\infty \theta \ e^{-\theta x} \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ \theta^{\alpha-1} \ e^{-\beta \theta} \ d \theta \\&= \frac{\beta^\alpha}{\Gamma(\alpha)} \int_0^\infty \theta^{\alpha+1-1} \ e^{-(x+\beta) \theta} \ d \theta \\&= \frac{\beta^\alpha}{\Gamma(\alpha)} \frac{\Gamma(\alpha+1)}{(x+\beta)^{\alpha+1}} \int_0^\infty \frac{1}{\Gamma(\alpha+1)} \ (x+\beta)^{\alpha+1} \  \theta^{\alpha+1-1} \ e^{-(x+\beta) \theta} \ d \theta \\&=\frac{\beta^\alpha}{\Gamma(\alpha)} \frac{\Gamma(\alpha+1)}{(x+\beta)^{\alpha+1}} \\&= \frac{\alpha \ \beta^{\alpha}}{(x+\beta)^{\alpha+1}} \end{aligned}

The above is the density of the Pareto Type II Lomax distribution. The Pareto distribution is discussed here. The example of exponential-gamma mixture is discussed here.

Example 3 (Normal-Normal Mixture)
Conditional on \Theta=\theta, consider a normal random variable X with mean \theta and variance v where v is known. The following is the conditional density function of X.

    \displaystyle f_{X \lvert \Theta}(x \lvert \theta)=\frac{1}{\sqrt{2 \pi v}} \ \text{exp}\biggl[-\frac{1}{2v}(x-\theta)^2 \biggr] \ \ \ -\infty<x<\infty

Suppose that the parameter \Theta is normally distributed with mean \mu and variance a (both known parameters). The following is the density function of \Theta.

    \displaystyle f_{\Theta}(\theta)=\frac{1}{\sqrt{2 \pi a}} \ \text{exp}\biggl[-\frac{1}{2a}(\theta-\mu)^2 \biggr] \ \ \ -\infty<\theta<\infty

Determine the unconditional pdf of X.

    \displaystyle \begin{aligned} f_X(x)&=\int_{-\infty}^\infty \frac{1}{\sqrt{2 \pi v}} \ \text{exp}\biggl[-\frac{1}{2v}(x-\theta)^2 \biggr] \ \frac{1}{\sqrt{2 \pi a}} \ \text{exp}\biggl[-\frac{1}{2a}(\theta-\mu)^2 \biggr] \ d \theta \\&=\frac{1}{2 \pi \sqrt{va}} \int_{-\infty}^\infty \text{exp}\biggl[-\frac{1}{2v}(x-\theta)^2 -\frac{1}{2a}(\theta-\mu)^2\biggr] \ d \theta  \end{aligned}

The expression in the exponent has the following equivalent expression.

    \displaystyle \frac{(x-\theta)^2}{v}+\frac{(\theta-\mu)^2}{a}=\frac{a+v}{va} \biggl[\theta-\frac{ax+v \mu}{a+v}\biggr]^2 +\frac{(x-\mu)^2}{a+v}

Continuing the derivation:

    \displaystyle \begin{aligned} f_X(x)&=\frac{1}{2 \pi \sqrt{va}} \int_{-\infty}^\infty \text{exp}\biggl[-\frac{1}{2} \biggl(\frac{a+v}{va} \biggl[\theta-\frac{ax+v \mu}{a+v}\biggr]^2 +\frac{(x-\mu)^2}{a+v}  \biggr) \biggr] \ d \theta \\&\displaystyle =\frac{\text{exp}\biggl[\displaystyle -\frac{(x-\mu)^2}{2(a+v)} \biggr]}{2 \pi \sqrt{va}}  \int_{-\infty}^\infty  \text{exp}\biggl[\displaystyle -\frac{1}{2} \biggl(\frac{a+v}{va} \biggl[\theta-\frac{ax+v \mu}{a+v}\biggr]^2 \biggr) \biggr] \ d \theta \\&=\frac{\text{exp}\biggl[\displaystyle -\frac{(x-\mu)^2}{2(a+v)} \biggr]}{\sqrt{2 \pi (a+v)} }  \int_{-\infty}^\infty \frac{1}{\sqrt{2 \pi}} \sqrt{\frac{a+v}{va}} \ \text{exp}\biggl[-\frac{1}{2} \biggl(\frac{a+v}{va} \biggl[\theta-\frac{ax+v \mu}{a+v}\biggr]^2 \biggr) \biggr] \ d \theta \\&=\frac{\text{exp}\biggl[\displaystyle -\frac{(x-\mu)^2}{2(a+v)} \biggr]}{\sqrt{2 \pi (a+v)} }  \end{aligned}

Note that the integrand in the integral in the third line is the density function of a normal distribution with mean \frac{ax+v \mu}{a+v} and variance \frac{va}{a+v}. Hence the integral is 1. The last expression is the unconditional pdf of X, repeated as follows.

    \displaystyle f_X(x)=\frac{1}{\sqrt{2 \pi (a+v)}} \ \text{exp}\biggl[-\frac{(x-\mu)^2}{2(a+v)} \biggr] \ \ \ \ -\infty<x<\infty

The above is the pdf of a normal distribution with mean \mu and variance a+v. Thus mixing normal distributions with mean \Theta and variance v, where the mixing weight \Theta is normally distributed with mean \mu and variance a, produces a normal distribution with mean \mu (the same mean as the mixing weight) and variance a+v (the sum of the conditional variance and the mixing variance).

The mean of the conditional normal distribution is uncertain. When the mean \Theta follows a normal distribution with mean \mu, the mixture is a normal distribution that centers around \mu, however, with increased variance a+v. The increased variance of the unconditional distribution reflects the uncertainty of the parameter \Theta.
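A quick simulation confirms the normal-normal mixture (a sketch assuming NumPy; the values \mu=1, a=2 and v=3 are illustrative).

    import numpy as np

    mu, a, v = 1.0, 2.0, 3.0                 # illustrative parameters
    rng = np.random.default_rng(5)
    n = 200_000

    # Theta ~ normal(mu, a); conditional on Theta, X ~ normal(Theta, v)
    theta = rng.normal(mu, np.sqrt(a), size=n)
    x = rng.normal(theta, np.sqrt(v))

    # the mixture should be normal with mean mu and variance a + v
    print(round(x.mean(), 3), mu)
    print(round(x.var(), 3), a + v)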

Remarks

Mixture distributions can be used to model a statistical population with subpopulations, where the conditional density functions are the densities on the subpopulations, and the mixing weights are the proportions of each subpopulation in the overall population. If the population can be divided into a finite number of homogeneous subpopulations, then the model would be a finite mixture as in Example 1. In certain situations, continuous mixing weights may be more appropriate (e.g. Poisson-gamma mixture).

Many other familiar distributions are mixture distributions and are discussed in the next post.

\text{ }

\text{ }

\text{ }

\copyright 2017 – Dan Ma

Transformed Pareto distribution

One way to generate new probability distributions from old ones is to raise a distribution to a power. Two previous posts are devoted to this topic – raising an exponential distribution to a power and raising a gamma distribution to a power. Many familiar and useful models can be generated in this fashion. For example, the Weibull distribution is generated by raising an exponential distribution to a positive power. This post discusses raising a Pareto distribution to a power, generating in the process the Burr distribution and the inverse Burr distribution.

Raising to a Power

Let X be a random variable. Let \tau be a positive constant. The random variables Y=X^{1/\tau}, Y=X^{-1} and Y=X^{-1/\tau} are called transformed, inverse and inverse transformed, respectively.

Let f_X(x), F_X(x) and S_X(x)=1-F_X(x) be the probability density function (PDF), the cumulative distribution function (CDF) and the survival function of the random variable X (the base distribution). The goal is to express the CDFs of the “transformed” variables in terms of the base CDF F_X(x). The following table shows how.

Name of Distribution Random Variable CDF
Transformed Y=X^{1 / \tau}, \ \tau >0 F_Y(y)=F_X(y^\tau)
Inverse Y=X^{-1} F_Y(y)=1-F_X(y^{-1})
Inverse Transformed Y=X^{-1 / \tau}, \ \tau >0 F_Y(y)=1-F_X(y^{-\tau})

If the CDF of the base distribution, as represented by the random variable X, is known, then the CDF of the “transformed” distribution can be derived using F_X(x) as shown in this table. Thus the CDF, in many cases, is a good entry point of the transformed distribution.

Pareto Information

Before the transformation, we first list out the information on the Pareto distribution. The Pareto distribution of interest here is the Type II Lomax distribution (discussed here). The following table gives several distributional quantities for a Pareto distribution with shape parameter \alpha and scale parameter \theta.

Pareto Type II Lomax
Survival Function S(x)=\displaystyle  \biggl( \frac{\theta}{x+\theta} \biggr)^\alpha  \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x >0
Cumulative Distribution Function F(x)=1-\displaystyle  \biggl( \frac{\theta}{x+\theta} \biggr)^\alpha \ \ \ \ \ \ \ \ \ \ \ \ \ x >0
Probability Density Function \displaystyle f(x)=\frac{\alpha \ \theta^\alpha}{(x+\theta)^{\alpha+1}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \  x >0
Mean \displaystyle E(X)=\frac{\theta}{\alpha-1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \alpha>1
Median \displaystyle \theta \ 2^{1/\alpha}-\theta
Mode 0
Variance \displaystyle Var(X)=\frac{\theta^2 \ \alpha}{(\alpha-1)^2 \ (\alpha-2)} \ \ \ \ \ \ \ \alpha>2
Higher Moments \displaystyle E(X^k)=\frac{k! \ \theta^k}{(\alpha-1) \cdots (\alpha-k)} \ \ \ \ \ \ \alpha>k \ \ \ k is integer
Higher Moments \displaystyle E(X^k)=\frac{\theta^k \ \Gamma(k+1) \Gamma(\alpha-k)}{\Gamma(\alpha)} \ \ \ \ \alpha>k

The higher moments in the general case use \Gamma(\cdot), which is the gamma function.

The Distributions Derived from Pareto

Let X be a random variable that has a Pareto distribution (as described in the table in the preceding section). Assume that X has a shape parameter \alpha and scale parameter \theta. Let \tau be a positive number. When raising X to the power 1/\tau, the resulting distribution is a transformed Pareto distribution and is also called a Burr distribution, which then is a distribution with three parameters – \alpha, \theta and \tau.

When raising X to the power -1/\tau, the resulting distribution is an inverse transformed Pareto distribution and it is also called an inverse Burr distribution. When raising X to the power -1, the resulting distribution is an inverse Pareto distribution (it does not have a special name other than inverse Pareto).

The paralogistic family of distributions is created from the Burr distribution by collapsing two of the parameters into one. Let \alpha, \theta and \tau be the parameters of a Burr distribution. By equating \tau=\alpha, the resulting distribution is a paralogistic distribution. By equating \tau=\alpha in the corresponding inverse Burr distribution, the resulting distribution is an inverse paralogistic distribution.

Transformed Pareto = Burr

There are two ways to create the transformed Pareto distribution. One is to start with a base Pareto with shape parameter \alpha and scale parameter 1 and then raise it to 1/\tau. The scale parameter \theta is added at the end. Another way is to start with a base Pareto distribution with shape parameter \alpha and scale parameter \theta^\tau and then raise it to the power 1/\tau. Both ways would generate the same CDF. We take the latter approach since it generates both the CDF and moments quite conveniently.

Let X be a Pareto distribution with shape parameter \alpha and scale parameter \theta^\tau. The following table gives the distribution information on Y=X^{1/\tau}.

Burr Distribution
CDF F_Y(y)=\displaystyle  1-\biggl( \frac{1}{(y/\theta )^\tau+1} \biggr)^\alpha y >0
Survival Function S_Y(y)=\displaystyle \biggl( \frac{1}{(y/\theta )^\tau+1} \biggr)^\alpha y >0
Probability Density Function \displaystyle f_Y(y)=\frac{\alpha \ \tau \ (y/\theta)^\tau}{y \ [(y/\theta)^\tau+1 ]^{\alpha+1}} y >0
Mean \displaystyle E(Y)=\frac{\theta \ \Gamma(1/\tau+1) \Gamma(\alpha-1/\tau)}{\Gamma(\alpha)} 1 <\alpha \ \tau
Median \displaystyle \theta \ (2^{1/\alpha}-1)^{1/\tau}
Mode \displaystyle \theta \ \biggl(\frac{\tau-1}{\alpha \tau+1} \biggr)^{1/\tau} \tau >1, else 0
Higher Moments \displaystyle E(Y^k)=\frac{\theta^k \ \Gamma(k/\tau+1) \Gamma(\alpha-k/\tau)}{\Gamma(\alpha)} -\tau<k <\alpha \ \tau

The distribution displayed in the above table is a three-parameter distribution. It is called the Burr distribution with parameters \alpha (shape), \theta (scale) and \tau (power).

To obtain the moments, note that E(Y^k)=E(X^{k/\tau}), which is derived using the Pareto moments. The Burr CDF has a closed form that is relatively easy to compute. Thus percentiles are very accessible. The moments rely on the gamma function and are usually calculated by software.
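The two-step construction can be verified numerically. The sketch below (assuming SciPy; \alpha=2, \theta=10 and \tau=1.5 are illustrative) computes the Burr CDF as F_Y(y)=F_X(y^\tau), where X is a Pareto (Lomax) distribution with shape \alpha and scale \theta^\tau, and compares it with SciPy's Burr Type XII distribution (scipy.stats.burr12).

    import numpy as np
    from scipy import stats

    alpha, theta, tau = 2.0, 10.0, 1.5       # illustrative Burr parameters

    # base Pareto (Lomax) with shape alpha and scale theta^tau, then F_Y(y) = F_X(y^tau)
    def burr_cdf_via_pareto(y):
        return stats.lomax.cdf(y**tau, c=alpha, scale=theta**tau)

    y = np.linspace(0.5, 50.0, 20)
    direct = stats.burr12.cdf(y, c=tau, d=alpha, scale=theta)
    print(np.max(np.abs(burr_cdf_via_pareto(y) - direct)))    # essentially zero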

Inverse Transformed Pareto = Inverse Burr

One way to generate the inverse transformed Pareto distribution is to raise a Pareto distribution with shape parameter \alpha and scale parameter 1 to the power -1/\tau and then add the scale parameter. Another way is to raise a Pareto distribution with shape parameter \alpha and scale parameter \theta^{-\tau} to the power -1/\tau. Both ways derive the same CDF. As in the preceding case, we take the latter approach.

Let X be a Pareto distribution with shape parameter \alpha and scale parameter \theta^{-\tau}. The following table gives the distribution information on Y=X^{-1/\tau}.

Inverse Burr Distribution
CDF F_Y(y)=\displaystyle  \biggl( \frac{(y/\theta)^\tau}{(y/\theta )^\tau+1} \biggr)^\alpha y >0
Survival Function S_Y(y)=\displaystyle 1-\biggl( \frac{(y/\theta)^\tau}{(y/\theta )^\tau+1} \biggr)^\alpha y >0
Probability Density Function \displaystyle f_Y(y)=\frac{\alpha \ \tau \ (y/\theta)^{\tau \alpha}}{y \ [1+(y/\theta)^\tau]^{\alpha+1}} y >0
Mean \displaystyle E(Y)=\frac{\theta \ \Gamma(1-1/\tau) \Gamma(\alpha+1/\tau)}{\Gamma(\alpha)} 1 <\tau
Median \displaystyle \theta \ \biggl[\frac{1}{ 2^{1/\alpha}-1} \biggr]^{1/\tau}
Mode \displaystyle \theta \ \biggl(\frac{\alpha \tau-1}{\tau+1} \biggr)^{1/\tau} \alpha \tau >1, else 0
Higher Moments \displaystyle E(Y^k)=\frac{\theta^k \ \Gamma(1-k/\tau) \Gamma(\alpha+k/\tau)}{\Gamma(\alpha)} -\alpha \tau<k <\tau

The distribution displayed in the above table is a three-parameter distribution. It is called the Inverse Burr distribution with parameters \alpha (shape), \theta (scale) and \tau (power).

Note that both the moments for Burr and inverse Burr distributions are limited, the Burr limited by the product of the parameters \alpha and \tau and the inverse Burr limited by the parameter \tau. This is not surprising since the base Pareto distribution has limited moments. This is one indication that all of these distributions have a heavy right tail.

The Paralogistic Family

With the facts of the Burr distribution and the inverse Burr distribution established, paralogistic and inverse paralogistic distributions can now be obtained. A paralogistic distribution is simply a Burr distribution with \tau=\alpha. An inverse paralogistic distribution is simply an inverse Burr distribution with \tau=\alpha. In the above tables for Burr and inverse Burr, replacing \tau by \alpha gives the following table.

Paralogistic Distribution
CDF F_Y(y)=\displaystyle  1-\biggl( \frac{1}{(y/\theta )^\alpha+1} \biggr)^\alpha y >0
Survival Function S_Y(y)=\displaystyle \biggl( \frac{1}{(y/\theta )^\alpha+1} \biggr)^\alpha y >0
Probability Density Function \displaystyle f_Y(y)=\frac{\alpha^2 \ \ (y/\theta)^\alpha}{y \ [(y/\theta)^\alpha+1 ]^{\alpha+1}} y >0
Mean \displaystyle E(Y)=\frac{\theta \ \Gamma(1/\alpha+1) \Gamma(\alpha-1/\alpha)}{\Gamma(\alpha)} 1 <\alpha^2
Median \displaystyle \theta \ (2^{1/\alpha}-1)^{1/\alpha}
Mode \displaystyle \theta \ \biggl(\frac{\alpha-1}{\alpha^2+1} \biggr)^{1/\alpha} \alpha >1, else 0
Higher Moments \displaystyle E(Y^k)=\frac{\theta^k \ \Gamma(k/\alpha+1) \Gamma(\alpha-k/\alpha)}{\Gamma(\alpha)} -\alpha<k <\alpha^2
Inverse Paralogistic Distribution
CDF F_Y(y)=\displaystyle  \biggl( \frac{(y/\theta)^\alpha}{(y/\theta )^\alpha+1} \biggr)^\alpha y >0
Survival Function S_Y(y)=\displaystyle 1-\biggl( \frac{(y/\theta)^\alpha}{(y/\theta )^\alpha+1} \biggr)^\alpha y >0
Probability Density Function \displaystyle f_Y(y)=\frac{\alpha^2 \ (y/\theta)^{\alpha^2}}{y \ [1+(y/\theta)^\alpha]^{\alpha+1}} y >0
Mean \displaystyle E(Y)=\frac{\theta \ \Gamma(1-1/\alpha) \Gamma(\alpha+1/\alpha)}{\Gamma(\alpha)} 1 <\alpha
Median \displaystyle \theta \ \biggl[\frac{1}{ 2^{1/\alpha}-1} \biggr]^{1/\alpha}
Mode \displaystyle \theta \ (\alpha-1)^{1/\alpha} \alpha^2 >1, else 0
Higher Moments \displaystyle E(Y^k)=\frac{\theta^k \ \Gamma(1-k/\alpha) \Gamma(\alpha+k/\alpha)}{\Gamma(\alpha)} -\alpha^2<k <\alpha

Inverse Pareto Distribution

The distribution that has not been discussed is the inverse Pareto. Again, we have the option of deriving it by raising a base Pareto distribution with shape parameter \alpha and scale parameter 1 to the power -1 and then adding the scale parameter. We take the approach of raising a base Pareto distribution with shape parameter \alpha and scale parameter \theta^{-1} to the power -1. Both approaches lead to the same CDF.

Inverse Pareto Distribution
CDF F_Y(y)=\displaystyle  \biggl( \frac{y}{\theta+y} \biggr)^\alpha y >0
Survival Function S_Y(y)=\displaystyle 1-\biggl( \frac{y}{\theta+y} \biggr)^\alpha y >0
Probability Density Function \displaystyle f_Y(y)=\frac{\alpha \ \theta \ y^{\alpha-1}}{[\theta+y ]^{\alpha+1}} y >0
Median \displaystyle \frac{\theta}{2^{1/\alpha}-1}
Mode \displaystyle \theta \ \frac{\alpha-1}{2} \alpha >1, else 0
Higher Moments \displaystyle E(Y^k)=\frac{\theta^k \ \Gamma(1-k) \Gamma(\alpha+k)}{\Gamma(\alpha)} -\alpha<k <1

The distribution described in the above table is an inverse Pareto distribution with parameters \alpha (shape) and \theta (scale). Note that the moments are even more limited than the Burr and inverse Burr distributions. For inverse Pareto, even the mean E(Y) is nonexistent.

Remarks

The Burr and paralogistic families of distributions are derived from the Pareto family (Pareto Type II Lomax). The Pareto connection helps put Burr and paralogistic distributions in perspective. The Pareto distribution itself can be generated as a mixture of exponential distributions with gamma mixing weight (see here). Thus from basic building blocks (exponential and gamma), vast families of distributions can be created, thus expanding the toolkit for modeling. The distributions discussed here are found in the appendix in this link.

\copyright 2017 – Dan Ma

Transformed gamma distribution

The previous post opens up a discussion on generating distributions by raising an existing distribution to a power. The previous post focuses on the example of raising an exponential distribution to a power. This post focuses on the distributions generated by raising a gamma distribution to a power.

Raising to a Power

Let X be a random variable. Let \tau be a positive constant. The random variables Y=X^{1/\tau}, Y=X^{-1} and Y=X^{-1/\tau} are called transformed, inverse and inverse transformed, respectively.

Let f_X(x), F_X(x) and S_X(x)=1-F_X(x) be the probability density function (PDF), the cumulative distribution function (CDF) and the survival function of the random variable X (the base distribution). The following derivation seeks to express the CDFs of the “transformed” variables in terms of the base CDF F_X(x).

    Y=X^{1/\tau} (Transformed):
    \displaystyle \begin{aligned} F_Y(y)&=P(Y \le y) \\&=P(X^{1/\tau} \le y) \\&=P(X \le y^\tau) \\&=F_X(y^\tau)  \end{aligned}

    Y=X^{-1} (Inverse):
    \displaystyle \begin{aligned} F_Y(y)&=P(Y \le y) \\&=P(X^{-1} \le y) \\&=P(X \ge y^{-1}) \\&=S_X(y^{-1})=1-F_X(y^{-1})  \end{aligned}

    Y=X^{-1/\tau} (Inverse Transformed):
    \displaystyle \begin{aligned} F_Y(y)&=P(Y \le y) \\&=P(X^{-1/\tau} \le y)\\&=P(X^{-1} \le y^\tau) \\&=P(X \ge y^{-\tau}) \\&=S_X(y^{-\tau})=1-F_X(y^{-\tau})  \end{aligned}

Gamma CDF

Unlike the exponential distribution, the CDF of the gamma distribution does not have a closed form. Suppose that X is a random variable that has a gamma distribution with shape parameter \alpha and scale parameter \theta. The following is the expression of the gamma CDF.

    \displaystyle F_X(x)=\int_0^x \frac{1}{\Gamma(\alpha)} \ \frac{1}{\theta^\alpha} \ t^{\alpha-1} \ e^{- t/\theta} \ dt \ \ \ \ \ \ \ \ \ x>0

By a change of variable, the CDF can be expressed as the following integral.

    \displaystyle \begin{aligned} F_X(x)&=\int_0^{x/\theta} \frac{1}{\Gamma(\alpha)} \ u^{\alpha-1} \ e^{- u} \ du \ \ \ \ \ \ \ \ \ x>0 \\&=\frac{1}{\Gamma(\alpha)} \ \gamma(\alpha; x/\theta)  \end{aligned}

Note that \Gamma(\cdot) and \gamma(\beta;\cdot) are the gamma function and incomplete gamma function, respectively, defined as follows.

    \displaystyle \Gamma(\beta)=\int_0^\infty \ t^{\beta-1} \ e^{- t} \ dt

    \displaystyle \gamma(\beta; w)=\int_0^w \ t^{\beta-1} \ e^{- t} \ dt

The CDF F_X(x) can be evaluated numerically using software. When the gamma distribution is raised to a power, the resulting CDF will be defined as a function of F_X(x).
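For example, in Python the regularized lower incomplete gamma function is scipy.special.gammainc, which already divides \gamma(\alpha; \cdot) by \Gamma(\alpha). Below is a minimal sketch (the values \alpha=2.5 and \theta=3 are illustrative only) comparing the incomplete gamma expression with the gamma CDF from scipy.stats.

    from scipy import stats
    from scipy.special import gammainc

    alpha, theta = 2.5, 3.0                  # illustrative shape and scale
    x = 4.0

    # F_X(x) = gamma(alpha; x/theta) / Gamma(alpha)
    cdf_via_incomplete = gammainc(alpha, x / theta)
    cdf_via_stats = stats.gamma.cdf(x, a=alpha, scale=theta)
    print(cdf_via_incomplete, cdf_via_stats)   # the two values agree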

“Transformed” Gamma CDFs

The CDFs of the “transformed” gamma distributions do not have a closed form. Thus the CDFs are to be defined based on an integral or the incomplete gamma function, shown in the preceding section. We still use the two-step approach – first deriving the CDF without the scale parameter and then adding it at the end. Based on the two preceding sections, the following shows the CDFs of the three different cases.

Step 1. “Transformed” Gamma CDF (without scale parameter)

Transformed Y=X^{1 / \tau} \tau >0
F_Y(y)=F_X(y^\tau)
\displaystyle F_Y(y)=\int_0^{y^\tau} \frac{1}{\Gamma(\alpha)} \ u^{\alpha-1} \ e^{- u} \ du y >0
\displaystyle F_Y(y)=\frac{1}{\Gamma(\alpha)} \ \gamma(\alpha; y^\tau) y >0
Inverse Y=X^{1 / \tau} \tau=-1
F_Y(y)=S_X(y^{-1})=1-F_X(y^{-1})
\displaystyle F_Y(y)=1-\int_0^{y^{-1}} \frac{1}{\Gamma(\alpha)} \ u^{\alpha-1} \ e^{- u} \ du y >0
\displaystyle F_Y(y)=1-\frac{1}{\Gamma(\alpha)} \ \gamma(\alpha; y^{-1}) y >0
Inverse Transformed Y=X^{-1 / \tau} \tau >0
F_Y(y)=S_X(y^{-\tau})=1-F_X(y^{-\tau})
\displaystyle F_Y(y)=1-\int_0^{y^{-\tau}} \frac{1}{\Gamma(\alpha)} \ u^{\alpha-1} \ e^{- u} \ du y >0
\displaystyle F_Y(y)=1-\frac{1}{\Gamma(\alpha)} \ \gamma(\alpha; y^{-\tau}) y >0

Step 2. “Transformed” Gamma CDF (with scale parameter)

Transformed Y=X^{1 / \tau} \tau >0
\displaystyle F_Y(y)=\int_0^{(y/\theta)^\tau} \frac{1}{\Gamma(\alpha)} \ u^{\alpha-1} \ e^{- u} \ du y >0
\displaystyle F_Y(y)=\frac{1}{\Gamma(\alpha)} \ \gamma(\alpha; (y/\theta)^\tau) y >0
Inverse Y=X^{1 / \tau} \tau=-1
\displaystyle F_Y(y)=1-\int_0^{(y/\theta)^{-1}} \frac{1}{\Gamma(\alpha)} \ u^{\alpha-1} \ e^{- u} \ du y >0
\displaystyle F_Y(y)=1-\frac{1}{\Gamma(\alpha)} \ \gamma(\alpha; (y/\theta)^{-1}) y >0
Inverse Transformed Y=X^{-1 / \tau} \tau >0
\displaystyle F_Y(y)=1-\int_0^{(y/\theta)^{-\tau}} \frac{1}{\Gamma(\alpha)} \ u^{\alpha-1} \ e^{- u} \ du y >0
\displaystyle F_Y(y)=1-\frac{1}{\Gamma(\alpha)} \ \gamma(\alpha; (y/\theta)^{-\tau}) y >0

The transformed gamma distribution and the inverse transformed gamma distribution are three-parameter distributions with \alpha being a shape parameter, \theta being the scale parameter and \tau being the power to which the base gamma distribution is raised. The inverse gamma distribution has two parameters with \theta being the scale parameter and \alpha being the shape parameter (the same two parameters in the base gamma distribution). Computation of these CDFs would require the use of software that can evaluate the incomplete gamma function.

Another Way to Work with “Transformed” Gamma

The CDFs derived in the preceding section follow a two-step approach – first raising a gamma distribution with scale parameter equal to 1 to a power and then adding a scale parameter. The end result gives CDFs that are a function of the incomplete gamma function. Calculating a CDF would require using software that can evaluate the incomplete gamma function (or an equivalent integral). If the software that is used does not have the incomplete gamma function but has the gamma CDF (e.g. Excel), then there is another way of generating the “transformed” gamma CDF.

Observe that the CDFs in the last section are the results of raising a base gamma distribution with shape parameter \alpha and scale parameter \theta^\tau (transformed), shape parameter \alpha and scale parameter \theta^{-1} (inverse) and shape parameter \alpha and scale parameter \theta^{-\tau} (inverse transformed) to the appropriate power. Based on this observation, we can evaluate the CDFs and the moments. This section shows how to evaluate the CDFs using this approach. The next section shows how to evaluate the moments.

Transformed Gamma Distribution

Given a transformed gamma random variable Y with parameters \tau, \alpha (shape) and \theta (scale), note that Y=X^{1/\tau} where X has a gamma distribution with parameters \alpha (shape) and \theta^\tau (scale). Then F_Y(y)=F_X(y^\tau), where F_X(y^\tau) is evaluated using software that can evaluate the gamma CDF (e.g. Excel).

Inverse Gamma Distribution

Given an inverse gamma random variable Y with parameters \alpha (shape) and \theta (scale), note that Y=X^{-1} where X has a gamma distribution with parameters \alpha (shape) and \theta^{-1} (scale). Then F_Y(y)=S_X(y^{-1})=1-F_X(y^{-1}), where F_X(y^{-1}) is evaluated using software that can evaluate the gamma CDF (e.g. Excel).

Inverse Transformed Gamma Distribution

Given an inverse transformed gamma random variable Y with parameters \tau, \alpha (shape) and \theta (scale), note that Y=X^{-1/\tau} where X has a gamma distribution with parameters \alpha (shape) and \theta^{-\tau} (scale). Then F_Y(y)=S_X(y^{-\tau})=1-F_X(y^{-\tau}), where F_X(y^{-\tau}) is evaluated using software that can evaluate the gamma CDF (e.g. Excel).

Example 1 below uses Excel to compute the transformed gamma CDF.
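
For readers who prefer a programming language over Excel, the same base-gamma-CDF approach can be sketched in Python with SciPy (an assumption on the available tooling; the post itself uses Excel). The three functions below mirror the three cases described above.

    from scipy.stats import gamma

    def transformed_gamma_cdf(y, tau, alpha, theta):
        # Y = X^(1/tau) where X is gamma with shape alpha and scale theta^tau
        return gamma.cdf(y ** tau, alpha, scale=theta ** tau)

    def inverse_gamma_cdf(y, alpha, theta):
        # Y = X^(-1) where X is gamma with shape alpha and scale theta^(-1)
        return 1 - gamma.cdf(1 / y, alpha, scale=1 / theta)

    def inverse_transformed_gamma_cdf(y, tau, alpha, theta):
        # Y = X^(-1/tau) where X is gamma with shape alpha and scale theta^(-tau)
        return 1 - gamma.cdf(y ** (-tau), alpha, scale=theta ** (-tau))

    # F_Y(6.75) for the transformed gamma distribution in Example 1 below (about 0.663)
    print(transformed_gamma_cdf(6.75, 2, 2.5, 4))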

“Transformed” Gamma Moments

Following the idea in the preceding section, the moments of the “transformed” gamma distributions can be derived from gamma moments with the appropriate parameters (see here for the gamma moments). The following table shows the results.

“Transformed” Gamma Moments

Base Gamma Parameters Moments
Transformed \alpha and \theta^\tau \displaystyle E(Y^k)=E(X^{k/\tau})=\frac{\theta^k \Gamma(\alpha+k/\tau)}{\Gamma(\alpha)}
k>- \alpha \ \tau
Inverse \alpha and \theta^{-1} \displaystyle E(Y^k)=E(X^{-k})=\frac{\theta^k \Gamma(\alpha-k)}{\Gamma(\alpha)}
k<\alpha
Inverse Transformed \alpha and \theta^{-\tau} \displaystyle E(Y^k)=E(X^{-k/\tau})=\frac{\theta^k \Gamma(\alpha-k/\tau)}{\Gamma(\alpha)}
k<\alpha \ \tau

Note that the moments for the transformed gamma distribution exist for all positive k. The moments for the inverse gamma distribution are limited (capped by the shape parameter \alpha). The moments for the inverse transformed gamma distribution are also limited, this time by the product \alpha \tau of the two shape parameters. The computation of these moments is usually done by software, except when the argument of the gamma function is a positive integer.
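
If a gamma function routine is available (for example math.gamma in Python, used here only as an illustration), the three moment formulas in the table translate directly into code. The following is a minimal sketch.

    from math import gamma as G   # the gamma function

    def transformed_gamma_moment(k, tau, alpha, theta):
        # E(Y^k) = theta^k Gamma(alpha + k/tau) / Gamma(alpha), valid for k > -alpha*tau
        return theta ** k * G(alpha + k / tau) / G(alpha)

    def inverse_gamma_moment(k, alpha, theta):
        # E(Y^k) = theta^k Gamma(alpha - k) / Gamma(alpha), valid for k < alpha
        return theta ** k * G(alpha - k) / G(alpha)

    def inverse_transformed_gamma_moment(k, tau, alpha, theta):
        # E(Y^k) = theta^k Gamma(alpha - k/tau) / Gamma(alpha), valid for k < alpha*tau
        return theta ** k * G(alpha - k / tau) / G(alpha)

    # mean of the transformed gamma distribution used in Example 1 below (about 6.018)
    print(transformed_gamma_moment(1, 2, 2.5, 4))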

“Transformed” Gamma PDFs

Once the CDFs are known, the PDFs are derived by taking derivatives. The following table gives the three PDFs.

“Transformed” Gamma PDFs

Transformed \displaystyle f_Y(y)=\frac{\tau}{\Gamma(\alpha)} \ \biggl( \frac{y}{\theta} \biggr)^{\tau \alpha} \ \frac{1}{y} \ \exp(-(y/\theta)^\tau) y>0
\text{ }
Inverse \displaystyle f_Y(y)=\frac{1}{\Gamma(\alpha)} \ \biggl( \frac{\theta}{y} \biggr)^{\alpha} \ \frac{1}{y} \ \exp(-\theta/y) y>0
\text{ }
Inverse Transformed \displaystyle f_Y(y)=\frac{\tau}{\Gamma(\alpha)} \ \biggl( \frac{\theta}{y} \biggr)^{\tau \alpha} \ \frac{1}{y} \ \exp(-(\theta/y)^\tau) y>0

Each PDF is obtained by taking derivative of the integral for the corresponding CDF.

Example

The post concludes with one example demonstrating the calculation of the CDF and percentiles using the gamma distribution function in Excel.

Example 1
The size of a collision claim from a large pool of auto insurance policies has a transformed gamma distribution with parameters \tau=2, \alpha=2.5 and \theta=4. Determine the following.

  • The probability that a randomly selected claim is greater than 6.75.
  • The probability that a randomly selected claim is between 4.25 and 8.25.
  • The median size of a claim from this pool of insurance policies.
  • The mean and variance of a claim from this pool of insurance policies.
  • The probability that a randomly selected claim is within one standard deviation of the mean claim size.
  • The probability that a randomly selected claim is within two standard deviations of the mean claim size.

Since \tau=2, this is a transformed gamma distribution. There are two ways to get a handle on this distribution. One way is to plug the three parameters \tau=2, \alpha=2.5 and \theta=4 into the transformed gamma CDF (in the table Step 2. “Transformed” Gamma CDF (with scale parameter)). This would require software that can evaluate the incomplete gamma function or its equivalent integrals. The other way is to recognize that this distribution is the result of raising the gamma distribution with shape parameter \alpha=2.5 and scale parameter 4^2=16 to the power of 1/2. We take the latter approach.

In Excel, =GAMMA.DIST(A1,B1,C1,TRUE) is the function that produces the gamma CDF F_X(x), assuming that the value x is in cell A1, the shape parameter is in cell B1 and the scale parameter is in cell C1. When the last parameter is TRUE, it gives the CDF. If it is FALSE, it gives the PDF. For example, =GAMMA.DIST(2.5, 2, 1, TRUE) gives the value of 0.712702505.

The transformed gamma distribution in question (random variable Y) is the result of raising gamma \alpha=2.5 and scale parameter 16 (random variable X) to 1/2. Thus the CDF F_Y(y) is obtained from evaluating the gamma CDF F_X(y^2), which is =GAMMA.DIST(y^2, 2.5, 16, TRUE). As a result, the following gives the answers for the first two bullet points.

    P(Y>6.75)=1-F_Y(6.75)=1-F_X(6.75^2)=1-0.663=0.337
    =GAMMA.DIST(6.75^2, 2.5, 16, TRUE)

    P(4.25 \le Y \le 8.25)=F_X(8.25^2)-F_X(4.25^2)=0.8696-0.1876=0.6820

The median or other percentiles of transformed gamma distribution are obtained by a trial and error approach, i.e. by plugging in values of y into the gamma CDF =GAMMA.DIST(y^2, 2.5, 16, TRUE). After performing the trial and error process, we see that F_Y(5.9001425)=0.5 and F_Y(5.9101425)=0.502020715. Thus we take the median to be 5.9. So the median size of a claim is around 5.9.

Computation of the moments of a gamma distribution requires the evaluation of the gamma function \Gamma(\cdot). Excel does not have an explicit function for the gamma function. Since \Gamma(\cdot) appears in the gamma PDF, we can derive the gamma function from the gamma PDF in Excel, =GAMMA.DIST(1, a, 1, FALSE). The value of this gamma PDF is e^{-1}/\Gamma(a). Thus the value of \Gamma(a) is e^{-1} divided by this gamma PDF value. For example, the following Excel formulas give \Gamma(5) and \Gamma(2.5).

    =EXP(-1)/GAMMA.DIST(1, 5, 1, FALSE) = 24

    =EXP(-1)/GAMMA.DIST(1, 2.5, 1, FALSE) = 1.329340388

The moment E(Y^k) is E(X^{k/2}). Based on the results in the section on “Transformed” gamma moments, the following gives the mean and variance.

    \displaystyle E(Y)=E(X^{1/2})=\frac{16^{1/2} \ \Gamma(2.5+1/2)}{\Gamma(2.5)}=6.018022225

    \displaystyle E(Y^2)=E(X)=2.5 \cdot 16=40

    Var(Y)=40-6.018022225^2=3.783408505

    \sigma_Y=3.783408505^{0.5}=1.945098585

The following gives the probability of a claim within one or two standard deviations of the mean.

    \displaystyle \begin{aligned} P(\mu-\sigma<Y<\mu+\sigma)&=F_Y(\mu+\sigma)-F_Y(\mu-\sigma) \\&=F_Y(7.963120809)-F_Y(4.07292364) \\&=0.839661869-0.161128135 \\&=0.678533734 \end{aligned}

    \displaystyle \begin{aligned} P(\mu-2 \sigma<Y<\mu+2 \sigma)&=F_Y(\mu+2 \sigma)-F_Y(\mu-2 \sigma) \\&=F_Y(9.908219394)-F_Y(2.127825055) \\&=0.968750102-0.010491099 \\&=0.958259003 \end{aligned}
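
For readers not working in Excel, the entire example can be reproduced with a short Python/SciPy sketch (an alternative tool, not the one used above); the printed values should match the figures above up to rounding.

    from math import gamma as G, sqrt
    from scipy.stats import gamma

    tau, alpha, theta = 2, 2.5, 4
    base = gamma(alpha, scale=theta ** tau)    # X ~ gamma(2.5, scale 16); Y = X^(1/2)

    F = lambda y: base.cdf(y ** 2)             # F_Y(y) = F_X(y^2)

    print(1 - F(6.75))                         # P(Y > 6.75), about 0.337
    print(F(8.25) - F(4.25))                   # P(4.25 <= Y <= 8.25), about 0.682
    print(sqrt(base.ppf(0.5)))                 # median of Y, about 5.9

    mean = theta * G(alpha + 0.5) / G(alpha)   # E(Y) = 16^(1/2) Gamma(3) / Gamma(2.5)
    var = alpha * theta ** tau - mean ** 2     # E(Y^2) = E(X) = 2.5 * 16 = 40
    sd = sqrt(var)
    print(F(mean + sd) - F(mean - sd))         # within one standard deviation, about 0.679
    print(F(mean + 2 * sd) - F(mean - 2 * sd)) # within two standard deviations, about 0.958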

\text{ }

\text{ }

\text{ }

\copyright 2017 – Dan Ma

Transformed exponential distributions

The process of creating distributions from existing ones is an important topic in the study of probability models. Such processes expand the tool kit in the modeling process. Two examples: new distributions can be generated by taking independent sums of old ones or by mixing distributions (the result would be called a mixture). Another way to generate distributions is by raising a distribution to a power, which is the subject of this post. Start with a random variable X (the base distribution). Then raising it to a constant power generates a new distribution. In this post, the base distribution is the exponential distribution. The next post discusses transforming the gamma distribution.

______________________________________________________________________________

Raising to a Power

Let X be a random variable. Let \tau be a nonzero constant. The new distribution is generated when X is raised to the power of 1 / \tau. Thus the random variable Y=X^{1 / \tau} is the subject of the discussion in this post.

When \tau >0, the distribution for Y=X^{1 / \tau} is called transformed. When \tau=-1, the distribution for Y=X^{1 / \tau} is called inverse. When \tau <0 and \tau \ne -1, the distribution for Y=X^{1 / \tau} is called inverse transformed.

If the base distribution is exponential, then raising it to 1 / \tau would produce a transformed exponential distribution for the case of \tau >0, an inverse exponential distribution for the case of \tau=-1 and an inverse transformed exponential distribution for the case \tau <0 with \tau \ne -1. If the base distribution is a gamma distribution, the three new distributions would be transformed gamma distribution, inverse gamma distribution and inverse transformed gamma distribution.

For the case of inverse transformed, we make the random variable Y=X^{-1 / \tau} by letting \tau >0. The following summarizes the definition.

Name of Distribution Parameter \tau Random Variable
Transformed \tau >0 Y=X^{1 / \tau}
Inverse \tau=-1 Y=X^{1 / \tau}
Inverse Transformed \tau >0 Y=X^{-1 / \tau}

______________________________________________________________________________

Transforming Exponential

The “transformed” distributions discussed here have two parameters, \tau and \theta (the inverse exponential distribution has only the parameter \theta; it corresponds to \tau=1 in the inverse transformed case). The parameter \tau is the shape parameter, which comes from the exponent 1 / \tau. The scale parameter \theta is added after raising the base distribution to a power.

Let X be the random variable for the base exponential distribution. The following shows the information on the base exponential distribution.

Base Exponential
Density Function f_X(x)=e^{-x} \ \ \ \ \ \ \ \ \ \ x>0
CDF F_X(x)=1-e^{-x} \ \ \ \ x>0
Survival Function S_X(x)=e^{-x} \ \ \ \ \ \ \ \ \ \ x>0

Note that the above density function and CDF do not have the scale parameter. Once the base distribution is raised to a power, the scale parameter will be added to the newly created distribution.

The following gives the CDF and the density function of the transformed exponential distribution. The density function is obtained by taking the derivative of the CDF.

    \displaystyle \begin{aligned} F_Y(y)&=P(Y \le y) \\&=P(X^{1 / \tau} \le y) \\&=P(X \le y^\tau)\\&=F_X(y^\tau) \\&=1-e^{- y^\tau} \ \ \ \ \ \ \ \ \ \ \ \ \ y>0 \end{aligned}

    \displaystyle f_Y(y)=\tau \ y^{\tau-1} \ e^{- y^\tau} \ \ \ \ \ \ \ \ \ y>0

The following gives the CDF and the density function of the inverse exponential distribution.

    \displaystyle \begin{aligned} F_Y(y)&=P(Y \le y) \\&=P(X^{-1} \le y) \\&=P(X \ge 1/y)\\&=S_X(1/y) \\&=e^{- 1/y} \ \ \ \ \ \ \ \ \ \ \ \ \ y>0 \end{aligned}

    \displaystyle f_Y(y)=\frac{1}{y^2} \ e^{- 1/y} \ \ \ \ \ \ \ \ \ y>0

The following gives the CDF and the density function of the inverse transformed exponential distribution.

    \displaystyle \begin{aligned} F_Y(y)&=P(Y \le y) \\&=P(X^{- 1 / \tau} \le y) \\&=P(X \ge y^{- \tau})\\&=S_X(y^{- \tau}) \\&=e^{- y^{- \tau}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ y>0 \end{aligned}

    \displaystyle f_Y(y)= \frac{\tau}{y^{\tau+1}} \ e^{- 1/y^\tau} \ \ \ \ \ \ \ \ \ y>0

The above derivation does not involve the scale parameter. Now it is added to the results.

Transformed Distribution
Transformed Exponential
CDF F_Y(y)=1-e^{- (y/\theta)^\tau} y>0
Survival Function S_Y(y)=e^{- (y/\theta)^\tau} y>0
Density Function f_Y(y)=(\tau / \theta) \ (y/\theta)^{\tau-1} \ e^{- (y/\theta)^\tau} y>0
Inverse Exponential
CDF F_Y(y)=e^{- \theta/y} y>0
Survival Function S_Y(y)=1-e^{- \theta/y} y>0
Density Function f_Y(y)=\frac{\theta}{y^2} \ e^{- \theta/y} y>0
Inverse Transformed Exponential
CDF F_Y(y)=e^{- (\theta/y)^{\tau}} y>0
Survival Function S_Y(y)=1-e^{- (\theta/y)^{\tau}} y>0
Density Function f_Y(y)=\tau ( \theta / y )^\tau \ (1/y) \ e^{- (\theta/y)^{\tau}} y>0

The transformed exponential distribution and the inverse transformed distribution have two parameters \tau and \theta. The inverse exponential distribution has only one parameter \theta. The parameter \theta is the scale parameter. The parameter \tau, when there is one, is the shape parameter and it comes from the exponent when the exponential is raised to a power.

The above transformation starts with the exponential distribution with mean 1 (without the scale parameter) and the scale parameter \theta is added back in at the end. We can also accomplish the same result by starting with an exponential variable X with mean (scale parameter) \theta^\tau. Then raising X to 1/\tau, -1, and -1/\tau would generate the three distributions described in the above table. In this process, the scale parameter \theta is baked into the base distribution. This makes it easier to obtain the moments of the “transformed” exponential distributions since the moments would be derived from exponential moments.
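
The table above can be checked with a quick simulation. The following Python/NumPy sketch (an assumption on the available tooling; parameter values are arbitrary) raises simulated exponential values to the power 1/\tau, multiplies by the scale parameter \theta, and compares the empirical probability with the transformed exponential CDF.

    import numpy as np

    rng = np.random.default_rng(0)
    tau, theta = 2.0, 3.0

    x = rng.exponential(scale=1.0, size=100_000)   # base exponential with mean 1
    y = theta * x ** (1 / tau)                     # transformed exponential (scale parameter added)

    t = 2.5
    print(np.mean(y <= t))                         # empirical P(Y <= t)
    print(1 - np.exp(-(t / theta) ** tau))         # F_Y(t) = 1 - exp(-(t/theta)^tau)
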
______________________________________________________________________________

Connection with Weibull Distribution

Compare the density function for the transformed exponential distribution with the density of the Weibull distribution discussed here. Note that the two are identical. Thus raising an exponential distribution to 1 / \tau where \tau >0 produces a Weibull distribution.

On the other hand, raising a Weibull distribution to -1 produces an inverse Weibull distribution (by definition). Let F_X(x)=1-e^{- x^\tau} be the CDF of the base Weibull distribution where \tau >0. Let’s find the CDF of Y=X^{-1} and then add the scale parameter.

    \displaystyle F_Y(y)=P(Y \le y)=P(X \ge 1 / y)=e^{- (1/y)^\tau}

    \displaystyle F_Y(y)=e^{- (\theta/y)^\tau} (scale parameter added)

Note that the CDF of the inverse Weibull distribution is identical to the one for the inverse transformed exponential distribution. Thus the transformed exponential distribution is identical to a Weibull distribution and the inverse transformed exponential distribution is identical to an inverse Weibull distribution.
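
For readers using SciPy (an assumption; not part of the original discussion), the identification can also be checked numerically: scipy.stats.weibull_min has the transformed exponential CDF and scipy.stats.invweibull has the inverse transformed exponential CDF. A minimal sketch:

    import numpy as np
    from scipy.stats import weibull_min, invweibull

    tau, theta, y = 2.0, 3.0, 2.5
    # Weibull CDF vs transformed exponential CDF
    print(weibull_min.cdf(y, tau, scale=theta), 1 - np.exp(-(y / theta) ** tau))
    # inverse Weibull CDF vs inverse transformed exponential CDF
    print(invweibull.cdf(y, tau, scale=theta), np.exp(-(theta / y) ** tau))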

Since the Weibull distribution is the same as the transformed exponential distribution, the previous post on the Weibull distribution can inform us about the transformed exponential distribution. For example, assuming that the Weibull distribution (or transformed exponential) is a model for the time until death of a life, varying the shape parameter \tau yields different mortality patterns. The following are two graphics from the previous post.

Figure 1 – Weibull density functions

Figure 2 – Weibull failure rates

Figure 1 shows the Weibull density functions for different values of the shape parameter (the scale parameter \theta is fixed at 1). The curve for \tau=1 is the exponential density curve. It is clear that the green density curve (\tau=2) approaches the x-axis at a faster rate than the other two curves and thus has a lighter tail than the other two density curves. In general, the Weibull (transformed exponential) distribution with shape parameter \tau >1 has a lighter tail than the Weibull with shape parameter 0<\tau <1.

Figure 2 shows the failure rates for the Weibull (transformed exponential) distributions with the same three values of \tau. Note that the failure rate for \tau=0.5 (blue) decreases over time and the failure rate for \tau=2 increases over time. The failure rate for \tau=1 is constant since it is the exponential distribution.

What is being displayed in Figure 2 describes a general pattern. When the shape parameter is 0<\tau<1, the failure rate decreases as time increases and the Weibull (transformed exponential) distribution is a model for infant mortality, or early-life failures. Hence these Weibull distributions have a thicker tail as shown in Figure 1.

When the shape parameter is \tau >1, the failure rate increases as time increases and the Weibull (transformed exponential) distribution is a model for wear-out failures. As time goes by, the lives are fatigued and “die off.” Hence these Weibull distributions have a lighter tail as shown in Figure 1.

When \tau=1, the resulting Weibull (transformed exponential) distribution is exponential. The failure rate is constant and it is a model for random failures (failures that are independent of age).

Thus the transformed exponential family has a great deal of flexibility for modeling the failures of objects (machines, devices).

______________________________________________________________________________

Moments and Other Distributional Quantities

The moments for the three “transformed” exponential distributions are based on the gamma function. The two inverse distributions have limited moments. Since the transformed exponential distribution is identical to the Weibull distribution, its moments are identical to those of the Weibull distribution. The moments of the “transformed” exponential distributions are exponential moments: E(Y^k)=E(X^{k / \tau}) where X has an exponential distribution with mean (scale parameter) \theta^\tau in the transformed case, E(Y^k)=E(X^{-k}) with mean \theta^{-1} in the inverse case, and E(Y^k)=E(X^{-k/\tau}) with mean \theta^{-\tau} in the inverse transformed case. See here for the information on exponential moments. The following shows the moments of the “transformed” exponential distributions.

Name of Distribution Moment
Transformed Exponential E(Y^k)=\theta^k \Gamma(1+k/\tau) k >- \tau
Inverse Exponential E(Y^k)=\theta^k \Gamma(1-k) k <1
Inverse Transformed Exponential E(Y^k)=\theta^k \Gamma(1-k/\tau) k <\tau

The function \Gamma(\cdot) is the gamma function. The transformed exponential moment E(Y^k) exists for all k >- \tau. The moments are limited for the other two distributions. The first moment E(Y) does not exist for the inverse exponential distribution. The inverse transformed exponential moment E(Y^k) exists only for k<\tau. Thus the inverse transformed exponential mean and variance exist only if the shape parameter \tau is larger than 2.

The distributional quantities that are based on moments (e.g. variance, skewness and kurtosis) can be calculated when the moments are available. For all three “transformed” exponential distributions, percentiles are easily computed since the CDFs contain only one instance of the unknown y (a short derivation follows the table below). The following gives the mode of the three distributions.

Name of Distribution Mode
Transformed Exponential \displaystyle \theta \biggl(\frac{\tau-1}{\tau} \biggr)^{1/\tau} for \tau >1, else 0
Inverse Exponential \theta / 2
Inverse Transformed Exponential \displaystyle \theta \biggl(\frac{\tau}{\tau+1} \biggr)^{1/\tau}
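
For example, for the transformed exponential distribution, setting the CDF equal to p and solving for y gives the 100pth percentile; the other two CDFs can be inverted in the same one-step fashion.

    \displaystyle 1-e^{-(y_p/\theta)^\tau}=p \ \ \ \Longrightarrow \ \ \ y_p=\theta \ \biggl[-\log(1-p)\biggr]^{1/\tau}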

\text{ }

\text{ }

\text{ }

\copyright 2017 – Dan Ma

Pareto Distribution

The Pareto distribution is a power law probability distribution. It was named after the Italian civil engineer, economist and sociologist Vilfredo Pareto, who was the first to discover that income follows what is now called Pareto distribution, and who was also known for the 80/20 rule, according to which 20% of all the people receive 80% of all income. This post is a discussion on the mathematical properties of this distribution and its applications.

_______________________________________________________________________________________________

Pareto Distribution of Type I

There are several types of the Pareto distribution. Let’s start with Type I. The random variable X is said to follow a Type I Pareto distribution if the following is the survival function,

    \displaystyle  S(x)=P(X>x)=\left\{ \begin{array}{ll}                     \displaystyle  \biggl( \frac{x_m}{x} \biggr)^\alpha &\ x \ge x_m \\           \text{ } & \text{ } \\           \displaystyle  1 &\ x<x_m           \end{array} \right.

where x_m and \alpha are both positive parameters. The support of the distribution is the interval [x_m,\infty). The parameter x_m is a scale parameter and \alpha is a shape parameter. The parameter \alpha is also known as the tail index. When the Pareto distribution is used as a model of wealth or income, \alpha is also known as the Pareto index, which is a measure of the breadth of the wealth distribution.

The following table lists out the cumulative distribution function (CDF) and the probability density function (PDF).

__________________________________________________________________________________________
Pareto Type I – Probability Functions

Survival Function S(x)=\displaystyle  \biggl( \frac{x_m}{x} \biggr)^\alpha \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x \ge x_m>0
Cumulative Distribution Function F(x)=1-\displaystyle  \biggl( \frac{x_m}{x} \biggr)^\alpha \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x \ge x_m>0
Probability Density Function \displaystyle f(x)=\frac{\alpha \ x_m^\alpha}{x^{\alpha+1}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \  x \ge x_m>0

__________________________________________________________________________________________

The following figure shows the graphs of the PDFs for the shape parameters \alpha=1,2,3.

Figure 1 – Pareto PDFs (Type I)

All the density curves in Figure 1 are skewed to the right and have a long tail. However, some tails are thicker than others. It is noticeable that the curve with a higher value of \alpha approaches the x-axis faster, hence has a lighter tail compared with the density curve with a lower value of \alpha. The role of \alpha is discussed further below. The following table lists out several more Pareto distributional quantities.

__________________________________________________________________________________________
Pareto Type I – Additional Distributional Quantities

Mean \displaystyle E(X)=\frac{\alpha \ x_m}{\alpha-1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \alpha>1
Median \displaystyle x_m \ 2^{\frac{1}{\alpha}}
Mode x_m
Variance \displaystyle Var(X)=\frac{x_m^2 \ \alpha}{(\alpha-1)^2 \ (\alpha-2)} \ \ \ \ \ \ \ \alpha>2
Higher Moments \displaystyle E(X^k)=\frac{\alpha \ x_m^k}{\alpha-k} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \alpha>k
Skewness \displaystyle \frac{2(1+\alpha)}{\alpha-3} \ \sqrt{\frac{\alpha-2}{\alpha}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \alpha>3
Excess Kurtosis \displaystyle \frac{6(\alpha^3+\alpha^2-6 \alpha-2)}{\alpha (\alpha-3) (\alpha-4)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \alpha>4

__________________________________________________________________________________________

Given the survival function, it is straightforward to derive the CDF and the PDF. The mean and higher moments can also be derived by evaluating the integral \int_{x_m}^\infty x^k f(x) dx. Once the moments are obtained, other quantities that depend on moments can be derived (e.g. variance, skewness and excess kurtosis). The following gives the definition for these distributional quantities.

Distributional Quantity Definition
Moments \displaystyle E(X^k)=\int_{x_m}^\infty x^k f(x) \ dx
Variance Var(X)=E(X^2)-E(X)^2
Skewness \displaystyle \gamma_1=\frac{E[(X-\mu)^3]}{\sigma^3}=\frac{E(X^3)-3 \mu \sigma^2 - \mu^3}{(\sigma^2)^{\frac{3}{2}}}
Kurtosis \displaystyle \frac{E[(X-\mu)^4]}{\sigma^4}
Excess Kurtosis Kurtosis – 3

In these definitions, \mu and \sigma are the mean and standard deviation of a given distribution, respectively. Then \sigma^2 is the variance. The skewness of a distribution is the ratio of the third central moment E[(X-\mu)^3] to the cube of the standard deviation. The kurtosis is the ratio of the fourth central moment E[(X-\mu)^4] to the square of the variance. This previous post has a detailed discussion on the skewness.
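
As an illustration, evaluating the moment integral in the table above for the Pareto Type I distribution recovers the formula for the higher moments listed earlier:

    \displaystyle E(X^k)=\int_{x_m}^\infty x^k \ \frac{\alpha \ x_m^\alpha}{x^{\alpha+1}} \ dx=\alpha \ x_m^\alpha \int_{x_m}^\infty x^{k-\alpha-1} \ dx=\frac{\alpha \ x_m^k}{\alpha-k} \ \ \ \ \ \ \ \ \ \alpha>k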

_______________________________________________________________________________________________

A Closer Look at the Shape Parameter

The above tables show that the Pareto distribution is mathematically tractable, especially when it comes to the calculation of moments. Another observation is that the mean and other moments do not always exist. This stems from the fact that when the shape parameter \alpha is too small, the integral for the moment E(X^k) may not converge.

The mean exists only when the shape parameter \alpha is greater than 1. The variance exists only when the shape parameter \alpha is greater than 2. In general, the kth moment exists only when the shape parameter \alpha is greater than k. The larger the shape parameter \alpha, the more moments that can be calculated. All kth moments where k<\alpha can be calculated. However, the kth moment for any k \ge \alpha cannot be calculated.

Having moments that cannot exist is a sign that the distribution has a heavy tail. Let’s examine the graphs of Pareto survival functions and CDFs.

Figure 2 – Pareto Survival Functions (Type I)

Figure 2 shows the survival function S(x)=P(X>x) for three values of the shape parameter \alpha where x>1 (the scale parameter is 1). The following figure shows the corresponding cumulative distributions F(x)=1-S(x)=P(X \le x).

Figure 3 – Pareto CDFs (Type I)

The survival function S(x) is the probability of the right tail (x,\infty). On the other hand, the CDF F(x) is the probability put on the initial interval (x_m,x]. The sum of the two is obviously 1.0. One thing that stands out in Figure 2 is that the larger the \alpha, the faster the survival curve approaches zero and thus the less probability is put on the right tail. In other words, more probability is attached to the lower values, and thus the integral for the moments is more likely to converge when \alpha is larger. This explains why more of the moments exist for a Pareto distribution with a larger \alpha, confirming the earlier observation.

Another thing to point out in Figure 2 is that the distribution with the larger \alpha has a lighter right tail and the one with a smaller \alpha has a heavier right tail. So within the Pareto family, a lower \alpha means a distribution with a heavier tail and a larger \alpha means a lighter tail.

A comparison with other families of distributions is also instructive. All moments exist for the gamma distributions (including exponential distributions) and for the lognormal distribution as well as the normal distribution. Moment generating functions also exist for all these distributions. In contrast, the moment generating function does not exist for Pareto distributions (otherwise all moments would exist). These are signs that the Pareto distributions are heavy-tailed distributions. For a more in-depth discussion of the tail weight of the Pareto family, see this blog post in an affiliated blog. The Pareto distribution discussed there is of Pareto Type II.

When the Pareto model is used as a model of the lifetime of systems (machines or devices), a larger value of the shape parameter \alpha would mean that fewer “lives” survive to old ages, or equivalently that more lives die off at relatively young ages (as discussed above, this means a lighter right tail). If the Pareto model is used as a model of income or wealth of individuals, then a higher \alpha would mean a smaller proportion of the people are in the higher income brackets (or more people in the lower income ranges). Thus the shape parameter \alpha is called the Pareto index, which is a measure of the breadth of income/wealth. The higher this measure, the less inequality in income.

_______________________________________________________________________________________________

Log-Linear Model

We now discuss the motivation behind the Pareto survival function. The Pareto distribution is a power law distribution. It is a model that can describe phenomena that behave in a log-linear fashion. Let’s revisit the original reasoning for using the Pareto survival function as a model of income.

Let N(x) be the number of people with income greater than x. Suppose that x_0 is the minimum income in the population in question. Then N(x_0) is the size of the entire population. Pareto proposed that N(x) can be modeled in a log-linear fashion:

    \log N(x)=\log C - \alpha \log x

where log is logarithm to the base e, C is a constant and \alpha is a positive parameter. If this relation holds, it would hold at the minimum income level x_0.

    \log N(x_0)=\log C - \alpha \log x_0

Letting the first relation subtract the second gives the following:

    \displaystyle \log \biggl[ \frac{N(x)}{N(x_0)} \biggr]=- \alpha \log \biggl[ \frac{x}{x_0} \biggr]

Raising the natural log constant e to each side gives the following:

    \displaystyle \frac{N(x)}{N(x_0)}=\biggl( \frac{x}{x_0} \biggr)^{- \alpha}

    \displaystyle \frac{N(x)}{N(x_0)}=\biggl( \frac{x_0}{x} \biggr)^{\alpha}

Note that the left hand side of the last equation is the proportion of the people having income greater than x, which of course is the survival function described at the beginning.

_______________________________________________________________________________________________

The Hierarchy of Pareto Distribution

The Pareto survival function discussed above is of Type I. The following lists out the survival functions of the other types.

__________________________________________________________________________________________
Pareto Distributions

Pareto Type Survival Function Support Parameters
Type I \displaystyle  \biggl[ \frac{x_m}{x} \biggr]^\alpha x>x_m \alpha>0, x_m>0
Lomax \displaystyle  \biggl[ \frac{x_m}{x+x_m} \biggr]^\alpha x>0 \alpha>0, x_m>0
Type II \displaystyle  \biggl[\frac{x_m}{(x-\mu)+x_m} \biggr]^\alpha x>\mu \mu, \alpha>0, x_m>0
Type III \displaystyle  \biggl[\frac{(x_m)^{\frac{1}{\gamma}}  }{(x-\mu)^{\frac{1}{\gamma}}+(x_m)^{\frac{1}{\gamma}}} \biggr] x>\mu \mu, \gamma>0, x_m>0
Type IV \displaystyle  \biggl[\frac{(x_m)^{\frac{1}{\gamma}}  }{(x-\mu)^{\frac{1}{\gamma}}+(x_m)^{\frac{1}{\gamma}}} \biggr]^\alpha x>\mu \mu, \alpha>0, \gamma>0, x_m>0

__________________________________________________________________________________________

The lower types are special cases of the higher types. For example, Lomax is Type I shifted to the left by the amount x_m. Type II with \mu=0 becomes Lomax. Type III with \gamma=1 becomes Type II with \alpha=1. Type IV with \alpha=1 becomes Type III.

We discuss Type II Lomax in the next section. For the other types, see the Pareto Wikipedia entry.

_______________________________________________________________________________________________

Pareto Type II Lomax

The Pareto distribution of Lomax type is the result of shifting Type I to the left by the amount x_m, the scale parameter in Pareto Type I. As a result, the support is now the entire positive x-axis. Some of the mathematical properties of the Lomax Type can be derived by making the appropriate shifting. For the sake of completeness, the following table lists out some of the basic distributional quantities. The scale parameter x_m is renamed \theta.

__________________________________________________________________________________________
Pareto Type II Lomax – Distributional Quantities

Survival Function S(x)=\displaystyle  \biggl( \frac{\theta}{x+\theta} \biggr)^\alpha  \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x >0
Cumulative Distribution Function F(x)=1-\displaystyle  \biggl( \frac{\theta}{x+\theta} \biggr)^\alpha \ \ \ \ \ \ \ \ \ \ \ \ \ x >0
Probability Density Function \displaystyle f(x)=\frac{\alpha \ \theta^\alpha}{(x+\theta)^{\alpha+1}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \  x >0
Mean \displaystyle E(X)=\frac{\theta}{\alpha-1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \alpha>1
Median \displaystyle \theta \ 2^{\frac{1}{\alpha}}-\theta
Mode 0
Variance \displaystyle Var(X)=\frac{\theta^2 \ \alpha}{(\alpha-1)^2 \ (\alpha-2)} \ \ \ \ \ \ \ \alpha>2
Higher Moments \displaystyle E(X^k)=\frac{k! \ \theta^k}{(\alpha-1) \cdots (\alpha-k)} \ \ \ \ \ \ \alpha>k \ \ \ k is integer

To help see the shifting, let Y be a Pareto Type I random variable with shape parameter \alpha and scale parameter \theta. Then X=Y-\theta is a Pareto Type II Lomax random variable. Immediately, E(X)=E(Y)-\theta, which simplifies to \frac{\theta}{\alpha-1}. On the other hand, shifting by a constant does not change the variance. If S_Y(x) and S_X(x) represent the survival functions for Y and X, respectively, then S_X(x)=S_Y(x+\theta). The same can be said about the CDFs and PDFs.

Another interesting fact about the Pareto Lomax type is that it is a mixture of exponential distributions with gamma mixing weights. An insurance interpretation is a good motivation. Suppose that the loss arising from an insured randomly selected from a large group of insureds follows an exponential distribution with the following probability density function:

    f_{X \lvert \Lambda}(x \lvert \lambda)= \lambda \ e^{-\lambda x} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x>0

The above density function is that of an exponential distribution. However it is a conditional one since the parameter \Lambda=\lambda is uncertain. Since the density function f_{X \lvert \Lambda}(x \lvert \lambda) is that of an exponential distribution, the mean claim cost for this insured is \frac{1}{\lambda}. So the parameter \Lambda=\lambda reflects the risk characteristics of the insured. Suppose this is a large pool of insureds. Then there is uncertainty in the parameter \Lambda=\lambda. It is more appropriate to regard \Lambda as a random variable in order to capture the wide range of risk characteristics across the individuals in the population. As a result, the pdf indicated above is not an unconditional pdf, but, rather, a conditional pdf of X. Suppose that the uncertain parameter \Lambda follows a gamma distribution with shape parameter \alpha and rate parameter \theta (i.e. scale parameter 1/\theta) with the following PDF.

    \displaystyle g(\lambda)=\frac{\theta^\alpha}{\Gamma(\alpha)} \ \lambda^{\alpha-1} \ e^{-\theta \lambda} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \lambda>0

where \Gamma(\cdot) is the gamma function. The gamma distribution has been discussed extensively in this blog. Here is a post on the gamma function and here is an introduction to the gamma distribution.

The unconditional density function of X is then the weighted average of the conditional density f_{X \lvert \Lambda}(x \lvert \lambda) weighted by the above gamma density function.

    \displaystyle \begin{aligned} f_X(x)&=\int_0^\infty f_{X \lvert \Lambda}(x \lvert \lambda) \ g(\lambda) \ d \lambda \\&=\int_0^\infty \biggl( \lambda \ e^{-\lambda x} \biggr) \ \biggl( \frac{\theta^\alpha}{\Gamma(\alpha)} \ \lambda^{\alpha-1} \ e^{-\theta \lambda} \biggr) \ d \lambda \\&=\int_0^\infty \frac{\theta^\alpha}{\Gamma(\alpha)} \lambda^\alpha e^{-(\theta+x) \lambda} \ d \lambda \\&=\frac{\theta^\alpha}{\Gamma(\alpha)} \frac{\Gamma(\alpha+1)}{(\theta+x)^{\alpha+1}} \int_0^\infty \frac{(\theta+x)^{\alpha+1}}{\Gamma(\alpha+1)} \ \lambda^{\alpha+1-1} \ e^{-(\theta+x) \lambda} \ d \lambda \\&=\frac{\alpha \theta^{\alpha}}{(\theta+x)^{\alpha+1}} \end{aligned}

The above derivation shows that the unconditional density function of X is a Pareto Lomax density function. Thus if each individual insured in a large pool of insureds has an exponential claim cost distribution where the rate parameter \Lambda is distributed according to a gamma distribution, then the unconditional claim cost for a randomly selected insured is distributed according to a Pareto Lomax distribution. Mathematically speaking, the Pareto Lomax distribution is a mixture of exponential distributions with gamma mixing weights.
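
The mixture can also be checked with a quick Monte Carlo sketch in Python/NumPy (an illustration only; parameter values are arbitrary): draw a gamma rate \lambda, draw an exponential loss given that rate, and compare the empirical distribution with the Pareto Lomax CDF.

    import numpy as np

    rng = np.random.default_rng(0)
    alpha, theta = 3.0, 10.0

    # Lambda ~ gamma with shape alpha and rate theta (i.e. scale 1/theta)
    lam = rng.gamma(shape=alpha, scale=1 / theta, size=200_000)
    # X | Lambda ~ exponential with rate Lambda (i.e. mean 1/Lambda)
    x = rng.exponential(scale=1 / lam)

    t = 5.0
    print(np.mean(x <= t))                        # empirical CDF of the mixture at t
    print(1 - (theta / (t + theta)) ** alpha)     # Pareto Lomax CDF at t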

In the above discussion, we comment that the Pareto Type I distribution has a heavy tail as compared to other distributions. One of the telltale signs is that not all moments exist in a Pareto distribution. The Pareto Lomax distribution is also a heavy tailed distribution. This blog post in an affiliated blog has a detailed discussion. The discussion in that blog post examines Pareto Lomax as a heavy tailed distribution from four perspectives: existence of moments, speed of decay of the survival function to zero, hazard rate function, and mean excess loss function. Another blog post discusses the Pareto Lomax distribution as a mixture of exponential distributions with gamma mixing weights.

_______________________________________________________________________________________________

Remarks

The Pareto distribution is positively skewed and has a heavy tail on the right. It is an excellent model for extreme phenomena, e.g. the long tail contains 80% or more of the probabilities. It was originally applied as a model to describe the income and wealth of a country. In insurance applications, heavy-tailed distributions such as the Pareto are essential tools for modeling extreme losses, especially for the more risky types of insurance such as medical malpractice insurance. In financial applications, the study of heavy-tailed distributions provides information about the potential for financial fiasco or financial ruin.

For more information on the mathematical aspects of the Pareto distribution, refer to the text by Johnson and Kotz. For an actuarial perspective, refer to the text Loss Models.

_______________________________________________________________________________________________

Reference

  1. Johnson N. L., Kotz S., Continuous Univariate Distributions – I, Houghton Mifflin Company, Boston, 1970.
  2. Klugman S. A., Panjer H. H., Willmot G. E., Loss Models: From Data to Decisions, Fourth Edition, Wiley-Interscience, a John Wiley & Sons, Inc., New York, 2012.

_______________________________________________________________________________________________
\copyright 2017 – Dan Ma

Lognormal distribution

This post discusses the basic properties of the lognormal distribution. The lognormal distribution is a transformation of the normal distribution through exponentiation. As a result, some of the mathematical properties of the lognormal distribution can be derived from the normal distribution. The normal distribution is applicable in many situations but not in all situations. The normal density curve is a bell-shaped curve and is thus not appropriate in phenomena that are skewed to the right. In such situations, the lognormal distribution can be a good alternative to the normal distribution. In an actuarial setting, the lognormal distribution is an excellent candidate for a model of insurance claim sizes.

_______________________________________________________________________________________________

Defining the distribution

In this post, the notation log refers to the natural log function, i.e., logarithm to the base e=2.718281828459 \cdots. Thus log(2) = 0.693147181.

A random variable Y is said to follow a lognormal distribution if \log(Y) follows a normal distribution. A lognormal distribution has two parameters \mu and \sigma, which are the mean and standard deviation of the normal random variable \log(Y). To be more precise, the definition is restated as follows:

A random variable Y is said to follow a lognormal distribution with parameters \mu and \sigma if \log(Y) follows a normal distribution with mean \mu and standard deviation \sigma.

Many useful probability distributions are transformations of other known distributions. The above definition shows that a normal distribution is the transformation of a lognormal distribution under the natural logarithm: start with a lognormal distribution, and taking the natural log of it gives a normal distribution. The other direction is actually more informative, i.e., a lognormal distribution is the transformation of a normal distribution by the exponential function: start with a normal random variable X, and exponentiating it gives a lognormal distribution, i.e., Y=e^{X} has a lognormal distribution. The following summarizes these two transformations.

If the random variable Y is lognormal, then the random variable X=\log(Y) is normal.
If the random variable X is normal, then the random variable Y=e^{X} is lognormal.

_______________________________________________________________________________________________

Comparing Normal and Lognormal

Normally one of the first things to focus on is the probability density function when studying a continuous probability model. In the case of the lognormal distribution, a natural way to start is to focus on the relationship between lognormal distribution and normal distribution. In this section, we compare the following:

  • The lognormal distribution with parameters \mu = 0 and \sigma = 1 (standard lognormal distribution).
  • The normal distribution with mean 0 and standard deviation 1 (standard normal distribution).

Suppose that the random variable Y follows the standard lognormal distribution. This means that X=\log(Y) is the normal distribution with mean 0 and standard deviation 1, i.e. the standard normal distribution. Going the other direction, Y=e^X is the lognormal distribution we start with.

The standard normal X takes on values from -\infty to \infty. But about 99.7% of the time, X takes on values between -3 and 3 (in general, a normal random variable falls within 3 standard deviations of its mean about 99.7% of the time).

Since exponentiation always gives positive values, the standard lognormal random variable Y always takes on positive values. Since X is rarely outside of -3 and +3, the standard lognormal random variable Y will take on values between e^{-3} = 0.049787068 and e^{3} = 20.08553692 about 99.7% of the time. Thus observing a standard lognormal value over 20 would be an extremely rare event. The following figure shows the graphs of the standard normal density function and the standard lognormal density function.

    Figure 1 – normal and lognormal density curves

In Figure 1, the standard normal density curve is a symmetric bell-shaped curve, with mean and median located at x = 0. The standard lognormal density is located entirely over the positive x-axis. This is because the exponential function always gives positive values regardless of the sign of the argument. The lognormal density curve in Figure 1 is not symmetric and is a uni-modal curve skewed toward the right. All the standard normal values on the negative x-axis, when exponentiated, are in the interval (0, 1). Thus in Figure 1, the lower half of the lognormal probabilities lie in the interval x = 0 to x = 1 (i.e. the median of this lognormal distribution is x = 1). The other half of the lognormal probabilities lie in the interval (1, \infty). Such a lopsided assignment of probabilities shows that the lognormal distribution is positively skewed (skewed to the right).

In the above paragraph, the lower half of the normal distribution on (-\infty,0) is matched with the lognormal distribution on the interval (0, 1). Such interval matching can tell us a great deal about the lognormal distribution. Another example: about 75% of the standard normal distributional values lie below x = 0.67. Thus in Figure 1, about 75% of the lognormal probabilities lie in the interval (0, 1.95) where e^{0.67} \approx 1.95. Another example: what is the probability that the lognormal distribution in Figure 1 lies between 1 and 3.5? The matching normal interval is (0, 1.25) where \log(3.5) \approx 1.25. The normal probability in this interval is 0.8944 – 0.5 = 0.3944. Thus if a value is randomly generated from the standard lognormal distribution, there is a 39.44% chance that it is between 1 and 3.5.

The interval matching idea is very useful for computing lognormal probabilities (e.g. cumulative distribution function) and for finding lognormal percentiles. This idea is discussed further below to make it work for any lognormal distribution, not just the standard lognormal distribution. The following compares the two cumulative distribution functions (CDFs).

    Figure 2 – normal and lognormal CDFs

Note that the standard normal CDF basically reaches the level of y = 1 when the x-values get close to 3.0. The lognormal CDF approaches 1.0 too, but at a much slower rate. The lognormal CDF is close to 1 when x = 10 and is rapidly approaching 1 after that point.

Though the lognormal distribution is a skewed distribution, some lognormal distributions are less skewed than others. The lognormal distributions with a larger parameter value of \sigma tend to be more skewed. The following is a diagram of three lognormal density curves that demonstrates this point. Note that the curve with the small \sigma of 0.25 resembles a bell curve.

    Figure 3 – three lognormal density curves

_______________________________________________________________________________________________

How to compute lognormal probabilities and percentiles

Let Y be a random variable that follows a lognormal distribution with parameters \mu and \sigma. Then the related normal random variable is X=\log(Y), which has mean \mu and standard deviation \sigma. If we raise e to X, we get back the lognormal Y.

Continuing the interval matching idea, the lognormal interval Y \le y will match with the normal interval X \le \log(y). Both intervals receive the same probability in their respective distributions. The following states this more clearly.

    \displaystyle P\biggl(Y \le y\biggr)=P\biggl(X=\log(Y) \le \log(y)\biggr) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

On the other hand, the normal interval X \le x will match with the lognormal interval Y \le e^x. The same probability is assigned to both intervals in their respective distributions. This idea is stated as follows:

    \displaystyle P\biggl(X \le x\biggr)=P\biggl(Y=e^X \le e^x\biggr) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

The idea of (1) gives the cumulative distribution of the lognormal distribution (argument y), which is evaluated as the CDF of the corresponding normal distribution at \log(y). One obvious application of (2) is to have an easy way to find percentiles for the lognormal distribution. It is relatively easy to find the corresponding percentile of the normal distribution. Then the lognormal percentile is e raised to the corresponding percentile of the normal distribution. For example, the median of the normal distribution is at the mean \mu. Then the median of the lognormal distribution is e^{\mu}.

The calculation in both (1) and (2) involve finding normal probabilities, which can be obtained using software or using a table of probability values of the standard normal distribution. To do the table approach, each normal CDF is converted to the standard normal CDF as follows:

    \displaystyle P\biggl(X \le x \biggr)=P\biggl(\frac{X-\mu}{\sigma} \le \frac{x-\mu}{\sigma} \biggr)=\Phi(z) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)

where z is the z-score which is the ratio (x-\mu)/\sigma and \Phi(z) is the cumulative distribution function of the standard normal distribution, which can be looked up from a table based on the z-score. In light of this, (1) can be expressed as follows:

    \displaystyle P\biggl(Y \le y\biggr)=\Phi\biggl(\frac{\log(y)-\mu}{\sigma} \biggr) \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4)

A quick example to demonstrate how this works.

Example 1
If Y is lognormally distributed with parameters \mu=2.5 and \sigma=1.5,

  • what is the probability P(1.9 \le Y \le 31.34)?
  • what is the 95th percentile of this lognormal distribution?

The first answer is P(1.9 \le Y \le 31.34)=P(Y \le 31.34)-P(Y \le 1.9), which is calculated as follows:

    \displaystyle P(Y \le 31.34)=\Phi \biggl( \frac{\log(31.34)-2.5}{1.5}\biggr)=\Phi(0.63)=0.7357

    \displaystyle P(Y \le 1.9)=\Phi \biggl( \frac{\log(1.9)-2.5}{1.5}\biggr)=\Phi(-1.24)=1-0.8925=0.1075

    \displaystyle \begin{aligned} P(1.9 \le Y \le 31.34)&=P(Y \le 31.34)-P(Y \le 1.9) \\&=0.7357-0.1075 \\&=0.6282  \end{aligned}

The z-score for the 95th percentile for the standard normal distribution is z = 1.645. Then the 95th percentile for the normal distribution with mean 2.5 and standard deviation 1.5 is x = 2.5 + 1.645 (1.5) = 4.9675. Then apply the exponential function to obtain e^{4.9675} \approx 143.67, which is the desired lognormal 95th percentile. \square
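If software is available, the lookups in Example 1 can be done directly with the normal CDF; the following Python/SciPy sketch (one possible tool, not required for the example) reproduces the calculation without table rounding.

    from math import log, exp
    from scipy.stats import norm

    mu, sigma = 2.5, 1.5

    F = lambda y: norm.cdf((log(y) - mu) / sigma)    # lognormal CDF via (4)

    print(F(31.34) - F(1.9))                         # P(1.9 <= Y <= 31.34), about 0.628
    print(exp(mu + norm.ppf(0.95) * sigma))          # 95th percentile, about 143.6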

As (2) and Example 1 suggest, to find a lognormal percentile, first find the percentile for the corresponding normal distribution. If x_p is the 100pth percentile of the normal distribution, then e^{x_p} is the 100pth percentile of the lognormal distribution. Usually, we can first find the 100pth percentile for the standard normal distribution z_p. Then the normal percentile we need is x=\mu + z_p \cdot \sigma. The lognormal percentile is then:

    \displaystyle e^{\displaystyle \mu + z_p \cdot \sigma}=\text{lognormal } 100p \text{th percentile} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (5)

The above discussion shows that the explicit form of the lognormal density curve is not needed in computing lognormal probabilities and percentiles. For the sake of completeness, the following shows the probability density functions of both the normal distribution and the lognormal distribution.

    Normal PDF
    \displaystyle f_X(x)=\frac{1}{\sigma \sqrt{2 \pi}} \ \ \text{\Large e}^{\displaystyle -\frac{(x-\mu)^2}{2 \sigma^2}} \ \ \ \ \ \ \ \ \ \ \ -\infty<x<\infty \ \ \ \ \ \ \ \ \ \ (6)

    Lognormal PDF
    \displaystyle f_Y(y)=\frac{1}{y \ \sigma \sqrt{2 \pi}} \ \ \text{\Large e}^{\displaystyle -\frac{(\log(y)-\mu)^2}{2 \sigma^2}} \ \ \ \ \ 0<y<\infty \ \ \ \ \ \ \ \ \ \ \ \ \ (7)

The cumulative distribution function for the lognormal distribution is then

    \displaystyle F_Y(y)=\int_0^y \frac{1}{t \ \sigma \sqrt{2 \pi}} \ \ \text{\Large e}^{\displaystyle -\frac{(\log(t)-\mu)^2}{2 \sigma^2}} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (8)

Of course, we do not have to use (8) since the lognormal CDF can be obtained based on the corresponding normal CDF.

One application of the lognormal PDF in (7) is to use it to find the mode (by taking its derivative and finding the critical value). The mode of the lognormal distribution with parameters \mu and \sigma is \displaystyle e^{\mu - \sigma^2}.

_______________________________________________________________________________________________

How to find lognormal moments

To find the mean and higher moments of the lognormal distribution, we once again rely on basic information about the normal distribution. For any random variable T (normal or otherwise), its moment generating function, if it exists, is defined by M_T(t)=E(e^{t \ T}). The following is the moment generating function of the normal distribution with mean \mu and standard deviation \sigma.

    \displaystyle M_X(t)=\text{\Large e}^{\displaystyle \biggl[ \mu t + (1/2) \sigma^2 t^2 \biggr]} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (9)

As before, let Y be a random variable that follows a lognormal distribution with parameters \mu and \sigma. Then Y=e^X where X is normal with mean \mu and standard deviation \sigma. Then E(Y)=E(e^X) is simply the normal moment generating function evaluated at 1. In fact, the kth moment of Y, E(Y^k)=E(e^{k X}), is simply the normal mgf evaluated at k. Because the mgf of the normal distribution is defined at any real number, all moments for the lognormal distribution exist. The following gives the moments explicitly.

    E(Y)=\text{\Large e}^{\displaystyle \biggl[ \mu+(1/2) \sigma^2 \biggr]}

    E(Y^k)=\text{\Large e}^{\displaystyle \biggl[ k \mu+(k^2/2) \sigma^2 \biggr]} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (10)

In particular, the variance and standard deviation are:

    \displaystyle \begin{aligned}Var(Y)&=\text{\Large e}^{\displaystyle \biggl[ 2 \mu+2 \sigma^2 \biggr]}-\text{\Large e}^{\displaystyle \biggl[ 2 \mu+\sigma^2 \biggr]} \\&=\biggl(\text{\Large e}^{\displaystyle \sigma^2}-1\biggr) \text{\Large e}^{\displaystyle \biggl[ 2 \mu+\sigma^2 \biggr]} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (11) \end{aligned}

    \displaystyle \sigma_Y=\sqrt{\text{\Large e}^{\displaystyle \sigma^2}-1} \ \ \text{\Large e}^{\displaystyle \biggl[ \mu+\frac{1}{2} \sigma^2 \biggr]}=\sqrt{\text{\Large e}^{\displaystyle \sigma^2}-1} \ E(Y) \ \ \ \ \ \ \ \ \ \ \ \ (12)

The formulas (11) and (12) give the variance and standard deviation if the parameters \mu and \sigma are known. They do not need to be committed to memory, since they can always be generated from knowing the moments in (10). As indicated before, the lognormal median is e^{\mu}, which is always less than the mean, which is e raised to \mu+(1/2) \sigma^2. So the mean is greater than the median by a factor of e raised to (1/2) \sigma^2. The mean being greater than the median is another sign that the lognormal distribution is skewed right.

Example 2
Suppose Y follows a lognormal distribution with mean 12.18 and variance 255.02. Determine the probability that Y is greater than its mean.

With the given information, we have:

    E(Y)=\text{\Large e}^{\displaystyle \biggl[ \mu+(1/2) \sigma^2 \biggr]}=12.18

    \biggl(\text{\Large e}^{\displaystyle \sigma^2}-1\biggr) \text{\Large e}^{\displaystyle \biggl[ 2 \mu+\sigma^2 \biggr]}=\biggl(\text{\Large e}^{\displaystyle \sigma^2}-1\biggr) \biggl(\text{\Large e}^{\displaystyle \biggl[ \mu+(1/2) \sigma^2 \biggr]}\biggr)^2=255.02

    \displaystyle \begin{aligned}Var(Y)&=\biggl(\text{\Large e}^{\displaystyle \sigma^2}-1\biggr) \text{\Large e}^{\displaystyle \biggl[ 2 \mu+\sigma^2 \biggr]} \\&=\biggl(\text{\Large e}^{\displaystyle \sigma^2}-1\biggr) \biggl(\text{\Large e}^{\displaystyle \biggl[ \mu+(1/2) \sigma^2 \biggr]}\biggr)^2  \\&=  \biggl(\text{\Large e}^{\displaystyle \sigma^2}-1\biggr) 12.18^2=255.02    \end{aligned}

From the last equation, we can solve for \sigma. The following shows the derivation.

    \displaystyle \begin{aligned} &\biggl(\text{\Large e}^{\displaystyle \sigma^2}-1\biggr)=\frac{255.02}{12.18^2}   \\&\text{\Large e}^{\displaystyle \sigma^2}=2.719014994  \\&\sigma^2=\log(2.719014994)=1.00026968 \end{aligned}

Thus we can take \sigma=1. Then plug \sigma=1 into E(Y) to get \mu=2. The desired probability is:

    \displaystyle \begin{aligned} P(Y>12.18)&=1-\Phi \biggl(\frac{\log(12.18)-2}{1}  \biggr) \\&=1-\Phi(0.499795262)  \\&=1-\Phi(0.5)=1-0.6915=0.3085  \ \square    \end{aligned}
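The back-solving in Example 2 can also be done numerically. The following Python/SciPy sketch (an optional verification with the same given mean and variance) recovers \mu and \sigma and then the desired probability.

    from math import log, sqrt
    from scipy.stats import norm

    mean, var = 12.18, 255.02

    sigma2 = log(var / mean ** 2 + 1)    # since Var(Y) = (e^{sigma^2} - 1) E(Y)^2
    sigma = sqrt(sigma2)
    mu = log(mean) - sigma2 / 2          # since E(Y) = e^{mu + sigma^2 / 2}

    print(mu, sigma)                                   # approximately 2 and 1
    print(1 - norm.cdf((log(mean) - mu) / sigma))      # P(Y > 12.18), about 0.3085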

_______________________________________________________________________________________________

Linear transformations

For any random variable W, a linear transformation of W is the random variable aW+b where a and b are real constants. It is well known that if X follows a normal distribution, any linear transformation of X also follows a normal distribution. Does this apply to lognormal distribution? A linear transformation of a lognormal distribution may not have distributional values over the entire positive x-axis. For example, if Y is lognormal, then Y+1 is technically not lognormal since the values of Y+1 lie in (1, \infty) and not (0, \infty). Instead, we focus on the transformations cY where c>0 is a constant. We have the following fact.

    If Y has a lognormal distribution with parameters \mu and \sigma, then cY has a lognormal distribution with parameters \mu+\log(c) and \sigma.

The effect of the constant adjustment of the lognormal distribution is on the \mu parameter, which is adjusted by adding the natural log of the constant c. Note that the adjustment on \mu is addition and not multiplication. The \sigma parameter is unchanged.

One application of the transformation cY is that of inflation. For example, suppose Y represents claim amounts in a given calendar year arising from a group of insurance policies. If the insurance company expects that the claims amounts in the next year will increase by 10%, then 1.1Y is the random variable that models next year’s claim amounts. If Y is assumed to be lognormal, then the effect of the 10% inflation is on the \mu parameter as indicated above.

To see why the inflation on Y works as described, let’s look at the cumulative distribution function of T=cY.

    \displaystyle \begin{aligned} P(T \le y)&=P(Y \le \frac{y}{c})=F_Y(y/c) \\&=\int_0^{y/c} \frac{1}{t \ \sigma \sqrt{2 \pi}} \ \ \text{\Large e}^{\displaystyle -\frac{(\log(t)-\mu)^2}{2 \sigma^2}} \ dt  \end{aligned}

Taking derivative of the last item above, we obtain the probability density function f_T(y).

    \displaystyle \begin{aligned} f_T(y)&=\frac{1}{y/c \ \sigma \sqrt{2 \pi}} \ \ \text{\Large e}^{\displaystyle -\frac{(\log(y/c)-\mu)^2}{2 \sigma^2}} (1/c) \\&=\frac{1}{y \ \sigma \sqrt{2 \pi}} \ \ \text{\Large e}^{\displaystyle -\frac{\biggl(\log(y)-(\mu+\log(c))\biggr)^2}{2 \sigma^2}}  \end{aligned}

Comparing with the density function (8), we see that the last line is the lognormal density function with parameters \mu + \log(c) and \sigma.
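As a quick numerical check of this fact, the following simulation sketch (standard library only) inflates a simulated lognormal sample by c = 1.1 and compares the log-scale mean and standard deviation of the inflated amounts with \mu + \log(c) and \sigma. The parameter values \mu = 2 and \sigma = 1 are illustrative.

    import math, random, statistics

    random.seed(2017)
    mu, sigma, c = 2.0, 1.0, 1.1                 # illustrative parameters; c is a 10% inflation factor

    # Simulate claim amounts Y and inflate them to cY
    inflated = [c * random.lognormvariate(mu, sigma) for _ in range(100_000)]

    # If cY is lognormal with parameters mu + log(c) and sigma, then log(cY) should be
    # approximately normal with mean mu + log(c) and standard deviation sigma
    logs = [math.log(y) for y in inflated]
    print(statistics.mean(logs), mu + math.log(c))   # both close to 2.0953
    print(statistics.stdev(logs), sigma)             # both close to 1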

_______________________________________________________________________________________________

Distributional quantities involving higher moments

As formula (10) shows, all moments of the lognormal distribution exist. As a result, any distributional quantity that is defined using moments can be described explicitly in terms of the parameters \mu and \sigma. We highlight three such distributional quantities: coefficient of variation, coefficient of skewness and kurtosis. The following shows their definitions. The calculation is done by plugging in the moments obtained from (10).

    \displaystyle CV=\frac{\sigma_Y}{\mu_Y} \ \ \ \ \ \text{(Coefficient of variation)}

    \displaystyle \begin{aligned} \gamma_1&=\frac{E[ (Y-\mu_Y)^3 ]}{\sigma_Y^3}  \\&=\frac{E(Y^3)-3 \mu_Y \sigma_Y^2-\mu_Y^3}{(\sigma_Y^2)^{\frac{3}{2}}}  \ \ \ \ \ \text{(Coefficient of skewness)}    \end{aligned}

    \displaystyle \begin{aligned} \beta_2&=\frac{E[ (Y-\mu_Y)^4 ]}{\biggl(E[ (Y-\mu_Y)^2]\biggr)^2}  \\&=\frac{E(Y^4)-4 \ \mu_Y \ E(Y^3) + 6 \ \mu_Y^2 \ E(Y^2) - 3 \ \mu_Y^4}{(\sigma_Y^2)^{2}}  \ \ \ \ \ \text{(Kurtosis)}    \end{aligned}

The above definitions are made for any random variable Y. The notations \mu_Y and \sigma_Y are the mean and standard deviation of Y, respectively. The coefficient of variation is the ratio of the standard deviation to the mean. It is a standardized measure of dispersion of a probability distribution or frequency distribution. The coefficient of skewness is defined as the ratio of the third central moment to the cube of the standard deviation. The second line in the above definition is an equivalent form in terms of the mean, variance and the third raw moment, which may be easier to calculate in some circumstances. Kurtosis is defined to be the ratio of the fourth central moment to the square of the second central moment (the variance). The second line in the definition gives an equivalent form in terms of the mean, variance and the second, third and fourth raw moments.

The above general definitions of CV, \gamma_1 and \beta_2 can be evaluated for the lognormal distribution. The mean, variance and higher raw moments can be obtained using (10). Then it is a matter of plugging the relevant items into the above definitions. The following example shows how this is done.
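For reference, carrying out this substitution gives closed forms that depend only on \sigma (a standard result, stated here for convenience; the next example confirms the numerical values for \sigma=1).

    \displaystyle CV=\sqrt{e^{\sigma^2}-1}

    \displaystyle \gamma_1=\bigl(e^{\sigma^2}+2\bigr) \ \sqrt{e^{\sigma^2}-1}

    \displaystyle \beta_2=e^{4 \sigma^2}+2 \ e^{3 \sigma^2}+3 \ e^{2 \sigma^2}-3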

Example 3
Determine the CV, \gamma_1 and \beta_2 of the lognormal distribution in Example 2.

The calculation in Example 2 shows that the lognormal parameters are \mu=2 and \sigma=1. Now use formula (10) to get the ingredients.

    \mu_Y=e^{2.5}

    Var(Y)=(e-1) e^5 \ \ \ \ \ \ \ \ \sigma_Y=\sqrt{e-1} \ e^{2.5}

    E(Y^2)=e^6

    E(Y^3)=e^{10.5}

    E(Y^4)=e^{16}

Right away, CV = \sqrt{e-1}=1.31. The following shows the calculation for skewness and kurtosis.

    \displaystyle \begin{aligned} \gamma_1&=\frac{E(Y^3)-3 \mu_Y \sigma_Y^2-\mu_Y^3}{(\sigma_Y^2)^{\frac{3}{2}}}  \\&=\frac{e^{10.5}-3 \ e^{2.5} \ (e-1) \ e^5-(e^{2.5})^3}{[(e-1) \ e^5]^{\frac{3}{2}}}\\&=6.1849     \end{aligned}

    \displaystyle \begin{aligned} \beta_2&=\frac{E(Y^4)-4 \ \mu_Y \ E(Y^3) + 6 \ \mu_Y^2 \ E(Y^2) - 3 \ \mu_Y^4}{(\sigma_Y^2)^{2}}  \\&=\frac{e^{16}-4 \ e^{2.5} \ e^{10.5} + 6 \ (e^{2.5})^2 \ e^{6} - 3 \ (e^{2.5})^4}{[(e-1) \ e^5]^{2}}\\&=113.9364    \end{aligned}
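The numbers in Example 3 can be verified with a short Python sketch, assuming only formula (10) for the raw moments (standard library only).

    import math

    mu, sigma = 2.0, 1.0                         # parameters from Example 2

    # Raw moments from formula (10): E(Y^k) = exp(k*mu + 0.5*k^2*sigma^2)
    def raw_moment(k):
        return math.exp(k * mu + 0.5 * k ** 2 * sigma ** 2)

    m1, m2, m3, m4 = (raw_moment(k) for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2

    cv = math.sqrt(var) / m1
    skew = (m3 - 3 * m1 * var - m1 ** 3) / var ** 1.5
    kurt = (m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4) / var ** 2
    print(round(cv, 4), round(skew, 4), round(kurt, 4))   # about 1.3108, 6.1849, 113.9364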

_______________________________________________________________________________________________

Is there a moment generating function for the lognormal distribution?

Because the normal distribution has a moment generating function, all moments exist for the lognormal distribution (see formula (10) above). Does the moment generating function exist for the lognormal distribution? Whenever the mgf exists for a distribution, its moments can be derived from the mgf. What about the converse? That is, when all moments exist for a given distribution, does its moment generating function always exist? The answer is no. It turns out that the lognormal distribution is a counterexample. We conclude this post by showing this fact.

Let Y have the standard lognormal distribution, i.e., X=\log(Y) has the standard normal distribution. We show that the expectation E(e^{tY}) is infinite for every t>0.

    \displaystyle \begin{aligned} E(e^{t \ Y})&=E(e^{t \ e^X}) \\&=\frac{1}{\sqrt{2 \pi}} \ \int_{-\infty}^\infty e^{t \ e^x} \ e^{-0.5 x^2} \ dx \\&=\frac{1}{\sqrt{2 \pi}} \ \int_{-\infty}^\infty e^{t \ e^x - 0.5x^2} \ dx \\&> \frac{1}{\sqrt{2 \pi}} \ \int_{0}^\infty e^{t \ e^x - 0.5x^2} \ dx \\&> \frac{1}{\sqrt{2 \pi}} \ \int_{0}^\infty e^{\biggl(t \ ( \displaystyle 1+x+\frac{x^2}{2!}+\frac{x^3}{3!}) - 0.5x^2 \biggr)} \ dx=\infty \end{aligned}

The last integral in the above derivation diverges to infinity. Note that the Taylor series expansion of e^x is \displaystyle 1+x+\frac{x^2}{2!}+\frac{x^3}{3!}+\frac{x^4}{4!}+\cdots. In the last step, e^x is replaced by the smaller quantity \displaystyle 1+x+\frac{x^2}{2!}+\frac{x^3}{3!}, which is valid for x>0. The exponent in the last integral is then a third degree polynomial in x with a positive coefficient for the x^3 term. This polynomial diverges to infinity as x goes to infinity, and so does the integral. Since the last integral is infinite, the mgf E(e^{t \ Y}) does not exist for any t>0.
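To illustrate the divergence numerically, the following Python sketch approximates the truncated integral \frac{1}{\sqrt{2 \pi}} \int_0^M e^{t e^x - 0.5 x^2} \ dx with a crude midpoint rule for increasing values of M; the rapid growth is consistent with the argument above. The choice t=0.1 and the cutoffs are for illustration only.

    import math

    def truncated_mgf_integral(t, upper, steps=100_000):
        # Midpoint-rule approximation of (1/sqrt(2*pi)) * integral from 0 to upper of exp(t*e^x - 0.5*x^2) dx
        h = upper / steps
        total = sum(math.exp(t * math.exp((i + 0.5) * h) - 0.5 * ((i + 0.5) * h) ** 2)
                    for i in range(steps))
        return total * h / math.sqrt(2 * math.pi)

    # Even for a small t, the truncated integral explodes as the upper limit grows,
    # illustrating that E(e^{tY}) has no finite value when t > 0
    for upper in (4, 6, 8):
        print(upper, f"{truncated_mgf_integral(0.1, upper):.3e}")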

_______________________________________________________________________________________________

Practice problems

Practice problems to reinforce the calculation are found here.

More practice problems are available: a summary of the basic lognormal properties and more lognormal practice problems.

_______________________________________________________________________________________________
\copyright 2017 – Dan Ma