# Pareto Distribution

The Pareto distribution is a power law probability distribution. It is named after the Italian civil engineer, economist and sociologist Vilfredo Pareto, who was the first to observe that income follows what is now called the Pareto distribution, and who is also known for the 80/20 rule, according to which 20% of the people receive 80% of all income. This post discusses the mathematical properties of this distribution and its applications.

_______________________________________________________________________________________________

Pareto Distribution of Type I

There are several types of the Pareto distribution. Let’s start with Type I. The random variable $X$ is said to follow a Type I Pareto distribution if its survival function is

$\displaystyle S(x)=P(X>x)=\left\{ \begin{array}{ll} \displaystyle \biggl( \frac{x_m}{x} \biggr)^\alpha &\ x \ge x_m \\ \text{ } & \text{ } \\ \displaystyle 1 &\ x < x_m \end{array} \right.$

where $x_m$ and $\alpha$ are both positive parameters. The support of the distribution is the interval $[x_m,\infty)$. The parameter $x_m$ is a scale parameter and $\alpha$ is a shape parameter. The parameter $\alpha$ is also known as the tail index. When the Pareto distribution is used as a model of wealth or income, $\alpha$ is also known as the Pareto index, which is a measure of the breadth of the wealth distribution.

The following table lists the survival function, the cumulative distribution function (CDF) and the probability density function (PDF).

__________________________________________________________________________________________
Pareto Type I – Probability Functions

| Quantity | Formula | Domain |
|---|---|---|
| Survival Function | $S(x)=\displaystyle \biggl( \frac{x_m}{x} \biggr)^\alpha$ | $x \ge x_m>0$ |
| Cumulative Distribution Function | $F(x)=1-\displaystyle \biggl( \frac{x_m}{x} \biggr)^\alpha$ | $x \ge x_m>0$ |
| Probability Density Function | $\displaystyle f(x)=\frac{\alpha \ x_m^\alpha}{x^{\alpha+1}}$ | $x \ge x_m>0$ |

__________________________________________________________________________________________
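The three functions in the table translate directly into code. The following is a minimal Python sketch; the function names `pareto1_sf`, `pareto1_cdf` and `pareto1_pdf` are hypothetical, chosen for illustration. It also checks numerically that the PDF is the derivative of the CDF.

```python
def pareto1_sf(x: float, alpha: float, xm: float) -> float:
    """Survival function S(x) = P(X > x) of a Pareto Type I distribution."""
    if x < xm:
        return 1.0  # no probability mass below the scale parameter xm
    return (xm / x) ** alpha


def pareto1_cdf(x: float, alpha: float, xm: float) -> float:
    """CDF F(x) = 1 - S(x)."""
    return 1.0 - pareto1_sf(x, alpha, xm)


def pareto1_pdf(x: float, alpha: float, xm: float) -> float:
    """Density f(x) = alpha * xm**alpha / x**(alpha + 1) on x >= xm."""
    if x < xm:
        return 0.0
    return alpha * xm ** alpha / x ** (alpha + 1)


if __name__ == "__main__":
    alpha, xm, x, h = 2.0, 1.0, 3.0, 1e-5
    # A central difference of the CDF should approximate the PDF.
    slope = (pareto1_cdf(x + h, alpha, xm) - pareto1_cdf(x - h, alpha, xm)) / (2 * h)
    print(pareto1_pdf(x, alpha, xm), slope)
```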

The following figure shows the graphs of the PDFs for the shape parameters $\alpha=1,2,3$.

Figure 1 – Pareto PDFs (Type I)

All the density curves in Figure 1 are skewed to the right and have a long tail. However, some tails are heavier than others: the curve with a higher value of $\alpha$ approaches the x-axis faster and hence has a lighter tail than the density curve with a lower value of $\alpha$. The role of $\alpha$ is discussed further below. The following table lists several more Pareto distributional quantities.

__________________________________________________________________________________________
Pareto Type I – Additional Distributional Quantities

| Quantity | Formula | Condition |
|---|---|---|
| Mean | $\displaystyle E(X)=\frac{\alpha \ x_m}{\alpha-1}$ | $\alpha>1$ |
| Median | $\displaystyle x_m \ 2^{\frac{1}{\alpha}}$ | |
| Mode | $x_m$ | |
| Variance | $\displaystyle Var(X)=\frac{x_m^2 \ \alpha}{(\alpha-1)^2 \ (\alpha-2)}$ | $\alpha>2$ |
| Higher Moments | $\displaystyle E(X^k)=\frac{\alpha \ x_m^k}{\alpha-k}$ | $\alpha>k$ |
| Skewness | $\displaystyle \frac{2(1+\alpha)}{\alpha-3} \ \sqrt{\frac{\alpha-2}{\alpha}}$ | $\alpha>3$ |
| Excess Kurtosis | $\displaystyle \frac{6(\alpha^3+\alpha^2-6 \alpha-2)}{\alpha (\alpha-3) (\alpha-4)}$ | $\alpha>4$ |

__________________________________________________________________________________________
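Solving $F(x)=p$ for $x$ gives the quantile function $x_m (1-p)^{-1/\alpha}$; at $p=\frac{1}{2}$ it gives the median $x_m \ 2^{1/\alpha}$. The quantile function also yields a simple way to simulate Pareto Type I values (inverse transform sampling). The Python sketch below uses hypothetical helper names and the standard library `random` module; comparing the sample median with the formula is only an approximate check.

```python
import random


def pareto1_quantile(p: float, alpha: float, xm: float) -> float:
    """Inverse CDF: the x with F(x) = p, i.e. xm * (1 - p) ** (-1 / alpha)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    return xm * (1.0 - p) ** (-1.0 / alpha)


def pareto1_median(alpha: float, xm: float) -> float:
    """Median xm * 2 ** (1 / alpha), the solution of S(x) = 1/2."""
    return xm * 2.0 ** (1.0 / alpha)


def pareto1_sample(n: int, alpha: float, xm: float, rng: random.Random) -> list[float]:
    """Inverse transform sampling: if U ~ Uniform[0, 1), quantile(U) is Pareto Type I."""
    return [pareto1_quantile(rng.random(), alpha, xm) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(1)
    xs = sorted(pareto1_sample(100_001, alpha=3.0, xm=2.0, rng=rng))
    print("sample median:", xs[50_000], "formula:", pareto1_median(3.0, 2.0))
```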

Given the survival function, it is straightforward to derive the CDF and the PDF. The mean and higher moments can also be derived by evaluating the integral $\int_{x_m}^\infty x^k f(x) dx$. Once the moments are obtained, other quantities that depend on moments can be derived (e.g. variance, skewness and excess kurtosis). The following table gives the definitions of these distributional quantities.

| Distributional Quantity | Definition |
|---|---|
| Moments | $\displaystyle E(X^k)=\int_{x_m}^\infty x^k f(x) \ dx$ |
| Variance | $Var(X)=E(X^2)-E(X)^2$ |
| Skewness | $\displaystyle \gamma_1=\frac{E[(X-\mu)^3]}{\sigma^3}=\frac{E(X^3)-3 \mu \sigma^2 - \mu^3}{(\sigma^2)^{\frac{3}{2}}}$ |
| Kurtosis | $\displaystyle \frac{E[(X-\mu)^4]}{\sigma^4}$ |
| Excess Kurtosis | Kurtosis – 3 |

In these definitions, $\mu$ and $\sigma$ are the mean and standard deviation of a given distribution, respectively. Then $\sigma^2$ is the variance. The skewness of a distribution is the ratio of the third central moment $E[(X-\mu)^3]$ to the cube of the standard deviation. The kurtosis is the ratio of the fourth central moment $E[(X-\mu)^4]$ to the square of the variance. This previous post has a detailed discussion on the skewness.
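As a consistency check, the closed forms in the table of additional quantities can be recovered from the raw moments $E(X^k)=\frac{\alpha x_m^k}{\alpha-k}$ using the definitions above. The sketch below (function names are hypothetical) expresses the central moments through raw moments; for the fourth central moment it uses $E[(X-\mu)^4]=E(X^4)-4\mu E(X^3)+6\mu^2 E(X^2)-3\mu^4$, which follows from the binomial expansion.

```python
import math


def pareto1_raw_moment(k: int, alpha: float, xm: float) -> float:
    """E(X^k) = alpha * xm**k / (alpha - k); exists only when alpha > k."""
    if alpha <= k:
        raise ValueError(f"E(X^{k}) does not exist for alpha={alpha}")
    return alpha * xm ** k / (alpha - k)


def stats_from_raw_moments(m1: float, m2: float, m3: float, m4: float):
    """Variance, skewness and excess kurtosis from E(X), E(X^2), E(X^3), E(X^4)."""
    mu = m1
    var = m2 - mu ** 2
    third = m3 - 3 * mu * var - mu ** 3                           # E[(X - mu)^3]
    fourth = m4 - 4 * mu * m3 + 6 * mu ** 2 * m2 - 3 * mu ** 4    # E[(X - mu)^4]
    return var, third / var ** 1.5, fourth / var ** 2 - 3.0


def pareto1_closed_forms(alpha: float, xm: float):
    """Variance, skewness and excess kurtosis from the closed-form table (alpha > 4)."""
    var = xm ** 2 * alpha / ((alpha - 1) ** 2 * (alpha - 2))
    skew = 2 * (1 + alpha) / (alpha - 3) * math.sqrt((alpha - 2) / alpha)
    exkurt = 6 * (alpha ** 3 + alpha ** 2 - 6 * alpha - 2) / (alpha * (alpha - 3) * (alpha - 4))
    return var, skew, exkurt


if __name__ == "__main__":
    alpha, xm = 5.0, 2.0
    moments = [pareto1_raw_moment(k, alpha, xm) for k in (1, 2, 3, 4)]
    print(stats_from_raw_moments(*moments))
    print(pareto1_closed_forms(alpha, xm))
```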

_______________________________________________________________________________________________

A Closer Look at the Shape Parameter

The above tables show that the Pareto distribution is mathematically tractable, especially when it comes to the calculation of moments. Another observation is that the mean and other moments do not always exist. This stems from the fact that when the shape parameter $\alpha$ is too small, the integral for the moment $E(X^k)$ may not converge.

The mean exists only when the shape parameter $\alpha$ is greater than 1, and the variance exists only when $\alpha$ is greater than 2. In general, the $k$th moment exists if and only if $k<\alpha$: for $k \ge \alpha$ the integral $\int_{x_m}^\infty x^k f(x) \ dx$ diverges. So the larger the shape parameter $\alpha$, the more moments exist.
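The dichotomy between $k<\alpha$ and $k \ge \alpha$ can be seen by truncating the moment integral at an upper limit $b$ and letting $b$ grow. For Pareto Type I the truncated integral $\int_{x_m}^b x^k f(x) \ dx=\alpha x_m^\alpha \int_{x_m}^b x^{k-\alpha-1} \ dx$ has a closed form, which the sketch below evaluates (the function name is hypothetical).

```python
import math


def partial_moment(k: int, alpha: float, xm: float, b: float) -> float:
    """Integral of x**k * f(x) over [xm, b] for Pareto Type I, in closed form."""
    c = alpha * xm ** alpha
    if k == alpha:
        return c * math.log(b / xm)       # integrand ~ 1/x: logarithmic growth
    e = k - alpha
    return c * (b ** e - xm ** e) / e     # converges as b grows only if e < 0


if __name__ == "__main__":
    alpha, xm = 2.0, 1.0
    for b in (1e2, 1e4, 1e6):
        print(b, [round(partial_moment(k, alpha, xm, b), 4) for k in (1, 2, 3)])
```

With $\alpha=2$ and $x_m=1$, the $k=1$ column settles near the mean $2$, the $k=2$ column grows like $2\log b$, and the $k=3$ column grows like $2b$.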

Having moments that do not exist is a sign that the distribution has a heavy tail. Let’s examine the graphs of Pareto survival functions and CDFs.

Figure 2 – Pareto Survival Functions (Type I)

Figure 2 shows the survival function $S(x)=P(X>x)$ for $x \ge 1$ and three values of the shape parameter $\alpha$ (the scale parameter is $x_m=1$). The following figure shows the corresponding cumulative distribution functions $F(x)=1-S(x)=P(X \le x)$.

Figure 3 – Pareto CDFs (Type I)

The survival function $S(x)$ is the probability of the right tail $(x,\infty)$, while the CDF $F(x)$ is the probability of the initial interval $[x_m,x]$; the two sum to 1. One thing that stands out in Figure 2 is that the larger the $\alpha$, the faster the survival curve approaches zero, and thus the less probability is put on the right tail. In other words, more probability is concentrated on the lower values, and the integral for the moments is more likely to converge when $\alpha$ is larger. This explains why more moments exist for a Pareto distribution with a larger $\alpha$, confirming the earlier observation.

Another thing to point out in Figure 2 is that the distribution with the larger $\alpha$ has a lighter right tail and the one with a smaller $\alpha$ has a heavier right tail. So within the Pareto family, a lower $\alpha$ means a distribution with a heavier tail and a larger $\alpha$ means a lighter tail.

A comparison with other families of distributions is also instructive. All moments exist for the gamma distributions (including the exponential distributions), the lognormal distribution and the normal distribution. Moment generating functions also exist for the gamma and normal distributions; the lognormal is an exception, since its moment generating function does not exist even though all of its moments do. In contrast, the moment generating function does not exist for Pareto distributions (otherwise all moments would exist), and not all moments exist either. These are signs that the Pareto distributions are heavy-tailed distributions. For a more in-depth discussion of the tail weight of the Pareto family, see this blog post in an affiliated blog. The Pareto distribution discussed there is of Pareto Type II.
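To see why the moment generating function $E(e^{tX})$ fails to exist for any $t>0$, look at the logarithm of the integrand $e^{tx} f(x)$. For Pareto Type I it equals $tx+\log(\alpha x_m^\alpha)-(\alpha+1)\log x$, where the linear term eventually dominates, so the integrand grows without bound and the integral diverges. For an exponential density $\lambda e^{-\lambda x}$ with $t<\lambda$, the log-integrand $\log \lambda+(t-\lambda)x$ decreases to $-\infty$ instead. The sketch below (function names are hypothetical) works in logs to avoid overflow.

```python
import math


def log_mgf_integrand_pareto1(x: float, t: float, alpha: float, xm: float) -> float:
    """log(e^{t x} f(x)) for Pareto Type I, x >= xm."""
    return t * x + math.log(alpha) + alpha * math.log(xm) - (alpha + 1) * math.log(x)


def log_mgf_integrand_exponential(x: float, t: float, lam: float) -> float:
    """log(e^{t x} * lam * e^{-lam x}) for an exponential density with rate lam."""
    return math.log(lam) + (t - lam) * x


if __name__ == "__main__":
    t = 0.01  # even a small t > 0 only delays the blow-up for the Pareto integrand
    for x in (1e2, 1e3, 1e4, 1e5):
        print(x, log_mgf_integrand_pareto1(x, t, 3.0, 1.0),
              log_mgf_integrand_exponential(x, t, 1.0))
```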

When the Pareto model is used as a model of the lifetime of systems (machines or devices), a larger value of the shape parameter $\alpha$ means that fewer “lives” survive to old ages, or equivalently, more lives die off at relatively young ages (as discussed above, this means a lighter right tail). If the Pareto model is used as a model of the income or wealth of individuals, then a higher $\alpha$ means that a smaller proportion of the people are in the higher income brackets (or more people are in the lower income ranges). This is why the shape parameter $\alpha$, called the Pareto index in this context, is a measure of the breadth of the income or wealth distribution: the higher the index, the less inequality in income.
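The link between $\alpha$ and inequality can be quantified. For Pareto Type I with $\alpha>1$, if $x_p$ satisfies $S(x_p)=p$, then the richest fraction $p$ of the population holds the share $\int_{x_p}^\infty x f(x) \ dx \big/ E(X)=p^{1-1/\alpha}$ of total income, which does not depend on $x_m$. Setting $p=0.2$ and the share to $0.8$ gives $\alpha=\log 5/\log 4 \approx 1.16$, the Pareto index behind the 80/20 rule mentioned in the introduction. A minimal Python sketch (the function name is hypothetical):

```python
import math


def top_share(p: float, alpha: float) -> float:
    """Share of total income held by the richest fraction p under Pareto Type I.

    Equals p ** (1 - 1/alpha); requires alpha > 1 so that mean income is finite.
    """
    if alpha <= 1.0:
        raise ValueError("mean income is infinite when alpha <= 1")
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return p ** (1.0 - 1.0 / alpha)


# Pareto index for which the top 20% hold exactly 80% of income.
ALPHA_80_20 = math.log(5) / math.log(4)

if __name__ == "__main__":
    for alpha in (ALPHA_80_20, 1.5, 2.0, 3.0):
        print(f"alpha={alpha:.3f}: top 20% hold {top_share(0.2, alpha):.1%}")
```

As $\alpha$ increases, the share held by the top 20% falls, in line with the statement that a higher Pareto index means less inequality.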

_______________________________________________________________________________________________

Log-Linear Model

We now discuss the motivation behind the Pareto survival function. The Pareto distribution is a power law distribution. It is a model that can describe phenomena that behave in a log-linear fashion. Let’s revisit the original reasoning for using the Pareto survival function as a model of income.

Let $N(x)$ be the number of people with income greater than $x$. Suppose that $x_0$ is the minimum income in the population in question, so that $N(x_0)$ is the size of the entire population. Pareto proposed that $N(x)$ can be modeled in a log-linear fashion:

$\log N(x)=\log C - \alpha \log x$

where log is the natural logarithm (base $e$), $C$ is a constant and $\alpha$ is a positive parameter. If this relation holds for all incomes $x \ge x_0$, then in particular it holds at the minimum income level $x_0$:

$\log N(x_0)=\log C - \alpha \log x_0$

Subtracting the second relation from the first gives the following:

$\displaystyle \log \biggl[ \frac{N(x)}{N(x_0)} \biggr]=- \alpha \log \biggl[ \frac{x}{x_0} \biggr]$

Exponentiating both sides gives the following:

$\displaystyle \frac{N(x)}{N(x_0)}=\biggl( \frac{x}{x_0} \biggr)^{- \alpha}$

$\displaystyle \frac{N(x)}{N(x_0)}=\biggl( \frac{x_0}{x} \biggr)^{\alpha}$

Note that the left-hand side of the last equation is the proportion of people with income greater than $x$. This is exactly the Type I survival function described at the beginning, with $x_m=x_0$.
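The log-linear relation suggests a quick numerical check: on a log-log scale, the counts $N(x)$ should fall on a straight line with slope $-\alpha$. The sketch below simulates Pareto Type I incomes by inverse transform sampling and fits that line by ordinary least squares. This is only an illustration of the log-linear idea, not a recommended estimator (maximum likelihood is the usual choice); the function name, seed and sample size are assumptions.

```python
import math
import random


def fit_pareto_index(incomes: list[float]) -> float:
    """Estimate alpha as minus the OLS slope of log N(x) on log x.

    N(x) is the number of observations strictly greater than x, evaluated at the
    sample points; the largest value is dropped because N is zero there.
    """
    xs = sorted(incomes)
    n = len(xs)
    u = [math.log(x) for x in xs[:-1]]               # log x
    v = [math.log(n - i - 1) for i in range(n - 1)]  # log N(x) for distinct values
    u_bar = sum(u) / len(u)
    v_bar = sum(v) / len(v)
    sxy = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    sxx = sum((a - u_bar) ** 2 for a in u)
    return -sxy / sxx  # log N(x) = log C - alpha * log x


if __name__ == "__main__":
    rng = random.Random(2017)
    alpha_true, x0 = 2.5, 1.0
    sample = [x0 * (1.0 - rng.random()) ** (-1.0 / alpha_true) for _ in range(20_000)]
    print(fit_pareto_index(sample))
```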

_______________________________________________________________________________________________

The Hierarchy of Pareto Distribution

The Pareto survival function discussed above is of Type I. The following table lists the survival functions of all the types.

__________________________________________________________________________________________
Pareto Distributions

| Pareto Type | Survival Function | Support | Parameters |
|---|---|---|---|
| Type I | $\displaystyle \biggl[ \frac{x_m}{x} \biggr]^\alpha$ | $x>x_m$ | $\alpha>0$, $x_m>0$ |
| Lomax | $\displaystyle \biggl[ \frac{x_m}{x+x_m} \biggr]^\alpha$ | $x>0$ | $\alpha>0$, $x_m>0$ |
| Type II | $\displaystyle \biggl[\frac{x_m}{(x-\mu)+x_m} \biggr]^\alpha$ | $x>\mu$ | $\mu$, $\alpha>0$, $x_m>0$ |
| Type III | $\displaystyle \biggl[\frac{(x_m)^{\frac{1}{\gamma}} }{(x-\mu)^{\frac{1}{\gamma}}+(x_m)^{\frac{1}{\gamma}}} \biggr]$ | $x>\mu$ | $\mu$, $\gamma>0$, $x_m>0$ |
| Type IV | $\displaystyle \biggl[\frac{(x_m)^{\frac{1}{\gamma}} }{(x-\mu)^{\frac{1}{\gamma}}+(x_m)^{\frac{1}{\gamma}}} \biggr]^\alpha$ | $x>\mu$ | $\mu$, $\alpha>0$, $\gamma>0$, $x_m>0$ |

__________________________________________________________________________________________

The simpler types are special cases of the more general ones. For example, Lomax is Type I shifted to the left by the amount $x_m$. Type II with $\mu=0$ becomes Lomax, and Type II with $\mu=x_m$ becomes Type I. Type III with $\gamma=1$ becomes Type II with $\alpha=1$. Type IV with $\alpha=1$ becomes Type III, and Type IV with $\gamma=1$ becomes Type II.
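Since every type in the table is a special case of Type IV, one function is enough to evaluate all of them. The sketch below writes the Type IV survival function in the equivalent form $\bigl[1+((x-\mu)/x_m)^{1/\gamma}\bigr]^{-\alpha}$ (obtained by dividing numerator and denominator by $x_m^{1/\gamma}$) and checks the special cases against their own formulas; the function name is hypothetical.

```python
import math


def pareto4_sf(x: float, mu: float, xm: float, gamma: float, alpha: float) -> float:
    """Pareto Type IV survival function [1 + ((x - mu)/xm)**(1/gamma)]**(-alpha), x > mu."""
    if x <= mu:
        return 1.0
    return (1.0 + ((x - mu) / xm) ** (1.0 / gamma)) ** (-alpha)


if __name__ == "__main__":
    x, xm, alpha = 5.0, 2.0, 3.0
    # Type I: gamma = 1 and mu = xm.
    print(pareto4_sf(x, xm, xm, 1.0, alpha), (xm / x) ** alpha)
    # Lomax: gamma = 1 and mu = 0.
    print(pareto4_sf(x, 0.0, xm, 1.0, alpha), (xm / (x + xm)) ** alpha)
    # Type III: alpha = 1.
    print(pareto4_sf(x, 1.0, xm, 2.0, 1.0))
```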

We discuss Type II Lomax in the next section. For the other types, see the Pareto Wikipedia entry.

_______________________________________________________________________________________________

Pareto Type II Lomax

The Pareto distribution of Lomax type results from shifting Type I to the left by the amount $x_m$, the scale parameter in Pareto Type I. As a result, the support is now the entire positive x-axis. Some of the mathematical properties of the Lomax type can be derived by applying the appropriate shift. For the sake of completeness, the following table lists some of the basic distributional quantities, with the scale parameter $x_m$ renamed $\theta$.

__________________________________________________________________________________________
Pareto Type II Lomax – Distributional Quantities

| Quantity | Formula | Condition |
|---|---|---|
| Survival Function | $S(x)=\displaystyle \biggl( \frac{\theta}{x+\theta} \biggr)^\alpha$ | $x>0$ |
| Cumulative Distribution Function | $F(x)=1-\displaystyle \biggl( \frac{\theta}{x+\theta} \biggr)^\alpha$ | $x>0$ |
| Probability Density Function | $\displaystyle f(x)=\frac{\alpha \ \theta^\alpha}{(x+\theta)^{\alpha+1}}$ | $x>0$ |
| Mean | $\displaystyle E(X)=\frac{\theta}{\alpha-1}$ | $\alpha>1$ |
| Median | $\displaystyle \theta \ \bigl( 2^{\frac{1}{\alpha}}-1 \bigr)$ | |
| Mode | $0$ | |
| Variance | $\displaystyle Var(X)=\frac{\theta^2 \ \alpha}{(\alpha-1)^2 \ (\alpha-2)}$ | $\alpha>2$ |
| Higher Moments | $\displaystyle E(X^k)=\frac{k! \ \theta^k}{(\alpha-1) \cdots (\alpha-k)}$ | $\alpha>k$, $k$ an integer |

__________________________________________________________________________________________

To see the shift at work, let $Y$ be a Pareto Type I random variable with shape parameter $\alpha$ and scale parameter $\theta$. Then $X=Y-\theta$ is a Pareto Type II Lomax random variable. Immediately, $E(X)=E(Y)-\theta=\frac{\alpha \theta}{\alpha-1}-\theta$, which simplifies to $\frac{\theta}{\alpha-1}$. Moreover, shifting by a constant does not change the variance, so $Var(X)=Var(Y)$. If $S_Y(x)$ and $S_X(x)$ represent the survival functions of $Y$ and $X$, respectively, then $S_X(x)=S_Y(x+\theta)$. The same holds for the CDFs and PDFs.
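The shift relation can be checked numerically. The sketch below (function names are hypothetical) computes the Lomax moments $E(X^k)$ in two ways: from the formula $\frac{k! \ \theta^k}{(\alpha-1)\cdots(\alpha-k)}$, and by expanding $E[(Y-\theta)^k]=\sum_{j=0}^k \binom{k}{j} E(Y^j)(-\theta)^{k-j}$ with the Type I moments $E(Y^j)=\frac{\alpha \theta^j}{\alpha-j}$. It also confirms $S_X(x)=S_Y(x+\theta)$ at the Lomax median $\theta(2^{1/\alpha}-1)$, where both equal $\frac{1}{2}$.

```python
import math


def pareto1_sf(y: float, alpha: float, xm: float) -> float:
    """Pareto Type I survival function (xm / y) ** alpha for y >= xm."""
    return 1.0 if y < xm else (xm / y) ** alpha


def lomax_sf(x: float, alpha: float, theta: float) -> float:
    """Lomax survival function (theta / (x + theta)) ** alpha for x >= 0."""
    return 1.0 if x < 0 else (theta / (x + theta)) ** alpha


def lomax_median(alpha: float, theta: float) -> float:
    return theta * (2.0 ** (1.0 / alpha) - 1.0)


def lomax_moment(k: int, alpha: float, theta: float) -> float:
    """k! theta^k / ((alpha-1)...(alpha-k)), valid for integer k < alpha."""
    if alpha <= k:
        raise ValueError("moment does not exist")
    return math.factorial(k) * theta ** k / math.prod(alpha - i for i in range(1, k + 1))


def lomax_moment_via_shift(k: int, alpha: float, theta: float) -> float:
    """E[(Y - theta)^k] for Y ~ Pareto Type I(alpha, theta), by binomial expansion."""
    if alpha <= k:
        raise ValueError("moment does not exist")
    total = 0.0
    for j in range(k + 1):
        ey_j = alpha * theta ** j / (alpha - j)  # E(Y^j); equals 1 when j = 0
        total += math.comb(k, j) * ey_j * (-theta) ** (k - j)
    return total


if __name__ == "__main__":
    alpha, theta = 4.5, 2.0
    for k in (1, 2, 3, 4):
        print(k, lomax_moment(k, alpha, theta), lomax_moment_via_shift(k, alpha, theta))
```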

Another interesting fact about the Pareto Lomax type is that it is a mixture of exponential distributions with gamma mixing weights. An insurance interpretation is a good motivation. Suppose that the loss arising from an insured randomly selected from a large group of insureds follows an exponential distribution with the following probability density function:

$f_{X \lvert \Lambda}(x \lvert \lambda)= \lambda \ e^{-\lambda x} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ x>0$

The above is the density function of an exponential distribution. However, it is a conditional density, since the parameter $\Lambda=\lambda$ is uncertain. Because $f_{X \lvert \Lambda}(x \lvert \lambda)$ is an exponential density, the mean claim cost for this insured is $\frac{1}{\lambda}$, so the parameter $\Lambda=\lambda$ reflects the risk characteristics of the insured. In a large pool of insureds these risk characteristics vary widely across individuals, so it is more appropriate to regard $\Lambda$ as a random variable. Suppose that $\Lambda$ follows a gamma distribution with shape parameter $\alpha$ and rate parameter $\theta$ (equivalently, scale parameter $\frac{1}{\theta}$) with the following PDF.

$\displaystyle g(\lambda)=\frac{\theta^\alpha}{\Gamma(\alpha)} \ \lambda^{\alpha-1} \ e^{-\theta \lambda} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \lambda>0$

where $\Gamma(\cdot)$ is the gamma function. The gamma distribution has been discussed extensively in this blog. Here is a post on the gamma function and here is an introduction to the gamma distribution.

The unconditional density function of $X$ is then the weighted average of the conditional density $f_{X \lvert \Lambda}(x \lvert \lambda)$ weighted by the above gamma density function.

$\displaystyle \begin{aligned} f_X(x)&=\int_0^\infty f_{X \lvert \Lambda}(x \lvert \lambda) \ g(\lambda) \ d \lambda \\&=\int_0^\infty \biggl( \lambda \ e^{-\lambda x} \biggr) \ \biggl( \frac{\theta^\alpha}{\Gamma(\alpha)} \ \lambda^{\alpha-1} \ e^{-\theta \lambda} \biggr) \ d \lambda \\&=\int_0^\infty \frac{\theta^\alpha}{\Gamma(\alpha)} \lambda^\alpha e^{-(\theta+x) \lambda} \ d \lambda \\&=\frac{\theta^\alpha}{\Gamma(\alpha)} \frac{\Gamma(\alpha+1)}{(\theta+x)^{\alpha+1}} \int_0^\infty \frac{(\theta+x)^{\alpha+1}}{\Gamma(\alpha+1)} \ \lambda^{\alpha+1-1} \ e^{-(\theta+x) \lambda} \ d \lambda \\&=\frac{\alpha \theta^{\alpha}}{(\theta+x)^{\alpha+1}} \end{aligned}$

The above derivation shows that the unconditional density function of $X$ is a Pareto Lomax density function. Thus if each individual insured in a large pool of insureds has an exponential claim cost distribution where the rate parameter $\Lambda$ is distributed according to a gamma distribution, then the unconditional claim cost for a randomly selected insured is distributed according to a Pareto Lomax distribution. Mathematically speaking, the Pareto Lomax distribution is a mixture of exponential distributions with gamma mixing weights.
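The mixture can also be checked by simulation: draw a rate $\lambda$ from the gamma distribution, then a claim from the exponential distribution with that rate, and compare the proportion of claims above $x$ with the Lomax survival function $\bigl(\frac{\theta}{x+\theta}\bigr)^\alpha$. The sketch below uses Python's standard `random` module; note that `random.gammavariate(alpha, beta)` is parameterized by a scale `beta`, so the rate $\theta$ enters as `1 / theta`, while `random.expovariate` takes the rate directly. The parameter values, seed and function names are illustrative assumptions.

```python
import random


def simulate_claims(n: int, alpha: float, theta: float, rng: random.Random) -> list[float]:
    """Lambda ~ Gamma(shape=alpha, rate=theta); X | Lambda ~ Exponential(rate=Lambda)."""
    claims = []
    for _ in range(n):
        lam = rng.gammavariate(alpha, 1.0 / theta)  # scale = 1 / rate
        claims.append(rng.expovariate(lam))         # expovariate takes the rate
    return claims


def lomax_sf(x: float, alpha: float, theta: float) -> float:
    return (theta / (x + theta)) ** alpha


if __name__ == "__main__":
    alpha, theta = 3.0, 2.0
    claims = simulate_claims(100_000, alpha, theta, random.Random(42))
    for x in (0.5, 1.0, 2.0, 5.0):
        empirical = sum(c > x for c in claims) / len(claims)
        print(f"x={x}: simulated {empirical:.4f}, Lomax {lomax_sf(x, alpha, theta):.4f}")
```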

In the above discussion, we noted that the Pareto Type I distribution has a heavy tail compared to other distributions. One of the telltale signs is that not all moments exist. The Pareto Lomax distribution is also a heavy-tailed distribution. This blog post in an affiliated blog has a detailed discussion, which examines Pareto Lomax as a heavy-tailed distribution from four perspectives: existence of moments, speed of decay of the survival function to zero, hazard rate function, and mean excess loss function. Another blog post discusses the Pareto Lomax distribution as a mixture of exponential distributions with gamma mixing weights.

_______________________________________________________________________________________________

Remarks

The Pareto distribution is positively skewed and has a heavy tail on the right. It is an excellent model for extreme phenomena, e.g. situations in which a small fraction of the population accounts for a large share of the total, as in the 80/20 rule. It was originally applied as a model of the income and wealth of a country. In insurance applications, heavy-tailed distributions such as the Pareto are essential tools for modeling extreme losses, especially for riskier lines of insurance such as medical malpractice insurance. In financial applications, the study of heavy-tailed distributions provides information about the potential for financial fiasco or financial ruin.

For more information on the mathematical aspects of the Pareto distribution, refer to the text by Johnson and Kotz. For an actuarial perspective, refer to the text Loss Models.

_______________________________________________________________________________________________

Reference

1. Johnson N. L., Kotz S., Continuous Univariate Distributions – I, Houghton Mifflin Company, Boston, 1970.
2. Klugman S. A., Panjer H. H., Willmot G. E., Loss Models: From Data to Decisions, Fourth Edition, Wiley-Interscience, John Wiley & Sons, Inc., New York, 2012.

_______________________________________________________________________________________________
$\copyright$ 2017 – Dan Ma