The Pareto distribution is a power law probability distribution. It is named after the Italian civil engineer, economist and sociologist Vilfredo Pareto, who was the first to observe that income follows what is now called the Pareto distribution. He is also known for the 80/20 rule, according to which 20% of the people receive 80% of all income. This post discusses the mathematical properties of this distribution and its applications.
Pareto Distribution of Type I
There are several types of the Pareto distribution. Let’s start with Type I. The random variable $X$ is said to follow a Type I Pareto distribution if the following is the survival function,
$$S(x) = P(X > x) = \begin{cases} \left(\dfrac{x_m}{x}\right)^{\alpha} & x \ge x_m \\ 1 & x < x_m \end{cases}$$
where $x_m$ and $\alpha$ are both positive parameters. The support of the distribution is the interval $[x_m, \infty)$. The parameter $x_m$ is a scale parameter and $\alpha$ is a shape parameter. The parameter $\alpha$ is also known as the tail index. When the Pareto distribution is used as a model of wealth or income, $\alpha$ is also known as the Pareto index, which is a measure of the breadth of the wealth distribution.
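The following table lists the cumulative distribution function (CDF) and the probability density function (PDF), with scale parameter $x_m$ and shape parameter $\alpha$.
Pareto Type I – Probability Functions
|Cumulative Distribution Function||$F(x) = 1 - \left(\dfrac{x_m}{x}\right)^{\alpha}$, $x \ge x_m$|
|Probability Density Function||$f(x) = \dfrac{\alpha \, x_m^{\alpha}}{x^{\alpha+1}}$, $x \ge x_m$|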
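The three probability functions can be sketched in a few lines of Python. This is only an illustration: the function names and the parameter names `x_m` (scale) and `alpha` (shape) are choices made here, and the last lines check numerically that the PDF is the derivative of the CDF.

```python
def pareto_survival(x, x_m, alpha):
    """P(X > x) for a Type I Pareto with scale x_m and shape alpha."""
    if x < x_m:
        return 1.0
    return (x_m / x) ** alpha


def pareto_cdf(x, x_m, alpha):
    """P(X <= x), the complement of the survival function."""
    return 1.0 - pareto_survival(x, x_m, alpha)


def pareto_pdf(x, x_m, alpha):
    """Density: alpha * x_m**alpha / x**(alpha + 1) on the support, 0 below it."""
    if x < x_m:
        return 0.0
    return alpha * x_m ** alpha / x ** (alpha + 1)


# Illustrative parameter values (not taken from the post).
x_m, alpha = 1.0, 2.0
for x in (1.0, 2.0, 5.0):
    print(x, pareto_survival(x, x_m, alpha), pareto_cdf(x, x_m, alpha), pareto_pdf(x, x_m, alpha))

# Central difference of the CDF should be close to the PDF.
h = 1e-6
slope = (pareto_cdf(2.0 + h, x_m, alpha) - pareto_cdf(2.0 - h, x_m, alpha)) / (2 * h)
print(slope, pareto_pdf(2.0, x_m, alpha))
```

Below the scale parameter the survival function is 1 and the density is 0, which is why both functions branch on `x < x_m`.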
The following figure (Figure 1) shows the graphs of the PDFs for several values of the shape parameter $\alpha$.
All the density curves in Figure 1 are skewed to the right and have a long tail. However, some tails are thicker than others. The curve with a higher value of $\alpha$ approaches the x-axis faster, and hence has a lighter tail compared to the density curve with a lower value of $\alpha$. The role of $\alpha$ is discussed further below. The following table lists several more Pareto distributional quantities.
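Pareto Type I – Additional Distributional Quantities
|Mean||$E[X] = \dfrac{\alpha \, x_m}{\alpha - 1}$, $\alpha > 1$|
|Variance||$Var[X] = \dfrac{\alpha \, x_m^2}{(\alpha-1)^2 (\alpha-2)}$, $\alpha > 2$|
|Higher Moments||$E[X^k] = \dfrac{\alpha \, x_m^k}{\alpha - k}$, $\alpha > k$|
|Skewness||$\dfrac{2 (1+\alpha)}{\alpha - 3} \sqrt{\dfrac{\alpha - 2}{\alpha}}$, $\alpha > 3$|
|Excess Kurtosis||$\dfrac{6 (\alpha^3 + \alpha^2 - 6 \alpha - 2)}{\alpha (\alpha-3) (\alpha-4)}$, $\alpha > 4$|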
Given the survival function, it is straightforward to derive the CDF and the PDF. The mean and higher moments can also be derived by evaluating the integral $E[X^k] = \int_{x_m}^{\infty} x^k \, f(x) \, dx$, where $f(x)$ is the PDF. Once the moments are obtained, other quantities that depend on moments can be derived (e.g. variance, skewness and excess kurtosis). The following gives the definitions of these distributional quantities.
|Skewness||$\dfrac{E[(X-\mu)^3]}{\sigma^3}$|
|Kurtosis||$\dfrac{E[(X-\mu)^4]}{\sigma^4}$|
|Excess Kurtosis||Kurtosis – 3|
In these definitions, $\mu$ and $\sigma$ are the mean and standard deviation of a given distribution, respectively. Then $\sigma^2$ is the variance. The skewness of a distribution is the ratio of the third central moment to the cube of the standard deviation. The kurtosis is the ratio of the fourth central moment to the square of the variance. A previous post has a detailed discussion of skewness.
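As a minimal sketch of how these definitions are applied, the following Python computes the skewness and excess kurtosis of a Type I Pareto distribution from its raw moments. It assumes the standard Type I raw moment $E[X^k] = \alpha x_m^k / (\alpha - k)$ for $k < \alpha$; the function names and the parameter names `x_m` and `alpha` are illustrative choices.

```python
def pareto_raw_moment(k, x_m, alpha):
    """E[X^k] = alpha * x_m**k / (alpha - k), which exists only for k < alpha."""
    if k >= alpha:
        raise ValueError("the k-th moment does not exist when k >= alpha")
    return alpha * x_m ** k / (alpha - k)


def skewness_and_excess_kurtosis(x_m, alpha):
    """Apply the central-moment definitions; needs alpha > 4 for the 4th moment."""
    m1, m2, m3, m4 = (pareto_raw_moment(k, x_m, alpha) for k in (1, 2, 3, 4))
    variance = m2 - m1 ** 2
    # Central moments expressed through raw moments.
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    skewness = mu3 / variance ** 1.5
    excess_kurtosis = mu4 / variance ** 2 - 3
    return skewness, excess_kurtosis


# Illustrative values: scale 1 and shape 5 (all four moments exist).
print(skewness_and_excess_kurtosis(1.0, 5.0))
```

Calling the function with a shape parameter of 4 or less raises an error, because the fourth moment needed for the kurtosis does not exist in that case.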
A Closer Look at the Shape Parameter
The above tables show that the Pareto distribution is mathematically tractable, especially when it comes to the calculation of moments. Another observation is that the mean and other moments do not always exist. This stems from the fact that when the shape parameter is too small, the integral for the moment may not converge.
The mean exists only when the shape parameter $\alpha$ is greater than 1. The variance exists only when $\alpha$ is greater than 2. In general, the $k$th moment exists only when $\alpha$ is greater than $k$. The larger the shape parameter $\alpha$, the more moments can be calculated. All $k$th moments with $k < \alpha$ can be calculated. However, the $k$th moment for any $k \ge \alpha$ cannot be calculated.
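The convergence behavior can be seen by truncating the moment integral at a finite upper limit and letting that limit grow. The sketch below uses the closed form of the truncated integral of $x^k f(x)$ for the Type I density; the function names and the values `x_m = 1`, `alpha = 2.5` are illustrative assumptions.

```python
import math


def truncated_moment(k, x_m, alpha, upper):
    """Integral of x**k * f(x) over [x_m, upper] for the Type I Pareto density."""
    if k == alpha:
        # The integrand reduces to a constant times 1/x, so the integral is a logarithm.
        return alpha * x_m ** alpha * math.log(upper / x_m)
    return alpha * x_m ** alpha * (upper ** (k - alpha) - x_m ** (k - alpha)) / (k - alpha)


def moment(k, x_m, alpha):
    """Limit of the truncated integral as upper grows; infinite when k >= alpha."""
    return alpha * x_m ** k / (alpha - k) if k < alpha else math.inf


x_m, alpha = 1.0, 2.5
for k in (1, 2, 3):
    values = [truncated_moment(k, x_m, alpha, upper) for upper in (1e2, 1e4, 1e6)]
    print(k, values, moment(k, x_m, alpha))
```

For `k = 1` and `k = 2` the truncated values settle down as the upper limit grows, while for `k = 3` (which exceeds the shape parameter 2.5) they keep increasing.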
Having moments that do not exist is a sign that the distribution has a heavy tail. Let’s examine the graphs of Pareto survival functions and CDFs.
Figure 2 shows the survival function $S(x)$ for three values of the shape parameter $\alpha$ where $x_m = 1$ (the scale parameter is 1). The following figure shows the corresponding cumulative distribution functions $F(x)$.
The survival function $S(x)$ is the probability of the right tail $(x, \infty)$. On the other hand, the CDF $F(x)$ is the probability placed on the initial interval $(-\infty, x]$. The sum of the two is 1. One thing that stands out in Figure 2 is that the larger the $\alpha$, the faster the survival curve approaches zero, and thus the less probability is placed on the right tail. In other words, more probability is attached to the lower values, and thus the integral for the moments is more likely to converge when $\alpha$ is larger. This explains why more of the moments exist for a Pareto distribution with a larger $\alpha$. Thus the $k$th moment exists for a wider range of $k$ when $\alpha$ is larger, confirming the earlier observation.
Another thing to point out in Figure 2 is that the distribution with the larger $\alpha$ has a lighter right tail and the one with a smaller $\alpha$ has a heavier right tail. So within the Pareto family, a lower $\alpha$ means a distribution with a heavier tail and a larger $\alpha$ means a lighter tail.
A comparison with other families of distributions is also instructive. All moments exist for the gamma distributions (including exponential distributions), for the lognormal distribution and for the normal distribution. Moment generating functions also exist for the gamma and normal distributions, though not for the lognormal distribution, which has all its moments even though its moment generating function does not exist. In contrast, not all moments exist for Pareto distributions, so the moment generating function cannot exist either (otherwise all moments would exist). These are signs that the Pareto distributions are heavy-tailed distributions. For a more in-depth discussion of the tail weight of the Pareto family, see this blog post in an affiliated blog. The Pareto distribution discussed there is of Pareto Type II.
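One way to see the difference in tail weight is to compare a Pareto survival function with that of an exponential distribution having the same mean. The following is a rough sketch under assumed values (Type I with scale 1 and shape 3, so the mean is 1.5); the function names are illustrative.

```python
import math


def pareto_survival(x, x_m, alpha):
    """P(X > x) for a Type I Pareto with scale x_m and shape alpha."""
    return 1.0 if x < x_m else (x_m / x) ** alpha


def exponential_survival(x, mean):
    """P(X > x) for an exponential distribution with the given mean."""
    return math.exp(-x / mean) if x > 0 else 1.0


def tail_ratio(x, x_m=1.0, alpha=3.0):
    """Pareto tail probability divided by the tail of an exponential with equal mean."""
    mean = alpha * x_m / (alpha - 1)
    return pareto_survival(x, x_m, alpha) / exponential_survival(x, mean)


for x in (5.0, 20.0, 50.0, 100.0):
    print(x, tail_ratio(x))
```

The ratio grows without bound as `x` increases: the power law tail decays polynomially while the exponential tail decays exponentially, so far enough out the Pareto tail carries vastly more probability.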
When the Pareto model is used as a model of the lifetime of systems (machines or devices), a larger value of the shape parameter $\alpha$ means that fewer “lives” survive to old ages, or equivalently that more lives die off at relatively young ages (as discussed above, this means a lighter right tail). If the Pareto model is used as a model of income or wealth of individuals, then a higher $\alpha$ means that a smaller proportion of the people are in the higher income brackets (or more people are in the lower income ranges). Thus the shape parameter $\alpha$ is called the Pareto index, which is a measure of the breadth of the income or wealth distribution. The higher this measure, the less inequality in income.
We now discuss the motivation behind the Pareto survival function. The Pareto distribution is a power law distribution. It is a model that can describe phenomena that behave in a log-linear fashion. Let’s revisit the original reasoning for using the Pareto survival function as a model of income.
Let $N(x)$ be the number of people with income greater than $x$. Let $x_m$ be the minimum income in the population in question. Then $N(x_m)$ is the size of the entire population. Pareto proposed that $N(x)$ can be modeled in a log-linear fashion:
$$\log N(x) = \log A - \alpha \log x$$
where log is the logarithm to the base $e$, $A$ is a constant and $\alpha$ is a positive parameter. If this relation holds, it would hold at the minimum income level $x_m$:
$$\log N(x_m) = \log A - \alpha \log x_m$$
Subtracting the second relation from the first gives the following:
$$\log \frac{N(x)}{N(x_m)} = -\alpha \, (\log x - \log x_m) = \alpha \log \frac{x_m}{x}$$
Exponentiating both sides gives the following:
$$\frac{N(x)}{N(x_m)} = \left(\frac{x_m}{x}\right)^{\alpha}$$
Note that the left hand side of the last equation is the proportion of the people having income greater than $x$, which of course is the survival function described at the beginning.
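The log-linear relation can be checked numerically: if the count of people with income above $x$ follows a power law, the slope of log count against log income is the negative of the shape parameter. The sketch below uses made-up values (a population of one million, minimum income 10,000, shape 1.5); all names are illustrative.

```python
import math


def count_above(x, total, x_m, alpha):
    """Number of people with income above x (for x >= x_m) under the power law."""
    return total * (x_m / x) ** alpha


def log_log_slope(x1, x2, total, x_m, alpha):
    """Slope of log N(x) against log x between two income levels."""
    n1 = count_above(x1, total, x_m, alpha)
    n2 = count_above(x2, total, x_m, alpha)
    return (math.log(n2) - math.log(n1)) / (math.log(x2) - math.log(x1))


total, x_m, alpha = 1_000_000, 10_000.0, 1.5
print(log_log_slope(20_000.0, 80_000.0, total, x_m, alpha))  # close to -alpha
print(count_above(40_000.0, total, x_m, alpha) / total)      # proportion above 40,000
```

The second printed value is the proportion of the population above the given income, which is the survival function evaluated at that income.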
The Hierarchy of Pareto Distributions
The Pareto survival function discussed above is of Type I. The following lists out the survival functions of the other types.
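|Pareto Type||Survival Function||Support||Parameters|
|Type II||$\left[1 + \dfrac{x-\mu}{\sigma}\right]^{-\alpha}$||$x \ge \mu$||$\mu \in \mathbb{R}$, $\sigma > 0$, $\alpha > 0$|
|Type III||$\left[1 + \left(\dfrac{x-\mu}{\sigma}\right)^{1/\gamma}\right]^{-1}$||$x \ge \mu$||$\mu \in \mathbb{R}$, $\sigma > 0$, $\gamma > 0$|
|Type IV||$\left[1 + \left(\dfrac{x-\mu}{\sigma}\right)^{1/\gamma}\right]^{-\alpha}$||$x \ge \mu$||$\mu \in \mathbb{R}$, $\sigma > 0$, $\gamma > 0$, $\alpha > 0$|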
The lower types are special cases of the higher types. For example, Lomax is Type I shifted to the left by the amount of its scale parameter $x_m$. Type II with $\mu = 0$ becomes Lomax. Type III with $\gamma = 1$ becomes Type II with $\alpha = 1$. Type IV with $\alpha = 1$ becomes Type III.
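The nesting of the types can be sketched by writing every survival function as a special case of Type IV. This sketch assumes the standard forms of the hierarchy as given in the Pareto Wikipedia entry, with location `mu`, scale `sigma`, inequality parameter `gamma` and shape `alpha`; the function names are illustrative.

```python
def type4_survival(x, mu, sigma, gamma, alpha):
    """Type IV: [1 + ((x - mu) / sigma) ** (1 / gamma)] ** (-alpha) for x > mu."""
    if x <= mu:
        return 1.0
    return (1.0 + ((x - mu) / sigma) ** (1.0 / gamma)) ** (-alpha)


def type3_survival(x, mu, sigma, gamma):
    """Type III is Type IV with alpha = 1."""
    return type4_survival(x, mu, sigma, gamma, 1.0)


def type2_survival(x, mu, sigma, alpha):
    """Type II is Type IV with gamma = 1."""
    return type4_survival(x, mu, sigma, 1.0, alpha)


def lomax_survival(x, sigma, alpha):
    """Lomax is Type II with mu = 0."""
    return type2_survival(x, 0.0, sigma, alpha)


def type1_survival(x, sigma, alpha):
    """Type I is Type II with mu = sigma, which simplifies to (sigma / x) ** alpha."""
    return type2_survival(x, sigma, sigma, alpha)


print(type1_survival(4.0, 2.0, 3.0), lomax_survival(2.0, 2.0, 3.0))
```

The two printed values agree because a Lomax variable is a Type I variable shifted to the left by the scale parameter.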
We discuss Type II Lomax in the next section. For the other types, see the Pareto Wikipedia entry.
Pareto Type II Lomax
The Pareto distribution of Lomax type is the result of shifting Type I to the left by the amount $x_m$, the scale parameter in Pareto Type I. As a result, the support is now the entire positive x-axis. Some of the mathematical properties of the Lomax type can be derived by making the appropriate shift. For the sake of completeness, the following table lists some of the basic distributional quantities. The scale parameter is renamed $\theta$.
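Pareto Type II Lomax – Distributional Quantities
|Cumulative Distribution Function||$F(x) = 1 - \left(\dfrac{\theta}{x+\theta}\right)^{\alpha}$, $x > 0$|
|Probability Density Function||$f(x) = \dfrac{\alpha \, \theta^{\alpha}}{(x+\theta)^{\alpha+1}}$, $x > 0$|
|Higher Moments||$E[X^k] = \dfrac{\theta^k \, k!}{(\alpha-1)(\alpha-2) \cdots (\alpha-k)}$, $k$ is an integer with $k < \alpha$|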
To help see the shifting, let $X$ be a Pareto Type I random variable with shape parameter $\alpha$ and scale parameter $\theta$. Then $Y = X - \theta$ is a Pareto Type II Lomax random variable. Immediately, $E[Y] = E[X] - \theta = \dfrac{\alpha \theta}{\alpha - 1} - \theta$, which simplifies to $\dfrac{\theta}{\alpha - 1}$. On the other hand, shifting by a constant does not change the variance. If $S_X$ and $S_Y$ represent the survival functions for $X$ and $Y$, respectively, then $S_Y(y) = S_X(y + \theta)$. The same can be said about the CDFs and PDFs.
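The shift can also be illustrated by simulation. The sketch below draws Type I values by inverse transform sampling (an approach chosen here, not taken from the post), subtracts the scale parameter to get Lomax values, and compares the sample mean with the Lomax mean, which is the scale divided by the shape minus one. The names `theta` and `alpha`, the parameter values and the seed are illustrative assumptions.

```python
import random


def type1_survival(x, theta, alpha):
    """P(X > x) for Pareto Type I with scale theta and shape alpha."""
    return 1.0 if x < theta else (theta / x) ** alpha


def lomax_survival(y, theta, alpha):
    """P(Y > y) for Pareto Type II Lomax with scale theta and shape alpha."""
    return 1.0 if y < 0 else (theta / (y + theta)) ** alpha


def sample_lomax(n, theta, alpha, seed=0):
    """Draw Type I values by inverse transform, then shift them left by theta."""
    rng = random.Random(seed)
    # 1 - rng.random() lies in (0, 1], which avoids raising zero to a negative power.
    return [theta * (1.0 - rng.random()) ** (-1.0 / alpha) - theta for _ in range(n)]


theta, alpha = 3.0, 4.0
ys = sample_lomax(200_000, theta, alpha)
sample_mean = sum(ys) / len(ys)
print(sample_mean, theta / (alpha - 1))                      # simulated vs. theoretical mean
print(lomax_survival(2.0, theta, alpha), type1_survival(2.0 + theta, theta, alpha))
```

The last line shows the survival functions agreeing once the argument of the Type I function is shifted by the scale parameter.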
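Another interesting fact about the Pareto Lomax type is that it is a mixture of exponential distributions with gamma mixing weights. An insurance interpretation is a good motivation. Suppose that the loss $X$ arising from an insured randomly selected from a large group of insureds follows an exponential distribution with the following probability density function:
$$f_{X \mid \Lambda}(x \mid \lambda) = \lambda \, e^{-\lambda x}, \quad x > 0$$
The above density function is that of an exponential distribution. However, it is a conditional density, since the parameter $\lambda$ is uncertain. The mean claim cost for this insured is $1/\lambda$, so the parameter $\lambda$ reflects the risk characteristics of the insured. In a large pool of insureds, there is uncertainty in the parameter $\lambda$. It is more appropriate to regard $\lambda$ as the value of a random variable $\Lambda$ in order to capture the wide range of risk characteristics across the individuals in the population. As a result, the pdf indicated above is not an unconditional pdf but rather a conditional pdf of $X$ given $\Lambda = \lambda$. Suppose that the uncertain parameter $\Lambda$ follows a gamma distribution with shape parameter $\alpha$ and rate parameter $\theta$ (equivalently, scale parameter $1/\theta$), with the following PDF.
$$g(\lambda) = \frac{\theta^{\alpha}}{\Gamma(\alpha)} \, \lambda^{\alpha-1} \, e^{-\theta \lambda}, \quad \lambda > 0$$
The unconditional density function of $X$ is then the weighted average of the conditional density, weighted by the above gamma density function.
$$f_X(x) = \int_0^{\infty} \lambda \, e^{-\lambda x} \, \frac{\theta^{\alpha}}{\Gamma(\alpha)} \, \lambda^{\alpha-1} \, e^{-\theta \lambda} \, d\lambda = \frac{\theta^{\alpha}}{\Gamma(\alpha)} \int_0^{\infty} \lambda^{\alpha} \, e^{-(x+\theta) \lambda} \, d\lambda = \frac{\theta^{\alpha}}{\Gamma(\alpha)} \cdot \frac{\Gamma(\alpha+1)}{(x+\theta)^{\alpha+1}} = \frac{\alpha \, \theta^{\alpha}}{(x+\theta)^{\alpha+1}}$$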
The above derivation shows that the unconditional density function of $X$ is a Pareto Lomax density function. Thus if each individual insured in a large pool of insureds has an exponential claim cost distribution where the rate parameter is distributed according to a gamma distribution, then the unconditional claim cost for a randomly selected insured is distributed according to a Pareto Lomax distribution. Mathematically speaking, the Pareto Lomax distribution is a mixture of exponential distributions with gamma mixing weights.
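The mixture result can be checked numerically by integrating the exponential density against the gamma density and comparing with the Lomax density. The sketch below uses Simpson's rule with only the standard library. It assumes the gamma distribution is parameterized by shape `alpha` and rate `theta`, and it assumes `alpha >= 1` so that the integrand is finite at zero; the function names, the truncation point and the parameter values are illustrative.

```python
import math


def gamma_pdf(lam, alpha, theta):
    """Gamma density with shape alpha and rate theta (assumes alpha >= 1 at lam = 0)."""
    return theta ** alpha / math.gamma(alpha) * lam ** (alpha - 1) * math.exp(-theta * lam)


def mixture_pdf(x, alpha, theta, n=4000):
    """Simpson's rule for the integral of lam * exp(-lam * x) * gamma_pdf(lam) over lam."""
    # The integrand decays like exp(-(x + theta) * lam), so it is negligible beyond this point.
    upper = 60.0 / (x + theta)
    h = upper / n  # n must be even for Simpson's rule

    def integrand(lam):
        return lam * math.exp(-lam * x) * gamma_pdf(lam, alpha, theta)

    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


def lomax_pdf(x, alpha, theta):
    """Closed form of the Lomax density: alpha * theta**alpha / (x + theta)**(alpha + 1)."""
    return alpha * theta ** alpha / (x + theta) ** (alpha + 1)


alpha, theta = 3.0, 2.0
for x in (0.5, 1.0, 5.0):
    print(x, mixture_pdf(x, alpha, theta), lomax_pdf(x, alpha, theta))
```

The numerically integrated mixture density and the closed-form Lomax density agree to several decimal places at each point.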
In the above discussion, we comment that the Pareto Type I distribution has a heavy tail compared to other distributions. One of the telltale signs is that not all moments exist in a Pareto distribution. The Pareto Lomax distribution is also a heavy-tailed distribution. This blog post in an affiliated blog has a detailed discussion. The discussion in that blog post examines Pareto Lomax as a heavy-tailed distribution from four perspectives: existence of moments, speed of decay of the survival function to zero, hazard rate function, and mean excess loss function. Another blog post discusses the Pareto Lomax distribution as a mixture of exponential distributions with gamma mixing weights.
The Pareto distribution is positively skewed and has a heavy tail on the right. It is a good model for phenomena dominated by extremes, in which a small share of the observations in the long tail accounts for most of the total, as in the 80/20 rule. It was originally applied as a model of the income and wealth of a country. In insurance applications, heavy-tailed distributions such as the Pareto are essential tools for modeling extreme loss, especially for the riskier types of insurance such as medical malpractice insurance. In financial applications, the study of heavy-tailed distributions provides information about the potential for financial fiasco or financial ruin.
For more information on the mathematical aspects of the Pareto distribution, refer to the text by Johnson and Kotz. For an actuarial perspective, refer to the text Loss Models.
- Johnson N. L., Kotz S., Continuous Univariate Distributions – I, Houghton Mifflin Company, Boston, 1970.
- Klugman S. A., Panjer H. H., Willmot G. E., Loss Models: From Data to Decisions, Fourth Edition, Wiley-Interscience, a John Wiley & Sons, Inc., New York, 2012.
2017 – Dan Ma