This post is an introduction to the standard beta distribution, i.e. the one whose support is the unit interval $(0,1)$ and whose density is defined mathematically from the beta function. There are many generalizations of the standard beta distribution. The most straightforward one is to rescale the standard beta so that the support is an interval other than $(0,1)$, which is discussed at the end of this post. Other generalized versions of the beta distribution exist as well; a subsequent post will define one such version.
The beta distribution and its many generalizations are important in economic modeling (e.g. modeling of income distribution). They can also be applied in actuarial modeling, e.g. for modeling insurance losses. In Bayesian analysis, the beta distribution can be used as a conjugate prior for the binomial model. The beta distribution also has an interesting and important connection with order statistics and non-parametric inference, which is the subject of the next post.
The Beta Density Function
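Let $a$ and $b$ be positive real numbers. The previous post establishes the following fact about the beta function.

$$B(a,b)=\int_0^1 x^{a-1}(1-x)^{b-1}\,dx=\frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}$$

As a result, the following is a density function.

$$f(x)=\frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,x^{a-1}(1-x)^{b-1}, \qquad 0<x<1$$

A random variable $X$ is said to follow the beta distribution with parameters $a$ and $b$ if its pdf is the function $f(x)$ above. In particular, if $a$ and $b$ are positive integers, then $\Gamma(n)=(n-1)!$ and the beta density function is:

$$f(x)=\frac{(a+b-1)!}{(a-1)!\,(b-1)!}\,x^{a-1}(1-x)^{b-1}, \qquad 0<x<1$$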
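As a quick sanity check, the beta function identity and the density can be verified numerically. The following is a minimal standard-library Python sketch; the helper names (`beta_pdf`, `beta_pdf_integer`, `midpoint_integral`, `beta_function_numeric`) are hypothetical, and the midpoint rule is just one convenient quadrature choice, assuming $a, b \ge 1$ so that the integrand stays bounded.

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1), computed on the log scale to avoid overflow."""
    if not 0.0 < x < 1.0:
        return 0.0
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))

def beta_pdf_integer(x, a, b):
    """Factorial form of the density; valid only for positive integers a and b."""
    const = math.factorial(a + b - 1) / (math.factorial(a - 1) * math.factorial(b - 1))
    return const * x ** (a - 1) * (1 - x) ** (b - 1)

def midpoint_integral(f, lo, hi, n=20_000):
    """Composite midpoint rule; never evaluates f at the endpoints."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

def beta_function_numeric(a, b):
    """Numerical value of the integral of x^(a-1) (1-x)^(b-1) over (0, 1)."""
    return midpoint_integral(lambda x: x ** (a - 1) * (1 - x) ** (b - 1), 0.0, 1.0)

a, b = 2, 5
print(beta_pdf(0.3, a, b), beta_pdf_integer(0.3, a, b))  # both about 2.1609
print(beta_function_numeric(a, b), math.gamma(a) * math.gamma(b) / math.gamma(a + b))
print(midpoint_integral(lambda x: beta_pdf(x, a, b), 0.0, 1.0))  # close to 1
```

The log-scale form is used for the general case because `math.gamma` overflows for large arguments, while the factorial form only makes sense for integer parameters.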
It is instructive to look at the role of the parameters $a$ and $b$ in the beta distribution. It turns out that both parameters $a$ and $b$ play a role in determining the shape of the distribution. If $a$ and $b$ are both less than 1, the beta density is U-shaped: the density curve goes up to infinity as $x \to 0^+$ and as $x \to 1^-$ (see the U-shaped red curve in Figure 1).
If $a=b$, then the beta density curve is symmetric about the vertical line at $x=1/2$: the value of the density curve at $x$ is identical to the value at $1-x$ (see Figure 1). Note that when $a=b=1$, the beta distribution is the uniform distribution on $(0,1)$.
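The symmetry claim and the uniform special case can be spot-checked numerically. This is a minimal sketch; `is_symmetric` and its sample points are hypothetical choices, and checking a handful of points only illustrates the identity $f(x)=f(1-x)$, it does not prove it.

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1), computed on the log scale."""
    if not 0.0 < x < 1.0:
        return 0.0
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))

def is_symmetric(a, b, points=(0.05, 0.2, 0.35, 0.45), tol=1e-9):
    """Check f(x) == f(1 - x), up to rounding, at a few sample points."""
    return all(abs(beta_pdf(x, a, b) - beta_pdf(1 - x, a, b)) < tol for x in points)

print(is_symmetric(0.5, 0.5), is_symmetric(3, 3), is_symmetric(2, 5))  # True True False
print([beta_pdf(x, 1, 1) for x in (0.1, 0.5, 0.9)])  # uniform case: every value equals 1
```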
If $a$ and $b$ are greater than 1 and $a<b$, then the beta density curve is skewed to the right (positively skewed). This means that most of the probability is assigned to a relatively short segment on the left side of $(0,1)$, while the longer side to the right is assigned less probability, giving the distribution a longer right tail. In other words, when $a<b$, the random quantity described by the beta distribution is concentrated more on the lower end of the interval (see Figure 2). The greater $b$ is compared to $a$, the more pronounced the skewness.
If $a$ and $b$ are greater than 1 and $a>b$, the shape is the opposite (see Figure 3). The skewness is to the left (negatively skewed). Most of the probability is concentrated on a relatively short segment on the right side of $(0,1)$, and the longer side to the left is assigned much less probability. Hence the distribution has a longer left tail.
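One way to quantify where the probability sits is to compute $P(X<1/2)$ for a right-skewed, a symmetric and a left-skewed case. The following is a minimal sketch, assuming $a, b \ge 1$ so a plain midpoint rule is adequate; `prob_below` is a hypothetical helper name.

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1), computed on the log scale."""
    if not 0.0 < x < 1.0:
        return 0.0
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))

def prob_below(c, a, b, n=20_000):
    """P(X < c) for X ~ Beta(a, b), by the composite midpoint rule on (0, c).
    Assumes a >= 1 and b >= 1 so the integrand is bounded."""
    h = c / n
    return h * sum(beta_pdf((i + 0.5) * h, a, b) for i in range(n))

for a, b in [(2, 5), (5, 5), (5, 2)]:
    print(f"a={a}, b={b}: P(X < 1/2) = {prob_below(0.5, a, b):.4f}")
```

For $(a,b)=(2,5)$ the exact value is $57/64 \approx 0.89$, so most of the mass sits on the lower half; for $(5,5)$ it is $1/2$; and for $(5,2)$ it is $7/64 \approx 0.11$, mirroring the first case.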
Distributional Quantities Based on Moments
As the above figures of density curves show, both parameters $a$ and $b$ drive the shape of the beta distribution. Several distributional quantities are also easily derived from $a$ and $b$. In particular, the moments of the beta distribution have simple closed forms, so quantities based on moments, such as the coefficient of variation, skewness and kurtosis, can be calculated in a straightforward fashion. These in turn shed more light on the beta distribution.
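For $k>0$, the $k$th moment is computed as follows.

$$E[X^k]=\int_0^1 x^k\,\frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,x^{a-1}(1-x)^{b-1}\,dx=\frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\cdot\frac{\Gamma(a+k)\,\Gamma(b)}{\Gamma(a+b+k)}\int_0^1 \frac{\Gamma(a+b+k)}{\Gamma(a+k)\,\Gamma(b)}\,x^{a+k-1}(1-x)^{b-1}\,dx=\frac{\Gamma(a+b)\,\Gamma(a+k)}{\Gamma(a)\,\Gamma(a+b+k)}$$

The trick is to make the integrand the density of a beta distribution (in this case with parameters $a+k$ and $b$), so that the integral equals 1. What is left outside the integral is the answer. To get the final expression of $E[X^k]$ when $k$ is a positive integer, we rely on a fact of the gamma function, namely $\Gamma(x+1)=x\,\Gamma(x)$, which gives $E[X^k]=\prod_{i=0}^{k-1}\frac{a+i}{a+b+i}$. Thus the first four moments of the beta distribution are:

$$E[X]=\frac{a}{a+b}$$

$$E[X^2]=\frac{a(a+1)}{(a+b)(a+b+1)}$$

$$E[X^3]=\frac{a(a+1)(a+2)}{(a+b)(a+b+1)(a+b+2)}$$

$$E[X^4]=\frac{a(a+1)(a+2)(a+3)}{(a+b)(a+b+1)(a+b+2)(a+b+3)}$$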
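The two forms of $E[X^k]$ (the gamma-function ratio and the product obtained from $\Gamma(x+1)=x\,\Gamma(x)$) can be compared directly. This is a minimal sketch with hypothetical function names; the gamma form is evaluated on the log scale via `math.lgamma`.

```python
import math

def beta_moment_gamma(k, a, b):
    """E[X^k] = Gamma(a+b) Gamma(a+k) / (Gamma(a) Gamma(a+b+k)); k may be any positive real."""
    return math.exp(math.lgamma(a + b) + math.lgamma(a + k)
                    - math.lgamma(a) - math.lgamma(a + b + k))

def beta_moment_product(k, a, b):
    """Same moment for integer k >= 0: prod_{i=0}^{k-1} (a+i)/(a+b+i)."""
    result = 1.0
    for i in range(k):
        result *= (a + i) / (a + b + i)
    return result

a, b = 2, 3
for k in range(1, 5):
    print(k, beta_moment_gamma(k, a, b), beta_moment_product(k, a, b))
```

With $a=2$ and $b=3$ both columns should agree with the formulas above: $0.4$, $0.2$, $4/35 \approx 0.1143$ and $1/14 \approx 0.0714$.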
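Here are some calculations that can be made using the first four moments established above. Write $\mu=E[X]$ and $\sigma^2=Var(X)$; kurtosis is reported here as excess kurtosis, $E[(X-\mu)^4]/\sigma^4-3$.

$$Var(X)=E[X^2]-E[X]^2=\frac{ab}{(a+b)^2(a+b+1)}$$

$$CV=\frac{\sigma}{\mu}=\sqrt{\frac{b}{a(a+b+1)}}$$

$$\text{Skewness}=\frac{E[(X-\mu)^3]}{\sigma^3}=\frac{2(b-a)\sqrt{a+b+1}}{(a+b+2)\sqrt{ab}}$$

$$\text{Excess kurtosis}=\frac{E[(X-\mu)^4]}{\sigma^4}-3=\frac{6\left[(a-b)^2(a+b+1)-ab(a+b+2)\right]}{ab(a+b+2)(a+b+3)}$$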
Note that the skewness coefficient agrees with the diagrams shown above. When the parameters $a$ and $b$ are equal, there is no skewness. When $b$ is greater than $a$, the skewness is positive (skewed to the right), and it is more pronounced when $b$ is substantially greater than $a$. Similarly, when $b$ is smaller than $a$, the skewness is negative.
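The moment-based quantities can be cross-checked by building them from the raw moments $E[X^k]$ and comparing with the closed forms. This is a minimal sketch; both function names are hypothetical, and kurtosis is computed as excess kurtosis to match the formulas above.

```python
import math

def beta_summary_from_moments(a, b):
    """Mean, variance, CV, skewness and excess kurtosis built from raw moments E[X^k]."""
    m1, m2, m3, m4 = (math.prod((a + i) / (a + b + i) for i in range(k)) for k in range(1, 5))
    var = m2 - m1 ** 2
    sd = math.sqrt(var)
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3                     # third central moment
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4  # fourth central moment
    return {"mean": m1, "variance": var, "cv": sd / m1,
            "skewness": mu3 / sd ** 3, "excess_kurtosis": mu4 / var ** 2 - 3}

def beta_summary_closed_form(a, b):
    """The same quantities from the closed-form expressions in terms of a and b."""
    s = a + b
    return {"mean": a / s,
            "variance": a * b / (s ** 2 * (s + 1)),
            "cv": math.sqrt(b / (a * (s + 1))),
            "skewness": 2 * (b - a) * math.sqrt(s + 1) / ((s + 2) * math.sqrt(a * b)),
            "excess_kurtosis": 6 * ((a - b) ** 2 * (s + 1) - a * b * (s + 2))
                               / (a * b * (s + 2) * (s + 3))}

for a, b in [(2, 5), (5, 5), (5, 2)]:
    stats = beta_summary_from_moments(a, b)
    print(a, b, round(stats["skewness"], 4), round(stats["excess_kurtosis"], 4))
```

The skewness comes out positive for $(2,5)$, zero up to rounding for $(5,5)$, and negative for $(5,2)$, in line with the discussion above.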
Another interesting observation concerns the variance. If one of the parameters $a$ and $b$ is fixed (it does not matter which one, since the variance is symmetric in $a$ and $b$) while the other goes to infinity, the variance goes to zero. To see this, let $a$ be fixed. In the expression $Var(X)=\frac{ab}{(a+b)^2(a+b+1)}$, divide the numerator and denominator by $b$ to obtain:

$$Var(X)=\frac{a}{(a+b)^2\left(\frac{a+b+1}{b}\right)}$$

The quantity $\frac{a+b+1}{b}$ in the denominator approaches 1 and the quantity $(a+b)^2$ approaches infinity as $b \to \infty$, so $Var(X) \to 0$. This means that the spread is driven by the parameters $a$ and $b$ too: the spread becomes narrower and narrower as one of the parameters becomes larger and larger. This phenomenon is also reflected in the above diagrams of beta density curves.
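A short numerical illustration of the limit: hold $a=2$ fixed (an arbitrary choice) and let $b$ grow. This sketch evaluates both the original variance formula and the rearranged form obtained by dividing through by $b$; the function names are hypothetical.

```python
def beta_variance(a, b):
    """Var(X) = ab / ((a+b)^2 (a+b+1))."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

def beta_variance_rearranged(a, b):
    """Same quantity after dividing numerator and denominator by b."""
    return a / ((a + b) ** 2 * ((a + b + 1) / b))

a = 2.0  # hold a fixed and let b grow
for b in [1, 10, 100, 1_000, 10_000]:
    print(b, beta_variance(a, b), beta_variance_rearranged(a, b))
```

The two columns agree and shrink toward zero, roughly like $a/b^2$ once $b$ is large.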
Other Distributional Quantities
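The following is the cumulative distribution function (CDF) of the beta distribution.

$$F(x)=\int_0^x \frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\,t^{a-1}(1-t)^{b-1}\,dt, \qquad 0<x<1$$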
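The CDF can also be expressed as $F(x)=\frac{B(x;a,b)}{B(a,b)}$, where $B(a,b)=\frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}$ is the beta function and $B(x;a,b)$ is the incomplete beta function defined by:

$$B(x;a,b)=\int_0^x t^{a-1}(1-t)^{b-1}\,dt, \qquad 0 \le x \le 1$$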
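To see the incomplete beta function $B(x;a,b)=\int_0^x t^{a-1}(1-t)^{b-1}\,dt$ at work in a case where it does reduce to a closed form, here is a worked example with $a=b=2$, chosen purely for illustration:

```latex
\begin{aligned}
B(2,2) &= \frac{\Gamma(2)\,\Gamma(2)}{\Gamma(4)} = \frac{1 \cdot 1}{3!} = \frac{1}{6}, \\
B(x;2,2) &= \int_0^x t\,(1-t)\,dt = \frac{x^2}{2} - \frac{x^3}{3}, \\
F(x) &= \frac{B(x;2,2)}{B(2,2)} = 6\left(\frac{x^2}{2} - \frac{x^3}{3}\right) = 3x^2 - 2x^3, \qquad 0 \le x \le 1.
\end{aligned}
```

As a check, $F(1)=3-2=1$ and $F(1/2)=3/4-1/4=1/2$, as symmetry about $x=1/2$ requires.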
In general the CDF has no closed form (an exception is the case where $a$ and $b$ are positive integers, in which the integrand is a polynomial). As a result, the values of the CDF usually have to be obtained through numerical approximation or using software. Likewise, percentiles of the beta distribution generally can only be approximated numerically. For the distributional quantities not discussed here (e.g. the moment generating function), see the Wikipedia entry on the beta distribution.
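The numerical route can be sketched in a few lines of standard-library Python. This is a minimal sketch, assuming $a, b \ge 1$ so that the density is bounded and a plain midpoint rule is adequate; `beta_cdf` and `beta_percentile` are hypothetical helper names. In practice, library routines such as SciPy's `scipy.stats.beta.cdf` and `scipy.stats.beta.ppf` (or the regularized incomplete beta function `scipy.special.betainc`) are the usual choice.

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density on (0, 1), computed on the log scale."""
    if not 0.0 < x < 1.0:
        return 0.0
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x))

def beta_cdf(x, a, b, n=5_000):
    """Approximate F(x) by the composite midpoint rule on (0, x).
    Assumes a >= 1 and b >= 1 so the integrand is bounded."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / n
    return h * sum(beta_pdf((i + 0.5) * h, a, b) for i in range(n))

def beta_percentile(p, a, b, tol=1e-7):
    """Solve F(x) = p for x by bisection; valid because F is increasing on (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(beta_cdf(0.3, 2, 2))          # approximates the exact value 3(0.3)^2 - 2(0.3)^3 = 0.216
print(beta_percentile(0.5, 2, 2))   # median of a symmetric beta, approximately 0.5
print(beta_percentile(0.95, 2, 5))  # 95th percentile of Beta(2, 5)
```

Bisection is slow but simple and robust here because the CDF is continuous and increasing; parameters below 1 would call for a better quadrature near the singular endpoints.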
Extending the Support
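The standard beta distribution takes on values in the unit interval $(0,1)$. If the random quantity to be modeled can extend beyond the unit interval, the beta distribution can be transformed to match the situation. Let $X$ follow the standard beta distribution with parameters $a$ and $b$. Let $\theta$ be a positive number, and let $Y=\theta X$. The support of $Y$ is then the interval $(0,\theta)$. As a constant multiple of the standard beta $X$, the random variable $Y$ has similar shapes and properties. The following are the density function and CDF of $Y$.

$$f_Y(y)=\frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\left(\frac{y}{\theta}\right)^{a-1}\left(1-\frac{y}{\theta}\right)^{b-1}\frac{1}{\theta}, \qquad 0<y<\theta$$

$$F_Y(y)=P\left(X \le \frac{y}{\theta}\right)=\frac{\Gamma(a+b)}{\Gamma(a)\,\Gamma(b)}\int_0^{y/\theta} t^{a-1}(1-t)^{b-1}\,dt, \qquad 0<y<\theta$$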
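Note that the CDF of $Y$ is the incomplete beta function evaluated at $y/\theta$, divided by the beta function, so it can be computed with the same numerical tools as the standard beta CDF. All the distributional quantities of $X$ discussed above can be obtained for $Y$ by applying the appropriate multiplier. For example, $E[Y^k]=\theta^k\,E[X^k]$ and $Var(Y)=\theta^2\,Var(X)$, while scale-free quantities such as the coefficient of variation, skewness and kurtosis are unchanged. As a result, similar results can be obtained for the transformed beta distribution.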
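The scaled version can be checked the same way: the density of $Y=\theta X$ should integrate to 1 on $(0,\theta)$, and its moments should be $\theta^k E[X^k]$. This is a minimal sketch with hypothetical helper names and arbitrary illustrative values $a=2$, $b=3$, $\theta=1000$; the simulation uses the standard library's `random.betavariate`.

```python
import math
import random

def scaled_beta_pdf(y, a, b, theta):
    """Density of Y = theta * X with X ~ Beta(a, b): f_X(y / theta) / theta on (0, theta)."""
    if not 0.0 < y < theta:
        return 0.0
    x = y / theta
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const + (a - 1) * math.log(x) + (b - 1) * math.log1p(-x)) / theta

def scaled_beta_moment(k, a, b, theta):
    """E[Y^k] = theta^k * E[X^k], with E[X^k] = prod_{i<k} (a+i)/(a+b+i)."""
    moment = 1.0
    for i in range(k):
        moment *= (a + i) / (a + b + i)
    return theta ** k * moment

def midpoint_integral(f, lo, hi, n=20_000):
    """Composite midpoint rule on (lo, hi)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

a, b, theta = 2, 3, 1000.0
# The density of Y integrates to 1, and its mean is theta * a / (a + b) = 400.
print(midpoint_integral(lambda y: scaled_beta_pdf(y, a, b, theta), 0.0, theta))
print(midpoint_integral(lambda y: y * scaled_beta_pdf(y, a, b, theta), 0.0, theta),
      scaled_beta_moment(1, a, b, theta))

# Simulation check: scale standard beta draws by theta.
random.seed(2024)
sample = [theta * random.betavariate(a, b) for _ in range(100_000)]
print(sum(sample) / len(sample))  # should land near 400
```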