Introducing the beta distribution

This post is an introduction to the standard beta distribution, i.e. the one whose support is the unit interval $(0, 1)$ and which is defined from the beta function. The standard beta distribution has many generalizations. The most straightforward one is to rescale the standard beta so that the support is an interval other than $(0, 1)$, which is discussed at the end of this post. A subsequent post will define another form of generalized beta distribution.

The beta distribution and its many generalizations are important in economic modeling (e.g. modeling of income distribution). They can also be applied in actuarial modeling, e.g. for modeling insurance losses. In Bayesian analysis, the beta distribution can be used as a conjugate prior for the binomial model. The beta distribution has an interesting and important connection with order statistics and non-parametric inference, which is the subject of the next post.

_______________________________________________________________________________________________

The Beta Density Function

Let $a$ and $b$ be positive real numbers. The previous post establishes the following fact about the beta function.

$\displaystyle B(a,b)=\int_0^1 t^{a-1} \ (1-t)^{b-1} \ dt=\frac{\Gamma(a) \ \Gamma(b)}{\Gamma(a+b)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

As a result, the following is a density function.

$\displaystyle f(x)=\frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ x^{a-1} \ (1-x)^{b-1}; \ \ \ \ \ \ \ \ 0<x<1 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$

A random variable $X$ is said to follow the beta distribution with parameters $a$ and $b$ if its pdf is $(2)$. In particular, if $a$ and $b$ are positive integers, then the beta density function is:

$\displaystyle f(x)=\frac{(a+b-1)!}{(a-1)! \ (b-1)!} \ x^{a-1} \ (1-x)^{b-1}; \ \ \ \ \ \ \ \ 0<x<1 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)$
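The density can be checked numerically. The following is a minimal Python sketch, not a reference implementation. The function names `beta_pdf`, `beta_pdf_integer` and `midpoint_integral` and the parameter values $a=2$, $b=5$ are chosen here only for illustration. It evaluates the gamma-function form and the factorial form of the density at the same point and then approximates the total area under the curve with a simple midpoint rule.

```python
import math

def beta_pdf(x, a, b):
    """Density of the standard beta distribution at x (gamma-function form)."""
    if not (0.0 < x < 1.0):
        return 0.0
    # Work on the log scale so that large a, b do not overflow the gamma function.
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_pdf_integer(x, a, b):
    """Same density written with factorials; valid only for positive integers a, b."""
    const = math.factorial(a + b - 1) / (math.factorial(a - 1) * math.factorial(b - 1))
    return const * x ** (a - 1) * (1 - x) ** (b - 1)

def midpoint_integral(f, lo, hi, n=20_000):
    """Midpoint-rule approximation of the integral of f over (lo, hi)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Illustrative parameters (an assumption of this sketch): a = 2, b = 5.
print(beta_pdf(0.3, 2, 5), beta_pdf_integer(0.3, 2, 5))      # the two forms should agree
print(midpoint_integral(lambda x: beta_pdf(x, 2, 5), 0, 1))  # should be close to 1
```

For $a=2$ and $b=5$ the constant is $6!/(1! \ 4!)=30$, so the density at $x=0.3$ is $30 \times 0.3 \times 0.7^4=2.1609$.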

It is instructive to look at the role of the parameters $a$ and $b$ in the beta distribution. Both parameters play a role in determining the shape of the distribution. If both $a$ and $b$ are less than 1, the beta density has a U shape, with the density curve going up to infinity as $x \rightarrow 0$ and as $x \rightarrow 1$ (see the U shaped red curve in Figure 1).

If $a=b$, then the beta density curves are symmetric about the vertical line at $x=0.5$. When $a=b$, the value of the density curve at $x$ is identical to the value at $1-x$ (see Figure 1). Note that when $a=b=1$, the beta distribution is the uniform distribution on $(0,1)$.

Figure 1

If $a$ and $b$ are greater than 1 and if $a<b$, then the beta density curve is skewed to the right (positively skewed). This means that a short portion of the left side of $(0,1)$ is assigned more probability and the side to the right is assigned less probability. More specifically, when $a<b$, the random quantity that is described by the beta distribution is concentrated more on the lower end of the interval $(0, 1)$ (see Figure 2). The greater the $b$ as compared to $a$, the more pronounced the skewness.

Figure 2

If $a$ and $b$ are greater than 1 and if $a>b$, the shape is the opposite (see Figure 3). The skewness is to the left (negatively skewed). Most of the probability is concentrated on a relatively short segment on the right side of $(0, 1)$, and the longer side to the left is assigned much less probability. Hence the distribution has a longer left tail.

Figure 3
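The shape statements above can be spot-checked numerically. The sketch below is illustrative only. The helper names `beta_pdf` and `mass` and all parameter values are assumptions made for this example. It compares density values for the symmetric, uniform and U-shaped cases, and compares the probability assigned to the lower half of $(0,1)$ for a right-skewed and a left-skewed case.

```python
import math

def beta_pdf(x, a, b):
    """Density of the standard beta distribution at x in (0, 1)."""
    if not (0.0 < x < 1.0):
        return 0.0
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def mass(a, b, lo, hi, n=20_000):
    """Midpoint-rule approximation of P(lo < X < hi); intended for a, b >= 1."""
    h = (hi - lo) / n
    return h * sum(beta_pdf(lo + (i + 0.5) * h, a, b) for i in range(n))

# a = b: the density at x equals the density at 1 - x.
print(beta_pdf(0.2, 3, 3), beta_pdf(0.8, 3, 3))

# a = b = 1: the uniform density, equal to 1 everywhere on (0, 1).
print(beta_pdf(0.37, 1, 1))

# a, b < 1: U shape, much higher near the endpoints than in the middle.
print(beta_pdf(0.001, 0.5, 0.5), beta_pdf(0.5, 0.5, 0.5))

# a < b (right skew) puts most probability on the lower half,
# a > b (left skew) puts most probability on the upper half.
print(mass(2, 6, 0.0, 0.5), mass(6, 2, 0.0, 0.5))
```

With the illustrative choices $a=2$, $b=6$ the lower half of the interval carries $0.9375$ of the probability, while for $a=6$, $b=2$ it carries only $0.0625$.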

_______________________________________________________________________________________________

Distributional Quantities Based on Moments

As the above figures of density curves show, both parameters $a$ and $b$ drive the shape of the beta distribution. Many distributional quantities are also easily derived from $a$ and $b$. In particular, the moments have a simple form, so quantities that are based on moments (e.g. coefficient of variation, skewness, kurtosis) can be calculated in a straightforward fashion. These in turn shed more light on the beta distribution. The moment $E(X^k)$ is derived as follows.

$\displaystyle \begin{aligned} E(X^k)&=\int_0^1 x^k \ \frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ x^{a-1} \ (1-x)^{b-1} \ dx \\&=\int_0^1 \frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ x^{a+k-1} \ (1-x)^{b-1} \ dx \\&=\frac{\Gamma(a+k)}{\Gamma(a)} \frac{\Gamma(a+b)}{\Gamma(a+b+k)} \ \int_0^1 \frac{\Gamma(a+b+k)}{\Gamma(a+k) \ \Gamma(b)} \ x^{a+k-1} \ (1-x)^{b-1} \ dx\\&=\frac{\Gamma(a+k)}{\Gamma(a)} \frac{\Gamma(a+b)}{\Gamma(a+b+k)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ k \text{ is a real number with } k>-a \\&=\frac{a(a+1) \cdots (a+k-1)}{(a+b)(a+b+1) \cdots (a+b+k-1)} \ \ \ \ \ \ \ \ \ k \text{ is a positive integer} \ \ \ \ \ \ \ \ \ \ \ \ (4) \end{aligned}$

The trick is to make the integrand the density of a beta distribution (in this case with parameters $a+k$ and $b$), so that the integral equals 1. The factor taken outside the integral is then the answer. To get the final expression of $E(X^k)$, we rely on a fact about the gamma function, which is that $\Gamma(\alpha+k)=\Gamma(\alpha) \ \alpha (\alpha+1) \cdots (\alpha+k-1)$ for a positive integer $k$. Thus the first four moments of the beta distribution are:

$\displaystyle E(X)=\frac{a}{a+b}$

$\displaystyle E(X^2)=\frac{a(a+1)}{(a+b)(a+b+1)}$

$\displaystyle E(X^3)=\frac{a(a+1)(a+2)}{(a+b)(a+b+1)(a+b+2)}$

$\displaystyle E(X^4)=\frac{a(a+1)(a+2)(a+3)}{(a+b)(a+b+1)(a+b+2)(a+b+3)}$
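Both forms of $(4)$ are easy to evaluate. The following Python sketch is illustrative, and the function names `beta_moment` and `beta_moment_product` are assumptions of this example. It computes $E(X^k)$ from the gamma-function form, which also works for non-integer $k>-a$, and from the product form, which requires a positive integer $k$.

```python
import math

def beta_moment(k, a, b):
    """E(X^k) from the gamma-function form of (4); requires k > -a."""
    log_value = (math.lgamma(a + k) - math.lgamma(a)
                 + math.lgamma(a + b) - math.lgamma(a + b + k))
    return math.exp(log_value)

def beta_moment_product(k, a, b):
    """E(X^k) from the product form of (4); k must be a positive integer."""
    result = 1.0
    for j in range(k):
        result *= (a + j) / (a + b + j)
    return result

# Illustrative parameters (an assumption of this sketch): a = 2, b = 3.
for k in (1, 2, 3, 4):
    print(k, beta_moment(k, 2, 3), beta_moment_product(k, 2, 3))

# A non-integer moment: for the uniform case a = b = 1, E(X^0.5) = 2/3.
print(beta_moment(0.5, 1, 1))
```

For $a=2$ and $b=3$ the first two moments are $E(X)=2/5$ and $E(X^2)=(2 \cdot 3)/(5 \cdot 6)=1/5$.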

Here are some calculations that can be made using the first four moments established above.

$\displaystyle \begin{array}{lllll} \text{ } &\text{ } & \text{Definition} & \text{ } & \text{Standard Beta Distribution} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Mean} &\text{ } & \text{ } & \text{ } & \displaystyle \frac{a}{a+b} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Variance} &\text{ } & E(X^2)-E(X)^2 & \text{ } & \displaystyle \frac{ab}{(a+b)^2 \ (a+b+1)} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Coefficient of Variation} &\text{ } & \displaystyle \frac{\sqrt{Var(X)}}{E(X)}=\frac{\sigma}{\mu} & \text{ } & \displaystyle \sqrt{\frac{b}{a(a+b+1)}} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Skewness} &\text{ } & \displaystyle E\biggl[\biggl(\frac{X-\mu}{\sigma}\biggr)^3\biggr] & \text{ } & \displaystyle \frac{2(b-a) \sqrt{a+b+1}}{(a+b+2) \sqrt{ab}} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Kurtosis} &\text{ } & \displaystyle E\biggl[\biggl(\frac{X-\mu}{\sigma}\biggr)^4\biggr] & \text{ } & \displaystyle \frac{6[(a-b)^2 (a+b+1)-ab(a+b+2)]}{ab(a+b+2)(a+b+3)} +3 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Excess Kurtosis} &\text{ } & \displaystyle E\biggl[\biggl(\frac{X-\mu}{\sigma}\biggr)^4\biggr]-3 & \text{ } & \displaystyle \frac{6[(a-b)^2 (a+b+1)-ab(a+b+2)]}{ab(a+b+2)(a+b+3)} \end{array}$

Note the skewness coefficient agrees with the diagrams shown above. When the parameters $a$ and $b$ are equal, there is no skewness. When $b$ is greater than $a$, the skewness is positive (skewed to the right). The skewness is more pronounced when the parameter $b$ is substantially greater than the parameter $a$. Similarly, when $b$ is smaller than $a$, the skewness is negative.
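As a cross-check of the table, the sketch below computes each quantity in two ways: directly from the first four raw moments using the definitions, and from the closed-form expressions in the table. The function names `raw_moment`, `from_moments` and `from_table` and the parameter values are assumptions made for illustration.

```python
import math

def raw_moment(k, a, b):
    """E(X^k) for a positive integer k, using the product form of the moments."""
    result = 1.0
    for j in range(k):
        result *= (a + j) / (a + b + j)
    return result

def from_moments(a, b):
    """Distributional quantities computed from the definitions and raw moments."""
    m1, m2, m3, m4 = (raw_moment(k, a, b) for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    sd = math.sqrt(var)
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3                       # third central moment
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4    # fourth central moment
    return {"mean": m1, "variance": var, "cv": sd / m1,
            "skewness": mu3 / sd ** 3, "kurtosis": mu4 / var ** 2}

def from_table(a, b):
    """The same quantities from the closed-form expressions in the table."""
    s = a + b
    excess = 6 * ((a - b) ** 2 * (s + 1) - a * b * (s + 2)) / (a * b * (s + 2) * (s + 3))
    return {"mean": a / s,
            "variance": a * b / (s ** 2 * (s + 1)),
            "cv": math.sqrt(b / (a * (s + 1))),
            "skewness": 2 * (b - a) * math.sqrt(s + 1) / ((s + 2) * math.sqrt(a * b)),
            "kurtosis": excess + 3}

# Illustrative parameters (an assumption of this sketch): a = 2, b = 5.
direct, table = from_moments(2, 5), from_table(2, 5)
for name in direct:
    print(f"{name:>9}: {direct[name]:.6f}  {table[name]:.6f}")
```

With $a<b$ as in this example, the skewness is positive. Setting $a=b$ gives a skewness of zero, and $a=b=1$ gives the kurtosis $1.8$ of the uniform distribution.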

Another interesting observation is about the variance. If one of the parameters $a$ and $b$ is fixed (it does not matter which one) while the other goes to infinity, the variance will go to zero. To see this, let $a$ be fixed. In the expression for $Var(X)$, divide the numerator and denominator by $b$ and obtain:

$\displaystyle Var(X)=\frac{a}{(a+b)^2 \ (\frac{a}{b}+1+\frac{1}{b})}$

The quantity $\frac{a}{b}+1+\frac{1}{b}$ in the denominator approaches 1 and the quantity $(a+b)^2$ approaches infinity as $b \rightarrow \infty$, so $Var(X) \rightarrow 0$. Since the expression for $Var(X)$ is symmetric in $a$ and $b$, the same holds when $b$ is fixed and $a \rightarrow \infty$. This means that the spread is driven by the parameters $a$ and $b$ too. The spread becomes narrower and narrower as one of the parameters becomes larger and larger. This phenomenon is also reflected in the above diagrams of beta density curves.
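A few numbers make the shrinking variance concrete. In this small sketch, the function name `beta_variance`, the fixed value $a=2$ and the list of values of $b$ are all assumptions chosen for illustration.

```python
def beta_variance(a, b):
    """Variance of the standard beta distribution with parameters a and b."""
    return a * b / ((a + b) ** 2 * (a + b + 1))

a = 2.0  # fixed parameter (illustrative value)
growing_b = [1, 10, 100, 1_000, 10_000]
variances = [beta_variance(a, b) for b in growing_b]

for b, v in zip(growing_b, variances):
    print(f"b = {b:>6}: Var(X) = {v:.3e}")

# The formula is symmetric in a and b, so fixing b and growing a behaves the same way.
print(beta_variance(2.0, 7.0), beta_variance(7.0, 2.0))
```

The printed variances decrease steadily toward zero as $b$ grows.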

_______________________________________________________________________________________________

Other Distributional Quantities

The following is the cumulative distribution function of the beta distribution.

$\displaystyle F(x)=\int_0^x \frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ t^{a-1} \ (1-t)^{b-1} \ dt \ \ \ \ \ \ \ \ \ 0<x<1 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (5)$

The CDF can also be expressed as $\displaystyle F(x)=\frac{B(a,b,x)}{B(a,b)}$ where $B(a,b,x)$ is the incomplete beta function defined by:

$\displaystyle B(a,b,x)=\int_0^x t^{a-1} \ (1-t)^{b-1} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (6)$

The CDF as exhibited in $(5)$ has no closed form in general. As a result, the values of the CDF have to be obtained through numerical approximation or using software. Likewise, percentiles of the beta distribution generally have to be approximated numerically. For the distributional quantities not discussed here (e.g. moment generating function), see the Wikipedia entry on the beta distribution.
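To illustrate what numerical approximation can look like, here is a minimal Python sketch that approximates the CDF with a midpoint rule and finds percentiles by bisection. It is only one simple approach under stated assumptions: the function names `beta_pdf`, `beta_cdf` and `beta_percentile` are made up for this example, and the midpoint rule is accurate only when $a \ge 1$ and $b \ge 1$ (for parameters below 1 the integrand is unbounded near the endpoints). In practice, statistical software is the usual route, e.g. `scipy.stats.beta.cdf` and `scipy.stats.beta.ppf` in SciPy.

```python
import math

def beta_pdf(x, a, b):
    """Density of the standard beta distribution at x in (0, 1)."""
    if not (0.0 < x < 1.0):
        return 0.0
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_cdf(x, a, b, n=4_000):
    """Midpoint-rule approximation of F(x); a sketch intended for a, b >= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    h = x / n
    return h * sum(beta_pdf((i + 0.5) * h, a, b) for i in range(n))

def beta_percentile(p, a, b, tol=1e-8):
    """Approximate the value x with F(x) = p by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < p:
            lo = mid   # the percentile lies to the right of mid
        else:
            hi = mid   # the percentile lies at or to the left of mid
    return 0.5 * (lo + hi)

# Illustrative parameters (an assumption of this sketch): a = 2, b = 6.
print(beta_cdf(0.5, 2, 6))          # close to 0.9375
print(beta_percentile(0.5, 3, 3))   # median of a symmetric beta, close to 0.5
```

Bisection works here because the CDF is increasing on $(0,1)$. Each step halves the interval that contains the percentile.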

_______________________________________________________________________________________________

Extending the Support

The standard beta distribution takes on values in the unit interval $(0,1)$. If the random quantity to be modeled can extend beyond the unit interval, the beta distribution can be transformed to match the situation. Let $X$ follow the standard beta distribution with parameters $a$ and $b$. Let $\theta$ be a positive number and let $Y=\theta \ X$. The support of $Y$ is then the interval $(0, \theta)$. As a constant multiple of the standard beta $X$, the random variable $Y$ has similar shapes and properties. The following are the density function and CDF.

$\displaystyle \begin{aligned} f_Y(y)&=\frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ \biggl(\frac{y}{\theta}\biggr)^{a-1} \ \biggl[1-\biggl(\frac{y}{\theta}\biggr)\biggr]^{b-1} \ \frac{1}{\theta} \ \ \ \ \ \ \ \ 0<y<\theta \ \ \ \ \ \ \ \ \ \ \ \ (7) \end{aligned}$

$\displaystyle F_Y(y)=\int_0^{\frac{y}{\theta}} \frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ t^{a-1} \ (1-t)^{b-1} \ dt \ \ \ \ \ \ \ \ \ 0<y<\theta \ \ \ \ \ \ \ \ \ \ \ \ (8)$

$\displaystyle F_Y(y)=\frac{B(a,b,\frac{y}{\theta})}{B(a,b)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (8a)$

Note that $(8a)$ expresses the CDF of $Y$ using the incomplete beta function defined in $(6)$. All the distributional quantities of $X$ discussed above can be obtained for $Y$ by applying the appropriate multiplier. For example, $E(Y^k)=\theta^k E(X^k)$. Quantities that do not depend on the scale, i.e. the coefficient of variation, skewness and kurtosis, are the same for $Y$ as for $X$. As a result, a similar table can be obtained for the transformed beta distribution.

$\displaystyle \begin{array}{lllll} \text{ } &\text{ } & \text{Definition} & \text{ } & \text{Transformed Beta } Y=\theta X \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Mean} &\text{ } & \text{ } & \text{ } & \displaystyle \theta \ \frac{a}{a+b} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Variance} &\text{ } & E(Y^2)-E(Y)^2 & \text{ } & \displaystyle \theta^2 \ \frac{ab}{(a+b)^2 \ (a+b+1)} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Coefficient of Variation} &\text{ } & \displaystyle \frac{\sqrt{Var(Y)}}{E(Y)}=\frac{\sigma_Y}{\mu_Y} & \text{ } & \displaystyle \sqrt{\frac{b}{a(a+b+1)}} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Skewness} &\text{ } & \displaystyle E\biggl[\biggl(\frac{Y-\mu_Y}{\sigma_Y}\biggr)^3\biggr] & \text{ } & \displaystyle \frac{2(b-a) \sqrt{a+b+1}}{(a+b+2) \sqrt{ab}} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Kurtosis} &\text{ } & \displaystyle E\biggl[\biggl(\frac{Y-\mu_Y}{\sigma_Y}\biggr)^4\biggr] & \text{ } & \displaystyle \frac{6[(a-b)^2 (a+b+1)-ab(a+b+2)]}{ab(a+b+2)(a+b+3)} +3 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Excess Kurtosis} &\text{ } & \displaystyle E\biggl[\biggl(\frac{Y-\mu_Y}{\sigma_Y}\biggr)^4\biggr]-3 & \text{ } & \displaystyle \frac{6[(a-b)^2 (a+b+1)-ab(a+b+2)]}{ab(a+b+2)(a+b+3)} \end{array}$
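The scaling relationships can be verified numerically. The sketch below is illustrative, with made-up function names (`scaled_beta_pdf`, `raw_moment_x`, `raw_moment_y`, `midpoint_integral`) and illustrative values $a=2$, $b=3$, $\theta=10$. It integrates $y^k f_Y(y)$ over $(0,\theta)$ and compares the result with $\theta^k E(X^k)$.

```python
import math

def beta_pdf(x, a, b):
    """Density of the standard beta distribution at x in (0, 1)."""
    if not (0.0 < x < 1.0):
        return 0.0
    log_const = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_const + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def scaled_beta_pdf(y, a, b, theta):
    """Density of Y = theta * X: the standard density at y / theta, divided by theta."""
    return beta_pdf(y / theta, a, b) / theta

def raw_moment_x(k, a, b):
    """E(X^k) for a positive integer k, using the product form of the moments."""
    result = 1.0
    for j in range(k):
        result *= (a + j) / (a + b + j)
    return result

def raw_moment_y(k, a, b, theta):
    """E(Y^k) = theta^k * E(X^k)."""
    return theta ** k * raw_moment_x(k, a, b)

def midpoint_integral(f, lo, hi, n=20_000):
    """Midpoint-rule approximation of the integral of f over (lo, hi)."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Illustrative values (assumptions of this sketch): a = 2, b = 3, theta = 10.
a, b, theta = 2, 3, 10.0
for k in (1, 2):
    numeric = midpoint_integral(lambda y: y ** k * scaled_beta_pdf(y, a, b, theta), 0.0, theta)
    print(k, numeric, raw_moment_y(k, a, b, theta))

variance_y = raw_moment_y(2, a, b, theta) - raw_moment_y(1, a, b, theta) ** 2
print(variance_y)  # theta^2 times the variance of X
```

For these values $E(Y)=10 \times 2/5=4$ and $E(Y^2)=100 \times 1/5=20$, so $Var(Y)=4$, which agrees with $\theta^2 \ ab/[(a+b)^2 (a+b+1)]=100 \times 6/150$.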

_______________________________________________________________________________________________
$\copyright \ 2016 - \text{Dan Ma}$