# Generalized beta distribution

The beta distribution is defined using the beta function. The beta distribution can also be naturally generated as order statistics by sampling from the uniform distribution. This post presents a generalization of the standard beta distribution.

There are many generalized beta distributions. This post defines a “basic” generalized beta distribution that has four parameters. Recall that the standard beta distribution has two parameters $a$ and $b$. Both $a$ and $b$ drive the shape of the beta distribution, e.g. its skewness is driven by $b-a$ (right skew when $b>a$, left skew when $a>b$). The generalized beta distribution defined here has four parameters $a$, $b$, $\tau$ and $\theta$. The parameter $\tau$ is an exponent parameter, and $\theta$ is a scale parameter that stretches the distribution from the unit interval $(0,1)$ to the interval $(0,\theta)$. The generalized beta distribution discussed here is called the generalized beta distribution of the first kind (see the paper listed in the reference section).

The role of the parameter $\tau$ is interesting in that it affects the shape of the new distribution, e.g. making the distribution more skewed or less skewed. Yet it is not strictly a shape parameter. Still, depending on its value, $\tau$ can greatly accentuate or reduce the skewness of the starting beta distribution (see the last section below).

_______________________________________________________________________________________________

Example 1

Before formally defining the distribution, let’s look at the effect of the two additional parameters $\tau$ and $\theta$ through an example. The parameter $\theta$ is a scale parameter. First, look at the effect of adding $\tau$.

Let $X$ be a random variable that follows the beta distribution with parameters $a=3$ and $b=7$. The following is the density function of $X$.

$\displaystyle f(x)=252 \ x^{3-1} \ (1-x)^{7-1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0
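The constant 252 is $\Gamma(10)/(\Gamma(3) \ \Gamma(7))=9!/(2! \ 6!)$. The following Python sketch (standard library only; the helper names `beta_const`, `beta_pdf` and `simpson` are illustrative, not from the original post) checks this constant and confirms numerically, with a composite Simpson rule, that the density integrates to 1.

```python
from math import gamma

def beta_const(a, b):
    """Normalizing constant Gamma(a+b) / (Gamma(a) * Gamma(b)) of the Beta(a, b) density."""
    return gamma(a + b) / (gamma(a) * gamma(b))

def beta_pdf(x, a, b):
    """Density of the standard beta distribution on (0, 1)."""
    return beta_const(a, b) * x ** (a - 1) * (1 - x) ** (b - 1)

def simpson(f, lo, hi, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

c = beta_const(3, 7)                                    # 9! / (2! * 6!)
area = simpson(lambda x: beta_pdf(x, 3, 7), 0.0, 1.0)   # should be close to 1
print(c, area)
```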

Consider the following random variables:

$\displaystyle Y=X^{\frac{1}{\tau}} \ \ \ \ \ \ \ \ \ \ \ \ \tau =\frac{1}{2}$

$\displaystyle T=X^{\frac{1}{\tau}} \ \ \ \ \ \ \ \ \ \ \ \ \tau =2$

$\displaystyle Y_1=5 \ Y$

$\displaystyle T_1=5 \ T$

Essentially $Y$ is the square of $X$ and $T$ is the square root of $X$. To learn more about the random variables $Y$ and $T$, let’s look at the graphs of their density functions. The following diagram shows the density functions of $X$ (blue curve), $Y$ (tall red curve) and $T$ (black curve).

Figure 1

The standard beta density curve for the random variable $X$ has moderate right skewness (the blue curve). Squaring $X$ produces a density curve with much more pronounced skewness (the red curve). This is because squaring puts more probability on small values: since every value lies in $(0,1)$, squaring shifts the data closer to the origin. For example, 0.9 becomes 0.81, 0.5 becomes 0.25, 0.1 becomes 0.01, 0.001 becomes 0.000001 and so on.

Taking the square root of the standard beta has the opposite effect. It pushes the data toward 1.0, producing a density curve (the black curve) that is slightly negatively skewed (it looks almost symmetric).

Even though the random variable $Y$ is obtained by squaring the beta $X$, the density function of $Y$ is obtained by plugging the square root $y^{1/2}$ into the density of $X$. Conversely, while $T$ is obtained by taking the square root of $X$, the density function of $T$ is obtained by plugging in the square $t^2$. In both cases the result is multiplied by the derivative of the inverse transformation. The following are the density functions of $Y$ and $T$.

$\displaystyle \begin{aligned}f_Y(y)&=f_X(y^{1/2}) \ \frac{d}{dy} y^{1/2} \\&\text{ } \\&=252 \ [y^{1/2} ]^2 \ [1-y^{1/2} ]^6 \frac{1}{2 y^{1/2}} \\&\text{ } \\&=126 \ y^{1/2} \ [1-y^{1/2} ]^6 \ \ \ \ \ \ \ \ \ \ \ 0<y<1 \end{aligned}$


$\displaystyle \begin{aligned}f_T(t)&=f_X(t^2) \ \frac{d}{dt} t^2 \\&\text{ } \\&=252 \ [t^2 ]^2 \ [1-t^2 ]^6 \ 2t \\&\text{ } \\&=504 \ t^5 \ [1-t^2 ]^6 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0<t<1 \end{aligned}$
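The change-of-variable results above can be checked numerically. This is a minimal Python sketch assuming only the standard library; `f_Y`, `f_T` and the Simpson helper are hypothetical names for illustration. If the derivations are right, both densities should integrate to 1 on $(0,1)$, and the mean of $Y$ computed from its density should agree with $E(X^2)=6/55$.

```python
from math import sqrt

def f_Y(y):
    """Density of Y = X^2 for X ~ Beta(3, 7), on (0, 1)."""
    return 126 * sqrt(y) * (1 - sqrt(y)) ** 6

def f_T(t):
    """Density of T = X^(1/2) for X ~ Beta(3, 7), on (0, 1)."""
    return 504 * t ** 5 * (1 - t ** 2) ** 6

def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

area_Y = simpson(f_Y, 0.0, 1.0)
area_T = simpson(f_T, 0.0, 1.0)
mean_Y = simpson(lambda y: y * f_Y(y), 0.0, 1.0)   # compare with E(X^2) = 6/55
print(area_Y, area_T, mean_Y, 6 / 55)
```

The square root in $f_Y$ makes the integrand non-smooth at 0, which is why a fairly fine grid is used; the polynomial density $f_T$ is integrated almost exactly.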

Because $Y$ and $T$ are defined by raising $X$ to a power, their moments can be derived from the beta distribution of $X$ via $E(Y^k)=E(X^{2k})$ and $E(T^k)=E(X^{k/2})$. The following table shows the first four moments of $Y$ and $T$ (using the formula for beta moments given in the Basic Properties section below).

$\displaystyle \begin{array}{lllll} \text{ } &\text{ } & E(Y^k)=E(X^{2k}) & \text{ } & E(T^k)=E(X^{k/2}) \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ k=1 &\text{ } & \displaystyle E(X^2)=\frac{6}{55} & \text{ } & \displaystyle E(X^{1/2})=\frac{24576}{46189} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ k=2 &\text{ } & \displaystyle E(X^4)=\frac{3}{143} & \text{ } & \displaystyle E(X)=\frac{3}{10} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ k=3 &\text{ } & \displaystyle E(X^6)=\frac{4}{715} & \text{ } & \displaystyle E(X^{3/2})=\frac{57344}{323323} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ k=4 &\text{ } & \displaystyle E(X^8)=\frac{9}{4862} & \text{ } & \displaystyle E(X^2)=\frac{6}{55} \\ \end{array}$
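The entries in the moment table can be reproduced from the beta moment formula $E(X^k)=\frac{\Gamma(a+b)}{\Gamma(a)} \frac{\Gamma(a+k)}{\Gamma(a+b+k)}$, which also holds for non-integer $k$. The sketch below (Python standard library; `beta_moment` is an illustrative name) evaluates it through `math.lgamma` to avoid overflow and compares the results with the fractions in the table.

```python
from fractions import Fraction
from math import exp, lgamma

def beta_moment(k, a, b):
    """E(X^k) for X ~ Beta(a, b); valid for any real k > -a."""
    return exp(lgamma(a + b) - lgamma(a) + lgamma(a + k) - lgamma(a + b + k))

# Exact values from the table, keyed by the power of X (a = 3, b = 7).
exact = {
    0.5: Fraction(24576, 46189), 1: Fraction(3, 10), 1.5: Fraction(57344, 323323),
    2: Fraction(6, 55), 4: Fraction(3, 143), 6: Fraction(4, 715), 8: Fraction(9, 4862),
}
for k, value in exact.items():
    approx = beta_moment(k, 3, 7)
    print(f"E(X^{k}) = {approx:.10f}   table: {float(value):.10f}")
```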

The following table shows a comparison of the three random variables.

$\displaystyle \begin{array}{lllllll} \text{ } &\text{ } & X & \text{ } & Y=X^2& \text{ } & T=X^{1/2} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Mean} &\text{ } & \displaystyle E(X)=\frac{3}{10}=0.3 & \text{ } & \displaystyle E(Y)=\frac{6}{55}=0.11 & \text{ } & \displaystyle E(T)=\frac{24576}{46189}=0.53\\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Variance} &\text{ } & \displaystyle Var(X)=\frac{21}{1100} & \text{ } & \displaystyle Var(Y)=0.009078195 & \text{ } & \displaystyle Var(T)=0.016896475 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Std Dev} &\text{ } & \displaystyle \sigma_X=0.1382 & \text{ } & \displaystyle \sigma_Y=0.09528 & \text{ } & \displaystyle \sigma_T=0.1300 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{CV} &\text{ } & \displaystyle \frac{\sigma_X}{\mu_X}=0.4606 & \text{ } & \displaystyle \frac{\sigma_Y}{\mu_Y}=0.8734 & \text{ } & \displaystyle \frac{\sigma_T}{\mu_T}=0.2443 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Skewness} &\text{ } & \displaystyle \gamma_X=0.4825 & \text{ } & \displaystyle \gamma_Y=1.5320 & \text{ } & \displaystyle \gamma_T=-0.1113 \\ \end{array}$

The variance is the second moment minus the square of the first moment. The standard deviation is the square root of the variance. CV stands for coefficient of variation, which is the ratio of the standard deviation to the mean. The skewness is the third central moment divided by the cube of the standard deviation, where the third central moment of a random variable $W$ with mean $\mu$ is computed from the raw moments as $E[(W-\mu)^3]=E(W^3)-3\mu \ E(W^2)+2\mu^3$.
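Putting these definitions together, the comparison table can be regenerated from raw moments. In this sketch (Python standard library; the function `summary` is a hypothetical helper), $W=X^p$ with $p=1$, $2$ or $1/2$, the raw moments are $E(W^j)=E(X^{jp})$, and the third central moment is obtained from the raw moments.

```python
from math import exp, lgamma, sqrt

def beta_moment(k, a, b):
    """E(X^k) for X ~ Beta(a, b); valid for any real k > -a."""
    return exp(lgamma(a + b) - lgamma(a) + lgamma(a + k) - lgamma(a + b + k))

def summary(a, b, p):
    """Mean, variance, std dev, CV and skewness of W = X^p for X ~ Beta(a, b)."""
    m1, m2, m3 = (beta_moment(j * p, a, b) for j in (1, 2, 3))  # E(W^j) = E(X^(j p))
    var = m2 - m1 ** 2                                # second moment minus squared mean
    sd = sqrt(var)
    third_central = m3 - 3 * m1 * m2 + 2 * m1 ** 3    # E[(W - mu)^3] from raw moments
    return {"mean": m1, "var": var, "sd": sd, "cv": sd / m1,
            "skew": third_central / sd ** 3}

for label, p in (("X", 1.0), ("Y = X^2", 2.0), ("T = X^(1/2)", 0.5)):
    s = summary(3, 7, p)
    print(label, {key: round(val, 4) for key, val in s.items()})
```

Up to rounding, the printed values should match the means, variances, CVs and skewness values in the comparison table.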

Note that the skewness calculation confirms what we see in the three density curves in Figure 1. The beta distribution is moderately right skewed. Squaring the beta random variable pushes the data toward the origin; hence the standard deviation is smaller and the right skew is more pronounced (about three times as strong). Taking the square root goes in the opposite direction, leading to a slightly left skewed distribution.

The above discussion focuses only on the effect of the parameter $\tau$ (the effect of raising the base distribution to a power). The other parameter $\theta$ is a scale parameter that stretches the transformed distribution from the interval $(0,1)$ to the interval $(0, \theta)$. The following diagrams show the density function of $Y_1=5 \ Y$ (Figure 2) and the density function of $T_1=5 \ T$ (Figure 3).

Figure 2

Figure 3

Multiplying by 5 changes the mean (it is multiplied by 5) and the variance (it is multiplied by 25). However, the CV and the skewness remain the same. Thus the scale parameter does not change the shape of the distribution.
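The reason the CV and skewness are unaffected is a short calculation. Writing $\mu_Y$ and $\sigma_Y$ for the mean and standard deviation of $Y$, the scaled variable $Y_1=5 \ Y$ satisfies the following (the same argument works for $T_1=5 \ T$ and for any positive scale factor).

$\displaystyle \begin{aligned}E(Y_1)&=5 \ \mu_Y \ \ \ \ \ \ \ \ Var(Y_1)=25 \ \sigma_Y^2 \ \ \ \ \ \ \ \ \sigma_{Y_1}=5 \ \sigma_Y \\&\text{ } \\ \frac{\sigma_{Y_1}}{E(Y_1)}&=\frac{5 \ \sigma_Y}{5 \ \mu_Y}=\frac{\sigma_Y}{\mu_Y} \\&\text{ } \\ \gamma_{Y_1}&=\frac{E[(5 \ Y-5 \ \mu_Y)^3]}{(5 \ \sigma_Y)^3}=\frac{125 \ E[(Y-\mu_Y)^3]}{125 \ \sigma_Y^3}=\gamma_Y \end{aligned}$

For instance, $E(Y_1)=5 \cdot \frac{6}{55}=\frac{6}{11} \approx 0.545$ and $Var(Y_1)=25 \times 0.009078195 \approx 0.2270$, while the CV (0.8734) and the skewness (1.5320) are the same as for $Y$.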

_______________________________________________________________________________________________

Basic Properties

Let $a$, $b$, $\tau$ and $\theta$ be some fixed positive real numbers. A random variable $T$ follows the generalized beta distribution with parameters $a$, $b$, $\tau$ and $\theta$ if

$\displaystyle T=\theta \ X^{\frac{1}{\tau}}$

where $X$ is a random variable that follows the beta distribution with parameters $a$ and $b$. In other words, if we start with a standard beta random variable, raising it to the power $1/\tau$ and then multiplying by the scale parameter $\theta$ produces a generalized beta random variable. Conversely, if we start with a generalized beta random variable $T$, dividing it by $\theta$ and then raising the result to the power $\tau$ produces a standard beta random variable, i.e. $X=(T/\theta)^\tau$. In this post, we prefer to work with the first progression, defining the generalized beta from the standard beta.

We first derive the density function of the random variable $\displaystyle Y=X^{1/\tau}$, i.e. the generalized beta distribution without the parameter $\theta$. The new random variable is obtained by raising the old one to the power $1/\tau$, so the transformation is inverted by raising to the power $\tau$. As a result, the new density function is obtained by plugging $x^\tau$ into the old density function and multiplying by the derivative of $x^\tau$, which is $\tau x^{\tau-1}$.

$\displaystyle f_X(x)=\frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ x^{a-1} \ (1-x)^{b-1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0<x<1 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

$\displaystyle \begin{aligned}f_Y(x)&=\frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ (x^\tau)^{a-1} \ (1-x^\tau)^{b-1} \ \tau x^{\tau-1} \\&=\frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ (x^\tau)^{a} \ (1-x^\tau)^{b-1} \ \frac{\tau}{x} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0<x<1 \ \ \ \ \ \ \ \ \ \ \ \ (2) \end{aligned}$

Now add the scale parameter. The effect is that $x/ \theta$ is plugged into the density function of $Y$ and that the density function is multiplied by $1/ \theta$ (the derivative).

$\displaystyle \begin{aligned}f_T(x)&=\frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ \biggl[\biggl(\frac{x}{\theta}\biggr)^\tau \biggr]^{a-1} \ \biggl[1-\biggl(\frac{x}{\theta}\biggr)^\tau \biggr]^{b-1} \ \tau \biggl(\frac{x}{\theta}\biggr)^{\tau-1} \ \frac{1}{\theta} \\&=\frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ \biggl[\biggl(\frac{x}{\theta}\biggr)^\tau \biggr]^{a} \ \biggl[1-\biggl(\frac{x}{\theta}\biggr)^\tau \biggr]^{b-1} \ \frac{\tau}{x} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0<x<\theta \ \ \ \ \ \ \ \ (3) \end{aligned}$
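The density of $T$ translates directly into code. The following Python sketch (standard library only; `gb1_pdf` is an illustrative name, not an established API) implements the second line of the formula for $f_T$ above, checks that it integrates to 1 on $(0,\theta)$ for the parameters of Example 1 with $\theta=5$, and checks that with $\theta=1$ and $\tau=1/2$ it reduces to the density $f_Y(y)=126 \ y^{1/2} \ [1-y^{1/2}]^6$ from Example 1. The Simpson check assumes $a\tau \ge 1$, so that the density stays bounded near 0.

```python
from math import gamma

def gb1_pdf(x, a, b, tau, theta):
    """Generalized beta density of T = theta * X^(1/tau), X ~ Beta(a, b); support 0 < x < theta."""
    if not 0 < x < theta:
        return 0.0
    c = gamma(a + b) / (gamma(a) * gamma(b))
    u = (x / theta) ** tau                 # u is the corresponding value of the Beta(a, b) variable
    return c * u ** a * (1 - u) ** (b - 1) * tau / x

def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

# a = 3, b = 7, theta = 5 as in Example 1; both tau values satisfy a * tau >= 1.
areas = {tau: simpson(lambda x: gb1_pdf(x, 3, 7, tau, 5.0), 0.0, 5.0) for tau in (0.5, 2.0)}
print(areas)
```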

Recall that the cumulative distribution function of the standard beta $X$ can be expressed using the incomplete beta function.

$\displaystyle F_X(x)=\frac{B(a,b,x)}{B(a,b)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0<x<1 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4)$

$\displaystyle B(a,b,x)=\int_0^x \ t^{a-1} \ (1-t)^{b-1} \ dt$

$\displaystyle B(a,b)=\int_0^1 \ t^{a-1} \ (1-t)^{b-1} \ dt$

The CDF in $(4)$ has no closed form in general. Since $F_Y(x)=P(X^{1/\tau} \le x)=P(X \le x^\tau)=F_X(x^\tau)$, and similarly $F_T(x)=F_X((x/\theta)^\tau)$, the CDFs of $Y$ and $T$ can still be expressed using the incomplete beta function.

$\displaystyle F_Y(x)=\frac{B(a,b,x^\tau)}{B(a,b)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0<x<1 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (5)$

$\displaystyle F_T(x)=\frac{B(a,b,(x/ \theta )^\tau )}{B(a,b)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0<x<\theta \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (6)$
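Since the Python standard library has no incomplete beta function, the sketch below approximates $B(a,b,x)$ by Simpson's rule. This is an assumption made for self-containment: it requires $a,b \ge 1$ so that the integrand is bounded, and in practice a library routine such as `scipy.special.betainc` (the regularized incomplete beta function) would be the natural choice. The sketch evaluates $F_T(x)=B(a,b,(x/\theta)^\tau)/B(a,b)$ and compares it with the empirical CDF of a simulated sample $T=\theta \ X^{1/\tau}$ drawn with `random.betavariate`. The helper names `incomplete_beta` and `gb1_cdf` are hypothetical.

```python
import random
from math import gamma

def incomplete_beta(a, b, x, n=2000):
    """B(a, b, x) = integral_0^x t^(a-1) (1-t)^(b-1) dt by composite Simpson (assumes a, b >= 1)."""
    if x <= 0:
        return 0.0
    h = x / n
    f = lambda t: t ** (a - 1) * (1 - t) ** (b - 1)
    total = f(0.0) + f(x)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(i * h)
    return total * h / 3

def gb1_cdf(x, a, b, tau, theta):
    """F_T(x) = B(a, b, (x/theta)^tau) / B(a, b); x/theta is clamped to [0, 1]."""
    complete = gamma(a) * gamma(b) / gamma(a + b)        # B(a, b)
    u = min(max(x / theta, 0.0), 1.0) ** tau
    return incomplete_beta(a, b, u) / complete

# Monte Carlo check: simulate T = theta * X^(1/tau) and compare with the empirical CDF.
random.seed(2016)
a, b, tau, theta = 3, 7, 0.5, 5.0
sample = [theta * random.betavariate(a, b) ** (1 / tau) for _ in range(100_000)]
for x in (0.25, 0.5, 1.0, 2.0):
    empirical = sum(s <= x for s in sample) / len(sample)
    print(x, round(gb1_cdf(x, a, b, tau, theta), 4), round(empirical, 4))
```

The two printed columns should agree up to Monte Carlo error, which supports the argument that $F_T(x)=F_X((x/\theta)^\tau)$.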

For the standard beta distribution, all positive moments exist, i.e. $E(X^k)$ is defined for all positive real numbers $k$. As a result, all positive moments exist for the generalized $Y$ and $T$ as well.

$\displaystyle E(X^k) = \left\{ \begin{array}{ll} \displaystyle \frac{\Gamma(a+b)}{\Gamma(a)} \ \frac{\Gamma(a+k)}{\Gamma(a+b+k)} &\ \ \ \ \ \ k>-a \\ \text{ } & \text{ } \\ \displaystyle \frac{a (a+1) \cdots (a+k-1)}{(a+b) (a+b+1) \cdots (a+b+k-1)} &\ \ \ \ \ \ k \text{ is a positive integer} \end{array} \right.$

$\displaystyle E(Y^k)=E(X^{k / \tau})=\displaystyle \frac{\Gamma(a+b)}{\Gamma(a)} \ \frac{\Gamma(a+k/ \tau)}{\Gamma(a+b+k / \tau)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ k>-a \tau \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (7)$

$\displaystyle E(T^k)=E(\theta^k \ X^{k / \tau})=\displaystyle \theta^k \ \frac{\Gamma(a+b)}{\Gamma(a)} \ \frac{\Gamma(a+k/ \tau)}{\Gamma(a+b+k / \tau)} \ \ \ \ \ \ \ \ \ \ \ k>-a \tau \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (8)$

Once the moments are known, distributional quantities such as the variance, coefficient of variation, skewness and kurtosis can be routinely calculated.
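Formula $(8)$ and the derived quantities can be packaged together. In this Python sketch (standard library; `gb1_moment` and `gb1_stats` are hypothetical names), the central moments are computed from the raw moments, kurtosis is reported as the non-excess ratio $E[(T-\mu)^4]/\sigma^4$, and comparing $\theta=1$ with $\theta=5$ illustrates that $\theta$ rescales the mean and variance but leaves the CV, skewness and kurtosis unchanged.

```python
from math import exp, lgamma, sqrt

def gb1_moment(k, a, b, tau, theta=1.0):
    """E(T^k) = theta^k * G(a+b) G(a+k/tau) / (G(a) G(a+b+k/tau)), valid for k > -a*tau."""
    if k <= -a * tau:
        raise ValueError("moment does not exist for k <= -a*tau")
    r = k / tau
    return theta ** k * exp(lgamma(a + b) - lgamma(a) + lgamma(a + r) - lgamma(a + b + r))

def gb1_stats(a, b, tau, theta=1.0):
    """Mean, variance, CV, skewness and (non-excess) kurtosis of the generalized beta."""
    m1, m2, m3, m4 = (gb1_moment(k, a, b, tau, theta) for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    sd = sqrt(var)
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3                         # third central moment
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4      # fourth central moment
    return {"mean": m1, "var": var, "cv": sd / m1,
            "skew": mu3 / sd ** 3, "kurt": mu4 / var ** 2}

base = gb1_stats(3, 7, 0.5)               # Y = X^2 from Example 1
scaled = gb1_stats(3, 7, 0.5, theta=5.0)  # Y_1 = 5 Y
print(base)
print(scaled)
```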

_______________________________________________________________________________________________

How the Shape Can Change by the Parameter $\tau$

The parameters $a$ and $b$ are both shape parameters for the standard beta distribution. The more one of them dominates the other, the stronger the skewness. The direction of the skew depends on which one is larger. The beta distribution has a right skew if $b$ is larger (the parameter associated with the $1-x$ term in the beta density function) and a left skew if $a$ is larger (the parameter associated with the $x$ term in the beta density). The additional parameter $\tau$ can further tweak the skewness of the beta distribution.

In the example discussed above, the starting beta distribution with $a=3$ and $b = 7$ is right skewed. Squaring it ($\tau=0.5$) produces a stronger right skew (see Figure 1). Taking the square root ($\tau=2$) weakens the right skew so much that the result is in fact slightly left skewed.

Consider the case $\tau<1$. Since $1/\tau>1$, raising the beta $X$ to the power $1/\tau$ pushes the data toward the origin. As a result, this action tends to make the random variable $X^{1 / \tau}$ right skewed. If the standard beta is already right skewed, raising it to the power $1/\tau$ makes the right skew stronger. If the standard beta is symmetric, raising it to the power $1/\tau$ produces a moderate right skew. If the standard beta is left skewed, raising it to the power $1/\tau$ reduces the magnitude of the left skew (possibly producing a slight right skew).

Now consider the case $\tau>1$. Since $1/\tau<1$, raising the beta $X$ to the power $1/\tau$ pushes the data toward the right endpoint 1. As a result, this action tends to make the random variable $X^{1 / \tau}$ left skewed. If the standard beta is already left skewed, raising it to the power $1/\tau$ makes the left skew stronger. If the standard beta is symmetric, raising it to the power $1/\tau$ produces a moderate left skew. If the standard beta is right skewed, raising it to the power $1/\tau$ reduces the magnitude of the right skew (possibly producing a slight left skew).

The following example illustrates how the parameter $\tau$ alters the skewness. The calculation is left as an exercise.

$\displaystyle \begin{array}{ccccccc} \text{Beta } X&\text{ } & \text{Skewness of } X& \text{ } & \text{Skewness of } X^{1 / \tau}& \text{ } & \text{Skewness of } X^{1 / \tau}\\ \text{ } &\text{ } & \text{ } & \text{ } & \tau=0.5 <1& \text{ } & \tau=2 >1 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Beta 7, 3} &\text{ } & \displaystyle -0.4825 & \text{ } & \displaystyle -0.01066 & \text{ } & \displaystyle -0.7859\\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Beta 5, 5} &\text{ } & \displaystyle 0 & \text{ } & \displaystyle 0.67285 & \text{ } & \displaystyle -0.4144 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Beta 3, 7} &\text{ } & \displaystyle 0.4825 & \text{ } & \displaystyle 1.53195 & \text{ } & \displaystyle -0.11135 \\ \end{array}$

The beta with $a=7$ and $b=3$ has a left skew since $a$ dominates. Raising it to the power $1 / \tau$ with $\tau<1$ pushes the data toward the origin and thus greatly reduces the left skew. On the other hand, raising it to the power $1 / \tau$ with $\tau>1$ pushes the data toward 1.0 and as a result makes the left skew even stronger.

Starting with the symmetric Beta with $a=5$ and $b=5$, the case of $\tau<1$ produces a moderate right skew (pushing the data to the origin) and the case of $\tau>1$ produces a moderate left skew (pushing the data to 1.0).

The right skewed beta with $a=3$ and $b=7$ has the opposite dynamics to those of the beta with $a=7$ and $b=3$; it is illustrated in Figure 1 above.
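For readers who want to check the exercise, here is a minimal Python sketch (standard library only; `skewness_of_power` is an illustrative helper) that computes the skewness of $X^{1/\tau}$ from $E[(X^{1/\tau})^j]=E(X^{j/\tau})$ for each beta in the table, with $\tau=1$ giving the skewness of $X$ itself.

```python
from math import exp, lgamma

def beta_moment(k, a, b):
    """E(X^k) for X ~ Beta(a, b); valid for real k > -a."""
    return exp(lgamma(a + b) - lgamma(a) + lgamma(a + k) - lgamma(a + b + k))

def skewness_of_power(a, b, tau):
    """Skewness of X^(1/tau) for X ~ Beta(a, b), using E[(X^(1/tau))^j] = E(X^(j/tau))."""
    m1, m2, m3 = (beta_moment(j / tau, a, b) for j in (1, 2, 3))
    var = m2 - m1 ** 2
    return (m3 - 3 * m1 * m2 + 2 * m1 ** 3) / var ** 1.5

for a, b in ((7, 3), (5, 5), (3, 7)):
    row = [skewness_of_power(a, b, tau) for tau in (1.0, 0.5, 2.0)]
    print(f"Beta({a}, {b}):", [round(s, 5) for s in row])
```

Each printed row corresponds to one row of the table, in the column order $\tau=1$, $0.5$, $2$.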

_______________________________________________________________________________________________

Reference

1. McDonald, J. B., Some generalized functions for the size distribution of income, Econometrica, 52(3), 647-663 (1984).

_______________________________________________________________________________________________
$\copyright \ 2016 - \text{Dan Ma}$