# Generalized beta distribution

The beta distribution is defined using the beta function. The beta distribution can also be naturally generated as order statistics by sampling from the uniform distribution. This post presents a generalization of the standard beta distribution.

There are many generalized beta distributions. This post defines a “basic” generalized beta distribution that has four parameters. Recall that the standard beta distribution has two parameters $a$ and $b$. Both $a$ and $b$ drive the shape of the beta distribution, e.g. its skewness is driven by the magnitude of $b-a$. The generalized beta distribution defined here has four parameters $a$, $b$, $\tau$ and $\theta$. The value $\tau$ is an exponent parameter and the parameter $\theta$ is a scale parameter to translate the distribution to an interval other than the unit interval. The generalized beta distribution discussed here is called the generalized beta distribution of the first kind (see the paper listed in the reference section).

The role of the parameter $\tau$ is interesting in that it affects the shape of the new distribution, e.g. making the distribution more skewed or less skewed. Yet it is not a strictly a shape parameter. But it can greatly accentuate or reduce the skewness of the starting beta distribution depending on the value of $\tau$ (see the last section below).

_______________________________________________________________________________________________

Example 1

Before formally define the distribution, let’s look at the effect of the two additional parameters $\tau$ and $\theta$ through an example. The parameter $\theta$ is a translation parameter. First, look at the effect of adding $\tau$.

Let $X$ be a random variable that follows the beta distribution with parameters $a=3$ and $b=7$. The following is the density function of $X$.

$\displaystyle f(x)=252 \ x^{3-1} \ (1-x)^{7-1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0

Consider the following random variables:

$\displaystyle Y=X^{\frac{1}{\tau}} \ \ \ \ \ \ \ \ \ \ \ \ \tau =\frac{1}{2}$

$\displaystyle T=X^{\frac{1}{\tau}} \ \ \ \ \ \ \ \ \ \ \ \ \tau =2$

$\displaystyle Y_1=5 \ Y$

$\displaystyle T_1=5 \ T$

Essentially $Y$ is the square of $X$ and $T$ is the square root of $X$. To know more about the random variables $Y$ and $T$, let’s look at the graphs of the density functions. The following diagram shows the density functions of $X$ (blue curve), $Y$ (tall red curve) and $T$ (black curve).

Figure 1

The standard beta density curve for the random variable $X$ has moderate right skewness (the blue curve). Squaring $X$ produces a density curve with much more pronounced skewness (the red curve). This is because the action of squaring puts more probabilities on the smaller numbers. Squaring tends to shift the data closer to the origin. For example, 0.9 becomes 0.81, 0.5 becomes 0.25, 0.1 becomes 0.01, 0.001 becomes 0.000001 and so on.

Yet taking the square root on the standard beta has the opposite effect. The effect is to push the data toward 1.0, producing a density curve (the black curve) that is slightly negatively skewed (it looks almost symmetric).

Even though the random variable $Y$ is obtained by squaring the beta $X$, the density function of $Y$ is via a square root. On the other hand, while the random variable $T$ is obtained by taking square root of $X$, the density function of $T$ is obtained via squaring. The following are the density functions of $Y$ and $T$.

\displaystyle \begin{aligned}f_Y(y)&=f_X(y^{1/2}) \ \frac{d}{dy} y^{1/2} \\&\text{ } \\&=252 \ [y^{1/2} ]^2 \ [1-y^{1/2} ]^6 \frac{1}{2 y^{1/2}} \\&\text{ } \\&=126 \ y^{1/2} \ [1-y^{1/2} ]^6 \ \ \ \ \ \ \ \ \ \ \ 0

$\text{ }$

\displaystyle \begin{aligned}f_T(t)&=f_X(t^2) \ \frac{d}{dy} t^2 \\&\text{ } \\&=252 \ [t^2 ]^2 \ [1-t^2 ]^6 \ 2t \\&\text{ } \\&=504 \ t^5 \ [1-t^2 ]^6 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0

Because $Y$ and $T$ are defined by raising $X$ to a power, the properties involving moments can be derived from the beta distribution on $X$, via $E(Y^k)=E(X^{2k})$ and $E(T^k)=E(X^{k/2})$. The following table shows the first four moments of $Y$ and $T$ (using the formula for beta moments found here).

$\displaystyle \begin{array}{lllll} \text{ } &\text{ } & E(Y^k)=E(X^{2k}) & \text{ } & E(T^k)=E(X^{k/2}) \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ k=1 &\text{ } & \displaystyle E(X^2)=\frac{6}{55} & \text{ } & \displaystyle E(X^{1/2})=\frac{24576}{46189} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ k=2 &\text{ } & \displaystyle E(X^4)=\frac{3}{143} & \text{ } & \displaystyle E(X)=\frac{3}{10} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ k=3 &\text{ } & \displaystyle E(X^6)=\frac{4}{715} & \text{ } & \displaystyle E(X^{3/2})=\frac{57344}{323323} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ k=4 &\text{ } & \displaystyle E(X^8)=\frac{9}{4862} & \text{ } & \displaystyle E(X^2)=\frac{6}{55} \\ \end{array}$

The following table shows a comparison of the three random variables.

$\displaystyle \begin{array}{lllllll} \text{ } &\text{ } & X & \text{ } & Y=X^2& \text{ } & T=X^{1/2} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Mean} &\text{ } & \displaystyle E(X)=\frac{3}{10}=0.3 & \text{ } & \displaystyle E(Y)=\frac{6}{55}=0.11 & \text{ } & \displaystyle E(T)=\frac{24576}{46189}=0.53\\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Variance} &\text{ } & \displaystyle Var(X)=\frac{21}{1100} & \text{ } & \displaystyle Var(Y)=0.009078195 & \text{ } & \displaystyle Var(T)=0.016896475 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Std Dev} &\text{ } & \displaystyle \sigma_X=0.1382 & \text{ } & \displaystyle \sigma_Y=0.09528 & \text{ } & \displaystyle \sigma_T=0.1300 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{CV} &\text{ } & \displaystyle \frac{\sigma_X}{\mu_X}=0.4606 & \text{ } & \displaystyle \frac{\sigma_Y}{\mu_Y}=0.8734 & \text{ } & \displaystyle \frac{\sigma_T}{\mu_T}=0.2443 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Skewness} &\text{ } & \displaystyle \gamma_X=0.4825 & \text{ } & \displaystyle \gamma_Y=1.5320 & \text{ } & \displaystyle \gamma_T=-0.1113 \\ \end{array}$

The variance is calculated by letting the second moment subtracting the square of the first moment. The standard deviation is the square root of the variance. CV stands for coefficient of variation, which is the ratio of the standard deviation to the mean. The skewness is the third central moment divided by the cube of the standard deviation.

Note that the skewness calculation confirms what we see in the three density curves in Figure 1. The skewness of the beta distribution is moderate (right skewed). Squaring the beta distribution has the effect of pushing the data to the origin, hence the standard deviation is smaller and the right skew is more pronounced (3 times as strong). Taking the square root of the beta distribution goes the opposite direction, leading to a slightly left skewed distribution.

The above discussion only focuses on the effect of the parameter $\tau$ (the effect of raising the base distribution to a power). The other parameter $\theta$ is a scale parameter that translates the transformed distribution from the interval $(0,1)$ to the interval $(0, \theta)$. The following diagrams show the density function of $Y_1=5 \ Y$ (Figure 2) and the density function of $T_1=5 \ T$ (Figure 3).

Figure 2

Figure 3

Multiplying by 5 certainly affects the mean and variance. The CV and skewness remain the same. Thus the scale parameter does not change the shape.

_______________________________________________________________________________________________

Basic Properties

Let $a$, $b$, $\tau$ and $\theta$ be some fixed positive real numbers. A random variable $T$ follows the generalized beta distribution with parameters $a$, $b$, $\tau$ and $\theta$ if

$\displaystyle T=\theta \ X^{\frac{1}{\tau}}$

where $X$ is a random variable that follows the beta distribution with parameters $a$ and $b$. In other words, if we start with a standard beta distribution, raising it to a power and then multiplying a scale parameter would produce a generalized beta distribution. On the other hand, if we start with a generalized beta distribution, dividing it by a parameter called $\theta$ and then raising it to a power would produce a standard beta distribution. In this post, we prefer to work with the first progression – defining the generalized beta from the standard beta.

We first derive the density function of the random variable $\displaystyle Y=X^{1/\tau}$, i.e. the generalized beta distribution without the parameter $\theta$. The new random variable is obtained by raising the old to $1/\tau$. As a result, the new density function is obtained by plugging $x^\tau$ into the old density function and multiplying the derivative of $x^\tau$.

$\displaystyle f_X(x)=\frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ x^{a-1} \ (1-x)^{b-1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0

\displaystyle \begin{aligned}f_Y(x)&=\frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ (x^\tau)^{a-1} \ (1-x^\tau)^{b-1} \ \tau x^{\tau-1} \\&=\frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ (x^\tau)^{a} \ (1-x^\tau)^{b-1} \ \frac{\tau}{x} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0

Now add the scale parameter. The effect is that $x/ \theta$ is plugged into the density function of $Y$ and that the density function is multiplied by $1/ \theta$ (the derivative).

\displaystyle \begin{aligned}f_T(x)&=\frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ \biggl[\biggl(\frac{x}{\theta}\biggr)^\tau \biggr]^{a-1} \ \biggl[1-\biggl(\frac{x}{\theta}\biggr)^\tau \biggr]^{b-1} \ \tau \biggl(\frac{x}{\theta}\biggr)^{\tau-1} \ \frac{1}{\theta} \\&=\frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ \biggl[\biggl(\frac{x}{\theta}\biggr)^\tau \biggr]^{a} \ \biggl[1-\biggl(\frac{x}{\theta}\biggr)^\tau \biggr]^{b-1} \ \frac{\tau}{x} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0

Recall that the cumulative distribution function of the standard beta $X$ can be expressed using the incomplete beta function.

$\displaystyle F_X(x)=\frac{B(a,b,x)}{B(a,b)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0

$\displaystyle B(a,b,x)=\int_0^x \ t^{a-1} \ (1-t)^{b-1} \ dt$

$\displaystyle B(a,b)=\int_0^1 \ t^{a-1} \ (1-t)^{b-1} \ dt$

The CDF in $(4)$ has no closed form. Since $Y$ and $T$ are obtained by raising $X$ to a power, their CDFs can still be expressed using the incomplete beta function.

$\displaystyle F_Y(x)=\frac{B(a,b,x^\tau)}{B(a,b)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0

$\displaystyle F_T(x)=\frac{B(a,b,(x/ \theta )^\tau )}{B(a,b)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ 0

For the standard beta distribution, all positive moments exist, i.e. $E(X^k)$ is defined for all positive real numbers $k$. As a result, all positive moments exist for the generalized $Y$ and $T$ as well.

$\displaystyle E(X^k) = \left\{ \begin{array}{ll} \displaystyle \frac{\Gamma(a+b)}{\Gamma(a)} \ \frac{\Gamma(a+k)}{\Gamma(a+b+k)} &\ \ \ \ \ \ k>-a \\ \text{ } & \text{ } \\ \displaystyle \frac{a (a+1) \cdots (a+k-1)}{(a+b) (a+b+1) \cdots (a+b+k-1)} &\ \ \ \ \ \ k \text{ is a positive integer} \end{array} \right.$

$\displaystyle E(Y^k)=E(X^{k / \tau})=\displaystyle \frac{\Gamma(a+b)}{\Gamma(a)} \ \frac{\Gamma(a+k/ \tau)}{\Gamma(a+b+k / \tau)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ k>-a \tau \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (7)$

$\displaystyle E(T^k)=E(\theta^k \ X^{k / \tau})=\displaystyle \theta^k \ \frac{\Gamma(a+b)}{\Gamma(a)} \ \frac{\Gamma(a+k/ \tau)}{\Gamma(a+b+k / \tau)} \ \ \ \ \ \ \ \ \ \ \ k>-a \tau \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (8)$

Once the moments are known, distributional quantities such as variance, coefficient of variation and skewness and kurtosis can be routinely calculated.

_______________________________________________________________________________________________

How the Shape Can Change by the Parameter $\tau$

The parameters $a$ and $b$ are both shape parameters for the standard beta distribution. the larger the one of them (in relation to the other), the more stronger the skewness. The direction of the skew depends on which one is larger. The beta distribution has a right skew if $b$ is larger (the parameter associated with the $1-x$ term in the beta density function) and has a left skew if $a$ is larger (the parameter associated with the $x$ term in the beta density). The additional parameter $\tau$ can further tweak the skewness of the beta distribution.

In the example discussed above, the starting beta distribution with $a=3$ and $b = 7$ is a right skewed distribution. Squaring it ($\tau=0.5$) produces a stronger skewness to the right (see Figure 1). Taking a square root ($\tau=2$) produces a weaker skewness to the right (in fact a slight skewness to the left).

Consider the case $\tau<1$. Raising the beta $X$ to the power of $1/ \tau$ has the effect of pushing the data toward the origin. As a result, this action makes the random variable $X^{1 / \tau}$ to become right skewed. If the standard beta is already right skewed, raising it to the power of $1/ \tau$ will make the right skew stronger. If the standard beta is symmetric, raising it to the power of $1/ \tau$ will produce a moderate right skew. If the standard beta is left skewed, raising it to the power of $1/ \tau$ will reduce the magnitude of the left skew (possibly producing a slight right skew).

Now consider that case that $\tau>1$. Raising the beta $X$ to the power of $1/ \tau$ has the effect of pushing the data toward the end point of the interval at 1. As a result, this action makes the random variable $X^{1 / \tau}$ to become left skewed. If the standard beta is already left skewed, raising it to the power of $1/ \tau$ will make the left skew stronger. If the standard beta is symmetric, raising it to the power of $1/ \tau$ will produce a moderate left skew. If the standard beta is right skewed, raising it to the power of $1/ \tau$ will reduce the magnitude of the right skew (possibly producing a slight left skew).

The following example illustrates the idea of alternating the skewness by the parameter $\tau$. The calculation is left as an exercise.

$\displaystyle \begin{array}{ccccccc} \text{Beta } X&\text{ } & \text{Skewness of } X& \text{ } & \text{Skewness of } X^{1 / \tau}& \text{ } & \text{Skewness of } X^{1 / \tau}\\ \text{ } &\text{ } & \text{ } & \text{ } & \tau=0.5 <1& \text{ } & \tau=2 >1 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Beta 7, 3} &\text{ } & \displaystyle -0.4825 & \text{ } & \displaystyle -0.01066 & \text{ } & \displaystyle -0.7859\\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Beta 5, 5} &\text{ } & \displaystyle 0 & \text{ } & \displaystyle 0.67285 & \text{ } & \displaystyle -0.3930 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Beta 3, 7} &\text{ } & \displaystyle 0.4825 & \text{ } & \displaystyle 1.53195 & \text{ } & \displaystyle -0.11135 \\ \end{array}$

The Beta with $a=7$ and $b=3$ has a left skew since $a$ dominates. Raising it to $1 / \tau$ with $\tau<1$ pushes the data to the origin and thus reducing the left skew greatly. On the other hand, Raising it to $1 / \tau$ with $\tau>1$ pushes the data to 1.0 and as a result making the left skew even stronger.

Starting with the symmetric Beta with $a=5$ and $b=5$, the case of $\tau<1$ produces a moderate right skew (pushing the data to the origin) and the case of $\tau>1$ produces a moderate left skew (pushing the data to 1.0).

The right skewed Beta with $a=3$ and $b=7$ has the opposite dynamics as for the beta with $a=7$ and $b=3$ and is illustrated in Figure 1 above.

_______________________________________________________________________________________________

Reference

1. McDonald, J. B., Some generalization functions for the size distribution of income, Econometrica, 52, 3, 647-663 (1984).

_______________________________________________________________________________________________
$\copyright \ 2016 - \text{Dan Ma}$

# The beta distribution as order statistics

The beta distribution is mathematically defined using the beta function as discussed in the previous post. The beta distribution can also arise naturally from random sampling from the uniform distribution. This natural generation of the beta distribution leads to an interesting discussion of order statistics and non-parametric inference.

_______________________________________________________________________________________________

Sampling from the Uniform Distribution

Suppose $X_1,X_2,\cdots,X_n$ is a random sample from a continuous distribution. Rank the sample in ascending order: $Y_1. The statistic $Y_1$ is the least sample item (the minimum statistic). The statistic $Y_2$ is the second smallest sample item (the second order statistic), and so on. Of course, $Y_n$ is the maximum statistic. Since we are sampling from a continuous distribution, assume that there is no chance for a tie among the sample items $X_i$ or the order statistics $Y_i$. Let $F(x)$ and $f(x)$ be the cumulative distribution function and the density function of the continuous distribution from which the random sample is drawn. The following is the density function of the order statistic $Y_i$ where $1 \le i \le n$.

$\displaystyle f_{Y_i}(y)=\frac{n!}{(i-1)! \ (n-i)!} \ F(y)^{i-1} \ f(y) \ [1-F(y)]^{n-i} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

To see how $(1)$ is derived, see the discussion here and here.

When sampling from the uniform distribution on $(0,1)$, the order statistics have the beta distribution. Suppose that the distribution from which the random sample is drawn is the uniform distribution on the unit interval $(0,1)$. Then $F(x)=x$ and $f(x)=1$ for all $0. Then the density function in $(1)$ becomes the following:

$\displaystyle f_{Y_i}(y)=\frac{n!}{(i-1)! \ (n-i)!} \ y^{i-1} \ [1-y]^{n-i} \ \ \ \ \ \ 0

The density in $(2)$ is a beta density function with $a=i$ and $b=n-i+1$. The following is the mean of the beta distribution described in $(2)$.

$\displaystyle E(Y_i)=\frac{a}{a+b}=\frac{i}{n+1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)$

To summarize, in a random sample of size $n$ drawn from a uniform distribution on $(0,1)$, the $i$th order statistic $Y_i$ has a beta distribution with parameters $a=i$ and $b=n-i+1$. The following table shows the information when $n=7$.

$\displaystyle \begin{array}{ccccccc} \text{ } &\text{ } & \text{beta parameter } a & \text{ } & \text{beta parameter } b& \text{ } & E(Y_i) \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ Y_1 &\text{ } & 1 & \text{ } & 7 & \text{ } & \displaystyle \frac{1}{8}\\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ Y_2 &\text{ } & 2 & \text{ } & 6 & \text{ } & \displaystyle \frac{2}{8} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ Y_3 &\text{ } & 3 & \text{ } & 5 & \text{ } & \displaystyle \frac{3}{8} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ Y_4 &\text{ } & 4 & \text{ } & 4 & \text{ } & \displaystyle \frac{4}{8} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ Y_5 &\text{ } & 5 & \text{ } & 3 & \text{ } & \displaystyle \frac{5}{8} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ Y_6 &\text{ } & 6 & \text{ } & 2 & \text{ } & \displaystyle \frac{6}{8} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ Y_7 &\text{ } & 7 & \text{ } & 1 & \text{ } & \displaystyle \frac{7}{8} \\ \end{array}$

See here for a fuller discussion on the beta distribution.

_______________________________________________________________________________________________

The Non-Parametric Angle

In descriptive statistics, the sample statistics are used as point estimates for population parameters. For example, the sample mean can be used as an estimate for the population mean and the sample median is used as an estimate of the population median and so on. Such techniques are part of non-parametric statistics since there are no assumptions made about the probability distributions of the variables being assessed.

Let’s focus on estimation of percentiles. We do not need to assume a population probability distribution. Simply generate the sample data from the population. Then rank the sample data from the smallest to the largest. Use the sample item in the “middle” to estimate the population median. Likewise, the population 75th percentile is estimated by the sample item that ranks higher than approximately 75% of the sample items and ranks below 25% of the sample items. And so on. In other words, order statistics can be used as estimates of population percentiles. This makes intuitive sense. The estimate will always be correct from the perspective of the sample. For example, the estimate of the 75th percentile is chosen to rank approximately above 75% of the sample. But will the estimate chosen this way rank above 75% of the population? The remainder of the post shows that the answer is yes. On average the estimate will rank above the appropriate percentage of the population. Thus the non-parametric approach of using order statistics as estimate of population percentiles makes mathematical sense as well. The estimate is “unbiased” in the sense that it is expected to be rank correctly among the population values.

As before, $X_1,X_2,\cdots,X_n$ is the random sample and $Y_1 is the resulting ordered sample. We do not know the distribution from which the sample is obtained. To make the argument clear, let $f(x)$ be the density function and $F(x)$ be the CDF of the unknown population, respectively. We show that

regardless of the probability distribution from which the sample is generated, the expected area under the density curve $f(x)$ and to the left of $Y_i$ is $\displaystyle \frac{i}{n+1}$.

Thus the order statistic $Y_i$ is expected to be greater than $\displaystyle \biggl(100 \times \frac{i}{n+1}\biggr)$% of the population. This shows that it is mathematically justified to use the order statistics $Y_1 as estimates of the population percentiles.

For example, if the sample size $n$ is 11, then the middle sample item is $Y_6$. Then the area under the density curve of the unknown population distribution and to the left of $Y_6$ is expected to be 6/12 = 0.5. So $Y_6$ is expected to be greater than 50% of the distribution, even though the form of the distribution is unknown.

Note that $F(X)$ has a uniform distribution on the interval $(0,1)$ for any continuous random variable $X$. So $F(X_1),F(X_2),\cdots,F(X_n)$ is like a random sample drawn from the uniform distribution. Furthermore $F(Y_1) is an order sample from the uniform distribution since $F(x)$ as a CDF is an increasing function. Thus each item $F(Y_i)$ is an order statistic. Recall the result described in $(2)$. The order statistic $F(Y_i)$ has a beta distribution with $a=i$ and $b=n-i+1$. Then the mean of $F(Y_i)$ is

$\displaystyle E[F(Y_i)]=\frac{a}{a+b}=\frac{i}{n+1} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4)$

, which is the same as the mean in $(3)$. Note that the area under the density curve $f(x)$ and to the left of $Y_i$ can be conceptually expressed as $F(Y_i)$. The expected value of this area is the ratio indicated in $(4)$. This concludes the argument that when the order statistic $Y_i$ is used as an estimate of a certain population percentile, it is on average a correct estimate in that it is expected to rank above the correct percentage of the population.

_______________________________________________________________________________________________

Remarks

A subclass of the beta distribution can be naturally generated from random sampling of the uniform distribution, as described in $(2)$. The non-parametric approach is not only for producing point estimates . It can be used as an inference procedure as well. For example, a wildlife biologist may be interested in estimating the median weight of black bear in a certain region in Alaska. The procedure is to capture a sample of bears, sedate the bears and then take the weight measurements. Rank the sample. The middle sample item is then estimate of the population median weight of black bears. The ordered sample can also be used to form a confidence interval for the median bear weight. For an explanation on how to form such distribution-free confidence intervals, see here.

_______________________________________________________________________________________________
$\copyright \ 2016 - \text{Dan Ma}$

# Introducing the beta distribution

This post is an introduction on the standard beta distribution, i.e. the one whose support is the unit interval $(0, 1)$ and is mathematically defined from the beta function. There are many generalizations of the standard beta distribution. The most straightforward one is to resize the standard beta so that the support is an interval other than $(0, 1)$, which is discussed at the end of this post. There are many forms of generalized beta distribution. One subsequent post will define one such version.

The beta distribution and its many generalizations are important in economic modeling (e.g. modeling of income distribution). It can also be applied in actuarial modeling, e.g. for modeling insurances losses. In Bayesian analysis, the beta distribution can be used as a conjugate prior for the binomial model. The beta distribution has an interesting and important connection with order statistics and non-parametric inference, which is the subject of the next post.

_______________________________________________________________________________________________

The Beta Density Function

Let $a$ and $b$ be positive real numbers. The previous post establishes the following fact about the beta function.

$\displaystyle B(a,b)=\int_0^1 t^{a-1} \ (1-t)^{b-1} \ dt=\frac{\Gamma(a) \ \Gamma(b)}{\Gamma(a+b)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

As a result, the following is a density function.

$\displaystyle f(x)=\frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ x^{a-1} \ (1-x)^{b-1}; \ \ \ \ \ \ \ \ 0

A random variable $X$ is said to follow the beta distribution with parameters $a$ and $b$ if its pdf is $(1)$. In particular, if $a$ and $b$ are positive integers, then the beta density function is:

$\displaystyle f(x)=\frac{(a+b-1)!}{(a-1)! \ (b-1)!} \ x^{a-1} \ (1-x)^{b-1}; \ \ \ \ \ \ \ \ 0

It is instructive to look at the role of the parameters $a$ and $b$ in the beta distribution. It turns out that both parameters $a$ and $b$ play a role in determining the shape of the distribution. If $a$ and $b$ are less than 1, the beta density is a U shape where the density curve goes up to infinity as $x \rightarrow 0$ and $x \rightarrow 1$ (see the U shaped red curve in Figure 1).

If $a=b$, then the beta density curves are symmetric about the vertical line at $x=0.5$. When $a=b$, the value of the density curve at $x$ is identical to the value at $1-x$ (see Figure 1). Note that when $a=b=1$, the beta distribution is the uniform distribution on $(0,1)$.

Figure 1

If $a$ and $b$ are greater than 1 and if $a, then the beta density curve is skewed to the right (positively skewed). This means that a short portion of the left side of $(0,1)$ gets assigned more probabilities and the side to the right gets assigned less probabilities. More specifically, when $a, the random quantity that is described by the beta distribution is concentrated more on the lower end of the interval $(0, 1)$ (see Figure 2). The greater the $b$ as compared to $a$, the more pronounced the skewness.

Figure 2

If $a$ and $b$ are greater than 1 and if $a>b$, the shape is the opposite (see Figure 3). The skewness is to the left (negatively skewed). Most of the probabilities are concentrated on a relatively short segment on the right side of $(0, 1)$. Then the longer side to the left is assigned much less probabilities. Hence the distribution has a longer left tail.

Figure 3

_______________________________________________________________________________________________

Distributional Quantities Based on Moments

As the above figures of density curves show, both parameters $a$ and $b$ drive the shape of the beta distribution. Several of the distributional quantities are also easily derived from $a$ and $b$. For example, the moments of the beta distribution are easily derived. As a result, many distributional quantities that are based on moments can be calculated in a straightforward fashion, e.g. coefficient of variation, skewness, kurtosis. These in turn can shed more light on the beta distribution.

\displaystyle \begin{aligned} E(X^k)&=\int_0^1 x^k \ \frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ x^{a-1} \ (1-x)^{b-1} \ dx \\&=\int_0^1 \frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ x^{a+k-1} \ (1-x)^{b-1} \ dx \\&=\frac{\Gamma(a+k)}{\Gamma(a)} \frac{\Gamma(a+b)}{\Gamma(a+b+k)} \ \int_0^1 \frac{\Gamma(a+b+k)}{\Gamma(a+k) \ \Gamma(b)} \ x^{a+k-1} \ (1-x)^{b-1} \ dx\\&=\frac{\Gamma(a+k)}{\Gamma(a)} \frac{\Gamma(a+b)}{\Gamma(a+b+k)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ k \text{ is a real number with } k>-a \\&=\frac{a(a+1) \cdots (a+k-1)}{(a+b)(a+b+1) \cdots (a+b+k-1)} \ \ \ \ \ \ \ \ \ k \text{ is a positive integer} \ \ \ \ \ \ \ \ \ \ \ \ (4) \end{aligned}

The trick is to make the integrand the density of a beta distribution (in this case beta parameters $a+k$ and $b$). Then the contents that can be taken out of the integral would be the answer. To get the final expression of $E(X^k)$, we rely on a fact of the gamma function, which is that $\Gamma(\alpha+k)=\Gamma(\alpha) \ \alpha (\alpha+1) \cdots (\alpha+k-1)$. Thus the first four moments of the beta distribution are:

$\displaystyle E(X)=\frac{a}{a+b}$

$\displaystyle E(X^2)=\frac{a(a+1)}{(a+b)(a+b+1)}$

$\displaystyle E(X^3)=\frac{a(a+1)(a+2)}{(a+b)(a+b+1)(a+b+2)}$

$\displaystyle E(X^4)=\frac{a(a+1)(a+2)(a+3)}{(a+b)(a+b+1)(a+b+2)(a+b+3)}$

Here are some calculations that can be made using the first four moments established above.

$\displaystyle \begin{array}{lllll} \text{ } &\text{ } & \text{Definition} & \text{ } & \text{Standard Beta Distribution} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Mean} &\text{ } & \text{ } & \text{ } & \displaystyle \frac{a}{a+b} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Variance} &\text{ } & E(X^2)-E(X)^2 & \text{ } & \displaystyle \frac{ab}{(a+b)^2 \ (a+b+1)} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Coefficient of Variation} &\text{ } & \displaystyle \frac{\sqrt{Var(X)}}{E(X)}=\frac{\sigma}{\mu} & \text{ } & \displaystyle \sqrt{\frac{b}{a(a+b+1)}} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Skewness} &\text{ } & \displaystyle E\biggl[\biggl(\frac{X-\mu}{\sigma}\biggr)^3\biggr] & \text{ } & \displaystyle \frac{2(b-a) \sqrt{a+b+1}}{(a+b+2) \sqrt{ab}} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Kurtosis} &\text{ } & \displaystyle E\biggl[\biggl(\frac{X-\mu}{\sigma}\biggr)^4\biggr] & \text{ } & \displaystyle \frac{6[(a-b)^2 (a+b+1)-ab(a+b+2)]}{ab(a+b+2)(a+b+3)} +3 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Excess Kurtosis} &\text{ } & \displaystyle E\biggl[\biggl(\frac{X-\mu}{\sigma}\biggr)^4\biggr]-3 & \text{ } & \displaystyle \frac{6[(a-b)^2 (a+b+1)-ab(a+b+2)]}{ab(a+b+2)(a+b+3)} \end{array}$

Note the skewness coefficient agrees with the diagrams shown above. When the parameters $a$ and $b$ are equal, there is no skewness. When $b$ is greater than $a$, the skewness is positive (skewed to the right). The skewness is more pronounced when the parameter $b$ is substantially greater than the parameter $a$. Similarly, when $b$ is smaller than $a$, the skewness is negative.

Another interesting observation is about the variance. If one of the parameters $a$ and $b$ is fixed (it does not matter which one) while the other goes to infinity, the variance will go to zero. To see this, let $a$ be fixed. In the expression for $Var(X)$, divide the numerator and denominator by $b$ and obtain:

$\displaystyle Var(X)=\frac{a}{(a+b)^2 \ (\frac{a}{b}+1+\frac{1}{b})}$

The quantity $\frac{a}{b}+1+\frac{1}{b}$ in the denominator approaches 1 and the quantity $(a+b)^2$ approaches infinity as $b \rightarrow \infty$. This means that the spread is driven by the parameters $a$ and $b$ too. The spread is narrower and narrower as one of the parameters becomes larger and larger. This phenomenon is also reflected in the above diagrams of beta density curves.

_______________________________________________________________________________________________

Other Distributional Quantities

The following is the cumulative distribution function of the beta distribution.

$\displaystyle F(x)=\int_0^x \frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ t^{a-1} \ (1-t)^{b-1} \ dt \ \ \ \ \ \ \ \ \ 0

The CDF can also be expressed as $\displaystyle F(x)=\frac{B(a,b,x)}{B(a,b)}$ where $B(a,b,x)$ is the incomplete beta function defined by:

$\displaystyle B(a,b,x)=\int_0^x t^{a-1} \ (1-t)^{b-1} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (6)$

The CDF as exhibited in $(5)$ has no closed form. As a result, the values of the CDF will have to be obtained thru numerical approximation or using software. Likewise, percentiles of the beta distribution can only be approximated. For the distributional quantities not discussed here (e.g. moment generating function), see the Wikepedia entry on the beta function.

_______________________________________________________________________________________________

Extending the Support

The standard beta distribution takes on values in the unit interval $(0,1)$. If the random quantity to be modeled can extend beyond the unit interval, the beta distribution can also be transformed to match the situation. Let $X$ be the standard beta distribution with parameters $a$ and $b$. Let $\theta$ be a positive number. Let $Y=\theta \ X$. The support of $Y$ is then the interval $(0, \theta)$. As a constant multiple of the standard beta $X$, the random variable $Y$ would have similar shapes and properties. The following are the density function and CDF.

\displaystyle \begin{aligned} f_Y(y)&=\frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ \biggl(\frac{y}{\theta}\biggr)^{a-1} \ \biggl[1-\biggl(\frac{y}{\theta}\biggr)\biggr]^{b-1} \ \frac{1}{\theta} \ \ \ \ \ \ \ \ 0

$\displaystyle F_Y(y)=\int_0^{\frac{y}{\theta}} \frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ t^{a-1} \ (1-t)^{b-1} \ dt \ \ \ \ \ \ \ \ \ 0

$\displaystyle F_Y(y)=\frac{B(a,b,\frac{y}{\theta})}{B(a,b)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (8a)$

Note that $(8a)$ expresses the CDF of $Y$ using the incomplete beta function defined in $(6)$. All the distributional quantities of $X$ discussed above can be obtained for $Y$ by applying the appropriate multiplier. For example, $E(Y^k)=\theta^k E(X)$. As a result, a similar table can be obtained for the transformed beta distribution.

$\displaystyle \begin{array}{lllll} \text{ } &\text{ } & \text{Definition} & \text{ } & \text{Transformed Beta } Y=\theta X \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Mean} &\text{ } & \text{ } & \text{ } & \displaystyle \theta \ \frac{a}{a+b} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Variance} &\text{ } & E(Y^2)-E(Y)^2 & \text{ } & \displaystyle \theta^2 \ \frac{ab}{(a+b)^2 \ (a+b+1)} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Coefficient of Variation} &\text{ } & \displaystyle \frac{\sqrt{Var(Y)}}{E(Y)}=\frac{\sigma_Y}{\mu_Y} & \text{ } & \displaystyle \sqrt{\frac{b}{a(a+b+1)}} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Skewness} &\text{ } & \displaystyle E\biggl[\biggl(\frac{Y-\mu_Y}{\sigma_Y}\biggr)^3\biggr] & \text{ } & \displaystyle \frac{2(b-a) \sqrt{a+b+1}}{(a+b+2) \sqrt{ab}} \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Kurtosis} &\text{ } & \displaystyle E\biggl[\biggl(\frac{Y-\mu_Y}{\sigma_Y}\biggr)^4\biggr] & \text{ } & \displaystyle \frac{6[(a-b)^2 (a+b+1)-ab(a+b+2)]}{ab(a+b+2)(a+b+3)} +3 \\ \text{ } & \text{ } & \text{ } & \text{ } & \text{ } \\ \text{Excess Kurtosis} &\text{ } & \displaystyle E\biggl[\biggl(\frac{Y-\mu_Y}{\sigma_Y}\biggr)^4\biggr]-3 & \text{ } & \displaystyle \frac{6[(a-b)^2 (a+b+1)-ab(a+b+2)]}{ab(a+b+2)(a+b+3)} \end{array}$

_______________________________________________________________________________________________
$\copyright \ 2016 - \text{Dan Ma}$

# Introducing the beta function

The gamma distribution is mathematically defined from the gamma function. This post gives a brief introduction to the beta function. The goal is to establish one property that is the basis for defining the beta distribution.

_______________________________________________________________________________________________

The Beta Function

For any positive constants $a$ and $b$, the beta function is defined to be the following integral:

$\displaystyle B(a,b)=\int_0^1 t^{a-1} \ (1-t)^{b-1} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (0)$

The beta function can be evaluated directly if the parameters $a$ and $b$ are not too large. For example, $B(3,2)$ is the integral $\displaystyle \int_0^1 t^2 (1-t) \ dt$, which is $1/12$. Evaluating $(0)$ in a case by case basis does not shed light on the beta function. Direct calculation can also be cumbersome (e.g. for large parameters that are integers) or challenging (e.g. for parameters $a$ and $b$ that are fractional). It turns out that the evaluation of the beta function $B(a,b)$ is based on the gamma function.

_______________________________________________________________________________________________

Connection to the Gamma Function

The remainder of the post is to establish the following value of the beta function:

$\displaystyle B(a,b)=\int_0^1 t^{a-1} \ (1-t)^{b-1} \ dt=\frac{\Gamma(a) \ \Gamma(b)}{\Gamma(a+b)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

To start the proof of $(1)$, let $X$ and $Y$ be two independent random variables such that $X$ follows a gamma distribution with shape parameter $a$ and rate parameter $\beta$ and that $Y$ follows a gamma distribution with shape parameter $b$ and rate parameter $\beta$. It does not matter what $\beta$ is, as long as it is the rate parameter for both $X$ and $Y$. Then the sum $S=X+Y$ has a gamma distribution with shape parameter $a+b$ and rate parameter $\beta$. The following is the density function for $S=X+Y$.

$\displaystyle f_S(s)=\frac{1}{\Gamma(a+b)} \ \beta^{a+b} \ s^{a+b-1} \ e^{-\beta s} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$

The density function of $S=X+Y$ can also be derived from the convolution formula using the density functions of $X$ and $Y$ as follows:

\displaystyle \begin{aligned} f_S(s)&=\int_0^s f_Y(s-x) \ f_X(x) \ dx \ \ \ \ \ \text{(convolution)} \\&=\int_0^s \frac{1}{\Gamma(b)} \ \beta^{b} \ (s-x)^{b-1} \ e^{-\beta (s-x)} \ \frac{1}{\Gamma(a)} \ \beta^{a} \ x^{a-1} \ e^{-\beta x} \ dx \\&=\frac{\beta^{a+b}}{\Gamma(a) \ \Gamma(b)} \ e^{-\beta s} \ \int_0^s x^{a-1} \ (s-x)^{b-1} \ dx \\&=\frac{\beta^{a+b}}{\Gamma(a) \ \Gamma(b)} \ e^{-\beta s} \ s^{a+b-1} \ \int_0^1 t^{a-1} \ (1-t)^{b-1} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3) \end{aligned}

See here for more information on how to use the convolution formula. The last step in $(3)$ is obtained by a change of variable in the integral from the step immediately above it by letting $x=st$. The last step in $(3)$ must equal to $(2)$. Setting the two equal would produce the equality in $(1)$.

Note that if the function $t^{a-1} \ (1-t)^{b-1}$ is normalized by the value $B(a,b)$, it would be a density function, which is the beta distribution. The following is the density function of the beta distribution.

$\displaystyle f(x)=\frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ x^{a-1} \ (1-x)^{b-1}; \ \ \ \ \ \ \ \ 0

The beta distribution is further examined in the next post.

_______________________________________________________________________________________________
$\copyright \ 2016 - \text{Dan Ma}$

# The gamma distribution from the point of view of a Poisson process

In the previous post, the gamma distribution is defined from the gamma function. This post shows that the gamma distribution can arise from a Poisson process.

_______________________________________________________________________________________________

The Poisson Process

Consider an experiment in which events that are of interest occur at random in a time interval. The goal here is to derive two families of random variables, one continuous and one discrete. Starting at time 0, record the time of the occurrence of the first event. Then record the time at which the second random event occurs and so on (these are the continuous random variables). Out of these measurements, we can derive discrete random variables by counting the number of random events in a fixed time interval.

The recording of times of the occurrences of the random events is like placing markings on a time line to denote the arrivals of the random events. We are interested in counting the number of markings in a fixed interval. We are also interested in measuring the length from the starting point to the first marking and to the second marking and so on. Because of this interpretation, the random process discussed here can also describe random events occurring along a spatial interval, i.e. intervals in terms of distance or volume or other spatial measurements.

A Poisson process is a random process described above in which several criteria are satisfied. We show that in a Poisson process, the number of occurrences of random events in a fixed time interval follows a Poisson distribution and the time until the $n$th random event follows a Gamma distribution.

A good example of a Poisson process is the well known experiment in radioactivity conducted by Rutherford and Geiger in 1910. In this experiment, $\alpha$-particles were emitted from a polonium source and the number of $\alpha$-particles were counted during an interval of 7.5 seconds (2,608 many such time intervals were observed). In these 2,608 intervals, a total of 10,097 particles were observed. Thus the mean count per period of 7.5 seconds is 10097 / 2608 = 3.87.

In the Rutherford and Geiger experiment in 1910, a random event is the observation of an $\alpha$-particle. The random events occur at an average of 3.87 per unit time interval (7.5 seconds).

One of the criteria in a Poisson process is that in a very short time interval, the chance of having more than one random event is essentially zero. So either one random event will occur or none will occur in a very short time interval. Considering the occurrence of a random event as a success, there is either a success or a failure in a very short time interval. So a very short time interval in a Poisson process can be regarded as a Bernoulli trial.

The second criterion is that the experiment remains constant over time. Specifically this means that the probability of a random event occurring in a given subinterval is proportional to the length of that subinterval and not on where the subinterval is in the original interval. Any counting process that satisfies this criterion is said to possess stationary increments. For example, in the 1910 radioactivity study, $\alpha$-particles were emitted at the rate of $\lambda=$ 3.87 per 7.5 seconds. So the probability of one $\alpha$-particle emitted from the radioactive source in a one-second interval is 3.87/7.5 = 0.516. Then the probability of observing one $\alpha$-particle in a half-second interval is 0.516/2 = 0.258. For a quarter-second interval, the probability is 0.258/2 = 0.129. So if we observe half as long, it will be half as likely to observe the occurrence of a random event. On the other hand, it does not matter when the quarter-second subinterval is, whether at the beginning or toward the end of the original interval of 7.5 seconds.

The third criterion is that non-overlapping subintervals are mutually independent in the sense that what happens in one subinterval (i.e. the occurrence or non-occurrence of a random event) will have no influence on the occurrence of a random event in another subinterval. Any counting process that satisfies this criterion is said to possess independent increments. In the Rutherford and Geiger experiment, the observation of one particle in one half-second period does not imply that a particle will necessarily be observed in the next half-second.

To summarize, the following are the three criteria of a Poisson process:

Suppose that on average $\lambda$ random events occur in a time interval of length 1.

1. The probability of having more than one random event occurring in a very short time interval is essentially zero.
2. For a very short subinterval of length $\frac{1}{n}$ where $n$ is a sufficiently large integer, the probability of a random event occurring in this subinterval is $\frac{\lambda}{n}$.
3. The numbers of random events occurring in non-overlapping time intervals are independent.

_______________________________________________________________________________________________

The Poisson Distribution

We are now ready to derive the Poisson distribution from a Poisson random process.

Consider random events generated in a Poisson process and let $Y$ be the number of random events observed in a unit time interval. Break up the unit time interval into $n$ non-overlapping subintervals of equal size where $n$ is a large integer. Each subinterval can have one or no random event. The probability of one random event in a subinterval is $\lambda/n$. The subintervals are independent. In other words, the three criteria of a Poisson process described above ensure that the $n$ subintervals are independent Bernoulli trials. As a result, the number of events occurring in these $n$ subintervals is a binomial distribution with $n$ trials and probability of success $\lambda/n$. This binomial distribution is an approximation of the random variable $Y$. The binomial distribution can get more and more granular. The resulting limit is a Poisson distribution, which coincides with the distribution for $Y$. The fact that the Poisson distribution is the limiting case of the binomial distribution is discussed here and here.

It follows that $Y$ follows the Poisson distribution with mean $\lambda$. The following is the probability function.

$\displaystyle P(Y=y)=\frac{e^{-\lambda} \ \lambda^y}{y!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ y=0,1,2,\cdots$

In the 1910 radioactivity study, the number of $\alpha$-particles observed in a 7.5-second period has a Poisson distribution with the mean of $\lambda=3.87$ particles per 7.5 seconds.

Sometimes it may be necessary to count the random events not in a unit time interval but in a smaller or larger time interval of length $t$. In a sense, the new unit time is $t$ and the new average rate of the Poisson process is then $\lambda t$. Then the idea of taking granular binomial distributions will lead to a Poisson distribution. Let $Y_t$ be the number of occurrences of the random events in a time interval of length $t$. Then $Y_t$ follows a Poisson distribution with mean $\lambda t$. The following is the probability function.

$\displaystyle P(Y_t=x)=\frac{e^{-\lambda t} \ (\lambda t)^y}{y!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ y=0,1,2,\cdots$

In the 1910 radioactivity study, the number of $\alpha$-particles observed in a 3.75-second period has a Poisson distribution with the mean of $\lambda=3.87/2=1.935$ particles per 3.75 seconds.

_______________________________________________________________________________________________

The Gamma Distribution as Derived from a Poisson Process

With the Poisson process and Poisson distribution properly set up and defined, we can now derive the gamma distribution. As before, we work with a Poisson process in which the random events arrive at an average rate of $\lambda$ per unit time. Let $W_1$ be the waiting time until the occurrence of the first random event, $W_2$ be the waiting time until the occurrence of the second random event and so on. First examine the random variable $W_1$.

Consider the probability $P(W_1>t)$. The event $W_1>t$ means that the first random event takes place after time $t$. This means that there must be no occurrence of the random event in question from time 0 to time $t$. It follows that

$P(W_1>t)=P(Y_t=0)=e^{-\lambda t}$

As a result, $P(W_1 \le t)=1-e^{-\lambda t}$, which is the cumulative distribution of $W_1$. Taking the derivative, the probability density function of $W_1$ is $f_{W_1}(t)=\lambda e^{-\lambda t}$. This is the density function of the exponential distribution with mean $\frac{1}{\lambda}$. Recall that $\lambda$ is the rate of the Poisson process, i.e. the random events arrive at the mean rate of $\lambda$ per unit time. Then the mean time between two consecutive events is $\frac{1}{\lambda}$.

Consider the probability $P(W_2>t)$. The event $W_2>t$ means that the second random event takes place after time $t$. This means that there can be at most one occurrence of the random events in question from time 0 to time $t$. It follows that

$P(W_2>t)=P(Y_t=0)+P(Y_t=1)=e^{-\lambda t}+\lambda t \ e^{-\lambda t}$

As a result, $P(W_2 \le t)=1-e^{-\lambda t}-\lambda t \ e^{-\lambda t}$, which is the cdf of the waiting time $W_2$. Taking the derivative, the probability density function of $W_2$ is $f_{W_2}(t)=\lambda^2 \ t \ e^{-\lambda t}$. This is the density function of the gamma distribution with shape parameter 2 and rate parameter $\lambda$

By the same reasoning, the waiting time until the $n^{th}$ random event, $W_n$, follows a gamma distribution with shape parameter $n$ and rate parameter $\lambda$. The survival function, cdf and the density function are:

$\displaystyle P(W_n>t)=\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!}$

$\displaystyle P(W_n \le t)=1-\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!}=\sum \limits_{k=n}^{\infty} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!}$

$\displaystyle f_{W_n}(t)=\frac{1}{(n-1)!} \ \lambda^n \ t^{n-1} \ e^{-\lambda t}$

The survival function $P(W_n>t)$ is identical to $P(Y_t \le n-1)$. The equivalence is through the translation: the event $W_n>t$ is equivalent to the event that there can be at most $n-1$ random events occurring from time 0 to time $t$.

Example 1
Let’s have a quick example of calculation for the gamma distribution. In the study by Rutherford and Geiger in 1910, the average rate of arrivals of $\alpha$-particles is 3.87 per 7.5-second period, giving the average rate of 0.516 particles per second. On average, it takes $0.516^{-1}$ = 1.94 seconds to wait for the next particle. The probability that it takes more than 3 seconds of waiting time for the first particle to arrive is $e^{-0.516 (3)}$ = 0.213.

How long would it take to wait for the second particle? On average it would take $2 \cdot 0.516^{-1}$ = 3.88 seconds. The probability that it takes more than 5 seconds of waiting time for the second particle to arrive is

$e^{-0.516 (5)}+0.516 \cdot 5 e^{-0.516 (5)}=3.58 \cdot e^{-2.58}$ = 0.271

_______________________________________________________________________________________________

Remarks

The above discussion shows that the gamma distribution arises naturally from a Poisson process, a random experiment that satisfies three assumptions that deal with independence and uniformity in time. The gamma distribution derived from a Poisson process has two parameters $n$ and $\lambda$ where $n$ is a positive integer and is the shape parameter and $\lambda$ is the rate parameter. If the random variable $W$ follows this distribution, its pdf is:

$\displaystyle f_W(w)=\frac{1}{(n-1)!} \ \lambda^n \ w^{n-1} \ e^{-\lambda w} \ \ \ \ \ \ \ \ \ \ \ \ w>0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

The above pdf can be interpreted as the density function for the waiting time until the arrival of the $n$th random event in a Poisson process with an average rate of arrivals at $\lambda$ per unit time. The density function may be derived from an actual Poisson process or it may be just describing some random quantity that has nothing to do with any Poisson process. But the Poisson process interpretation is still useful. One advantage of the Poisson interpretation is that the survival function and the cdf would have an expression in closed form.

$\displaystyle P(W>w)=\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda w} \ (\lambda w)^k}{k!} \ \ \ \ \ \ \ \ \ \ \ \ w>0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$

\displaystyle \begin{aligned} P(W \le w)&=1-\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda w} \ (\lambda w)^k}{k!} \\&=\sum \limits_{k=n}^{\infty} \frac{e^{-\lambda w} \ (\lambda w)^k}{k!} \ \ \ \ \ \ \ \ \ \ \ \ w>0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3) \end{aligned}

In the Poisson process interpretation, $P(W>w)$ is the probability that the $n$th random event occurs after time $w$. This means that in the interval $(0, w)$, there are at most $n-1$ random events. Thus the gamma survival function is identical to the cdf of a Poisson distribution. Even when $W$ is simply a model of some random quantity that has nothing to do with a Poisson process, such interpretation can still be used to derive the survival function and the cdf of such a gamma distribution.

The gamma distribution described in the density function $(1)$ has a shape parameter that is a positive integer. This special case of the gamma distribution sometimes go by the name Erlang distribution and is important in queuing theory.

In general the shape parameter does not have to be integers; it can be any positive real number. For the more general gamma distribution, see the previous post.

_______________________________________________________________________________________________
$\copyright \ 2016 - \text{Dan Ma}$

# Introducing the gamma distribution

The gamma distribution is a probability distribution that is useful in actuarial modeling. Due to its mathematical properties, there is considerable flexibility in the modeling process. For example, since it has two parameters (a scale parameter and a shape parameter), the gamma distribution is capable of representing a variety of distribution shapes and dispersion patterns. This post gives an account of how the distribution arises mathematically and discusses some of its mathematically properties. The next post discusses how the gamma distribution can arise naturally as the waiting time between two events in a Poisson process.

_______________________________________________________________________________________________

The Gamma Function

From a mathematically point of view, in defining the gamma distribution, the place to start is the gamma function. For any real number $\alpha>0$, define:

$\displaystyle \Gamma(\alpha)=\int_0^\infty t^{\alpha-1} \ e^{-t} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (0)$

The above improper integral converges for every positive $\alpha$. The proof that the improper integral converges and other basic facts can be found here.

When the integral in $(0)$ has “incomplete” limits, the resulting functions are called incomplete gamma functions. The following are called the upper incomplete gamma function and lower incomplete gamma function, respectively.

$\displaystyle \Gamma(\alpha, x)=\int_x^\infty t^{\alpha-1} \ e^{-t} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

$\displaystyle \gamma(\alpha, x)=\int_0^x t^{\alpha-1} \ e^{-t} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$

_______________________________________________________________________________________________

The Gamma Probability Density Function

Notice that the integrand in $(0)$ is a positive value for every $t>0$. Thus the integrand $t^{\alpha-1} \ e^{-t}$ is a density function if it is normalized by $\Gamma(\alpha)$.

$\displaystyle f_T(t)=\frac{1}{\Gamma(\alpha)} \ t^{\alpha-1} \ e^{-t} \ \ \ \ \ \ \ t>0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)$

The function in $(3)$ is a probability density function since the integral is one when it is integrated over the interval $(0,\infty)$. For convenience, let $T$ be the random variable having this density function. The density function in $(3)$ only has one parameter, which is $\alpha$ (the shape parameter). To add the second parameter, transform the random variable $T$ by multiplying a constant. This can be done in two ways. The following are the probability density functions for the random variables $X=\theta T$ and $Y=\frac{1}{\beta} \ T$, respectively.

$\displaystyle f_X(x)=\frac{1}{\Gamma(\alpha)} \ \biggl(\frac{1}{\theta}\biggr)^\alpha \ x^{\alpha-1} \ e^{-\frac{x}{\theta}} \ \ \ \ \ \ \ x>0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4a)$

$\displaystyle f_Y(y)=\frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ y^{\alpha-1} \ e^{-\beta y} \ \ \ \ \ \ \ \ \ \ \ \ y>0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4b)$

A random variable $X$ is said to follow the gamma distribution with shape parameter $\alpha$ and scale parameter $\theta$ if $(4a)$ is its probability density function (pdf). A random variable $Y$ is said to follow the gamma distribution with shape parameter $\alpha$ and rate parameter $\beta$ if $(4b)$ is its pdf.

The parameter $\theta$ is the scale parameter since it is the case that the larger the value, the more spread out the distribution. The parameter $\beta$ is the rate parameter in the family of gamma distribution. The rate parameter is defined as the reciprocal of the scale parameter.

The following figure (Figure 1) demonstrates the role of the shape parameter $\alpha$. With the scale parameter $\theta$ kept at 2, the gamma distribution becomes less skewed as $\alpha$ increases.

Figure 1

The following figure (Figure 2) demonstrates the role of the scale parameter $\theta$. With the shape parameter $\alpha$ kept at 2, all the gamma distributions have the same skewness. However, the gamma distributions become more spread out as $\theta$ increases.

Figure 2

There are sevreral important subclasses of the gamma distribution. When the shape parameter $\alpha=1$, the gamma distribution becomes the exponential distribution with mean $\theta$ or $\frac{1}{\beta}$ depending on the parametrization. When the shape parameter $\alpha$ is any positive integer, the resulting subclass of gamma distribution is called the Erlang distribution. A Chi-square distribution is a gamma distribution with shape parameter $\alpha=\frac{k}{2}$ and scale parameter $\theta=2$ where $k$ is a positive integer (the degrees of freedom). The gamma density curves in Figure 1 are chi-square distributions. Their degrees of freedom are 2, 4, 6, 10 and 20. Chi-square distribution with 2 degrees of freedom would be an exponential distribution.

Between the two parametrizations presented here, the version with the scale parameter is the more appropriate model in the settings where a parameter is needed for describing the magnitude of the mean and the spread. The parametrization with $\alpha$ and $\beta$ is sometimes easier to work with. For example, it is more common in Bayesian analysis where the gamma distribution can be used as a conjugate prior distribution for a parameter that is a rate (e.g. the rate parameter of a Poisson distribution).

_______________________________________________________________________________________________

Some Distributional Quantities

The gamma distribution is a two-parameter family of distributions. Here’s some of the basic distributional quantities that are of interest.

_________________________________________
$X$ is a random variable with the gamma distribution with shape parameter $\alpha$ and scale parameter $\theta$.

$Y$ is a random variable with the gamma distribution with shape parameter $\alpha$ and rate parameter $\beta$.

_________________________________________
Probability Density Function (PDF)

$\displaystyle \begin{array}{ll} \displaystyle f_X(x)=\frac{1}{\Gamma(\alpha)} \ \biggl(\frac{1}{\theta}\biggr)^\alpha \ x^{\alpha-1} \ e^{-\frac{x}{\theta}} & \ \ \ \ \ \ \ \ \ \ x>0 \\ \text{ } & \text{ } \\ \displaystyle f_Y(y)=\frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ y^{\alpha-1} \ e^{-\beta y} & \ \ \ \ \ \ \ \ \ \ \displaystyle y>0 \end{array}$

_________________________________________
Cumulative Distribution Function (CDF)

\displaystyle \begin{aligned} F_X(x)&=\int_0^x \ \frac{1}{\Gamma(\alpha)} \ \biggl(\frac{1}{\theta}\biggr)^\alpha \ t^{\alpha-1} \ e^{-\frac{t}{\theta}} \ dt \\&\text{ } \\&=\frac{\gamma(\alpha, \frac{x}{\theta})}{\Gamma(\alpha)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (\text{Alternative formulation}) \end{aligned}

\displaystyle \begin{aligned} F_Y(y)&=\int_0^y \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ t^{\alpha-1} \ e^{-\beta t} \ dt \\&\text{ } \\&=\frac{\gamma(\alpha, \beta y)}{\Gamma(\alpha)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (\text{Alternative formulation}) \end{aligned}

_________________________________________
Mean and Variance

$\displaystyle \begin{array}{ll} \displaystyle E(X)=\alpha \ \theta & \ \ \ \ \ \ \ \ \ \ Var(X)=\alpha \ \theta^2 \\ \text{ } & \text{ } \\ \displaystyle E(Y)=\frac{\alpha}{\beta} & \ \ \ \ \ \ \ \ \ \ \displaystyle Var(Y)=\frac{\alpha}{\beta^2} \end{array}$

_________________________________________
Higher Moments

$\displaystyle E(X^k) = \left\{ \begin{array}{ll} \displaystyle \frac{\theta^k \ \Gamma(\alpha+k)}{\Gamma(\alpha)} &\ \ \ \ \ \ k>-\alpha \\ \text{ } & \text{ } \\ \theta^k \ \alpha (\alpha+1) \cdots (\alpha+k-1) &\ \ \ \ \ \ k \text{ is a positive integer} \end{array} \right.$

$\displaystyle E(Y^k) = \left\{ \begin{array}{ll} \displaystyle \frac{\Gamma(\alpha+k)}{\beta^k \ \Gamma(\alpha)} &\ \ \ \ \ \ k>-\alpha \\ \text{ } & \text{ } \\ \displaystyle \frac{\alpha (\alpha+1) \cdots (\alpha+k-1)}{\beta^k} &\ \ \ \ \ \ k \text{ is a positive integer} \end{array} \right.$

_________________________________________
Moment Generating Function

$\displaystyle \begin{array}{ll} \displaystyle M_X(t)=\biggl(\frac{1}{1-\theta t} \biggr)^\alpha & \ \ \ \ \ \displaystyle t<\frac{1}{\theta} \\ \text{ } & \text{ } \\ \displaystyle M_Y(t)=\biggl(\frac{\beta}{\beta- t} \biggr)^\alpha & \ \ \ \ \ \displaystyle t<\beta \end{array}$

_________________________________________
Mode

$\displaystyle \begin{array}{ll} \displaystyle \theta \ (\alpha-1) & \ \ \ \ \ \alpha>1 \ \text{else } 0 \\ \text{ } & \text{ } \\ \displaystyle \frac{\alpha-1}{\beta} & \ \ \ \ \ \alpha>1 \ \text{else } 0 \end{array}$

_________________________________________
Coefficient of Variation

$\displaystyle CV = \frac{1}{\sqrt{\alpha}}$

_________________________________________
Coefficient of Skewness

$\displaystyle \gamma = \frac{2}{\sqrt{\alpha}}$

_________________________________________
Kurtosis

$\displaystyle \begin{array}{ll} \displaystyle 3+\frac{6}{\alpha} & \ \ \ \ \ \text{Kurtosis} \\ \text{ } & \text{ } \\ \displaystyle \frac{6}{\alpha} & \ \ \ \ \ \text{Excess Kurtosis} \end{array}$

There is no simple closed form for the cumulative distribution function, except for the case of $\alpha=1$ (i.e. the exponential distribution) and the case of $\alpha$ being a positive integer (see next post). As a result, the distributional quantities that required solving for $x$ in the CDF have no closed form, e.g. median and other percentiles. As stated above, the CDF can be expressed using the incomplete gamma function, which can be estimated numerically. For the distributional quantities with no closed form, either use numerical estimation or use software.

The calculation for some of the distributional quantities is quite straightforward. For example, to calculate any higher moment $E(X^k)$, simply adjust the integrand to be an appropriate gamma density function. Then the result will be what can be moved outside the integral, as shown in the following.

\displaystyle \begin{aligned}E(X^k)&=\int_0^\infty x^k \cdot \frac{1}{\Gamma(\alpha)} \ \biggl(\frac{1}{\theta}\biggr)^\alpha \ x^{\alpha-1} \ e^{-\frac{x}{\theta}} \ dx=\int_0^\infty \frac{1}{\Gamma(\alpha)} \ \biggl(\frac{1}{\theta}\biggr)^\alpha \ x^{\alpha+k-1} \ e^{-\frac{x}{\theta}} \ dx \\&=\frac{\theta^k \Gamma(\alpha+k)}{\Gamma(\alpha)} \int_0^\infty \frac{1}{\Gamma(\alpha+k)} \ \biggl(\frac{1}{\theta}\biggr)^{\alpha+k} \ x^{\alpha+k-1} \ e^{-\frac{x}{\theta}} \ dx \\&=\frac{\theta^k \Gamma(\alpha+k)}{\Gamma(\alpha)} \\&=\theta^k \ \alpha (\alpha+1) \cdots (\alpha+k-1) \end{aligned}

Once the higher moments are known, some of the other calculations follow. For example, the coefficient of variation is defined by the ratio of the standard deviation to the mean. This ratio is the standardized measure of dispersion of a probability distribution. The coefficient of skewness is the ratio of the third central moment to the third power of the standard deviation, i.e. $E[(X-\mu)^3]/\sigma^3$. see here for a discussion on skewness. The kurtosis is the ratio of the fourth central moment to the fourth power of the standard deviation, i.e. $E[(X-\mu)^4]/\sigma^4$. The excess kurtosis is obtained by subtracting 3 from the kurtosis.

_______________________________________________________________________________________________

Discussion

The product of two gamma moment generating functions with the same scale parameter $\theta$ (or rate parameter $\beta$) is also an MGF for a gamma distribution. This points to the fact that the independent sum of two gamma distribution (with the same scale parameter or rate parameter) is a gamma distribution. Specifically if $X_1$ follows a gamma distribution with the shape parameter $\alpha_1$ and $X_2$ follows a gamma distribution with shape parameter $\alpha_2$ and that they are independent, then the sum $X_1+X_2$ has a gamma distribution with shape parameter $\alpha_1+\alpha_2$.

We now revisit Figure 1 and Figure 2. The skewness of a gamma distribution is driven only by the shape parameter $\alpha$. The gamma skewness is $\frac{2}{\sqrt{\alpha}}$. The higher the $\alpha$, the less skewed the gamma distribution is (or the more symmetric it looks). This is borne out by Figure 1. There is another angle via the central limit theorem that is borne out by Figure 1. The gamma densities with larger value of $\alpha$ can also be thought of as the independent sum of many gamma distributions with smaller $\alpha$ values. For example, the gamma with $\alpha=10$ and $\theta=2$ can be regarded as the independent sum of 10 exponential distributions each with mean 2. By the central limit theorem, any gamma distribution with large value of $\alpha$ will tend to look symmetric.

In Figure 2, all gamma densities have the same $\alpha=2$. Thus they all have the same skewness (about 0.707). It is clear that as the scale parameter $\theta$ increases, the densities become more spread out while remaining skewed density curves.

The support of the gamma distribution is the interval $(0,\infty)$. Thus it is plausible model for random quantities that take on positive values, e.g. insurance losses or insurance claim amounts. With the gamma density curves being positively skewed (skewed to the right), the gamma distribution is a good candidate for random quantities that are concentrated more on the lower end of the interval $(0,\infty)$.

Though the gamma distribution is positively skewed, it is considered to have a light (right) tail. The notion of having a light tail or heavy tail is a relative concept. The gamma distribution has a light right tail as compared to the Pareto distribution. The Pareto distribution significantly puts more probability on larger values (the gamma distribution with same mean and variance will put significantly less probabilities on the larger values). In terms of modeling insurance losses, the gamma distribution will be a more suitable model for losses that are not catastrophic in nature.

One tell tale sign of a distribution with a light tail is that all positive moments exist. For the gamma distribution, $E(X^k)$ exists for all positive integer $k$. In fact, the gamma distribution has a property stronger than the mere fact that all moments exist, i.e. it has a moment generating function. So even though the Gamma distribution has a right tail that is infinitely long (it extents out to infinity), the amount of probabilities is almost negligible after some limit (as compared to the Pareto distribution for example). See here for a more detailed discussion on tail weights and the Pareto distribution.

The next post discusses how the gamma distribution can arise naturally as the waiting time between two events in a Poisson process.

_______________________________________________________________________________________________

Evaluating the Gamma Function

In many calculations discussed in this blog, it is necessary at times to evaluate the gamma function. Of course if the argument is a positive integer, the gamma function is simply the factorial function. Some special values of the gamma function are:

$\Gamma(\frac{1}{2})=\sqrt{\pi}$

$\Gamma(\frac{3}{2})=\frac{1}{2} \sqrt{\pi}$

$\Gamma(\frac{5}{2})=\frac{3}{4} \sqrt{\pi}$

$\Gamma(\frac{7}{2})=\frac{15}{8} \sqrt{\pi}$

If special values are not known, it is possible to use software to evaluate the gamma function. We demonstrate how it is done using Excel. There is no dedicated function in Excel for evaluating the gamma function. However, Excel has a function for the PDF of a gamma distribution. To evaluate $\Gamma(\alpha)$, consider the density function in $(4b)$ with parameters $\alpha$, $\beta=1$ and $y=1$. We have the following:

$\displaystyle f_Y(1)=\frac{1}{\Gamma(\alpha)} \ 1^\alpha \ 1^{\alpha-1} \ e^{- 1}=\frac{1}{\Gamma(\alpha)}$

$\displaystyle \Gamma(\alpha)=\frac{e^{-1}}{f_Y(1)}$

The key to evaluate $\Gamma(\alpha)$ in Excel is to evaluate the gamma PDF with $\alpha$ and $\beta=1$ at the x-value of 1. The following shows the formula for evaluating the PDF.

=GAMMADIST(1, $\alpha$, 1, FALSE)

Then the gamma function can be evaluated by evaluating the following formula in Excel:

=EXP(-1) / GAMMADIST(1, $\alpha$, 1, FALSE)

For example, $\Gamma(3.6)=3.717023853$ which is obtained by the following formula in Excel:

=EXP(-1) / GAMMADIST(1, 3.6, 1, FALSE)

_______________________________________________________________________________________________
$\copyright \ 2016 - \text{Dan Ma}$

# Introducing the gamma function

The gamma distribution is a probability distribution that is useful in actuarial modeling. From a mathematical point of view, the gamma function is the starting point of defining the gamma distribution. This post discusses the basic facts that are needed for defining the gamma distribution. Here’s the definition of the gamma function.

For any real number $\alpha>0$, define:

$\displaystyle \Gamma(\alpha)=\int_0^\infty t^{\alpha-1} \ e^{-t} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (0)$

The following is the graph of the gamma function. It has a U shape. As $x \rightarrow 0$, the graph goes up to infinity. As $x \rightarrow \infty$, the graph increases without bound. As is seen below, the gamma function coincides with the factorial function at the integers.

_______________________________________________________________________________________________

The Convergence

The gamma function is defined by the improper integral as described in $(0)$. The improper integral converges. In fact, the integral converges for all complex numbers $\alpha$ with positive real part. For our purposes at hand, we restrict $\alpha$ to be positive real numbers. Showing that the integral in $(0)$ converges is important. For example, it will be nice to know that the density function of the gamma distribution sums to 1.0. To show that the integral in $(0)$ converges, we first make the observation: for some positive real number $M$ and for all $t>M$,

$\displaystyle t^{\alpha-1}

This is saying that the quantity $\displaystyle e^{\frac{t}{2}}$ dominates the quantity $\displaystyle t^{\alpha-1}$ when $t$ is sufficiently large. This is because the exponential function $\displaystyle e^{\frac{t}{2}}$ increases at a much faster rate than the polynomial function $\displaystyle t^{\alpha-1}$. Then multiply both sides of $(1)$ by $\displaystyle e^{-t}$ to obtain the following:

$\displaystyle t^{\alpha-1} e^{-t}

Now, break up the interval in $(0)$ into two pieces:

$\displaystyle \Gamma(\alpha)=\int_0^M t^{\alpha-1} \ e^{-t} \ dt +\int_M^\infty t^{\alpha-1} \ e^{-t} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)$

A basic fact. For any continuous function $h(t)$ defined over a closed interval $[a,b]$ of finite length, the integral $\displaystyle \int_a^b h(t) \ dt$ exists and has a finite value. As a result, the first integral in $(3)$ exists. So we just focus on the second integral.

$\displaystyle \int_M^\infty t^{\alpha-1} \ e^{-t} \ dt < \int_M^\infty e^{-\frac{t}{2}} \ dt=2 e^{-\frac{M}{2}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4)$

Note that in $(4)$, the inequality in $(2)$ is used. Due to $(4)$, the second integral in $(3)$ is finite. With both integrals in $(3)$ finite, it follows that the improper integral in the definition of $\Gamma(\alpha)$ is finite.

_______________________________________________________________________________________________

Some Properties

The gamma function is a well known function in mathematics and has wide and deep implications. Here, we simply state several basic facts that are needed. In the discussion here, One useful fact is that the gamma function is the factorial function shifted down by one when the argument $\alpha$ is a positive integer. Thus the gamma function generalizes the factorial function. Another useful fact is that the gamma function satisfies a recursive relation.

• If $n$ is a positive integer, $\Gamma(n)=(n-1)!$.
• For all $\alpha>0$, $\Gamma(\alpha+1)=\alpha \Gamma(\alpha)$.

The recursive relation $\Gamma(\alpha+1)=\alpha \Gamma(\alpha)$ is derived by using integration by parts. It is clear from definition that $\Gamma(1)=1$. By an induction argument, $\Gamma(n)=(n-1)!$ for all integers greater than 1.

As result, $\Gamma(1)=0!=1$, $\Gamma(2)=1!=1$ and $\Gamma(3)=2!=2$ and so on. Given that $\Gamma(\frac{1}{2})=\sqrt{\pi}$, the recursive relation tells us that $\Gamma(\frac{3}{2})=\frac{1}{2} \sqrt{\pi}$ and $\Gamma(\frac{5}{2})=\frac{3}{4} \sqrt{\pi}$ and so on. The above recursive relation can be further extended:

• For any positive integer $k$, $\Gamma(\alpha+k)=\alpha (\alpha+1) \cdots (\alpha + k-1) \ \Gamma(\alpha)$.

When the integral in $(0)$ has “incomplete” limits, the resulting functions are called incomplete gamma functions. The following are called the upper incomplete gamma function and lower incomplete gamma function, respectively.

$\displaystyle \Gamma(\alpha, x)=\int_x^\infty t^{\alpha-1} \ e^{-t} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

$\displaystyle \gamma(\alpha, x)=\int_0^x t^{\alpha-1} \ e^{-t} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$

_______________________________________________________________________________________________
The next post shows how the gamma distribution arises naturally from the gamma function.

_______________________________________________________________________________________________
$\copyright \ 2016 - \text{Dan Ma}$