Introducing the beta function

The gamma distribution is mathematically defined from the gamma function, and in the same way the beta distribution is defined from the beta function. This post gives a brief introduction to the beta function. The goal is to establish one property that is the basis for defining the beta distribution.

_______________________________________________________________________________________________

The Beta Function

For any positive constants a and b, the beta function is defined to be the following integral:

    \displaystyle B(a,b)=\int_0^1 t^{a-1} \ (1-t)^{b-1} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (0)

The beta function can be evaluated directly if the parameters a and b are not too large. For example, B(3,2) is the integral \displaystyle \int_0^1 t^2 (1-t) \ dt, which is 1/3-1/4=1/12. Evaluating (0) on a case-by-case basis does not shed light on the beta function. Direct calculation can also be cumbersome (e.g. for large integer parameters) or challenging (e.g. for fractional parameters a and b). It turns out that the beta function B(a,b) can be evaluated in terms of the gamma function.

_______________________________________________________________________________________________

Connection to the Gamma Function

The remainder of the post is to establish the following value of the beta function:

    \displaystyle B(a,b)=\int_0^1 t^{a-1} \ (1-t)^{b-1} \ dt=\frac{\Gamma(a) \ \Gamma(b)}{\Gamma(a+b)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

To start the proof of (1), let X and Y be two independent random variables such that X follows a gamma distribution with shape parameter a and rate parameter \beta and that Y follows a gamma distribution with shape parameter b and rate parameter \beta. It does not matter what \beta is, as long as it is the rate parameter for both X and Y. Then the sum S=X+Y has a gamma distribution with shape parameter a+b and rate parameter \beta. The following is the density function for S=X+Y.

    \displaystyle f_S(s)=\frac{1}{\Gamma(a+b)} \ \beta^{a+b} \ s^{a+b-1} \ e^{-\beta s}  \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

The density function of S=X+Y can also be derived from the convolution formula using the density functions of X and Y as follows:

    \displaystyle \begin{aligned} f_S(s)&=\int_0^s f_Y(s-x) \ f_X(x) \ dx \ \ \ \ \ \text{(convolution)} \\&=\int_0^s \frac{1}{\Gamma(b)} \ \beta^{b} \ (s-x)^{b-1} \ e^{-\beta (s-x)} \ \frac{1}{\Gamma(a)} \ \beta^{a} \ x^{a-1} \ e^{-\beta x} \ dx \\&=\frac{\beta^{a+b}}{\Gamma(a) \ \Gamma(b)} \ e^{-\beta s} \ \int_0^s x^{a-1} \ (s-x)^{b-1} \ dx \\&=\frac{\beta^{a+b}}{\Gamma(a) \ \Gamma(b)} \ e^{-\beta s} \ s^{a+b-1} \ \int_0^1 t^{a-1} \ (1-t)^{b-1} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)  \end{aligned}

See here for more information on how to use the convolution formula. The last step in (3) is obtained from the step immediately above it by the change of variable x=st in the integral. The last step in (3) must equal (2). Setting the two equal produces the equality in (1).
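For readers who want the two steps spelled out, here is a short worked version. With the substitution x=st (so that dx=s \ dt), the integral in the third line of (3) becomes:

    \displaystyle \begin{aligned} \int_0^s x^{a-1} \ (s-x)^{b-1} \ dx&=\int_0^1 (st)^{a-1} \ (s-st)^{b-1} \ s \ dt \\&=s^{a+b-1} \ \int_0^1 t^{a-1} \ (1-t)^{b-1} \ dt \end{aligned}

Equating (2) with the last line of (3) and cancelling the common factor \beta^{a+b} \ s^{a+b-1} \ e^{-\beta s} leaves

    \displaystyle \frac{1}{\Gamma(a+b)}=\frac{1}{\Gamma(a) \ \Gamma(b)} \ \int_0^1 t^{a-1} \ (1-t)^{b-1} \ dt

which rearranges to (1).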

Note that if the function t^{a-1} \ (1-t)^{b-1} is normalized by the value B(a,b), it would be a density function, which is the beta distribution. The following is the density function of the beta distribution.

    \displaystyle f(x)=\frac{\Gamma(a+b)}{\Gamma(a) \ \Gamma(b)} \ x^{a-1} \ (1-x)^{b-1}; \ \ \ \ \ \ \ \ 0<x<1  \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4)
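As a numerical sanity check of (1), the following Python sketch compares a midpoint-rule approximation of the integral in (0) with the gamma-function expression. The function names are ours and purely illustrative, and the midpoint rule is just one simple way to approximate the integral; it behaves well when a and b are at least 1, and is not meant for parameters below 1 where the integrand is unbounded.

```python
import math

def beta_integral(a, b, n=100_000):
    """Midpoint-rule approximation of the integral of t^(a-1) (1-t)^(b-1) over (0, 1)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += t ** (a - 1) * (1 - t) ** (b - 1)
    return total * h

def beta_via_gamma(a, b):
    """Right-hand side of (1): Gamma(a) Gamma(b) / Gamma(a+b)."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

# B(3,2) should be 1/12; the second pair uses fractional parameters.
print(beta_integral(3, 2), beta_via_gamma(3, 2), 1 / 12)
print(beta_integral(2.5, 1.5), beta_via_gamma(2.5, 1.5))
```

Dividing the integrand by either value gives a function that integrates to 1 over (0,1), which is the normalization behind the density in (4).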

The beta distribution is further examined in the next post.

_______________________________________________________________________________________________
\copyright \ 2016 - \text{Dan Ma}

The gamma distribution from the point of view of a Poisson process

In the previous post, the gamma distribution is defined from the gamma function. This post shows that the gamma distribution can arise from a Poisson process.

_______________________________________________________________________________________________

The Poisson Process

Consider an experiment in which events that are of interest occur at random in a time interval. The goal here is to derive two families of random variables, one continuous and one discrete. Starting at time 0, record the time of the occurrence of the first event. Then record the time at which the second random event occurs and so on (these are the continuous random variables). Out of these measurements, we can derive discrete random variables by counting the number of random events in a fixed time interval.

The recording of times of the occurrences of the random events is like placing markings on a time line to denote the arrivals of the random events. We are interested in counting the number of markings in a fixed interval. We are also interested in measuring the length from the starting point to the first marking and to the second marking and so on. Because of this interpretation, the random process discussed here can also describe random events occurring along a spatial interval, i.e. intervals in terms of distance or volume or other spatial measurements.

A Poisson process is a random process described above in which several criteria are satisfied. We show that in a Poisson process, the number of occurrences of random events in a fixed time interval follows a Poisson distribution and the time until the nth random event follows a Gamma distribution.

A good example of a Poisson process is the well-known experiment in radioactivity conducted by Rutherford and Geiger in 1910. In this experiment, \alpha-particles were emitted from a polonium source and the number of \alpha-particles was counted during each interval of 7.5 seconds (2,608 such time intervals were observed). In these 2,608 intervals, a total of 10,097 particles were observed. Thus the mean count per period of 7.5 seconds is 10097 / 2608 = 3.87.

In the Rutherford and Geiger experiment in 1910, a random event is the observation of an \alpha-particle. The random events occur at an average of 3.87 per unit time interval (7.5 seconds).

One of the criteria in a Poisson process is that in a very short time interval, the chance of having more than one random event is essentially zero. So either one random event will occur or none will occur in a very short time interval. Considering the occurrence of a random event as a success, there is either a success or a failure in a very short time interval. So a very short time interval in a Poisson process can be regarded as a Bernoulli trial.

The second criterion is that the experiment remains constant over time. Specifically this means that the probability of a random event occurring in a given subinterval is proportional to the length of that subinterval and does not depend on where the subinterval is in the original interval. Any counting process that satisfies this criterion is said to possess stationary increments. For example, in the 1910 radioactivity study, \alpha-particles were emitted at the rate of \lambda= 3.87 per 7.5 seconds. So the probability of one \alpha-particle emitted from the radioactive source in a one-second interval is 3.87/7.5 = 0.516. Then the probability of observing one \alpha-particle in a half-second interval is 0.516/2 = 0.258. For a quarter-second interval, the probability is 0.258/2 = 0.129. So if we observe half as long, it will be half as likely to observe the occurrence of a random event. On the other hand, it does not matter where the quarter-second subinterval is, whether at the beginning or toward the end of the original interval of 7.5 seconds.

The third criterion is that non-overlapping subintervals are mutually independent in the sense that what happens in one subinterval (i.e. the occurrence or non-occurrence of a random event) will have no influence on the occurrence of a random event in another subinterval. Any counting process that satisfies this criterion is said to possess independent increments. In the Rutherford and Geiger experiment, the observation of one particle in one half-second period does not imply that a particle will necessarily be observed in the next half-second.

To summarize, the following are the three criteria of a Poisson process:

    Suppose that on average \lambda random events occur in a time interval of length 1.

    1. The probability of having more than one random event occurring in a very short time interval is essentially zero.
    2. For a very short subinterval of length \frac{1}{n} where n is a sufficiently large integer, the probability of a random event occurring in this subinterval is \frac{\lambda}{n}.
    3. The numbers of random events occurring in non-overlapping time intervals are independent.

_______________________________________________________________________________________________

The Poisson Distribution

We are now ready to derive the Poisson distribution from a Poisson random process.

Consider random events generated in a Poisson process and let Y be the number of random events observed in a unit time interval. Break up the unit time interval into n non-overlapping subintervals of equal size where n is a large integer. Each subinterval can have one or no random event. The probability of one random event in a subinterval is \lambda/n. The subintervals are independent. In other words, the three criteria of a Poisson process described above ensure that the n subintervals are independent Bernoulli trials. As a result, the number of events occurring in these n subintervals has a binomial distribution with n trials and probability of success \lambda/n. This binomial distribution is an approximation of the distribution of Y. As n increases, the binomial distribution becomes more and more granular. The resulting limit is a Poisson distribution, which coincides with the distribution of Y. The fact that the Poisson distribution is the limiting case of the binomial distribution is discussed here and here.

It follows that Y follows the Poisson distribution with mean \lambda. The following is the probability function.

    \displaystyle P(Y=y)=\frac{e^{-\lambda} \ \lambda^y}{y!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ y=0,1,2,\cdots

In the 1910 radioactivity study, the number of \alpha-particles observed in a 7.5-second period has a Poisson distribution with the mean of \lambda=3.87 particles per 7.5 seconds.
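The limiting argument can be seen numerically. The following Python sketch, with function names of our own choosing, compares binomial probabilities with n trials and success probability \lambda/n against the Poisson probabilities for \lambda=3.87. It is only an illustration of the convergence, not a proof.

```python
import math

def binomial_pmf(y, n, p):
    """P(exactly y successes in n independent Bernoulli trials)."""
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

lam = 3.87  # mean count per 7.5-second interval in the 1910 study

# As n grows, the binomial rows approach the Poisson row.
for n in (10, 100, 10_000):
    print(n, [round(binomial_pmf(y, n, lam / n), 4) for y in range(5)])
print("Poisson", [round(poisson_pmf(y, lam), 4) for y in range(5)])
```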

Sometimes it may be necessary to count the random events not in a unit time interval but in a smaller or larger time interval of length t. In a sense, the new unit time is t and the new average rate of the Poisson process is then \lambda t. Then the idea of taking granular binomial distributions will lead to a Poisson distribution. Let Y_t be the number of occurrences of the random events in a time interval of length t. Then Y_t follows a Poisson distribution with mean \lambda t. The following is the probability function.

    \displaystyle P(Y_t=y)=\frac{e^{-\lambda t} \ (\lambda t)^y}{y!} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ y=0,1,2,\cdots

In the 1910 radioactivity study, the number of \alpha-particles observed in a 3.75-second period (t=1/2 of the 7.5-second unit interval) has a Poisson distribution with the mean of \lambda t=3.87/2=1.935 particles per 3.75 seconds.

_______________________________________________________________________________________________

The Gamma Distribution as Derived from a Poisson Process

With the Poisson process and Poisson distribution properly set up and defined, we can now derive the gamma distribution. As before, we work with a Poisson process in which the random events arrive at an average rate of \lambda per unit time. Let W_1 be the waiting time until the occurrence of the first random event, W_2 be the waiting time until the occurrence of the second random event and so on. First examine the random variable W_1.

Consider the probability P(W_1>t). The event W_1>t means that the first random event takes place after time t. This means that there must be no occurrence of the random event in question from time 0 to time t. It follows that

    P(W_1>t)=P(Y_t=0)=e^{-\lambda t}

As a result, P(W_1 \le t)=1-e^{-\lambda t}, which is the cumulative distribution of W_1. Taking the derivative, the probability density function of W_1 is f_{W_1}(t)=\lambda e^{-\lambda t}. This is the density function of the exponential distribution with mean \frac{1}{\lambda}. Recall that \lambda is the rate of the Poisson process, i.e. the random events arrive at the mean rate of \lambda per unit time. Then the mean time between two consecutive events is \frac{1}{\lambda}.

Consider the probability P(W_2>t). The event W_2>t means that the second random event takes place after time t. This means that there can be at most one occurrence of the random events in question from time 0 to time t. It follows that

    P(W_2>t)=P(Y_t=0)+P(Y_t=1)=e^{-\lambda t}+\lambda t \ e^{-\lambda t}

As a result, P(W_2 \le t)=1-e^{-\lambda t}-\lambda t \ e^{-\lambda t}, which is the cdf of the waiting time W_2. Taking the derivative, the probability density function of W_2 is f_{W_2}(t)=\lambda^2 \ t \ e^{-\lambda t}. This is the density function of the gamma distribution with shape parameter 2 and rate parameter \lambda.

By the same reasoning, the waiting time until the n^{th} random event, W_n, follows a gamma distribution with shape parameter n and rate parameter \lambda. The survival function, cdf and the density function are:

    \displaystyle P(W_n>t)=\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!}

    \displaystyle P(W_n \le t)=1-\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!}=\sum \limits_{k=n}^{\infty} \frac{e^{-\lambda t} \ (\lambda t)^k}{k!}

    \displaystyle f_{W_n}(t)=\frac{1}{(n-1)!} \ \lambda^n \ t^{n-1} \ e^{-\lambda t}

The survival function P(W_n>t) is identical to P(Y_t \le n-1). The equivalence is through the translation: the event W_n>t is equivalent to the event that there can be at most n-1 random events occurring from time 0 to time t.
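The link between the Poisson sum and the gamma density can be checked numerically. The following is a minimal Python sketch (the function names and the values n=3, \lambda=0.516, t=5 are chosen only for illustration) that computes P(W_n \le t) in two ways: from the Poisson sum, and by integrating the density with a simple midpoint rule.

```python
import math

def erlang_survival(t, n, lam):
    """P(W_n > t): probability of at most n-1 Poisson events in (0, t)."""
    return sum(math.exp(-lam * t) * (lam * t) ** k / math.factorial(k) for k in range(n))

def erlang_pdf(t, n, lam):
    """Density of the waiting time until the nth event."""
    return lam ** n * t ** (n - 1) * math.exp(-lam * t) / math.factorial(n - 1)

def integrate_pdf(t, n, lam, steps=100_000):
    """Midpoint-rule approximation of P(W_n <= t)."""
    h = t / steps
    return h * sum(erlang_pdf((i + 0.5) * h, n, lam) for i in range(steps))

n, lam, t = 3, 0.516, 5.0
# The two numbers should agree to many decimal places.
print(1 - erlang_survival(t, n, lam), integrate_pdf(t, n, lam))
```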

Example 1
Let’s have a quick example of calculation for the gamma distribution. In the study by Rutherford and Geiger in 1910, the average rate of arrivals of \alpha-particles is 3.87 per 7.5-second period, giving the average rate of 0.516 particles per second. On average, it takes 0.516^{-1} = 1.94 seconds to wait for the next particle. The probability that it takes more than 3 seconds of waiting time for the first particle to arrive is e^{-0.516 (3)} = 0.213.

How long would it take to wait for the second particle? On average it would take 2 \cdot 0.516^{-1} = 3.88 seconds. The probability that it takes more than 5 seconds of waiting time for the second particle to arrive is

    e^{-0.516 (5)}+0.516 \cdot 5 e^{-0.516 (5)}=3.58 \cdot e^{-2.58} = 0.271
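The arithmetic in this example can be reproduced with a few lines of Python. The variable names below are ours; the numbers come from the example.

```python
import math

rate = 3.87 / 7.5  # particles per second, about 0.516

# First particle: exponential waiting time with mean 1/rate.
mean_wait_first = 1 / rate
p_first_after_3 = math.exp(-rate * 3)

# Second particle: gamma waiting time with shape 2, so the mean is 2/rate
# and the survival function is P(Y_t = 0) + P(Y_t = 1).
mean_wait_second = 2 / rate
p_second_after_5 = math.exp(-rate * 5) * (1 + rate * 5)

print(round(mean_wait_first, 2), round(p_first_after_3, 3))
print(round(mean_wait_second, 2), round(p_second_after_5, 3))
```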

_______________________________________________________________________________________________

Remarks

The above discussion shows that the gamma distribution arises naturally from a Poisson process, a random experiment that satisfies three assumptions that deal with independence and uniformity in time. The gamma distribution derived from a Poisson process has two parameters n and \lambda where n is a positive integer and is the shape parameter and \lambda is the rate parameter. If the random variable W follows this distribution, its pdf is:

    \displaystyle f_W(w)=\frac{1}{(n-1)!} \ \lambda^n \ w^{n-1} \ e^{-\lambda w} \ \ \ \ \ \ \ \ \ \ \ \ w>0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

The above pdf can be interpreted as the density function for the waiting time until the arrival of the nth random event in a Poisson process with an average rate of arrivals at \lambda per unit time. The density function may be derived from an actual Poisson process or it may be just describing some random quantity that has nothing to do with any Poisson process. But the Poisson process interpretation is still useful. One advantage of the Poisson interpretation is that the survival function and the cdf would have an expression in closed form.

    \displaystyle P(W>w)=\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda w} \ (\lambda w)^k}{k!} \ \ \ \ \ \ \ \ \ \ \ \ w>0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

    \displaystyle \begin{aligned} P(W \le w)&=1-\sum \limits_{k=0}^{n-1} \frac{e^{-\lambda w} \ (\lambda w)^k}{k!} \\&=\sum \limits_{k=n}^{\infty} \frac{e^{-\lambda w} \ (\lambda w)^k}{k!} \ \ \ \ \ \ \ \ \ \ \ \ w>0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3) \end{aligned}

In the Poisson process interpretation, P(W>w) is the probability that the nth random event occurs after time w. This means that in the interval (0, w), there are at most n-1 random events. Thus the gamma survival function evaluated at w is identical to the cdf of a Poisson distribution with mean \lambda w evaluated at n-1. Even when W is simply a model of some random quantity that has nothing to do with a Poisson process, this interpretation can still be used to derive the survival function and the cdf of such a gamma distribution.

The gamma distribution described in the density function (1) has a shape parameter that is a positive integer. This special case of the gamma distribution sometimes goes by the name Erlang distribution and is important in queuing theory.

In general the shape parameter does not have to be an integer; it can be any positive real number. For the more general gamma distribution, see the previous post.

_______________________________________________________________________________________________

Introducing the gamma distribution

The gamma distribution is a probability distribution that is useful in actuarial modeling. Due to its mathematical properties, there is considerable flexibility in the modeling process. For example, since it has two parameters (a scale parameter and a shape parameter), the gamma distribution is capable of representing a variety of distribution shapes and dispersion patterns. This post gives an account of how the distribution arises mathematically and discusses some of its mathematical properties. The next post discusses how the gamma distribution can arise naturally as the waiting time until the nth event in a Poisson process.

_______________________________________________________________________________________________

The Gamma Function

From a mathematical point of view, in defining the gamma distribution, the place to start is the gamma function. For any real number \alpha>0, define:

    \displaystyle \Gamma(\alpha)=\int_0^\infty t^{\alpha-1} \ e^{-t} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (0)

The above improper integral converges for every positive \alpha. The proof that the improper integral converges and other basic facts can be found here.

When the integral in (0) has “incomplete” limits, the resulting functions are called incomplete gamma functions. The following are called the upper incomplete gamma function and lower incomplete gamma function, respectively.

    \displaystyle \Gamma(\alpha, x)=\int_x^\infty t^{\alpha-1} \ e^{-t} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

    \displaystyle \gamma(\alpha, x)=\int_0^x t^{\alpha-1} \ e^{-t} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

_______________________________________________________________________________________________

The Gamma Probability Density Function

Notice that the integrand in (0) is a positive value for every t>0. Thus the integrand t^{\alpha-1} \ e^{-t} is a density function if it is normalized by \Gamma(\alpha).

    \displaystyle f_T(t)=\frac{1}{\Gamma(\alpha)} \ t^{\alpha-1} \ e^{-t} \ \ \ \ \ \ \ t>0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)

The function in (3) is a probability density function since the integral is one when it is integrated over the interval (0,\infty). For convenience, let T be the random variable having this density function. The density function in (3) only has one parameter, which is \alpha (the shape parameter). To add the second parameter, transform the random variable T by multiplying it by a constant. This can be done in two ways. The following are the probability density functions for the random variables X=\theta T and Y=\frac{1}{\beta} \ T, respectively.

    \displaystyle f_X(x)=\frac{1}{\Gamma(\alpha)} \ \biggl(\frac{1}{\theta}\biggr)^\alpha \ x^{\alpha-1} \ e^{-\frac{x}{\theta}} \ \ \ \ \ \ \ x>0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4a)

    \displaystyle f_Y(y)=\frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ y^{\alpha-1} \ e^{-\beta y} \ \ \ \ \ \ \ \ \ \ \ \ y>0 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4b)

A random variable X is said to follow the gamma distribution with shape parameter \alpha and scale parameter \theta if (4a) is its probability density function (pdf). A random variable Y is said to follow the gamma distribution with shape parameter \alpha and rate parameter \beta if (4b) is its pdf.

The parameter \theta is called the scale parameter because the larger its value, the more spread out the distribution. The parameter \beta is the rate parameter in the family of gamma distributions. The rate parameter is defined as the reciprocal of the scale parameter, i.e. \beta=\frac{1}{\theta}.
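To make the relationship between the two parametrizations concrete, here is a small Python sketch of the densities (4a) and (4b). The function names are illustrative only. Evaluating (4a) with scale \theta and (4b) with rate 1/\theta at the same point should give the same value.

```python
import math

def gamma_pdf_scale(x, alpha, theta):
    """Density (4a): shape alpha, scale theta."""
    return x ** (alpha - 1) * math.exp(-x / theta) / (math.gamma(alpha) * theta ** alpha)

def gamma_pdf_rate(y, alpha, beta):
    """Density (4b): shape alpha, rate beta."""
    return beta ** alpha * y ** (alpha - 1) * math.exp(-beta * y) / math.gamma(alpha)

alpha, theta = 2.5, 2.0
for x in (0.5, 3.0, 8.0):
    # The two columns agree because beta = 1/theta.
    print(x, gamma_pdf_scale(x, alpha, theta), gamma_pdf_rate(x, alpha, 1 / theta))
```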

The following figure (Figure 1) demonstrates the role of the shape parameter \alpha. With the scale parameter \theta kept at 2, the gamma distribution becomes less skewed as \alpha increases.

Figure 1: gamma densities with \alpha from 1 to 10 and \theta=2

The following figure (Figure 2) demonstrates the role of the scale parameter \theta. With the shape parameter \alpha kept at 2, all the gamma distributions have the same skewness. However, the gamma distributions become more spread out as \theta increases.

Figure 2: gamma densities with \alpha=2 and \theta from 1 to 10

There are several important subclasses of the gamma distribution. When the shape parameter \alpha=1, the gamma distribution becomes the exponential distribution with mean \theta or \frac{1}{\beta} depending on the parametrization. When the shape parameter \alpha is any positive integer, the resulting subclass of gamma distribution is called the Erlang distribution. A chi-square distribution is a gamma distribution with shape parameter \alpha=\frac{k}{2} and scale parameter \theta=2 where k is a positive integer (the degrees of freedom). The gamma density curves in Figure 1 are chi-square distributions. Their degrees of freedom are 2, 4, 6, 10 and 20. The chi-square distribution with 2 degrees of freedom is an exponential distribution (with mean 2).
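These special cases can be checked pointwise. The sketch below uses the standard exponential and chi-square density formulas, with function names of our own choosing, and compares them with the gamma density in the scale parametrization.

```python
import math

def gamma_pdf(x, alpha, theta):
    """Gamma density with shape alpha and scale theta."""
    return x ** (alpha - 1) * math.exp(-x / theta) / (math.gamma(alpha) * theta ** alpha)

def exponential_pdf(x, mean):
    """Exponential density with the given mean."""
    return math.exp(-x / mean) / mean

def chi_square_pdf(x, k):
    """Chi-square density with k degrees of freedom."""
    return x ** (k / 2 - 1) * math.exp(-x / 2) / (2 ** (k / 2) * math.gamma(k / 2))

x = 4.0
print(gamma_pdf(x, 1, 2.0), exponential_pdf(x, 2.0))   # alpha = 1
print(gamma_pdf(x, 3, 2.0), chi_square_pdf(x, 6))      # alpha = k/2 with k = 6
print(gamma_pdf(x, 1, 2.0), chi_square_pdf(x, 2))      # 2 degrees of freedom
```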

Between the two parametrizations presented here, the version with the scale parameter is the more appropriate model in the settings where a parameter is needed for describing the magnitude of the mean and the spread. The parametrization with \alpha and \beta is sometimes easier to work with. For example, it is more common in Bayesian analysis where the gamma distribution can be used as a conjugate prior distribution for a parameter that is a rate (e.g. the rate parameter of a Poisson distribution).

_______________________________________________________________________________________________

Some Distributional Quantities

The gamma distribution is a two-parameter family of distributions. Here are some of the basic distributional quantities that are of interest.

    _________________________________________
    X is a random variable with the gamma distribution with shape parameter \alpha and scale parameter \theta.

    Y is a random variable with the gamma distribution with shape parameter \alpha and rate parameter \beta.

    _________________________________________
    Probability Density Function (PDF)

    \displaystyle \begin{array}{ll} \displaystyle f_X(x)=\frac{1}{\Gamma(\alpha)} \ \biggl(\frac{1}{\theta}\biggr)^\alpha \ x^{\alpha-1} \ e^{-\frac{x}{\theta}} & \ \ \ \ \ \ \ \ \ \ x>0  \\ \text{ } & \text{ } \\ \displaystyle f_Y(y)=\frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ y^{\alpha-1} \ e^{-\beta y} & \ \ \ \ \ \ \ \ \ \ \displaystyle y>0   \end{array}

    _________________________________________
    Cumulative Distribution Function (CDF)

    \displaystyle \begin{aligned} F_X(x)&=\int_0^x \ \frac{1}{\Gamma(\alpha)} \ \biggl(\frac{1}{\theta}\biggr)^\alpha \ t^{\alpha-1} \ e^{-\frac{t}{\theta}} \ dt \\&\text{ } \\&=\frac{\gamma(\alpha, \frac{x}{\theta})}{\Gamma(\alpha)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (\text{Alternative formulation}) \end{aligned}

    \displaystyle \begin{aligned} F_Y(y)&=\int_0^y \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ t^{\alpha-1} \ e^{-\beta t} \ dt \\&\text{ }  \\&=\frac{\gamma(\alpha, \beta y)}{\Gamma(\alpha)} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (\text{Alternative formulation}) \end{aligned}

    _________________________________________
    Mean and Variance

    \displaystyle \begin{array}{ll} \displaystyle E(X)=\alpha \ \theta & \ \ \ \ \ \ \ \ \ \ Var(X)=\alpha \ \theta^2  \\ \text{ } & \text{ } \\ \displaystyle E(Y)=\frac{\alpha}{\beta} & \ \ \ \ \ \ \ \ \ \ \displaystyle Var(Y)=\frac{\alpha}{\beta^2}   \end{array}

    _________________________________________
    Higher Moments

    \displaystyle  E(X^k) = \left\{ \begin{array}{ll}                     \displaystyle  \frac{\theta^k \ \Gamma(\alpha+k)}{\Gamma(\alpha)} &\ \ \ \ \ \ k>-\alpha \\           \text{ } & \text{ } \\           \theta^k \ \alpha (\alpha+1) \cdots (\alpha+k-1) &\ \ \ \ \ \ k \text{ is a positive integer}           \end{array} \right.

    \displaystyle  E(Y^k) = \left\{ \begin{array}{ll}                     \displaystyle  \frac{\Gamma(\alpha+k)}{\beta^k \ \Gamma(\alpha)} &\ \ \ \ \ \ k>-\alpha \\           \text{ } & \text{ } \\           \displaystyle  \frac{\alpha (\alpha+1) \cdots (\alpha+k-1)}{\beta^k} &\ \ \ \ \ \ k \text{ is a positive integer}           \end{array} \right.

    _________________________________________
    Moment Generating Function

    \displaystyle \begin{array}{ll} \displaystyle M_X(t)=\biggl(\frac{1}{1-\theta t} \biggr)^\alpha & \ \ \ \ \ \displaystyle t<\frac{1}{\theta}  \\ \text{ } & \text{ } \\ \displaystyle M_Y(t)=\biggl(\frac{\beta}{\beta- t} \biggr)^\alpha & \ \ \ \ \ \displaystyle t<\beta   \end{array}

    _________________________________________
    Mode

    \displaystyle \begin{array}{ll} \displaystyle \theta \ (\alpha-1) & \ \ \ \ \ \alpha>1 \ \text{else } 0  \\ \text{ } & \text{ } \\ \displaystyle \frac{\alpha-1}{\beta} & \ \ \ \ \ \alpha>1 \ \text{else } 0   \end{array}

    _________________________________________
    Coefficient of Variation

    \displaystyle CV = \frac{1}{\sqrt{\alpha}}

    _________________________________________
    Coefficient of Skewness

    \displaystyle \gamma = \frac{2}{\sqrt{\alpha}}

    _________________________________________
    Kurtosis

    \displaystyle \begin{array}{ll} \displaystyle 3+\frac{6}{\alpha} & \ \ \ \ \ \text{Kurtosis}  \\ \text{ } & \text{ } \\ \displaystyle \frac{6}{\alpha} & \ \ \ \ \ \text{Excess Kurtosis}   \end{array}

There is no simple closed form for the cumulative distribution function, except for the case of \alpha=1 (i.e. the exponential distribution) and the case of \alpha being a positive integer (see next post). As a result, the distributional quantities that require solving for x in the CDF have no closed form, e.g. the median and other percentiles. As stated above, the CDF can be expressed using the incomplete gamma function, which can be evaluated numerically. For the distributional quantities with no closed form, either use numerical estimation or use software.
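As one possible numerical approach, the sketch below evaluates the CDF through the well-known power series of the lower incomplete gamma function, \gamma(\alpha,z)=z^\alpha e^{-z} \sum_{n \ge 0} z^n / [\alpha(\alpha+1) \cdots (\alpha+n)], and then finds percentiles by bisection. The function names, tolerances and iteration counts are our own choices. The series is adequate for moderate values of x/\theta; a production library would switch methods for large arguments.

```python
import math

def gamma_cdf(x, alpha, theta, tol=1e-15, max_terms=10_000):
    """F_X(x) = gamma(alpha, x/theta) / Gamma(alpha), computed by power series."""
    if x <= 0:
        return 0.0
    z = x / theta
    term = 1.0 / alpha
    total = term
    for n in range(1, max_terms):
        term *= z / (alpha + n)
        total += term
        if term < tol * total:
            break
    # Multiply by z^alpha e^(-z) / Gamma(alpha), done in logs for stability.
    return total * math.exp(-z + alpha * math.log(z) - math.lgamma(alpha))

def gamma_quantile(p, alpha, theta):
    """Solve F_X(x) = p for x by bisection, for 0 < p < 1."""
    lo, hi = 0.0, alpha * theta
    while gamma_cdf(hi, alpha, theta) < p:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if gamma_cdf(mid, alpha, theta) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(gamma_quantile(0.5, 1, 1), math.log(2))  # exponential median is ln 2
print(gamma_quantile(0.5, 2.5, 2.0))           # median with no closed form
```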

The calculation for some of the distributional quantities is quite straightforward. For example, to calculate any higher moment E(X^k), simply adjust the integrand to be an appropriate gamma density function. Then the result will be what can be moved outside the integral, as shown in the following.

    \displaystyle \begin{aligned}E(X^k)&=\int_0^\infty x^k \cdot \frac{1}{\Gamma(\alpha)} \ \biggl(\frac{1}{\theta}\biggr)^\alpha \ x^{\alpha-1} \ e^{-\frac{x}{\theta}} \ dx=\int_0^\infty \frac{1}{\Gamma(\alpha)} \ \biggl(\frac{1}{\theta}\biggr)^\alpha \ x^{\alpha+k-1} \ e^{-\frac{x}{\theta}} \ dx \\&=\frac{\theta^k \Gamma(\alpha+k)}{\Gamma(\alpha)} \int_0^\infty \frac{1}{\Gamma(\alpha+k)} \ \biggl(\frac{1}{\theta}\biggr)^{\alpha+k} \ x^{\alpha+k-1} \ e^{-\frac{x}{\theta}} \ dx \\&=\frac{\theta^k \Gamma(\alpha+k)}{\Gamma(\alpha)} \\&=\theta^k \ \alpha (\alpha+1) \cdots (\alpha+k-1) \end{aligned}

Once the higher moments are known, some of the other calculations follow. For example, the coefficient of variation is defined as the ratio of the standard deviation to the mean. This ratio is a standardized measure of dispersion of a probability distribution. The coefficient of skewness is the ratio of the third central moment to the third power of the standard deviation, i.e. E[(X-\mu)^3]/\sigma^3. See here for a discussion on skewness. The kurtosis is the ratio of the fourth central moment to the fourth power of the standard deviation, i.e. E[(X-\mu)^4]/\sigma^4. The excess kurtosis is obtained by subtracting 3 from the kurtosis.
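The following Python sketch carries out these calculations from the raw moments E(X^k)=\theta^k \ \Gamma(\alpha+k)/\Gamma(\alpha), using the usual expansions of the central moments in terms of raw moments. The function names are illustrative. The results can be compared with the closed forms \frac{1}{\sqrt{\alpha}}, \frac{2}{\sqrt{\alpha}} and 3+\frac{6}{\alpha} listed above.

```python
import math

def raw_moment(k, alpha, theta):
    """E(X^k) for the gamma distribution with shape alpha and scale theta."""
    return theta ** k * math.gamma(alpha + k) / math.gamma(alpha)

def shape_summary(alpha, theta):
    """Coefficient of variation, skewness and kurtosis from raw moments."""
    m1, m2, m3, m4 = (raw_moment(k, alpha, theta) for k in (1, 2, 3, 4))
    variance = m2 - m1 ** 2
    sd = math.sqrt(variance)
    third_central = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    fourth_central = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return {
        "cv": sd / m1,
        "skewness": third_central / sd ** 3,
        "kurtosis": fourth_central / sd ** 4,
    }

alpha, theta = 2, 5.0
print(shape_summary(alpha, theta))
print(1 / math.sqrt(alpha), 2 / math.sqrt(alpha), 3 + 6 / alpha)
```

Note that none of the three quantities depends on \theta, in agreement with the formulas.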

_______________________________________________________________________________________________

Discussion

The product of two gamma moment generating functions with the same scale parameter \theta (or rate parameter \beta) is also the MGF of a gamma distribution. This points to the fact that the sum of two independent gamma random variables (with the same scale parameter or rate parameter) has a gamma distribution. Specifically, if X_1 follows a gamma distribution with shape parameter \alpha_1, X_2 follows a gamma distribution with shape parameter \alpha_2, both have the same scale parameter \theta, and they are independent, then the sum X_1+X_2 has a gamma distribution with shape parameter \alpha_1+\alpha_2 and scale parameter \theta.
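A quick simulation is consistent with this fact. The sketch below uses random.gammavariate from the Python standard library, whose second argument is the scale parameter, and compares the sample mean and variance of the sum with (\alpha_1+\alpha_2) \theta and (\alpha_1+\alpha_2) \theta^2. The helper names, sample size and seed are arbitrary choices of ours. Matching two moments does not by itself prove that the sum is gamma; it is only a sanity check on the MGF argument.

```python
import random

def simulate_gamma_sum(alpha1, alpha2, theta, n=50_000, seed=2016):
    """Draw n values of X_1 + X_2 with independent gamma summands sharing scale theta."""
    rng = random.Random(seed)
    return [rng.gammavariate(alpha1, theta) + rng.gammavariate(alpha2, theta) for _ in range(n)]

def sample_mean_and_variance(values):
    """Sample mean and unbiased sample variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, variance

alpha1, alpha2, theta = 2.0, 3.0, 2.0
mean, variance = sample_mean_and_variance(simulate_gamma_sum(alpha1, alpha2, theta))
print(mean, (alpha1 + alpha2) * theta)            # sample mean vs. theoretical mean
print(variance, (alpha1 + alpha2) * theta ** 2)   # sample variance vs. theoretical variance
```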

We now revisit Figure 1 and Figure 2. The skewness of a gamma distribution is driven only by the shape parameter \alpha. The gamma skewness is \frac{2}{\sqrt{\alpha}}. The higher the \alpha, the less skewed the gamma distribution is (or the more symmetric it looks). This is borne out by Figure 1. There is another angle via the central limit theorem that is borne out by Figure 1. The gamma densities with larger value of \alpha can also be thought of as the independent sum of many gamma distributions with smaller \alpha values. For example, the gamma with \alpha=10 and \theta=2 can be regarded as the independent sum of 10 exponential distributions each with mean 2. By the central limit theorem, any gamma distribution with large value of \alpha will tend to look symmetric.

In Figure 2, all gamma densities have the same \alpha=2. Thus they all have the same skewness (\frac{2}{\sqrt{2}}, about 1.414). It is clear that as the scale parameter \theta increases, the densities become more spread out while remaining skewed density curves.

The support of the gamma distribution is the interval (0,\infty). Thus it is a plausible model for random quantities that take on positive values, e.g. insurance losses or insurance claim amounts. With the gamma density curves being positively skewed (skewed to the right), the gamma distribution is a good candidate for random quantities that are concentrated more on the lower end of the interval (0,\infty).

Though the gamma distribution is positively skewed, it is considered to have a light (right) tail. The notion of having a light tail or heavy tail is a relative concept. The gamma distribution has a light right tail as compared to the Pareto distribution. The Pareto distribution puts significantly more probability on larger values (the gamma distribution with the same mean and variance puts significantly less probability on the larger values). In terms of modeling insurance losses, the gamma distribution is a more suitable model for losses that are not catastrophic in nature.

One telltale sign of a distribution with a light tail is that all positive moments exist. For the gamma distribution, E(X^k) exists for every positive integer k. In fact, the gamma distribution has a property stronger than the mere fact that all moments exist, i.e. it has a moment generating function. So even though the gamma distribution has a right tail that is infinitely long (it extends out to infinity), the amount of probability is almost negligible after some limit (as compared to the Pareto distribution for example). See here for a more detailed discussion on tail weights and the Pareto distribution.

The next post discusses how the gamma distribution can arise naturally as the waiting time until the nth event in a Poisson process.

_______________________________________________________________________________________________

Evaluating the Gamma Function

In many calculations discussed in this blog, it is necessary at times to evaluate the gamma function. Of course if the argument is a positive integer, the gamma function is simply the factorial function. Some special values of the gamma function are:

    \Gamma(\frac{1}{2})=\sqrt{\pi}

    \Gamma(\frac{3}{2})=\frac{1}{2} \sqrt{\pi}

    \Gamma(\frac{5}{2})=\frac{3}{4} \sqrt{\pi}

    \Gamma(\frac{7}{2})=\frac{15}{8} \sqrt{\pi}
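These special values can be checked numerically. The following is a minimal sketch that uses Python's standard library function math.gamma; the dictionary name is only illustrative.

```python
import math

root_pi = math.sqrt(math.pi)

# Closed forms for the gamma function at the half-integers listed above.
special_values = {
    0.5: root_pi,
    1.5: root_pi / 2,
    2.5: 3 * root_pi / 4,
    3.5: 15 * root_pi / 8,
}

for x, closed_form in special_values.items():
    # The two printed numbers should agree up to floating point error.
    print(x, math.gamma(x), closed_form)
```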

If special values are not known, it is possible to use software to evaluate the gamma function. We demonstrate how it is done using Excel. Older versions of Excel have no dedicated function for evaluating the gamma function (Excel 2013 and later provide a GAMMA function). However, Excel has a function for the PDF of a gamma distribution. To evaluate \Gamma(\alpha), consider the density function in (4b) with parameters \alpha, \beta=1 and y=1. We have the following:

    \displaystyle f_Y(1)=\frac{1}{\Gamma(\alpha)} \ 1^\alpha \ 1^{\alpha-1} \ e^{- 1}=\frac{e^{-1}}{\Gamma(\alpha)}

    \displaystyle \Gamma(\alpha)=\frac{e^{-1}}{f_Y(1)}

The key to evaluating \Gamma(\alpha) in Excel is to evaluate the gamma PDF with \alpha and \beta=1 at the x-value of 1. The following shows the formula for evaluating the PDF.

    =GAMMADIST(1, \alpha, 1, FALSE)

Then the gamma function can be evaluated by evaluating the following formula in Excel:

    =EXP(-1) / GAMMADIST(1, \alpha, 1, FALSE)

For example, \Gamma(3.6)=3.717023853 which is obtained by the following formula in Excel:

    =EXP(-1) / GAMMADIST(1, 3.6, 1, FALSE)
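The value of \Gamma(3.6) can be cross-checked outside Excel. The following is a minimal sketch in Python using the standard library functions math.gamma and math.lgamma (the natural log of the gamma function); the variable names are only illustrative.

```python
import math

# Direct evaluation of the gamma function.
direct = math.gamma(3.6)

# Same value through the log-gamma function, which is useful when the
# argument is large enough for the gamma function itself to overflow.
via_log = math.exp(math.lgamma(3.6))

print(direct, via_log)
```

Both numbers should agree with the Excel result 3.717023853 to the digits shown.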

_______________________________________________________________________________________________
\copyright \ 2016 - \text{Dan Ma}

Introducing the gamma function

The gamma distribution is a probability distribution that is useful in actuarial modeling. From a mathematical point of view, the gamma function is the starting point of defining the gamma distribution. This post discusses the basic facts that are needed for defining the gamma distribution. Here’s the definition of the gamma function.

For any real number \alpha>0, define:

    \displaystyle \Gamma(\alpha)=\int_0^\infty t^{\alpha-1} \ e^{-t} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (0)

The following is the graph of the gamma function. It has a U shape. As x \rightarrow 0, the graph goes up to infinity. As x \rightarrow \infty, the graph increases without bound. As is seen below, the gamma function coincides with the factorial function at the integers.

Gamma Function Graph
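The U shape can also be seen from a few computed values. The following is a minimal sketch using Python's math.gamma; the sample points are arbitrary choices for illustration.

```python
import math

# Sample points on both sides of the bottom of the U.
sample_points = [0.01, 0.1, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0]
values = [math.gamma(x) for x in sample_points]

for x, value in zip(sample_points, values):
    print(x, value)
```

The printed values are large near 0, dip below 1 between the arguments 1 and 2, and then increase again.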

_______________________________________________________________________________________________

The Convergence

The gamma function is defined by the improper integral as described in (0). The improper integral converges. In fact, the integral converges for all complex numbers \alpha with positive real part. For our purposes at hand, we restrict \alpha to positive real numbers. Showing that the integral in (0) converges is important. For example, it will be nice to know that the density function of the gamma distribution integrates to 1. To show that the integral in (0) converges, we first make the observation: there is a positive real number M such that for all t>M,

    \displaystyle t^{\alpha-1} <e^{\frac{t}{2}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

This is saying that the quantity \displaystyle e^{\frac{t}{2}} dominates the quantity \displaystyle t^{\alpha-1} when t is sufficiently large. This is because the exponential function \displaystyle e^{\frac{t}{2}} increases at a much faster rate than the power function \displaystyle t^{\alpha-1}. Then multiply both sides of (1) by \displaystyle e^{-t} to obtain the following:

    \displaystyle t^{\alpha-1} e^{-t}<e^{-\frac{t}{2}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)

Now, break up the integral in (0) into two pieces:

    \displaystyle \Gamma(\alpha)=\int_0^M t^{\alpha-1} \ e^{-t} \ dt +\int_M^\infty t^{\alpha-1} \ e^{-t} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)

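A basic fact: for any continuous function h(t) defined over a closed interval [a,b] of finite length, the integral \displaystyle \int_a^b h(t) \ dt exists and has a finite value. When \alpha \ge 1, the integrand \displaystyle t^{\alpha-1} e^{-t} is continuous on [0,M], so the first integral in (3) exists. When 0<\alpha<1, the integrand is unbounded near t=0, but \displaystyle t^{\alpha-1} e^{-t} \le t^{\alpha-1} and \displaystyle \int_0^M t^{\alpha-1} \ dt=\frac{M^\alpha}{\alpha} is finite, so the first integral in (3) is finite in this case as well. So we just focus on the second integral.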

    \displaystyle \int_M^\infty t^{\alpha-1} \ e^{-t} \ dt < \int_M^\infty e^{-\frac{t}{2}} \ dt=2 e^{-\frac{M}{2}} \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4)

Note that in (4), the inequality in (2) is used. Due to (4), the second integral in (3) is finite. With both integrals in (3) finite, it follows that the improper integral in the definition of \Gamma(\alpha) is finite.
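The argument can be illustrated numerically. The sketch below is only an illustration under stated assumptions: it picks \alpha=3.6 and M=60 (arbitrary choices), checks the inequality (1) at sample points beyond M, approximates the first integral in (3) with a simple midpoint rule, and evaluates the bound in (4) for the second integral. The helper name midpoint_integral is made up for this example.

```python
import math

def midpoint_integral(f, a, b, n):
    """Midpoint-rule approximation of the integral of f over [a, b] with n strips."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

alpha = 3.6   # illustrative shape value
M = 60.0      # illustrative cutoff

def integrand(t):
    return t ** (alpha - 1) * math.exp(-t)

# Inequality (1), checked at integer points beyond M (a spot check, not a proof).
inequality_holds = all(t ** (alpha - 1) < math.exp(t / 2) for t in range(61, 500))

# First integral in (3), approximated numerically.
head = midpoint_integral(integrand, 0.0, M, 200_000)

# Bound (4) on the second integral in (3).
tail_bound = 2 * math.exp(-M / 2)

print(inequality_holds, head, tail_bound, math.gamma(alpha))
```

With these choices the tail bound is negligible, so the approximation of the first integral is already very close to the value returned by math.gamma.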

_______________________________________________________________________________________________

Some Properties

The gamma function is a well known function in mathematics and has wide and deep implications. Here, we simply state several basic facts that are needed. One useful fact is that the gamma function is the factorial function shifted down by one when the argument \alpha is a positive integer. Thus the gamma function generalizes the factorial function. Another useful fact is that the gamma function satisfies a recursive relation.

  • If n is a positive integer, \Gamma(n)=(n-1)!.
  • For all \alpha>0, \Gamma(\alpha+1)=\alpha \Gamma(\alpha).

The recursive relation \Gamma(\alpha+1)=\alpha \Gamma(\alpha) is derived by using integration by parts. It is clear from definition that \Gamma(1)=1. By an induction argument, \Gamma(n)=(n-1)! for all integers greater than 1.
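For reference, the integration by parts step can be written out as follows, taking u=t^{\alpha} and dv=e^{-t} \ dt in the definition (0). The boundary term vanishes because \displaystyle t^{\alpha} e^{-t} tends to 0 both as t \rightarrow 0 and as t \rightarrow \infty (recall that \alpha>0).

    \displaystyle \Gamma(\alpha+1)=\int_0^\infty t^{\alpha} \ e^{-t} \ dt=\biggl[-t^{\alpha} \ e^{-t}\biggr]_0^\infty+\alpha \int_0^\infty t^{\alpha-1} \ e^{-t} \ dt=0+\alpha \ \Gamma(\alpha)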

As a result, \Gamma(1)=0!=1, \Gamma(2)=1!=1 and \Gamma(3)=2!=2 and so on. Given that \Gamma(\frac{1}{2})=\sqrt{\pi}, the recursive relation tells us that \Gamma(\frac{3}{2})=\frac{1}{2} \sqrt{\pi} and \Gamma(\frac{5}{2})=\frac{3}{4} \sqrt{\pi} and so on. The above recursive relation can be further extended:

  • For any positive integer k, \Gamma(\alpha+k)=\alpha (\alpha+1) \cdots (\alpha + k-1) \ \Gamma(\alpha).
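The factorial property and the extended recursive relation can be checked with a short Python sketch. The function name gamma_shifted is made up for illustration; math.gamma and math.factorial are from the standard library.

```python
import math

def gamma_shifted(alpha, k):
    """Gamma(alpha + k) computed from Gamma(alpha) via the extended recursion.

    Multiplies Gamma(alpha) by alpha * (alpha + 1) * ... * (alpha + k - 1).
    """
    product = 1.0
    for j in range(k):
        product *= alpha + j
    return product * math.gamma(alpha)

# Gamma(n) = (n-1)! at the positive integers.
for n in range(1, 8):
    print(n, math.gamma(n), math.factorial(n - 1))

# Gamma(7/2) from Gamma(1/2) = sqrt(pi), using k = 3.
print(gamma_shifted(0.5, 3), 15 * math.sqrt(math.pi) / 8)
```

In each printed line the two computed values should agree up to floating point error.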

When the integral in (0) has “incomplete” limits, the resulting functions are called incomplete gamma functions. The following are called the upper incomplete gamma function and lower incomplete gamma function, respectively.

    \displaystyle \Gamma(\alpha, x)=\int_x^\infty t^{\alpha-1} \ e^{-t} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)

    \displaystyle \gamma(\alpha, x)=\int_0^x t^{\alpha-1} \ e^{-t} \ dt \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)
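The two incomplete gamma functions always add up to \Gamma(\alpha), since together they cover the whole interval of integration in (0). The sketch below is a rough numerical illustration, not a production routine: it approximates the lower incomplete gamma function with a midpoint rule (assuming \alpha \ge 1 so that the integrand is bounded near 0) and obtains the upper one by subtraction. The function names are made up for this example. For \alpha=2 the results can be compared with the closed forms 1-(1+x) e^{-x} and (1+x) e^{-x}, which follow from integration by parts.

```python
import math

def lower_incomplete_gamma(alpha, x, n=100_000):
    """Midpoint-rule approximation of the integral of t^(alpha-1) e^(-t) over [0, x].

    Assumes alpha >= 1, so the integrand is bounded on [0, x].
    """
    h = x / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        total += t ** (alpha - 1) * math.exp(-t)
    return h * total

def upper_incomplete_gamma(alpha, x, n=100_000):
    """Upper incomplete gamma, obtained as Gamma(alpha) minus the lower piece."""
    return math.gamma(alpha) - lower_incomplete_gamma(alpha, x, n)

x = 1.5
lower = lower_incomplete_gamma(2.0, x)
upper = upper_incomplete_gamma(2.0, x)
print(lower, 1 - (1 + x) * math.exp(-x))
print(upper, (1 + x) * math.exp(-x))
```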

_______________________________________________________________________________________________
The next post shows how the gamma distribution arises naturally from the gamma function.
