# The (a,b,1) class

This post is a continuation of the preceding post on the (a,b,0) class. It introduces the class of discrete distributions called the (a,b,1) class.

The discussion in this post has a great deal of technical details. A concise summary of the (a,b,0) class and (a,b,1) class is found here.

The (a,b,1) Class

A counting distribution is a discrete probability distribution whose support is the non-negative integers. Let $N$ be a random variable with a counting distribution, and for each integer $k=0,1,2,\cdots$, let $P_k=P[N=k]$. Recall that a counting distribution is a member of the (a,b,0) class of distributions if the following recursive relation holds for some constants $a$ and $b$.

(1)……….$\displaystyle \frac{P_k}{P_{k-1}}=a + \frac{b}{k} \ \ \ \ \ \ \ \ \ \ \ \ \ k=1,2,3,\cdots$

For a member of the (a,b,0) class, the initial probability $P_0$ is fixed, since the recursion expresses every $P_k$ as a multiple of $P_0$ and the probabilities must sum to 1. Thus a member of the (a,b,0) class has two parameters, namely $a$ and $b$ in the recursive relation (1). A counting distribution is a member of the (a,b,1) class of distributions if the following recursive relation holds for some constants $a$ and $b$.

(2)……….$\displaystyle \frac{P_k}{P_{k-1}}=a + \frac{b}{k} \ \ \ \ \ \ \ \ \ \ \ \ \ k=2,3,4 \cdots$

The recursion in the (a,b,1) class begins at $k=2$. That means that the initial probability $P_0$ must be an assumed value. The probability $P_1$ is then the value such that the sum $P_1+P_2+\cdots$ is $1-P_0$. Thus a member of the (a,b,1) class has three parameters: $a$, $b$ and $P_0$.

There are two subclasses in the (a,b,1) class of distributions. They are determined by whether $P_0=0$ or $P_0>0$. The (a,b,1) distributions in the first category are called zero-truncated distributions. The (a,b,1) distributions in the second category are called zero-modified distributions.

This is how we will proceed. Using a given distribution in the (a,b,0) class as a starting point, we show how to derive the zero-truncated distribution. From a given zero-truncated distribution, we show how to derive the zero-modified distribution.

The name of an (a,b,1) distribution is the name of the underlying (a,b,0) distribution with either zero-truncated or zero-modified as the prefix. For example, if the starting point is the negative binomial distribution in the (a,b,0) class, then the derived distributions in the (a,b,1) class are the zero-truncated negative binomial distribution and the zero-modified negative binomial distribution.

There are only three distributions in the (a,b,0) class: Poisson, binomial and negative binomial. Thus the (a,b,1) class contains the zero-truncated and zero-modified versions of these three distributions. However, the (a,b,1) class also contains distributions that are not modifications of the (a,b,0) distributions. We discuss three such additional distributions: the extended truncated negative binomial (ETNB) distribution, the logarithmic distribution and the Sibuya distribution. These three distributions are discussed in a separate section below.

We present three examples. Example 1 uses a base (a,b,0) negative binomial distribution to derive a zero-truncated negative binomial distribution, and Example 2 derives the corresponding zero-modified negative binomial distribution. Example 3 works through an ETNB distribution.

Notations

To facilitate discussion, let’s fix some notations. To clearly denote the distributions, notations without superscripts and subscripts refer to the (a,b,0) distributions. Notations with the superscript T (or subscript T) refer to the zero-truncated distributions in the (a,b,1) class. Likewise notations with the superscript M (or subscript M) refer to the zero-modified distributions in the (a,b,1) class.

For example, the following are the probability function (pf) and the probability generating function (pgf) of a distribution from the (a,b,0) class.

(3)……….$\displaystyle \begin{aligned}&P_k=P[N=k] \\&P(z)=\sum \limits_{k=0}^\infty P_k z^k \end{aligned}$

The following shows the notations for the pf and pgf for a zero-truncated distribution from the (a,b,1) class.

(4)……….$\displaystyle \begin{aligned}&P_k^T=P[N_T=k] \\&P^T(z)=\sum \limits_{k=1}^\infty P_k^T z^k \end{aligned}$

The following shows the notations for the pf and pgf for a zero-modified distribution from the (a,b,1) class.

(5)……….$\displaystyle \begin{aligned}&P_k^M=P[N_M=k] \\&P^M(z)=\sum \limits_{k=0}^\infty P_k^M z^k \end{aligned}$

Whenever it is convenient to do so, $N$ denotes a random variable from the (a,b,0) class, while $N_T$ and $N_M$ denote random variables for the zero-truncated distribution and the zero-modified distribution, respectively.

Zero-Truncated Distributions

The focus in this section is on the zero-truncated distributions that originate from the (a,b,0) class. The three distributions indicated above (ETNB, logarithmic and Sibuya) are discussed in a separate section below.

Suppose we start with a distribution from the (a,b,0) class, with the notations $P_k$, $P(z)$ and $N$ as indicated above. We show how to derive the corresponding zero-truncated distribution in the (a,b,1) class. For the zero-truncated distribution, there are two ways to compute probabilities. One is the recursion relation:

(6)……….$\displaystyle \frac{P_k^T}{P_{k-1}^T}=a + \frac{b}{k} \ \ \ \ \ \ \ \ \ \ \ \ \ k=2,3,4 \cdots$

The recursion relation (6) is identical to the one in (2). The recursion begins at $k=2$. The zero-truncated probabilities can also be derived from the (a,b,0) probabilities as follows:

(7)……….$\displaystyle P_k^T=\frac{1}{1-P_0} P_k \ \ \ \ \ \ \ \ \ \ \ \ \ k=1,2,3,4 \cdots$

The probabilities $P_k^T$ in (7) can be regarded as conditional probabilities: the probability that $N=k$ given that $N>0$. From a procedural standpoint, the probabilities $P_k^T$ are the (a,b,0) probabilities $P_k$ multiplied by $1/(1-P_0)$ to make the probabilities sum to 1. With the probabilities established, the probability generating function (pgf), the mean and the higher moments of the zero-truncated distribution can also be expressed in terms of the corresponding quantities of the (a,b,0) distribution. The moment formulas work because the $k=0$ term contributes nothing to $E[N]$ or $E[N^2]$.

(8)……….$\displaystyle P^T(z)=\sum \limits_{k=1}^\infty P_k^T \ z^k =\frac{1}{1-P_0} \ [P(z)-P_0]$

(9)……….$\displaystyle E[N_T]=\frac{1}{1-P_0} \ E[N]$

(10)……..$\displaystyle E[N_T^2]=\frac{1}{1-P_0} \ E[N^2]$

(11)……..$\displaystyle \begin{aligned} Var[N_T]&=\frac{1}{1-P_0} \ E[N^2]-\biggl( \frac{1}{1-P_0} \ E[N]\biggr)^2 \\&=\frac{1}{1-P_0} \ Var[N]+\biggl(1-\frac{1}{1-P_0} \biggr) \ \frac{1}{1-P_0} \ E[N]^2 \end{aligned}$

The above items express the zero-truncated distribution in terms of information from the (a,b,0) distribution. They can also be derived from first principles using the probability function (7). The following shows the factorial moments of the zero-truncated distribution.

(12)……..$\displaystyle \mu_{(1)}=E[N_T]=\frac{a+b}{(1-a) (1-P_0)}$

(13)……..$\displaystyle \begin{aligned} \mu_{(j)}&=E \{ N_T \ [N_T-1] \ [N_T-2] \cdots [N_T-(j-1)] \}\\&\text{ } \\&=\frac{(a j+b) \ \mu_{(j-1)}}{1-a} \ \ \ \ \ \ \ \ j=2,3,4,\cdots \end{aligned}$

The first factorial moment is identical to the mean of $N_T$. The $P_0$ in $\mu_{(1)}$ is the value of zero probability for the corresponding member in the (a,b,0) class. The higher factorial moments are derived recursively as in the (a,b,0) case. The raw moments $E[N_T^k]$ can be derived using the factorial moments. The variance, as derived from the factorial moments, is:

(14)……..$\displaystyle Var[N_T]=\frac{(a+b) \ [1-(a+b+1) \ P_0]}{[(1-a) \ (1-P_0) ]^2}$

Example 1
It is helpful to go through an example. First, we set up an (a,b,0) distribution: a negative binomial distribution with parameters $r=2$ and $\theta=3$. The (a,b,0) parameters are $a=3/4$ and $b=3/4$. The following gives the pf and the recursive relation for this negative binomial distribution, as well as the mean, variance and pgf.

……….$\displaystyle P_k=(1+k) \ \frac{1}{16} \ \biggl(\frac{3}{4} \biggr)^k \ \ \ \ \ \ \ \ k=0,1,2,3,\cdots$

……….$\displaystyle P_k=\biggl(\frac{3}{4}+\frac{3}{4} \ \frac{1}{k} \biggr) \ P_{k-1} \ \ \ \ \ \ \ \ k=1,2,3,\cdots$

……….$\displaystyle E[N]=6$

……….$\displaystyle Var[N]=24$

……….$\displaystyle P(z)=[1-3 \ (z-1)]^{-2}$

The following table shows the first 5 probabilities for the zero-truncated negative binomial distribution.

Table – Zero-Truncated Negative Binomial

| $\bold k$ | (a,b,0) $\bold P_{\bold k}$ | Zero-Truncated $\bold P_{\bold k}^{\bold T}$ |
|---|---|---|
| 0 | $\displaystyle \frac{1}{16}$ | |
| 1 | $\displaystyle \frac{3}{32}$ | $\displaystyle \frac{3}{30}$ |
| 2 | $\displaystyle \frac{27}{256}$ | $\displaystyle \frac{27}{240}$ |
| 3 | $\displaystyle \frac{27}{256}$ | $\displaystyle \frac{27}{240}$ |
| 4 | $\displaystyle \frac{405}{4096}$ | $\displaystyle \frac{405}{3840}$ |
| 5 | $\displaystyle \frac{729}{8192}$ | $\displaystyle \frac{729}{7680}$ |

The probabilities $P_k$ are generated by either the (a,b,0) pf or the recursive relation (1). The probabilities $P_k^T$ are generated either by the scaling formula (7) or by the recursive relation (6). The following lists the mean, variance and pgf of the zero-truncated negative binomial example.

……….$\displaystyle E[N_T]=\frac{16}{15} \ 6=\frac{32}{5}=6.4$

……….$\displaystyle Var[N_T]=\frac{576}{25}=23.04$

……….$\displaystyle P^T(z)=\frac{16}{15} \biggl( [1-3 \ (z-1)]^{-2}-\frac{1}{16} \biggr)$
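The calculations in Example 1 can be reproduced with a short script. The following is a minimal Python sketch, not part of the original derivation. The helper names `ab0_probs` and `zero_truncate` are made up for illustration, and exact fractions are used so that the output can be compared with the table.

```python
from fractions import Fraction

def ab0_probs(a, b, p0, n):
    """Return [P_0, ..., P_n] using the (a,b,0) recursion (1)."""
    probs = [p0]
    for k in range(1, n + 1):
        probs.append((a + b / k) * probs[-1])
    return probs

def zero_truncate(probs):
    """Apply (7): divide P_1, P_2, ... by 1 - P_0 (the result starts at k = 1)."""
    scale = 1 / (1 - probs[0])
    return [scale * p for p in probs[1:]]

# Example 1: negative binomial with r = 2 and theta = 3
a, b, p0 = Fraction(3, 4), Fraction(3, 4), Fraction(1, 16)
base = ab0_probs(a, b, p0, 5)
trunc = zero_truncate(base)      # trunc[0] is P_1^T

# mean from (9) with E[N] = 6, variance from (14)
mean_T = Fraction(6) / (1 - p0)
var_T = (a + b) * (1 - (a + b + 1) * p0) / ((1 - a) * (1 - p0)) ** 2

for k, p in enumerate(trunc, start=1):
    print(k, p)
print(mean_T, var_T)
```

Note that `Fraction` prints values in lowest terms, so an entry such as $3/30$ appears as `1/10`.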

Zero-Modified Distributions

The goal of this section is to derive a zero-modified distribution from a zero-truncated distribution. The zero-truncated distribution may be derived from an (a,b,0) distribution, as discussed in the preceding section, or it may be a truncated distribution that does not originate from the (a,b,0) class.

We now take a zero-truncated distribution as a given and derive the probabilities $P_k^M$ and other distributional quantities. As in the case of zero-truncated distribution, one way to generate probabilities is through the recursion process:

(15)……..$\displaystyle \frac{P_k^M}{P_{k-1}^M}=a + \frac{b}{k} \ \ \ \ \ \ \ \ \ \ \ \ \ k=2,3,4 \cdots$

The probability $P_0^M>0$ is an assumed value. The probability $P_1^M$ is the value that ensures that all the probabilities sum to 1. As indicated the recursion begins at $k=2$. Another way to calculate probabilities is through the zero-truncated distribution:

(16)……..$\displaystyle P_k^M=(1-P_0^M) \ P_k^T \ \ \ \ \ \ \ \ \ \ \ \ \ k=1,2,3,4 \cdots$

Of course, if the zero-truncated distribution is based on a distribution from the (a,b,0) class, plugging (7) into (16) expresses the zero-modified probabilities in terms of the (a,b,0) probabilities:

(17)……..$\displaystyle P_k^M=\frac{1-P_0^M}{1-P_0} \ P_k \ \ \ \ \ \ \ \ \ \ \ \ \ k=1,2,3,4 \cdots$

Further distributional quantities can now be derived:

(18)……..$\displaystyle P^M(z)=P_0^M \cdot 1+(1-P_0^M) \ P^T(z)$

(19)……..$\displaystyle E[N_M]=(1-P_0^M) \ E[N_T]$

(20)……..$\displaystyle Var[N_M]=(1-P_0^M) \ Var[N_T]+P_0^M \ (1-P_0^M) \ E[N_T]^2$

The result (18) is the pgf of the zero-modified distribution based on the pgf of the given zero-truncated distribution. In words, (19) says that the mean of the zero-modified distribution is $1-P_0^M$ times the mean of the given zero-truncated distribution, and (20) says that the variance of the zero-modified distribution is $1-P_0^M$ times the variance of the given zero-truncated distribution plus $P_0^M (1-P_0^M)$ times the square of the mean of the zero-truncated distribution.

If the given zero-truncated distribution is actually obtained from a member of the (a,b,0) class, then the above three results can be expressed in terms of (a,b,0) information, after plugging the corresponding information for $N_T$ into (18), (19) and (20).

(21)……..$\displaystyle P^M(z)=\biggl(1-\frac{1-P_0^M}{1-P_0} \biggr) \cdot 1+\frac{1-P_0^M}{1-P_0} \ P(z)$

(22)……..$\displaystyle E[N_M]=\frac{1-P_0^M}{1-P_0} \ E[N]$

(23)……..$\displaystyle Var[N_M]=\frac{1-P_0^M}{1-P_0} \ Var[N]+ \biggl(1-\frac{1-P_0^M}{1-P_0} \biggr) \ \frac{1-P_0^M}{1-P_0} \ E[N]^2$

Example 2
Consider the zero-truncated negative binomial distribution from Example 1. We now generate information on the corresponding zero-modified negative binomial distribution with the assumed value of $P_0^M=0.2$. The following table gives several probabilities.

Table – Zero-Modified Negative Binomial

| $\bold k$ | (a,b,0) $\bold P_{\bold k}$ | Zero-Truncated $\bold P_{\bold k}^{\bold T}$ | Zero-Modified $\bold P_{\bold k}^{\bold M}$ |
|---|---|---|---|
| 0 | $\displaystyle \frac{1}{16}$ | | 0.2 |
| 1 | $\displaystyle \frac{3}{32}$ | $\displaystyle \frac{3}{30}$ | $\displaystyle \frac{2.4}{30}$ |
| 2 | $\displaystyle \frac{27}{256}$ | $\displaystyle \frac{27}{240}$ | $\displaystyle \frac{21.6}{240}$ |
| 3 | $\displaystyle \frac{27}{256}$ | $\displaystyle \frac{27}{240}$ | $\displaystyle \frac{21.6}{240}$ |
| 4 | $\displaystyle \frac{405}{4096}$ | $\displaystyle \frac{405}{3840}$ | $\displaystyle \frac{324}{3840}$ |
| 5 | $\displaystyle \frac{729}{8192}$ | $\displaystyle \frac{729}{7680}$ | $\displaystyle \frac{583.2}{7680}$ |

The zero-modified probabilities $P_k^M$ are calculated according to (16). With the assumed value $P_0^M=0.2$, $1-P_0^M=0.8$. We simply multiply each zero-truncated probability by 0.8. Using (18), (19) and (20), we obtain the mean, variance and pgf of the zero-modified negative binomial example.

……….$\displaystyle E[N_M]=0.8 \ E[N_T]=0.8 (6.4)=5.12$

……….$\displaystyle Var[N_M]=24.9856$

……….$\displaystyle P^M(z)=0.2+0.8 \biggl[ \frac{16}{15} \biggl( [1-3 \ (z-1)]^{-2}-\frac{1}{16} \biggr) \biggr]$
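As a cross-check of Example 2, the following Python sketch builds the zero-modified probabilities directly from the (a,b,0) pf via (17) and confirms that they satisfy the recursion (15) from $k=2$ onward. The variable and function names are illustrative only, and the truncated mean and variance are taken from Example 1 rather than recomputed.

```python
from fractions import Fraction

p0_base = Fraction(1, 16)    # P_0 of the base negative binomial (r = 2, theta = 3)
p0_mod = Fraction(1, 5)      # assumed value P_0^M = 0.2

def base_pf(k):
    """(a,b,0) negative binomial pf from Example 1."""
    return (1 + k) * Fraction(1, 16) * Fraction(3, 4) ** k

# formula (17): rescale the (a,b,0) probabilities for k >= 1
scale = (1 - p0_mod) / (1 - p0_base)
mod = {0: p0_mod}
for k in range(1, 6):
    mod[k] = scale * base_pf(k)

# the recursion (15) should hold for k >= 2 with a = b = 3/4
a = b = Fraction(3, 4)
recursion_ok = all(mod[k] == (a + b / k) * mod[k - 1] for k in range(2, 6))

mean_T, var_T = Fraction(32, 5), Fraction(576, 25)    # from Example 1
mean_M = (1 - p0_mod) * mean_T                        # formula (19)
var_M = (1 - p0_mod) * var_T + p0_mod * (1 - p0_mod) * mean_T ** 2   # formula (20)

print(recursion_ok, float(mean_M), float(var_M))
```

The recursion deliberately starts at $k=2$: the ratio $P_1^M / P_0^M$ is not governed by $a$ and $b$, because $P_0^M$ is an assumed value.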

Additional Zero-Truncated Distributions

As indicated earlier, the (a,b,1) class contains distributions other than the ones derived from the three (a,b,0) distributions. These distributions also have zero-truncated versions as well as zero-modified versions. We discuss the truncated versions. They are: the extended truncated negative binomial (ETNB) distribution, the logarithmic distribution and the Sibuya distribution. The ETNB distribution results from relaxing the $r$ parameter of the negative binomial distribution. The logarithmic distribution and the Sibuya distribution are derived from the ETNB distribution. The modified versions of these three distributions can then be obtained by going through the process outlined in the preceding section.

ETNB
Recall that the (a,b,0) negative binomial distribution has two parameters $r$ and $\theta$. The following gives the parameters $a$ and $b$ used in the (a,b,0) recursion and the first two probabilities.

……….$\displaystyle a=\frac{\theta}{1+\theta} \ \ \ \ \ \ \ \ \ \ \ \ \ \ b=(r-1) \ \frac{\theta}{1+\theta} \ \ \ \ \ \ r>0, \ \theta >0$

……….$\displaystyle P_0=\biggl( \frac{1}{1+\theta} \biggr)^r \ \ \ \ \ \ P_1=r \ \biggl( \frac{1}{1+\theta} \biggr)^r \biggl( \frac{\theta}{1+\theta} \biggr)=\frac{r \ \theta}{(1+\theta)^{r+1}}$

The extended truncated negative binomial distribution results from extending the $r$ parameter so that $-1<r<0$ is allowed in addition to the usual $r>0$. With the extension of $r$, the ETNB probabilities are generated according to the truncated probabilities of (7). In effect, we pretend that we are starting from a base (a,b,0) negative binomial distribution even though the $r$ parameter could satisfy $-1<r<0$. Thus the two parameters of the zero-truncated ETNB distribution are given by the following:

(24)……..$\displaystyle a=\frac{\theta}{1+\theta} \ \ \ \ \ \ b=(r-1) \ \frac{\theta}{1+\theta} \ \ \ \ \ \ -1<r<0, \ \theta>0$

What do we do with the ETNB parameters indicated in (24)? Using these $a$ and $b$, we can generate the “negative binomial” values $P_k$ according to the recursive relation (1) with $P_0=[1/(1+\theta)]^r$. With $r$ negative, these values are not probabilities: $P_0$ exceeds 1 and the values $P_k$ for $k \ge 1$ are negative. However, the factor $1/(1-P_0)$ is also negative when $r$ is negative, so by (7) the zero-truncated probabilities $P_k^T$ are positive. Thus the “negative binomial” distribution using a negative $r$ is not really a distribution. It is just a device to define the ETNB distribution.

Using the idea in the preceding paragraph, we can also come up with a direct formula for the ETNB probabilities $P_k^T$. The following gives the first three probabilities.

……….$\displaystyle P_1^T=\frac{1}{1-P_0} \ P_1=\frac{1}{1-P_0} \ \frac{r \ \theta}{(1+\theta)^{r+1}}=\frac{r \theta}{(1+\theta)^{r+1}-(1+\theta) }$

……….$\displaystyle \begin{aligned} P_2^T&=\frac{1}{1-P_0} \ P_2\\&=\frac{1}{1-P_0} \ \frac{r (r+1)}{2} \frac{1}{(1+\theta)^r} \ \biggl(\frac{\theta}{1+\theta} \biggr)^2 \\&=\frac{r (r+1)}{2} \ \frac{1}{(1+\theta)^r-1} \ \biggl(\frac{\theta}{1+\theta} \biggr)^2 \end{aligned}$

……….$\displaystyle \begin{aligned} P_3^T&=\frac{1}{1-P_0} \ P_3\\&=\frac{1}{1-P_0} \ \frac{r (r+1) (r+2)}{3!} \frac{1}{(1+\theta)^r} \ \biggl(\frac{\theta}{1+\theta} \biggr)^3 \\&=\frac{r (r+1) (r+2)}{3!} \ \frac{1}{(1+\theta)^r-1} \ \biggl(\frac{\theta}{1+\theta} \biggr)^3 \end{aligned}$

Based on the pattern of the above three probabilities, the ETNB probability $P_k^T$, $k=1,2,3,\cdots$, is:

(25)……..$\displaystyle P_k^T=\frac{1}{1-P_0} \ P_k=\frac{r (r+1) \cdots (r+k-1)}{k!} \ \frac{1}{(1+\theta)^r-1} \ \biggl(\frac{\theta}{1+\theta} \biggr)^k$

All other distributional quantities such as the pgf, the mean and higher moments can be derived based on the ETNB pf $P_k^T$. For example, the mean, variance and pgf are:

(26)……..$\displaystyle E[N_T]=\frac{1}{1-P_0} \ E[N]=\frac{1}{1-P_0} \ r \ \theta=\frac{r \ \theta}{1-(1+\theta)^{-r}}$

(27)……..$\displaystyle \begin{aligned} Var[N_T]&=\frac{1}{1-P_0} \ Var[N]+\biggl(1-\frac{1}{1-P_0} \biggr) \ \frac{1}{1-P_0} \ E[N]^2 \\&=r \ \theta \ \frac{(1+\theta)-(1+\theta+ r \ \theta) \ (1+\theta)^{- r}}{[1-(1+\theta)^{-r}]^2} \end{aligned}$

(28)……..$\displaystyle \begin{aligned} P^T(z)&=\frac{1}{1-P_0} \ (P(z)-P_0) \\&=\frac{1}{1-(1+\theta)^{-r}} \ \biggl[ \biggl(1-\theta (z-1) \biggr)^{-r} -(1+\theta)^{-r}\biggr] \end{aligned}$

Logarithmic Distribution
This is a truncated distribution that is derived from ETNB by letting $r \rightarrow 0$. The following shows the information that is needed for the recursive generation of probabilities.

(29)……..$\displaystyle a=\frac{\theta}{1+\theta} \ \ \ \ \ \ \ \ b=-\frac{\theta}{1+\theta}$

(30)……..$\displaystyle P_1^T=\frac{\theta}{(1+\theta) \ \ln(1+\theta)}$

The parameter $b$ is obtained by letting $r \rightarrow 0$ in the $b$ for ETNB. The logarithmic $P_1^T$ is the limit of the ETNB $P_1^T$ as $r \rightarrow 0$ (using L’Hôpital’s rule). The rest of the pf $P_k^T$ for $k=2,3,\cdots$ can be generated from the recursive relation (6). Unlike a zero-truncated distribution that is derived from an (a,b,0) distribution, the distributional quantities of the logarithmic distribution cannot be derived from an (a,b,0) distribution. Thus further information about the logarithmic distribution must be obtained from its pf. The mean and variance for the logarithmic distribution are:

(31)……..$\displaystyle E[N_T]=\frac{\theta}{\ln(1+\theta) }$

(32)……..$\displaystyle Var[N_T]=\frac{\theta \biggl[1+\theta-\theta / \ln(1+\theta) \biggr]}{\ln(1+\theta)}$
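The mean (31) and variance (32) can be checked numerically against the recursion. The sketch below assumes a hypothetical parameter value $\theta=3$ and truncates the infinite sums at 800 terms, which is more than enough here since the terms decay geometrically at rate $a=3/4$.

```python
import math

theta = 3.0                          # hypothetical parameter value
log_term = math.log(1 + theta)
a = theta / (1 + theta)              # formula (29)
b = -a

probs = [theta / ((1 + theta) * log_term)]     # P_1^T from (30)
for k in range(2, 801):
    probs.append((a + b / k) * probs[-1])      # recursion (6)

# brute-force sums over the computed probabilities
total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs, start=1))
var = sum(k * k * p for k, p in enumerate(probs, start=1)) - mean ** 2

mean_formula = theta / log_term                                   # (31)
var_formula = theta * (1 + theta - theta / log_term) / log_term   # (32)

print(total, mean, mean_formula, var, var_formula)
```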

Sibuya Distribution
This is a truncated distribution that is derived from ETNB by letting $\theta \rightarrow \infty$ with $-1<r<0$. The following shows the information that is needed for the recursive generation of probabilities.

(33)……..$\displaystyle a=1 \ \ \ \ \ \ \ \ b=r-1$

(34)……..$\displaystyle P_1^T=-r$

All of the three items are obtained by letting $\theta \rightarrow \infty$ in the corresponding items in ETNB. To see that $\displaystyle P_1^T=-r$, rewrite the ETNB $P_1^T$ as follows:

……….$\displaystyle P_1^T=\frac{r \theta}{(1+\theta)^{r+1}-(1+\theta)}=\frac{r \theta}{(1+\theta) \ [(1+\theta)^r-1]}=r \ \frac{\theta}{1+\theta} \ \frac{1}{(1+\theta)^r-1}$

As $\theta \rightarrow \infty$, the ratio $\theta / (1+\theta)$ goes to 1. As $\theta \rightarrow \infty$, $(1+\theta)^r$ goes to 0 because $r$ is negative. Thus the above $P_1^T$ goes to $-r$. With the $a$ and $b$ in (33) and the $P_1^T$ in (34), the rest of the Sibuya pf can be generated by the recursive relation in (6). Note that the mean does not exist for the Sibuya distribution. The following is the pgf of the Sibuya distribution.

(35)……..$\displaystyle P^T(z)=1-(1-z)^{-r}$
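The Sibuya recursion and the pgf (35) can be compared numerically. The following sketch assumes a hypothetical value $r=-1/2$ and evaluates both sides at $z=1/2$, where the power series converges quickly. The function names are illustrative only.

```python
import math

r = -0.5                   # hypothetical value with -1 < r < 0
a, b = 1.0, r - 1          # formula (33)

probs = [-r]               # P_1^T from (34)
for k in range(2, 201):
    probs.append((a + b / k) * probs[-1])     # recursion (6)

def pgf_series(z):
    """Partial sum of P_k^T z^k over the computed terms."""
    return sum(p * z ** k for k, p in enumerate(probs, start=1))

def pgf_closed(z):
    """Closed form (35)."""
    return 1 - (1 - z) ** (-r)

print(pgf_series(0.5), pgf_closed(0.5))
```

The comparison is made at a point strictly inside the unit interval on purpose. The partial sums of the probabilities themselves approach 1 only slowly because the tail is heavy, which is consistent with the mean not existing.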

Once these three zero-truncated distributions are obtained, we can derive the zero-modified versions of these distributions in the process described earlier.

Example 3
We demonstrate how ETNB is calculated. Let $r=-\frac{1}{2}$ and $\theta=3$. Then the parameters for the “artificial” negative binomial distribution are:

……….$\displaystyle a=\frac{3}{4} \ \ \ \ \ \ \ \ b=\biggl(-\frac{1}{2}-1 \biggr) \frac{3}{4}=-\frac{9}{8}$

The $P_0$ for the artificial negative binomial distribution is $P_0=(1/4)^{-0.5}=2$, making $1/(1-P_0)=-1$. We generate the fake negative binomial probabilities recursively using the $a$ and $b$. Then we multiply by the $1/(1-P_0)=-1$ to get the zero-truncated probabilities according to (7).

Table – Zero-Truncated ETNB

| $\bold k$ | Artificial $\bold P_{\bold k}$ | Zero-Truncated $\bold P_\bold k^{\bold T}$ |
|---|---|---|
| 0 | $\displaystyle 2$ | |
| 1 | $\displaystyle -\frac{3}{4}$ | $\displaystyle \frac{3}{4}$ |
| 2 | $\displaystyle -\frac{9}{64}$ | $\displaystyle \frac{9}{64}$ |
| 3 | $\displaystyle -\frac{27}{512}$ | $\displaystyle \frac{27}{512}$ |
| 4 | $\displaystyle -\frac{405}{16384}$ | $\displaystyle \frac{405}{16384}$ |
| 5 | $\displaystyle -\frac{1701}{131072}$ | $\displaystyle \frac{1701}{131072}$ |

The column labeled artificial $P_k$ obviously does not consist of probabilities. It is generated recursively using $a=3/4$ and $b=-9/8$. Multiplying this column by $1/(1-P_0)=-1$ gives the ETNB probabilities, which can also be computed directly using (25).
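Both routes to the ETNB probabilities, the direct formula (25) and the artificial recursion followed by (7), can be compared in code. This is a minimal floating-point sketch. The function names `etnb_pf`, `etnb_recursive`, `etnb_mean` and `etnb_var` are hypothetical, and the last two simply transcribe (26) and (27).

```python
import math

def etnb_pf(k, r, theta):
    """ETNB probability P_k^T from the direct formula (25), for k >= 1."""
    rising = 1.0
    for i in range(k):
        rising *= r + i                    # r (r+1) ... (r+k-1)
    return rising / math.factorial(k) / ((1 + theta) ** r - 1) * (theta / (1 + theta)) ** k

def etnb_recursive(r, theta, n):
    """P_1^T, ..., P_n^T via the artificial (a,b,0) values and formula (7)."""
    a = theta / (1 + theta)
    b = (r - 1) * a
    p0 = (1 + theta) ** (-r)               # artificial P_0, exceeds 1 when r < 0
    artificial = [p0]
    for k in range(1, n + 1):
        artificial.append((a + b / k) * artificial[-1])
    return [p / (1 - p0) for p in artificial[1:]]

def etnb_mean(r, theta):
    """Formula (26)."""
    return r * theta / (1 - (1 + theta) ** (-r))

def etnb_var(r, theta):
    """Formula (27)."""
    q = (1 + theta) ** (-r)
    return r * theta * ((1 + theta) - (1 + theta + r * theta) * q) / (1 - q) ** 2

direct = [etnb_pf(k, -0.5, 3.0) for k in range(1, 6)]
recursive = etnb_recursive(-0.5, 3.0, 5)
print(direct)
print(recursive)
print(etnb_mean(-0.5, 3.0), etnb_var(-0.5, 3.0))
```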

Using (26) and (27), the ETNB mean and variance are $E[N_T]=3/2$ and $Var[N_T]=3/2$. With an assumed value of $P_0^M=0.1$, we generate the first 5 zero-modified ETNB probabilities in the following table.

Table – Zero-Modified ETNB

| $\bold k$ | Artificial $\bold P_{\bold k}$ | Zero-Truncated $\bold P_\bold k^{\bold T}$ | Zero-Modified $\bold P_\bold k^{\bold M}$ |
|---|---|---|---|
| 0 | $\displaystyle 2$ | | 0.1 |
| 1 | $\displaystyle -\frac{3}{4}$ | $\displaystyle \frac{3}{4}$ | $\displaystyle \frac{2.7}{4}$ |
| 2 | $\displaystyle -\frac{9}{64}$ | $\displaystyle \frac{9}{64}$ | $\displaystyle \frac{8.1}{64}$ |
| 3 | $\displaystyle -\frac{27}{512}$ | $\displaystyle \frac{27}{512}$ | $\displaystyle \frac{24.3}{512}$ |
| 4 | $\displaystyle -\frac{405}{16384}$ | $\displaystyle \frac{405}{16384}$ | $\displaystyle \frac{364.5}{16384}$ |
| 5 | $\displaystyle -\frac{1701}{131072}$ | $\displaystyle \frac{1701}{131072}$ | $\displaystyle \frac{1530.9}{131072}$ |

With the assumed value of $P_0^M=0.1$, the zero-modified probabilities are obtained by multiplying the zero-truncated probabilities by $1-P_0^M=0.9$. Using (19) and (20), the zero-modified ETNB mean and variance are: $E[N_M]=1.35$ and $Var[N_M]=1.5525$.
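The zero-modified ETNB mean and variance can also be confirmed by brute-force summation rather than through (19) and (20). The sketch below is only a numerical sanity check under the Example 3 values. It truncates the sums at 600 terms, which is ample here because the terms decay roughly like $(3/4)^k$.

```python
import math

r, theta, p0_mod = -0.5, 3.0, 0.1      # Example 3 values
a = theta / (1 + theta)
b = (r - 1) * a

# zero-truncated ETNB probabilities via the recursion (6), starting from P_1^T = 3/4
p_trunc = [0.75]
for k in range(2, 601):
    p_trunc.append((a + b / k) * p_trunc[-1])

# formula (16): zero-modified probabilities for k >= 1
p_mod = [(1 - p0_mod) * p for p in p_trunc]

total = p0_mod + sum(p_mod)
mean_M = sum(k * p for k, p in enumerate(p_mod, start=1))
second_raw = sum(k * k * p for k, p in enumerate(p_mod, start=1))
var_M = second_raw - mean_M ** 2

print(total, mean_M, var_M)
```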

Practice Problems

Practice problems on (a,b,0) class

Practice problems on (a,b,1) class


$\copyright$ 2019 – Dan Ma

# The (a,b,0) class

This post introduces the class of discrete distributions called the (a,b,0) class.

A counting distribution is a discrete random variable that takes on values of non-negative integers 0,1,2, … Examples include the Poisson distribution, the binomial distribution and the negative binomial distribution (see here for a discussion). These distributions are potential models for the number of occurrences for some random events of interest, e.g. the number of losses in actuarial applications. The discussion below shows that the notion of (a,b,0) class is another way to describe the big three counting distributions of Poisson, binomial and negative binomial. The notion of (a,b,1) class is a generalization of the (a,b,0) class and is defined in a subsequent post.

The (a,b,0) Class

The (a,b,0) class is at heart a recursive algorithm to generate probabilities. Let’s fix some notations. Let $N$ be a counting random variable. For each $k=0,1,2,3,\cdots$, let $P_k=P(N=k)$. The counting random variable $N$ is said to be a member of the (a,b,0) class of distributions if for some constants $a$ and $b$ the following recursive relation holds

$\displaystyle (1) \ \ \ \ \ \frac{P_k}{P_{k-1}}=a + \frac{b}{k} \ \ \ \ \ \ \ \ \ \ \ \ \ k=1,2,3,\cdots$

Note that the recursive relation (1) generates all the probabilities $P_k$ for all integers $k$ starting at 1. The relation (1) does not account for $P_0$. Does that mean that the initial probability $P_0$ can be any arbitrary probability value? Note that the recursive relation (1) means that each $P_k$ is ultimately expressed in terms of $P_0$.

$P_0=P_0$

$\displaystyle P_1=(a+b) P_0$

$\displaystyle P_2=\biggl(a+\frac{b}{2} \biggr) (a+b) P_0$

$\cdots$

$\displaystyle P_k=\biggl(a+\frac{b}{k} \biggr) \biggl(a+\frac{b}{k-1} \biggr) \cdots \biggl(a+\frac{b}{2} \biggr) (a+b) P_0$

$\cdots$

When $a$ and $b$ are fixed, the value of $P_0$ is also fixed since the probabilities must sum to 1. In fact $P_0$ is the following value.

$\displaystyle (2) \ \ \ \ \ P_0 =W^{-1}$

where $\displaystyle W=\biggl[ 1+(a+b)+\biggl(a+\frac{b}{2} \biggr)(a+b)+\cdots+ \biggl \{ \biggl(a+\frac{b}{k} \biggr) \cdots \biggl(a+\frac{b}{2} \biggr)(a+b) \biggr \} +\cdots \biggr]$

Thus a member of the (a,b,0) class has two parameters, namely $a$ and $b$, which completely determine the distribution.

Example 1
As an example, let $a=0$ and $b=\lambda$ where $\lambda>0$ is a fixed positive constant. Using (1), we see that

$\displaystyle P_1=\lambda \ P_0$
$\displaystyle P_2=\frac{1}{2!} \ \lambda^2 P_0$
$\displaystyle P_3=\frac{1}{3!} \ \lambda^3 P_0$
$\cdots$
$\displaystyle P_n=\frac{1}{n!} \lambda^n P_0$
$\cdots$

According to (2), $P_0=e^{-\lambda}$:

$\displaystyle \begin{aligned} P_0&=\biggl(1+ \lambda+\frac{1}{2!} \ \lambda^2 +\cdots+\frac{1}{n!} \lambda^n +\cdots \biggr)^{-1} \\&=(e^{\lambda})^{-1}\\&=e^{-\lambda} \end{aligned}$

With $P_0=e^{-\lambda}$, the probabilities $P_n$ are from a Poisson distribution. Thus, when the parameter $a$ is 0, and the parameter $b$ is a positive constant, the corresponding distribution from the (a,b,0) class is a Poisson distribution.
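The normalization in (2) is easy to approximate numerically by truncating the series $W$. The following is a minimal sketch. The function name `p0_from_ab`, the default cutoff of 500 terms and the value $\lambda=2.5$ are all assumptions made for illustration, and the cutoff is only adequate when the terms of $W$ decay quickly.

```python
import math

def p0_from_ab(a, b, n_terms=500):
    """Approximate P_0 = 1/W from (2) by truncating W after n_terms terms."""
    term, w = 1.0, 1.0
    for k in range(1, n_terms + 1):
        term *= a + b / k        # term now equals P_k / P_0
        w += term
    return 1.0 / w

lam = 2.5                              # hypothetical Poisson parameter
p0_poisson = p0_from_ab(0.0, lam)      # a = 0, b = lambda as in Example 1
print(p0_poisson, math.exp(-lam))
```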

Only Three Members in the (a,b,0) Class

In essence, the (a,b,0) class has only three members, namely the big 3 discrete distributions – the Poisson distribution, the binomial distribution and the negative binomial distribution, with each distribution represented by a different sign of the parameter $a$. Using the recursive relation (1), it can be shown that each of the big three distributions belongs to the (a,b,0) class. The following table shows the parameters $a$ and $b$ in the three cases.

Table 1

| Distribution | Usual Parameters | Parameter a | Parameter b |
|---|---|---|---|
| Poisson | $\lambda$ | 0 | $\lambda$ |
| Binomial | $n$ and $p$ | $\displaystyle -\frac{p}{1-p}$ | $\displaystyle (n+1) \ \frac{p}{1-p}$ |
| Negative binomial | $r$ and $p$ | $1-p$ | $(r-1) \ (1-p)$ |
| Negative binomial | $r$ and $\theta$ | $\displaystyle \frac{\theta}{1+\theta}$ | $\displaystyle (r-1) \ \frac{\theta}{1+\theta}$ |
| Geometric | $p$ | $1-p$ | 0 |
| Geometric | $\theta$ | $\displaystyle \frac{\theta}{1+\theta}$ | 0 |

Table 1 shows how to parametrize the three distributions. For example, for the binomial distribution with parameters $n$ (the number of trials) and $p$ (the probability of success), the (a,b,0) parameters are $a=-p/(1-p)$ and $b=-(n+1) a$. The two rows for negative binomial reflect two different parametrizations. Of course, the geometric distribution is simply a negative binomial distribution when the parameter $r=1$. Essentially Table 1 consists of three different distributions.

Table 1 works in the opposite direction as well. Any set of (a,b,0) parameters $a$ and $b$ must fit into one of the distributions listed in Table 1. In other words, the recursive relation (1) produces no new counting distribution. Any counting distribution satisfying (1) must be one of the big 3 counting distributions listed in Table 1.

Note that under the recursive relation (1), not all combinations of $a$ and $b$ will make a probability distribution. For example, when both $a$ and $b$ are negative constants, the resulting values $P_k$ are negative for odd $k$. More generally, when $a+b<0$, the value $P_1=(a+b) P_0$ is negative, so no probability distribution results. When $a+b=0$, $P_0=1$, i.e. the distribution is a point mass at 0. So we restrict attention to the case where $a+b>0$.

To echo the point made previously, it is the case that when $a+b>0$ and when the recursive relation (1) produces a viable probability distribution, the resulting distribution must be one of the three distributions listed in Table 1. This point is not entirely obvious. Any interested reader can see chapter 6 of [1].

Table 1 indicates that the sign of the parameter $a$ determines the form of the (a,b,0) distribution. If $a=0$, it is a Poisson distribution. If $a$ is negative, it is a binomial distribution. If $a$ is positive, it is a negative binomial distribution.

Examples

We now present a few more examples illustrating the working of the (a,b,0) recursive relation.

Example 2
This example illustrates that knowing three consecutive probabilities of a member of the (a,b,0) class determines the entire distribution. For example, suppose we know that

$P_1=0.0567$
$P_2=0.07938$
$P_3=0.09261$

These three consecutive probabilities produce the following two linear equations of $a$ and $b$.

$\displaystyle \frac{P_2}{P_1}=\frac{0.07938}{0.0567}=a+\frac{b}{2}$
$\displaystyle \frac{P_3}{P_2}=\frac{0.09261}{0.07938}=a+\frac{b}{3}$

Solving these two linear equations produces $a=0.7$ and $b=1.4$. Since $a$ is positive, this is a negative binomial distribution. The corresponding negative binomial parameters are $r=3$ and $p=0.3$. With this information, the (a,b,0) distribution in question is completely determined. The following are several distributional quantities.

$P_0=0.3^3=0.027$
$P_4=(0.7+\frac{1.4}{4}) \ P_3=0.0972405$
$\displaystyle E(N)=r \frac{1-p}{p}=3 \frac{0.7}{0.3}=7$
$\displaystyle Var(N)=r \frac{1-p}{p^2}=3 \frac{0.7}{0.3^2}=\frac{7}{0.3}=23.3333$
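The solution of the two linear equations in Example 2 can be verified with exact arithmetic. The sketch below is illustrative only. It parses the three given probabilities as exact fractions, and the conversion to $r$ and $p$ uses the negative binomial row of Table 1.

```python
from fractions import Fraction

p1, p2, p3 = Fraction("0.0567"), Fraction("0.07938"), Fraction("0.09261")

ratio2 = p2 / p1             # equals a + b/2
ratio3 = p3 / p2             # equals a + b/3

b = 6 * (ratio2 - ratio3)    # subtracting the two equations leaves b/6
a = ratio2 - b / 2

# negative binomial (r, p) parametrization: a = 1 - p, b = (r - 1)(1 - p)
p = 1 - a
r = 1 + b / a
p0 = p ** r                  # r is a whole number here, so the power stays exact

print(a, b, r, p, p0)
```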

Example 3
The three given probabilities do not have to be consecutive in order to determine the entire (a,b,0) distribution. Suppose we are given the following probabilities.

$P_1=0.33554432$
$P_2=0.29360128$
$P_4=0.0458752$

Applying the recursive relation (1) produces the following equations.

$\displaystyle \frac{P_2}{P_1}=\frac{0.29360128}{0.33554432}=a+\frac{b}{2}$
$\displaystyle \frac{P_3}{P_2}=\frac{P_3}{0.29360128}=a+\frac{b}{3}$
$\displaystyle \frac{P_4}{P_3}=\frac{0.0458752}{P_3}=a+\frac{b}{4}$

The above 3 equations lead to the following two equations.

$\displaystyle \frac{P_2}{P_1}=\frac{0.29360128}{0.33554432}=a+\frac{b}{2}$
$\displaystyle P_4=0.0458752=\biggl(a+\frac{b}{4} \biggr) \biggl(a+\frac{b}{3} \biggr) \ 0.29360128$

Of the above two equations, one is linear and one is quadratic. Solving them produces two candidate solutions. One is $a=-2.375$ and $b=6.5$, which is rejected because it makes $a+b/3$ negative and hence $P_3$ negative. The other is $a=-0.25$ and $b=2.25$. Since $a$ is negative, this is a binomial distribution. Using the translation in Table 1 gives the following equations.

$\displaystyle -\frac{p}{1-p}=-0.25 \ \ \ \ \ \ \ \ \ \ \ (n+1) \ \frac{p}{1-p}=2.25$

Solving these equations gives $n=8$ and $p=0.2$. The (a,b,0) distribution in question is then completely determined.
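The linear and quadratic equations of Example 3 can be solved with the quadratic formula. The following sketch substitutes $a=s-b/2$, where $s=P_2/P_1$, into the second equation and discards any root that would make $P_3$ negative. The variable names are illustrative only.

```python
import math

p1, p2, p4 = 0.33554432, 0.29360128, 0.0458752

s = p2 / p1        # a + b/2
q = p4 / p2        # (a + b/4)(a + b/3)

# With a = s - b/2: (s - b/4)(s - b/6) = q, that is
# b^2/24 - (5 s / 12) b + (s^2 - q) = 0.
A, B, C = 1 / 24, -5 * s / 12, s * s - q
disc = math.sqrt(B * B - 4 * A * C)
candidates = [(-B - disc) / (2 * A), (-B + disc) / (2 * A)]

solutions = []
for b_try in candidates:
    a_try = s - b_try / 2
    # keep only roots for which the implied ratio P_3 / P_2 is positive
    if a_try + b_try / 3 > 0:
        solutions.append((a_try, b_try))

a, b = solutions[0]
p = -a / (1 - a)           # from a = -p / (1 - p) in Table 1
n = round(-b / a - 1)      # from b = -(n + 1) a

print(candidates, a, b, n, p)
```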

Factorial Moments

Another distributional quantity that can give insight into the (a,b,0) class is the factorial moment. For any random variable $X$, its $n$th factorial moment is

$(3) \ \ \ \ \ \mu_{(n)}=E[X (X-1) (X-2) \cdots (X-(n-1))]$

For example, the first three factorial moments are:

$\mu_{(1)}=E[X]$

$\mu_{(2)}=E[X (X-1)]$

$\mu_{(3)}=E[X (X-1) (X-2)]$

For any member of the (a,b,0) class with parameters $a$ and $b$, the first factorial moment is:

$\displaystyle (4) \ \ \ \ \ \mu_{(1)}=\frac{a+b}{1-a}$

The higher (a,b,0) factorial moments can be obtained recursively as follows:

$\displaystyle (5) \ \ \ \ \ \mu_{(j)}=\frac{a j +b}{1-a} \ \mu_{(j-1)} \ \ \ \ \ \ \ j \ge 2$

The recursive formula (5) is a good way to determine the raw moments of a member of the (a,b,0) class. For example, the following calculation gives the second raw moment and the variance of the random variable $N$, assumed to be a member of the (a,b,0) class with parameters $a$ and $b$.

$\displaystyle \mu_{(1)}=E[N]=\frac{a+b}{1-a}$

$\displaystyle \mu_{(2)}=\frac{2 a +b}{1-a} \ \frac{a+b}{1-a}=\frac{(2a+b) (a+b)}{(1-a)^2}=E[N (N-1)]=E[N^2]-E[N]$

$\displaystyle E[N^2]=\frac{(2a+b) (a+b)}{(1-a)^2}+\frac{a+b}{1-a}=\frac{(a+b) (a+b+1)}{(1-a)^2}$

$\displaystyle Var(N)=E[N^2]-E[N]^2=\frac{a+b}{(1-a)^2}$
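
As a numerical illustration, the following Python sketch applies formulas (4) and (5) to the values $a=0.7$ and $b=1.4$ from Example 2 and recovers the mean and variance found there. The function name `factorial_moments` is an illustrative choice.

```python
def factorial_moments(a, b, jmax):
    """Return [mu_(1), ..., mu_(jmax)] using formulas (4) and (5)."""
    moments = [(a + b) / (1 - a)]
    for j in range(2, jmax + 1):
        moments.append((a * j + b) / (1 - a) * moments[-1])
    return moments

a, b = 0.7, 1.4   # the negative binomial values from Example 2
mu1, mu2 = factorial_moments(a, b, 2)

second_raw = mu2 + mu1          # E[N^2] = E[N(N-1)] + E[N]
variance = second_raw - mu1 ** 2

print(mu1)       # close to 7
print(variance)  # close to 23.3333, matching (a+b)/(1-a)^2
```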

One interesting characteristic of the (a,b,0) class is that knowing limited distributional information determines the distribution. Example 2 and Example 3 show that knowing three point masses completely determines the (a,b,0) distribution. The above derivation shows that knowing the mean and the variance also completely determines the (a,b,0) distribution.
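
To see how the mean and the variance pin down the parameters, note that dividing the mean formula by the variance formula gives $E[N]/Var(N)=1-a$. The following Python sketch inverts the two formulas on that basis. The function name `ab_from_mean_variance` is hypothetical, and the sketch does not check that the resulting pair $(a,b)$ corresponds to a valid member of the class.

```python
def ab_from_mean_variance(mean, variance):
    """Invert E[N] = (a+b)/(1-a) and Var(N) = (a+b)/(1-a)^2."""
    a = 1 - mean / variance        # since E[N] / Var(N) = 1 - a
    b = mean * (1 - a) - a         # since a + b = E[N] * (1 - a)
    return a, b

a, b = ab_from_mean_variance(7, 70 / 3)
print(a, b)   # close to 0.7 and 1.4, the values in Example 2
```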

Fitting (a,b,0) Distributions

If the (a,b,0) recursive formula in (1) generates no new distributions, why study the (a,b,0) class instead of focusing on the Poisson, binomial and negative binomial distributions individually? One reason for studying the recursive (a,b,0) formula is that it gives a graphical way to choose an appropriate member of the (a,b,0) class. To see this, rewrite (1) as follows:

$\displaystyle (6) \ \ \ \ \ k \ \frac{P_k}{P_{k-1}}=a k+ b \ \ \ \ \ \ \ \ \ \ \ \ \ k=1,2,3,\cdots$

Note that the quantity on the right side of (6) is a linear function of the integers $k$. If we plot the left hand side quantity of (6) with $k$ on the x-axis, the plot should be a linear one with the slope being the parameter $a$ and the y-intercept being the parameter $b$ (of course assuming it is an (a,b,0) distribution).

The relation (6) is a way to quickly determine whether a given sample is taken from a member of the (a,b,0) class. To do this, calculate the ratio of two consecutive data categories times $k$. In other words, compute ratios such as the following for values of $k$:

$\displaystyle (7) \ \ \ \ \ k \ \frac{\hat{P}_k}{\hat{P}_{k-1}}=k \ \frac{n_k}{n_{k-1}} \ \ \ \ \ \ \ \ \ \ \ \ \ k=1,2,3,\cdots$

where $n_k$ is the observed frequency for the category $k$. The ratio of $n_k$ to $n_{k-1}$ multiplied by $k$ is a stand-in for the left hand side of (6). Then plot these values against $k$. A linear trend that is observed in the graph is evidence that the data in the sample is taken from an (a,b,0) distribution.

The slope of the plotted line gives an indication of which (a,b,0) member to use. If the plot is approximately horizontal, then the Poisson model is appropriate. If the plot is approximately a line with negative slope, then the binomial model is more appropriate. If the plot is approximately a line with positive slope, use the negative binomial model. For this approach to work properly, a large observed data set is preferred.
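
The procedure can be sketched in Python. The frequencies below are not real data: they are the exact probabilities of the binomial distribution from Example 3 ($n=8$, $p=0.2$) scaled to a hypothetical sample size of 100000, so there is no sampling noise. The sketch computes the ratios in (7) and fits a least-squares line to them.

```python
from math import comb

# Illustrative input: exact binomial(n=8, p=0.2) probabilities scaled to a
# hypothetical sample of 100000 observations (no sampling noise, not real data).
n_trials, p = 8, 0.2
freq = [100000 * comb(n_trials, k) * p ** k * (1 - p) ** (n_trials - k)
        for k in range(n_trials + 1)]

# Ratios in (7): k * n_k / n_{k-1} for k = 1, 2, ...
ks = list(range(1, len(freq)))
ratios = [k * freq[k] / freq[k - 1] for k in ks]

# Ordinary least-squares line through the points (k, ratio).
k_bar = sum(ks) / len(ks)
r_bar = sum(ratios) / len(ratios)
slope = (sum((k - k_bar) * (r - r_bar) for k, r in zip(ks, ratios))
         / sum((k - k_bar) ** 2 for k in ks))
intercept = r_bar - slope * k_bar

print(slope, intercept)   # approximately -0.25 and 2.25
```

The negative slope points to the binomial model, and the fitted slope and intercept match $a=-0.25$ and $b=2.25$ from Example 3. With real data the points scatter around a line, and the ratios for sparse categories (small $n_k$) tend to be the noisiest.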

The (a,b,1) Class

It is possible that the (a,b,0) distributions do not adequately describe a random counting phenomenon being observed. For example, the sample data may indicate that the probability at zero may be larger than is indicated by the distributions in the (a,b,0) class. One alternative is to assign a larger value for $P_0$ and recursively generate the subsequent probabilities $P_k$ for $k=2,3,\cdots$. The class of the distributions defined by this recursive scheme is called the (a,b,1) class, which is discussed in the next post.

Practice Problems

Practice problems on (a,b,0) class

Practice problems on (a,b,1) class

Reference

1. Panjer H. H., Willmot G. E., Insurance Risk Models, Society of Actuaries, Chicago, 1992.

Dan Ma actuarial topics
Dan Ma actuarial
Dan Ma math

Daniel Ma actuarial
Daniel Ma mathematics
Daniel Ma actuarial topics

$\copyright$ 2018 – Dan Ma

Revised December 5, 2019

# The big 3 claim frequency models

We now turn our attention to discrete distributions. In particular, the focus is on counting distributions. These are the discrete distributions that have positive probabilities only on the non-negative integers 0, 1, 2, 3, … One important application is finding suitable counting distributions for modeling the number of losses or the number of claims to an insurer or, more generally, the number of other random events that are of interest in actuarial applications. From a claim perspective, these counting distributions would be models for claim frequency. Combining frequency models with models for claim severity would provide a more complete picture of the exposure of risks to the insurer than using claim severity alone. This post and several subsequent posts are preparation for the discussion on modeling aggregate losses and claims. Another upcoming topic would be the effect of insurance coverage modifications (e.g. deductibles) on the claim frequency and claim severity.

This post focuses on the three commonly used counting distributions – Poisson, binomial and negative binomial (the big 3). These three distributions are the basis for defining a large class of other counting distributions.

Probability Generating Function

Let $Y$ be a random variable with positive probabilities only on the non-negative integers, i.e. $P(Y=k)$ is positive only for $k=0,1,2,\cdots$. The function $P(Y=k)$ is the probability of the occurrence of the event $Y=k$, i.e. the observed value of the random variable $Y$ is $k$. It is called the probability function of the random variable $Y$ (also called probability mass function). From the probability function, many other distributional quantities can be derived, e.g. mean, variance and higher moments.

We can also elicit information about $Y$ from its generating function. The generating function (or probability generating function) of $Y$ is defined by:

$\displaystyle P_Y(z)=p_0+p_1 z +p_2 z^2 + \cdots=\sum \limits_{j=0}^\infty p_j z^j$

where each $p_j=P(Y=j)$. The generating function $P_Y(z)$ is defined wherever the infinite sum converges. At minimum, $P_Y(z)$ converges for $\lvert z \rvert \le 1$. Some generating functions converge for all real $z$, e.g. when $Y$ has a Poisson distribution (see below).

One reason for paying attention to generating functions is that the moments of $Y$ can be generated from $P_Y(z)$. The $n$th factorial moment of $Y$ is obtained by taking the $n$th derivative of $P_Y(z)$ and evaluating it at $z=1$.

$\displaystyle E[Y (Y-1) (Y-2) \cdots (Y-(n-1))]=P_Y^{(n)}(1)$

The above expectation is said to be a factorial moment. It follows that $E(Y)=P_{Y}^{(1)}(1)$. Since $E[Y (Y-1)]=P_Y^{(2)}(1)$, the second moment is $E(Y^2)=P_{Y}^{(1)}(1)+P_Y^{(2)}(1)$. In general, the $n$th moment $E(Y^n)$ can be expressed in terms of $P_Y^{(k)}(1)$ for $k \le n$.

Another application of the generating function is that $P_Y(z)$ encodes the probability function $P(Y=k)$, which is obtained by taking the derivatives of $P_Y(z)$ and evaluating them at $z=0$.

$\displaystyle P(Y=n)=\frac{P_{Y}^{(n)}(0)}{n!}$

where $n=0,1,2,3,\cdots$. Another useful property of generating functions is that the probability distribution of a random variable is uniquely determined by its generating function. This fundamental property is useful in determining the distribution of an independent sum. The generating function of an independent sum of random variables is simply the product of the individual generating functions. If the product is the generating function of a certain distribution, then the independent sum must be of the same distribution.
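
For a distribution with finitely many values, the generating function is a polynomial, so the derivative formulas above are easy to try out. The following Python sketch uses a small binomial distribution (3 trials, $p=0.5$) as an illustrative example. The helper name `pgf_derivative` is hypothetical.

```python
from math import factorial

def pgf_derivative(coeffs, n, z):
    """n-th derivative at z of P(z) = sum_j coeffs[j] * z**j."""
    total = 0.0
    for j in range(n, len(coeffs)):
        # d^n/dz^n of z^j is j (j-1) ... (j-n+1) z^(j-n)
        falling = factorial(j) // factorial(j - n)
        total += coeffs[j] * falling * z ** (j - n)
    return total

# Probability function of a binomial distribution with 3 trials and p = 0.5.
probs = [0.125, 0.375, 0.375, 0.125]

mean = pgf_derivative(probs, 1, 1)                # E(Y)
second_factorial = pgf_derivative(probs, 2, 1)    # E[Y(Y-1)]
second_moment = mean + second_factorial           # E(Y^2)
recovered = [pgf_derivative(probs, n, 0) / factorial(n) for n in range(4)]

print(mean, second_factorial, second_moment)   # 1.5 1.5 3.0
print(recovered)                               # [0.125, 0.375, 0.375, 0.125]
```

Evaluating the derivatives at $z=1$ gives the factorial moments, while evaluating them at $z=0$ and dividing by $n!$ gives back the probability function.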

For a more detailed discussion on probability generating function, see this blog post in a companion blog.

Poisson Distribution

We now describe the three counting distributions indicated at the beginning of the post. We start with the Poisson distribution. Consider a random variable $X$ that only takes on the non-negative integers. For each $k=0,1,2,\cdots$, let $p_k=P(X=k)$.

The random variable $X$ has a Poisson distribution if its probability function is:

$\displaystyle p_k=\frac{e^{-\lambda} \lambda^k}{k!} \ \ \ \ \ \ k=0,1,2,\cdots$

for some positive constant $\lambda$. This constant $\lambda$ is the parameter of the Poisson distribution in question. It is also the mean and variance of the Poisson distribution. The following is the probability generating function of the Poisson distribution.

$\displaystyle P_X(z)=e^{\lambda \ (z-1)}$

The Poisson generating function is defined for all real numbers $z$. The mean and variance and higher moments can be computed using the generating function.

$E(X)=P_X^{(1)}(1)=\lambda$

$E[X (X-1)]=P_X^{(2)}(1)=\lambda^2$

$E[X^2]=P_X^{(1)}(1)+P_X^{(2)}(1)=\lambda+\lambda^2$

$Var(X)=E(X^2)-E(X)^2=\lambda$

One interesting characteristic of the Poisson distribution is that its mean is the same as its variance. From a mathematical standpoint, the Poisson distribution arises from the Poisson process (see a more detailed discussion here). Another discussion of the Poisson distribution is found here.

One useful characteristic of the Poisson distribution is that combining independent Poisson distributions results in another Poisson distribution. Suppose that $X_1,X_2,\cdots,X_n$ are independent Poisson random variables with means $\lambda_1,\lambda_2,\cdots,\lambda_n$, respectively. Then the probability generating function of the sum $X=X_1+X_2+\cdots +X_n$ is simply the product of the individual probability generating functions.

$\displaystyle P_X(z)=\prod \limits_{j=1}^n e^{\lambda_j (z-1)}=e^{\lambda \ (z-1)}$

where $\lambda=\lambda_1+\lambda_2+\cdots +\lambda_n$. The probability generating function of the sum $X$ is the generating function of a Poisson distribution. Thus the independent sum of Poisson distributions is a Poisson distribution whose parameter is the sum of the individual Poisson parameters.
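
The same conclusion can be checked numerically without generating functions: the probability function of an independent sum is the convolution of the individual probability functions. The following Python sketch does this for two Poisson distributions. The means 1.5 and 2.5 and the cutoff `kmax` are illustrative values.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

def convolve(p, q):
    """Probability function of the sum of two independent counting variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

lam1, lam2, kmax = 1.5, 2.5, 30   # illustrative values
p = [poisson_pmf(k, lam1) for k in range(kmax + 1)]
q = [poisson_pmf(k, lam2) for k in range(kmax + 1)]

sum_pmf = convolve(p, q)[: kmax + 1]   # exact for k <= kmax
direct = [poisson_pmf(k, lam1 + lam2) for k in range(kmax + 1)]

max_gap = max(abs(s - d) for s, d in zip(sum_pmf, direct))
print(max_gap)   # tiny, at the level of floating-point rounding
```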

Another useful property is that of splitting a Poisson distribution. For example, suppose that the number of claims $N$ in a given year follows a Poisson distribution with mean $\lambda$ per year. Also suppose that the claims can be classified into $W$ distinct types such that the probability of a claim being of type $i$ is $p_i$, $i=1,2,\cdots, W$ and such that $p_1+\cdots+p_W=1$. If we are interested in studying the number $N_i$ of claims in a year that are of type $i$, $i=1,2,\cdots,W$, then $N_1,N_2,\cdots,N_W$ are independent Poisson random variables with means $\lambda p_1, \lambda p_2,\cdots,\lambda p_W$, respectively. For a mathematical discussion of this Poisson splitting phenomenon, see this blog post in a companion blog.
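
The marginal part of the splitting property can be checked numerically. Conditional on $N=n$, the number of type $i$ claims is binomial with parameters $n$ and $p_i$, so $P(N_i=k)$ is a sum over $n$. The following Python sketch evaluates that sum, truncated at an assumed cutoff `n_max`, and compares it with the Poisson probability with mean $\lambda p_i$. The values of `lam` and `p_i` are illustrative, and the sketch does not test independence.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    return exp(-lam) * lam ** k / factorial(k)

lam, p_i = 4.0, 0.3   # illustrative values: total claim rate and P(type i)
n_max = 80            # truncation point for the sum over N

def type_i_pmf(k):
    """P(N_i = k) = sum over n of P(N = n) * P(k of the n claims are type i)."""
    return sum(poisson_pmf(n, lam) * comb(n, k) * p_i ** k * (1 - p_i) ** (n - k)
               for n in range(k, n_max + 1))

max_gap = max(abs(type_i_pmf(k) - poisson_pmf(k, lam * p_i)) for k in range(11))
print(max_gap)   # tiny: N_i behaves like a Poisson variable with mean lam * p_i
```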

Binomial Distribution

Consider a series of independent events each of which results in one of two distinct outcomes (one is called Success and the other Failure) in such a way that the probability of observing a Success in a trial is constant across all trials (these are called Bernoulli trials). For a binomial distribution, we observe a fixed number $n$ of such trials and count the number of successes in these $n$ trials.

More specifically, let $p$ be the probability of observing a Success in a Bernoulli trial. Let $X$ be the number of Successes observed in $n$ independent trials. Then the random variable $X$ is said to have a binomial distribution with parameters $n$ and $p$.

Note that the random variable $X$ is the independent sum of $X_1,X_2,\cdots,X_n$ where $X_i$ is the number of Successes in the $i$th Bernoulli trial. Thus $X_i$ is 1 with probability $p$ and is 0 with probability $1-p$. Its probability generating function would be:

$\displaystyle g(z)=(1-p) z^0 +p z^1=1-p+p z$

As a result, the probability generating function for $X$ would be $g(z)$ raised to $n$.

$\displaystyle P_X(z)=(1-p+p z)^n$

The generating function $P_X(z)$ is defined for all real values $z$. Differentiating $P_X(z)$ twice produces the mean and variance.

$E(X)=n p$

$E[X (X-1)]=n (n-1) p^2$

$Var(X)=n p (1-p)$

By differentiating $P_X(z)$ and evaluating at $z=0$, we obtain the probability function.

$\displaystyle P(X=k)=\frac{P_X^{(k)}(0)}{k!}=\binom{n}{k} p^k (1-p)^{n-k}$

where $k=0,1,2,\cdots,n$.
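
Since $P_X(z)=(1-p+p z)^n$ is a polynomial, its coefficients can be obtained by repeated polynomial multiplication and compared with the probability function above. The following Python sketch does this for the illustrative values $n=5$ and $p=0.3$. The helper name `poly_mul` is hypothetical.

```python
from math import comb

def poly_mul(f, g):
    """Multiply two polynomials given as coefficient lists (lowest power first)."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] += fi * gj
    return out

n, p = 5, 0.3          # illustrative values
g = [1 - p, p]         # g(z) = (1 - p) + p z for one Bernoulli trial

pgf = [1.0]            # the constant polynomial 1
for _ in range(n):
    pgf = poly_mul(pgf, g)   # after the loop, pgf holds (1 - p + p z)^n

formula = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
max_gap = max(abs(c - f) for c, f in zip(pgf, formula))
mean = sum(k * c for k, c in enumerate(pgf))

print(max_gap)   # tiny, at the level of floating-point rounding
print(mean)      # close to n * p = 1.5
```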

By taking the product of probability generating functions, it follows that the independent sum $Y=Y_1+Y_2+\cdots+Y_m$, where each $Y_i$ has a binomial distribution with parameters $n_i$ and $p$, has a binomial distribution with parameters $n=n_1+\cdots+n_m$ and $p$. In other words, as long as the probability of success $p$ is identical in the binomial distributions, the independent sum is always a binomial distribution.

Note that the variance of the binomial distribution is less than the mean. Thus the binomial distribution is a suitable candidate for modeling frequency in situations where the sample variance is smaller than the sample mean.

Negative Binomial Distribution

As mentioned above, the Poisson distribution requires that the mean and the variance be equal. The binomial distribution requires that the variance be smaller than the mean. Thus these two counting distributions are not appropriate in all cases. The negative binomial distribution is an excellent alternative to the Poisson distribution and the binomial distribution, especially in the cases where the observed variance is greater than the observed mean.

The negative binomial distribution naturally arises from the same probability experiment that generates the binomial distribution. Consider a series of independent Bernoulli trials each of which results in one of two distinct outcomes (called success and failure) in such a way that the probability of success $p$ is constant across the trials. Instead of observing the outcomes in a fixed number of trials, we now observe the trials until $r$ successes have occurred.

As we observe the Bernoulli trials, let $Y$ be the number of failures before the $r$th success occurs. The random variable $Y$ has a negative binomial distribution with parameters $r$ and $p$. In this formulation, the parameter $r$ is necessarily a positive integer and the parameter $p$ is a real number between 0 and 1. The following is the probability function for the random variable $Y$.

$\displaystyle \begin{aligned} P(Y=k)&=\binom{r+k-1}{k} \ p^r (1-p)^k \\&=\frac{(r+k-1)!}{k! \ (r-1)!} \ p^r (1-p)^k \ \ \ \ \ \ k=0,1,2,\cdots \end{aligned}$

In the above probability function, the parameter $r$ must be a positive integer. The binomial coefficient $\binom{r+k-1}{k}$ is computed by its usual definition. The above probability function can be relaxed so that $r$ can be any positive real number. The key to the relaxation is a reformulation of the binomial coefficient.

$\displaystyle \binom{n}{j}=\left\{ \begin{array}{ll} \displaystyle \frac{n (n-1) (n-2) \cdots (n-(j-1))}{j!} &\ n>j-1, j=1,2,3,\cdots \\ \text{ } & \text{ } \\ \displaystyle 1 &\ j=0 \end{array} \right.$

Note that in the above formulation, the $n$ in $\binom{n}{j}$ does not have to be an integer. If $n$ were to be a positive integer, the usual definition $\binom{n}{j}=\frac{n!}{j! (n-j)!}$ would lead to the same calculation. The reformulation is a generalization of the usual binomial coefficient definition.

With the new definition of binomial coefficient, the following is the probability function of the negative binomial distribution in the general case.

$\displaystyle P(Y=k)=\binom{r+k-1}{k} \ p^r (1-p)^k \ \ \ \ \ \ k=0,1,2,\cdots$

The following is the same probability function with the binomial coefficient explicitly written out.

$\displaystyle P(Y=k)=\left\{ \begin{array}{ll} \displaystyle p^r &\ k=0 \\ \text{ } & \text{ } \\ \displaystyle \frac{(k-1+r) \cdots (1+r) r}{k!} \ p^r \ (1-p)^k &\ k=1,2,\cdots \end{array} \right.$

For either of the above versions, the mean and variance are:

$\displaystyle E(Y)=\frac{r (1-p)}{p}$

$\displaystyle Var(Y)=\frac{r (1-p)}{p^2}$
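
The general version can be tried out with a non-integer $r$. The following Python sketch implements the reformulated binomial coefficient and checks numerically that the probabilities sum to 1 and reproduce the mean and variance above. The values $r=2.5$ and $p=0.4$, the cutoff `kmax` and the function names are illustrative assumptions.

```python
def gen_binom(n, j):
    """Generalized binomial coefficient n (n-1) ... (n-j+1) / j!, with value 1 at j = 0."""
    result = 1.0
    for i in range(j):
        result *= (n - i) / (i + 1)
    return result

def negbin_pmf(k, r, p):
    return gen_binom(r + k - 1, k) * p ** r * (1 - p) ** k

r, p = 2.5, 0.4        # illustrative values; r is not an integer
kmax = 400             # truncation point; the tail beyond it is negligible here

probs = [negbin_pmf(k, r, p) for k in range(kmax + 1)]
total = sum(probs)
mean = sum(k * q for k, q in enumerate(probs))
variance = sum(k * k * q for k, q in enumerate(probs)) - mean ** 2

print(total)      # close to 1
print(mean)       # close to r (1 - p) / p = 3.75
print(variance)   # close to r (1 - p) / p^2 = 9.375
```

When $r$ is a positive integer, `gen_binom(r + k - 1, k)` agrees with the usual binomial coefficient, e.g. it gives 10 for $r=3$ and $k=3$.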

Another formulation of the negative binomial distribution is that it is a Poisson-gamma mixture. The following is the probability function.

$\displaystyle P(Y=k)=\left\{ \begin{array}{ll} \displaystyle \biggl(\frac{1}{\beta+1} \biggr)^r &\ k=0 \\ \text{ } & \text{ } \\ \displaystyle \frac{(k-1+r) \cdots (1+r) r}{k!} \ \biggl(\frac{1}{\beta+1} \biggr)^r \biggl(\frac{\beta}{\beta+1} \biggr)^k &\ k=1,2,\cdots \end{array} \right.$

It is still a 2-parameter discrete distribution. The parameters $r$ and $\beta$ originate from the parameters of the gamma distribution in the Poisson-gamma mixture. The mean and variance are:

$\displaystyle E(Y) = r \ \beta$

$\displaystyle Var(Y)=r \ \beta \ (1+\beta)$
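
Comparing the two probability functions shows that the two versions are linked by $p=\frac{1}{1+\beta}$. The following Python sketch checks numerically that the probabilities, the mean and the variance agree under that substitution. The values $r=2.5$ and $\beta=1.5$ and the function names are illustrative.

```python
def coeff(k, r):
    """(k-1+r) ... (1+r) r / k!, equal to 1 when k = 0."""
    c = 1.0
    for i in range(k):
        c *= (r + i) / (i + 1)
    return c

def pmf_beta(k, r, beta):
    return coeff(k, r) * (1 / (1 + beta)) ** r * (beta / (1 + beta)) ** k

def pmf_p(k, r, p):
    return coeff(k, r) * p ** r * (1 - p) ** k

r, beta = 2.5, 1.5             # illustrative values
p = 1 / (1 + beta)             # so that 1 - p = beta / (1 + beta)

max_gap = max(abs(pmf_beta(k, r, beta) - pmf_p(k, r, p)) for k in range(50))
mean_gap = abs(r * beta - r * (1 - p) / p)
var_gap = abs(r * beta * (1 + beta) - r * (1 - p) / p ** 2)

print(max_gap, mean_gap, var_gap)   # all tiny
```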

The negative binomial distribution has been discussed at length in blog posts in several companion blogs. For the natural interpretation of negative binomial distribution based on counting the number of failures until the $r$th success, see this blog post. This is an excellent introduction.

For the general version of the negative binomial distribution where the parameter $r$ can be any positive real number, see this blog post.

For the version of negative binomial distribution from a Poisson-Gamma mixture point of view, see this blog post.

This blog post has additional facts about the negative binomial distribution. This blog post summarizes the various versions as well as focusing on the calculation of probabilities.

More Counting Distributions

The three counting distributions – Poisson, binomial and negative binomial – provide a versatile tool kit in modeling the number of random events such as losses to the insured or claims to the insurer. The tool kit can be greatly expanded by modifying these three distributions to generate additional distributions. The new distributions belong to the (a,b,0) and (a,b,1) classes. This topic is discussed in the subsequent posts.


$\copyright$ 2018 – Dan Ma