# Tail-value-at-risk of a mixture

The risk measures of value-at-risk and tail-value-at-risk are discussed in the preceding post. This post extends the preceding post with a formula for evaluating the tail-value-at-risk of a mixture distribution with discrete mixing weights.

The preceding post introduces the notions of value-at-risk (VaR) and tail-value-at-risk (TVaR). These are two particular examples of risk measures that are useful for insurance companies and other enterprises in a risk management context. For $0<p<1$, VaR at the security level $p$ is the threshold such that the probability of a loss more adverse than the threshold is at most $1-p$. Thus in our context VaR is a percentile of the loss distribution. TVaR is a conditional expected value. At the security level $p$, TVaR is the expected value of the losses given that the losses exceed the threshold VaR.

The preceding post gives several representations of TVaR. It also gives the formula for TVaR for several distributions – exponential, Pareto, normal and lognormal. We now discuss TVaR of a mixture distribution. If a distribution is the mixture of two distributions and each of the individual distributions has a clear formulation of TVaR, can we mix the two TVaRs? The answer is yes, provided that some adjustments are made. The following gives the formula.

Suppose that the loss $X$ is a mixture of two distributions represented by the random variables $X_1$ and $X_2$, with weights $w$ and $1-w$, respectively. Let $\pi_p$ be the $100p$th percentile of the loss $X$, i.e. $\pi_p=\text{VaR}_p(X)$. Then the tail-value-at-risk at the $100p$ percent security level is:

\displaystyle \begin{aligned} \text{TVaR}_p(X)&=\pi_p+\frac{1}{1-p} \biggl[w \times P(X_1>\pi_p) \times e_{X_1}(\pi_p)\\& \ \ +(1-w) \times P(X_2>\pi_p) \times e_{X_2}(\pi_p)\biggr] \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (a) \end{aligned}

Compare this with formula (3) in the preceding post, which shows that TVaR is $\pi_p+e(\pi_p)$. In other words, TVaR is VaR plus the mean excess loss function evaluated at $\pi_p$. The content within the square brackets in Formula (a) is a weighted average of the two individual mean excess loss functions, with each one also multiplied by the corresponding survival probability $P(X_1>\pi_p)$ or $P(X_2>\pi_p)$. This formula is useful if $\pi_p$ (the VaR) can be calculated and if the mean excess loss functions are accessible.

We give an example and then show the derivation of Formula (a).

Example 1
The mean excess loss function of an exponential distribution is constant. Let’s consider the mixture of two exponential distributions. Suppose that losses follow a mixture of two exponential distributions where one distribution has mean 5 (75% weight) and the other has mean 10 (25% weight). Determine the VaR and TVaR at the security level 99%.

First, calculate the 99th percentile of the mixture, which is the solution to the following equation.

$\displaystyle 0.75 e^{-x/5}+0.25 e^{-x/10}=1-0.99=0.01$

By letting $y=e^{-x/10}$, we solve the following equation.

$\displaystyle 0.75 y^2+0.25 y-0.01=0$

Use the quadratic formula to solve for $y$. Then solve for $x$. The following is the 99th percentile of the loss $X$.

$\displaystyle \pi_p=-10 \times \text{ln} \biggl(\frac{-1+\sqrt{1.48}}{6} \biggr)=33.2168$

The following gives the TVaR.

\displaystyle \begin{aligned} \text{TVaR}_p(X)&=\pi_p+\frac{1}{1-0.99} \biggl[0.75 \times e^{-\pi_p/5} \times 5 +0.25 \times e^{-\pi_p/10} \times 10 \biggr] \\&=42.7283 \end{aligned}

Note that the mean excess loss for the first exponential distribution is 5 and for the second one is 10: by the memoryless property of the exponential distribution, each mean excess loss equals the unconditional mean. The survival functions $P(X_1>\pi_p)$ and $P(X_2>\pi_p)$ are also easy to evaluate. As long as the percentile $\pi_p$ of the mixture can be calculated, the formula is very useful. In this example, the two exponential parameters are set so that the calculation of the percentile uses the quadratic formula. If the parameters are set differently, then we can use software to evaluate the required percentile.
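The calculation in Example 1 can be reproduced numerically. The following is a minimal Python sketch, assuming a finite mixture of exponential distributions; the function names (`mixture_survival`, `mixture_var`, `mixture_tvar`) are illustrative and do not come from any library. It finds $\pi_p$ by bisection on the survival function, which is one way to obtain the percentile when the quadratic trick is not available, and then applies Formula (a) using the fact that an exponential's mean excess loss equals its mean.

```python
import math

def mixture_survival(x, weights, means):
    """P(X > x) for a finite mixture of exponential distributions."""
    return sum(w * math.exp(-x / m) for w, m in zip(weights, means))

def mixture_var(p, weights, means, tol=1e-12):
    """VaR_p: solve P(X > x) = 1 - p by bisection (the survival function is decreasing)."""
    lo, hi = 0.0, 1.0
    while mixture_survival(hi, weights, means) > 1 - p:
        hi *= 2.0  # widen the bracket until it contains the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_survival(mid, weights, means) > 1 - p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mixture_tvar(p, weights, means):
    """TVaR_p by Formula (a); for an exponential, e(pi_p) is just its mean."""
    pi_p = mixture_var(p, weights, means)
    tail = sum(w * math.exp(-pi_p / m) * m for w, m in zip(weights, means))
    return pi_p + tail / (1 - p)

weights, means = [0.75, 0.25], [5.0, 10.0]
var_99 = mixture_var(0.99, weights, means)
tvar_99 = mixture_tvar(0.99, weights, means)
print(round(var_99, 4), round(tvar_99, 4))  # 33.2168 42.7283
```

Because the percentile is found by bisection rather than the quadratic formula, the same functions work for any weights and means.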

Deriving the formula

Suppose that $X$ is the mixture of $X_1$, with weight $w$, and $X_2$, with weight $1-w$. The density function for $X_1$ is $f_1(x)$ and the density function for $X_2$ is $f_2(x)$. The density function of $X$ is then $f(x)=w f_1(x)+(1-w) f_2(x)$. We start from the basic definition of TVaR. Let $\pi_p$ be the $100p$th percentile of $X$.

\displaystyle \begin{aligned} \text{TVaR}_p(X)&=\frac{\int_{\pi_p}^\infty x f(x) \ dx}{1-p}\\&=\pi_p+\frac{\int_{\pi_p}^\infty (x-\pi_p) f(x) \ dx}{1-p} \\&=\pi_p+\frac{\int_{\pi_p}^\infty (x-\pi_p) (w f_1(x)+(1-w) f_2(x)) \ dx}{1-p} \\&=\pi_p+\frac{1}{1-p} \biggl[w \int_{\pi_p}^\infty (x-\pi_p) f_1(x) \ dx +(1-w) \int_{\pi_p}^\infty (x-\pi_p) f_2(x) \ dx\biggr] \\&=\pi_p+\frac{1}{1-p} \biggl[w \ P(X_1>\pi_p) \ \frac{\int_{\pi_p}^\infty (x-\pi_p) f_1(x) \ dx}{P(X_1>\pi_p)}\\& \ \ +(1-w) \ P(X_2>\pi_p) \ \frac{\int_{\pi_p}^\infty (x-\pi_p) f_2(x) \ dx}{P(X_2>\pi_p)} \biggr] \\&=\pi_p+\frac{1}{1-p} \biggl[w \ P(X_1>\pi_p) \ e_{X_1}(\pi_p) +(1-w) \ P(X_2>\pi_p) \ e_{X_2}(\pi_p) \biggr] \end{aligned}

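The second step uses the fact that $\int_{\pi_p}^\infty f(x) \ dx=1-p$. The formula derived here is for mixtures of two distributions. The same steps extend it to any finite mixture: if $X$ is a mixture of $X_1,\cdots,X_n$ with weights $w_1,\cdots,w_n$ summing to 1, then $\displaystyle \text{TVaR}_p(X)=\pi_p+\frac{1}{1-p} \sum \limits_{j=1}^n w_j \ P(X_j>\pi_p) \ e_{X_j}(\pi_p)$.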
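As a sanity check on the derivation and its finite-mixture extension, one can compare the formula with the basic definition $\int_{\pi_p}^\infty x f(x)\,dx/(1-p)$ evaluated by numerical integration. The Python sketch below does this for a hypothetical three-component exponential mixture (the weights and means are made up for illustration). It uses a hand-written composite Simpson's rule and truncates the infinite upper limit far into the tail.

```python
import math

# Hypothetical three-component exponential mixture (weights sum to 1).
WEIGHTS = [0.5, 0.3, 0.2]
MEANS = [2.0, 5.0, 20.0]

def pdf(x):
    return sum(w * math.exp(-x / m) / m for w, m in zip(WEIGHTS, MEANS))

def survival(x):
    return sum(w * math.exp(-x / m) for w, m in zip(WEIGHTS, MEANS))

def percentile(p):
    """pi_p by bisection on the decreasing survival function."""
    lo, hi = 0.0, 1.0
    while survival(hi) > 1 - p:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if survival(mid) > 1 - p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simpson(f, a, b, n=20000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

p = 0.95
pi_p = percentile(p)

# Basic definition; the upper limit is cut off where the tail is negligible.
upper = pi_p + 60 * max(MEANS)
tvar_direct = simpson(lambda x: x * pdf(x), pi_p, upper) / (1 - p)

# n-point version of Formula (a); an exponential's mean excess loss is its mean.
tvar_formula = pi_p + sum(
    w * math.exp(-pi_p / m) * m for w, m in zip(WEIGHTS, MEANS)
) / (1 - p)

print(pi_p, tvar_direct, tvar_formula)
```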

Practice Problems

Practice problems are available in the companion blog to reinforce the concepts of value-at-risk and tail-value-at-risk. Practice Problems 10-G and 10-H in that link are for TVaR of mixtures.

Daniel Ma

$\copyright$ 2018 – Dan Ma

# Examples of mixtures

The notion of mixtures is discussed in this previous post. Many probability distributions useful for actuarial modeling are mixture distributions. The previous post touches on some examples – negative binomial distribution (a Poisson-Gamma mixture), Pareto distribution (an exponential-gamma mixture) and the normal-normal mixture. In this post we present additional examples. We discuss the following examples.

1. Poisson-Gamma mixture = Negative Binomial.
2. Normal-Normal mixture = Normal.
3. Exponential-Gamma mixture = Pareto.
4. Exponential-Inverse Gamma mixture = Pareto.
5. Gamma-Gamma mixture = Generalized Pareto.
6. Weibull-Exponential mixture = Loglogistic.
7. Gamma-Geometric mixture = Exponential.
8. Normal-Gamma mixture = Student t.

The first three examples are discussed in the previous post. We discuss the remaining examples in this post.

The Pareto Family

Examples 3 and 4 show that Pareto distributions are mixtures of exponential distributions with either gamma or inverse gamma mixing weights. In Example 3, $X \lvert \Theta$ is an exponential distribution with $\Theta$ being a rate parameter. When $\Theta$ follows a gamma distribution, the resulting mixture is a Pareto (Type II, or Lomax) distribution. In Example 4, $X \lvert \Theta$ is an exponential distribution with $\Theta$ being a scale parameter. When $\Theta$ follows an inverse gamma distribution, the resulting mixture is also a Pareto (Type II, or Lomax) distribution.

As a mixture, Example 5 is like Example 3, except that it is a gamma-gamma mixture resulting in a generalized Pareto distribution. Example 3 has been discussed in the previous post. We now discuss Example 4 and Example 5.

Example 4. Suppose that $X \lvert \Theta$ has an exponential distribution where $\Theta$ is a scale parameter.
Further suppose that the random parameter $\Theta$ follows an inverse gamma distribution with parameters $\alpha$ and $\beta$. Then the unconditional distribution for $X$ is a Pareto (Type II, or Lomax) distribution with shape parameter $\alpha$ and scale parameter $\beta$.

The following gives the cumulative distribution function (CDF) and survival function of the conditional random variable $X \lvert \Theta$.

$F(x \lvert \Theta)=1-e^{- x/\Theta}$

$S(x \lvert \Theta)=e^{- x/\Theta}$

The random parameter $\Theta$ follows an inverse gamma distribution with parameters $\alpha$ and $\beta$. The following is the pdf of $\Theta$:

$\displaystyle g(\theta)=\frac{1}{\Gamma(\alpha)} \ \biggl[\frac{\beta}{\theta}\biggr]^\alpha \ \frac{1}{\theta} \ e^{-\frac{\beta}{ \theta}} \ \ \ \ \ \theta>0$

We show that the unconditional survival function for $X$ is the survival function for the Pareto distribution with parameters $\alpha$ (shape parameter) and $\beta$ (scale parameter).

\displaystyle \begin{aligned} S(x)&=\int_0^\infty S(x \lvert \theta) \ g(\theta) \ d \theta \\&=\int_0^\infty e^{- x/\theta} \ \frac{1}{\Gamma(\alpha)} \ \biggl[\frac{\beta}{\theta}\biggr]^\alpha \ \frac{1}{\theta} \ e^{-\beta / \theta} \ d \theta \\&=\int_0^\infty \frac{1}{\Gamma(\alpha)} \ \biggl[\frac{\beta}{\theta}\biggr]^\alpha \ \frac{1}{\theta} \ e^{-(x+\beta) / \theta} \ d \theta \\&=\frac{\beta^\alpha}{(x+\beta)^\alpha} \ \int_0^\infty \frac{1}{\Gamma(\alpha)} \ \biggl[\frac{x+\beta}{\theta}\biggr]^\alpha \ \frac{1}{\theta} \ e^{-(x+\beta) / \theta} \ d \theta \\&=\biggl(\frac{\beta}{x+\beta} \biggr)^\alpha \end{aligned}

Note that the integrand in the last integral is the density function of an inverse gamma distribution (with parameters $\alpha$ and $x+\beta$). Thus the integral equals 1 and can be dropped. What remains is the survival function of a Pareto distribution with parameters $\alpha$ and $\beta$. The following gives the CDF and density function of this Pareto distribution.

$\displaystyle F(x)=1-\biggl(\frac{\beta}{x+\beta} \biggr)^\alpha$

$\displaystyle f(x)=\frac{\alpha \ \beta^{\alpha}}{(x+\beta)^{\alpha+1}}$

See here for further information on the Pareto Type II (Lomax) distribution.
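The exponential-inverse gamma mixture can also be checked by simulation. Draw $\Theta$ from the inverse gamma distribution, draw $X$ from the exponential distribution with mean $\Theta$, and compare the empirical survival function with $(\beta/(x+\beta))^\alpha$. The Python sketch below assumes illustrative values $\alpha=3$ and $\beta=10$. It uses the standard library's `random.gammavariate(shape, scale)` and obtains an inverse gamma draw as the reciprocal of a gamma draw with shape $\alpha$ and rate $\beta$, which matches the density $g(\theta)$ above.

```python
import random

def pareto_survival(x, alpha, beta):
    return (beta / (x + beta)) ** alpha

def simulate_exp_invgamma(n, alpha, beta, seed=2018):
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        # Theta ~ inverse gamma(alpha, beta): 1 / Gamma(shape=alpha, rate=beta).
        # gammavariate takes (shape, scale), so the scale is 1 / beta.
        theta = 1.0 / rng.gammavariate(alpha, 1.0 / beta)
        # X | Theta ~ exponential with mean Theta; expovariate takes the rate 1 / Theta.
        draws.append(rng.expovariate(1.0 / theta))
    return draws

alpha, beta = 3.0, 10.0
sample = simulate_exp_invgamma(200_000, alpha, beta)
for x in (1.0, 5.0, 10.0, 30.0):
    empirical = sum(s > x for s in sample) / len(sample)
    print(x, round(empirical, 4), round(pareto_survival(x, alpha, beta), 4))
```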

Example 5. Suppose that $X \lvert \Theta$ has a gamma distribution with shape parameter $k$ (a known constant) and rate parameter $\Theta$. Further suppose that the random parameter $\Theta$ follows a gamma distribution with shape parameter $\alpha$ and rate parameter $\beta$. Then the unconditional distribution for $X$ is a generalized Pareto distribution with parameters $\alpha$, $\beta$ and $k$.

Conditional on $\Theta=\theta$, the following is the density function of $X$.

$\displaystyle f(x \lvert \theta)=\frac{1}{\Gamma(k)} \ \theta^k \ x^{k-1} \ e^{-\theta x} \ \ \ \ \ x>0$

The following is the density function of the random parameter $\Theta$.

$\displaystyle g(\theta)=\frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ \theta^{\alpha-1} \ e^{-\beta \theta} \ \ \ \ \ \ \theta>0$

The following gives the unconditional density function for $X$.

\displaystyle \begin{aligned} f(x)&=\int_0^\infty f(x \lvert \theta) \ g(\theta) \ d \theta \\&=\int_0^\infty \frac{1}{\Gamma(k)} \ \theta^k \ x^{k-1} \ e^{-\theta x} \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ \theta^{\alpha-1} \ e^{-\beta \theta} \ d \theta \\&=\int_0^\infty \frac{1}{\Gamma(k)} \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ x^{k-1} \ \theta^{\alpha+k-1} \ e^{-(x+\beta) \theta} \ d \theta \\&= \frac{1}{\Gamma(k)} \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ x^{k-1} \frac{\Gamma(\alpha+k)}{(x+\beta)^{\alpha+k}} \int_0^\infty \frac{1}{\Gamma(\alpha+k)} \ (x+\beta)^{\alpha+k} \ \theta^{\alpha+k-1} \ e^{-(x+\beta) \theta} \ d \theta \\&=\frac{\Gamma(\alpha+k)}{\Gamma(\alpha) \ \Gamma(k)} \ \frac{\beta^\alpha \ x^{k-1}}{(x+\beta)^{\alpha+k}} \end{aligned}

Any distribution that has a density function described above is said to be a generalized Pareto distribution with the parameters $\alpha$, $\beta$ and $k$. Its CDF cannot be written in closed form but can be expressed using the incomplete beta function.

\displaystyle \begin{aligned} F(x)&=\int_0^x \frac{\Gamma(\alpha+k)}{\Gamma(\alpha) \ \Gamma(k)} \ \frac{\beta^\alpha \ t^{k-1}}{(t+\beta)^{\alpha+k}} \ dt \\&=\int_0^x \frac{\Gamma(\alpha+k)}{\Gamma(\alpha) \ \Gamma(k)} \ \biggl(\frac{t}{t+\beta} \biggr)^{k-1} \ \biggl(\frac{\beta}{t+\beta} \biggr)^{\alpha-1} \ \frac{\beta}{(t+\beta)^2} \ dt \\&=\frac{\Gamma(\alpha+k)}{\Gamma(\alpha) \ \Gamma(k)} \ \int_0^{\frac{x}{x+\beta}} u^{k-1} \ (1-u)^{\alpha-1} \ du, \ \ \ u=\frac{t}{t+\beta} \\&=\frac{\Gamma(\alpha+k)}{\Gamma(\alpha) \ \Gamma(k)} \ \int_0^{w} t^{k-1} \ (1-t)^{\alpha-1} \ dt, \ \ \ w=\frac{x}{x+\beta} \end{aligned}
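The incomplete beta representation gives a practical way to evaluate the generalized Pareto CDF. The Python sketch below is illustrative only. It integrates $t^{k-1}(1-t)^{\alpha-1}$ with a hand-written Simpson's rule, assuming $k \ge 1$ and $\alpha \ge 1$ so the integrand is bounded, and compares the result with a simulated gamma-gamma mixture. In practice a library routine for the regularized incomplete beta function would be the natural choice. The parameter values are arbitrary.

```python
import math
import random

def gp_cdf(x, alpha, beta, k, n=2000):
    """Generalized Pareto CDF via the incomplete beta integral; assumes k >= 1, alpha >= 1."""
    w = x / (x + beta)
    const = math.gamma(alpha + k) / (math.gamma(alpha) * math.gamma(k))
    integrand = lambda t: t ** (k - 1) * (1 - t) ** (alpha - 1)
    h = w / n
    total = integrand(0.0) + integrand(w)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return const * total * h / 3

def simulate_gamma_gamma(n, alpha, beta, k, seed=7):
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        theta = rng.gammavariate(alpha, 1.0 / beta)     # Theta: shape alpha, rate beta
        draws.append(rng.gammavariate(k, 1.0 / theta))  # X | Theta: shape k, rate Theta
    return draws

alpha, beta, k = 3.0, 5.0, 2.0
sample = simulate_gamma_gamma(200_000, alpha, beta, k)
for x in (1.0, 3.0, 8.0, 20.0):
    empirical = sum(s <= x for s in sample) / len(sample)
    print(x, round(empirical, 4), round(gp_cdf(x, alpha, beta, k), 4))
```

With $k=1$ the same function reduces to the Pareto CDF $1-(\beta/(x+\beta))^\alpha$.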

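The moments of the generalized Pareto distribution are easy to derive, but only a limited range of them exists. Since it is a mixture distribution, each unconditional moment is the weighted average of the corresponding conditional moments. For the conditional gamma distribution with shape $k$ and rate $\theta$, $E(X^w \lvert \theta)=\frac{\Gamma(k+w)}{\theta^w \ \Gamma(k)}$ for $w>-k$.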

\displaystyle \begin{aligned} E(X^w)&=\int_0^\infty E(X^w \lvert \theta) \ g(\theta) \ d \theta \\&=\int_0^\infty \frac{\Gamma(k+w)}{\theta^w \Gamma(k)} \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ \theta^{\alpha-1} \ e^{-\beta \theta} \ d \theta \\&=\frac{\beta^w \ \Gamma(k+w) \ \Gamma(\alpha-w)}{\Gamma(k) \ \Gamma(\alpha)} \int_0^\infty \frac{1}{\Gamma(\alpha-w)} \ \beta^{\alpha-w} \ \theta^{\alpha-w-1} \ e^{-\beta \theta} \ d \theta \\&=\frac{\beta^w \ \Gamma(k+w) \ \Gamma(\alpha-w)}{\Gamma(k) \ \Gamma(\alpha)} \ \ \ \ -k<w<\alpha \end{aligned}

Note that $E(X)$ has a simple expression $E(X)=\frac{k \beta}{\alpha-1}$ when $1<\alpha$.
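The moment formula is easy to code. The following Python sketch (the function name `gp_moment` is hypothetical) returns $E(X^w)$ and raises an error outside the range $-k<w<\alpha$. It also checks that $w=1$ reproduces $E(X)=\frac{k\beta}{\alpha-1}$. The parameter values are arbitrary.

```python
import math

def gp_moment(w, alpha, beta, k):
    """E(X^w) for the generalized Pareto (gamma-gamma mixture); finite only for -k < w < alpha."""
    if not (-k < w < alpha):
        raise ValueError("E(X^w) does not exist unless -k < w < alpha")
    return (beta ** w * math.gamma(k + w) * math.gamma(alpha - w)
            / (math.gamma(k) * math.gamma(alpha)))

alpha, beta, k = 4.0, 10.0, 2.5
mean = gp_moment(1, alpha, beta, k)
second = gp_moment(2, alpha, beta, k)
print(mean, k * beta / (alpha - 1))  # the two agree
print(second - mean ** 2)            # variance, which exists because alpha > 2
```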

When the parameter $k=1$, the conditional distribution for $X \lvert \Theta$ is an exponential distribution. Then the situation reduces to Example 3, leading to a Pareto distribution. Thus the Pareto distribution is a special case of the generalized Pareto distribution. Both the Pareto distribution and the generalized Pareto distribution have thicker and longer tails than the original conditional gamma distribution.

It turns out that the F distribution is also a special case of the generalized Pareto distribution. The F distribution with $r_1$ and $r_2$ degrees of freedom is the generalized Pareto distribution with parameters $k=r_1/2$, $\alpha=r_2/2$ and $\beta=r_2/r_1$. As a result, the following is the density function.

\displaystyle \begin{aligned} h(x)&=\frac{\Gamma(r_1/2 + r_2/2)}{\Gamma(r_1/2) \ \Gamma(r_2/2)} \ \frac{(r_2/r_1)^{r_2/2} \ x^{r_1/2-1}}{(x+r_2/r_1)^{r_1/2+r_2/2}} \\&=\frac{\Gamma(r_1/2 + r_2/2)}{\Gamma(r_1/2) \ \Gamma(r_2/2)} \ \frac{(r_1/r_2)^{r_1/2} \ x^{r_1/2-1}}{(1+(r_1/r_2)x)^{r_1/2+r_2/2}} \ \ \ \ 0<x<\infty \end{aligned}
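The identification of the F distribution as a generalized Pareto distribution can be spot-checked numerically. The Python sketch below compares the generalized Pareto density with $k=r_1/2$, $\alpha=r_2/2$, $\beta=r_2/r_1$ against the usual textbook form of the F density written with the beta function. The degrees of freedom and evaluation points are arbitrary.

```python
import math

def gp_pdf(x, alpha, beta, k):
    """Generalized Pareto density from the gamma-gamma mixture."""
    c = math.gamma(alpha + k) / (math.gamma(alpha) * math.gamma(k))
    return c * beta ** alpha * x ** (k - 1) / (x + beta) ** (alpha + k)

def f_pdf(x, r1, r2):
    """Textbook F density; B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    b = math.gamma(r1 / 2) * math.gamma(r2 / 2) / math.gamma((r1 + r2) / 2)
    return math.sqrt((r1 * x) ** r1 * r2 ** r2 / (r1 * x + r2) ** (r1 + r2)) / (x * b)

for r1, r2 in [(3, 7), (5, 12)]:
    for x in (0.2, 1.0, 2.5):
        # k = r1/2, alpha = r2/2, beta = r2/r1
        print(r1, r2, x, gp_pdf(x, r2 / 2, r2 / r1, r1 / 2), f_pdf(x, r1, r2))
```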

Another way to generate the F distribution is to take the ratio of two independent chi-squared random variables, each divided by its degrees of freedom (see Theorem 9 in this previous post). Of course, there is no need to use the explicit form of the density function of the F distribution. In a statistical application, the F distribution is accessed using tables or software.

The Loglogistic Distribution

The loglogistic distribution can be derived as a mixture of Weibull distributions with exponential mixing weights.

Example 6. Suppose that $X \lvert \Lambda$ has a Weibull distribution with shape parameter $\gamma$ (a known constant) and a parameter $\Lambda$ such that the CDF of $X \lvert \Lambda$ is $F(x \lvert \Lambda)=1-e^{-\Lambda \ x^\gamma}$. Further suppose that the random parameter $\Lambda$ follows an exponential distribution with rate parameter $\theta^{\gamma}$. Then the unconditional distribution for $X$ is a loglogistic distribution with shape parameter $\gamma$ and scale parameter $\theta$.

The following gives the conditional survival function for $X \lvert \Lambda$ and the exponential mixing weight.

$\displaystyle S(x \lvert \lambda)=e^{-\lambda \ x^\gamma}$

$\displaystyle g(\lambda)=\theta^\gamma \ e^{-\theta^\gamma \ \lambda}$

The following gives the unconditional survival function and CDF of $X$ as well as the PDF.

\displaystyle \begin{aligned} S(x)&=\int_0^\infty S(x \lvert \lambda) \ g(\lambda) \ d \lambda \\&=\int_0^\infty e^{-\lambda \ x^\gamma} \ \theta^\gamma \ e^{-\theta^\gamma \ \lambda} \ d \lambda \\&=\int_0^\infty \theta^\gamma \ e^{-(x^\gamma+\theta^\gamma) \ \lambda} \ d \lambda \\&=\frac{\theta^\gamma}{(x^\gamma+\theta^\gamma)} \int_0^\infty (x^\gamma+\theta^\gamma) \ e^{-(x^\gamma+\theta^\gamma) \ \lambda} \ d \lambda \\&=\frac{\theta^\gamma}{x^\gamma+\theta^\gamma} \end{aligned}

\displaystyle \begin{aligned} F(x)&=1-S(x)=1-\frac{\theta^\gamma}{x^\gamma+\theta^\gamma} =\frac{x^\gamma}{x^\gamma+\theta^\gamma} =\frac{(x/\theta)^\gamma}{1+(x/\theta)^\gamma} \end{aligned}

$\displaystyle f(x)=\frac{d}{dx} \biggl( \frac{x^\gamma}{x^\gamma+\theta^\gamma} \biggr)=\frac{\gamma \ (x/\theta)^\gamma}{x [1+(x/\theta)^\gamma]^2}$

Any distribution that has any one of the above three distributional quantities is said to be a loglogistic distribution with shape parameter $\gamma$ and scale parameter $\theta$.
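The Weibull-exponential construction can be checked by simulation. The Python sketch below assumes illustrative values $\gamma=2$ and $\theta=1.5$. It draws $\Lambda$ with `random.expovariate` (which takes a rate) and converts $\Lambda$ to a Weibull scale, since the CDF $1-e^{-\Lambda x^\gamma}$ corresponds to scale $\Lambda^{-1/\gamma}$. It then draws $X$ with `random.weibullvariate(scale, shape)` and compares the empirical CDF with $\frac{(x/\theta)^\gamma}{1+(x/\theta)^\gamma}$.

```python
import random

def loglogistic_cdf(x, gamma_, theta):
    z = (x / theta) ** gamma_
    return z / (1 + z)

def simulate_weibull_exponential(n, gamma_, theta, seed=11):
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        lam = rng.expovariate(theta ** gamma_)           # Lambda ~ exponential, rate theta^gamma
        scale = lam ** (-1.0 / gamma_)                   # CDF 1 - exp(-lam x^gamma) has this scale
        draws.append(rng.weibullvariate(scale, gamma_))  # weibullvariate(scale, shape)
    return draws

gamma_, theta = 2.0, 1.5
sample = simulate_weibull_exponential(200_000, gamma_, theta)
for x in (0.5, 1.0, 1.5, 4.0):
    empirical = sum(s <= x for s in sample) / len(sample)
    print(x, round(empirical, 4), round(loglogistic_cdf(x, gamma_, theta), 4))
```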

One interesting point about the loglogistic distribution is that the reciprocal of a loglogistic random variable is again loglogistic. Suppose that $X$ has a loglogistic distribution with shape parameter $\gamma$ and scale parameter $\theta$. Let $Y=\frac{1}{X}$. Then $Y$ has a loglogistic distribution with shape parameter $\gamma$ and scale parameter $\theta^{-1}$.

\displaystyle \begin{aligned} P[Y \le y]&=P[\frac{1}{X} \le y] =P[X \ge y^{-1}] =\frac{\theta^\gamma}{y^{-\gamma}+\theta^\gamma} \\&=\frac{\theta^\gamma \ y^\gamma}{1+\theta^\gamma \ y^\gamma} \\&=\frac{y^\gamma}{(\theta^{-1})^\gamma+y^\gamma} \end{aligned}

The above is the CDF of the loglogistic distribution with the desired parameters (shape $\gamma$ and scale $\theta^{-1}$). Thus there is no need to single out the inverse loglogistic distribution.
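The reciprocal property is easy to confirm numerically. The short Python sketch below (parameter values arbitrary) evaluates $P(Y \le y)=P(X \ge 1/y)$ from the loglogistic CDF of $X$ and compares it with the loglogistic CDF with shape $\gamma$ and scale $\theta^{-1}$.

```python
def loglogistic_cdf(x, gamma_, theta):
    """F(x) = (x/theta)^gamma / (1 + (x/theta)^gamma)."""
    z = (x / theta) ** gamma_
    return z / (1 + z)

gamma_, theta = 2.5, 4.0
for y in (0.05, 0.25, 1.0, 3.0):
    via_x = 1 - loglogistic_cdf(1 / y, gamma_, theta)  # P(Y <= y) = P(X >= 1/y)
    direct = loglogistic_cdf(y, gamma_, 1 / theta)     # loglogistic with scale 1/theta
    print(y, via_x, direct)
```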

In order to find the mean and higher moments of the loglogistic distribution, we take the approach of identifying the conditional Weibull moments and then weighting them by the exponential mixing weight. Note that the parameter $\Lambda$ in the conditional CDF $F(x \lvert \Lambda)=1-e^{-\Lambda \ x^\gamma}$ is not a scale parameter. The Weibull distribution in this conditional CDF is equivalent to a Weibull distribution with shape parameter $\gamma$ and scale parameter $\Lambda^{-1/\gamma}$. According to formula (4) in this previous post, the $k$th moment of this Weibull distribution is

$\displaystyle E[ (X \lvert \Lambda)^k]=\Gamma \biggl(1+\frac{k}{\gamma} \biggr) \Lambda^{-k/\gamma}$

The following gives the unconditional $k$th moment of the Weibull-exponential mixture.

\displaystyle \begin{aligned} E[X^k]&=\int_0^\infty E[ (X \lvert \lambda)^k] \ g(\lambda) \ d \lambda \\&=\int_0^\infty \Gamma \biggl(1+\frac{k}{\gamma} \biggr) \lambda^{-k/\gamma} \ \theta^\gamma \ e^{-\theta^\gamma \ \lambda} \ d \lambda\\&=\Gamma \biggl(1+\frac{k}{\gamma} \biggr) \ \theta^\gamma \int_0^\infty \lambda^{-k/\gamma} \ e^{-\theta^\gamma \ \lambda} \ d \lambda \\&=\theta^k \ \Gamma \biggl(1+\frac{k}{\gamma} \biggr) \int_0^\infty t^{-k/\gamma} \ e^{-t} \ dt \ \ \text{ where } t=\theta^\gamma \lambda \\&=\theta^k \ \Gamma \biggl(1+\frac{k}{\gamma} \biggr) \int_0^\infty t^{[(\gamma-k)/\gamma]-1} \ e^{-t} \ dt \\&=\theta^k \ \Gamma \biggl(1+\frac{k}{\gamma} \biggr) \ \Gamma \biggl(1-\frac{k}{\gamma} \biggr) \ \ \ \ -\gamma<k<\gamma \end{aligned}

The range $-\gamma<k<\gamma$ follows from the fact that the arguments of the gamma function must be positive. Thus the $k$th moments of the loglogistic distribution are limited by its shape parameter $\gamma$. If $\gamma=1$, then $E(X)$ does not exist. For a larger $\gamma$, more moments exist, but always only finitely many. This is an indication that the loglogistic distribution has a thick (right) tail. This is not surprising since mixture distributions (loglogistic in this case) tend to have thicker tails than the conditional distributions (Weibull in this case). The thicker tail is a result of the uncertainty in the random parameter in the conditional distribution (the Weibull $\Lambda$ in this case).
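The moment formula can be coded directly. In the Python sketch below (the function name is hypothetical), a `ValueError` signals that the moment is infinite outside $-\gamma<k<\gamma$, which reproduces the observation that $E(X)$ does not exist when $\gamma=1$. As a cross-check it uses the reflection identity $\Gamma(1+z)\Gamma(1-z)=\frac{\pi z}{\sin(\pi z)}$ (a standard fact not stated in the post), so that $E[X^k]=\theta^k \frac{k\pi/\gamma}{\sin(k\pi/\gamma)}$ for $k \ne 0$.

```python
import math

def loglogistic_moment(k, gamma_, theta):
    """E[X^k] = theta^k Gamma(1 + k/gamma) Gamma(1 - k/gamma), finite only for -gamma < k < gamma."""
    if not (-gamma_ < k < gamma_):
        raise ValueError("E[X^k] is not finite unless -gamma < k < gamma")
    z = k / gamma_
    return theta ** k * math.gamma(1 + z) * math.gamma(1 - z)

gamma_, theta = 3.0, 2.0
for k in (1, 2):
    z = k / gamma_
    closed_form = theta ** k * math.pi * z / math.sin(math.pi * z)
    print(k, loglogistic_moment(k, gamma_, theta), closed_form)
```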

Another Way to Obtain Exponential Distribution

We now consider Example 7. The following is a precise statement of the gamma-geometric mixture.

Example 7. Suppose that $X \lvert \alpha$ has a gamma distribution with shape parameter $\alpha$ that is a positive integer and rate parameter $\beta$ (a known constant). Further suppose that the random shape parameter, denoted by $Y$, follows a geometric distribution with probability function $P[Y=\alpha]=p (1-p)^{\alpha-1}$ where $\alpha=1,2,3,\cdots$. Then the unconditional distribution for $X$ is an exponential distribution with rate parameter $\beta p$.

The conditional gamma distribution has an uncertain shape parameter $\alpha$ that can take on positive integers. The parameter $\alpha$ follows a geometric distribution. Here are the ingredients that go into the mixture.

$\displaystyle f(x \lvert \alpha)=\frac{1}{(\alpha-1)!} \ \beta^\alpha \ x^{\alpha-1} \ e^{-\beta x}$

$P[Y=\alpha]=p (1-p)^{\alpha-1}$

The following is the unconditional probability density function of $X$.

\displaystyle \begin{aligned} f(x)&=\sum \limits_{\alpha=1}^\infty f(x \lvert \alpha) \ P[Y=\alpha] \\&=\sum \limits_{\alpha=1}^\infty \frac{1}{(\alpha-1)!} \ \beta^\alpha \ x^{\alpha-1} \ e^{-\beta x} \ p (1-p)^{\alpha-1} \\&=\beta p \ e^{-\beta x} \sum \limits_{\alpha=1}^\infty \frac{[\beta(1-p) x]^{\alpha-1}}{(\alpha-1)!} \\&=\beta p \ e^{-\beta x} \sum \limits_{\alpha=0}^\infty \frac{[\beta(1-p) x]^{\alpha}}{\alpha!} \\&=\beta p \ e^{-\beta x} \ e^{\beta(1-p) x} \\&=\beta p \ e^{-\beta p x} \end{aligned}

The above density function is that of an exponential distribution with rate parameter $\beta p$.
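The series step can be checked numerically by truncating the sum over $\alpha$. The Python sketch below uses illustrative values $\beta=2$ and $p=0.3$. It adds up the gamma densities weighted by the geometric probabilities, computing each gamma density in log space with `math.lgamma` to avoid overflow, and compares the total with $\beta p \, e^{-\beta p x}$. It assumes $x>0$.

```python
import math

def gamma_geometric_pdf(x, beta, p, terms=200):
    """Truncated sum of f(x | a) P[Y = a] over a = 1..terms; assumes x > 0."""
    total = 0.0
    for a in range(1, terms + 1):
        # log of the gamma density with integer shape a and rate beta; lgamma(a) = log((a-1)!)
        log_f = a * math.log(beta) + (a - 1) * math.log(x) - beta * x - math.lgamma(a)
        total += math.exp(log_f) * p * (1 - p) ** (a - 1)
    return total

beta, p = 2.0, 0.3
for x in (0.1, 1.0, 5.0, 10.0):
    print(x, gamma_geometric_pdf(x, beta, p), beta * p * math.exp(-beta * p * x))
```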

Student t Distribution

Example 2 (discussed in the previous post) involves a normal distribution with a random mean. Example 8 involves a normal distribution with mean 0 and an uncertain variance $1/\Lambda$, where $\Lambda$ follows a gamma distribution whose two parameters are related to a common parameter $r$, which will be the degrees of freedom of the student t distribution. The following is a precise description of the normal-gamma mixture.

Example 8. Suppose that $X \lvert \Lambda$ has a normal distribution with mean 0 and variance $1/\Lambda$. Further suppose that the random parameter $\Lambda$ follows a gamma distribution with shape parameter $\alpha$ and scale parameter $\theta$ such that $2 \alpha=\frac{2}{\theta}=r$ is a positive integer. Then the unconditional distribution for $X$ is a student t distribution with $r$ degrees of freedom.

The following gives the ingredients of the normal-gamma mixture. The first item is the conditional density function of $X$ given $\Lambda$. The second is the density function of the mixing weight $\Lambda$.

$\displaystyle f(x \lvert \lambda)=\frac{1}{\sqrt{1/\lambda} \ \sqrt{2 \pi}} \ e^{-(\lambda/2) \ x^2}=\sqrt{\frac{\lambda}{2 \pi}} \ e^{-(\lambda/2) \ x^2}$

$\displaystyle g(\lambda)=\frac{1}{\Gamma(\alpha)} \biggl( \frac{1}{\theta} \biggr)^\alpha \ \lambda^{\alpha-1} \ e^{-\lambda/\theta}$

The following calculation derives the unconditional density function of $X$.

\displaystyle \begin{aligned} f(x)&=\int_{0}^\infty f(x \lvert \lambda) \ g(\lambda) \ d \lambda \\&=\int_{0}^\infty \sqrt{\frac{\lambda}{2 \pi}} \ e^{-(\lambda/2) \ x^2} \ \frac{1}{\Gamma(\alpha)} \biggl( \frac{1}{\theta} \biggr)^\alpha \ \lambda^{\alpha-1} \ e^{-\lambda/\theta} \ d \lambda \\&=\frac{1}{\Gamma(\alpha)} \ \biggl( \frac{1}{\theta} \biggr)^\alpha \ \frac{1}{\sqrt{2 \pi}} \int_0^\infty \lambda^{\alpha+\frac{1}{2}-1} e^{-(\frac{x^2}{2}+\frac{1}{\theta} ) \lambda} \ d \lambda \\&=\frac{\Gamma(\alpha+\frac{1}{2})}{\Gamma(\alpha)} \ \biggl( \frac{1}{\theta} \biggr)^\alpha \ \frac{1}{\sqrt{2 \pi}} \ \biggl(\frac{2 \theta}{\theta x^2+2} \biggr)^{\alpha+\frac{1}{2}} \\& \times \int_0^\infty \frac{1}{\Gamma(\alpha+\frac{1}{2})} \ \biggl(\frac{\theta x^2+2}{2 \theta} \biggr)^{\alpha+\frac{1}{2}} \lambda^{\alpha+\frac{1}{2}-1} e^{-\frac{\theta x^2+2}{2 \theta} \lambda} \ d \lambda \\&=\frac{\Gamma(\alpha+\frac{1}{2})}{\Gamma(\alpha)} \ \biggl( \frac{1}{\theta} \biggr)^\alpha \ \frac{1}{\sqrt{2 \pi}} \ \biggl(\frac{2 \theta}{\theta x^2+2} \biggr)^{\alpha+\frac{1}{2}} \ \ \ \ \ -\infty<x<\infty \end{aligned}

The above density function is in terms of the two parameters $\alpha$ and $\theta$. In the assumptions, the two parameters are related to a common parameter $r$ such that $\alpha=\frac{r}{2}$ and $\theta=\frac{2}{r}$. The following derivation converts to the common $r$.

\displaystyle \begin{aligned} f(x)&=\frac{\Gamma(\frac{r}{2}+\frac{1}{2})}{\Gamma(\frac{r}{2})} \ \biggl( \frac{r}{2} \biggr)^{\frac{r}{2}} \ \frac{1}{\sqrt{2 \pi}} \ \biggl(\frac{2 \frac{2}{r}}{\frac{2}{r} x^2+2} \biggr)^{\frac{r}{2}+\frac{1}{2}} \\&=\frac{\Gamma(\frac{r}{2}+\frac{1}{2})}{\Gamma(\frac{r}{2})} \ \frac{r^{r/2}}{2^{r/2}} \ \frac{1}{2^{1/2} \sqrt{\pi}} \ \biggl(\frac{2/r}{x^2/r+1} \biggr)^{(r+1)/2} \\&=\frac{\Gamma \biggl(\displaystyle \frac{r+1}{2} \biggr)}{\Gamma \biggl(\displaystyle \frac{r}{2} \biggr)} \ \frac{1}{\sqrt{\pi r}} \ \frac{1 \ \ \ \ \ }{\biggl(1+\displaystyle \frac{x^2}{r} \biggr)^{(r+1)/2}} \ \ \ \ \ -\infty<x<\infty \end{aligned}

The above density function is that of a student t distribution with $r$ degrees of freedom. Of course, in performing tests of significance, the t distribution is accessed by using tables or software. A usual textbook definition of the student t distribution is the ratio of a standard normal random variable to the square root of an independent chi-squared random variable divided by its degrees of freedom (see Theorem 6 in this previous post).
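The normal-gamma derivation can be confirmed by numerical integration over $\lambda$. The Python sketch below assumes $r \ge 3$, so that the integrand vanishes smoothly at $\lambda=0$, and sets $\alpha=r/2$ and $\theta=2/r$ as in Example 8. It truncates the $\lambda$ integral at an upper limit where the gamma weight is negligible, applies a hand-written Simpson's rule, and compares the result with the student t density derived above.

```python
import math

def normal_gamma_pdf(x, r, upper=60.0, n=20000):
    """Mixture density of Example 8 by Simpson's rule in lambda; assumes r >= 3."""
    alpha, theta = r / 2, 2 / r
    norm = math.gamma(alpha) * theta ** alpha  # gamma density normalizing constant

    def integrand(lam):
        if lam == 0.0:
            return 0.0  # the integrand behaves like lam^((r - 1) / 2) near 0
        normal = math.sqrt(lam / (2 * math.pi)) * math.exp(-lam * x * x / 2)
        weight = lam ** (alpha - 1) * math.exp(-lam / theta) / norm
        return normal * weight

    h = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3

def t_pdf(x, r):
    """Student t density with r degrees of freedom, as derived above."""
    const = math.gamma((r + 1) / 2) / (math.gamma(r / 2) * math.sqrt(math.pi * r))
    return const * (1 + x * x / r) ** (-(r + 1) / 2)

for r in (3, 5, 10):
    for x in (0.0, 1.0, 2.5):
        print(r, x, normal_gamma_pdf(x, r), t_pdf(x, r))
```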


$\copyright$ 2017 – Dan Ma

# Mixing probability distributions

This post discusses another way to generate new distributions from old, that of mixing distributions. The resulting distributions are called mixture distributions.

What is a Mixture?

First, let’s start with continuous mixtures. Suppose that $X$ is a continuous random variable with probability density function (pdf) $f_{X \lvert \Theta}(x \lvert \theta)$ where $\theta$ is a parameter in the pdf. There may be other parameters in the distribution but they are not relevant at the moment (e.g. these other parameters may be known constants). Suppose that the parameter $\theta$ is an uncertain quantity and is a random variable with pdf $h_\Theta(\theta)$ (if $\Theta$ is a continuous random variable) or with probability function $P(\Theta=\theta)$ (if $\Theta$ is a discrete random variable). Then taking the weighted average of $f_{X \lvert \Theta}(x \lvert \theta)$ with $h_\Theta(\theta)$ or $P(\Theta=\theta)$ as weight produces a mixture distribution. The following would be the pdf of the resulting mixture distribution.

$\displaystyle (1a) \ \ \ \ \ f_X(x)=\int_{-\infty}^\infty f_{X \lvert \Theta}(x \lvert \theta) \ h_\Theta(\theta) \ d \theta$

$\displaystyle (1b) \ \ \ \ \ f_X(x)=\sum \limits_{\theta} \biggl(f_{X \lvert \Theta}(x \lvert \theta) \ P(\Theta=\theta) \biggr)$

Thus a continuous random variable $X$ is said to be a mixture (or has a mixture distribution) if its probability density function $f_X(x)$ is a weighted average of a family of pdfs $f_{X \lvert \Theta}(x \lvert \theta)$ where the weight is the density function or probability function of the random parameter $\Theta$. The random variable $\Theta$ is said to be the mixing random variable and its pdf or probability function is called the mixing weight.

Another definition of mixture distribution is that the cumulative distribution function (cdf) of the random variable $X$ is the weighted average of a family of cumulative distribution functions indexed by the mixing random variable $\Theta$.

$\displaystyle (2a) \ \ \ \ \ F_X(x)=\int_{-\infty}^\infty F_{X \lvert \Theta}(x \lvert \theta) \ h_\Theta(\theta) \ d \theta$

$\displaystyle (2b) \ \ \ \ \ F_X(x)=\sum \limits_{\theta} \biggl(F_{X \lvert \Theta}(x \lvert \theta) \ P(\Theta=\theta) \biggr)$

The idea of discrete mixture is similar. A discrete random variable $X$ is said to be a mixture if its probability function $P(X=x)$ or cumulative distribution function $P(X \le x)$ is a weighted average of a family of probability functions or cumulative distribution functions indexed by the mixing random variable $\Theta$. The mixing weight can be discrete or continuous. The following shows the probability function and the cdf of a discrete mixture distribution.

$\displaystyle (3a) \ \ \ \ \ P(X=x)=\int_{-\infty}^\infty P(X=x \lvert \Theta=\theta) \ h_\Theta(\theta) \ d \theta$

$\displaystyle (3b) \ \ \ \ \ P(X \le x)=\int_{-\infty}^\infty P(X \le x \lvert \Theta=\theta) \ h_\Theta(\theta) \ d \theta$

$\text{ }$

$\displaystyle (4a) \ \ \ \ \ P(X=x)=\sum \limits_{\theta} \biggl(P(X=x \lvert \Theta=\theta) \ P(\Theta=\theta) \biggr)$

$\displaystyle (4b) \ \ \ \ \ P(X \le x)=\sum \limits_{\theta} \biggl(P(X \le x \lvert \Theta=\theta) \ P(\Theta=\theta) \biggr)$

When the mixture distribution is a weighted average of finitely many distributions, it is called an $n$-point mixture where $n$ is the number of distributions. Suppose that there are $n$ distributions with pdfs

$f_1(x),f_2(x),\cdots,f_n(x)$ (continuous case)

or probability functions

$P(X_1=x),P(X_2=x),\cdots,P(X_n=x)$ (discrete case)

with mixing probabilities $p_1,p_2,\cdots,p_n$ where the sum of the $p_i$ is 1. Then the following gives the pdf or the probability function of the mixture distribution.

$\displaystyle (5a) \ \ \ \ \ f_X(x)=\sum \limits_{j=1}^n p_j \ f_j(x)$

$\displaystyle (5b) \ \ \ \ \ P(X=x)=\sum \limits_{j=1}^n p_j \ P(X_j=x)$

The cdf of the $n$-point mixture is obtained similarly, by weighting the component cdfs as in (2b) and (4b): $F_X(x)=\sum \limits_{j=1}^n p_j \ F_j(x)$, where $F_j$ is the cdf of the $j$th distribution.
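The weighted-average form of an $n$-point mixture also suggests how to simulate one: pick component $j$ with probability $p_j$, then draw from that component. The Python sketch below illustrates this with a hypothetical 2-point mixture of exponential distributions (means 5 and 10, weights 0.75 and 0.25) and compares the empirical cdf with $\sum_j p_j F_j(x)$. The helper names are illustrative.

```python
import math
import random

def sample_finite_mixture(weights, samplers, n, seed=1):
    """Draw n values: choose component j with probability weights[j], then sample from it."""
    rng = random.Random(seed)
    components = rng.choices(range(len(weights)), weights=weights, k=n)
    return [samplers[j](rng) for j in components]

def finite_mixture_cdf(x, weights, cdfs):
    """F_X(x) = sum_j p_j F_j(x)."""
    return sum(p * F(x) for p, F in zip(weights, cdfs))

weights = [0.75, 0.25]
samplers = [lambda rng: rng.expovariate(1 / 5), lambda rng: rng.expovariate(1 / 10)]
cdfs = [lambda x: 1 - math.exp(-x / 5), lambda x: 1 - math.exp(-x / 10)]

sample = sample_finite_mixture(weights, samplers, 100_000)
for x in (2.0, 5.0, 15.0):
    empirical = sum(s <= x for s in sample) / len(sample)
    print(x, round(empirical, 4), round(finite_mixture_cdf(x, weights, cdfs), 4))
```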

Distributional Quantities

Once the pdf (or probability function) or cdf of a mixture is established, the other distributional quantities can be derived from the pdf or cdf. Some of the distributional quantities can be obtained by taking weighted average of the corresponding conditional counterparts. For example, the following gives the survival function and moments of a mixture distribution. We assume that the mixing weight is continuous. For discrete mixing weight, simply replace the integral with summation.

$\displaystyle (6a) \ \ \ \ \ S_X(x)=\int_{-\infty}^\infty S_{X \lvert \Theta}(x \lvert \theta) \ h_\Theta(\theta) \ d \theta$

$\displaystyle (6b) \ \ \ \ \ E(X)=\int_{-\infty}^\infty E(X \lvert \theta) \ h_\Theta(\theta) \ d \theta$

$\displaystyle (6c) \ \ \ \ \ E(X^k)=\int_{-\infty}^\infty E(X^k \lvert \theta) \ h_\Theta(\theta) \ d \theta$

Once the moments are obtained, all distributional quantities that are based on moments can be evaluated, such as variance, skewness and kurtosis. Note that these quantities are not the weighted averages of the conditional quantities. For example, the variance of a mixture is not the weighted average of the variances of the conditional distributions. In fact, the variance of a mixture has two components.

$\displaystyle (7) \ \ \ \ \ Var(X)=E[Var(X \lvert \Theta)]+Var[E(X \lvert \Theta)]$

The relationship in (7) is called the law of total variance, which is the proper way of computing the unconditional variance $Var(X)$. The first component $E[Var(X \lvert \Theta)]$ is called the expected value of conditional variances, which is the weighted average of the conditional variances. The second component $Var[E(X \lvert \Theta)]$ is called the variance of the conditional means, which represents the additional variance as a result of the uncertainty in the parameter $\Theta$. If there is a great deal of variation among the conditional means $E(X \lvert \Theta)$, the variation will be reflected in $Var(X)$ through the second component $Var[E(X \lvert \Theta)]$. This will be further illustrated in the examples below.
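A small numerical example makes the two components of (7) concrete. The Python sketch below uses a hypothetical 2-point mixture of exponential distributions with means 5 and 10 and weights 0.75 and 0.25; an exponential with mean $m$ has variance $m^2$. It computes $Var(X)$ directly from the first two moments via (6b) and (6c), then again through the law of total variance, and shows that the weighted average of the conditional variances alone falls short.

```python
# Hypothetical 2-point mixture: exponential means 5 and 10 with weights 0.75 and 0.25.
weights = [0.75, 0.25]
means = [5.0, 10.0]
variances = [m ** 2 for m in means]  # an exponential with mean m has variance m^2

# Direct route via (6b) and (6c): E(X^2 | j) = Var_j + mean_j^2.
mean_x = sum(p * m for p, m in zip(weights, means))
second = sum(p * (v + m ** 2) for p, v, m in zip(weights, variances, means))
var_direct = second - mean_x ** 2

# Law of total variance (7).
expected_cond_var = sum(p * v for p, v in zip(weights, variances))          # E[Var(X|Theta)]
var_cond_mean = sum(p * (m - mean_x) ** 2 for p, m in zip(weights, means))  # Var[E(X|Theta)]

print(var_direct, expected_cond_var + var_cond_mean, expected_cond_var)  # 48.4375 48.4375 43.75
```

The gap of 4.6875 between the last two numbers is the variance of the conditional means.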

Motivation

Some of the examples discussed below have gamma distribution as mixing weights. See here for basic information on gamma distribution.

A natural interpretation of a mixture is that the uncertain parameter $\Theta$ in the conditional random variable $X \lvert \Theta$ describes an individual in a large population. For example, the parameter $\Theta$ describes a certain characteristic across the units in a population. In this section, we describe the idea of mixture in an insurance setting. The example is to mix Poisson distributions with a gamma distribution as mixing weight. We will see that the resulting mixture is a negative binomial distribution, which is more dispersed than the conditional Poisson distributions.

Consider a large group of insured drivers for auto collision coverage. Suppose that the claim frequency in a year for an insured driver has a Poisson distribution with mean $\theta$. The conditional probability function for the number of claims in a year for an insured driver is:

$\displaystyle P(X=x \lvert \Theta=\theta)=\frac{e^{-\theta} \ \theta^x}{x!} \ \ \ \ \ \ x=0,1,2,3,\cdots$ where $\theta>0$

The mean number of claims in a year for an insured driver is $\theta$. The parameter $\theta$ reflects the risk characteristics of an insured driver. Since the population of insured drivers is large, these risk characteristics vary from driver to driver, and there is uncertainty in the parameter $\theta$. Thus it is more appropriate to regard $\theta$ as a random variable in order to capture the wide range of risk characteristics across the individuals in the population. As a result, the above probability function is not unconditional but rather a conditional probability function of $X$.

What about the marginal (unconditional) probability function of $X$? Suppose that $\Theta$ has a gamma distribution with the following pdf:

$\displaystyle h_{\Theta}(\theta)=\frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ \theta^{\alpha-1} \ e^{-\beta \theta}$

where $\alpha>0$ and $\beta>0$ are known parameters of the gamma distribution. Then the unconditional probability function of $X$ is the weighted average of the conditional Poisson probabilities, with the gamma density as the weight.

\displaystyle \begin{aligned} P(X=x)&=\int_0^\infty P(X=x \lvert \Theta=\theta) \ h_{\Theta}(\theta) \ d \theta \\&=\int_0^\infty \frac{e^{-\theta} \ \theta^x}{x!} \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ \theta^{\alpha-1} \ e^{-\beta \theta} \ d \theta \\&= \frac{\beta^\alpha}{x! \Gamma(\alpha)} \int_0^\infty \theta^{x+\alpha-1} \ e^{-(\beta+1) \theta} \ d \theta \\&=\frac{\beta^\alpha}{x! \Gamma(\alpha)} \ \frac{\Gamma(x+\alpha)}{(\beta+1)^{x+\alpha}} \int_0^\infty \frac{1}{\Gamma(x+\alpha)} \ (\beta+1)^{x+\alpha} \ \theta^{x+\alpha-1} \ e^{-(\beta+1) \theta} \ d \theta \\&=\frac{\beta^\alpha}{x! \Gamma(\alpha)} \ \frac{\Gamma(x+\alpha)}{(\beta+1)^{x+\alpha}} \\&=\frac{\Gamma(x+\alpha)}{x! \ \Gamma(\alpha)} \ \biggl(\frac{\beta}{\beta+1} \biggr)^\alpha \biggl(\frac{1}{\beta+1} \biggr)^x \ \ x=0,1,2,\cdots \end{aligned}
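The mixing integral can also be checked numerically. The sketch below approximates $\int_0^\infty P(X=x \lvert \Theta=\theta) \ h_{\Theta}(\theta) \ d\theta$ with Simpson's rule and compares it with the closed-form probability from the last step. It assumes $\alpha \ge 1$ (so the integrand is finite at $\theta=0$) and that the integrand is negligible beyond the cutoff; the helper names, the cutoff and the values $\alpha=2$, $\beta=1$ are illustrative choices only.

```python
import math


def nb_pmf_closed_form(x, alpha, beta):
    """Closed-form negative binomial probability, computed in log space for stability."""
    log_p = (math.lgamma(x + alpha) - math.lgamma(x + 1) - math.lgamma(alpha)
             + alpha * math.log(beta / (beta + 1)) - x * math.log(beta + 1))
    return math.exp(log_p)


def mixed_pmf_numeric(x, alpha, beta, upper=80.0, n=20000):
    """Simpson's rule for the integral of Poisson(x | theta) times the gamma density.

    Assumes alpha >= 1 and a negligible integrand beyond `upper`; n must be even.
    """
    const = beta ** alpha / (math.factorial(x) * math.gamma(alpha))

    def integrand(theta):
        # e^{-theta} theta^x / x! * beta^alpha theta^{alpha-1} e^{-beta theta} / Gamma(alpha)
        return const * theta ** (x + alpha - 1) * math.exp(-(beta + 1) * theta)

    h = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


for x in range(5):
    print(x, nb_pmf_closed_form(x, 2.0, 1.0), mixed_pmf_numeric(x, 2.0, 1.0))
```

With $\alpha=2$ and $\beta=1$, the closed form gives $P(X=0)=\frac{1}{4}$, $P(X=1)=\frac{1}{4}$ and $P(X=2)=\frac{3}{16}$, and the quadrature should agree to many decimal places.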

Note that the integral in the 4th step is 1 since the integrand is a gamma density function (with shape $x+\alpha$ and rate $\beta+1$). The probability function at the last step is that of a negative binomial distribution. If the parameter $\alpha$ is a positive integer, then the ratio of gamma functions simplifies, and the probability function of $X$ becomes the following.

$\displaystyle P(X=x)=\left\{ \begin{array}{ll} \displaystyle \biggl(\frac{\beta}{\beta+1} \biggr)^\alpha &\ x=0 \\ \text{ } & \text{ } \\ \displaystyle \frac{(x-1+\alpha) \cdots (1+\alpha) \alpha}{x!} \ \biggl(\frac{\beta}{\beta+1} \biggr)^\alpha \biggl(\frac{1}{\beta+1} \biggr)^x &\ x=1,2,\cdots \end{array} \right.$

This probability function can be further simplified as the following:

$\displaystyle P(X=x)=\binom{x+\alpha-1}{x} \biggl(\frac{\beta}{\beta+1} \biggr)^\alpha \biggl(\frac{1}{\beta+1} \biggr)^x$

where $x=0,1,2,\cdots$. This is one form of a negative binomial distribution. The mean is $E(X)=\frac{\alpha}{\beta}$ and the variance is $Var(X)=\frac{\alpha}{\beta} (1+\frac{1}{\beta})$. The variance of the negative binomial distribution is greater than the mean. In a Poisson distribution, the mean equals the variance. Thus the unconditional claim frequency $X$ is more dispersed than its conditional distributions. This is a characteristic of mixture distributions. The uncertainty in the parameter variable $\Theta$ has the effect of increasing the unconditional variance of the mixture distribution of $X$. Recall that the variance of a mixture distribution has two components, the weighted average of the conditional variances and the variance of the conditional means. The second component represents the additional variance introduced by the uncertainty in the parameter $\Theta$.
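A rough numerical confirmation of the mean and variance: sum the negative binomial probabilities over a long truncated range and compare with $\frac{\alpha}{\beta}$ and $\frac{\alpha}{\beta}(1+\frac{1}{\beta})$. Because a Poisson distribution has mean and variance both equal to $\Theta$, the law of total variance gives $E[Var(X \lvert \Theta)]=E(\Theta)=\frac{\alpha}{\beta}$ and $Var[E(X \lvert \Theta)]=Var(\Theta)=\frac{\alpha}{\beta^2}$, whose sum is the same variance. The values $\alpha=2.5$, $\beta=0.8$ and the truncation point are assumptions for illustration.

```python
import math


def nb_pmf(x, alpha, beta):
    """Negative binomial probability from the Poisson-gamma mixture (log space for stability)."""
    log_p = (math.lgamma(x + alpha) - math.lgamma(x + 1) - math.lgamma(alpha)
             + alpha * math.log(beta / (beta + 1)) - x * math.log(beta + 1))
    return math.exp(log_p)


def nb_mean_var(alpha, beta, x_max=2000):
    """Mean and variance by summing over x = 0..x_max (the tail beyond is assumed negligible)."""
    probs = [nb_pmf(x, alpha, beta) for x in range(x_max + 1)]
    mean = sum(x * p for x, p in enumerate(probs))
    second_moment = sum(x * x * p for x, p in enumerate(probs))
    return mean, second_moment - mean ** 2


alpha, beta = 2.5, 0.8  # hypothetical gamma parameters (rate parametrization as in the text)
mean, var = nb_mean_var(alpha, beta)

# Law of total variance for the Poisson-gamma mixture.
expected_cond_var = alpha / beta      # E[Var(X|Theta)] = E(Theta)
var_cond_mean = alpha / beta ** 2     # Var[E(X|Theta)] = Var(Theta)

print(mean, alpha / beta)
print(var, alpha / beta * (1 + 1 / beta), expected_cond_var + var_cond_mean)
```

All three variance figures should agree at $7.03125$, which exceeds the mean $3.125$, illustrating the overdispersion relative to the Poisson.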

More Examples

We now further illustrate the notion of mixture with a few more examples. Many familiar distributions are mixture distributions. The negative binomial distribution is a mixture of Poisson distributions with gamma mixing weight, as discussed above. The Pareto distribution, more specifically the Pareto Type II (Lomax) distribution, is a mixture of exponential distributions with gamma mixing weight (see Example 2 below). Example 3 discusses the normal-normal mixture. Example 1 demonstrates numerical calculations involving a finite mixture.

Example 1
Suppose that the size of an auto collision claim from a large group of insured drivers is a mixture of three exponential distributions with means 5, 8 and 10 and with weights 0.75, 0.15 and 0.10, respectively. Discuss the mixture distribution.

The pdf and cdf are the weighted averages of the respective exponential quantities.

\displaystyle \begin{aligned} f_X(x)&=0.75 \ (0.2 e^{-0.2x} )+0.15 \ (0.125 e^{-0.125x} )+0.10 (0.10 e^{-0.10x}) \\&\text{ } \\&=0.15 \ e^{-0.2x} +0.01875 \ e^{-0.125x}+0.01 \ e^{-0.10x} \end{aligned}

\displaystyle \begin{aligned} F_X(x)&=0.75 \ (1- e^{-0.2x} )+0.15 \ (1- e^{-0.125x} )+0.10 (1- e^{-0.10x}) \\&\text{ } \\&=1-0.75 \ e^{-0.2x} -0.15 \ e^{-0.125x}-0.10 \ e^{-0.10x} \end{aligned}

$\displaystyle S_X(x)=0.75 \ e^{-0.2x} +0.15 \ e^{-0.125x}+0.10 \ e^{-0.10x}$
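Since the pdf, cdf and survival function of Example 1 are weighted sums of exponential quantities, they translate directly into code. A small sketch (the constant and function names are ours):

```python
import math

WEIGHTS = [0.75, 0.15, 0.10]  # mixing weights from Example 1
MEANS = [5.0, 8.0, 10.0]      # exponential means from Example 1


def pdf(x):
    # weighted average of exponential densities (1/m) e^{-x/m}
    return sum(w * math.exp(-x / m) / m for w, m in zip(WEIGHTS, MEANS))


def survival(x):
    # weighted average of exponential survival functions e^{-x/m}
    return sum(w * math.exp(-x / m) for w, m in zip(WEIGHTS, MEANS))


def cdf(x):
    return 1.0 - survival(x)


print(round(survival(10), 4))  # 0.1813
```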

For a randomly selected claim from this population of insured drivers, what is the probability that it exceeds 10? The answer is $S_X(10)=0.1813$. The pdf and cdf of the mixture allow us to derive other distributional quantities such as moments, and the moments in turn give the skewness and kurtosis. The moments of an exponential distribution have a closed form: the $k$th raw moment is the mean raised to the $k$th power times $k!$. The moments of the mixture distribution are then simply the weighted averages of the exponential moments.

$\displaystyle E(X^k)=0.75 \ [5^k \ k!]+0.15 \ [8^k \ k!]+0.10 \ [10^k \ k!]$

where $k$ is a positive integer. The following evaluates the first four moments.

$\displaystyle E(X)=0.75 \times 5+0.15 \times 8+0.10 \times 10=5.95$

$\displaystyle E(X^2)=0.75 \ (5^2 \ 2!)+0.15 \ (8^2 \ 2!)+0.10 \ (10^2 \ 2!)=76.7$

$\displaystyle E(X^3)=0.75 \ (5^3 \ 3!)+0.15 \ (8^3 \ 3!)+0.10 \ (10^3 \ 3!)=1623.3$

$\displaystyle E(X^4)=0.75 \ (5^4 \ 4!)+0.15 \ (8^4 \ 4!)+0.10 \ (10^4 \ 4!)=49995.6$
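The four raw moments can be reproduced with a few lines of Python, using the fact that the $k$th raw moment of an exponential distribution is its mean to the $k$th power times $k!$. A sketch (names are ours):

```python
import math

WEIGHTS = [0.75, 0.15, 0.10]  # mixing weights from Example 1
MEANS = [5.0, 8.0, 10.0]      # exponential means from Example 1


def raw_moment(k):
    # E(X^k) of the mixture = weighted average of m^k * k! over the exponential components
    return sum(w * m ** k * math.factorial(k) for w, m in zip(WEIGHTS, MEANS))


moments = [raw_moment(k) for k in range(1, 5)]
print([round(mk, 4) for mk in moments])  # [5.95, 76.7, 1623.3, 49995.6]
```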

The variance of $X$ is $Var(X)=76.7-5.95^2=41.2975$. The three conditional exponential variances are 25, 64 and 100. The weighted average of these would be 38.35. Because of the uncertainty resulting from not knowing which exponential distribution the claim is from, the unconditional variance is larger than 38.35.
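The gap between 41.2975 and 38.35 is the second component of the law of total variance (7), the variance of the conditional means. A short sketch that computes both components for Example 1 (names are ours):

```python
WEIGHTS = [0.75, 0.15, 0.10]
MEANS = [5.0, 8.0, 10.0]
VARS = [m ** 2 for m in MEANS]  # variance of an exponential = mean squared

overall_mean = sum(w * m for w, m in zip(WEIGHTS, MEANS))
# E[Var(X|Theta)]: weighted average of the conditional variances
ev_var = sum(w * v for w, v in zip(WEIGHTS, VARS))
# Var[E(X|Theta)]: variance of the conditional means
var_ev = sum(w * (m - overall_mean) ** 2 for w, m in zip(WEIGHTS, MEANS))

print(round(ev_var, 4), round(var_ev, 4), round(ev_var + var_ev, 4))  # 38.35 2.9475 41.2975
```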

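The skewness of a distribution is the third standardized moment (the third central moment divided by $\sigma^3$), and the kurtosis is the fourth standardized moment (the fourth central moment divided by $\sigma^4$). Each of them can be expressed in terms of the raw moments, up to the third or fourth raw moment, respectively.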

$\displaystyle \gamma=E\biggl[\biggl( \frac{X-\mu}{\sigma} \biggr)^3\biggr]=\frac{E(X^3)-3 \mu \sigma^2-\mu^3}{(\sigma^2)^{1.5}}$

$\displaystyle \text{Kurt}[X]=E\biggl[\biggl( \frac{X-\mu}{\sigma} \biggr)^4\biggr]=\frac{E(X^4)-4 \mu E(X^3)+6 \mu^2 E(X^2)-3 \mu^4}{\sigma^4}$

Note that $\mu=E(X)$ and $\sigma^2=Var(X)$. The expressions on the right hand side are in terms of the raw moments $E(X^k)$ up to $k=4$. Plugging in the raw moments produces the skewness $\gamma=2.5453$ and kurtosis $\text{Kurt}[X]=14.0097$. The excess kurtosis is then 11.0097 (subtracting 3 from the kurtosis).
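The two formulas can be applied directly to the raw moments. The sketch below (function name hypothetical) reproduces the figures above and, for comparison, applies the same formulas to a single exponential distribution with mean 5, whose raw moments are $5^k \ k!$.

```python
def skew_kurt(m1, m2, m3, m4):
    """Skewness and kurtosis from the first four raw moments E(X), ..., E(X^4)."""
    var = m2 - m1 ** 2
    skew = (m3 - 3 * m1 * var - m1 ** 3) / var ** 1.5
    kurt = (m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4) / var ** 2
    return skew, kurt


# Raw moments of the Example 1 mixture.
g, k = skew_kurt(5.95, 76.7, 1623.3, 49995.6)
print(round(g, 4), round(k, 4), round(k - 3, 4))  # 2.5453 14.0097 11.0097

# A single exponential with mean 5: raw moments 5, 50, 750, 15000.
g1, k1 = skew_kurt(5.0, 50.0, 750.0, 15000.0)
print(round(g1, 4), round(k1 - 3, 4))  # 2.0 6.0
```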

The skewness and excess kurtosis of an exponential distribution are always 2 and 6, respectively. One takeaway is that the skewness and kurtosis of a mixture are not weighted averages of the conditional counterparts. In this particular case, the mixture (skewness 2.5453) is more skewed than each of the individual exponential distributions. Kurtosis is a measure of whether the data are heavy-tailed or light-tailed relative to a normal distribution (the kurtosis of a normal distribution is 3, i.e. its excess kurtosis is 0). With excess kurtosis 6, the exponential distributions are already heavier tailed than the normal distribution. With excess kurtosis 11.0097, this mixture distribution is heavier tailed still and has a higher likelihood of outliers.

Example 2 (Exponential-Gamma Mixture)
The Pareto distribution (Type II, also called Lomax) is a mixture of exponential distributions with gamma mixing weight. Suppose that, conditional on the parameter $\Theta=\theta$, $X$ has the exponential pdf $f_{X \lvert \Theta}(x \lvert \theta)=\theta \ e^{-\theta x}$, where $x>0$. Suppose that $\Theta$ has a gamma distribution with the following pdf:

$\displaystyle h_{\Theta}(\theta)=\frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ \theta^{\alpha-1} \ e^{-\beta \theta}$

Then the following gives the unconditional pdf of the random variable $X$.

\displaystyle \begin{aligned} f_X(x)&=\int_0^\infty f_{X \lvert \Theta}(x \lvert \theta) \ h_{\Theta}(\theta) \ d \theta \\&=\int_0^\infty \theta \ e^{-\theta x} \ \frac{1}{\Gamma(\alpha)} \ \beta^\alpha \ \theta^{\alpha-1} \ e^{-\beta \theta} \ d \theta \\&= \frac{\beta^\alpha}{\Gamma(\alpha)} \int_0^\infty \theta^{\alpha+1-1} \ e^{-(x+\beta) \theta} \ d \theta \\&= \frac{\beta^\alpha}{\Gamma(\alpha)} \frac{\Gamma(\alpha+1)}{(x+\beta)^{\alpha+1}} \int_0^\infty \frac{1}{\Gamma(\alpha+1)} \ (x+\beta)^{\alpha+1} \ \theta^{\alpha+1-1} \ e^{-(x+\beta) \theta} \ d \theta \\&=\frac{\beta^\alpha}{\Gamma(\alpha)} \frac{\Gamma(\alpha+1)}{(x+\beta)^{\alpha+1}} \\&= \frac{\alpha \ \beta^{\alpha}}{(x+\beta)^{\alpha+1}} \end{aligned}

The above is the density of the Pareto Type II (Lomax) distribution. The Pareto distribution is discussed here. The example of the exponential-gamma mixture is discussed here.
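As with the Poisson-gamma case, the mixing integral can be checked numerically. The sketch below uses Simpson's rule, assumes $\alpha \ge 1$ (so $\theta^{\alpha-1}$ is finite at 0) and a negligible integrand beyond the cutoff, and uses the illustrative values $\alpha=3$, $\beta=2$. The function names are hypothetical.

```python
import math


def lomax_pdf(x, alpha, beta):
    # closed form from the derivation: alpha * beta^alpha / (x + beta)^(alpha + 1)
    return alpha * beta ** alpha / (x + beta) ** (alpha + 1)


def exp_gamma_mixture_numeric(x, alpha, beta, upper=60.0, n=20000):
    """Simpson's rule for the integral of theta*exp(-theta*x) times the gamma(alpha, rate beta) pdf.

    Assumes alpha >= 1 and a negligible integrand beyond `upper`; n must be even.
    """
    const = beta ** alpha / math.gamma(alpha)

    def integrand(theta):
        return theta * math.exp(-theta * x) * const * theta ** (alpha - 1) * math.exp(-beta * theta)

    h = upper / n
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


for x in (0.0, 0.5, 1.0, 3.0):
    print(x, exp_gamma_mixture_numeric(x, 3.0, 2.0), lomax_pdf(x, 3.0, 2.0))
```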

Example 3 (Normal-Normal Mixture)
Conditional on $\Theta=\theta$, consider a normal random variable $X$ with mean $\theta$ and variance $v$ where $v$ is known. The following is the conditional density function of $X$.

$\displaystyle f_{X \lvert \Theta}(x \lvert \theta)=\frac{1}{\sqrt{2 \pi v}} \ \text{exp}\biggl[-\frac{1}{2v}(x-\theta)^2 \biggr] \ \ \ -\infty<x<\infty$

Suppose that the parameter $\Theta$ is normally distributed with mean $\mu$ and variance $a$ (both known parameters). The following is the density function of $\Theta$.

$\displaystyle f_{\Theta}(\theta)=\frac{1}{\sqrt{2 \pi a}} \ \text{exp}\biggl[-\frac{1}{2a}(\theta-\mu)^2 \biggr] \ \ \ -\infty<\theta<\infty$

Determine the unconditional pdf of $X$.

\displaystyle \begin{aligned} f_X(x)&=\int_{-\infty}^\infty \frac{1}{\sqrt{2 \pi v}} \ \text{exp}\biggl[-\frac{1}{2v}(x-\theta)^2 \biggr] \ \frac{1}{\sqrt{2 \pi a}} \ \text{exp}\biggl[-\frac{1}{2a}(\theta-\mu)^2 \biggr] \ d \theta \\&=\frac{1}{2 \pi \sqrt{va}} \int_{-\infty}^\infty \text{exp}\biggl[-\frac{1}{2v}(x-\theta)^2 -\frac{1}{2a}(\theta-\mu)^2\biggr] \ d \theta \end{aligned}

Apart from the factor $-\frac{1}{2}$, the expression in the exponent can be rewritten by completing the square in $\theta$:

$\displaystyle \frac{(x-\theta)^2}{v}+\frac{(\theta-\mu)^2}{a}=\frac{a+v}{va} \biggl[\theta-\frac{ax+v \mu}{a+v}\biggr]^2 +\frac{(x-\mu)^2}{a+v}$
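The identity comes from collecting the powers of $\theta$ and completing the square. The intermediate steps are as follows; the third line uses $(a+v)(ax^2+v \mu^2)-(ax+v \mu)^2=av(x-\mu)^2$.

\displaystyle \begin{aligned} \frac{(x-\theta)^2}{v}+\frac{(\theta-\mu)^2}{a}&=\frac{a+v}{va} \ \theta^2-2 \ \frac{ax+v \mu}{va} \ \theta+\frac{ax^2+v \mu^2}{va} \\&=\frac{a+v}{va} \biggl[\theta-\frac{ax+v \mu}{a+v}\biggr]^2+\frac{ax^2+v \mu^2}{va}-\frac{(ax+v \mu)^2}{va(a+v)} \\&=\frac{a+v}{va} \biggl[\theta-\frac{ax+v \mu}{a+v}\biggr]^2+\frac{av(x-\mu)^2}{va(a+v)} \\&=\frac{a+v}{va} \biggl[\theta-\frac{ax+v \mu}{a+v}\biggr]^2+\frac{(x-\mu)^2}{a+v} \end{aligned}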

Continuing the derivation:

\displaystyle \begin{aligned} f_X(x)&=\frac{1}{2 \pi \sqrt{va}} \int_{-\infty}^\infty \text{exp}\biggl[-\frac{1}{2} \biggl(\frac{a+v}{va} \biggl[\theta-\frac{ax+v \mu}{a+v}\biggr]^2 +\frac{(x-\mu)^2}{a+v} \biggr) \biggr] \ d \theta \\&\displaystyle =\frac{\text{exp}\biggl[\displaystyle -\frac{(x-\mu)^2}{2(a+v)} \biggr]}{2 \pi \sqrt{va}} \int_{-\infty}^\infty \text{exp}\biggl[\displaystyle -\frac{1}{2} \biggl(\frac{a+v}{va} \biggl[\theta-\frac{ax+v \mu}{a+v}\biggr]^2 \biggr) \biggr] \ d \theta \\&=\frac{\text{exp}\biggl[\displaystyle -\frac{(x-\mu)^2}{2(a+v)} \biggr]}{\sqrt{2 \pi (a+v)} } \int_{-\infty}^\infty \frac{1}{\sqrt{2 \pi}} \sqrt{\frac{a+v}{va}} \ \text{exp}\biggl[-\frac{1}{2} \biggl(\frac{a+v}{va} \biggl[\theta-\frac{ax+v \mu}{a+v}\biggr]^2 \biggr) \biggr] \ d \theta \\&=\frac{\text{exp}\biggl[\displaystyle -\frac{(x-\mu)^2}{2(a+v)} \biggr]}{\sqrt{2 \pi (a+v)} } \end{aligned}

Note that the integrand in the integral in the third line is the density function of a normal distribution with mean $\frac{ax+v \mu}{a+v}$ and variance $\frac{va}{a+v}$. Hence the integral is 1. The last expression is the unconditional pdf of $X$, repeated as follows.

$\displaystyle f_X(x)=\frac{1}{\sqrt{2 \pi (a+v)}} \ \text{exp}\biggl[-\frac{(x-\mu)^2}{2(a+v)} \biggr] \ \ \ \ -\infty<x<\infty$

The above is the pdf of a normal distribution with mean $\mu$ and variance $a+v$. Thus mixing normal distributions with mean $\Theta$ and variance $v$, where the mixing weight $\Theta$ is normally distributed with mean $\mu$ and variance $a$, produces a normal distribution with mean $\mu$ (the same mean as the mixing distribution) and variance $a+v$ (the sum of the conditional variance and the mixing variance).

The mean of the conditional normal distribution is uncertain. When the mean $\Theta$ follows a normal distribution with mean $\mu$, the mixture is a normal distribution centered at $\mu$, but with the increased variance $a+v$. The increased variance of the unconditional distribution reflects the uncertainty in the parameter $\Theta$.
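A numerical sketch of the normal-normal result: integrate the product of the two normal densities over $\theta$ with Simpson's rule and compare with the normal density with mean $\mu$ and variance $a+v$. The parameter values $\mu=1$, $a=2$, $v=0.5$, the integration range of 12 standard deviations of $\Theta$, and the function names are illustrative assumptions.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def normal_mixture_numeric(x, mu, a, v, width=12.0, n=20000):
    """Simpson's rule for the integral over theta of N(x; theta, v) * N(theta; mu, a)."""
    lo = mu - width * math.sqrt(a)
    hi = mu + width * math.sqrt(a)
    h = (hi - lo) / n  # n must be even

    def integrand(theta):
        return normal_pdf(x, theta, v) * normal_pdf(theta, mu, a)

    total = integrand(lo) + integrand(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    return total * h / 3


mu, a, v = 1.0, 2.0, 0.5  # hypothetical parameters
for x in (-3.0, 0.0, 1.0, 4.0):
    print(x, normal_mixture_numeric(x, mu, a, v), normal_pdf(x, mu, a + v))
```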

Remarks

Mixture distributions can be used to model a statistical population with subpopulations, where the conditional density functions are the densities on the subpopulations, and the mixing weights are the proportions of each subpopulation in the overall population. If the population can be divided into a finite number of homogeneous subpopulations, then the model is a finite mixture as in Example 1. In certain situations, continuous mixing weights may be more appropriate (e.g. the Poisson-gamma mixture).

Many other familiar distributions are mixture distributions and are discussed in the next post.


$\copyright$ 2017 – Dan Ma