# The Chi-Squared Distribution, Part 2

This is part 2 of a 3-part series on the chi-squared distribution. In this post, we discuss several theorems, all centered around the chi-squared distribution, that play important roles in inferential statistics for the population mean and population variance of normal populations. These theorems are the basis for the test statistics used in the inference procedures.

We first discuss the setting for the inference procedures. We then discuss the pivotal theorem (Theorem 5) and proceed to the theorems that produce the test statistics for $\mu$, the population mean of a normal population, and for $\mu_1-\mu_2$, the difference of two population means from two normal populations. The discussion then shifts to the inference procedures on the population variance.

_______________________________________________________________________________________________

The Settings

To facilitate the discussion, we use the notation $\mathcal{N}(\mu,\sigma^2)$ to denote the normal distribution with mean $\mu$ and variance $\sigma^2$. Whenever the random variable $X$ follows such a distribution, we use the notation $X \sim \mathcal{N}(\mu,\sigma^2)$.

The setting for making inference on one population is that we have a random sample $Y_1,Y_2,\cdots,Y_n$, drawn from a normal population $\mathcal{N}(\mu,\sigma^2)$. The sample mean $\overline{Y}$ and the sample variance $S^2$ are unbiased estimators of $\mu$ and $\sigma^2$, respectively, given by:

$\displaystyle \overline{Y}=\frac{Y_1+\cdots+Y_n}{n}=\frac{1}{n} \sum \limits_{j=1}^n Y_j \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (1)$

$\displaystyle S^2=\frac{1}{n-1} \sum \limits_{j=1}^n (Y_j-\overline{Y})^2 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$

The goal is to use the information obtained from the sample, namely $\overline{Y}$ and $S^2$, to estimate or make decisions about the unknown population parameters $\mu$ and $\sigma^2$.

Because the sample is drawn from a normal population, the sample mean $\overline{Y}$ has a normal distribution, more specifically $\overline{Y} \sim \mathcal{N}(\mu,\frac{\sigma^2}{n})$, which has two unknown parameters. To perform inferential procedures on the population mean $\mu$, it is preferable to have a test statistic that depends on $\mu$ only. To this end, a t-statistic is used (see Theorem 7), which has the t-distribution with $n-1$ degrees of freedom (one less than the sample size). Because the parameter $\sigma^2$ is replaced by the sample variance $S^2$, the t-statistic has only $\mu$ as the unknown parameter.

On the other hand, to perform inferential procedures on the population variance $\sigma^2$, we use a statistic that has a chi-squared distribution and that has only one unknown parameter $\sigma^2$ (see Theorem 5).

Now, the setting for performing inference on two normal populations. Let $X_1,X_2,\cdots,X_n$ be a random sample drawn from the distribution $\mathcal{N}(\mu_X,\sigma_X^2)$. Let $Y_1,Y_2,\cdots,Y_m$ be a random sample drawn from the distribution $\mathcal{N}(\mu_Y,\sigma_Y^2)$. Because the two samples are independent, the difference of the sample means $\overline{X}-\overline{Y}$ has a normal distribution. Specifically, $\overline{X}-\overline{Y} \sim \mathcal{N}(\mu_X-\mu_Y,\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m})$. Theorem 8 gives a t-statistic that is in terms of the difference $\mu_X-\mu_Y$ such that the two unknown population variances are replaced by the pooled sample variance. This is done with the simplifying assumption that the two population variances are identical.

On the other hand, for inference on the population variances $\sigma_X^2$ and $\sigma_Y^2$, a statistic that has the F-distribution can be used (see Theorem 10). One caveat is that this test statistic is sensitive to non-normality.

_______________________________________________________________________________________________

Connection between Normal Distribution and Chi-squared Distribution

There is an intimate relation between the sample items from a normal distribution and the chi-squared distribution. This is discussed in Part 1. Let’s recall this connection. If we normalize one sample item $Y_j$ and then square it, we obtain a chi-squared random variable with df = 1. Likewise, if we normalize each sample item and then square it, the sum of the squares will be a chi-squared random variable with df = $n$. The following results are discussed in Part 1 and are restated here for clarity.

Theorem 2
Suppose that the random variable $X$ follows a standard normal distribution, i.e. the normal distribution with mean 0 and standard deviation 1. Then $Y=X^2$ follows a chi-squared distribution with 1 degree of freedom.

Corollary 3
Suppose that the random variable $X$ follows a normal distribution with mean $\mu$ and standard deviation $\sigma$. Then $Y=[(X-\mu) / \sigma]^2$ follows a chi-squared distribution with 1 degree of freedom.

Corollary 4
Suppose that $X_1,X_2,\cdots,X_n$ is a random sample drawn from a normal distribution with mean $\mu$ and standard deviation $\sigma$. Then the following random variable follows a chi-squared distribution with $n$ degrees of freedom.

$\displaystyle \sum \limits_{j=1}^n \biggl( \frac{X_j-\mu}{\sigma} \biggr)^2=\biggl( \frac{X_1-\mu}{\sigma} \biggr)^2+\biggl( \frac{X_2-\mu}{\sigma} \biggr)^2+\cdots+\biggl( \frac{X_n-\mu}{\sigma} \biggr)^2$
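Corollary 4 can be checked numerically. The following is a minimal simulation sketch (the sample size, parameters, and seed are illustrative choices, not from the original text): standardize and square each item of many normal samples, sum the squares, and compare the empirical mean and variance of the sums with those of a chi-squared distribution with $n$ degrees of freedom, which are $n$ and $2n$.

```python
import random
import statistics

# Simulation sketch of Corollary 4 (illustrative parameters).
# Draw many samples of size n from N(mu, sigma^2); for each sample,
# standardize each item, square it, and sum.  By Corollary 4 the sum
# follows a chi-squared distribution with n degrees of freedom,
# whose mean is n and whose variance is 2n.
random.seed(42)
mu, sigma, n, trials = 5.0, 2.0, 10, 20000

sums = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sums.append(sum(((x - mu) / sigma) ** 2 for x in sample))

print(statistics.mean(sums))      # should be close to n = 10
print(statistics.variance(sums))  # should be close to 2n = 20
```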

_______________________________________________________________________________________________

A Pivotal Theorem

The statistic in Corollary 4 has two unknown parameters $\mu$ and $\sigma^2$. It turns out that the statistic will become more useful if $\mu$ is replaced by the sample mean $\overline{Y}$. The cost is that one degree of freedom is lost in the chi-squared distribution. The following theorem gives the details. The result is a statistic that is a function of the sample variance $S^2$ and the population variance $\sigma^2$.

Theorem 5
Let $Y_1,Y_2,\cdots,Y_n$ be a random sample drawn from a normal distribution with mean $\mu$ and variance $\sigma^2$. Then the following conditions hold.

• The sample mean $\overline{Y}$ and the sample variance $S^2$ are independent.
• The statistic $\displaystyle \frac{(n-1) S^2}{\sigma^2}=\frac{1}{\sigma^2} \sum \limits_{j=1}^n (Y_j-\overline{Y})^2$ has a chi-squared distribution with $n-1$ degrees of freedom.

Proof of Theorem 5
We do not prove the first bullet point. For a proof, see Exercise 13.93 in [2]. For the second bullet point, note that

$\displaystyle \begin{aligned}\sum \limits_{j=1}^n \biggl( \frac{Y_j-\mu}{\sigma} \biggr)^2&=\sum \limits_{j=1}^n \biggl( \frac{(Y_j-\overline{Y})+(\overline{Y}-\mu)}{\sigma} \biggr)^2 \\&=\sum \limits_{j=1}^n \biggl( \frac{Y_j-\overline{Y}}{\sigma} \biggr)^2 +\frac{n (\overline{Y}-\mu)^2}{\sigma^2} \\&=\frac{(n-1) S^2}{\sigma^2} +\biggl( \frac{\overline{Y}-\mu}{\frac{\sigma}{\sqrt{n}}} \biggr)^2 \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (3)\end{aligned}$

Note that in expanding $[(Y_j-\overline{Y})+(\overline{Y}-\mu)]^2$, the sum of the middle (cross) terms equals 0. Furthermore, the result $(3)$ can be restated as follows:

$\displaystyle Q=\frac{(n-1) S^2}{\sigma^2}+Z^2$

where $Q=\sum \limits_{j=1}^n \biggl( \frac{Y_j-\mu}{\sigma} \biggr)^2$ and $Z=\frac{\overline{Y}-\mu}{\frac{\sigma}{\sqrt{n}}}$. Note that $Z$ is a standard normal random variable. Thus $Z^2$ has a chi-squared distribution with df = 1 (by Theorem 2). Since $Q$ is a sum of squares of $n$ independent standardized normal variables, $Q$ has a chi-squared distribution with $n$ degrees of freedom (by Corollary 4). Furthermore, since $\overline{Y}$ and $S^2$ are independent, $Z^2$ and $S^2$ are independent. Let $H=\frac{(n-1) S^2}{\sigma^2}$. As a result, $H$ and $Z^2$ are independent. The following gives the moment generating function (MGF) of $Q$.

$\displaystyle \begin{aligned}E[e^{t \ Q}]&=E[e^{t \ (H+Z^2)}] \\&=E[e^{t \ H}] \ E[e^{t \ Z^2}] \end{aligned}$

Since $Q$ and $Z^2$ follow chi-squared distributions, we can plug their chi-squared MGFs into this equation and solve for the MGF of the random variable $H$.

$\displaystyle \biggl(\frac{1}{1-2t} \biggr)^{\frac{n}{2}}=E[e^{t \ H}] \ \biggl(\frac{1}{1-2t} \biggr)^{\frac{1}{2}}$

$\displaystyle E[e^{t \ H}]=\biggl(\frac{1}{1-2t} \biggr)^{\frac{n-1}{2}}$

The MGF for $H$ is that of a chi-squared distribution with $n-1$ degrees of freedom. $\square$

Remark
It is interesting to compare the following two quantities:

$\displaystyle \sum \limits_{j=1}^n \biggl( \frac{Y_j-\mu}{\sigma} \biggr)^2$

$\displaystyle \frac{(n-1) S^2}{\sigma^2}=\frac{1}{\sigma^2} \sum \limits_{j=1}^n (Y_j-\overline{Y})^2=\sum \limits_{j=1}^n \biggl( \frac{Y_j-\overline{Y}}{\sigma} \biggr)^2$

The first quantity is from Corollary 4 and has a chi-squared distribution with $n$ degrees of freedom. The second quantity is from Theorem 5 and has a chi-squared distribution with $n-1$ degrees of freedom. Thus the effect of Theorem 5 is that by replacing the population mean $\mu$ with the sample mean $\overline{Y}$, one degree of freedom is lost in the chi-squared distribution.

Theorem 5 is a pivotal theorem with wide applications. For our purposes at hand, it can be used for inference on both the mean and the variance. Even though one degree of freedom is lost, the statistic $\displaystyle \frac{(n-1) S^2}{\sigma^2}$ is a function of only one unknown parameter, namely the population variance $\sigma^2$. Since its sampling distribution is known (chi-squared), we can make probability statements about the statistic. Hence the statistic is useful for making inference about the population variance $\sigma^2$. As we will see below, in conjunction with other statistics, the statistic in Theorem 5 can be used for inference on two population variances as well as for inference on the mean (one sample and two samples).
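As an example of this kind of probability statement, Theorem 5 can be inverted to produce a confidence interval for $\sigma^2$: from $P(a \le \frac{(n-1)S^2}{\sigma^2} \le b) = 0.95$, where $a$ and $b$ are the 0.025 and 0.975 quantiles of the chi-squared distribution with $n-1$ degrees of freedom, we obtain the interval $(\frac{(n-1)S^2}{b}, \frac{(n-1)S^2}{a})$. The following sketch computes this interval for an illustrative sample of size 10 (the data values are made up, and the chi-squared quantiles for df = 9 are taken from a standard table).

```python
import statistics

# Sketch of a 95% confidence interval for sigma^2 based on Theorem 5,
# using an illustrative sample of size n = 10.
sample = [14.2, 15.1, 13.8, 14.9, 15.4, 14.1, 13.6, 15.0, 14.7, 14.4]
n = len(sample)
s2 = statistics.variance(sample)   # sample variance S^2 (n - 1 denominator)

# Chi-squared quantiles for df = 9 from a standard table:
# 0.025 quantile and 0.975 quantile.
a, b = 2.700, 19.023

# Invert P(a <= (n-1)S^2/sigma^2 <= b) = 0.95 to get the interval.
lower = (n - 1) * s2 / b
upper = (n - 1) * s2 / a
print(f"S^2 = {s2:.4f}, 95% CI for sigma^2: ({lower:.4f}, {upper:.4f})")
```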

_______________________________________________________________________________________________

Basis for Inference on Population Mean

Inference on the population mean of a single normal population and on the difference of the means of two independent normal populations relies on the t-statistic. Theorem 6 shows how to obtain a t-statistic using a chi-squared statistic and the standard normal statistic. Theorem 7 provides the one-sample t-statistic and Theorem 8 provides the two-sample t-statistic.

Theorem 6
Let $Z$ be the standard normal random variable. Let $U$ be a random variable that has a chi-squared distribution with $r$ degrees of freedom. Then the random variable

$\displaystyle T=\frac{Z}{\sqrt{\frac{U}{r}}}$

has a t-distribution with $r$ degrees of freedom and its probability density function (PDF) is

$\displaystyle g(t)=\frac{\Gamma(\frac{r+1}{2})}{\sqrt{\pi r} \ \Gamma(\frac{r}{2}) \ \biggl(1+\frac{t^2}{r} \biggr)^{\frac{r+1}{2}}} \ \ \ \ \ \ \ \ \ -\infty<t<\infty$

Remark
The probability density function given here is not important for the purpose at hand. For the proof of Theorem 6, see [2]. The following two theorems give two applications of Theorem 6.
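The construction in Theorem 6 can be illustrated by simulation (the degrees of freedom, seed, and number of trials below are illustrative choices): build $T = Z / \sqrt{U/r}$ from a standard normal $Z$ and an independent chi-squared $U$ (itself a sum of $r$ squared standard normals, per Corollary 4), then check the moments against those of the t-distribution with $r$ degrees of freedom, which has mean 0 and variance $r/(r-2)$ for $r > 2$.

```python
import math
import random
import statistics

# Simulation sketch of Theorem 6 (illustrative parameters).
# T = Z / sqrt(U/r) with Z standard normal and U chi-squared with r df,
# built independently; T should follow a t-distribution with r df.
random.seed(11)
r, trials = 6, 20000

ts = []
for _ in range(trials):
    z = random.gauss(0.0, 1.0)
    u = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(r))  # chi-squared, df = r
    ts.append(z / math.sqrt(u / r))

print(statistics.mean(ts))      # should be close to 0
print(statistics.variance(ts))  # should be close to r/(r-2) = 1.5
```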

Theorem 7
Let $Y_1,Y_2,\cdots,Y_n$ be a random sample drawn from a normal distribution with mean $\mu$ and variance $\sigma^2$. Let $S^2$ be the sample variance defined in $(2)$. Then the random variable

$\displaystyle T=\frac{\overline{Y}-\mu}{\frac{S}{\sqrt{n}}}$

has a t-distribution with $n-1$ degrees of freedom.

Proof of Theorem 7
Consider the following statistics.

$\displaystyle Z=\frac{\overline{Y}-\mu}{\frac{\sigma}{\sqrt{n}}}$

$\displaystyle U=\frac{(n-1) \ S^2}{\sigma^2}$

Note that $Z$ has the standard normal distribution. By Theorem 5, the quantity $U$ has a chi-squared distribution with df = $n-1$. By Theorem 6, the following quantity has a t-distribution with df = $n-1$.

$\displaystyle T=\frac{Z}{\sqrt{\frac{U}{n-1}}}=\frac{\overline{Y}-\mu}{\frac{S}{\sqrt{n}}}$

The above result is obtained after performing algebraic simplification. $\square$
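The t-statistic of Theorem 7 is straightforward to compute. The following sketch evaluates it on an illustrative sample (the data and the hypothesized mean $\mu_0$ are made up for demonstration); under $H_0: \mu = \mu_0$ the statistic follows a t-distribution with $n-1$ degrees of freedom.

```python
import math
import statistics

# Sketch of the one-sample t-statistic from Theorem 7 (illustrative data).
sample = [5.3, 4.8, 5.1, 5.6, 4.9, 5.2, 5.0, 5.4]
mu0 = 5.0                       # hypothesized population mean (assumption)

n = len(sample)
ybar = statistics.mean(sample)  # sample mean
s = statistics.stdev(sample)    # sample standard deviation S

# T = (Ybar - mu) / (S / sqrt(n)), with n - 1 degrees of freedom.
t_stat = (ybar - mu0) / (s / math.sqrt(n))
print(f"t = {t_stat:.4f} with {n - 1} degrees of freedom")
```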

Theorem 8
Let $X_1,X_2,\cdots,X_n$ be a random sample drawn from a normal distribution with mean $\mu_X$ and variance $\sigma_X^2$. Let $Y_1,Y_2,\cdots,Y_m$ be a random sample drawn from a normal distribution with mean $\mu_Y$ and variance $\sigma_Y^2$. Suppose that $\sigma_X^2=\sigma_Y^2=\sigma^2$. Then the following statistic:

$\displaystyle T=\frac{\overline{X}-\overline{Y}-(\mu_X-\mu_Y)}{S_p \ \sqrt{\frac{1}{n}+\frac{1}{m}}}$

has a t-distribution with df = $n+m-2$ where $\displaystyle S_p^2=\frac{(n-1) \ S_X^2+ (m-1) \ S_Y^2}{n+m-2}$.

Note that $S_p^2$ is the pooled variance of the two sample variances $S_X^2$ and $S_Y^2$.

Proof of Theorem 8
First, the sample mean $\overline{X}$ has a normal distribution with mean and variance $\mu_X$ and $\frac{\sigma_X^2}{n}$, respectively. The sample mean $\overline{Y}$ has a normal distribution with mean and variance $\mu_Y$ and $\frac{\sigma_Y^2}{m}$, respectively. Since the two samples are independent, $\overline{X}$ and $\overline{Y}$ are independent. Thus the difference $\overline{X}-\overline{Y}$ has a normal distribution with mean $\mu_X-\mu_Y$ and variance $\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}$. The following is a standardized normal random variable:

$\displaystyle Z=\frac{\overline{X}-\overline{Y}-(\mu_X-\mu_Y)}{\sqrt{\frac{\sigma_X^2}{n}+\frac{\sigma_Y^2}{m}}}$

On the other hand, by Theorem 5 the following quantities have chi-squared distributions with degrees of freedom $n-1$ and $m-1$, respectively.

$\displaystyle \frac{(n-1) S_X^2}{\sigma_X^2}=\frac{\sum \limits_{j=1}^n (X_j-\overline{X})^2}{\sigma_X^2}$

$\displaystyle \frac{(m-1) S_Y^2}{\sigma_Y^2}=\frac{\sum \limits_{j=1}^m (Y_j-\overline{Y})^2}{\sigma_Y^2}$

Because the two samples are independent, the two chi-squared statistics are independent. Then the following is a chi-squared statistic with $n+m-2$ degrees of freedom.

$\displaystyle U=\frac{(n-1) S_X^2}{\sigma_X^2}+\frac{(m-1) S_Y^2}{\sigma_Y^2}$

By Theorem 6, the following ratio

$\displaystyle T=\frac{Z}{\sqrt{\frac{U}{n+m-2}}}$

has a t-distribution with $n+m-2$ degrees of freedom. Here’s where the simplifying assumption of $\sigma_X^2=\sigma_Y^2=\sigma^2$ is used. Plugging in this assumption gives the following:

$\displaystyle T=\frac{\overline{X}-\overline{Y}-(\mu_X-\mu_Y)}{\sqrt{\frac{(n-1) S_X^2+(m-1) S_Y^2}{n+m-2} \ (\frac{1}{n}+\frac{1}{m})}}=\frac{\overline{X}-\overline{Y}-(\mu_X-\mu_Y)}{S_p \ \sqrt{\frac{1}{n}+\frac{1}{m}}}$

where $S_p^2$ is the pooled sample variance of the two samples as indicated above. $\square$
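The pooled two-sample t-statistic of Theorem 8 can be computed as follows. The two samples below are illustrative (made up for demonstration), the null hypothesis is $\mu_X - \mu_Y = 0$, and the equal-variance assumption of the theorem is taken for granted.

```python
import math
import statistics

# Sketch of the pooled two-sample t-statistic from Theorem 8
# (illustrative data; H0: mu_X - mu_Y = 0).
x = [20.1, 19.8, 21.2, 20.6, 19.5, 20.9]        # sample of size n
y = [18.9, 19.4, 18.6, 19.8, 19.1, 18.7, 19.3]  # sample of size m

n, m = len(x), len(y)
sx2 = statistics.variance(x)
sy2 = statistics.variance(y)

# Pooled sample variance: weighted average of the two sample variances,
# with n + m - 2 degrees of freedom.
sp2 = ((n - 1) * sx2 + (m - 1) * sy2) / (n + m - 2)
sp = math.sqrt(sp2)

t_stat = (statistics.mean(x) - statistics.mean(y)) / (sp * math.sqrt(1 / n + 1 / m))
print(f"t = {t_stat:.4f} with {n + m - 2} degrees of freedom")
```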

_______________________________________________________________________________________________

Basis for Inference on Population Variance

As indicated above, the statistic $\displaystyle \frac{(n-1) S^2}{\sigma^2}$ given in Theorem 5 can be used for inference on the variance of a normal population. The following theorem gives the basis for the statistic used for comparing the variances of two normal populations.

Theorem 9
Suppose that the random variables $U$ and $V$ are independent chi-squared random variables with $r_1$ and $r_2$ degrees of freedom, respectively. Then the statistic

$\displaystyle F=\frac{U / r_1}{V / r_2}$

has an F-distribution with $r_1$ and $r_2$ degrees of freedom.

Remark
The F-distribution depends on two parameters $r_1$ and $r_2$, and the order in which they are given is important. The first parameter is the degrees of freedom of the chi-squared distribution in the numerator and the second parameter is the degrees of freedom of the chi-squared distribution in the denominator.

For the purpose at hand, it is not necessary to know the probability density functions of the t-distribution and the F-distribution (in Theorem 6 and Theorem 9). When carrying out inference procedures with these distributions, either tables or software will be used.

Given two independent normal random samples $X_1,X_2,\cdots,X_n$ and $Y_1,Y_2,\cdots,Y_m$ (as discussed in the above section on the settings of inference), the sample variance $S_X^2$ is an unbiased estimator of the population variance $\sigma_X^2$ of the first population, and the sample variance $S_Y^2$ is an unbiased estimator of the population variance $\sigma_Y^2$ of the second population. It is natural to expect that the ratio $\displaystyle \frac{S_X^2}{S_Y^2}$ can be used to make inference about the relative magnitude of $\sigma_X^2$ and $\sigma_Y^2$. The following theorem indicates that this is a valid approach.

Theorem 10
Let $X_1,X_2,\cdots,X_n$ be a random sample drawn from a normal distribution with mean $\mu_X$ and variance $\sigma_X^2$. Let $Y_1,Y_2,\cdots,Y_m$ be a random sample drawn from a normal distribution with mean $\mu_Y$ and variance $\sigma_Y^2$. Then the statistic

$\displaystyle \frac{S_X^2 \ / \ \sigma_X^2}{S_Y^2 \ / \ \sigma_Y^2}=\frac{S_X^2}{S_Y^2} \times \frac{\sigma_Y^2}{\sigma_X^2}$

has the F-distribution with degrees of freedom $n-1$ and $m-1$.

Proof of Theorem 10
By Theorem 5, $\displaystyle \frac{(n-1) S_X^2}{\sigma_X^2}$ has a chi-squared distribution with $n-1$ degrees of freedom and $\displaystyle \frac{(m-1) S_Y^2}{\sigma_Y^2}$ has a chi-squared distribution with $m-1$ degrees of freedom. By Theorem 9, the following statistic

$\displaystyle \frac{[(n-1) S_X^2 \ / \ \sigma_X^2] \ / \ (n-1)}{[(m-1) S_Y^2 \ / \ \sigma_Y^2] \ / \ (m-1)}$

has the F-distribution with $n-1$ and $m-1$ degrees of freedom. The statistic is further simplified to become the statistic as stated in the theorem. $\square$
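Theorem 10 can also be checked by simulation. In the sketch below (sample sizes, seed, and trial count are illustrative choices), the two populations have equal variances, so the ratio $S_X^2 / S_Y^2$ should follow an F-distribution with $n-1$ and $m-1$ degrees of freedom; a quick sanity check is the mean of $F(r_1, r_2)$, which is $r_2/(r_2-2)$ for $r_2 > 2$.

```python
import random
import statistics

# Simulation sketch of Theorem 10 (illustrative parameters).
# With sigma_X = sigma_Y, the ratio S_X^2 / S_Y^2 should follow an
# F-distribution with n-1 and m-1 degrees of freedom.
random.seed(7)
n, m, sigma, trials = 8, 12, 3.0, 20000

ratios = []
for _ in range(trials):
    x = [random.gauss(0.0, sigma) for _ in range(n)]
    y = [random.gauss(0.0, sigma) for _ in range(m)]
    ratios.append(statistics.variance(x) / statistics.variance(y))

expected_mean = (m - 1) / (m - 3)   # mean of F(r1, r2) is r2/(r2-2), r2 = m-1
print(statistics.mean(ratios))      # should be close to 11/9
print(expected_mean)
```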

_______________________________________________________________________________________________

Concluding Remarks

Theorem 7 and Theorem 8 produce the one-sample t-statistic and the two-sample t-statistic, respectively. They are the basis for inference about one population mean and about the difference of two population means, respectively. They can be used for estimation (e.g. construction of confidence intervals) or decision making (e.g. hypothesis testing). On the other hand, Theorem 5 produces a chi-squared statistic for inference about one population variance. Theorem 10 produces an F-statistic that can be used for inference about two population variances. Since the F-statistic involves the ratio of the two sample variances, it can be used for inference about the relative magnitude of the two population variances.

The purpose of this post is to highlight the important roles of the chi-squared distribution. We now briefly discuss the quality of the derived statistical procedures. The procedures discussed here (t or F) are exactly correct if the populations from which the samples are drawn are normal. Real-life data usually do not exactly follow normal distributions. Thus the usefulness of these statistics in practice depends on how strongly they are affected by non-normality. In other words, if there is a significant deviation from the assumption of a normal distribution, are these procedures still reliable?

A statistical inference procedure is called robust if the results calculated from the procedure are insensitive to deviations from its assumptions. For a non-robust procedure, the results would be distorted by such deviations. For example, the t procedures are not robust against outliers. The presence of outliers in the data can distort the results since the t procedures are based on the sample mean $\overline{x}$ and sample variance $S^2$, which are not resistant to outliers.

On the other hand, the t procedures for inference about means are quite robust against slight deviations from the normal population assumption. The F procedures for inference about variances are not so robust, so they must be used with care. Even a slight deviation from the normality assumption can make the results from the F procedures unreliable. For a more detailed but accessible discussion on robustness, see [1].

When the sample sizes are large, the sample mean $\overline{x}$ is close to normally distributed (this result is the central limit theorem), so the discussion about deviations from the normality assumption is no longer important; simply use the Z statistic for inference about the means. Likewise, when the sample sizes are large, the sample variance $S^2$ will be an accurate estimate of the population variance $\sigma^2$ regardless of the population distribution, a fact related to the law of large numbers. Thus the statistical procedures described here are for small sample sizes drawn from normal populations.

_______________________________________________________________________________________________

Reference

1. Moore D. S., McCabe G. P., Craig B. A., Introduction to the Practice of Statistics, 7th ed., W. H. Freeman and Company, New York, 2012
2. Wackerly D. D., Mendenhall III W., Scheaffer R. L., Mathematical Statistics with Applications, Thomson Learning, Inc., California, 2008

_______________________________________________________________________________________________
$\copyright \ 2016 - \text{Dan Ma}$