3/23/2020

The F distribution

Comparing sums and means of squared normal variables

Recall: If \(Z_1,\dots,Z_n\sim\mathcal{N}(0,1)\) iid, then \(\sum Z_i^2\sim\chi^2(n)\).

To compare two sets of iid squared standard normal variables, sums are less appropriate: \(\mathbb{E}\left[\sum Z_i^2\right] = n\), so the larger set is likely to have a larger sum, and also a larger variance, since \(\mathbb{V}\left[\sum Z_i^2\right] = 2n\).
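
A quick simulation in R illustrates this (the choice of \(n = 20\) and the number of replicates are arbitrary):

    # Simulate sums of n squared standard normals: E[sum] = n, V[sum] = 2n.
    set.seed(1)                              # for reproducibility
    n    <- 20                               # squared normals per sum (arbitrary)
    sums <- replicate(1e5, sum(rnorm(n)^2))
    mean(sums)                               # close to n = 20
    var(sums)                                # close to 2n = 40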

Instead: compare means of sets of squared standard normal variables:

If \(W_1\sim\chi^2(n)\) and \(W_2\sim\chi^2(m)\), compare \(W_1/n\) to \(W_2/m\).

The F distribution

Definition

Suppose \(W_1\sim\chi^2(n)\) and \(W_2\sim\chi^2(m)\) are independent. Then the ratio \[ F = \frac{W_1/n}{W_2/m} \sim F^n_m \] follows the \(F\) distribution with \(n\) numerator degrees of freedom and \(m\) denominator degrees of freedom.

In R, the \(F\) distribution is handled by the functions rf, df, pf, qf.
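
For example (the degrees of freedom below are arbitrary illustrative values):

    # The usual r/d/p/q quartet for the F distribution.
    df1 <- 5                     # numerator degrees of freedom (arbitrary)
    df2 <- 10                    # denominator degrees of freedom (arbitrary)
    rf(3, df1, df2)              # three random draws
    df(1, df1, df2)              # density at 1
    pf(1, df1, df2)              # CDF: P(F <= 1)
    qf(0.95, df1, df2)           # inverse CDF: 95th percentile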

One-sample test for variance

Testing for variance

Let \(Y_1,\dots,Y_n\sim\mathcal{N}(\mu,\sigma^2)\) iid with unknown mean and variance.

We know that \(X = \frac{(n-1)S^2}{\sigma^2}\sim\chi^2(n-1)\).
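A simulation sketch confirms this (the parameters \(\mu\), \(\sigma^2\), and \(n\) below are arbitrary):

    # Check that (n-1)S^2 / sigma^2 matches the chi-squared(n-1) distribution.
    set.seed(2)
    n <- 12; mu <- 3; sigma2 <- 4            # arbitrary parameters
    stat <- replicate(1e5,
      (n - 1) * var(rnorm(n, mu, sqrt(sigma2))) / sigma2)
    quantile(stat, c(0.25, 0.50, 0.75))      # empirical quartiles...
    qchisq(c(0.25, 0.50, 0.75), df = n - 1)  # ...match chi-squared(n-1)
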

We can leverage this to create statistical tests for population variance.

For the one-sample case, the setup is a simple null hypothesis \(H_0:\sigma^2=\sigma_0^2\) against either an upper-, lower-, or two-tailed alternative.

We choose as test statistic \(X = \frac{(n-1)S^2}{\sigma_0^2}\sim_{H_0}\chi^2(n-1)\).

Handling tails

Large \(S^2\) favors an \(H_A:\sigma^2 > \sigma_0^2\) alternative.

Small \(S^2\) favors an \(H_A:\sigma^2 < \sigma_0^2\) alternative.

For a two-tailed alternative, we need to distribute the \(\alpha\) probability mass between the two tails. Finding the shortest possible acceptance region is difficult, since the \(\chi^2\) distribution is asymmetric; distributing \(\alpha/2\) to each tail is much easier.
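
Concretely, with an arbitrary \(n = 15\) and \(\alpha = 0.05\), the two cutoffs come directly from qchisq:

    # Two-tailed cutoffs: alpha/2 probability mass in each tail.
    n <- 15; alpha <- 0.05                 # arbitrary sample size and level
    qchisq(alpha / 2, df = n - 1)          # lower cutoff, X_{alpha/2}
    qchisq(1 - alpha / 2, df = n - 1)      # upper cutoff, X_{1-alpha/2}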

Rejection Regions

Since under the null hypothesis we have a known distribution for the test statistic, we use this distribution to create bounds for the rejection regions. This yields a test:

  • \(H_0: \sigma^2 = \sigma_0^2\) and \(H_A: \begin{cases} \sigma^2 < \sigma_0^2 \\ \sigma^2 \neq \sigma_0^2 \\ \sigma^2 > \sigma_0^2 \end{cases}\)
  • Test Statistic: \(X = \frac{(n-1)S^2}{\sigma_0^2}\).
  • Rejection region: \(RR = \begin{cases} \{X < X_{\alpha}\} \\ \{X < X_{\alpha/2}\} \cup \{X > X_{1-\alpha/2}\} \\ \{X > X_{1-\alpha}\} \end{cases}\)

Here we write \(X_\alpha = F^{-1}_{\chi^2(n-1)}(\alpha)\) for the inverse CDF.
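
As a minimal sketch, the whole test can be wrapped in an R function; the name var_test_one and its interface are our own invention:

    # One-sample chi-squared test for H0: sigma^2 = sigma0sq.
    # alternative: "less", "greater", or "two.sided".
    var_test_one <- function(y, sigma0sq, alternative = "two.sided",
                             alpha = 0.05) {
      n <- length(y)
      X <- (n - 1) * var(y) / sigma0sq      # test statistic
      reject <- switch(alternative,
        less      = X < qchisq(alpha, df = n - 1),
        greater   = X > qchisq(1 - alpha, df = n - 1),
        two.sided = X < qchisq(alpha / 2, df = n - 1) ||
                    X > qchisq(1 - alpha / 2, df = n - 1))
      list(statistic = X, df = n - 1, reject = reject)
    }

For example, data simulated with true variance 4 tested against \(H_0:\sigma^2 = 1\) should usually reject:

    set.seed(3)
    var_test_one(rnorm(30, sd = 2), sigma0sq = 1, alternative = "greater")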

Two-sample test of variance

Testing for variance

Let \(X_1,\dots,X_n\sim\mathcal{N}(\mu_X,\sigma_X^2)\) iid and \(Y_1,\dots,Y_m\sim\mathcal{N}(\mu_Y,\sigma_Y^2)\) iid with unknown means and variances.

We know that \(W_X = \frac{(n-1)S_X^2}{\sigma_X^2}\sim\chi^2(n-1)\) and, likewise, \(W_Y = \frac{(m-1)S_Y^2}{\sigma_Y^2}\sim\chi^2(m-1)\).

We can leverage this to create statistical tests for population variance.

As indicated in the video for the F distribution, the way to compare two \(\chi^2\)-distributed variables is through the means of the squared standard normals rather than their sums; in other words, we compare \(W_X / (n-1)\) to \(W_Y / (m-1)\).

Test Statistic

We know the distribution of the ratio

\[ \frac{W_X/(n-1)}{W_Y/(m-1)} = \frac{S_X^2 / \sigma_X^2}{S_Y^2 / \sigma_Y^2} \sim F^{n-1}_{m-1} \]

Under a null hypothesis of \(H_0:\sigma_X^2 = \sigma_Y^2\), the \(\sigma_X^2\) and \(\sigma_Y^2\) cancel in the ratio, leaving the test statistic

\[ F = \frac{S_X^2}{S_Y^2} \sim_{H_0} F^{n-1}_{m-1} \]
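
A simulation sketch (arbitrary sample sizes, with equal variances as under \(H_0\)) confirms this:

    # Under H0 (equal variances), S_X^2 / S_Y^2 follows F(n-1, m-1).
    set.seed(4)
    n <- 10; m <- 15                          # arbitrary sample sizes
    Fstat <- replicate(1e5, var(rnorm(n)) / var(rnorm(m)))
    quantile(Fstat, c(0.05, 0.50, 0.95))      # empirical quantiles...
    qf(c(0.05, 0.50, 0.95), n - 1, m - 1)     # ...match F(n-1, m-1)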

Alternative hypotheses and rejection regions

If \(\sigma_X^2 > \sigma_Y^2\), then we expect the numerator of \(F=S_X^2/S_Y^2\) to be larger than the denominator, so we expect \(F\) to be greater than 1. An upper-tail rejection region works.

If \(\sigma_X^2 < \sigma_Y^2\), then we expect the numerator of \(F=S_X^2/S_Y^2\) to be smaller than the denominator, so we expect \(F\) to be less than 1. A lower-tail rejection region works.

For \(H_A: \sigma_X^2\neq\sigma_Y^2\), as with the one-sample test, we handle the asymmetry of the F distribution by distributing \(\alpha/2\) to each tail, even though this produces a larger acceptance region than the shortest one possible.

Two-sample Test for Variance

Since under the null hypothesis we have a known distribution for the test statistic, we use this distribution to create bounds for the rejection regions. This yields a test:

  • \(H_0: \sigma_X^2 = \sigma_Y^2\) and \(H_A: \begin{cases} \sigma_X^2 < \sigma_Y^2 \\ \sigma_X^2 \neq \sigma_Y^2 \\ \sigma_X^2 > \sigma_Y^2 \end{cases}\)
  • Test Statistic: \(F = \frac{S_X^2}{S_Y^2}\).
  • Rejection region: \(RR = \begin{cases} \{F < F_{\alpha}\} \\ \{F < F_{\alpha/2}\} \cup \{F > F_{1-\alpha/2}\} \\ \{F > F_{1-\alpha}\} \end{cases}\)

Here we write \(F_\alpha = F^{-1}_{F^{n-1}_{m-1}}(\alpha)\) for the inverse CDF.
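
In R, the built-in var.test performs exactly this F test; here is a sketch comparing it with the manual computation (simulated data with arbitrary parameters):

    # Two samples whose true variances differ (4 vs. 1).
    set.seed(5)
    x <- rnorm(12, sd = 2)
    y <- rnorm(18, sd = 1)

    # Manual version of the test at alpha = 0.05.
    Fstat <- var(x) / var(y)
    alpha <- 0.05
    qf(alpha / 2, df1 = 11, df2 = 17)        # lower cutoff, F_{alpha/2}
    qf(1 - alpha / 2, df1 = 11, df2 = 17)    # upper cutoff, F_{1-alpha/2}
    Fstat                                    # reject H0 if outside the cutoffs

    # Built-in version: same statistic and distribution.
    var.test(x, y)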

Symmetries of the F distribution

The F distribution has some nice symmetries, including:

If \(F\sim F^n_m\), then \(1/F \sim F^m_n\).

As well as symmetries connecting \(F_\alpha\) and \(F_{1-\alpha}\): the \(\alpha\) quantile of \(F^n_m\) is the reciprocal of the \(1-\alpha\) quantile of \(F^m_n\), that is, \(F^{-1}_{F^n_m}(\alpha) = 1 / F^{-1}_{F^m_n}(1-\alpha)\).

These can be used to simplify calculations if you are working with paper lookup tables.

Nowadays we have access to computers.
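
For instance, the quantile symmetry is a one-line check (arbitrary degrees of freedom):

    # Verify F^{-1}(alpha; n, m) = 1 / F^{-1}(1 - alpha; m, n).
    n <- 5; m <- 10; alpha <- 0.05     # arbitrary values
    qf(alpha, n, m)                    # lower-tail quantile of F(n, m)
    1 / qf(1 - alpha, m, n)            # the same number via the symmetry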