3/25/2020

Multiple Means Testing

Setup

Recall the setup for the last example on likelihood ratios: \[ X_{1,i} \sim \mathcal{N}(\mu_1,\sigma^2) \qquad X_{2,i} \sim \mathcal{N}(\mu_2,\sigma^2) \qquad X_{3,i} \sim \mathcal{N}(\mu_3,\sigma^2) \] with unknown means and unknown but identical variance.

Our interest is in \(H_0:\mu_1=\mu_2=\mu_3\) vs. \(H_A:\) at least one pair unequal.

In the example, we found that the likelihood ratio test reduces to comparing the ratio \[ \frac{\hat\sigma_0^2}{\hat\sigma^2} \] to a threshold \(k\), rejecting \(H_0\) when the ratio is large, where \(\hat\sigma^2\) is the pooled sample variance estimator (built from the group means) and \(\hat\sigma_0^2\) is the usual sample variance taken on all the data (built from the grand mean).

Joint vs. Pooled Variances

Means Tests

ANOVA and the T-test fit into a sequence of tests focused on population means:

| Test | \(H_0\) |
|---|---|
| One-sample | \(\mu = \mu_0\) |
| Two-sample | \(\mu_1 = \mu_2\) |
| Many-sample (ANOVA) | \(\mu_1 = \dots = \mu_k\) |
| Continuous (Regression) | \(\mu(x) = f(x)\) |

Notation

ANOVA is all about sample variances - so there will be many sums of squares involved. We write:

\[ Y_{i1},\dots,Y_{i,n_i}\sim\mathcal{N}(\mu_i,\sigma^2) \\ \overline Y_{i*} = \frac{1}{n_i}\sum_{j=1}^{n_i}Y_{ij} \qquad \overline Y = \frac{1}{\sum n_i}\sum_{i=1}^k\sum_{j=1}^{n_i}Y_{ij} \]

where we have \(k\) different groups, the \(i\)-th of which contains \(n_i\) samples. We separate the in-group means \(\overline Y_{i*}\) from the global mean \(\overline Y\) - and will be examining each of these separately.

Notation (Sums of Squares)

| Type | Formula |
|---|---|
| Total Sum of Squares | \(TSS = \sum_{i=1}^k\sum_{j=1}^{n_i}(Y_{ij}-\overline Y)^2\) |
| Sum of Squares for Errors | \(SSE = \sum_{i=1}^k\sum_{j=1}^{n_i}(Y_{ij}-\overline Y_{i*})^2\) |
| Sum of Squares for Treatments | \(SST = \sum_{i=1}^k\sum_{j=1}^{n_i}(\overline Y_{i*}-\overline Y)^2\) |
| Mean Squares for Errors | \(MSE = SSE/DoFE\) |
| Mean Squares for Treatments | \(MST = SST/DoFT\) |

We will investigate the degrees of freedom \(DoFE\) and \(DoFT\) more later.
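As a concrete illustration, here is a small R sketch that computes each of these quantities directly from the definitions; the data, group sizes and variable names are made up purely for illustration.

```r
# Hypothetical data: k = 3 groups of different sizes, common variance
set.seed(1)
y     <- c(rnorm(10, mean = 5), rnorm(12, mean = 5), rnorm(8, mean = 5))
group <- factor(rep(1:3, times = c(10, 12, 8)))

ybar       <- mean(y)          # grand mean
ybar_group <- ave(y, group)    # each observation replaced by its group mean

TSS <- sum((y - ybar)^2)              # Total Sum of Squares
SSE <- sum((y - ybar_group)^2)        # Sum of Squares for Errors
SST <- sum((ybar_group - ybar)^2)     # Sum of Squares for Treatments

n <- length(y); k <- nlevels(group)
MSE <- SSE / (n - k)   # DoFE = n - k, derived below
MST <- SST / (k - 1)   # DoFT = k - 1, derived below
```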

Estimators

The quantity \(TSS/(n-1)\), where \(n = \sum_i n_i\) is the total number of observations, is the classical sample variance calculated for the entire dataset at once.

\(MSE\) is the pooled sample variance that we have met occasionally before.

Each of \(TSS/(n-1)\), \(MSE\) and \(MST\) is (under \(H_0\)) an unbiased estimator of \(\sigma^2\).

\(MSE\) is an unbiased estimator of \(\sigma^2\) even without support from \(H_0\).

Additivity

Theorem

\(TSS = SST + SSE\)

Proof

\[ TSS = \sum_i\sum_j(Y_{ij}-\overline Y)^2 = \sum_i\sum_j(\color{blue}{(Y_{ij}-\overline Y_{i*})} + \color{green}{(\overline Y_{i*}-\overline Y)})^2 \\ = \sum_i\sum_j\left[ \color{blue}{(Y_{ij}-\overline Y_{i*})^2} + 2\color{blue}{(Y_{ij}-\overline Y_{i*})}\color{green}{(\overline Y_{i*}-\overline Y)} + \color{green}{(\overline Y_{i*}-\overline Y)^2} \right] \]

Additivity

Proof…

Let’s look at the term \(2\color{blue}{(Y_{ij}-\overline Y_{i*})}\color{green}{(\overline Y_{i*}-\overline Y)}\) for a fixed \(i\):

\[ \sum_j2\color{blue}{(Y_{ij}-\overline Y_{i*})}\color{green}{(\overline Y_{i*}-\overline Y)} = 2\color{green}{(\overline Y_{i*}-\overline Y)}\sum_j\color{blue}{(Y_{ij}-\overline Y_{i*})} =\\ 2{(\overline Y_{i*}-\overline Y)}\left(\sum_j Y_{ij} - n_i\overline Y_{i*}\right) =2{(\overline Y_{i*}-\overline Y)}\left(n_i\overline Y_{i*} - n_i\overline Y_{i*}\right) =0 \]

Additivity

Proof…

Returning to \(TSS\):

\[ TSS = \\ \sum_i\sum_j\left[ \color{blue}{(Y_{ij}-\overline Y_{i*})^2} + \color{purple}{2(Y_{ij}-\overline Y_{i*})(\overline Y_{i*}-\overline Y)} + \color{green}{(\overline Y_{i*}-\overline Y)^2} \right] = \\ \color{blue}{\sum_i\sum_j(Y_{ij}-\overline Y_{i*})^2} + \color{purple}{\sum_i 0} + \color{green}{\sum_i\sum_j(\overline Y_{i*}-\overline Y)^2} =\\ \color{blue}{SSE} + \color{green}{SST} \]
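The identity is easy to verify numerically; a quick R check on simulated data (all values hypothetical):

```r
# Numerical sanity check of TSS = SST + SSE
set.seed(2)
y     <- c(rnorm(10, 4), rnorm(12, 6), rnorm(8, 5))
group <- factor(rep(1:3, times = c(10, 12, 8)))

ybar   <- mean(y)
ybar_g <- ave(y, group)

TSS <- sum((y - ybar)^2)
SSE <- sum((y - ybar_g)^2)
SST <- sum((ybar_g - ybar)^2)

all.equal(TSS, SST + SSE)   # TRUE (up to floating-point error)
```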

The ANOVA F-Test

Two theorems on \(\chi^2\) distributions

Theorem

If \(U\sim\chi^2(n)\) and \(V\sim\chi^2(m)\) are independent then \(U+V\sim\chi^2(n+m)\).

Proof

Recall that if moment generating functions are equal, then so are the probability distributions. A \(\chi^2(n)\) variable has \(MGF=(1-2t)^{-n/2}\).

\[ MGF_{U+V} = \mathbb{E}[e^{t(U+V)}] = \mathbb{E}[e^{tU}e^{tV}] = \\ \mathbb{E}[e^{tU}]\,\mathbb{E}[e^{tV}] \quad \text{(by independence)} = \\ MGF_U\cdot MGF_V = (1-2t)^{-n/2}(1-2t)^{-m/2} =\\ (1-2t)^{-(n+m)/2} \]
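The result is also easy to check by simulation in R; the degrees of freedom below (4 and 7) are arbitrary choices for illustration:

```r
# Empirical check: the sum of independent chi-squares behaves like a chi-square
set.seed(3)
U <- rchisq(1e5, df = 4)
V <- rchisq(1e5, df = 7)

probs <- c(0.10, 0.25, 0.50, 0.75, 0.90)
round(rbind(empirical   = quantile(U + V, probs),
            theoretical = qchisq(probs, df = 11)), 2)
```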

Two theorems on \(\chi^2\) distributions

Theorem

If \(U\sim\chi^2(n)\) and \(V\) are independent, and \(U+V\sim\chi^2(n+m)\), then \(V\sim\chi^2(m)\).

Proof

A \(\chi^2(n)\) variable has \(MGF=(1-2t)^{-n/2}\).

\[ (1-2t)^{-(n+m)/2} = MGF_{U+V} = \\ MGF_U\cdot MGF_V = (1-2t)^{-n/2}MGF_V \\ \text{So } MGF_V = (1-2t)^{-(n+m)/2}/(1-2t)^{-n/2} = (1-2t)^{-m/2} \]

\(SSE\) is a pooled sum of squares

Recall \[ SSE = \sum_i\sum_j(Y_{ij}-\overline Y_{i*})^2 \qquad S_i^2 = \frac{1}{n_i-1}\sum_j(Y_{ij}-\overline Y_{i*})^2 \]

It follows that \(SSE\) can be written in terms of the group sample variances:

\[ SSE = \sum_i(n_i-1)S_i^2 \]

SSE and pooled variance

We know that \((n-1)S^2/\sigma^2\sim\chi^2(n-1)\) for the sample variance \(S^2\) of \(n\) iid normal observations.

Hence, each \((n_i-1)S_i^2/\sigma^2\sim\chi^2(n_i-1)\).

It follows by the first theorem (addition of \(\chi^2\) DoF) that \[ \frac{SSE}{\sigma^2} = \sum_i\frac{(n_i-1)S_i^2}{\sigma^2} \sim\chi^2\left(\sum_i(n_i-1)\right) = \chi^2(n-k) \]

\(TSS\) and sample variance

Under the null hypothesis \(\mu_1 = \dots = \mu_k\), all the \(Y_{ij}\) are iid. Then \(TSS = (n-1)S^2\) for the ordinary sample variance \(S^2\) of all the data.

It follows that \[ \frac{TSS}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2} \sim\chi^2(n-1) \]

\(SST\) and \(\chi^2\)

Since \(TSS = SST + SSE\), it follows that \[ \frac{TSS}{\sigma^2} = \frac{SST}{\sigma^2} + \frac{SSE}{\sigma^2} \] Moreover, \(SSE\) depends only on the deviations within each group while \(SST\) depends only on the group means, and these are independent. So by the second theorem (subtraction of \(\chi^2\) DoF), \[ \frac{SST}{\sigma^2}\sim\chi^2((n-1) - (n-k)) = \chi^2(k-1) \]

Distribution summary

To summarize, we now know that

| Quantity | Distribution | Degrees of Freedom |
|---|---|---|
| \(TSS/\sigma^2\) | \(\chi^2(n-1)\) | \(n-1\) |
| \(SSE/\sigma^2\) | \(\chi^2(n-k)\) | \(n-k\) |
| \(SST/\sigma^2\) | \(\chi^2(k-1)\) | \(k-1\) |

Thus \[ \frac {\left.\frac{SST}{\sigma^2}\right/(k-1)} {\left.\frac{SSE}{\sigma^2}\right/(n-k)} \sim F_{n-k}^{k-1} \]
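This distributional claim can be checked by simulation; the sketch below (all sizes and parameters are hypothetical) repeatedly simulates data under \(H_0\) and compares the resulting ratios to the quantiles of \(F_{n-k}^{k-1}\):

```r
# Simulate the F-statistic under H0 and compare it with F(k-1, n-k)
set.seed(4)
k   <- 3
n_i <- c(10, 12, 8)              # hypothetical group sizes
n   <- sum(n_i)
group <- factor(rep(1:k, times = n_i))

f_stat <- replicate(10000, {
  y      <- rnorm(n, mean = 5, sd = 2)   # H0 true: all means equal
  ybar_g <- ave(y, group)
  SSE <- sum((y - ybar_g)^2)
  SST <- sum((ybar_g - mean(y))^2)
  (SST / (k - 1)) / (SSE / (n - k))      # MST / MSE
})

probs <- c(0.50, 0.90, 0.95, 0.99)
round(rbind(empirical   = quantile(f_stat, probs),
            theoretical = qf(probs, df1 = k - 1, df2 = n - k)), 2)
```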

The ANOVA F-Test

We define \[ DoFE = n-k \qquad DoFT = k-1 \\ F = \frac {\left.\frac{SST}{\color{red}{\sigma^2}}\right/(k-1)} {\left.\frac{SSE}{\color{red}{\sigma^2}}\right/(n-k)} = \frac{SST/(k-1)}{SSE/(n-k)} = \frac{MST}{MSE} \]

This \(F\)-statistic follows - under \(H_0\) - a known probability distribution, so we can use it to create a statistical test.

Under \(H_A\) the group means spread apart, which inflates \(MST\) but not \(MSE\) - so we reject \(H_0\) if \(F > F_\alpha\) for some threshold \(F_\alpha\) chosen as an upper quantile of \(F_{n-k}^{k-1}\).

The ANOVA Table

It is very common to summarize all the components of the calculation of the ANOVA \(F\)-statistic and its \(p\)-value in a single table:

| Source | DoF | SS | MS | F | p |
|---|---|---|---|---|---|
| Treatments | \(DoFT\) | \(SST\) | \(MST\) | \(MST/MSE\) | \(1-F_{F_{n-k}^{k-1}}(F)\) |
| Error | \(DoFE\) | \(SSE\) | \(MSE\) | | |
| Total | \(n-1\) | \(TSS\) | \(S^2\) | | |

The table can be printed in R for a fitted model (linear regression model or multiple means ANOVA) using the command anova.
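For instance, with a response y and a grouping factor group stored in a data frame (the names and data here are hypothetical), the table is obtained as:

```r
# Hypothetical one-way layout: response y, factor group
set.seed(5)
df <- data.frame(y     = rnorm(30, mean = 10),
                 group = factor(rep(c("A", "B", "C"), each = 10)))

fit <- aov(y ~ group, data = df)   # or lm(y ~ group, data = df)
anova(fit)                         # prints DoF, SS, MS, F and p for Treatments and Error
```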

Estimation

Regardless of whether or not \(H_0\) is true, \(MSE\) is an unbiased pooled estimator of \(\sigma^2\). Since it uses all the available data, it produces a better estimate (smaller confidence intervals) than would any of the group-specific sample variances in isolation.

Write \(S=\sqrt{MSE}\) and set \(t_{\alpha} = F^{-1}_{t(n-k)}(1-\alpha)\). Then \[ \mu_i \in \overline{Y}_{i*}\pm t_{\alpha/2}S/\sqrt{n_i} \\ \mu_i-\mu_j\in(\overline Y_{i*}-\overline Y_{j*})\pm t_{\alpha/2}S\sqrt{\frac{1}{n_i}+\frac{1}{n_j}} \]
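In R these intervals can be computed directly from \(MSE\); a minimal sketch with hypothetical data and \(\alpha = 0.05\):

```r
# Hypothetical data: 3 groups with a common variance
set.seed(6)
y     <- c(rnorm(10, 5.0), rnorm(12, 6.0), rnorm(8, 5.5))
group <- factor(rep(1:3, times = c(10, 12, 8)))

n <- length(y); k <- nlevels(group)
n_i    <- tapply(y, group, length)
ybar_i <- tapply(y, group, mean)

MSE <- sum((y - ave(y, group))^2) / (n - k)
S   <- sqrt(MSE)
alpha  <- 0.05
t_crit <- qt(1 - alpha / 2, df = n - k)

# 95% CI for mu_1
ybar_i[1] + c(-1, 1) * t_crit * S / sqrt(n_i[1])

# 95% CI for mu_1 - mu_2
(ybar_i[1] - ybar_i[2]) + c(-1, 1) * t_crit * S * sqrt(1 / n_i[1] + 1 / n_i[2])
```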

These intervals have the stated coverage only if you are interested in exactly one mean or mean difference among all possible ones; when several are examined simultaneously, the tolerated error probabilities compound.

Family-Wise Error Rates

Suppose we are seeking simultaneous confidence intervals \(I_1,\dots,I_m\) for parameters \(\theta_1,\dots,\theta_m\) such that \[ \mathbb{P}(\theta_i\in I_i\text{ for all $i$}) = 1-\alpha \]

Picking each interval to be a \(1-\alpha\) interval will not give this probability - because we are combining several events.

The failure of all intervals to work simultaneously is called a family-wise error, and the probability \(\alpha\) here is the family-wise error rate.

Bonferroni’s Inequality

Recall that \(\overline{A_1\cap\dots\cap A_m} = \overline A_1\cup\dots\cup\overline A_m\). By sub-additivity of probabilities, \[ \begin{aligned} \mathbb{P}(A_1\cap\dots\cap A_m) &= 1-\mathbb{P}(\overline A_1\cup\dots\cup\overline A_m) \\ &\geq 1-\sum\mathbb{P}(\overline A_i) \\ &=1-\sum\alpha_i \end{aligned} \]

If each of our confidence intervals is a \((1-\alpha)\) confidence interval, their joint coverage probability could be as low as \(1-m\alpha\) - equivalently, the family-wise error rate could be as large as \(m\alpha\).

Bonferroni’s Method

Bonferroni’s inequality suggests a method to control the family-wise error rate. To achieve a family-wise error rate of \(\alpha\) - that is, a probability of \(1-\alpha\) that all confidence intervals simultaneously contain their respective parameters - we construct each individual interval as a \((1-\alpha/m)\) confidence interval.
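In R the same correction can be applied either to the critical value used for the intervals or to the raw p-values; a sketch with hypothetical data (pairwise.t.test and p.adjust are base R functions in the stats package):

```r
# Hypothetical data: 3 groups, hence m = 3 pairwise mean differences
set.seed(7)
y     <- c(rnorm(10, 5), rnorm(12, 6), rnorm(8, 5))
group <- factor(rep(c("A", "B", "C"), times = c(10, 12, 8)))

# Bonferroni-corrected p-values for all pairwise comparisons,
# using the pooled standard deviation sqrt(MSE)
pairwise.t.test(y, group, p.adjust.method = "bonferroni", pool.sd = TRUE)

# For intervals: use the 1 - alpha/(2m) quantile instead of 1 - alpha/2
alpha <- 0.05; m <- 3
qt(1 - alpha / (2 * m), df = length(y) - nlevels(group))
```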

Alternatives

Bonferroni’s Method is known to be overly conservative (rejects the null too rarely). Better methods have been proposed - Holm’s Method and Hochberg’s Method - but these are out of scope for this course.

The Family-Wise Error Rate can be considered overly harsh - especially for large sets of simultaneous CIs. An alternative is to control the False Discovery Rate: the expected proportion of erroneous rejections (equivalently, non-covering CIs) among all those reported, allowing it to be at most \(\alpha\).
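Although the details are beyond this course, all of these corrections are available in base R through p.adjust; the raw p-values below are made up purely for illustration:

```r
# Hypothetical raw p-values from m = 5 simultaneous tests
p_raw <- c(0.001, 0.008, 0.020, 0.041, 0.300)

p.adjust(p_raw, method = "bonferroni")  # most conservative
p.adjust(p_raw, method = "holm")        # uniformly at least as powerful as Bonferroni
p.adjust(p_raw, method = "hochberg")
p.adjust(p_raw, method = "BH")          # Benjamini-Hochberg: controls the False Discovery Rate
```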