MVJ
12 April, 2018
ANOVA. Multiple testing corrections.
We have seen several models and tests for population means:
The null hypothesis in all cases is that the means do not differ: that \(\mu_A = \mu_B\), or that \(\beta_1 = 0\).
There is a gap here: comparing the means of more than two groups at once. ANOVA fills this gap. Its null hypothesis is that all the means are equal: \(\mu_1 = \dots = \mu_m\).
We can think of many sample means models in a few ways:
In the paired case, we consider the categorical variable to contain the subpopulation membership.
How do we study numeric and categorical variables?
If the distributions for the subpopulations are all equal, then they will be equal to the distribution of their union.
In other words, pooling all the data should not change summary statistics (by much).
y | a | b | c | joint | a | b | c | joint |
---|---|---|---|---|---|---|---|---|
z | m | m | m | m | n | n | n | n |
sd.x | 1.31 | 0.95 | 0.97 | 1.09 | 0.89 | 1.03 | 0.84 | 1.46 |
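A comparison of subgroup and pooled summary statistics like the one above can be computed in base R. A minimal sketch, using simulated data (the data frame `d` and its columns `x`, `y`, `z` are made up for illustration):

```r
# Simulated data: numeric x, categorical y (a/b/c), categorical z (m/n)
set.seed(1)
d = data.frame(
  x = rnorm(120),
  y = sample(c("a", "b", "c"), 120, replace = TRUE),
  z = sample(c("m", "n"), 120, replace = TRUE)
)

# Standard deviation of x within each y-by-z subgroup...
aggregate(x ~ y + z, data = d, FUN = sd)

# ...and for the pooled ("joint") data within each z
aggregate(x ~ z, data = d, FUN = sd)
```

If the subpopulation distributions are equal, the subgroup standard deviations should all be close to the pooled value.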
For the ANOVA, the key to testing is the \(F\)-test from last week.
The ANOVA model is:
DATA = FIT + RESIDUAL, i.e. \(x_{ij} = \mu_i + \epsilon_{ij}\)
where \(\mu_i\) are the individual subpopulation means and \(\epsilon_{ij}\sim\mathcal N(0,\sigma)\) are uncorrelated normally distributed homoscedastic errors.
With this the null hypothesis becomes
\[ \mu_1 = \dots = \mu_m \]
The ANOVA works by comparing variances – so we will be using the same kind of ANOVA table we introduced last week. Here, each \(\mu_i\) is a parameter (like how \(\beta_0, \beta_1\) were parameters):
Type | SS | DoF | MS | F | p |
---|---|---|---|---|---|
Model | \(SSM = \sum_i n_i(\overline x_i-\overline x)^2\) | \(m-1\) | \(MSM = SSM/DoFM\) | \(MSM/MSE\) | pf(F, df1=DoFM, df2=DoFE, lower.tail=FALSE) |
Error | \(SSE = \sum_{i,j}(x_{ij}-\overline x_i)^2\) | \(n-m\) | \(MSE = SSE/DoFE\) | | |
Total | \(SST = \sum_{i,j}(x_{ij}-\overline x)^2\) | \(n-1\) | \(MST = SST/DoFT\) (the overall sample variance \(s^2\)) | | |
\(SST = SSM + SSE\) and \(DoFT = DoFM + DoFE\).
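The table entries can be computed by hand and checked against R's built-in ANOVA. A sketch with simulated data (the group sizes and means are made up for illustration):

```r
set.seed(2)
m = 3; n_i = 40                        # 3 groups of 40 observations each
g = factor(rep(c("g1", "g2", "g3"), each = n_i))
x = rnorm(m * n_i, mean = c(0, 0.5, 1)[as.integer(g)])
n = length(x)

xbar   = mean(x)                       # grand mean
xbar_i = tapply(x, g, mean)[g]         # each observation's group mean

SSM = sum((xbar_i - xbar)^2)           # model sum of squares
SSE = sum((x - xbar_i)^2)              # error sum of squares
F   = (SSM / (m - 1)) / (SSE / (n - m))
p   = pf(F, df1 = m - 1, df2 = n - m, lower.tail = FALSE)

# Should agree with R's built-in ANOVA table
summary(aov(x ~ g))
```

Note that summing \((\overline x_i - \overline x)^2\) over all observations automatically weights each group by its size \(n_i\).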
In R, the relevant command is `aov`. You could use `lm` as well. The command `anova` takes the output of `lm` and produces an ANOVA table.
Do front wheel / rear wheel / 4-wheel drives differ in city driving mileage?
model = aov(cty ~ drv, data=mpg)
model
## Call:
## aov(formula = cty ~ drv, data = mpg)
##
## Terms:
## drv Residuals
## Sum of Squares 1879 2342
## Deg. of Freedom 2 231
##
## Residual standard error: 3.2
## Estimated effects may be unbalanced
summary(model)
## Df Sum Sq Mean Sq F value Pr(>F)
## drv 2 1879 939 92.7 <2e-16 ***
## Residuals 231 2342 10
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model = lm(cty ~ drv, data=mpg)
model
##
## Call:
## lm(formula = cty ~ drv, data = mpg)
##
## Coefficients:
## (Intercept) drvf drvr
## 14.33 5.64 -0.25
summary(model)
##
## Call:
## lm(formula = cty ~ drv, data = mpg)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.97 -1.97 -0.33 1.03 15.03
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14.330 0.314 45.68 <2e-16 ***
## drvf 5.642 0.440 12.81 <2e-16 ***
## drvr -0.250 0.710 -0.35 0.72
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.2 on 231 degrees of freedom
## Multiple R-squared: 0.445, Adjusted R-squared: 0.44
## F-statistic: 92.7 on 2 and 231 DF, p-value: <2e-16
If your ANOVA test is not significant, you’re done.
If your ANOVA test is significant, it is time to find out where the discrepancy is: individual pairs of subpopulations can be compared with two-sample t-tests.
As a result, we will be doing many t-tests. For a 5-group ANOVA, there are \(\binom{5}{2} = 10\) pairwise tests. At \(\alpha = 0.05\), each test has a 1/20 chance of falsely rejecting a true null. The tests are not independent.
The probability of at least one false discovery is much larger than 5%.
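For a rough sense of scale: if the 10 tests *were* independent, the chance of at least one false positive at \(\alpha = 0.05\) would be

```r
k = choose(5, 2)          # 10 pairwise tests for 5 groups
alpha = 0.05
1 - (1 - alpha)^k         # about 0.40
```

so the family-wise error rate is roughly eight times the per-test rate.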
The easiest way to deal with multiple testing is Bonferroni correction: if we are doing \(k\) different tests, we use a cutoff of \(\alpha/k\) for each.
For our mileage example, there are 3 groups, hence 3 pairwise comparisons. We need to use a p-value significance cutoff of \(0.05/3 \approx 0.02\).
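R can run all the pairwise t-tests with the Bonferroni adjustment built in, via `pairwise.t.test` (this assumes the `mpg` dataset from the ggplot2 package is available):

```r
library(ggplot2)  # for the mpg dataset

# All pairwise comparisons of city mileage between drive trains,
# with Bonferroni-adjusted p-values
pairwise.t.test(mpg$cty, mpg$drv, p.adjust.method = "bonferroni")
```

Adjusting the p-values upward by the number of tests is equivalent to lowering the cutoff to \(\alpha/k\), so the adjusted p-values can be compared to 0.05 directly.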
Bonferroni is exact only in the worst case where the false rejection events are disjoint. This is rarely true, so Bonferroni usually produces too low a cutoff: it is conservative.
Alternatives exist: commonly used are Holm and Hochberg.
All are available through the R command `p.adjust`.
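A sketch of `p.adjust` in action, on some made-up raw p-values:

```r
p = c(0.001, 0.012, 0.020, 0.045, 0.300)   # hypothetical raw p-values

p.adjust(p, method = "bonferroni")  # multiply each by k, capped at 1
p.adjust(p, method = "holm")        # step-down: less conservative
p.adjust(p, method = "hochberg")    # step-up: less conservative still
```

The adjusted p-values are then compared to the original \(\alpha\) (e.g. 0.05) directly.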