Lecture 27

MVJ

12 April, 2018

ANOVA. Multiple testing corrections.

One sample, two samples, continuous covariate, many samples

We have seen several models and tests for population means:

Null hypothesis in all cases is that the means do not differ: that \(\mu_A=\mu_B\) or that \(\beta_1=0\).

There is a gap here:

This has null hypothesis that all the means are equal: \(\mu_1 = \dots = \mu_m\).

Many sample means

We can think of many sample means models in a few ways:

In the paired case, we consider the categorical variable to contain the subpopulation membership.

How do we study numeric and categorical variables?

Fundamental idea of the ANOVA

If the distributions for the subpopulations are all equal, then they will be equal to the distribution of their union.

In other words, pooling all the data should not change summary statistics (by much).

Group-wise and pooled standard deviations of x (sd.x) for two datasets, m and n:

z     y=a    y=b    y=c    joint
m     1.31   0.95   0.97   1.09
n     0.89   1.03   0.84   1.46

In dataset m the pooled ("joint") sd is close to the group sds, consistent with equal means. In dataset n the pooled sd (1.46) is clearly larger than any group sd, suggesting the group means differ.

Null and Alternative

For the ANOVA, we use

Testing relies on the \(F\)-test from last week.

ANOVA model

The ANOVA model is:

DATA = FIT + RESIDUAL, i.e. \(x_{ij} = \mu_i + \epsilon_{ij}\)

where \(\mu_i\) are the individual subpopulation means and \(\epsilon_{ij}\sim\mathcal N(0,\sigma)\) are uncorrelated normally distributed homoscedastic errors.

With this the null hypothesis becomes

\[ \mu_1 = \dots = \mu_m \]
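As a quick sketch (not from the lecture; the group labels and parameter values below are made up), we can simulate data from this model and fit it with aov:

```r
# Simulate x_ij = mu_i + eps_ij for m = 3 groups (all values hypothetical)
set.seed(1)
mu    <- c(10, 12, 15)      # subpopulation means
sigma <- 2                  # common (homoscedastic) error sd
n_i   <- 50                 # observations per group
group <- factor(rep(c("a", "b", "c"), each = n_i))
x     <- rep(mu, each = n_i) + rnorm(3 * n_i, mean = 0, sd = sigma)

summary(aov(x ~ group))     # means differ here, so expect a small p-value
```

Rerunning with equal entries in mu simulates the null hypothesis instead.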

ANOVA table

The ANOVA works by comparing variances – so we will be using the same kind of ANOVA table we introduced last week. Here, each \(\mu_i\) is a parameter (just as \(\beta_0\) and \(\beta_1\) were):

Type    SS                                                  DoF       MS                  F          p
Model   \(SSM = \sum_i n_i(\overline x_i-\overline x)^2\)   \(m-1\)   \(SSM/DoFM\)        MSM / MSE  pf(F, df1=DoFM, df2=DoFE, lower.tail=FALSE)
Error   \(SSE = \sum_{i,j}(x_{ij}-\overline x_i)^2\)        \(n-m\)   \(SSE/DoFE\)
Total   \(SST = \sum_{i,j}(x_{ij}-\overline x)^2\)          \(n-1\)   \(SST/DoFT = s^2\)

\(SST = SSM + SSE\) and \(DoFT = DoFM + DoFE\).
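The decomposition \(SST = SSM + SSE\) can be checked by hand in R; the simulated data below is illustrative, not from the lecture:

```r
# Verify SST = SSM + SSE on simulated data
set.seed(2)
group <- factor(rep(c("a", "b", "c"), each = 40))
x     <- rnorm(120, mean = c(5, 6, 8)[group])

xbar   <- mean(x)              # grand mean
xbar_i <- ave(x, group)        # each observation replaced by its group mean

SST <- sum((x - xbar)^2)
SSM <- sum((xbar_i - xbar)^2)  # the n_i weights are implicit in the sum
SSE <- sum((x - xbar_i)^2)

all.equal(SST, SSM + SSE)      # TRUE: the decomposition holds
```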

In R

In R, the relevant command is aov.

You could use lm as well.

The command anova takes the output of lm and produces an ANOVA table.

In R

Do front wheel / rear wheel / 4-wheel drives differ in city driving mileage?

model = aov(cty ~ drv, data=mpg)
model
## Call:
##    aov(formula = cty ~ drv, data = mpg)
## 
## Terms:
##                  drv Residuals
## Sum of Squares  1879      2342
## Deg. of Freedom    2       231
## 
## Residual standard error: 3.2
## Estimated effects may be unbalanced

In R

Do front wheel / rear wheel / 4-wheel drives differ in city driving mileage?

model = aov(cty ~ drv, data=mpg)
summary(model)
##              Df Sum Sq Mean Sq F value Pr(>F)    
## drv           2   1879     939    92.7 <2e-16 ***
## Residuals   231   2342      10                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In R

Do front wheel / rear wheel / 4-wheel drives differ in city driving mileage?

model = lm(cty ~ drv, data=mpg)
model
## 
## Call:
## lm(formula = cty ~ drv, data = mpg)
## 
## Coefficients:
## (Intercept)         drvf         drvr  
##       14.33         5.64        -0.25

In R

Do front wheel / rear wheel / 4-wheel drives differ in city driving mileage?

model = lm(cty ~ drv, data=mpg)
summary(model)
## 
## Call:
## lm(formula = cty ~ drv, data = mpg)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -8.97  -1.97  -0.33   1.03  15.03 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   14.330      0.314   45.68   <2e-16 ***
## drvf           5.642      0.440   12.81   <2e-16 ***
## drvr          -0.250      0.710   -0.35     0.72    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.2 on 231 degrees of freedom
## Multiple R-squared:  0.445,  Adjusted R-squared:  0.44 
## F-statistic: 92.7 on 2 and 231 DF,  p-value: <2e-16

What to do next - multiple testing

If your ANOVA test is not significant, you’re done.

If your ANOVA test is significant, it is time to find out where the discrepancy is. Individual pairs of subpopulations can be compared with two-sample t-tests.

As a result, we will be doing many t-tests: a 5-group ANOVA yields \(\binom{5}{2} = 10\) pairwise tests. At level \(\alpha = 0.05\), each test has a 1/20 chance of a false positive, and the tests are not independent.

The probability of at least one false discovery across the whole family is therefore much larger than 0.05.
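For a sense of scale: if the 10 pairwise tests were independent, each at level 0.05, the chance of at least one false positive would be about 40%:

```r
# Family-wise error rate for k independent tests at level alpha
alpha <- 0.05
k     <- 10
1 - (1 - alpha)^k   # roughly 0.40 -- far above the nominal 0.05
```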

Multiple testing

The easiest way to deal with multiple testing is Bonferroni correction: if we are doing \(k\) different tests, we use a cutoff of \(\alpha/k\) for each.

For our mileage example, there are 3 groups and therefore 3 pairwise comparisons. We need to use a p-value significance cutoff of \(0.05/3 \approx 0.017\).

Bonferroni's bound is exact only when the rejection events are disjoint. This is rarely true, so Bonferroni usually produces a lower cutoff than necessary – it is conservative.

Alternatives exist: commonly used are Holm and Hochberg.

All are available through the R command p.adjust.
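A sketch of p.adjust on made-up p-values (the three values below are hypothetical, not from the mileage example):

```r
# Three hypothetical pairwise p-values
p <- c(0.012, 0.030, 0.440)

p.adjust(p, method = "bonferroni")   # 0.036 0.090 1.000 -- multiply by 3, cap at 1
p.adjust(p, method = "holm")         # 0.036 0.060 0.440 -- step-down, less conservative
```

In practice, pairwise.t.test(mpg$cty, mpg$drv, p.adjust.method = "holm") runs all the pairwise comparisons and adjusts the p-values in one call.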

What do I need to know now