3/26/2020
| n | x.i | std.err |
|---|---|---|
| 14 | 0.93 | 0.04 |
| 14 | 1.21 | 0.03 |
| 14 | 0.92 | 0.04 |
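The calculations that follow operate on a data frame named ex.df, whose construction is not shown. A minimal sketch of how it could be built, assuming the column names match the table header (n, x.i, std.err) and that each row is one treatment group:

```r
# Summary statistics transcribed from the table above,
# one row per treatment group.
ex.df = data.frame(
  n       = c(14, 14, 14),        # group sample sizes
  x.i     = c(0.93, 1.21, 0.92),  # group means
  std.err = c(0.04, 0.03, 0.04)   # standard errors of the group means
)
ex.df
```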
Recall that \(\text{std.err} = \text{std.dev} / \sqrt{n}\). We can use this to calculate the standard deviation and variance for each case:
ex.df$S = ex.df$std.err * sqrt(ex.df$n)
ex.df$S2 = ex.df$S^2
| n | x.i | std.err | S | S2 |
|---|---|---|---|---|
| 14 | 0.93 | 0.04 | 0.1496663 | 0.0224 |
| 14 | 1.21 | 0.03 | 0.1122497 | 0.0126 |
| 14 | 0.92 | 0.04 | 0.1496663 | 0.0224 |
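As a check on the first row, the same arithmetic by hand, using only the figures in the table:

\[ S_1 = 0.04 \times \sqrt{14} \approx 0.1497, \qquad S_1^2 = 0.04^2 \times 14 = 0.0224 \]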
Now, \(SSE = \sum_i\sum_j(Y_{ij}-\overline{Y}_{i*})^2 = \sum_i(n_i-1)S_i^2\). We can put each contribution in a column:
ex.df$SSE = (ex.df$n-1)*ex.df$S2
| n | x.i | std.err | S | S2 | SSE |
|---|---|---|---|---|---|
| 14 | 0.93 | 0.04 | 0.1496663 | 0.0224 | 0.2912 |
| 14 | 1.21 | 0.03 | 0.1122497 | 0.0126 | 0.1638 |
| 14 | 0.92 | 0.04 | 0.1496663 | 0.0224 | 0.2912 |
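The identity used for SSE follows directly from the definition of the sample variance of group \(i\):

\[ S_i^2 = \frac{1}{n_i-1}\sum_j\left(Y_{ij}-\overline{Y}_{i*}\right)^2 \quad\Longrightarrow\quad \sum_j\left(Y_{ij}-\overline{Y}_{i*}\right)^2 = (n_i-1)S_i^2 \]

For the first row this gives \(13 \times 0.0224 = 0.2912\), which matches the SSE column.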
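Next, the treatment sum of squares is \(SST = \sum_i\sum_j(\overline{Y}_{i*}-\overline{Y})^2 = \sum_i n_i(\overline{Y}_{i*}-\overline{Y})^2\), where \(\overline{Y} = \frac{1}{n}\sum_i n_i\overline{Y}_{i*}\) is the grand mean and \(n = \sum_i n_i\) is the total sample size.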
ex.n = sum(ex.df$n)
ex.x = (1/ex.n) * sum(ex.df$n * ex.df$x.i)
ex.df$SST = ex.df$n * (ex.df$x.i - ex.x)^2
| n | x.i | std.err | S | S2 | SSE | SST |
|---|---|---|---|---|---|---|
| 14 | 0.93 | 0.04 | 0.1496663 | 0.0224 | 0.2912 | 0.1134 |
| 14 | 1.21 | 0.03 | 0.1122497 | 0.0126 | 0.1638 | 0.5054 |
| 14 | 0.92 | 0.04 | 0.1496663 | 0.0224 | 0.2912 | 0.1400 |
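The second equality in the SST formula holds because the summand does not depend on \(j\), so the inner sum is just \(n_i\) copies of the same term. Worked through for this example, with the numbers taken from the table:

\[ \overline{Y} = \frac{14(0.93) + 14(1.21) + 14(0.92)}{42} = \frac{42.84}{42} = 1.02 \]

\[ SST_1 = 14\,(0.93 - 1.02)^2 = 14 \times 0.0081 = 0.1134 \]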
Now we have all the ingredients for our ANOVA table!
ex.dfT = nrow(ex.df) - 1
ex.dfE = ex.n - nrow(ex.df)
ex.SST = sum(ex.df$SST)
ex.SSE = sum(ex.df$SSE)
ex.MST = ex.SST / ex.dfT
ex.MSE = ex.SSE / ex.dfE
ex.F = ex.MST / ex.MSE
ex.p = pf(ex.F, ex.dfT, ex.dfE, lower.tail=FALSE)
| Source | DoF | SS | MS | F | p |
|---|---|---|---|---|---|
| Treatments | 2 | 0.7588 | 0.3794000 | 19.82927 | 1.1e-06 |
| Error | 39 | 0.7462 | 0.0191333 | NA | NA |
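The steps above can be collected into one function. This is a sketch rather than part of the original notes: the name oneway.from.summary and its argument names are hypothetical, and it assumes one entry per group in each argument.

```r
# Hypothetical helper (not from the original notes): one-way ANOVA table
# from per-group sample sizes, means, and standard errors of the means.
oneway.from.summary = function(n, means, std.err) {
  k = length(n)                       # number of groups
  N = sum(n)                          # total sample size
  S2 = (std.err * sqrt(n))^2          # group variances, from std.err = S / sqrt(n)
  grand = sum(n * means) / N          # grand mean, weighted by group size
  SST = sum(n * (means - grand)^2)    # treatment sum of squares
  SSE = sum((n - 1) * S2)             # error sum of squares
  dfT = k - 1                         # treatment degrees of freedom
  dfE = N - k                         # error degrees of freedom
  Fstat = (SST / dfT) / (SSE / dfE)   # F = MST / MSE
  p = pf(Fstat, dfT, dfE, lower.tail = FALSE)
  data.frame(
    Source = c("Treatments", "Error"),
    DoF = c(dfT, dfE),
    SS = c(SST, SSE),
    MS = c(SST / dfT, SSE / dfE),
    F = c(Fstat, NA),
    p = c(p, NA)
  )
}

# The three groups from the example above.
ex.tab = oneway.from.summary(
  n = c(14, 14, 14),
  means = c(0.93, 1.21, 0.92),
  std.err = c(0.04, 0.03, 0.04)
)
ex.tab
```

Called with the example's values, this should reproduce the table above: SS of 0.7588 and 0.7462 on 2 and 39 degrees of freedom.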
| A | B | C | D | E |
|---|---|---|---|---|
| .8 | .7 | 1.2 | 1.0 | .6 |
| .6 | .8 | 1.0 | .9 | .4 |
| .6 | .5 | .9 | .9 | .4 |
| .5 | .5 | 1.2 | 1.1 | .7 |
| | .6 | 1.3 | .7 | .3 |
| | .9 | .8 | | |
| | .7 | | | |
ex.df = data.frame(
group = c(rep("A", 4), rep("B",7), rep("C", 6), rep("D", 5), rep("E", 5)),
data = c(.8,.6,.6,.5,
.7,.8,.5,.5,.6,.9,.7,
1.2,1.0,.9,1.2,1.3,.8,
1.0,.9,.9,1.1,.7,
.6,.4,.4,.7,.3
)
)
| group | data |
|---|---|
| A | 0.8 |
| A | 0.6 |
| A | 0.6 |
| A | 0.5 |
| B | 0.7 |
| B | 0.8 |
aov(data ~ group, ex.df)
## Call:
##    aov(formula = data ~ group, data = ex.df)
##
## Terms:
##                    group Residuals
## Sum of Squares  1.211844  0.571119
## Deg. of Freedom        4        22
##
## Residual standard error: 0.1611209
## Estimated effects may be unbalanced
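The sums of squares reported by aov can be reproduced with the same formulas used in the summary-statistics example. A sketch, with ex.df rebuilt so the block runs on its own; the names n.i, x.i, S2.i and x.bar are introduced here for illustration only.

```r
# Raw data for the five groups, as defined above.
ex.df = data.frame(
  group = c(rep("A", 4), rep("B", 7), rep("C", 6), rep("D", 5), rep("E", 5)),
  data = c(.8, .6, .6, .5,
           .7, .8, .5, .5, .6, .9, .7,
           1.2, 1.0, .9, 1.2, 1.3, .8,
           1.0, .9, .9, 1.1, .7,
           .6, .4, .4, .7, .3)
)

n.i  = tapply(ex.df$data, ex.df$group, length)  # group sizes
x.i  = tapply(ex.df$data, ex.df$group, mean)    # group means
S2.i = tapply(ex.df$data, ex.df$group, var)     # group sample variances
x.bar = mean(ex.df$data)                        # grand mean over all observations

SST = sum(n.i * (x.i - x.bar)^2)  # treatment (between-group) sum of squares
SSE = sum((n.i - 1) * S2.i)       # error (within-group) sum of squares
c(SST = SST, SSE = SSE)
```

Both values should agree with the "Sum of Squares" row printed by aov, even though the group sizes are unequal here.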
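One-way ANOVA is a special case of linear regression, and R treats it that way: aov fits the model through lm. Using aov rather than lm ensures that summary() prints an ANOVA table (sums of squares and the F test for comparing several means) rather than regression coefficients.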
anova( aov(data ~ group, ex.df) )
| | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| group | 4 | 1.211844 | 0.302961 | 11.67032 | 3.09e-05 |
| Residuals | 22 | 0.571119 | 0.025960 | NA | NA |
summary( aov(data ~ group, ex.df) )
##             Df Sum Sq Mean Sq F value   Pr(>F)
## group        4 1.2118 0.30296   11.67 3.09e-05 ***
## Residuals   22 0.5711 0.02596
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
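To illustrate the link between aov and linear regression, the same model can be fitted with lm and passed to anova. This is a sketch with ex.df rebuilt so the block is self-contained; fit.aov and fit.lm are names chosen here.

```r
# Raw data for the five groups, as defined above.
ex.df = data.frame(
  group = c(rep("A", 4), rep("B", 7), rep("C", 6), rep("D", 5), rep("E", 5)),
  data = c(.8, .6, .6, .5,
           .7, .8, .5, .5, .6, .9, .7,
           1.2, 1.0, .9, 1.2, 1.3, .8,
           1.0, .9, .9, 1.1, .7,
           .6, .4, .4, .7, .3)
)

fit.aov = aov(data ~ group, ex.df)  # ANOVA-style fit
fit.lm  = lm(data ~ group, ex.df)   # the same model as a linear regression

anova(fit.lm)    # ANOVA table computed from the lm fit
summary(fit.lm)  # regression-style printout: coefficients per group contrast
```

The F value from anova(fit.lm) should equal the one from anova(fit.aov); only the default summary printout differs between the two fits.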