3/26/2020
n | x.i | std.err |
---|---|---|
14 | 0.93 | 0.04 |
14 | 1.21 | 0.03 |
14 | 0.92 | 0.04 |
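The code below assumes these summary statistics are stored in a data frame called `ex.df`. The chunk that created it is not shown, so this is a minimal sketch of one way to enter it, with the column names taken from the table.

```r
# Summary statistics from the table: group size n, group mean x.i,
# and standard error of the group mean std.err (one row per treatment).
ex.df = data.frame(
  n       = c(14, 14, 14),
  x.i     = c(0.93, 1.21, 0.92),
  std.err = c(0.04, 0.03, 0.04)
)
print(ex.df)
```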
Recall that \(\text{std.err} = \text{std.dev} / \sqrt{n}\). Inverting this gives the standard deviation \(S_i = \text{std.err}_i\sqrt{n_i}\) and the variance \(S_i^2\) for each group:
```r
# Standard deviation S_i = std.err_i * sqrt(n_i), then variance S_i^2
ex.df$S = ex.df$std.err * sqrt(ex.df$n)
ex.df$S2 = ex.df$S^2
```
n | x.i | std.err | S | S2 |
---|---|---|---|---|
14 | 0.93 | 0.04 | 0.1496663 | 0.0224 |
14 | 1.21 | 0.03 | 0.1122497 | 0.0126 |
14 | 0.92 | 0.04 | 0.1496663 | 0.0224 |
Now, \(SSE = \sum_i\sum_j(Y_{ij}-\overline{Y}_{i*})^2 = \sum_i(n_i-1)S_i^2\). We can put each contribution in a column:
```r
# Each group's contribution (n_i - 1) * S_i^2 to the error sum of squares
ex.df$SSE = (ex.df$n-1)*ex.df$S2
```
n | x.i | std.err | S | S2 | SSE |
---|---|---|---|---|---|
14 | 0.93 | 0.04 | 0.1496663 | 0.0224 | 0.2912 |
14 | 1.21 | 0.03 | 0.1122497 | 0.0126 | 0.1638 |
14 | 0.92 | 0.04 | 0.1496663 | 0.0224 | 0.2912 |
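The identity \(\sum_i\sum_j(Y_{ij}-\overline{Y}_{i*})^2 = \sum_i(n_i-1)S_i^2\) is what lets SSE be computed from summary statistics alone. A quick way to convince yourself is to compute both sides on raw data; the small data set below is made up purely for illustration.

```r
# Hypothetical raw data: three groups of unequal size.
y = c(1.0, 1.2, 0.9, 1.5, 1.7, 1.6, 1.4, 0.6, 0.8)
g = factor(c("a", "a", "a", "b", "b", "b", "b", "c", "c"))

# Direct definition: squared deviation of each observation from its group mean.
# ave(y, g) returns, for every observation, the mean of its group.
sse.direct = sum((y - ave(y, g))^2)

# Shortcut: sum over groups of (n_i - 1) * S_i^2, using per-group sample variances.
n.i  = tapply(y, g, length)
S2.i = tapply(y, g, var)
sse.shortcut = sum((n.i - 1) * S2.i)

c(direct = sse.direct, shortcut = sse.shortcut)
```

The two agree because \((n_i-1)S_i^2\) is, by the definition of the sample variance, exactly the sum of squared deviations within group \(i\).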
Next, \(SST = \sum_i\sum_j(\overline{Y}_{i*}-\overline{Y})^2 = \sum_in_i(\overline{Y}_{i*}-\overline{Y})^2\), where the grand mean is \(\overline{Y} = \frac{1}{n}\sum_i n_i\overline{Y}_{i*}\) and \(n = \sum_i n_i\) is the total sample size.
```r
ex.n = sum(ex.df$n)                          # total sample size n
ex.x = (1/ex.n) * sum(ex.df$n * ex.df$x.i)   # grand mean
ex.df$SST = ex.df$n * (ex.df$x.i - ex.x)^2   # each group's contribution to SST
```
n | x.i | std.err | S | S2 | SSE | SST |
---|---|---|---|---|---|---|
14 | 0.93 | 0.04 | 0.1496663 | 0.0224 | 0.2912 | 0.1134 |
14 | 1.21 | 0.03 | 0.1122497 | 0.0126 | 0.1638 | 0.5054 |
14 | 0.92 | 0.04 | 0.1496663 | 0.0224 | 0.2912 | 0.1400 |
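Both steps in the SST formula can be checked on raw data: the size-weighted average of the group means equals the plain mean of all observations, and the double sum collapses to \(\sum_i n_i(\overline{Y}_{i*}-\overline{Y})^2\) because the summand does not depend on \(j\). The data below are hypothetical.

```r
# Hypothetical raw data with unequal group sizes.
y = c(1.0, 1.2, 0.9, 1.5, 1.7, 1.6, 1.4, 0.6, 0.8)
g = factor(c("a", "a", "a", "b", "b", "b", "b", "c", "c"))

n.i    = tapply(y, g, length)   # group sizes n_i
ybar.i = tapply(y, g, mean)     # group means
n      = sum(n.i)               # total sample size

# Weighted average of group means equals the mean of all observations.
ybar = sum(n.i * ybar.i) / n
stopifnot(isTRUE(all.equal(ybar, mean(y))))

# Double sum: one term (ybar_i - ybar)^2 for every observation j in group i ...
sst.double = sum((ave(y, g) - ybar)^2)
# ... equals the per-group form n_i * (ybar_i - ybar)^2.
sst.single = sum(n.i * (ybar.i - ybar)^2)

c(double = sst.double, single = sst.single)
```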
Now we have all the ingredients for our ANOVA table!
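```r
ex.dfT = nrow(ex.df) - 1        # treatment degrees of freedom (k - 1)
ex.dfE = ex.n - nrow(ex.df)     # error degrees of freedom (n - k)
ex.SST = sum(ex.df$SST)
ex.SSE = sum(ex.df$SSE)
ex.MST = ex.SST / ex.dfT        # mean squares
ex.MSE = ex.SSE / ex.dfE
ex.F = ex.MST / ex.MSE          # F statistic
ex.p = pf(ex.F, ex.dfT, ex.dfE, lower.tail=FALSE)   # upper-tail p-value
```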
Source | DoF | SS | MS | F | p |
---|---|---|---|---|---|
Treatments | 2 | 0.7588 | 0.3794000 | 19.82927 | 1.1e-06 |
Error | 39 | 0.7462 | 0.0191333 | NA | NA |
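The steps above can be collected into one function that builds the ANOVA table directly from group sizes, means and standard errors. This is a sketch; `anova_from_summary` is a hypothetical name, not a function from base R or any package.

```r
# Hypothetical helper: one-way ANOVA from per-group summary statistics.
# n = group sizes, xbar = group means, se = standard errors of the group means.
anova_from_summary = function(n, xbar, se) {
  k     = length(n)                   # number of treatments
  N     = sum(n)                      # total sample size
  S2    = (se * sqrt(n))^2            # group variances S_i^2
  SSE   = sum((n - 1) * S2)           # error sum of squares
  grand = sum(n * xbar) / N           # weighted grand mean
  SST   = sum(n * (xbar - grand)^2)   # treatment sum of squares
  dfT   = k - 1
  dfE   = N - k
  MST   = SST / dfT
  MSE   = SSE / dfE
  Fstat = MST / MSE                   # avoid naming this F, which masks FALSE
  data.frame(
    Source = c("Treatments", "Error"),
    DoF    = c(dfT, dfE),
    SS     = c(SST, SSE),
    MS     = c(MST, MSE),
    F      = c(Fstat, NA),
    p      = c(pf(Fstat, dfT, dfE, lower.tail = FALSE), NA)
  )
}

tab = anova_from_summary(n = c(14, 14, 14), xbar = c(0.93, 1.21, 0.92),
                         se = c(0.04, 0.03, 0.04))
tab
```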
| A | B | C | D | E |
|---|---|---|---|---|
| .8 | .7 | 1.2 | 1.0 | .6 |
| .6 | .8 | 1.0 | .9 | .4 |
| .6 | .5 | .9 | .9 | .4 |
| .5 | .5 | 1.2 | 1.1 | .7 |
| | .6 | 1.3 | .7 | .3 |
| | .9 | .8 | | |
| | .7 | | | |
```r
ex.df = data.frame(
  group = c(rep("A", 4), rep("B", 7), rep("C", 6), rep("D", 5), rep("E", 5)),
  data = c(.8, .6, .6, .5,
           .7, .8, .5, .5, .6, .9, .7,
           1.2, 1.0, .9, 1.2, 1.3, .8,
           1.0, .9, .9, 1.1, .7,
           .6, .4, .4, .7, .3)
)
```
group | data |
---|---|
A | 0.8 |
A | 0.6 |
A | 0.6 |
A | 0.5 |
B | 0.7 |
B | 0.8 |
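Instead of typing the `rep(...)` group labels by hand, the wide table can be converted to long format with `stack()` from R's `utils` package, which turns a named list into a data frame with columns `values` and `ind`. This is an alternative construction, not the one used above; the variable names `wide` and `long` are illustrative.

```r
# One vector per group, exactly as in the wide table (unequal lengths are fine in a list).
wide = list(
  A = c(.8, .6, .6, .5),
  B = c(.7, .8, .5, .5, .6, .9, .7),
  C = c(1.2, 1.0, .9, 1.2, 1.3, .8),
  D = c(1.0, .9, .9, 1.1, .7),
  E = c(.6, .4, .4, .7, .3)
)

# stack() returns columns `values` and `ind` (a factor naming each value's group).
long = stack(wide)
names(long) = c("data", "group")
head(long)
```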
```r
aov(data ~ group, ex.df)
```

```
## Call:
##    aov(formula = data ~ group, data = ex.df)
##
## Terms:
##                    group Residuals
## Sum of Squares  1.211844  0.571119
## Deg. of Freedom        4        22
##
## Residual standard error: 0.1611209
## Estimated effects may be unbalanced
```
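As a cross-check, the `Terms` reported by `aov` can be reproduced with the summary-statistic formulas from the first example, this time computing \(n_i\), \(\overline{Y}_{i*}\) and \(S_i^2\) from the raw data. This sketch assumes the same 27 observations as `ex.df`.

```r
# Raw data in long format (same values and group sizes as ex.df).
y = c(.8, .6, .6, .5,  .7, .8, .5, .5, .6, .9, .7,
      1.2, 1.0, .9, 1.2, 1.3, .8,  1.0, .9, .9, 1.1, .7,  .6, .4, .4, .7, .3)
g = factor(rep(c("A", "B", "C", "D", "E"), times = c(4, 7, 6, 5, 5)))

n.i    = tapply(y, g, length)   # group sizes n_i
ybar.i = tapply(y, g, mean)     # group means
S2.i   = tapply(y, g, var)      # group variances S_i^2
ybar   = mean(y)                # grand mean

SST = sum(n.i * (ybar.i - ybar)^2)   # treatment sum of squares
SSE = sum((n.i - 1) * S2.i)          # error (residual) sum of squares
round(c(SST = SST, SSE = SSE), 6)

# Residual standard error = sqrt(SSE / (n - k))
sqrt(SSE / (length(y) - nlevels(g)))
```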
In R, ANOVA is a special case of linear regression: `aov` is a wrapper around `lm`. Using `aov` ensures that the printed summaries are laid out for the multiple-means (ANOVA) use case rather than as regression coefficients.
```r
anova( aov(data ~ group, ex.df) )
```
Source | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
---|---|---|---|---|---|
group | 4 | 1.211844 | 0.302961 | 11.67032 | 3.09e-05 |
Residuals | 22 | 0.571119 | 0.025960 | NA | NA |
```r
summary( aov(data ~ group, ex.df) )
```

```
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## group        4 1.2118 0.30296   11.67 3.09e-05 ***
## Residuals   22 0.5711 0.02596                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```
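To see the regression connection directly, one can fit the same model with `lm` and compare the `anova` tables. This is a sketch; `raw.df`, `fit.lm` and `fit.aov` are illustrative names. Under R's default treatment contrasts the `lm` intercept is the mean of the baseline group A, and the other coefficients are differences from that mean.

```r
raw.df = data.frame(
  group = rep(c("A", "B", "C", "D", "E"), times = c(4, 7, 6, 5, 5)),
  data  = c(.8, .6, .6, .5,  .7, .8, .5, .5, .6, .9, .7,
            1.2, 1.0, .9, 1.2, 1.3, .8,  1.0, .9, .9, 1.1, .7,  .6, .4, .4, .7, .3)
)

# The same one-way model, fit as a regression on the factor `group` ...
fit.lm  = lm(data ~ group, data = raw.df)
# ... and through the aov() wrapper.
fit.aov = aov(data ~ group, data = raw.df)

anova(fit.lm)    # same Df, Sum Sq, F value and Pr(>F) as anova(fit.aov)
coef(fit.lm)     # intercept = mean of group A; groupB..groupE = differences from it
```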