MVJ
12 April, 2018
Last week, we looked at the two-sample proportions test: given \(x_1\) successes out of \(n_1\) and \(x_2\) out of \(n_2\), is \(x_1/n_1\) significantly different from \(x_2/n_2\)?
Another way of phrasing the question is: in the two-way table with successes and failures vs. population 1 or population 2, do the columns and rows influence each other:
  | Population 1 | Population 2 |
---|---|---|
Success | \(s_1\) | \(s_2\) |
Failure | \(f_1\) | \(f_2\) |
This way we can also do an \(n\)-sample proportions test:
  | Population 1 | Population 2 | … | Population n |
---|---|---|---|---|
Success | \(s_1\) | \(s_2\) | … | \(s_n\) |
Failure | \(f_1\) | \(f_2\) | … | \(f_n\) |
For the two-way table case, the hypotheses to use are:
Null hypothesis: There is no association between the row and column variables.

Alternative hypothesis: There is some (unspecified) association between the row and column variables.
This can also be stated as:
Null hypothesis: The conditional distributions of the rows conditioned on the columns are all equal.
or as
Null hypothesis: The conditional distributions of the columns conditioned on the rows are all equal.
Suppose the row values are \(x_1,\dots,x_r\) and column values are \(y_1,\dots,y_c\).
If the null hypothesis is true, these two are independent: so \(\mathbb{P}(x_i\text{ and }y_j) = \mathbb{P}(x_i)\cdot\mathbb{P}(y_j)\).
Then we can get an expected cell probability for the two-way table from the row and column probabilities. These we can estimate from the row and column counts:
\[ \left( \begin{array}{cccc|c} x_{11} & x_{21} & x_{31} & x_{41} & x_{*1} \\ x_{12} & x_{22} & x_{32} & x_{42} & x_{*2} \\ x_{13} & x_{23} & x_{33} & x_{43} & x_{*3} \\ \hline x_{1*} & x_{2*} & x_{3*} & x_{4*} & n \\ \end{array} \right) \]
Dividing every entry by \(n\) gives the same table as proportions:

\[ \left( \begin{array}{cccc|c} p_{11} & p_{21} & p_{31} & p_{41} & p_{*1} \\ p_{12} & p_{22} & p_{32} & p_{42} & p_{*2} \\ p_{13} & p_{23} & p_{33} & p_{43} & p_{*3} \\ \hline p_{1*} & p_{2*} & p_{3*} & p_{4*} & 1 \\ \end{array} \right) \]
Under the null hypothesis, each cell proportion is the product of its row and column proportions, so the expected proportions are:

\[ \left( \begin{array}{cccc|c} p_{1*}\cdot p_{*1} & p_{2*}\cdot p_{*1} & p_{3*}\cdot p_{*1} & p_{4*}\cdot p_{*1} & p_{*1} \\ p_{1*}\cdot p_{*2} & p_{2*}\cdot p_{*2} & p_{3*}\cdot p_{*2} & p_{4*}\cdot p_{*2} & p_{*2} \\ p_{1*}\cdot p_{*3} & p_{2*}\cdot p_{*3} & p_{3*}\cdot p_{*3} & p_{4*}\cdot p_{*3} & p_{*3} \\ \hline p_{1*} & p_{2*} & p_{3*} & p_{4*} & 1 \\ \end{array} \right) \]
The expected count in each cell is the expected proportions multiplied by the total number of observations:
\[ E_{ij} = p_{i*}\cdot p_{*j}\cdot n = \frac{x_{i*}}{n}\cdot \frac{x_{*j}}{\color{red}{n}}\cdot \color{red}{n} = \frac{x_{i*}\cdot x_{*j}}{n} \]
Now that we have an expected count in each cell, we can build a statistical test from this.
The deviation in each cell is the difference between the observed count and the expected count.
Having a large deviation could be because

  1. the expected count itself is large, so even chance fluctuations are large in absolute terms, or
  2. there is a genuine association between rows and columns.

We can normalize to control for 1. by dividing the squared difference by the expected count. As it turns out, summing up these normalized squared deviations gives a statistic with a known distribution:
\[ X^2 = \sum\frac{(x_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2((r-1)(c-1)) \]
In R, the \(\chi^2\) distribution is available through `rchisq`, `dchisq`, `pchisq` and `qchisq`, each taking the degrees of freedom in the parameter `df`.
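The whole recipe can be sketched directly in R; the table here is a small made-up example, not data from the lecture:

```r
# Hypothetical 2x2 table of counts
tab <- matrix(c(30, 20,
                10, 40), nrow = 2, byrow = TRUE)
n  <- sum(tab)
# Expected counts under independence: E_ij = (row sum)(column sum)/n
E  <- outer(rowSums(tab), colSums(tab)) / n
# Chi-square statistic, degrees of freedom, upper-tail p-value
X2 <- sum((tab - E)^2 / E)
df <- (nrow(tab) - 1) * (ncol(tab) - 1)
p  <- pchisq(X2, df = df, lower.tail = FALSE)
```

This agrees with the built-in test when the continuity correction is turned off: `chisq.test(tab, correct = FALSE)`.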
We could use the \(\chi^2\) distribution to create a confidence interval for \(X^2\), but this is not a useful quantity. What is worth taking note of is a large deviation from the expected counts, so only upper-tailed (greater) testing is relevant for the \(\chi^2\) tests.
Note that the 2-sample proportions test also reports `X-squared` and `df`:
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: c(328, 234) out of c(537, 532)
## X-squared = 30, df = 1, p-value = 3e-08
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## 0.11 0.23
## sample estimates:
## prop 1 prop 2
## 0.61 0.44
Tschuprow’s T: \(T = \sqrt{\frac{\chi^2}{n\sqrt{df}}}\)
Cramér’s V: \(V = \sqrt{\frac{\chi^2}{n\min(r-1, c-1)}}\)
Both take values between 0 and 1, and both are available in the package `DescTools`. They are categorical versions of \(r^2\) (correlation squared).
Magnitude | Cramér’s V |
---|---|
Small | \(0 < V < 0.2\) |
Medium | \(0.2 < V < 0.4\) |
Large | \(0.4 < V\) |
(Cohen 1988: Statistical power analysis for the behavioral sciences)
Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)
\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.
Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\)
95% threshold: qchisq(0.95, df=df)
Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)
Do you use Instagram?
  | No | Yes |
---|---|---|
Men | 298 | 234 |
Women | 209 | 328 |
Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)
\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.
Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\) : 1
95% threshold: 3.84
Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)
Do you use Instagram?
  | No | Yes | No.Exp | Yes.Exp | No.Diff | Yes.Diff | No.X2 | Yes.X2 |
---|---|---|---|---|---|---|---|---|
Men | 298 | 234 | 252 | 280 | 46 | -46 | 8.3 | 7.5 |
Women | 209 | 328 | 255 | 282 | -46 | 46 | 8.2 | 7.4 |
\(\chi^2 = 31.32\) and \(T = 0.17\)
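The hand computation can be checked in R; `correct = FALSE` turns off the Yates continuity correction so the statistic matches the raw \(\chi^2\) above:

```r
# The Instagram table: rows = gender, columns = answer
tab <- rbind(Men   = c(No = 298, Yes = 234),
             Women = c(No = 209, Yes = 328))
test <- chisq.test(tab, correct = FALSE)
test$statistic   # about 31.3, far above the threshold 3.84
# Tschuprow's T from the statistic, the total count and df
T <- sqrt(unname(test$statistic) / (sum(tab) * sqrt(unname(test$parameter))))
T                # about 0.17
```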
Should all children be required to vaccinate?
  | No | Yes |
---|---|---|
Democratic | 230 | 729 |
Republican | 258 | 479 |
Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)
\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.
Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\) : 1
95% threshold: 3.84
Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)
Should all children be required to vaccinate?
  | No | Yes | No.Exp | Yes.Exp | No.Diff | Yes.Diff | No.X2 | Yes.X2 |
---|---|---|---|---|---|---|---|---|
Democratic | 230 | 729 | 276 | 683 | -46 | 46 | 7.7 | 3.1 |
Republican | 258 | 479 | 212 | 525 | 46 | -46 | 9.9 | 4.0 |
\(\chi^2 = 24.71\) and \(T = 0.12\)
How much fruit do you eat - and how much do you exercise?
  | Lo | Mid | Hi |
---|---|---|---|
Low | 69 | 25 | 14 |
Moderate | 206 | 126 | 111 |
Vigorous | 294 | 170 | 169 |
Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)
\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.
Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\) : 4
95% threshold: 9.49
Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)
How much fruit do you eat - and how much do you exercise?
  | Lo | Mid | Hi | Lo.E | Mid.E | Hi.E | Lo.D | Mid.D | Hi.D | Lo.X2 | Mid.X2 | Hi.X2 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Low | 69 | 25 | 14 | 52 | 29 | 27 | 17.1 | -4.3 | -13 | 5.63 | 0.63 | 6.13 |
Moderate | 206 | 126 | 111 | 213 | 120 | 110 | -6.9 | 5.9 | 1 | 0.22 | 0.29 | 0.01 |
Vigorous | 294 | 170 | 169 | 304 | 172 | 157 | -10.2 | -1.6 | 12 | 0.34 | 0.02 | 0.89 |
\(\chi^2 = 14.15\) and \(T = 0.08\)
Input | Two-way table `tab` or paired factor variables `x` and `y` |
---|---|
Null hypothesis | Rows and columns are independent |
Alternative hypothesis | Rows and columns are not independent |
Test statistic | \(\chi^2 = \sum (E_{ij}-O_{ij})^2/E_{ij}\) for expected \(E_{ij}\) and observed \(O_{ij}\) |
This test only has one alternative and has no confidence interval.
Requirements: All expected counts should be at least 1, and the average expected count should be at least 5.
For a \(2\times2\) table, all expected counts should be at least 5.
Command: `chisq.test` with argument `tab`:

```r
test = chisq.test(tab)
test
```
##
## Pearson's Chi-squared test
##
## data: tab
## X-squared = 10, df = 4, p-value = 0.007
Effect size:

  * Cramér’s V: package `DescTools`, command `CramerV`
  * Tschuprow’s T: package `DescTools`, command `TschuprowT`
```r
V = CramerV(tab)
T = TschuprowT(tab)
c(V=V, T=T) %>% kable
```
V | 0.08 |
T | 0.08 |
To check validity, run the test, then check `test$expected`:

```r
c(`table size` = dim(tab),
  `Eij > 1` = all(test$expected > 1),
  `Eij > 5` = all(test$expected > 5),
  `mean(Eij) > 5` = mean(test$expected) > 5,
  V = CramerV(tab))
```
table size | Eij > 1 | Eij > 5 | mean(Eij) > 5 | V |
---|---|---|---|---|
3 x 3 | TRUE | TRUE | TRUE | 0.08 |
Test (Instagram example):
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: tab
## X-squared = 30, df = 1, p-value = 3e-08
Validity and effect size:
table size | Eij > 1 | Eij > 5 | mean(Eij) > 5 | V |
---|---|---|---|---|
2 x 2 | TRUE | TRUE | TRUE | 0.17 |
Test (vaccination example):
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: tab
## X-squared = 20, df = 1, p-value = 9e-07
Validity and effect size:
table size | Eij > 1 | Eij > 5 | mean(Eij) > 5 | V |
---|---|---|---|---|
2 x 2 | TRUE | TRUE | TRUE | 0.12 |
Test (fruit and exercise example):
##
## Pearson's Chi-squared test
##
## data: tab
## X-squared = 10, df = 4, p-value = 0.007
Validity and effect size:
table size | Eij > 1 | Eij > 5 | mean(Eij) > 5 | V |
---|---|---|---|---|
3 x 3 | TRUE | TRUE | TRUE | 0.08 |
Instead of deriving our expected counts from the data, we could just prescribe proportions for each cell.
This is usually good for checking whether the data fits a particular distribution.
The chi-square test for independence generalizes the two-sample proportions test to many samples.
The chi-square test for goodness of fit generalizes the one-sample proportions test to many proportions in a single sample.
Example: The ACT study of relationships between bone growth and calcium intake studied 14 000 adolescents from Arizona, California, Hawai’i, Indiana, Nevada and Ohio. 10% were sampled for a more in-depth analysis of written comments.
Proportions are for the proportion from each state out of the 14 000, while counts are for the 10% sample:
State | AZ | CA | HI | IN | NV | OH |
Count | 167 | 257 | 257 | 297 | 107 | 482 |
Prob | 0.10 | 0.17 | 0.16 | 0.19 | 0.07 | 0.30 |
Question: Are these samples balanced against the original distribution?
We add the expected counts by multiplying each proportion with the total count of 1567:

State | AZ | CA | HI | IN | NV | OH |
Count | 167 | 257 | 257 | 297 | 107 | 482 |
Prob | 0.10 | 0.17 | 0.16 | 0.19 | 0.07 | 0.30 |
Expected | 165 | 270 | 257 | 295 | 110 | 472 |

Now we have expected and observed counts. This is just like our setup for chi-square testing!
Computing the deviation (Diff, expected minus observed) and the \(\chi^2\) contribution for each state:

State | AZ | CA | HI | IN | NV | OH |
Diff | -2.465 | 12.524 | -0.012 | -2.404 | 2.690 | -10.333 |
X2 | 3.6e-02 | 6.1e-01 | 5.6e-07 | 1.9e-02 | 6.8e-02 | 2.2e-01 |
These combine to a chi-square statistic of \(\chi^2=0.96\). To see whether this is large or small, we will need to know the degrees of freedom for the test.
For a Goodness of Fit test, the degrees of freedom is # categories - 1. For the ACT case, this works out to \(6-1 = 5\).
A 95% threshold is given by `qchisq(0.95, df=5)` = 11.07.
With the chi-square statistic of 0.96, we have no reason to believe the sample to be unbalanced.
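The same test is a one-liner in R. A sketch: because the proportions listed above are rounded and sum to 0.99, we ask `chisq.test` to rescale them, which shifts the statistic a little away from the hand-computed 0.96:

```r
counts <- c(AZ = 167, CA = 257, HI = 257, IN = 297, NV = 107, OH = 482)
probs  <- c(0.10, 0.17, 0.16, 0.19, 0.07, 0.30)   # rounded; they sum to 0.99
# rescale.p = TRUE renormalizes the rounded proportions to sum to 1
test <- chisq.test(counts, p = probs, rescale.p = TRUE)
test$statistic   # small: nowhere near the threshold qchisq(0.95, df = 5) = 11.07
```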
Cramér’s V and Tschuprow’s T both work for Goodness of Fit as well.
Making them work is a bit more tricky: need to compile proportions and counts into a single matrix.
```r
c(T = act.df %>% select(Count, Prob) %>% as.matrix() %>% TschuprowT(),
  V = act.df %>% select(Count, Prob) %>% as.matrix() %>% CramerV())
```
## T V
## 0.00042 0.00062
Recall: John Kerrich tossed a coin 10 000 times in a WW2 prison camp, and got 5 067 heads.
Expected: \(E_i = np_i\)
Chi-square: \(\sum (E_i-x_i)^2/E_i\)
95% threshold for \(r\) different categories: `qchisq(0.95, df=r-1)`
For Kerrich’s coin, the expected counts are Heads 5 000 and Tails 5 000, and the 95% threshold for 1 degree of freedom is 3.84.
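A quick check of Kerrich’s coin in R:

```r
obs  <- c(heads = 5067, tails = 4933)
test <- chisq.test(obs, p = c(0.5, 0.5))
test$statistic   # 2 * 67^2 / 5000 = 1.7956, below the threshold 3.84
test$p.value     # about 0.18: no evidence against a fair coin
```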
Pair up, grab an M&M bag.
As of 2017, M&M color distributions were different between factories. In the US, these are Cleveland (CLV) and Hackettstown (HKP). These cities can be found on the packaging. Distributions are:
factory | red | orange | yellow | green | blue | brown |
---|---|---|---|---|---|---|
CLV | 0.13 | 0.20 | 0.14 | 0.20 | 0.21 | 0.12 |
HKP | 0.12 | 0.25 | 0.12 | 0.12 | 0.25 | 0.12 |
For your bag, count the colors, calculate expected counts, chi-square statistic. Check validity and compare the statistic to the 95% threshold value 11.07.
Validity: each \(np_i\) needs to be at least 5.
Chi-square statistic: \(\sum (E_i-O_i)^2/E_i\), where \(E_i = np_i\).
With more precise proportions, and the minimum bag size (`min.n`) for a valid test:

factory | red | orange | yellow | green | blue | brown | min.n |
---|---|---|---|---|---|---|---|
CLV | 0.131 | 0.205 | 0.135 | 0.198 | 0.207 | 0.124 | 41 |
HKP | 0.125 | 0.250 | 0.125 | 0.125 | 0.250 | 0.125 | 41 |
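A sketch of the activity in R, assuming a CLV bag; the color counts here are invented for illustration:

```r
clv <- c(red = 0.131, orange = 0.205, yellow = 0.135,
         green = 0.198, blue = 0.207, brown = 0.124)
bag <- c(red = 7, orange = 12, yellow = 6,
         green = 10, blue = 13, brown = 7)   # hypothetical bag counts
E <- sum(bag) * clv
all(E >= 5)                        # validity: every expected count at least 5
test <- chisq.test(bag, p = clv)
test$statistic                     # compare to qchisq(0.95, df = 5) = 11.07
```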
Input | One-way table `tab` or factor variable `x`, and a vector `p` of null hypothesis probabilities |
---|---|
Null hypothesis | Proportions of the labels in the variable are as given in `p` |
Alternative hypothesis | Proportions of the labels in the variable are not as given in `p` |
Test statistic | \(\chi^2 = \sum (E_{i}-O_{i})^2/E_{i}\) for expected \(E_{i}\) and observed \(O_{i}\) |
This test only has one alternative and has no confidence interval.
Requirements: All expected counts should be at least 5.
Command: `chisq.test` with arguments `tab` and `p`:

```r
test = chisq.test(tab, p=p)
test
```
##
## Chi-squared test for given probabilities
##
## data: tab
## X-squared = 0.93, df = 5, p-value = 1
Command: `chisq.test` with arguments `x` and `p`:

```r
tab = table(x)
test = chisq.test(tab, p=p)
test
```
##
## Chi-squared test for given probabilities
##
## data: tab
## X-squared = 0.93, df = 5, p-value = 1
Effect size:

  * Cramér’s V: package `DescTools`, command `CramerV`
  * Tschuprow’s T: package `DescTools`, command `TschuprowT`

```r
V = CramerV(rbind(tab,p))
T = TschuprowT(rbind(tab,p))
c(V=V, T=T) %>% kable
```
V | 6e-04 |
T | 4e-04 |
To check validity, run the test, then check `test$expected`:

```r
c(`Ei > 5` = all(test$expected > 5))
```
## Ei > 5
## TRUE