Lecture 23

MVJ

12 April 2018

Two-sample and many-sample proportions

Last week, we looked at the two-sample proportions test: given \(x_1\) successes out of \(n_1\) and \(x_2\) successes out of \(n_2\), is \(x_1/n_1\) significantly different from \(x_2/n_2\)?

Another way of phrasing the question is: in the two-way table of successes and failures against population 1 and population 2, do the rows and columns influence each other?

          Population 1   Population 2
Success   \(s_1\)        \(s_2\)
Failure   \(f_1\)        \(f_2\)

This way we can also do an \(n\)-sample proportions test:

          Population 1   Population 2   …   Population \(n\)
Success   \(s_1\)        \(s_2\)        …   \(s_n\)
Failure   \(f_1\)        \(f_2\)        …   \(f_n\)
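As a minimal R sketch (with hypothetical counts), such a two-way table is just a matrix:

tab = matrix(c(120, 90,    # successes in populations 1 and 2
                30, 60),   # failures in populations 1 and 2
             nrow = 2, byrow = TRUE,
             dimnames = list(c("Success", "Failure"),
                             c("Population 1", "Population 2")))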

Two-way table null and alternative

For the two-way table case, the hypotheses to use are:

Null hypothesis There is no association between the row and column variables.

Alternative hypothesis There is some (unspecified) association between the row and column variables.

This can also be stated as:

Null hypothesis The conditional distributions of the rows conditioned on the columns are all equal.

or as

Null hypothesis The conditional distributions of the columns conditioned on the rows are all equal.

Expected cell counts and the multiplication rule

Suppose the row values are \(x_1,\dots,x_r\) and column values are \(y_1,\dots,y_c\).

If the null hypothesis is true, these two are independent: so \(\mathbb{P}(x_i\text{ and }y_j) = \mathbb{P}(x_i)\cdot\mathbb{P}(y_j)\).

Then we can get an expected cell probability for the two-way table from the row and column probabilities. These we can estimate from the row and column counts:

\[ \left( \begin{array}{cccc|c} x_{11} & x_{21} & x_{31} & x_{41} & x_{*1} \\ x_{12} & x_{22} & x_{32} & x_{42} & x_{*2} \\ x_{13} & x_{23} & x_{33} & x_{43} & x_{*3} \\ \hline x_{1*} & x_{2*} & x_{3*} & x_{4*} & n \\ \end{array} \right) \]

Expected cell counts and the multiplication rule

Suppose the row values are \(x_1,\dots,x_r\) and column values are \(y_1,\dots,y_c\).

If the null hypothesis is true, these two are independent: so \(\mathbb{P}(x_i\text{ and }y_j) = \mathbb{P}(x_i)\cdot\mathbb{P}(y_j)\).

Then we can get an expected cell probability for the two-way table from the row and column probabilities. These we can estimate from the row and column counts:

\[ \left( \begin{array}{cccc|c} p_{11} & p_{21} & p_{31} & p_{41} & p_{*1} \\ p_{12} & p_{22} & p_{32} & p_{42} & p_{*2} \\ p_{13} & p_{23} & p_{33} & p_{43} & p_{*3} \\ \hline p_{1*} & p_{2*} & p_{3*} & p_{4*} & 1 \\ \end{array} \right) \]

These are the proportions obtained by dividing every count by \(n\).

Expected cell counts and the multiplication rule

Suppose the row values are \(x_1,\dots,x_r\) and column values are \(y_1,\dots,y_c\).

If the null hypothesis is true, these two are independent: so \(\mathbb{P}(x_i\text{ and }y_j) = \mathbb{P}(x_i)\cdot\mathbb{P}(y_j)\).

Then we can get an expected cell probability for the two-way table from the row and column probabilities. These we can estimate from the row and column counts:

\[ \left( \begin{array}{cccc|c} p_{1*}\cdot p_{*1} & p_{2*}\cdot p_{*1} & p_{3*}\cdot p_{*1} & p_{4*}\cdot p_{*1} & p_{*1} \\ p_{1*}\cdot p_{*2} & p_{2*}\cdot p_{*2} & p_{3*}\cdot p_{*2} & p_{4*}\cdot p_{*2} & p_{*2} \\ p_{1*}\cdot p_{*3} & p_{2*}\cdot p_{*3} & p_{3*}\cdot p_{*3} & p_{4*}\cdot p_{*3} & p_{*3} \\ \hline p_{1*} & p_{2*} & p_{3*} & p_{4*} & 1 \\ \end{array} \right) \]

Under the null hypothesis, each cell proportion is the product of its row and column proportions.

Expected cell counts and the multiplication rule

Suppose the row values are \(x_1,\dots,x_r\) and column values are \(y_1,\dots,y_c\).

If the null hypothesis is true, these two are independent: so \(\mathbb{P}(x_i\text{ and }y_j) = \mathbb{P}(x_i)\cdot\mathbb{P}(y_j)\).

Then we can get an expected cell probability for the two-way table from the row and column probabilities. These we can estimate from the row and column counts:

\[ \left( \begin{array}{cccc|c} p_{1*}\cdot p_{*1} & p_{2*}\cdot p_{*1} & p_{3*}\cdot p_{*1} & p_{4*}\cdot p_{*1} & p_{*1} \\ p_{1*}\cdot p_{*2} & p_{2*}\cdot p_{*2} & p_{3*}\cdot p_{*2} & p_{4*}\cdot p_{*2} & p_{*2} \\ p_{1*}\cdot p_{*3} & p_{2*}\cdot p_{*3} & p_{3*}\cdot p_{*3} & p_{4*}\cdot p_{*3} & p_{*3} \\ \hline p_{1*} & p_{2*} & p_{3*} & p_{4*} & 1 \\ \end{array} \right) \]

The expected count in each cell is the expected proportion multiplied by the total number of observations:

\[ E_{ij} = p_{i*}\cdot p_{*j}\cdot n = \frac{x_{i*}}{n}\cdot \frac{x_{*j}}{\color{red}{n}}\cdot \color{red}{n} = \frac{x_{i*}\cdot x_{*j}}{n} \]
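A minimal R sketch of this formula, using the Instagram table that appears later in the lecture:

tab = rbind(Men   = c(No = 298, Yes = 234),
            Women = c(No = 209, Yes = 328))
E = outer(rowSums(tab), colSums(tab)) / sum(tab)   # E[i,j] = row total * column total / n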

Expected counts and testing

Now that we have an expected count in each cell, we can build a statistical test from this.

The deviation in each cell is the difference between the observed count and the expected count.

Having a large deviation in a cell could be because

  1. the row and column variables really are associated, or
  2. the expected count is large, so even random fluctuations produce large absolute differences.

We can normalize to control for 2. by dividing the squared deviation by the expected count. As it turns out, the sum of these normalized squared deviations approximately follows a known distribution:

\[ X^2 = \sum\frac{(x_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2((r-1)(c-1)) \]
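Continuing the sketch above (reusing tab and E from the expected-counts sketch), the statistic and its degrees of freedom are:

X2 = sum((tab - E)^2 / E)
df = (nrow(tab) - 1) * (ncol(tab) - 1)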

Chi-square distribution

R provides rchisq, dchisq, pchisq and qchisq, each taking a df argument for the degrees of freedom.
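For example, the 95% critical value and an upper-tail p-value at 4 degrees of freedom (14.15 is the student-health statistic from later in the lecture):

qchisq(0.95, df = 4)                        # critical value: 9.49
pchisq(14.15, df = 4, lower.tail = FALSE)   # p-value: about 0.007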

(No) Confidence intervals, no two-tailed tests

We could use the \(\chi^2\) distribution to build a confidence interval for \(X^2\), but \(X^2\) itself is not a useful quantity to estimate.

Only a large deviation from the expected counts is worth taking note of, so only upper-tailed (greater) testing is relevant for \(\chi^2\) tests.

\(\chi^2\) is what R uses for two-sample tests

Note that the 2-sample test reports X-squared and df.
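Judging from the data line in the output, this was presumably produced by a call like:

prop.test(c(328, 234), c(537, 532))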

## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  c(328, 234) out of c(537, 532)
## X-squared = 30, df = 1, p-value = 3e-08
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.11 0.23
## sample estimates:
## prop 1 prop 2 
##   0.61   0.44

Effect sizes for \(\chi^2\)

Tschuprow’s T: \(T = \sqrt{\frac{\chi^2}{n\sqrt{df}}}\)

Cramér’s V: \(V = \sqrt{\frac{\chi^2}{n\min(r-1, c-1)}}\)

Both take values between 0 and 1. Both are available in the package DescTools. They are categorical versions of \(r^2\) (correlation squared).

Magnitude Cramér’s V
Small \(0 < V < 0.2\)
Medium \(0.2 < V < 0.4\)
Large \(0.4 < V\)

(Cohen 1988: Statistical Power Analysis for the Behavioral Sciences)
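As a quick sketch, plugging the Instagram example below (\(\chi^2 = 31.32\), \(n = 1069\), a \(2\times2\) table, so \(df = 1\) and \(\min(r-1, c-1) = 1\)) into these formulas:

X2 = 31.32; n = 1069; df = 1
sqrt(X2 / (n * sqrt(df)))    # Tschuprow's T: about 0.17
sqrt(X2 / (n * 1))           # Cramer's V: about 0.17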

Let’s do it - Instagram

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\)

95% threshold: qchisq(0.95, df=df)

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

Do you use Instagram?

No Yes
Men 298 234
Women 209 328

Let’s do it - Instagram

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\) : 1

95% threshold: 3.84

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

Do you use Instagram?

No Yes
Men 298 234
Women 209 328

Let’s do it - Instagram

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\) : 1

95% threshold: 3.84

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

Do you use Instagram?

No Yes No.Exp Yes.Exp No.Diff Yes.Diff No.X2 Yes.X2
Men 298 234 252 280 46 -46 8.3 7.5
Women 209 328 255 282 -46 46 8.2 7.4

\(\chi^2 = 31.32\) and \(T = 0.17\)
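The whole computation as one R sketch:

tab = rbind(Men   = c(No = 298, Yes = 234),
            Women = c(No = 209, Yes = 328))
n  = sum(tab)
E  = outer(rowSums(tab), colSums(tab)) / n    # expected counts
X2 = sum((tab - E)^2 / E)                     # 31.32
df = (nrow(tab) - 1) * (ncol(tab) - 1)        # 1
X2 > qchisq(0.95, df = df)                    # TRUE: exceeds the 3.84 threshold
sqrt(X2 / (n * sqrt(df)))                     # Tschuprow's T: 0.17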

Let’s do it - Vaccine vs Party

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\)

95% threshold: qchisq(0.95, df=df)

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

Should all children be required to vaccinate?

No Yes
Democratic 230 729
Republican 258 479

Let’s do it - Vaccine vs Party

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\) : 1

95% threshold: 3.84

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

Should all children be required to vaccinate?

No Yes
Democratic 230 729
Republican 258 479

Let’s do it - Vaccine vs Party

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\) : 1

95% threshold: 3.84

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

Should all children be required to vaccinate?

No Yes No.Exp Yes.Exp No.Diff Yes.Diff No.X2 Yes.X2
Democratic 230 729 276 683 -46 46 7.7 3.1
Republican 258 479 212 525 46 -46 9.9 4.0

\(\chi^2 = 24.71\) and \(T = 0.12\)

Let’s do it - Student health

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\)

95% threshold: qchisq(0.95, df=df)

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

How much fruit do you eat - and how much do you exercise?

Lo Mid Hi
Low 69 25 14
Moderate 206 126 111
Vigorous 294 170 169

Let’s do it - Student health

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\) : 4

95% threshold: 9.49

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

How much fruit do you eat - and how much do you exercise?

Lo Mid Hi
Low 69 25 14
Moderate 206 126 111
Vigorous 294 170 169

Let’s do it - Student health

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\) : 4

95% threshold: 9.49

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

How much fruit do you eat - and how much do you exercise?

Lo Mid Hi Lo.E Mid.E Hi.E Lo.D Mid.D Hi.D Lo.X2 Mid.X2 Hi.X2
Low 69 25 14 52 29 27 17.1 -4.3 -13 5.63 0.63 6.13
Moderate 206 126 111 213 120 110 -6.9 5.9 1 0.22 0.29 0.01
Vigorous 294 170 169 304 172 157 -10.2 -1.6 12 0.34 0.02 0.89

\(\chi^2 = 14.15\) and \(T = 0.08\)

Chi-squared test of independence

Input Two-way table tab or paired factor variables x and y
Null hypothesis Rows and columns are independent
Alternative hypothesis Rows and columns are not independent
Test statistic \(\chi^2 = \sum (E_{ij}-O_{ij})^2/E_{ij}\) for expected \(E_{ij}\) and observed \(O_{ij}\).

This test only has one alternative and has no confidence interval.

Requirements All expected counts should be at least 1, and the average expected count should be at least 5.

For a \(2\times2\) table, all expected counts should be at least 5.

Chi-squared test of independence

Command: chisq.test with argument tab

test = chisq.test(tab)
test
## 
##  Pearson's Chi-squared test
## 
## data:  tab
## X-squared = 10, df = 4, p-value = 0.007

Chi-squared test of independence

Effect size:

library(DescTools)      # provides CramerV and TschuprowT
V = CramerV(tab)
T = TschuprowT(tab)
c(V=V, T=T) %>% kable   # %>% from magrittr, kable from knitr
V 0.08
T 0.08

Chi-squared test of independence

To check validity, run the test, then check test$expected:

c(`table size` = paste(dim(tab), collapse = " x "),
  `Eij > 1` = all(test$expected > 1),
  `Eij > 5` = all(test$expected > 5),
  `mean(Eij) > 5` = mean(test$expected) > 5,
  V = CramerV(tab))
table size Eij > 1 Eij > 5 mean(Eij) > 5 V
3 x 3 TRUE TRUE TRUE 0.08

Return to Instagram

Test:

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tab
## X-squared = 30, df = 1, p-value = 3e-08

Validity and effect size:

table size Eij > 1 Eij > 5 mean(Eij) > 5 V
2 x 2 TRUE TRUE TRUE 0.17

Return to Vaccine vs Party

Test:

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tab
## X-squared = 20, df = 1, p-value = 9e-07

Validity and effect size:

table size Eij > 1 Eij > 5 mean(Eij) > 5 V
2 x 2 TRUE TRUE TRUE 0.12

Return to Student Health

Test:

## 
##  Pearson's Chi-squared test
## 
## data:  tab
## X-squared = 10, df = 4, p-value = 0.007

Validity and effect size:

table size Eij > 1 Eij > 5 mean(Eij) > 5 V
3 x 3 TRUE TRUE TRUE 0.08

Goodness of Fit

Instead of deriving our expected counts from the data, we could just prescribe proportions for each cell.

This is usually good for checking whether the data fits a particular distribution.

The chi-square test for independence generalizes the two-sample proportions test to many samples.

The chi-square test for goodness of fit generalizes the one-sample proportions test to many proportions in a single sample.

Goodness of Fit

Example: The ACT study of relationships between bone growth and calcium intake. It studied 14 000 adolescents from Arizona, California, Hawai’i, Indiana, Nevada and Ohio; 10% were sampled for a more in-depth analysis of written comments.

Proportions are for the proportion from each state out of the 14 000, while counts are for the 10% sample:

State AZ CA HI IN NV OH
Count 167 257 257 297 107 482
Prob 0.10 0.17 0.16 0.19 0.07 0.30

Question: Are these samples balanced against the original distribution?

Goodness of Fit

Example: The ACT study of relationships between bone growth and calcium intake. It studied 14 000 adolescents from Arizona, California, Hawai’i, Indiana, Nevada and Ohio; 10% were sampled for a more in-depth analysis of written comments.

Proportions are for the proportion from each state out of the 14 000, while counts are for the 10% sample:

State AZ CA HI IN NV OH
Count 167 257 257 297 107 482
Prob 0.10 0.17 0.16 0.19 0.07 0.30
Expected 165 270 257 295 110 472

Question: Are these samples balanced against the original distribution?

We get the expected counts by multiplying the (unrounded) proportions by the total count of 1567.

Now we have expected and observed counts. This is just like our setup for chi-square testing!

Goodness of Fit

Example: The ACT study of relationships between bone growth and calcium intake. It studied 14 000 adolescents from Arizona, California, Hawai’i, Indiana, Nevada and Ohio; 10% were sampled for a more in-depth analysis of written comments.

Proportions are for the proportion from each state out of the 14 000, while counts are for the 10% sample:

State AZ CA HI IN NV OH
Count 167 257 257 297 107 482
Prob 0.10 0.17 0.16 0.19 0.07 0.30
Expected 165 270 257 295 110 472
Diff -2.465 12.524 -0.012 -2.404 2.690 -10.333
X2 3.6e-02 6.1e-01 5.6e-07 1.9e-02 6.8e-02 2.2e-01

These combine to a chi-square statistic of \(\chi^2=0.96\). To see whether this is large or small, we will need to know the degrees of freedom for the test.

Degrees of Freedom for Goodness of Fit

For a Goodness of Fit test, the degrees of freedom are the number of categories minus 1. For the ACT case, this works out to \(6-1 = 5\).

A 95% threshold is given by qchisq(0.95, df=5) = 11.07

With a chi-square statistic of 0.96, far below this threshold, we have no reason to believe the sample is unbalanced.
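A sketch of this test in R: since the proportions printed in the table are rounded (they sum to 0.99), the rescale.p argument is needed, and the statistic comes out close to, but not exactly, the 0.96 above:

counts = c(AZ = 167, CA = 257, HI = 257, IN = 297, NV = 107, OH = 482)
probs  = c(0.10, 0.17, 0.16, 0.19, 0.07, 0.30)   # rounded proportions
chisq.test(counts, p = probs, rescale.p = TRUE)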

Effect size

Cramér’s V and Tschuprow’s T both work for Goodness of Fit as well.

Making them work is a bit trickier: we need to combine the counts and proportions into a single matrix.

library(DescTools)   # TschuprowT, CramerV
library(dplyr)       # select and the %>% pipe
c(T = act.df %>% select(Count, Prob) %>% as.matrix() %>% TschuprowT(),
  V = act.df %>% select(Count, Prob) %>% as.matrix() %>% CramerV())
##       T       V 
## 0.00042 0.00062

Let’s do it - Kerrich’s coins

Recall: John Kerrich tossed a coin 10 000 times in a WW2 prison camp, and got 5 067 heads.

Expected: \(E_i = np_i\)

Chi-square: \(\sum (E_i-x_i)^2/E_i\)

95% threshold for r different categories is qchisq(0.95, df=r-1)

Let’s do it - Kerrich’s coins

Recall: John Kerrich tossed a coin 10 000 times in a WW2 prison camp, and got 5 067 heads.

Expected: \(E_i = np_i\) : Heads 5 000, Tails 5 000

Chi-square: \(\sum (E_i-x_i)^2/E_i\)

95% threshold for 1 degree of freedom is 3.84
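A sketch of the computation:

obs = c(Heads = 5067, Tails = 4933)
E   = 10000 * c(0.5, 0.5)        # expected: 5 000 each
X2  = sum((E - obs)^2 / E)       # 1.796
X2 > qchisq(0.95, df = 1)        # FALSE: consistent with a fair coin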

Let’s do it - M&M distribution

Pair up, grab an M&M bag.

As of 2017, M&M color distributions were different between factories. In the US, these are Cleveland (CLV) and Hackettstown (HKP). These cities can be found on the packaging. Distributions are:

factory red orange yellow green blue brown
CLV 0.13 0.20 0.14 0.20 0.21 0.12
HKP 0.12 0.25 0.12 0.12 0.25 0.12

For your bag, count the colors, calculate expected counts, chi-square statistic. Check validity and compare the statistic to the 95% threshold value 11.07.

Let’s do it - M&M distribution

Validity: each \(np_i\) needs to be at least 5.

Chi-square statistic: \(\sum (E_i-O_i)^2/E_i\), where \(E_i = np_i\).

factory red orange yellow green blue brown
CLV 0.13 0.20 0.14 0.20 0.21 0.12
HKP 0.12 0.25 0.12 0.12 0.25 0.12

Count colors, calculate expected counts, chi-square statistic. Check validity and compare the statistic to 11.07.

Let’s do it - M&M distribution

Validity: each \(np_i\) needs to be at least 5.

Chi-square statistic: \(\sum (E_i-O_i)^2/E_i\), where \(E_i = np_i\).

factory red orange yellow green blue brown min.n
CLV 0.131 0.205 0.135 0.198 0.207 0.124 41
HKP 0.125 0.250 0.125 0.125 0.250 0.125 41

Count colors, calculate expected counts, chi-square statistic. Check validity and compare the statistic to 11.0705.
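A sketch for a hypothetical bag (the counts here are made up; substitute your own):

obs = c(red = 8, orange = 12, yellow = 7, green = 11, blue = 13, brown = 6)
p   = c(0.131, 0.205, 0.135, 0.198, 0.207, 0.124)   # CLV distribution
E   = sum(obs) * p
all(E >= 5)                                   # validity check
sum((E - obs)^2 / E) > qchisq(0.95, df = 5)   # compare to the 11.07 threshold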

Chi-squared test of Goodness of Fit

Input One-way table tab or factor variable x, vector of null hypothesis probabilities p.
Null hypothesis Proportions of the labels in the variable are as given in p
Alternative hypothesis Proportions of the labels in the variable are not as given in p
Test statistic \(\chi^2 = \sum (E_{i}-O_{i})^2/E_{i}\) for expected \(E_{i}\) and observed \(O_{i}\).

This test only has one alternative and has no confidence interval.

Requirements All expected counts should be at least 5.

Chi-squared test of Goodness of Fit

Command: chisq.test with arguments tab and p

test = chisq.test(tab, p=p)
test
## 
##  Chi-squared test for given probabilities
## 
## data:  tab
## X-squared = 0.93, df = 5, p-value = 1

Chi-squared test of Goodness of Fit

Command: chisq.test with arguments x and p

tab = table(x)
test = chisq.test(tab, p=p)
test
## 
##  Chi-squared test for given probabilities
## 
## data:  tab
## X-squared = 0.93, df = 5, p-value = 1

Chi-squared test of Goodness of Fit

Effect size:

V = CramerV(rbind(tab,p))
T = TschuprowT(rbind(tab,p))
c(V=V, T=T) %>% kable
V 6e-04
T 4e-04

Chi-squared test of Goodness of Fit

To check validity, run the test, then check test$expected:

c(`Ei > 5` = all(test$expected > 5))
## Ei > 5 
##   TRUE