Lecture 23

MVJ

12 April 2018

Two-sample and many-sample proportions

Last week, we looked at the two-sample proportions test: given \(x_1\) successes out of \(n_1\) and \(x_2\) successes out of \(n_2\), is \(x_1/n_1\) significantly different from \(x_2/n_2\)?

Another way of phrasing the question is: in the two-way table of successes and failures against population 1 and population 2, do the rows and columns influence each other?

          Population 1   Population 2
Success   \(s_1\)        \(s_2\)
Failure   \(f_1\)        \(f_2\)

This way we can also do an \(n\)-sample proportions test:

          Population 1   Population 2   …   Population \(n\)
Success   \(s_1\)        \(s_2\)        …   \(s_n\)
Failure   \(f_1\)        \(f_2\)        …   \(f_n\)
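As a minimal R sketch (with hypothetical counts), such a two-way table is just a matrix:

tab = matrix(c(120, 90,    # successes in populations 1 and 2
                30, 60),   # failures in populations 1 and 2
             nrow = 2, byrow = TRUE,
             dimnames = list(c("Success", "Failure"),
                             c("Population 1", "Population 2")))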

Two-way table null and alternative

For the two-way table case, the hypotheses to use are:

Null hypothesis There is no association between the row and column variables.

Alternative hypothesis There is some (unspecified) association between the row and column variables.

This can also be stated as:

Null hypothesis The conditional distributions of the rows conditioned on the columns are all equal.

or as

Null hypothesis The conditional distributions of the columns conditioned on the rows are all equal.

Expected cell counts and the multiplication rule

Suppose the row values are \(x_1,\dots,x_r\) and column values are \(y_1,\dots,y_c\).

If the null hypothesis is true, these two are independent: so \(\mathbb{P}(x_i\text{ and }y_j) = \mathbb{P}(x_i)\cdot\mathbb{P}(y_j)\).

Then we can get an expected cell probability for the two-way table from the row and column probabilities. These we can estimate from the row and column counts:

\[ \left( \begin{array}{cccc|c} x_{11} & x_{21} & x_{31} & x_{41} & x_{*1} \\ x_{12} & x_{22} & x_{32} & x_{42} & x_{*2} \\ x_{13} & x_{23} & x_{33} & x_{43} & x_{*3} \\ \hline x_{1*} & x_{2*} & x_{3*} & x_{4*} & n \\ \end{array} \right) \]

Expected cell counts and the multiplication rule

Suppose the row values are \(x_1,\dots,x_r\) and column values are \(y_1,\dots,y_c\).

If the null hypothesis is true, these two are independent: so \(\mathbb{P}(x_i\text{ and }y_j) = \mathbb{P}(x_i)\cdot\mathbb{P}(y_j)\).

Then we can get an expected cell probability for the two-way table from the row and column probabilities. These we can estimate from the row and column counts:

\[ \left( \begin{array}{cccc|c} p_{11} & p_{21} & p_{31} & p_{41} & p_{*1} \\ p_{12} & p_{22} & p_{32} & p_{42} & p_{*2} \\ p_{13} & p_{23} & p_{33} & p_{43} & p_{*3} \\ \hline p_{1*} & p_{2*} & p_{3*} & p_{4*} & 1 \\ \end{array} \right) \]

These are the proportions obtained by dividing every count by \(n\).

Expected cell counts and the multiplication rule

Suppose the row values are \(x_1,\dots,x_r\) and column values are \(y_1,\dots,y_c\).

If the null hypothesis is true, these two are independent: so \(\mathbb{P}(x_i\text{ and }y_j) = \mathbb{P}(x_i)\cdot\mathbb{P}(y_j)\).

Then we can get an expected cell probability for the two-way table from the row and column probabilities. These we can estimate from the row and column counts:

\[ \left( \begin{array}{cccc|c} p_{1*}\cdot p_{*1} & p_{2*}\cdot p_{*1} & p_{3*}\cdot p_{*1} & p_{4*}\cdot p_{*1} & p_{*1} \\ p_{1*}\cdot p_{*2} & p_{2*}\cdot p_{*2} & p_{3*}\cdot p_{*2} & p_{4*}\cdot p_{*2} & p_{*2} \\ p_{1*}\cdot p_{*3} & p_{2*}\cdot p_{*3} & p_{3*}\cdot p_{*3} & p_{4*}\cdot p_{*3} & p_{*3} \\ \hline p_{1*} & p_{2*} & p_{3*} & p_{4*} & 1 \\ \end{array} \right) \]

Under the null hypothesis, each cell proportion is the product of its row and column proportions.

Expected cell counts and the multiplication rule

Suppose the row values are \(x_1,\dots,x_r\) and column values are \(y_1,\dots,y_c\).

If the null hypothesis is true, these two are independent: so \(\mathbb{P}(x_i\text{ and }y_j) = \mathbb{P}(x_i)\cdot\mathbb{P}(y_j)\).

Then we can get an expected cell probability for the two-way table from the row and column probabilities. These we can estimate from the row and column counts:

\[ \left( \begin{array}{cccc|c} p_{1*}\cdot p_{*1} & p_{2*}\cdot p_{*1} & p_{3*}\cdot p_{*1} & p_{4*}\cdot p_{*1} & p_{*1} \\ p_{1*}\cdot p_{*2} & p_{2*}\cdot p_{*2} & p_{3*}\cdot p_{*2} & p_{4*}\cdot p_{*2} & p_{*2} \\ p_{1*}\cdot p_{*3} & p_{2*}\cdot p_{*3} & p_{3*}\cdot p_{*3} & p_{4*}\cdot p_{*3} & p_{*3} \\ \hline p_{1*} & p_{2*} & p_{3*} & p_{4*} & 1 \\ \end{array} \right) \]

The expected count in each cell is the expected proportion multiplied by the total number of observations:

\[ E_{ij} = p_{i*}\cdot p_{*j}\cdot n = \frac{x_{i*}}{n}\cdot \frac{x_{*j}}{\color{red}{n}}\cdot \color{red}{n} = \frac{x_{i*}\cdot x_{*j}}{n} \]
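A minimal R sketch of this formula, using the Instagram table that appears later in the lecture:

tab = rbind(Men   = c(No = 298, Yes = 234),
            Women = c(No = 209, Yes = 328))
E = outer(rowSums(tab), colSums(tab)) / sum(tab)   # E[i,j] = row total * column total / n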

Expected counts and testing

Now that we have an expected count in each cell, we can build a statistical test from this.

The deviation in each cell is the difference between the observed count and the expected count.

Having a large deviation in a cell could be because

  1. the row and column variables really are associated, or
  2. the expected count is large, so even random fluctuations produce large absolute differences.

We can normalize to control for 2. by dividing the squared deviation by the expected count. As it turns out, the sum of these normalized squared deviations approximately follows a known distribution:

\[ X^2 = \sum\frac{(x_{ij} - E_{ij})^2}{E_{ij}} \sim \chi^2((r-1)(c-1)) \]
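Continuing the sketch above (reusing tab and E from the expected-counts sketch), the statistic and its degrees of freedom are:

X2 = sum((tab - E)^2 / E)
df = (nrow(tab) - 1) * (ncol(tab) - 1)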

Chi-square distribution

R provides rchisq, dchisq, pchisq and qchisq, each taking a df argument for the degrees of freedom.
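For example, the 95% critical value and an upper-tail p-value at 4 degrees of freedom (14.15 is the student-health statistic from later in the lecture):

qchisq(0.95, df = 4)                        # critical value: 9.49
pchisq(14.15, df = 4, lower.tail = FALSE)   # p-value: about 0.007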

(No) Confidence intervals, no two-tailed tests

We could use the \(\chi^2\) distribution to build a confidence interval for \(X^2\), but \(X^2\) itself is not a useful quantity to estimate.

Only a large deviation from the expected counts is worth taking note of, so only upper-tailed (greater) testing is relevant for \(\chi^2\) tests.

\(\chi^2\) is what R uses for two-sample tests

Note that the 2-sample test reports X-squared and df.
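Judging from the data line in the output, this was presumably produced by a call like:

prop.test(c(328, 234), c(537, 532))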

## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  c(328, 234) out of c(537, 532)
## X-squared = 30, df = 1, p-value = 3e-08
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.11 0.23
## sample estimates:
## prop 1 prop 2 
##   0.61   0.44

Effect sizes for \(\chi^2\)

Tschuprow’s T: \(T = \sqrt{\frac{\chi^2}{n\sqrt{df}}}\)

Cramér’s V: \(V = \sqrt{\frac{\chi^2}{n\min(r-1, c-1)}}\)

Both take values between 0 and 1. Both are available in the package DescTools. They are categorical versions of \(r^2\) (correlation squared).

Magnitude Cramér’s V
Small \(0 < V < 0.2\)
Medium \(0.2 < V < 0.4\)
Large \(0.4 < V\)

(Cohen 1988: Statistical Power Analysis for the Behavioral Sciences)
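As a quick sketch, plugging the Instagram example below (\(\chi^2 = 31.32\), \(n = 1069\), a \(2\times2\) table, so \(df = 1\) and \(\min(r-1, c-1) = 1\)) into these formulas:

X2 = 31.32; n = 1069; df = 1
sqrt(X2 / (n * sqrt(df)))    # Tschuprow's T: about 0.17
sqrt(X2 / (n * 1))           # Cramer's V: about 0.17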

Let’s do it - Instagram

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\)

95% threshold: qchisq(0.95, df=df)

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

Do you use Instagram?

No Yes
Men 298 234
Women 209 328

Let’s do it - Instagram

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\) : 1

95% threshold: 3.84

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

Do you use Instagram?

No Yes
Men 298 234
Women 209 328

Let’s do it - Instagram

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\) : 1

95% threshold: 3.84

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

Do you use Instagram?

No Yes No.Exp Yes.Exp No.Diff Yes.Diff No.X2 Yes.X2
Men 298 234 252 280 46 -46 8.3 7.5
Women 209 328 255 282 -46 46 8.2 7.4

\(\chi^2 = 31.32\) and \(T = 0.17\)
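The whole computation as one R sketch:

tab = rbind(Men   = c(No = 298, Yes = 234),
            Women = c(No = 209, Yes = 328))
n  = sum(tab)
E  = outer(rowSums(tab), colSums(tab)) / n    # expected counts
X2 = sum((tab - E)^2 / E)                     # 31.32
df = (nrow(tab) - 1) * (ncol(tab) - 1)        # 1
X2 > qchisq(0.95, df = df)                    # TRUE: exceeds the 3.84 threshold
sqrt(X2 / (n * sqrt(df)))                     # Tschuprow's T: 0.17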

Let’s do it - Vaccine vs Party

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\)

95% threshold: qchisq(0.95, df=df)

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

Should all children be required to vaccinate?

No Yes
Democratic 230 729
Republican 258 479

Let’s do it - Vaccine vs Party

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\) : 1

95% threshold: 3.84

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

Should all children be required to vaccinate?

No Yes
Democratic 230 729
Republican 258 479

Let’s do it - Vaccine vs Party

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\) : 1

95% threshold: 3.84

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

Should all children be required to vaccinate?

No Yes No.Exp Yes.Exp No.Diff Yes.Diff No.X2 Yes.X2
Democratic 230 729 276 683 -46 46 7.7 3.1
Republican 258 479 212 525 46 -46 9.9 4.0

\(\chi^2 = 24.71\) and \(T = 0.12\)

Let’s do it - Student health

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\)

95% threshold: qchisq(0.95, df=df)

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

How much fruit do you eat - and how much do you exercise?

Lo Mid Hi
Low 69 25 14
Moderate 206 126 111
Vigorous 294 170 169

Let’s do it - Student health

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\) : 4

95% threshold: 9.49

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

How much fruit do you eat - and how much do you exercise?

Lo Mid Hi
Low 69 25 14
Moderate 206 126 111
Vigorous 294 170 169

Let’s do it - Student health

Expected values \(E_{ij} = r_i\cdot c_j/n\) : row sum multiplied by column sum divided by \(n\)

\(\chi^2 = \sum (E_{ij} - x_{ij})^2/E_{ij}\) : add up (expected minus observed) squared divided by expected.

Degrees of Freedom: \((n_r-1)\cdot(n_c-1)\) : 4

95% threshold: 9.49

Tschuprow’s T: \(\sqrt{\chi^2/{(n\sqrt{df})}}\)

How much fruit do you eat - and how much do you exercise?

Lo Mid Hi Lo.E Mid.E Hi.E Lo.D Mid.D Hi.D Lo.X2 Mid.X2 Hi.X2
Low 69 25 14 52 29 27 17.1 -4.3 -13 5.63 0.63 6.13
Moderate 206 126 111 213 120 110 -6.9 5.9 1 0.22 0.29 0.01
Vigorous 294 170 169 304 172 157 -10.2 -1.6 12 0.34 0.02 0.89

\(\chi^2 = 14.15\) and \(T = 0.08\)

Chi-squared test of independence

Input Two-way table tab or paired factor variables x and y
Null hypothesis Rows and columns are independent
Alternative hypothesis Rows and columns are not independent
Test statistic \(\chi^2 = \sum (E_{ij}-O_{ij})^2/E_{ij}\) for expected \(E_{ij}\) and observed \(O_{ij}\).

This test only has one alternative and has no confidence interval.

Requirements All expected counts should be at least 1, and the average expected count should be at least 5.

For a \(2\times2\) table, all expected counts should be at least 5.

Chi-squared test of independence

Command: chisq.test with argument tab

test = chisq.test(tab)
test
## 
##  Pearson's Chi-squared test
## 
## data:  tab
## X-squared = 10, df = 4, p-value = 0.007

Chi-squared test of independence

Effect size:

library(DescTools)      # provides CramerV and TschuprowT
V = CramerV(tab)
T = TschuprowT(tab)
c(V=V, T=T) %>% kable   # %>% from magrittr, kable from knitr
V 0.08
T 0.08

Chi-squared test of independence

To check validity, run the test, then check test$expected:

c(`table size` = paste(dim(tab), collapse = " x "),
  `Eij > 1` = all(test$expected > 1),
  `Eij > 5` = all(test$expected > 5),
  `mean(Eij) > 5` = mean(test$expected) > 5,
  V = CramerV(tab))
table size Eij > 1 Eij > 5 mean(Eij) > 5 V
3 x 3 TRUE TRUE TRUE 0.08

Return to Instagram

Test:

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tab
## X-squared = 30, df = 1, p-value = 3e-08

Validity and effect size:

table size Eij > 1 Eij > 5 mean(Eij) > 5 V
2 x 2 TRUE TRUE TRUE 0.17

Return to Vaccine vs Party

Test:

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  tab
## X-squared = 20, df = 1, p-value = 9e-07

Validity and effect size:

table size Eij > 1 Eij > 5 mean(Eij) > 5 V
2 x 2 TRUE TRUE TRUE 0.12

Return to Student Health

Test:

## 
##  Pearson's Chi-squared test
## 
## data:  tab
## X-squared = 10, df = 4, p-value = 0.007

Validity and effect size:

table size Eij > 1 Eij > 5 mean(Eij) > 5 V
3 x 3 TRUE TRUE TRUE 0.08

Goodness of Fit

Instead of deriving our expected counts from the data, we could just prescribe proportions for each cell.

This is usually good for checking whether the data fits a particular distribution.

The chi-square test for independence generalizes the two-sample proportions test to many samples.

The chi-square test for goodness of fit generalizes the one-sample proportions test to many proportions in a single sample.

Goodness of Fit

Example: The ACT study of relationships between bone growth and calcium intake. It studied 14 000 adolescents from Arizona, California, Hawai’i, Indiana, Nevada and Ohio; 10% were sampled for a more in-depth analysis of written comments.

Proportions are for the proportion from each state out of the 14 000, while counts are for the 10% sample:

State AZ CA HI IN NV OH
Count 167 257 257 297 107 482
Prob 0.10 0.17 0.16 0.19 0.07 0.30

Question: Are these samples balanced against the original distribution?

Goodness of Fit

Example: The ACT study of relationships between bone growth and calcium intake. It studied 14 000 adolescents from Arizona, California, Hawai’i, Indiana, Nevada and Ohio; 10% were sampled for a more in-depth analysis of written comments.

Proportions are for the proportion from each state out of the 14 000, while counts are for the 10% sample:

State AZ CA HI IN NV OH
Count 167 257 257 297 107 482
Prob 0.10 0.17 0.16 0.19 0.07 0.30
Expected 165 270 257 295 110 472

Question: Are these samples balanced against the original distribution?

We get the expected counts by multiplying the (unrounded) proportions by the total count of 1567.

Now we have expected and observed counts. This is just like our setup for chi-square testing!

Goodness of Fit

Example: The ACT study of relationships between bone growth and calcium intake. It studied 14 000 adolescents from Arizona, California, Hawai’i, Indiana, Nevada and Ohio; 10% were sampled for a more in-depth analysis of written comments.

Proportions are for the proportion from each state out of the 14 000, while counts are for the 10% sample:

State AZ CA HI IN NV OH
Count 167 257 257 297 107 482
Prob 0.10 0.17 0.16 0.19 0.07 0.30
Expected 165 270 257 295 110 472
Diff -2.465 12.524 -0.012 -2.404 2.690 -10.333
X2 3.6e-02 6.1e-01 5.6e-07 1.9e-02 6.8e-02 2.2e-01

These combine to a chi-square statistic of \(\chi^2=0.96\). To see whether this is large or small, we will need to know the degrees of freedom for the test.

Degrees of Freedom for Goodness of Fit

For a Goodness of Fit test, the degrees of freedom are the number of categories minus 1. For the ACT case, this works out to \(6-1 = 5\).

A 95% threshold is given by qchisq(0.95, df=5) = 11.07

With a chi-square statistic of 0.96, far below this threshold, we have no reason to believe the sample is unbalanced.
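A sketch of this test in R: since the proportions printed in the table are rounded (they sum to 0.99), the rescale.p argument is needed, and the statistic comes out close to, but not exactly, the 0.96 above:

counts = c(AZ = 167, CA = 257, HI = 257, IN = 297, NV = 107, OH = 482)
probs  = c(0.10, 0.17, 0.16, 0.19, 0.07, 0.30)   # rounded proportions
chisq.test(counts, p = probs, rescale.p = TRUE)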

Effect size

Cramér’s V and Tschuprow’s T both work for Goodness of Fit as well.

Making them work is a bit trickier: we need to combine the counts and proportions into a single matrix.

library(DescTools)   # TschuprowT, CramerV
library(dplyr)       # select and the %>% pipe
c(T = act.df %>% select(Count, Prob) %>% as.matrix() %>% TschuprowT(),
  V = act.df %>% select(Count, Prob) %>% as.matrix() %>% CramerV())
##       T       V 
## 0.00042 0.00062

Let’s do it - Kerrich’s coins

Recall: John Kerrich tossed a coin 10 000 times in a WW2 prison camp, and got 5 067 heads.

Expected: \(E_i = np_i\)

Chi-square: \(\sum (E_i-x_i)^2/E_i\)

95% threshold for r different categories is qchisq(0.95, df=r-1)

Let’s do it - Kerrich’s coins

Recall: John Kerrich tossed a coin 10 000 times in a WW2 prison camp, and got 5 067 heads.

Expected: \(E_i = np_i\) : Heads 5 000, Tails 5 000

Chi-square: \(\sum (E_i-x_i)^2/E_i\)

95% threshold for 1 degree of freedom is 3.84
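A sketch of the computation:

obs = c(Heads = 5067, Tails = 4933)
E   = 10000 * c(0.5, 0.5)        # expected: 5 000 each
X2  = sum((E - obs)^2 / E)       # 1.796
X2 > qchisq(0.95, df = 1)        # FALSE: consistent with a fair coin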

Let’s do it - M&M distribution

Pair up, grab an M&M bag.

As of 2017, M&M color distributions were different between factories. In the US, these are Cleveland (CLV) and Hackettstown (HKP). These cities can be found on the packaging. Distributions are:

factory red orange yellow green blue brown
CLV 0.13 0.20 0.14 0.20 0.21 0.12
HKP 0.12 0.25 0.12 0.12 0.25 0.12

For your bag, count the colors, calculate expected counts, chi-square statistic. Check validity and compare the statistic to the 95% threshold value 11.07.

Let’s do it - M&M distribution

Validity: each \(np_i\) needs to be at least 5.

Chi-square statistic: \(\sum (E_i-O_i)^2/E_i\), where \(E_i = np_i\).

factory red orange yellow green blue brown
CLV 0.13 0.20 0.14 0.20 0.21 0.12
HKP 0.12 0.25 0.12 0.12 0.25 0.12

Count colors, calculate expected counts, chi-square statistic. Check validity and compare the statistic to 11.07.

Let’s do it - M&M distribution

Validity: each \(np_i\) needs to be at least 5.

Chi-square statistic: \(\sum (E_i-O_i)^2/E_i\), where \(E_i = np_i\).

factory red orange yellow green blue brown min.n
CLV 0.131 0.205 0.135 0.198 0.207 0.124 41
HKP 0.125 0.250 0.125 0.125 0.250 0.125 41

Count colors, calculate expected counts, chi-square statistic. Check validity and compare the statistic to 11.0705.
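A sketch for a hypothetical bag (the counts here are made up; substitute your own):

obs = c(red = 8, orange = 12, yellow = 7, green = 11, blue = 13, brown = 6)
p   = c(0.131, 0.205, 0.135, 0.198, 0.207, 0.124)   # CLV distribution
E   = sum(obs) * p
all(E >= 5)                                   # validity check
sum((E - obs)^2 / E) > qchisq(0.95, df = 5)   # compare to the 11.07 threshold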

Chi-squared test of Goodness of Fit

Input One-way table tab or factor variable x, vector of null hypothesis probabilities p.
Null hypothesis Proportions of the labels in the variable are as given in p
Alternative hypothesis Proportions of the labels in the variable are not as given in p
Test statistic \(\chi^2 = \sum (E_{i}-O_{i})^2/E_{i}\) for expected \(E_{i}\) and observed \(O_{i}\).

This test only has one alternative and has no confidence interval.

Requirements All expected counts should be at least 5.

Chi-squared test of Goodness of Fit

Command: chisq.test with arguments tab and p

test = chisq.test(tab, p=p)
test
## 
##  Chi-squared test for given probabilities
## 
## data:  tab
## X-squared = 0.93, df = 5, p-value = 1

Chi-squared test of Goodness of Fit

Command: chisq.test with arguments x and p

tab = table(x)
test = chisq.test(tab, p=p)
test
## 
##  Chi-squared test for given probabilities
## 
## data:  tab
## X-squared = 0.93, df = 5, p-value = 1

Chi-squared test of Goodness of Fit

Effect size:

V = CramerV(rbind(tab,p))
T = TschuprowT(rbind(tab,p))
c(V=V, T=T) %>% kable
V 6e-04
T 4e-04

Chi-squared test of Goodness of Fit

To check validity, run the test, then check test$expected:

c(`Ei > 5` = all(test$expected > 5))
## Ei > 5 
##   TRUE