Lecture 21

MVJ

12 April, 2018

Categorical data

For numeric data, we use the mean as the primary testable sample statistic, and the t-test to test for means and differences in means.

For categorical data, the mean does not exist: we can count occurrences, but not do much else.

Any test on categorical data has to be a test that counts occurrences and compares the count (or a proportion) to an expected distribution.

Distributions of counts and proportions

Recall the binomial situation criteria:

- a fixed number \(n\) of trials,
- exactly two outcomes (success / failure) in each trial,
- the trials are independent of each other,
- the same probability of success \(p\) in every trial.

If these are fulfilled, the count of successes follows the binomial distribution, so the sampling distribution of the count is known exactly.

Exact testing for binomial counts

To use the binomial distribution to test, the setup would take the shape of:

Null hypothesis The probability of success is \(p = p_0\).

Alternative hypothesis The probability of success is [greater than / not equal to / less than] \(p_0\).

The p-value is the probability, under the null hypothesis, of observing a count at least as extreme as the observed count.

This method uses the exact sampling distribution, but the confidence intervals tend to be a little bit wider (more conservative) than they could be.
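As a sketch of what the exact test computes, using the illustrative values that also appear in the outputs later in the lecture (\(x = 7\) successes in \(n = 20\) trials against \(p_0 = 1/3\)): the two-sided p-value sums the probabilities of all counts no more likely than the observed one.

```r
# Exact two-sided binomial p-value: sum the probabilities of all
# counts that are at most as likely as the observed count.
x <- 7; n <- 20; p.0 <- 1/3
probs <- dbinom(0:n, size = n, prob = p.0)
# small tolerance, as in binom.test, to guard against rounding
sum(probs[probs <= dbinom(x, n, p.0) * (1 + 1e-7)])
## [1] 1
```

The p-value is 1 here because 7 is (jointly with 6) the most likely count under the null hypothesis, so every count qualifies as "at least as extreme".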

Effect sizes for counts and proportions

The most common - and most straightforward to interpret - effect sizes for counts and proportions are the odds ratio and the relative risk.

If the probability of success is \(p\), then the odds is defined as \(p/(1-p)\). To compare two different odds, we can take the ratio of the odds: \[ OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} \]

For a concrete example: in Roulette, there are 38 possible outcomes. A bet on red is a simultaneous bet on 18 of these outcomes, for a probability of \(18/38\approx0.47\). This gives us odds of \((18/38) / (20/38) \approx 0.9\) or about 9 to 10.

A bet on a column is a simultaneous bet on 12 of the outcomes for a probability of \(12/38\approx0.32\). This gives us odds of \((12/38) / (26/38) \approx 0.46\) or about 1 to 2.

The odds ratio of a bet on red over a bet on a column is \[ \frac{(18/38) / (20/38)}{(12/38) / (26/38)} \approx 1.95 \] so moving from a column bet to a red bet roughly doubles the odds of winning.
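The roulette arithmetic above can be checked directly (a quick sketch, using the probabilities \(18/38\) and \(12/38\) from the example):

```r
# Odds for a red bet and a column bet, and their ratio
odds <- function(p) p / (1 - p)
p.red <- 18/38; p.col <- 12/38
odds(p.red)               # 0.9  (9 to 10)
odds(p.col)               # ~0.46 (about 1 to 2)
odds(p.red) / odds(p.col) # ~1.95
```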

Effect sizes for counts and proportions


The relative risk is the quotient of probabilities and measures the expected increase / decrease in success between the groups (or between the null hypothesis and the observed values): \[ RR = \frac{p_1}{p_2} \]

Continuing the Roulette example:

A bet on red is a simultaneous bet on 18 of 38 outcomes, for a probability of \(18/38\approx0.47\).

A bet on a column is a simultaneous bet on 12 of the outcomes for a probability of \(12/38\approx0.32\).

The relative risk of a red bet over a column bet is \[ RR = \frac{18/38}{12/38} = \frac{18}{12} = 1.5 \]

We would expect to win 50% more often with a red bet than with a column bet.

Effect sizes for counts and proportions


Suppose we compare proportions between observed and null hypothesis - or between experiment and control.

RR     OR     Interpretation
= 1    = 1    No effect
< 1    < 1    Lower rate than hypothesis / lower rate than control
> 1    > 1    Higher rate than hypothesis / higher rate than control

Binomial test of a single proportion

If using the mosaic package (loaded with library(mosaic)), the first entry (or level) of the variable will be used as success, all others as failure. The function relevel can be used to reorder the levels so that the correct one is used for testing.

Requirements The data is expected to come from a binomial setting. This cannot be read off the output; it has to be argued directly from the data collection and study design.

Binomial test of a single proportion

Command: binom.test with arguments x, n, p, alternative, conf.level for x the number of successes and n the number of trials:

test = binom.test(x, n, p=p.0, alternative="two.sided", conf.level=0.99)
test
## 
##  Exact binomial test
## 
## data:  x out of n
## number of successes = 7, number of trials = 20, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.3333333
## 99 percent confidence interval:
##  0.1138798 0.6565686
## sample estimates:
## probability of success 
##                   0.35

Binomial test of a single proportion

Command: binom.test with arguments x, p, alternative, conf.level, where x = c(s, f) for s the number of successes and f the number of failures:

test = binom.test(c(s,f), p=p.0, alternative="two.sided", conf.level=0.99)
test
## 
##  Exact binomial test
## 
## data:  c(s, f)
## number of successes = 7, number of trials = 20, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.3333333
## 99 percent confidence interval:
##  0.1138798 0.6565686
## sample estimates:
## probability of success 
##                   0.35

Binomial test of a single proportion

Command: binom.test with a formula ~v, the argument data, and arguments p, alternative, conf.level, for df$v a categorical variable and "label" the value representing success:

df$v = relevel(df$v, "label")
test = binom.test(~v, data=df, p=p.0, alternative="two.sided", conf.level=0.99)
test
## 
##  Exact binomial test
## 
## data:  df$v  [with success = label]
## number of successes = 7, number of trials = 20, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.3333333
## 99 percent confidence interval:
##  0.1138798 0.6565686
## sample estimates:
## probability of success 
##                   0.35

Binomial test of a single proportion

Command: binom.test on the logical vector df$v == "label", with arguments p, alternative, conf.level, for df$v a categorical variable and "label" the value representing success:

test = binom.test(df$v == "label", p=p.0, alternative="two.sided", conf.level=0.99)
test
## 
##  Exact binomial test
## 
## data:  df$v == "label"  [with success = TRUE]
## number of successes = 7, number of trials = 20, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.3333333
## 99 percent confidence interval:
##  0.1138798 0.6565686
## sample estimates:
## probability of success 
##                   0.35

Binomial test of a single proportion

Effect size: Odds ratio or Relative risk

\[ p = \frac{x}{n} \qquad OR = \frac{p/(1-p)}{p_0/(1-p_0)} \qquad RR = \frac{p}{p_0} \]

For the one-sample case, this requires slightly more code. With x the number of successes and n the number of trials:

observed = matrix(c(p.0, (1-p.0), x, n-x), nrow=2, byrow=TRUE)
c(oddsRatio(observed),
  relrisk(observed))
##       OR       RR 
## 1.076923 1.050000

Binomial test of a single proportion

Effect size: Odds ratio or Relative risk

\[ p = \frac{s}{s+f} \qquad OR = \frac{p/(1-p)}{p_0/(1-p_0)} \qquad RR = \frac{p}{p_0} \]

For the one-sample case, this requires slightly more code. With s the number of successes and f the number of failures:

observed = matrix(c(p.0, (1-p.0), s, f), nrow=2, byrow=TRUE)
c(oddsRatio(observed),
  relrisk(observed))
##       OR       RR 
## 1.076923 1.050000

Binomial test of a single proportion

Effect size: Odds ratio or Relative risk

\[ p = \frac{\text{sum(df\$v == "label")}}{\text{nrow(df)}} \qquad OR = \frac{p/(1-p)}{p_0/(1-p_0)} \qquad RR = \frac{p}{p_0} \]

For the one-sample case, this requires slightly more code. With df$v a categorical variable and "label" the value representing success:

observed = matrix(c(p.0, (1-p.0), table(df$v != "label")), nrow=2, byrow=TRUE)
c(oddsRatio(observed),
  relrisk(observed))
##       OR       RR 
## 1.076923 1.050000

Normal approximations

For large enough samples, both the sample count and the sample proportion follow approximately a normal distribution.

Large enough is usually taken to mean that one should expect at least 10 successes and at least 10 failures.

We also require the full population to be at least 20 times larger than the sample size.

The threshold comes from requiring that the approximating normal distribution places less than 5% of its probability on negative (and thus utterly unreasonable) values.

Normal approximation for one sample proportions

Given that we observe \(x\) successes in \(n\) trials, the sample proportion \(\overline{p} = x/n\) is approximately distributed as \[ \overline{p} \sim \mathcal N\left(p, \sqrt{\frac{p(1-p)}{n}}\right) \] which gives the confidence interval \[ \overline{p} \pm z^*\sqrt{\frac{\overline{p}(1-\overline{p})}{n}} \]

Normal approximation hypothesis testing for one sample proportions

We can test the null hypothesis of \(p=p_0\) by computing a z-score for the normal approximation:

Given that we observe \(x\) successes in \(n\) trials, the sample proportion is \(\overline{p} = x/n\).

The z-score is \[ z = \frac{\overline{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \]

This is tested against the standard normal distribution \(\mathcal N(0,1)\).

Here, we use the normal distribution and not the t distribution since under the null hypothesis we know the population variance.
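As a sketch of the computation, reusing the illustrative values from the outputs in this lecture (\(x = 7\), \(n = 20\), \(p_0 = 1/3\)):

```r
# One-sample z-test for a proportion, computed by hand
x <- 7; n <- 20; p.0 <- 1/3
p.bar <- x / n
z <- (p.bar - p.0) / sqrt(p.0 * (1 - p.0) / n)
p.value <- 2 * pnorm(-abs(z))   # two-sided p-value
c(z = z, p.value = p.value)     # z ~ 0.16, p ~ 0.87
```

prop.test additionally applies a continuity correction by default, so its reported p-value differs slightly from this uncorrected computation.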

Normal approximation test of a single proportion

The function relevel can be used to reorder so that the correct level is used for testing.

Requirements At least 10 expected successes and at least 10 expected failures under the null hypothesis (\(np_0 \geq 10\) and \(n(1-p_0) \geq 10\)), and a population at least 20 times larger than the sample.

Normal approximation test of a single proportion

Command: prop.test with arguments x, n, p, alternative, conf.level for x the number of successes and n the number of trials:

test = prop.test(x, n, p=p.0, alternative="two.sided", conf.level=0.99)
test
## 
##  1-sample proportions test with continuity correction
## 
## data:  x out of n
## X-squared = 2.3666e-31, df = 1, p-value = 1
## alternative hypothesis: true p is not equal to 0.3333333
## 99 percent confidence interval:
##  0.1359357 0.6426790
## sample estimates:
##    p 
## 0.35

Normal approximation test of a single proportion

Command: prop.test with a formula ~v, the argument data, and arguments p, alternative, conf.level, for df$v a categorical variable and "label" the value representing success:

df$v = relevel(df$v, "label")
test = prop.test(~v, data=df, p=p.0, alternative="two.sided", conf.level=0.99)
test
## 
##  1-sample proportions test with continuity correction
## 
## data:  df$v  [with success = label]
## X-squared = 2.3666e-31, df = 1, p-value = 1
## alternative hypothesis: true p is not equal to 0.3333333
## 99 percent confidence interval:
##  0.1359357 0.6426790
## sample estimates:
##    p 
## 0.35

Normal approximation test of a single proportion

Command: prop.test on the logical vector df$v == "label", with arguments p, alternative, conf.level, for df$v a categorical variable and "label" the value representing success:

test = prop.test(df$v == "label", p=p.0, alternative="two.sided", conf.level=0.99)
test
## 
##  1-sample proportions test with continuity correction
## 
## data:  df$v == "label"  [with success = TRUE]
## X-squared = 2.3666e-31, df = 1, p-value = 1
## alternative hypothesis: true p is not equal to 0.3333333
## 99 percent confidence interval:
##  0.1359357 0.6426790
## sample estimates:
##    p 
## 0.35

Normal approximation test of a single proportion

Effect size: Odds ratio or Relative risk

\[ p = \frac{x}{n} \qquad OR = \frac{p/(1-p)}{p_0/(1-p_0)} \qquad RR = \frac{p}{p_0} \]

For the one-sample case, this requires slightly more code. With x the number of successes and n the number of trials:

observed = matrix(c(p.0, (1-p.0), x, n-x), nrow=2, byrow=TRUE)
c(oddsRatio(observed),
  relrisk(observed))
##       OR       RR 
## 1.076923 1.050000

Normal approximation test of a single proportion

Effect size: Odds ratio or Relative risk

\[ p = \frac{s}{s+f} \qquad OR = \frac{p/(1-p)}{p_0/(1-p_0)} \qquad RR = \frac{p}{p_0} \]

For the one-sample case, this requires slightly more code. With s the number of successes and f the number of failures:

observed = matrix(c(p.0, (1-p.0), s, f), nrow=2, byrow=TRUE)
c(oddsRatio(observed),
  relrisk(observed))
##       OR       RR 
## 1.076923 1.050000

Normal approximation test of a single proportion

Effect size: Odds ratio or Relative risk

\[ p = \frac{\text{sum(df\$v == "label")}}{\text{nrow(df)}} \qquad OR = \frac{p/(1-p)}{p_0/(1-p_0)} \qquad RR = \frac{p}{p_0} \]

For the one-sample case, this requires slightly more code. With df$v a categorical variable and "label" the value representing success:

observed = matrix(c(p.0, (1-p.0), table(df$v != "label")), nrow=2, byrow=TRUE)
c(oddsRatio(observed),
  relrisk(observed))
##       OR       RR 
## 1.076923 1.050000

Sample size selection

From the formula for the margin of error, we can derive the sample size required to reach a particular margin of error \(m\) at a particular confidence level with critical value \(z^*\).

\[ m = z^*\cdot SE_p = z^*\sqrt{\frac{p(1-p)}{n}} \qquad\text{so}\qquad n = \left(\frac{z^*}{m}\right)^2p(1-p) \]

The product \(p(1-p)\) can never be larger than \(1/4\), which leads to the following simplified formula, which may overestimate the needed sample size for very small or very large population probabilities.

\[ n = \frac{1}{4}\left(\frac{z^*}{m}\right)^2 \]
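For instance, a sketch with assumed values (margin of error \(m = 0.03\) at 95% confidence, so \(z^* \approx 1.96\)):

```r
# Conservative sample size for a desired margin of error
m <- 0.03
z.star <- qnorm(0.975)            # ~1.96 for 95% confidence
n <- ceiling((z.star / m)^2 / 4)
n                                 # 1068
```

This is the familiar "about a thousand respondents" figure for opinion polls with a 3 percentage point margin of error.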

For a power analysis, the function power.prop.test allows you to calculate sample sizes for hypothesis testing.

One sample median test

The median of a variable \(x\) is a value \(M\) such that 50% of the values of \(x\) are less than or equal to \(M\).

This means that we can build a test for medians using a proportions test: if the proportion of \(x\geq M\) is significantly different from 0.5, this gives evidence against \(M\) being the median. If the proportion of \(x\geq M\) is larger than 0.5, the true median is larger than \(M\); if the proportion is smaller, then so is the true median.

Null hypothesis The median of x is equal to M.0.

Alternative hypothesis The median of x is [greater than / not equal to / less than] M.0.

Requirements The same as the chosen option between binom.test and prop.test.

One sample median test

Using binom.test, with values in x and null hypothesis M.0:

test = binom.test(x >= M.0)

Using prop.test, with values in x and null hypothesis M.0:

test = prop.test(x >= M.0)

Takes alternative and conf.level as options.

Doing this with x >= M.0 means that for an upper-tail median test, an upper-tail proportions test can be used - and for a lower-tail median test, a lower-tail proportions test can be used.
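A minimal worked sketch with made-up data (the base-R call counts the successes explicitly; binom.test defaults to p = 0.5, which is exactly the median null hypothesis):

```r
# Test whether the median of x is 10
x <- c(3, 7, 8, 9, 12, 14, 15, 18, 21, 25)
M.0 <- 10
# base-R equivalent of the mosaic call binom.test(x >= M.0)
binom.test(sum(x >= M.0), length(x))
```

Here 6 of 10 values are at least 10, far from significantly different from the expected 5, so there is no evidence against \(M_0 = 10\).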

Two sample proportions testing

A study compared Instagram use among young women and young men. They surveyed 1069 participants and (among other questions) asked whether they use Instagram.

        No  Yes
Women  209  328
Men    298  234

Two sample proportions testing

A study compared Instagram use among young women and young men. They surveyed 1069 participants and (among other questions) asked whether they use Instagram.

Sex       n    X      p
Women   537  328  0.611
Men     532  234  0.440
Total  1069  562  0.526

Two sample proportions testing

A study compared Instagram use among young women and young men. They surveyed 1069 participants and (among other questions) asked whether they use Instagram.

Exact binomial testing no longer makes any sense: we do not know a sampling distribution for a difference of counts.

The normal approximation still works: since \(p_m\) and \(p_w\) are both approximately normally distributed, so is their difference \(p_m-p_w\).

With this we can get confidence intervals and hypothesis tests for the difference in proportions.

Two sample proportions testing

Given: two success/trial pairs \(x_1, n_1\) and \(x_2, n_2\), with sample proportions \(\hat p_1 = x_1/n_1\) and \(\hat p_2 = x_2/n_2\). A confidence interval for the difference is \[ (\hat p_1 - \hat p_2) \pm z^*\sqrt{\frac{\hat p_1(1-\hat p_1)}{n_1} + \frac{\hat p_2(1-\hat p_2)}{n_2}} \]

Two sample proportions testing

Given: two success/trial pairs \(x_1, n_1\) and \(x_2, n_2\).

Under the null hypothesis, \(p_1 = p_2 = p\). We can estimate this common value as \[ \hat p = \frac{x_1 + x_2}{n_1 + n_2} \] and compute the z-score \[ z = \frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]

We test this against the standard normal distribution \(\mathcal N(0,1)\).
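Applied to the Instagram survey counts from earlier in the lecture (328 of 537 women, 234 of 532 men), the pooled computation looks like this:

```r
# Pooled two-sample z-test for proportions, by hand
x.1 <- 328; n.1 <- 537   # women using Instagram
x.2 <- 234; n.2 <- 532   # men using Instagram
p.hat <- (x.1 + x.2) / (n.1 + n.2)
z <- (x.1/n.1 - x.2/n.2) /
  sqrt(p.hat * (1 - p.hat) * (1/n.1 + 1/n.2))
c(z = z, p.value = 2 * pnorm(-abs(z)))  # z ~ 5.6: highly significant
```

Squaring z gives about 31.3, which matches the X-squared value that prop.test reports for these data up to its continuity correction.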

Normal approximation test of two proportions

The two-way table can be constructed through:

test.table = tally(k ~ (v == "label"), data=df)

Requirements

x.1 > 5, x.2 > 5, n.1 - x.1 > 5, n.2 - x.2 > 5 and population sizes at least 20*n.1 and 20*n.2 respectively.

Normal approximation test of two proportions

Command: prop.test with arguments x, n, alternative, conf.level; using x.1, x.2, n.1, n.2:

test = prop.test(c(x.1,x.2), c(n.1,n.2), alternative="two.sided", conf.level=0.99)
test
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  c(x.1, x.2) out of c(n.1, n.2)
## X-squared = 0.66143, df = 1, p-value = 0.4161
## alternative hypothesis: two.sided
## 99 percent confidence interval:
##  -0.172063  0.372063
## sample estimates:
## prop 1 prop 2 
##   0.46   0.36

Normal approximation test of two proportions

Command: prop.test with arguments alternative, conf.level; using a two-way table test.table. The mosaic version of this function is broken, so we call the base version stats::prop.test directly.

test = stats::prop.test(test.table, alternative="two.sided", conf.level=0.99)
test
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  test.table
## X-squared = 0.66143, df = 1, p-value = 0.4161
## alternative hypothesis: two.sided
## 99 percent confidence interval:
##  -0.172063  0.372063
## sample estimates:
## prop 1 prop 2 
##   0.46   0.36

Normal approximation test of two proportions

Command: prop.test with arguments x, n, alternative, conf.level; using a data frame with columns v for categorical values and k for subpopulations:

test = prop.test((v == "label") ~ k, data = df, alternative="two.sided", conf.level=0.99)
test
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  tally((v == "label") ~ k)
## X-squared = 0.66143, df = 1, p-value = 0.4161
## alternative hypothesis: two.sided
## 99 percent confidence interval:
##  -0.172063  0.372063
## sample estimates:
## prop 1 prop 2 
##   0.46   0.36

Normal approximation test of two proportions

Effect size: Odds ratio or Relative risk

\[ p_1 = \frac{x_1}{n_1} \qquad p_2 = \frac{x_2}{n_2} \qquad OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} \qquad RR = \frac{p_1}{p_2} \]

With x.1, x.2, n.1, n.2:

observed = matrix(c(x.1, n.1-x.1, x.2, n.2-x.2), nrow=2, byrow=TRUE)
c(oddsRatio(observed),
  relrisk(observed))
##        OR        RR 
## 0.6603261 0.7826087

Normal approximation test of two proportions

Effect size: Odds ratio or Relative risk

\[ p_1 = \frac{s_1}{s_1+f_1} \qquad p_2 = \frac{s_2}{s_2+f_2} \qquad OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} \qquad RR = \frac{p_1}{p_2} \]

With s.1, s.2, f.1, f.2 counts of success and failure:

observed = matrix(c(s.1, f.1, s.2, f.2), nrow=2, byrow=TRUE)
c(oddsRatio(observed),
  relrisk(observed))
##        OR        RR 
## 0.6603261 0.7826087

Normal approximation test of two proportions

Effect size: Odds ratio or Relative risk

\[ p_1 = \frac{x_1}{n_1} \qquad p_2 = \frac{x_2}{n_2} \qquad OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} \qquad RR = \frac{p_1}{p_2} \]

With the two-way table test.table:

c(oddsRatio(test.table),
  relrisk(test.table))
##        OR        RR 
## 0.6603261 0.7826087

Back to our example

Recall our Instagram data

        No  Yes
Women  209  328
Men    298  234

We clear the 10 successes and 10 failures hurdle comfortably: the test can be used.

Back to our example

stats::prop.test(t(instag.table))
## 
##  2-sample test for equality of proportions with continuity
##  correction
## 
## data:  t(instag.table)
## X-squared = 30.641, df = 1, p-value = 3.104e-08
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.2324113 -0.1103909
## sample estimates:
##    prop 1    prop 2 
## 0.4122288 0.5836299

Back to our example

c(oddsRatio(instag.table), relrisk(instag.table))
##       OR       RR 
## 1.998610 1.439238

Careful! Remember that the two-way table ordered the columns as «Does not use» first, and «Does use» second.

So the relative risk of about 1.44 says that the probability of non-usage is about 44% higher for men than for women.

Fixing the order means using relevel cleverly when loading data and creating the two-way table.
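One way to fix the orientation after the fact is to reindex the table (a sketch; it assumes the table's columns are literally named "No" and "Yes" and its rows "Women" and "Men", and relies on the row-2-over-row-1 convention seen in the outputs above):

```r
# Put «Does use» in the first column and Men in the first row,
# so the effect sizes compare usage of women against men.
usage.table <- instag.table[c("Men", "Women"), c("Yes", "No")]
c(oddsRatio(usage.table),
  relrisk(usage.table))
# expect OR ~ 2.0 and RR ~ 1.39: women's odds of using Instagram
# are about double those of men, their probability ~39% higher
```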