MVJ
12 April, 2018
For numeric data, we use the mean as the primary testable sample statistic, and the t-test to test for means and differences in means.
For categorical data, the mean does not exist: for categorical data we can count occurrences but not do much else.
Any test on categorical data has to be a test that counts occurrences and compares the count (or a proportion) to an expected distribution.
Recall the binomial situation criteria: a fixed number \(n\) of trials; each trial has exactly two outcomes (success and failure); the trials are independent; and the probability of success \(p\) is the same for every trial.
If these are fulfilled, then the count of successes follows the binomial distribution, and we know the sampling distribution of the count \(\hat n\) exactly.
To use the binomial distribution for a test, we count the successes in the sample and compare the observed count to the binomial distribution determined by the null hypothesis.
This method uses the exact sampling distribution, but the confidence intervals tend to be a little wider (less exact) than they could be.
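As a sketch of what this computation looks like (the numbers here are illustrative: 7 successes in 20 trials against a null hypothesis of \(p_0 = 1/3\), matching the outputs shown later), the exact two-sided p-value sums the probabilities of all outcomes at most as likely as the observed count:

```r
x <- 7; n <- 20; p.0 <- 1/3        # illustrative values

# Sampling distribution of the count under the null hypothesis
probs <- dbinom(0:n, size = n, prob = p.0)

# Exact two-sided p-value: sum over all outcomes no more likely than the
# observed one (with a small tolerance for rounding, as binom.test uses)
p.value <- sum(probs[probs <= dbinom(x, n, p.0) * (1 + 1e-7)])

p.value    # agrees with binom.test(x, n, p = p.0)$p.value
```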
Most common - and most straightforward to interpret - effect sizes for counts and proportions are the odds ratio and relative risk.
If the probability of success is \(p\), then the odds are defined as \(p/(1-p)\). To compare two different odds, we can take the ratio of the odds: \[ OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} \]
For a concrete example: in Roulette, there are 38 possible outcomes. A bet on red is a simultaneous bet on 18 of these outcomes, for a probability of \(18/38\approx0.47\). This gives us odds of \((18/38) / (20/38) \approx 0.9\) or about 9 to 10.
A bet on a column is a simultaneous bet on 12 of the outcomes for a probability of \(12/38\approx0.32\). This gives us odds of \((12/38) / (26/38) \approx 0.46\) or about 1 to 2.
The odds ratio of a bet on red over a bet on a column is \[ \frac{(18/38) / (20/38)}{(12/38) / (26/38)} \approx 1.95 \] roughly doubling the odds of winning if we move from a column bet to a red bet.
The relative risk is the quotient of probabilities and measures the expected increase / decrease in success between the groups (or between the null hypothesis and the observed values): \[ RR = \frac{p_1}{p_2} \]
Continuing the Roulette example:
A bet on red is a simultaneous bet on 18 of 38 outcomes, for a probability of \(18/38\approx0.47\).
A bet on a column is a simultaneous bet on 12 of the outcomes for a probability of \(12/38\approx0.32\).
The relative risk of a red bet over a column bet is \[ RR = \frac{18/38}{12/38} = \frac{18}{12} = 1.5 \]
We would expect to win 50% more often with a red bet than with a column bet.
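The two roulette computations can be checked directly in R:

```r
# Probabilities of winning a red bet and a column bet in Roulette
p.red <- 18/38
p.col <- 12/38

odds.red <- p.red / (1 - p.red)   # 18 to 20, i.e. 0.9
odds.col <- p.col / (1 - p.col)   # 12 to 26, about 0.46

OR <- odds.red / odds.col         # odds ratio of red over column
RR <- p.red / p.col               # relative risk of red over column
c(OR = OR, RR = RR)               # 1.95 and 1.5
```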
Suppose we compare proportions between observed and null hypothesis - or between experiment and control.
RR is | OR is | Interpretation |
---|---|---|
=1 | =1 | No effect |
<1 | <1 | Lower rate than hypothesis - lower rate than control |
>1 | >1 | Higher rate than hypothesis - higher rate than control |
Uses `library(mosaic)`. Given: the number of successes `x` and the number of trials `n`; also possible: a dataset `df` with a variable `v` containing observations, or a vector `x` containing observations.
Null hypothesis The probability of success is equal to `p.0`.
Alternative hypothesis The probability of success is [greater / not equal / less] than `p.0`.
Test statistic: the observed count \(\hat n\) of successes.
If using the `library(mosaic)` extension, the first entry (or level) will be used as success, all others as failure. The function `relevel` can be used to reorder the levels so that the correct level is used for testing.
Requirements The data is expected to come from a binomial setting. This is not tested by the software, but rather argued directly from the data collection and description.
Command: `binom.test` with arguments `x`, `n`, `p`, `alternative`, `conf.level`. For `x` the number of successes and `n` the number of trials:
test = binom.test(x, n, p=p.0, alternative="two.sided", conf.level=0.99)
test
##
##  Exact binomial test
##
## data: x out of n
## number of successes = 7, number of trials = 20, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.3333333
## 99 percent confidence interval:
## 0.1138798 0.6565686
## sample estimates:
## probability of success
## 0.35
Command: `binom.test` with arguments `x`, `alternative`, `conf.level`. For `s` the number of successes and `f` the number of failures, passed together as `c(s,f)`:
test = binom.test(c(s,f), p=p.0, alternative="two.sided", conf.level=0.99)
test
##
##  Exact binomial test
##
## data: c(s, f)
## number of successes = 7, number of trials = 20, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.3333333
## 99 percent confidence interval:
## 0.1138798 0.6565686
## sample estimates:
## probability of success
## 0.35
Command: `binom.test` with arguments `alternative`, `conf.level`. For `df$v` a categorical variable and `"label"` the value representing success:
df$v = relevel(df$v, "label")
test = binom.test(~v, data=df, p=p.0, alternative="two.sided", conf.level=0.99)
test
##
##  Exact binomial test
##
## data: df$v [with success = label]
## number of successes = 7, number of trials = 20, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.3333333
## 99 percent confidence interval:
## 0.1138798 0.6565686
## sample estimates:
## probability of success
## 0.35
Command: `binom.test` with arguments `alternative`, `conf.level`. For `df$v` a categorical variable and `"label"` the value representing success:
test = binom.test(df$v == "label", p=p.0, alternative="two.sided", conf.level=0.99)
test
##
##  Exact binomial test
##
## data: df$v == "label" [with success = TRUE]
## number of successes = 7, number of trials = 20, p-value = 1
## alternative hypothesis: true probability of success is not equal to 0.3333333
## 99 percent confidence interval:
## 0.1138798 0.6565686
## sample estimates:
## probability of success
## 0.35
Effect size: Odds ratio or Relative risk
\[ p = \frac{x}{n} \qquad OR = \frac{p/(1-p)}{p_0/(1-p_0)} \qquad RR = \frac{p}{p_0} \]
For the one-sample case, this requires slightly more code. With `x` the number of successes and `n` the number of trials:
observed = matrix(c(p.0, (1-p.0), x, n-x), nrow=2, byrow=TRUE)
c(oddsRatio(observed),
relrisk(observed))
## OR RR
## 1.076923 1.050000
Effect size: Odds ratio or Relative risk
\[ p = \frac{s}{s+f} \qquad OR = \frac{p/(1-p)}{p_0/(1-p_0)} \qquad RR = \frac{p}{p_0} \]
For the one-sample case, this requires slightly more code. With `s` the number of successes and `f` the number of failures:
observed = matrix(c(p.0, (1-p.0), s, f), nrow=2, byrow=TRUE)
c(oddsRatio(observed),
relrisk(observed))
## OR RR
## 1.076923 1.050000
Effect size: Odds ratio or Relative risk
\[ p = \frac{\text{sum(df\$v == "label")}}{\text{nrow(df)}} \qquad OR = \frac{p/(1-p)}{p_0/(1-p_0)} \qquad RR = \frac{p}{p_0} \]
For the one-sample case, this requires slightly more code. With `df$v` a categorical variable and `"label"` the value representing success:
observed = matrix(c(p.0, (1-p.0), table(df$v != "label")), nrow=2, byrow=TRUE)  # FALSE (= success) is tabulated first
c(oddsRatio(observed),
relrisk(observed))
## OR RR
## 1.076923 1.050000
For large enough samples, both the sample count and the sample proportion follow approximately a normal distribution.
Large enough is usually taken to mean that one should expect at least 10 successes and at least 10 failures.
We also require the full population to be at least 20 times larger than the sample size.
The threshold comes from requiring that the approximating normal distribution takes negative (and thus utterly unreasonable) values at a rate of less than 5%.
Given that we observe \(x\) successes in \(n\) trials, the sample proportion is \(\overline{p} = x/n\) with standard error \(SE_{\overline{p}} = \sqrt{\overline{p}(1-\overline{p})/n}\). A confidence interval is \(\overline{p} \pm z^*\cdot SE_{\overline{p}}\), where the critical value \(z^*\) is computed as `qnorm(1-alpha/2)` and the margin of error is \(z^*\cdot SE_{\overline{p}}\).
We can test the null hypothesis of \(p=p_0\) by computing a z-score for the normal approximation.
The z-score is \[ z = \frac{\overline{p}-p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} \]
This is tested against the standard normal distribution \(\mathcal N(0,1)\).
Here, we use the normal distribution and not the t distribution since under the null hypothesis we know the population variance.
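The z-score test can be carried out by hand and compared to `prop.test`; the numbers are again the illustrative 7 successes in 20 trials against \(p_0 = 1/3\):

```r
x <- 7; n <- 20; p.0 <- 1/3        # illustrative values
p.hat <- x / n

# z-score under the null hypothesis (known variance, hence the normal
# distribution and pnorm, not the t distribution)
z <- (p.hat - p.0) / sqrt(p.0 * (1 - p.0) / n)
p.value <- 2 * pnorm(-abs(z))      # two-sided test against N(0,1)

# Without continuity correction, prop.test's X-squared statistic is z^2
c(z = z, p.value = p.value)
```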
Uses `library(mosaic)`. Given: the number of successes `x` and the number of trials `n`; also possible: a dataset `df` with a variable `v` containing observations, or a vector `x` containing observations.
Null hypothesis The probability of success is equal to `p.0`.
Alternative hypothesis The probability of success is [greater / not equal / less] than `p.0`.
The function `relevel` can be used to reorder the levels so that the correct level is used for testing.
Requirements `n*p.0 > 10` and `n*(1-p.0) > 10`, and population size at least `20*n`.
Command: `prop.test` with arguments `x`, `n`, `p`, `alternative`, `conf.level`. For `x` the number of successes and `n` the number of trials:
test = prop.test(x, n, p=p.0, alternative="two.sided", conf.level=0.99)
test
##
## 1-sample proportions test with continuity correction
##
## data: x out of n
## X-squared = 2.3666e-31, df = 1, p-value = 1
## alternative hypothesis: true p is not equal to 0.3333333
## 99 percent confidence interval:
## 0.1359357 0.6426790
## sample estimates:
## p
## 0.35
Command: `prop.test` with arguments `p`, `alternative`, `conf.level`. For `df$v` a categorical variable and `"label"` the value representing success:
df$v = relevel(df$v, "label")
test = prop.test(~v, data=df, p=p.0, alternative="two.sided", conf.level=0.99)
test
##
## 1-sample proportions test with continuity correction
##
## data: df$v [with success = label]
## X-squared = 2.3666e-31, df = 1, p-value = 1
## alternative hypothesis: true p is not equal to 0.3333333
## 99 percent confidence interval:
## 0.1359357 0.6426790
## sample estimates:
## p
## 0.35
Command: `prop.test` with arguments `p`, `alternative`, `conf.level`. For `df$v` a categorical variable and `"label"` the value representing success:
test = prop.test(df$v == "label", p=p.0, alternative="two.sided", conf.level=0.99)
test
##
## 1-sample proportions test with continuity correction
##
## data: df$v == "label" [with success = TRUE]
## X-squared = 2.3666e-31, df = 1, p-value = 1
## alternative hypothesis: true p is not equal to 0.3333333
## 99 percent confidence interval:
## 0.1359357 0.6426790
## sample estimates:
## p
## 0.35
From the formula for the margin of error, we can derive the sample size required to reach a particular margin of error \(m\) at a given confidence level, with critical value \(z^*\).
\[ m = z^*\cdot SE_p = z^*\sqrt{\frac{p(1-p)}{n}} \qquad\text{so}\qquad n = \left(\frac{z^*}{m}\right)^2p(1-p) \]
The product \(p(1-p)\) can never be larger than \(1/4\), which leads to the simplified formula below; it might overestimate the needed sample size for very small or very large population probabilities.
\[ n = \frac{1}{4}\left(\frac{z^*}{m}\right)^2 \]
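For example, a margin of error of at most 3 percentage points at 95% confidence requires:

```r
m <- 0.03                     # desired margin of error
z.star <- qnorm(1 - 0.05/2)   # critical value for 95% confidence, ~1.96

# Conservative bound using p(1-p) <= 1/4
n <- ceiling((z.star / m)^2 / 4)
n
## [1] 1068
```

This is roughly the sample size used by many opinion polls.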
For a power analysis, the function `power.prop.test` allows you to calculate sample sizes for hypothesis testing.
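A sketch of such a power calculation (the proportions 0.45 and 0.55 are illustrative):

```r
# Per-group sample size to detect a difference between p1 = 0.45 and
# p2 = 0.55 with 80% power at the default 5% significance level
res <- power.prop.test(p1 = 0.45, p2 = 0.55, power = 0.80)
ceiling(res$n)    # required sample size in *each* group
```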
The median of a variable is a value \(M\) such that 50% of \(x\) is less than or equal to \(M\).
This means that we can build a test for medians using a proportions test: if the proportion of \(x\geq M\) is significantly different from 0.5, this gives evidence against \(M\) being the median. If the proportion of \(x\geq M\) is larger than 0.5, it means the true median is larger, if the proportion is smaller, then so is the true median.
Null hypothesis The median of `x` is equal to `M.0`.
Alternative hypothesis The median of `x` is [greater / not equal / less] than `M.0`.
Requirements The same as for the chosen option between `binom.test` and `prop.test`.
Using `binom.test`, with values in `x` and null hypothesis `M.0`:
test = binom.test(x >= M.0)
Using `prop.test`, with values in `x` and null hypothesis `M.0`:
test = prop.test(x >= M.0)
Both take `alternative` and `conf.level` as options.
Doing this with `x >= M.0` means that for an upper-tail median test, an upper-tail proportions test can be used, and for a lower-tail median test, a lower-tail proportions test can be used.
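A worked sketch with made-up data; without the `mosaic` extension, the successes are counted explicitly:

```r
x <- c(3, 7, 8, 9, 12, 13, 14, 18, 21, 25)   # made-up observations
M.0 <- 10                                     # hypothesized median

# 6 of the 10 values are >= 10; test that proportion against 0.5
test <- binom.test(sum(x >= M.0), length(x), p = 0.5)
test$p.value
```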
A study compared Instagram use among young women and young men. It surveyed 1069 participants, and (among other questions) asked whether they use Instagram.
| | No | Yes |
|---|---|---|
| Women | 209 | 328 |
| Men | 298 | 234 |
| Sex | n | X | p |
|---|---|---|---|
| Women | 537 | 328 | 0.611 |
| Men | 532 | 234 | 0.440 |
| Total | 1069 | 562 | 0.526 |
Exact binomial testing no longer makes any sense: we do not know a sampling distribution for a difference of counts.
The normal approximation still works. Since \(p_m\) and \(p_w\) are both approximately normally distributed, so is \(p_m-p_w\).
With this we can get confidence intervals and hypothesis tests for the difference in proportions.
Given: two success/trial pairs \(x_1, n_1\) and \(x_2, n_2\), with sample proportions \(\overline{p}_1 = x_1/n_1\) and \(\overline{p}_2 = x_2/n_2\). The standard error of the difference is \(SE_D = \sqrt{\overline{p}_1(1-\overline{p}_1)/n_1 + \overline{p}_2(1-\overline{p}_2)/n_2}\); a confidence interval is \(\overline{p}_1-\overline{p}_2 \pm z^*\cdot SE_D\), where \(z^*\) is computed as `qnorm(1-alpha/2)` and the margin of error is \(z^*\cdot SE_D\).
Under the null hypothesis, \(p_1 = p_2 = p\). We can estimate this common value as \[ \hat p = \frac{x_1+x_2}{n_1+n_2} \] and compute the z-score \[ z = \frac{\overline{p}_1-\overline{p}_2}{\sqrt{\hat p(1-\hat p)\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} \]
We test this against the standard normal distribution \(\mathcal N(0,1)\).
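Carried out by hand for the Instagram counts above (328 of 537 women, 234 of 532 men use Instagram):

```r
x.1 <- 328; n.1 <- 537    # women
x.2 <- 234; n.2 <- 532    # men

p.1 <- x.1 / n.1
p.2 <- x.2 / n.2
p.pooled <- (x.1 + x.2) / (n.1 + n.2)   # common value under the null

# Pooled z-score and two-sided p-value
z <- (p.1 - p.2) /
  sqrt(p.pooled * (1 - p.pooled) * (1/n.1 + 1/n.2))
p.value <- 2 * pnorm(-abs(z))
c(z = z, p.value = p.value)
```

`prop.test` reports a statistic close to \(z^2\); the small difference comes from its continuity correction.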
Null hypothesis The two probabilities of success are equal, \(p_1 = p_2\).
Alternative hypothesis \(p_1\) is [greater / not equal / less] than \(p_2\).
The two-way table can be constructed through:
test.table = tally(k ~ (v == "label"), data=df)
Requirements `x.1 > 5`, `x.2 > 5`, `n.1 - x.1 > 5`, `n.2 - x.2 > 5`, and population sizes at least `20*n.1` and `20*n.2` respectively.
Command: `prop.test` with arguments `x`, `n`, `alternative`, `conf.level`; using `x.1`, `x.2`, `n.1`, `n.2`:
test = prop.test(c(x.1,x.2), c(n.1,n.2), alternative="two.sided", conf.level=0.99)
test
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: c(x.1, x.2) out of c(n.1, n.2)
## X-squared = 0.66143, df = 1, p-value = 0.4161
## alternative hypothesis: two.sided
## 99 percent confidence interval:
## -0.172063 0.372063
## sample estimates:
## prop 1 prop 2
## 0.46 0.36
Command: `prop.test` with arguments `alternative`, `conf.level`; using a two-way table `test.table`. This function is broken in the package `mosaic`, so the base version is called explicitly as `stats::prop.test`:
test = stats::prop.test(test.table, alternative="two.sided", conf.level=0.99)
test
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: test.table
## X-squared = 0.66143, df = 1, p-value = 0.4161
## alternative hypothesis: two.sided
## 99 percent confidence interval:
## -0.172063 0.372063
## sample estimates:
## prop 1 prop 2
## 0.46 0.36
Command: `prop.test` with arguments `alternative`, `conf.level`; using a data frame `df` with columns `v` for categorical values and `k` for subpopulations:
test = prop.test((v == "label") ~ k, data = df, alternative="two.sided", conf.level=0.99)
test
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: tally((v == "label") ~ k)
## X-squared = 0.66143, df = 1, p-value = 0.4161
## alternative hypothesis: two.sided
## 99 percent confidence interval:
## -0.172063 0.372063
## sample estimates:
## prop 1 prop 2
## 0.46 0.36
Effect size: Odds ratio or Relative risk
\[ p_1 = \frac{x_1}{n_1} \qquad p_2 = \frac{x_2}{n_2} \qquad OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} \qquad RR = \frac{p_1}{p_2} \]
With `x.1`, `x.2`, `n.1`, `n.2`:
observed = matrix(c(x.1, n.1-x.1, x.2, n.2-x.2), nrow=2, byrow=TRUE)
c(oddsRatio(observed),
relrisk(observed))
## OR RR
## 0.6603261 0.7826087
Effect size: Odds ratio or Relative risk
\[ p_1 = \frac{s_1}{s_1+f_1} \qquad p_2 = \frac{s_2}{s_2+f_2} \qquad OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} \qquad RR = \frac{p_1}{p_2} \]
With `s.1`, `s.2`, `f.1`, `f.2` counts of successes and failures:
observed = matrix(c(s.1, f.1, s.2, f.2), nrow=2, byrow=TRUE)
c(oddsRatio(observed),
relrisk(observed))
## OR RR
## 0.6603261 0.7826087
Effect size: Odds ratio or Relative risk
\[ p_1 = \frac{x_1}{n_1} \qquad p_2 = \frac{x_2}{n_2} \qquad OR = \frac{p_1/(1-p_1)}{p_2/(1-p_2)} \qquad RR = \frac{p_1}{p_2} \]
With the two-way table `test.table`:
c(oddsRatio(test.table),
relrisk(test.table))
## OR RR
## 0.6603261 0.7826087
Recall our Instagram data:
| | No | Yes |
|---|---|---|
| Women | 209 | 328 |
| Men | 298 | 234 |
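The commands below use a two-way table `instag.table`, which is not constructed in these notes; one way to build it from the counts above (note the row and column order, which matters for the effect sizes) is:

```r
# Reconstructing the two-way table from the counts above; the column
# order No-before-Yes matches the table as printed
instag.table <- as.table(matrix(
  c(209, 328,
    298, 234),
  nrow = 2, byrow = TRUE,
  dimnames = list(Sex = c("Women", "Men"), Instagram = c("No", "Yes"))))
instag.table
```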
We clear the 10 successes and 10 failures hurdle comfortably: the test can be used.
stats::prop.test(t(instag.table))
##
## 2-sample test for equality of proportions with continuity
## correction
##
## data: t(instag.table)
## X-squared = 30.641, df = 1, p-value = 3.104e-08
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.2324113 -0.1103909
## sample estimates:
## prop 1 prop 2
## 0.4122288 0.5836299
c(oddsRatio(instag.table), relrisk(instag.table))
## OR RR
## 1.998610 1.439238
Careful! Remember that the two-way table ordered the columns as «Does not use» first, and «Does use» second.
So the probability of non-usage increases by about 44% (RR ≈ 1.44) going from women to men.
Fixing the order means using `relevel` cleverly when loading the data and creating the two-way table.
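A minimal sketch of `relevel` on a hypothetical factor; the first level is what the tests and effect sizes treat as success:

```r
v <- factor(c("Yes", "No", "No", "Yes", "No"))
levels(v)                    # "No" "Yes": alphabetical by default

v <- relevel(v, ref = "Yes")
levels(v)                    # "Yes" "No": "Yes" is now the success level
```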