Statistical Tests

These are the tests we discuss during this course. Each test lists the conditions to check before use, how to calculate the test statistic and results by hand, and the relevant R functions that do the job.

Means testing

One sample tests

One sample means test

Test name: One sample t-test

Input Name
Data x numeric
Size N or length(x)
Significance level \(\alpha\) (alpha)
Confidence level \(1-\alpha\)

Requirements

Dataset Size Range Conditions How to check
N ≤ 15 Normal distribution geom_qq
15 < N ≤ 40 Not skew, no outliers geom_histogram and geom_boxplot
40 < N ≤ hundreds No outliers geom_boxplot
hundreds < N Relatively few or small outliers geom_boxplot

Null hypothesis True population mean is \(\mu_0\) (mu.0)

Alternative hypothesis True population mean is [< or ≠ or >] \(\mu_0\).

Test statistic T = (mean(x) - mu.0) / ( sd(x)/sqrt(N) ) or, writing \(s_x\) for the sample standard deviation, \(\overline{x}\) for the sample mean, \(T = \frac{\overline{x} - \mu_0}{s_x/\sqrt{N}}\)

p-values

Alternative p-value
> pt(T, df=N-1, lower.tail=FALSE)
≠ 2*pt(abs(T), df=N-1, lower.tail=FALSE)
< pt(T, df=N-1)

Confidence intervals

Use \(t_\alpha\)=qt(1-alpha, df=N-1) and \(t_{\alpha/2}\)=qt(1-alpha/2, df=N-1)

Alternative From To
> \(\overline{x}-t_{\alpha}\cdot s_x/\sqrt{N}\) ∞
≠ \(\overline{x}-t_{\alpha/2}\cdot s_x/\sqrt{N}\) \(\overline{x}+t_{\alpha/2}\cdot s_x/\sqrt{N}\)
< -∞ \(\overline{x}+t_{\alpha}\cdot s_x/\sqrt{N}\)

R command

t.test(x)

Extra parameter Values Default value
mu \(\mu_0\) 0
alternative two.sided or greater or less two.sided
conf.level \(1-\alpha\) 0.95

Example: upper tail (>) means test of x vs. \(\mu_0 = 10\) at a significance level of 1%:

test = t.test(x, mu=10, conf.level=0.99, alternative='greater')
test

Effect size measure: Cohen's d:

\(d = \frac{\overline{x}-\mu_0}{s_x}\)

d = (mean(x) - mu.0) / sd(x)  # or
d = test$statistic / sqrt(test$parameter + 1)
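As a sanity check, the hand formulas above can be compared with t.test directly. Everything below is a made-up example vector, with mu.0 = 10 as in the example:

```r
# Hypothetical data for illustration
x = c(10.2, 11.5, 9.8, 12.1, 10.7, 11.9, 10.4, 11.1)
mu.0 = 10
N = length(x)

# Test statistic by hand
T = (mean(x) - mu.0) / (sd(x) / sqrt(N))

# Upper-tail p-value and lower confidence bound by hand (alpha = 0.01)
p.value = pt(T, df = N - 1, lower.tail = FALSE)
t.alpha = qt(1 - 0.01, df = N - 1)
ci.from = mean(x) - t.alpha * sd(x) / sqrt(N)

# The built-in test reports the same numbers
test = t.test(x, mu = mu.0, conf.level = 0.99, alternative = 'greater')

# Cohen's d by hand
d = (mean(x) - mu.0) / sd(x)
```

The hand-computed T, p-value and confidence bound agree exactly with test$statistic, test$p.value and test$conf.int[1].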

Two sample means test

Two sample t-test

Input Name
Data x, y both numeric; or data$v numeric and data$w categorical with two labels
Size Nx, Ny or length(x), length(y)
Significance level alpha
Confidence level 1-alpha

Requirements

Both x and y need to be appropriate for a one-sample t-test, ie:

Dataset Size Range Conditions How to check
N ≤ 15 Normal distribution geom_qq
15 < N ≤ 40 Not skew, no outliers geom_histogram and geom_boxplot
40 < N ≤ hundreds No outliers geom_boxplot
hundreds < N Relatively few or small outliers geom_boxplot

Null hypothesis Population means are equal: \(\mu_x = \mu_y\)

Alternative hypothesis \(\mu_x\) is [< or ≠ or >] to \(\mu_y\)

Test statistic Different depending on what assumptions you can make on your data. We write \(\overline{x}\) and \(\overline{y}\) for the sample means, \(s_x\) and \(s_y\) for sample standard deviations.

In each case, it is

\(T = \frac{\overline{x} - \overline{y}}{SE}\)

with a different SE depending on what you can assume:

  1. No assumptions: \(SE = \sqrt{\frac{s_x^2}{N_x}+\frac{s_y^2}{N_y}}\)
  2. Equal variance: \(SE = \sqrt{\frac{\sum(x-\overline{x})^2 + \sum(y-\overline{y})^2}{N_x+N_y-2}}\cdot\sqrt{\frac{1}{N_x}+\frac{1}{N_y}}\)
  3. Equal variance and equal size N: \(SE = \sqrt{\frac{\sum(x-\overline{x})^2+\sum(y-\overline{y})^2}{2N-2}}\cdot\sqrt{\frac{2}{N}}\)
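A minimal sketch of the three SE variants on made-up x and y of equal size (so cases 2 and 3 coincide). Case 1 is what t.test computes by default, case 2 what it computes with var.equal=TRUE:

```r
# Hypothetical data for illustration
x = c(5.1, 6.3, 5.8, 6.0, 5.5, 6.4)
y = c(4.9, 5.2, 5.6, 4.8, 5.3, 5.0)
Nx = length(x); Ny = length(y)

# 1. No assumptions (Welch)
SE1 = sqrt(sd(x)^2/Nx + sd(y)^2/Ny)

# 2. Equal variance (pooled standard deviation)
SE2 = sqrt((sum((x - mean(x))^2) + sum((y - mean(y))^2)) / (Nx + Ny - 2)) *
      sqrt(1/Nx + 1/Ny)

# 3. Equal variance and equal size N
N = Nx
SE3 = sqrt((sum((x - mean(x))^2) + sum((y - mean(y))^2)) / (2*N - 2)) * sqrt(2/N)

T.welch  = (mean(x) - mean(y)) / SE1
T.pooled = (mean(x) - mean(y)) / SE2
```

T.welch matches t.test(x, y)$statistic and T.pooled matches t.test(x, y, var.equal=TRUE)$statistic.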

p-values

The degrees of freedom T.df is given by

  1. No assumptions: the Welch–Satterthwaite equation.
  2. Equal variance: \(N_x + N_y - 2\)
  3. Equal variance and equal sample size: \(2N-2\)
Alternative p-value
> pt(T, df=T.df, lower.tail=FALSE)
≠ 2*pt(abs(T), df=T.df, lower.tail=FALSE)
< pt(T, df=T.df)

Confidence intervals

With T.df as above, we write \(t_\alpha\)=qt(1-alpha, df=T.df) and \(t_{\alpha/2}\)=qt(1-alpha/2, df=T.df)

Alternative From To
> \(\overline{x}-\overline{y}-t_{\alpha}\cdot SE\) ∞
≠ \(\overline{x}-\overline{y}-t_{\alpha/2}\cdot SE\) \(\overline{x}-\overline{y}+t_{\alpha/2}\cdot SE\)
< -∞ \(\overline{x}-\overline{y}+t_{\alpha}\cdot SE\)

R command

t.test(x, y)

Extra parameter Values Default value
var.equal TRUE or FALSE FALSE
mu mu.0 0
alternative two.sided or greater or less two.sided
conf.level 1-alpha 0.95

Example: upper tail (>) means test of x vs. y at a significance level of 1%:

test = t.test(x, y, conf.level=0.99, alternative='greater')
test

Example: two-tailed (≠) means test of data$v for the subpopulations marked by labels 'a' and 'b' in the variable data$w at a significance level of 10%.

  1. If 'a' and 'b' are the only labels in data$w:

test = t.test(v ~ w, data, conf.level=0.9)

  2. If data$w has more labels:

test = t.test(v ~ w, data, w %in% c('a','b'), conf.level=0.9)

Effect size measure: Cohen's d:

\(d = \frac{\overline{x}-\overline{y}}{\sqrt{\frac{\sum(x-\overline{x})^2 + \sum(y-\overline{y})^2}{N_x+N_y-2}}}\)

library(effsize)
cohen.d(x, y)
cohen.d(v ~ w, data)

Paired two sample t-test

The observations in x and in y are naturally paired with each other.

Input Name
Data x, y both numeric; or data$v numeric and data$w categorical with two labels
Size N or length(x)
Significance level alpha
Confidence level 1-alpha

Requirements

The difference x-y needs to fulfill the requirements of a one-sample test, ie:

Dataset Size Range Conditions How to check
N ≤ 15 Normal distribution geom_qq
15 < N ≤ 40 Not skew, no outliers geom_histogram and geom_boxplot
40 < N ≤ hundreds No outliers geom_boxplot
hundreds < N Relatively few or small outliers geom_boxplot

Null hypothesis Population mean of differences x-y: \(\mu_\Delta = 0\)

Alternative hypothesis \(\mu_\Delta\) is [< or ≠ or >] 0

Test statistic Write \(\overline{x-y}\) for the sample mean of the differences, \(s_{\Delta}\) for the sample standard deviation of the differences.

\(T = \frac{\overline{x-y}}{s_\Delta/\sqrt{N}}\)

p-values

Alternative p-value
> pt(T, df=N-1, lower.tail=FALSE)
≠ 2*pt(abs(T), df=N-1, lower.tail=FALSE)
< pt(T, df=N-1)

Confidence intervals

Use \(t_\alpha\)=qt(1-alpha, df=N-1) and \(t_{\alpha/2}\)=qt(1-alpha/2, df=N-1)

Alternative From To
> \(\overline{x-y}-t_{\alpha}\cdot s_\Delta/\sqrt{N}\) ∞
≠ \(\overline{x-y}-t_{\alpha/2}\cdot s_\Delta/\sqrt{N}\) \(\overline{x-y}+t_{\alpha/2}\cdot s_\Delta/\sqrt{N}\)
< -∞ \(\overline{x-y}+t_{\alpha}\cdot s_\Delta/\sqrt{N}\)

R command

t.test(x, y, paired=TRUE)

Extra parameter Values Default value
mu mu.0 0
alternative two.sided or greater or less two.sided
conf.level 1-alpha 0.95

Example: upper tail (>) paired means test of x vs. y at a significance level of 1%:

test = t.test(x, y, paired=TRUE, conf.level=0.99, alternative='greater')
test

Example: two-tailed (≠) means test of data$v for the subpopulations marked by labels 'a' and 'b' in the variable data$w at a significance level of 10%.

  1. If 'a' and 'b' are the only labels in data$w:

test = t.test(v ~ w, data, paired=TRUE, conf.level=0.9)

  2. If data$w has more labels:

test = t.test(v ~ w, data, w %in% c('a','b'), paired=TRUE, conf.level=0.9)

Effect size measure: Cohen's d:

library(effsize)
cohen.d(x, y, paired=TRUE)
cohen.d(v ~ w, data, paired=TRUE)
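The paired test is exactly the one-sample test applied to the differences, which is easy to verify on made-up data:

```r
# Hypothetical paired measurements (eg before/after on the same subjects)
x = c(12.1, 11.4, 13.0, 12.5, 11.8, 12.9)
y = c(11.6, 11.0, 12.7, 12.1, 11.9, 12.2)

paired = t.test(x, y, paired = TRUE)
differences = t.test(x - y, mu = 0)
```

The two calls report identical test statistics, p-values and confidence intervals.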

Many sample means test (ANOVA)

One-way ANOVA

Input Name
Data data$v numeric and data$w categorical with at least two labels
Size N or nrow(data)
Significance level alpha
Confidence level 1-alpha

Requirements

Independent samples. In particular, no pairing. Each group approximately normal. All in-group variances approximately equal.

Condition How to check
Independence Argue about data source and data collection.
Normality QQ-plots for each group. eg ggplot(data, aes(sample=v)) + geom_qq() + facet_wrap(~w)
Variance Calculate each group variance, eg tapply(data$v, data$w, var). Try to get maximum / minimum no greater than 2. Also useful, side-by-side boxplots - eg ggplot(data, aes(x=w, y=v)) + geom_boxplot().

Null hypothesis All population means are equal.

Alternative hypothesis Not all population means are equal.

Test statistic F = MSG/MSE

p-values Take from ANOVA table

Confidence intervals Do not exist for ANOVA.

R command

model = aov(v ~ w, data)
summary(model)

or

model = lm(v ~ w, data)
anova(model)

Effect size measure: \(\eta^2 = SSG/SST\) = [between-group sum of squares] / [total sum of squares]

model.anova = anova(model)
eta.2 = model.anova[1, "Sum Sq"]/sum(model.anova[,"Sum Sq"])
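A sketch on made-up data showing that F is the ratio of the two mean squares in the ANOVA table, and how \(\eta^2\) falls out of the same table:

```r
# Hypothetical data: one numeric response, three groups
data = data.frame(
  v = c(4.1, 4.5, 3.9, 5.2, 5.6, 5.0, 6.1, 6.4, 5.9),
  w = rep(c('a', 'b', 'c'), each = 3)
)

model = lm(v ~ w, data)
model.anova = anova(model)

# F = MSG/MSE from the table's mean squares
MSG = model.anova[1, "Mean Sq"]   # between-group mean square
MSE = model.anova[2, "Mean Sq"]   # residual mean square
F.stat = MSG / MSE

# Effect size: between-group sum of squares over total sum of squares
eta.2 = model.anova[1, "Sum Sq"] / sum(model.anova[, "Sum Sq"])
```

F.stat reproduces the "F value" column of the table exactly.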

Proportions testing

One-sample tests

Normal approximation one sample proportions test

Input Name
Data n number of occurrences of a label L in a variable V
Size N or length(V)
Significance level alpha
Confidence level 1-alpha

Write \(p_L = n/N\) for the observed proportion.

Requirements \(Np_0 > 10\) and \(N(1-p_0) > 10\). Independent samples (usually OK if sample is less than 10% of total population)

Null hypothesis True probability \(p = p_0\)

Alternative hypothesis \(p\) is [> or ≠ or <] \(p_0\).

Test statistic z = (p.L - p.0) / sqrt( p.0*(1-p.0)/N ) or, in formula form,

\(z=\frac{p_L-p_0}{\sqrt{p_0(1-p_0)/N}}\)

p-values

Alternative p-value
> pnorm(z, lower.tail = FALSE)
≠ 2*pnorm(abs(z), lower.tail = FALSE)
< pnorm(z)

Confidence intervals

Write \(z_\alpha\)=qnorm(1-alpha) and \(z_{\alpha/2}\)=qnorm(1-alpha/2). Using the upper bound \(p_0(1-p_0) \le 0.25\), so that \(\sqrt{p_0(1-p_0)/N} \le 1/\sqrt{4N}\):

Alternative From To
> \(p_L-z_\alpha/\sqrt{4N}\) ∞
≠ \(p_L-z_{\alpha/2}/\sqrt{4N}\) \(p_L+z_{\alpha/2}/\sqrt{4N}\)
< -∞ \(p_L+z_\alpha/\sqrt{4N}\)

R command

prop.test(n, N)

Extra parameter Values Default value
p p.0 0.5
alternative two.sided or greater or less two.sided
conf.level 1-alpha 0.95

Example: Two-tailed test of whether occurrences of L in V are not equal to 25%, confidence level 0.95 (default).

n = sum(V == 'L')
N = length(V)
prop.test(n, N, p=0.25)

Example: Lower tail (<) test of whether the proportion of L in data$V is less than 50%, confidence level 0.95 (default).

n = sum(data$V == 'L')
N = nrow(data)
prop.test(n, N, alternative='less')
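By default prop.test applies a continuity correction; with correct=FALSE its X-squared statistic is exactly the square of the z computed by hand. A sketch on made-up counts:

```r
# Hypothetical counts: n occurrences out of N, null proportion p.0
n = 37; N = 120; p.0 = 0.25
p.L = n / N

# z statistic and two-tailed p-value by hand
z = (p.L - p.0) / sqrt(p.0 * (1 - p.0) / N)
p.value = 2 * pnorm(abs(z), lower.tail = FALSE)

# prop.test reports z^2 as X-squared; correct=FALSE switches off the
# continuity correction so it matches the hand calculation exactly
test = prop.test(n, N, p = p.0, correct = FALSE)
```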

Effect size measure: Odds Ratio

\(OR = \frac{p_L/(1-p_L)}{p_0/(1-p_0)}\)

p.L = n/N
OR = ( p.L/(1-p.L) ) / ( p.0/(1-p.0) )


Exact one sample proportions test

Input Name
Data n count of occurrences of label L in categorical variable V
Size N or length(V)
Significance level alpha
Confidence level 1-alpha

Write \(p_L = n/N\).

Requirements Sample independent.

Null hypothesis True probability = \(p_0\).

Alternative hypothesis True probability is [> or ≠ or <] p.0

Test statistic n

p-values

Alternative p-value
> pbinom(n-1, N, p.0, lower.tail=FALSE)
< pbinom(n, N, p.0)

(pbinom is inclusive, so \(P(X \ge n)\) is pbinom(n-1, N, p.0, lower.tail=FALSE).)

For ≠, we calculate both tails separately and add them together, mirroring n around the expected count N*p.0:

hi.lo = c(n, 2*N*p.0 - n)
p.value = pbinom(max(hi.lo) - 1, N, p.0, lower.tail=FALSE) + pbinom(min(hi.lo), N, p.0)

(binom.test uses a slightly different, likelihood-based rule for ≠, so its two-sided p-value can differ a little.)
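The one-sided tails agree exactly with binom.test, which is a quick way to check the hand formulas on made-up counts (pbinom counts inclusively, hence the n-1 in the upper tail):

```r
# Hypothetical counts
n = 37; N = 120; p.0 = 0.25

# Upper tail: P(X >= n) under the null; pbinom is inclusive, so use n-1
p.greater = pbinom(n - 1, N, p.0, lower.tail = FALSE)

# Lower tail: P(X <= n)
p.less = pbinom(n, N, p.0)
```

Both match the p-values of binom.test(n, N, p=0.25) with alternative='greater' and alternative='less' respectively.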

Confidence intervals

Complicated. Use binom.test to calculate.

R command

binom.test(n, N)

Extra parameter Values Default value
p p.0 0.5
alternative two.sided or greater or less two.sided
conf.level 1-alpha 0.95

Example: Two-tailed test of whether occurrences of L in V are not equal to 25%, confidence level 0.95 (default).

n = sum(V == 'L')
N = length(V)
binom.test(n, N, p=0.25)

Example: Lower tail (<) test of whether the proportion of L in data$V is less than 50%, confidence level 0.95 (default).

n = sum(data$V == 'L')
N = nrow(data)
binom.test(n, N, alternative='less')

Effect size measure: Odds Ratio

\(OR = \frac{p_L/(1-p_L)}{p_0/(1-p_0)}\)

p.L = n/N
OR = ( p.L/(1-p.L) ) / ( p.0/(1-p.0) )


One sample median test

Input Name
Data x numeric
Size N or length(x)
Significance level alpha
Confidence level 1-alpha

The median test will test for whether the true median of x differs significantly from a null hypothesis value m.0.

The one-sample median test is a one-sample proportions test of n = sum(x > m.0) occurrences out of N against p.0 = 0.5. Deviation up means the true median is higher; deviation down, that the true median is lower.

Confidence intervals

Confidence intervals from the one-sample tests tell us likely values for the quantile actually represented by m.0.

R command

prop.test(sum(x > m.0), N)

or

binom.test(sum(x > m.0), N)

Extra parameter Values Default value
alternative two.sided or greater or less two.sided
conf.level 1-alpha 0.95

Example: two-tailed test of whether the median of x differs from m.0 = 10:

binom.test(sum(x > 10), length(x))

Effect size measure: Odds Ratio. Write p = sum(x > m.0) / N, then OR = p/(1-p)/( 0.5/0.5 ) = p/(1-p)


Two sample proportions test

Normal approximation proportion test for two independent samples

Input Name
Data n.L count of L in V1; n.K count of K in different variable V2
Size N1 = length(V1) and N2 = length(V2)
Significance level alpha
Confidence level 1-alpha

Write

\(p_L = n_L/N_1\)

\(p_K = n_K/N_2\)

\(p_{joint} = (n_L+n_K)/(N_1+N_2)\)

or

p.L = n.L / N1
p.K = n.K / N2
p.joint = (n.L+n.K) / (N1+N2)

Requirements Each of \(N_1p_{joint}\), \(N_1(1-p_{joint})\), \(N_2p_{joint}\) and \(N_2(1-p_{joint})\) needs to be greater than 10. Independent samples: V1 and V2 are each at most 10% of their respective populations, and V1 is not the same sample as V2.

Null hypothesis True proportion of L is equal to the true proportion of K

Alternative hypothesis True proportion of L is [< or ≠ or >] true proportion of K

Test statistic \(z = \frac{p_L-p_K}{\sqrt{p_{joint}(1-p_{joint})\left(\frac{1}{N_1}+\frac{1}{N_2}\right)}}\) (the pooled \(p_{joint}\) is used because the null hypothesis assumes equal proportions)

p-values

Alternative p-value
> pnorm(z, lower.tail=FALSE)
≠ 2*pnorm(abs(z), lower.tail=FALSE)
< pnorm(z)

Confidence intervals

Write \(z_\alpha\)=qnorm(1-alpha), \(z_{\alpha/2}\)=qnorm(1-alpha/2), and \(SE = \sqrt{\frac{p_L(1-p_L)}{N_1} + \frac{p_K(1-p_K)}{N_2}}\):

Alternative From To
> \(p_L - p_K - z_\alpha\cdot SE\) ∞
≠ \(p_L - p_K - z_{\alpha/2}\cdot SE\) \(p_L - p_K + z_{\alpha/2}\cdot SE\)
< -∞ \(p_L - p_K + z_\alpha\cdot SE\)

R command

prop.test(c(n.L, n.K), c(N1, N2))

Extra parameter Values Default value
alternative two.sided or greater or less two.sided
conf.level 1-alpha 0.95

Example: two-tailed test of whether the proportion of L in V1 equals the proportion of K in V2:

n.L = sum(V1 == 'L')
n.K = sum(V2 == 'K')
prop.test(c(n.L, n.K), c(length(V1), length(V2)))

Effect size measure: Odds Ratio

\(OR = \frac{p_L/(1-p_L)}{p_K/(1-p_K)}\)

OR = ( p.L/(1-p.L) ) / ( p.K/(1-p.K) )
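The hand-computed z (using the pooled standard error, since the null hypothesis assumes equal proportions) squares to prop.test's X-squared once the continuity correction is switched off. A sketch on made-up counts:

```r
# Hypothetical counts from two independent samples
n.L = 45; N1 = 150
n.K = 30; N2 = 140

p.L = n.L / N1
p.K = n.K / N2
p.joint = (n.L + n.K) / (N1 + N2)

# z with the pooled standard error
z = (p.L - p.K) / sqrt(p.joint * (1 - p.joint) * (1/N1 + 1/N2))

# prop.test's X-squared is z^2 when correct=FALSE
test = prop.test(c(n.L, n.K), c(N1, N2), correct = FALSE)

# Effect size
OR = (p.L/(1 - p.L)) / (p.K/(1 - p.K))
```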


Exact test for two proportions from the same variable

If K and L are both values of V, independence for two-sample proportions test fails. Instead, we can view K and L as a restricted sample, only counting the part of V that consists entirely of K and L. Then equal proportions turns into a one-sample proportions test against p.0 = 0.5.

Input Name
Data n.K, n.L counts of labels K and L in a categorical variable V.
Size N or length(V)
Significance level alpha
Confidence level 1-alpha

Write p.K = n.K/N and p.L = n.L/N.

Run a one-sample proportions test with n = n.L, N = n.L + n.K and p.0 = 0.5. Check requirements, and generate confidence intervals etc. as in the one-sample test you choose.

R command

n.K = sum(V == 'K')
n.L = sum(V == 'L')
prop.test(n.L, n.L + n.K)

or

n.K = sum(V == 'K')
n.L = sum(V == 'L')
binom.test(n.L, n.L + n.K)
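A small self-contained demonstration on a made-up vector V:

```r
# Hypothetical categorical data with labels K and L
V = c('K', 'K', 'L', 'K', 'L', 'L', 'L', 'K', 'K', 'L', 'L', 'L')

n.K = sum(V == 'K')   # 5
n.L = sum(V == 'L')   # 7

# Test whether L and K occur equally often among the K/L observations
test = binom.test(n.L, n.L + n.K)
```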

Many sample proportions test

Goodness of fit test

Input Name
Data x categorical or n = table(x) vector of counts; p list of values between 0 and 1.
Size N or length(x)
Significance level alpha
Confidence level 1-alpha

p is the null hypothesis list of expected proportions. We require sum(p) to be equal to 1.

We require length(n) == length(p) and call this number k.

Requirements

  • Values and pairs of values are independent
  • min(N*p) >= 5

Null hypothesis The true proportions of labels in x is given by the vector of proportions p.

Alternative hypothesis The true proportions are significantly different from the ones in p.

Test statistic X.2 = sum( (observed-expected)^2/expected ) = sum( (n - p*N)^2/(p*N) )

p-value pchisq(X.2, df=k-1, lower.tail=FALSE)

Confidence intervals

For the goodness of fit test, there are no sensible confidence intervals.

R command

chisq.test(x)

or

chisq.test(n)

Extra parameter Values Default value
p p All values set to 1/k

Effect size measure: phi as in

test = chisq.test(n)
phi = sqrt(test$statistic / N)
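A sketch on made-up counts showing that the hand-computed X.2 and p-value match chisq.test:

```r
# Hypothetical observed counts and null proportions
n = c(18, 22, 30, 30)
p = c(0.2, 0.2, 0.3, 0.3)
N = sum(n)
k = length(n)

# Chi-square statistic and p-value by hand
expected = N * p
X.2 = sum((n - expected)^2 / expected)
p.value = pchisq(X.2, df = k - 1, lower.tail = FALSE)

# Built-in test and effect size
test = chisq.test(n, p = p)
phi = sqrt(unname(test$statistic) / N)
```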


Two way table independence test / chi-squared test

Input Name
Data x, y both categorical, or observed a two-way table
Size N or length(x)
Significance level alpha
Confidence level 1-alpha

We write r for the number of labels in x and s for the number of labels in y.

Requirements

  • Values and pairs of values are independent
  • All expected counts are at least 5

Easiest way to check expected counts is to

  1. Run the test using the command below, say test = chisq.test(x,y)
  2. Check the expected counts in test$expected

Null hypothesis The variables x and y are independent.

Alternative hypothesis There is some dependency between the variables x and y.

Test statistic

To explain this code requires a bit of linear algebra: %*% is matrix multiplication, and t() transposes a vector into a row vector, so the product below is the outer product of the two margin vectors.

observed = table(x,y)
expected = margin.table(observed, 1) %*% t( margin.table(observed, 2) ) / margin.table(observed)
X.2 = sum( (observed - expected)^2/expected )

p-value pchisq(X.2, df=(r-1)*(s-1), lower.tail=FALSE)

Confidence intervals

For the chi square test, there are no sensible confidence intervals.

R command

chisq.test(x, y)

or

chisq.test(observed)
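A sketch on a made-up 2×3 table verifying that the hand-computed expected counts and X.2 agree with chisq.test. (For 2×2 tables you would also need correct=FALSE, since chisq.test then applies a continuity correction by default.)

```r
# Hypothetical paired categorical data
x = rep(c('a', 'a', 'a', 'b', 'b', 'b'), times = c(20, 15, 25, 10, 30, 20))
y = rep(c('u', 'v', 'w', 'u', 'v', 'w'), times = c(20, 15, 25, 10, 30, 20))

# Expected counts and chi-square statistic by hand
observed = table(x, y)
expected = margin.table(observed, 1) %*% t(margin.table(observed, 2)) /
           margin.table(observed)
X.2 = sum((observed - expected)^2 / expected)

# Built-in test reports the same statistic and expected counts
test = chisq.test(x, y)
```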

Effect size measure: phi in

test = chisq.test(observed)
phi = sqrt(test$statistic / N)

Linear regression and correlation test

Test name: Linear regression

Input Name
Data data$v, data$w0 (and data$w1 and ...) all numeric and paired
Size N or length(x)
Significance level alpha
Confidence level 1-alpha

Requirements

Condition Meaning How to test
Linearity Data should follow a linear trend. If data is "bendy", other methods are needed. Scatterplot. Residual plot (residuals vs. fitted values).
Normal residuals Residuals should be approximately normal. Outliers may indicate highly influential points that overshadow any information in the regression. QQ-plot of residuals.
Constant variability Variability of points is roughly constant across different values of the explanatory variable(s). Residual plot.
Independent observations Observations need to be independent. For instance, timeseries are not suitable since subsequent observations are highly correlated. Argue about source of data.

Null hypothesis \(\beta_k=0\) for the coefficients \(\beta_k\) of the regression.

Alternative hypothesis

  1. Not all \(\beta_k\) are 0. [F-test, ANOVA table]
  2. One specific \(\beta_k\) is [greater / less / not equal] to a specific value [t-test]

Test statistic F or T depending on which test is interesting. Both reported in summary(model).

p-values

As for t-test and as for ANOVA.

Confidence intervals

As for t-test.

R command

model = lm(v ~ w0, data)

or

model = lm(v ~ w0 + w1 + ... + wk, data)

Both F- and t-tests are reported by summary(model).

Model info model$
Residuals residuals
Coefficients coefficients
Fitted values fitted.values
Model info summary(model)$
Coefficient t-test estimate coefficients[,"Estimate"]
Coefficient t-test std.err. coefficients[,"Std. Error"]
Coefficient t-test T coefficients[,"t value"]
Coefficient t-test p coefficients[,"Pr(>|t|)"]
Correlation square \(r^2\) (use for effect size) r.squared

You can use the estimate and standard error to generate confidence intervals for each coefficient

Model info anova(model)$
F `F value`
F-test p-value `Pr(>F)`
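The coefficient estimates and standard errors from summary(model) turn into confidence intervals exactly as in the one-sample t-test; confint does the same calculation. A sketch on made-up data with a linear trend:

```r
# Hypothetical data following a rough linear trend
set.seed(1)
w0 = 1:20
v = 3 + 0.5 * w0 + rnorm(20, sd = 0.8)
data = data.frame(v, w0)

model = lm(v ~ w0, data)
coefs = summary(model)$coefficients

# 95% confidence interval for each coefficient: estimate +/- t_{alpha/2} * std.err
t.half = qt(0.975, df = model$df.residual)
ci.from = coefs[, "Estimate"] - t.half * coefs[, "Std. Error"]
ci.to   = coefs[, "Estimate"] + t.half * coefs[, "Std. Error"]

# Cohen's f^2 from the reported r^2
r.2 = summary(model)$r.squared
f.2 = r.2 / (1 - r.2)
```

ci.from and ci.to reproduce the two columns of confint(model).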

Effect size measure: Cohen's \(f^2 = r^2 / (1-r^2)\)


Test name: Correlation test

Input Name
Data x, y both numeric and paired; or data$v, data$w both numeric and paired
Size N or length(x)
Significance level alpha
Confidence level 1-alpha

Requirements

  • Values and pairs of values are independent
  • For each x, the corresponding y are normal with equal variances
  • For each y, the corresponding x are normal with equal variances
  • Either x is a linear function of y, or y is linear function of x

If you know which is the response, the linear regression is better. If you don't know, the correlation test lets you test for the presence of an effect without committing to an explanatory and response variable.

Null hypothesis True correlation of x and y is 0

Alternative hypothesis True correlation is [> or < or ≠] 0

Test statistic Write \(r_{xy}\) for the correlation.

\(T = \frac{r_{xy}\sqrt{N-2}}{\sqrt{1-r_{xy}^2}}\)

Alternative p-value
> pt(T, df=N-2, lower.tail=FALSE)
≠ 2*pt(abs(T), df=N-2, lower.tail=FALSE)
< pt(T, df=N-2)

Confidence intervals

Complicated: cor.test derives them via Fisher's transformation \(\operatorname{atanh}(r_{xy})\), which is approximately normal with standard error \(1/\sqrt{N-3}\). Use cor.test to calculate.

R command

cor.test(~ v + w, data, subset.condition) where subset.condition is optional but useful to pick out good subsets

cor.test(x, y)

Extra parameter Values Default value
alternative two.sided or greater or less two.sided
conf.level 1-alpha 0.95

Effect size measure: \(f^2 = r_{xy}^2 / (1-r_{xy}^2)\) where \(r_{xy}\) = cor(x, y).
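The T statistic formula can be checked against cor.test on made-up data:

```r
# Hypothetical correlated data
set.seed(2)
x = rnorm(30)
y = x + rnorm(30)
N = length(x)

# Test statistic by hand
r.xy = cor(x, y)
T = r.xy * sqrt(N - 2) / sqrt(1 - r.xy^2)

# Built-in test reports the same t statistic and correlation estimate
test = cor.test(x, y)

# Effect size
f.2 = r.xy^2 / (1 - r.xy^2)
```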