Statistical Tests

These are the tests we discuss during this course. Each test lists the conditions to check before use, how to calculate the test statistic and results by hand, and the relevant R functions that do the job.

Means testing

One sample tests

One sample means test

Test name: One sample t-test

Input Name
Data x numeric
Size N or length(x)
Significance level \(\alpha\) (alpha)
Confidence level \(1-\alpha\)

Requirements

Dataset Size Range Conditions How to check
N ≤ 15 Normal distribution geom_qq
15 < N ≤ 40 Not skew, no outliers geom_histogram and geom_boxplot
40 < N ≤ hundreds No outliers geom_boxplot
hundreds < N Relatively few or small outliers geom_boxplot

Null hypothesis True population mean is \(\mu_0\) (mu.0)

Alternative hypothesis True population mean is [< or ≠ or >] \(\mu_0\).

Test statistic T = (mean(x) - mu.0) / ( sd(x)/sqrt(N) ) or, writing \(s_x\) for the sample standard deviation, \(\overline{x}\) for the sample mean, \(T = \frac{\overline{x} - \mu_0}{s_x/\sqrt{N}}\)

p-values

Alternative p-value
> pt(T, df=N-1, lower.tail=FALSE)
≠ 2*pt(abs(T), df=N-1, lower.tail=FALSE)
< pt(T, df=N-1)

Confidence intervals

Use \(t_\alpha\)=qt(1-alpha, df=N-1) and \(t_{\alpha/2}\)=qt(1-alpha/2, df=N-1)

Alternative From To
> \(\overline{x}-t_{\alpha}\cdot s_x/\sqrt{N}\) ∞
≠ \(\overline{x}-t_{\alpha/2}\cdot s_x/\sqrt{N}\) \(\overline{x}+t_{\alpha/2}\cdot s_x/\sqrt{N}\)
< -∞ \(\overline{x}+t_{\alpha}\cdot s_x/\sqrt{N}\)

R command

t.test(x)

Extra parameter Values Default value
mu \(\mu_0\) 0
alternative two.sided or greater or less two.sided
conf.level \(1-\alpha\) 0.95

Example: upper tail (>) means test of x vs. \(\mu_0 = 10\) at a significance level of 1%:

test = t.test(x, mu=10, conf.level=0.99, alternative='greater')
test

Effect size measure: Cohen's d:

\(d = \frac{\overline{x}-\mu_0}{s_x}\)

d = (mean(x) - mu.0) / sd(x)  # or
d = test$statistic / sqrt(test$parameter + 1)
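As a sanity check, the hand formulas above can be compared with t.test directly. Everything below is a made-up example vector, with mu.0 = 10 as in the example:

```r
# Hypothetical data for illustration
x = c(10.2, 11.5, 9.8, 12.1, 10.7, 11.9, 10.4, 11.1)
mu.0 = 10
N = length(x)

# Test statistic by hand
T = (mean(x) - mu.0) / (sd(x) / sqrt(N))

# Upper-tail p-value and lower confidence bound by hand (alpha = 0.01)
p.value = pt(T, df = N - 1, lower.tail = FALSE)
t.alpha = qt(1 - 0.01, df = N - 1)
ci.from = mean(x) - t.alpha * sd(x) / sqrt(N)

# The built-in test reports the same numbers
test = t.test(x, mu = mu.0, conf.level = 0.99, alternative = 'greater')

# Cohen's d by hand
d = (mean(x) - mu.0) / sd(x)
```

The hand-computed T, p-value and confidence bound agree exactly with test$statistic, test$p.value and test$conf.int[1].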

Two sample means test

Two sample t-test

Input Name
Data x, y both numeric; or data$v numeric and data$w categorical with two labels
Size Nx, Ny or length(x), length(y)
Significance level alpha
Confidence level 1-alpha

Requirements

Both x and y need to be appropriate for a one-sample t-test, ie:

Dataset Size Range Conditions How to check
N ≤ 15 Normal distribution geom_qq
15 < N ≤ 40 Not skew, no outliers geom_histogram and geom_boxplot
40 < N ≤ hundreds No outliers geom_boxplot
hundreds < N Relatively few or small outliers geom_boxplot

Null hypothesis Population means are equal: \(\mu_x = \mu_y\)

Alternative hypothesis \(\mu_x\) is [< or ≠ or >] to \(\mu_y\)

Test statistic Different depending on what assumptions you can make on your data. We write \(\overline{x}\) and \(\overline{y}\) for the sample means, \(s_x\) and \(s_y\) for sample standard deviations.

In each case, it is

\(T = \frac{\overline{x} - \overline{y}}{SE}\)

with a different SE depending on what you can assume:

  1. No assumptions: \(SE = \sqrt{\frac{s_x^2}{N_x}+\frac{s_y^2}{N_y}}\)
  2. Equal variance: \(SE = \sqrt{\frac{\sum(x-\overline{x})^2 + \sum(y-\overline{y})^2}{N_x+N_y-2}}\cdot\sqrt{\frac{1}{N_x}+\frac{1}{N_y}}\)
  3. Equal variance and equal size N: \(SE = \sqrt{\frac{\sum(x-\overline{x})^2+\sum(y-\overline{y})^2}{2N-2}}\cdot\sqrt{\frac{2}{N}}\)
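A minimal sketch of the three SE variants on made-up x and y of equal size (so cases 2 and 3 coincide). Case 1 is what t.test computes by default, case 2 what it computes with var.equal=TRUE:

```r
# Hypothetical data for illustration
x = c(5.1, 6.3, 5.8, 6.0, 5.5, 6.4)
y = c(4.9, 5.2, 5.6, 4.8, 5.3, 5.0)
Nx = length(x); Ny = length(y)

# 1. No assumptions (Welch)
SE1 = sqrt(sd(x)^2/Nx + sd(y)^2/Ny)

# 2. Equal variance (pooled standard deviation)
SE2 = sqrt((sum((x - mean(x))^2) + sum((y - mean(y))^2)) / (Nx + Ny - 2)) *
      sqrt(1/Nx + 1/Ny)

# 3. Equal variance and equal size N
N = Nx
SE3 = sqrt((sum((x - mean(x))^2) + sum((y - mean(y))^2)) / (2*N - 2)) * sqrt(2/N)

T.welch  = (mean(x) - mean(y)) / SE1
T.pooled = (mean(x) - mean(y)) / SE2
```

T.welch matches t.test(x, y)$statistic and T.pooled matches t.test(x, y, var.equal=TRUE)$statistic.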

p-values

The degrees of freedom T.df is given by

  1. No assumptions: the Welch–Satterthwaite equation.
  2. Equal variance: \(N_x + N_y - 2\)
  3. Equal variance and equal sample size: \(2N-2\)
Alternative p-value
> pt(T, df=T.df, lower.tail=FALSE)
≠ 2*pt(abs(T), df=T.df, lower.tail=FALSE)
< pt(T, df=T.df)

Confidence intervals

With T.df as above, we write \(t_\alpha\)=qt(1-alpha, df=T.df) and \(t_{\alpha/2}\)=qt(1-alpha/2, df=T.df)

Alternative From To
> \(\overline{x}-\overline{y}-t_{\alpha}\cdot SE\) ∞
≠ \(\overline{x}-\overline{y}-t_{\alpha/2}\cdot SE\) \(\overline{x}-\overline{y}+t_{\alpha/2}\cdot SE\)
< -∞ \(\overline{x}-\overline{y}+t_{\alpha}\cdot SE\)

R command

t.test(x, y)

Extra parameter Values Default value
var.equal TRUE or FALSE FALSE
mu mu.0 0
alternative two.sided or greater or less two.sided
conf.level 1-alpha 0.95

Example: upper tail (>) means test of x vs. y at a significance level of 1%:

test = t.test(x, y, conf.level=0.99, alternative='greater')
test

Example: two-tailed (≠) means test of data$v for the subpopulations marked by labels 'a' and 'b' in the variable data$w at a significance level of 10%.

  1. If 'a' and 'b' are the only labels in data$w:

test = t.test(v ~ w, data, conf.level=0.9)

  2. If data$w has more labels:

test = t.test(v ~ w, data, w %in% c('a','b'), conf.level=0.9)

Effect size measure: Cohen's d:

\(d = \frac{\overline{x}-\overline{y}}{\sqrt{\frac{\sum(x-\overline{x})^2 + \sum(y-\overline{y})^2}{N_x+N_y-2}}}\)

library(effsize)
cohen.d(x, y)
cohen.d(v ~ w, data)

Paired two sample t-test

The observations in x and in y are naturally paired with each other.

Input Name
Data x, y both numeric; or data$v numeric and data$w categorical with two labels
Size N or length(x)
Significance level alpha
Confidence level 1-alpha

Requirements

The difference x-y needs to fulfill the requirements of a one-sample test, ie:

Dataset Size Range Conditions How to check
N ≤ 15 Normal distribution geom_qq
15 < N ≤ 40 Not skew, no outliers geom_histogram and geom_boxplot
40 < N ≤ hundreds No outliers geom_boxplot
hundreds < N Relatively few or small outliers geom_boxplot

Null hypothesis Population mean of differences x-y: \(\mu_\Delta = 0\)

Alternative hypothesis \(\mu_\Delta\) is [< or ≠ or >] 0

Test statistic Write \(\overline{x-y}\) for the sample mean of the differences, \(s_{\Delta}\) for the sample standard deviation of the differences.

\(T = \frac{\overline{x-y}}{s_\Delta/\sqrt{N}}\)

p-values

Alternative p-value
> pt(T, df=N-1, lower.tail=FALSE)
≠ 2*pt(abs(T), df=N-1, lower.tail=FALSE)
< pt(T, df=N-1)

Confidence intervals

Use \(t_\alpha\)=qt(1-alpha, df=N-1) and \(t_{\alpha/2}\)=qt(1-alpha/2, df=N-1)

Alternative From To
> \(\overline{x-y}-t_{\alpha}\cdot s_\Delta/\sqrt{N}\) ∞
≠ \(\overline{x-y}-t_{\alpha/2}\cdot s_\Delta/\sqrt{N}\) \(\overline{x-y}+t_{\alpha/2}\cdot s_\Delta/\sqrt{N}\)
< -∞ \(\overline{x-y}+t_{\alpha}\cdot s_\Delta/\sqrt{N}\)

R command

t.test(x, y, paired=TRUE)

Extra parameter Values Default value
mu mu.0 0
alternative two.sided or greater or less two.sided
conf.level 1-alpha 0.95

Example: upper tail (>) paired means test of x vs. y at a significance level of 1%:

test = t.test(x, y, paired=TRUE, conf.level=0.99, alternative='greater')
test

Example: two-tailed (≠) means test of data$v for the subpopulations marked by labels 'a' and 'b' in the variable data$w at a significance level of 10%.

  1. If 'a' and 'b' are the only labels in data$w:

test = t.test(v ~ w, data, paired=TRUE, conf.level=0.9)

  2. If data$w has more labels:

test = t.test(v ~ w, data, w %in% c('a','b'), paired=TRUE, conf.level=0.9)

Effect size measure: Cohen's d:

library(effsize)
cohen.d(x, y, paired=TRUE)
cohen.d(v ~ w, data, paired=TRUE)
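The paired test is exactly the one-sample test applied to the differences, which is easy to verify on made-up data:

```r
# Hypothetical paired measurements (eg before/after on the same subjects)
x = c(12.1, 11.4, 13.0, 12.5, 11.8, 12.9)
y = c(11.6, 11.0, 12.7, 12.1, 11.9, 12.2)

paired = t.test(x, y, paired = TRUE)
differences = t.test(x - y, mu = 0)
```

The two calls report identical test statistics, p-values and confidence intervals.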

Many sample means test (ANOVA)

One-way ANOVA

Input Name
Data data$v numeric and data$w categorical with at least two labels
Size N or nrow(data)
Significance level alpha
Confidence level 1-alpha

Requirements

Independent samples. In particular, no pairing. Each group approximately normal. All in-group variances approximately equal.

Condition How to check
Independence Argue about data source and data collection.
Normality QQ-plots for each group. eg ggplot(data, aes(sample=v)) + geom_qq() + facet_wrap(~w)
Variance Calculate each group variance, eg tapply(data$v, data$w, var). Try to get maximum / minimum no greater than 2. Also useful, side-by-side boxplots - eg ggplot(data, aes(x=w, y=v)) + geom_boxplot().

Null hypothesis All population means are equal.

Alternative hypothesis Not all population means are equal.

Test statistic F = MSG/MSE

p-values Take from ANOVA table

Confidence intervals Do not exist for ANOVA.

R command

model = aov(v ~ w, data)
summary(model)

or

model = lm(v ~ w, data)
anova(model)

Effect size measure: \(\eta^2 = SSG/SST\) = [between-group sum of squares] / [total sum of squares]

model.anova = anova(model)
eta.2 = model.anova[1, "Sum Sq"]/sum(model.anova[,"Sum Sq"])
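A sketch on made-up data showing that F is the ratio of the two mean squares in the ANOVA table, and how \(\eta^2\) falls out of the same table:

```r
# Hypothetical data: one numeric response, three groups
data = data.frame(
  v = c(4.1, 4.5, 3.9, 5.2, 5.6, 5.0, 6.1, 6.4, 5.9),
  w = rep(c('a', 'b', 'c'), each = 3)
)

model = lm(v ~ w, data)
model.anova = anova(model)

# F = MSG/MSE from the table's mean squares
MSG = model.anova[1, "Mean Sq"]   # between-group mean square
MSE = model.anova[2, "Mean Sq"]   # residual mean square
F.stat = MSG / MSE

# Effect size: between-group sum of squares over total sum of squares
eta.2 = model.anova[1, "Sum Sq"] / sum(model.anova[, "Sum Sq"])
```

F.stat reproduces the "F value" column of the table exactly.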

Proportions testing

One-sample tests

Normal approximation one sample proportions test

Input Name
Data n number of occurrences of a label L in a variable V
Size N or length(V)
Significance level alpha
Confidence level 1-alpha

Write \(p_L = n/N\) for the observed proportion.

Requirements \(Np_0 > 10\) and \(N(1-p_0) > 10\). Independent samples (usually OK if sample is less than 10% of total population)

Null hypothesis True probability \(p = p_0\)

Alternative hypothesis \(p\) is [> or ≠ or <] \(p_0\).

Test statistic z = (p.L - p.0) / sqrt( p.0*(1-p.0)/N ) or, in formula form,

\(z=\frac{p_L-p_0}{\sqrt{p_0(1-p_0)/N}}\)

p-values

Alternative p-value
> pnorm(z, lower.tail = FALSE)
≠ 2*pnorm(abs(z), lower.tail = FALSE)
< pnorm(z)

Confidence intervals

Write \(z_\alpha\)=qnorm(1-alpha) and \(z_{\alpha/2}\)=qnorm(1-alpha/2). Using the upper bound \(p_0(1-p_0) \le 0.25\), so that \(\sqrt{p_0(1-p_0)/N} \le 1/\sqrt{4N}\):

Alternative From To
> \(p_L-z_\alpha/\sqrt{4N}\) ∞
≠ \(p_L-z_{\alpha/2}/\sqrt{4N}\) \(p_L+z_{\alpha/2}/\sqrt{4N}\)
< -∞ \(p_L+z_\alpha/\sqrt{4N}\)

R command

prop.test(n, N)

Extra parameter Values Default value
p p.0 0.5
alternative two.sided or greater or less two.sided
conf.level 1-alpha 0.95

Example: Two-tailed test of whether occurrences of L in V are not equal to 25%, confidence level 0.95 (default).

n = sum(V == 'L')
N = length(V)
prop.test(n, N, p=0.25)

Example: Lower tail (<) test of whether the proportion of L in data$V is less than 50%, confidence level 0.95 (default).

n = sum(data$V == 'L')
N = nrow(data)
prop.test(n, N, alternative='less')
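By default prop.test applies a continuity correction; with correct=FALSE its X-squared statistic is exactly the square of the z computed by hand. A sketch on made-up counts:

```r
# Hypothetical counts: n occurrences out of N, null proportion p.0
n = 37; N = 120; p.0 = 0.25
p.L = n / N

# z statistic and two-tailed p-value by hand
z = (p.L - p.0) / sqrt(p.0 * (1 - p.0) / N)
p.value = 2 * pnorm(abs(z), lower.tail = FALSE)

# prop.test reports z^2 as X-squared; correct=FALSE switches off the
# continuity correction so it matches the hand calculation exactly
test = prop.test(n, N, p = p.0, correct = FALSE)
```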

Effect size measure: Odds Ratio

\(OR = \frac{p_L/(1-p_L)}{p_0/(1-p_0)}\)

p.L = n/N
OR = ( p.L/(1-p.L) ) / ( p.0/(1-p.0) )


Exact one sample proportions test

Input Name
Data n count of occurrences of label L in categorical variable V
Size N or length(V)
Significance level alpha
Confidence level 1-alpha

Write \(p_L = n/N\).

Requirements Sample independent.

Null hypothesis True probability = \(p_0\).

Alternative hypothesis True probability is [> or ≠ or <] p.0

Test statistic n

p-values

Alternative p-value
> pbinom(n-1, N, p.0, lower.tail=FALSE)
< pbinom(n, N, p.0)

(pbinom is inclusive, so \(P(X \ge n)\) is pbinom(n-1, N, p.0, lower.tail=FALSE).)

For ≠, we calculate both tails separately and add them together, mirroring n around the expected count N*p.0:

hi.lo = c(n, 2*N*p.0 - n)
p.value = pbinom(max(hi.lo) - 1, N, p.0, lower.tail=FALSE) + pbinom(min(hi.lo), N, p.0)

(binom.test uses a slightly different, likelihood-based rule for ≠, so its two-sided p-value can differ a little.)
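The one-sided tails agree exactly with binom.test, which is a quick way to check the hand formulas on made-up counts (pbinom counts inclusively, hence the n-1 in the upper tail):

```r
# Hypothetical counts
n = 37; N = 120; p.0 = 0.25

# Upper tail: P(X >= n) under the null; pbinom is inclusive, so use n-1
p.greater = pbinom(n - 1, N, p.0, lower.tail = FALSE)

# Lower tail: P(X <= n)
p.less = pbinom(n, N, p.0)
```

Both match the p-values of binom.test(n, N, p=0.25) with alternative='greater' and alternative='less' respectively.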

Confidence intervals

Complicated. Use binom.test to calculate.

R command

binom.test(n, N)

Extra parameter Values Default value
p p.0 0.5
alternative two.sided or greater or less two.sided
conf.level 1-alpha 0.95

Example: Two-tailed test of whether occurrences of L in V are not equal to 25%, confidence level 0.95 (default).

n = sum(V == 'L')
N = length(V)
binom.test(n, N, p=0.25)

Example: Lower tail (<) test of whether the proportion of L in data$V is less than 50%, confidence level 0.95 (default).

n = sum(data$V == 'L')
N = nrow(data)
binom.test(n, N, alternative='less')

Effect size measure: Odds Ratio

\(OR = \frac{p_L/(1-p_L)}{p_0/(1-p_0)}\)

p.L = n/N
OR = ( p.L/(1-p.L) ) / ( p.0/(1-p.0) )


One sample median test

Input Name
Data x numeric
Size N or length(x)
Significance level alpha
Confidence level 1-alpha

The median test will test for whether the true median of x differs significantly from a null hypothesis value m.0.

The one-sample median test is a one-sample proportions test of n = sum(x > m.0) occurrences out of N against p.0 = 0.5. Deviation up means the true median is higher; deviation down, that the true median is lower.

Confidence intervals

Confidence intervals from the one-sample tests tell us likely values for the quantile actually represented by m.0.

R command

prop.test(sum(x > m.0), N)

or

binom.test(sum(x > m.0), N)

Extra parameter Values Default value
alternative two.sided or greater or less two.sided
conf.level 1-alpha 0.95

Example: two-tailed test of whether the median of x differs from m.0 = 10:

binom.test(sum(x > 10), length(x))

Effect size measure: Odds Ratio. Write p = sum(x > m.0) / N, then OR = p/(1-p)/( 0.5/0.5 ) = p/(1-p)


Two sample proportions test

Normal approximation proportion test for two independent samples

Input Name
Data n.L count of L in V1; n.K count of K in different variable V2
Size N1 = length(V1) and N2 = length(V2)
Significance level alpha
Confidence level 1-alpha

Write

\(p_L = n_L/N_1\)

\(p_K = n_K/N_2\)

\(p_{joint} = (n_L+n_K)/(N_1+N_2)\)

or

p.L = n.L / N1
p.K = n.K / N2
p.joint = (n.L+n.K) / (N1+N2)

Requirements Each of \(N_1p_{joint}\), \(N_1(1-p_{joint})\), \(N_2p_{joint}\) and \(N_2(1-p_{joint})\) needs to be greater than 10. Independent samples: V1 and V2 are each at most 10% of their respective populations, and V1 is not the same sample as V2.

Null hypothesis True proportion of L is equal to the true proportion of K

Alternative hypothesis True proportion of L is [< or ≠ or >] true proportion of K

Test statistic \(z = \frac{p_L-p_K}{\sqrt{p_{joint}(1-p_{joint})\left(\frac{1}{N_1}+\frac{1}{N_2}\right)}}\) (the pooled \(p_{joint}\) is used because the null hypothesis assumes equal proportions)

p-values

Alternative p-value
> pnorm(z, lower.tail=FALSE)
≠ 2*pnorm(abs(z), lower.tail=FALSE)
< pnorm(z)

Confidence intervals

Write \(z_\alpha\)=qnorm(1-alpha), \(z_{\alpha/2}\)=qnorm(1-alpha/2), and \(SE = \sqrt{\frac{p_L(1-p_L)}{N_1} + \frac{p_K(1-p_K)}{N_2}}\):

Alternative From To
> \(p_L - p_K - z_\alpha\cdot SE\) ∞
≠ \(p_L - p_K - z_{\alpha/2}\cdot SE\) \(p_L - p_K + z_{\alpha/2}\cdot SE\)
< -∞ \(p_L - p_K + z_\alpha\cdot SE\)

R command

prop.test(c(n.L, n.K), c(N1, N2))

Extra parameter Values Default value
alternative two.sided or greater or less two.sided
conf.level 1-alpha 0.95

Example: two-tailed test of whether the proportion of L in V1 equals the proportion of K in V2:

n.L = sum(V1 == 'L')
n.K = sum(V2 == 'K')
prop.test(c(n.L, n.K), c(length(V1), length(V2)))

Effect size measure: Odds Ratio

\(OR = \frac{p_L/(1-p_L)}{p_K/(1-p_K)}\)

OR = ( p.L/(1-p.L) ) / ( p.K/(1-p.K) )
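The hand-computed z (using the pooled standard error, since the null hypothesis assumes equal proportions) squares to prop.test's X-squared once the continuity correction is switched off. A sketch on made-up counts:

```r
# Hypothetical counts from two independent samples
n.L = 45; N1 = 150
n.K = 30; N2 = 140

p.L = n.L / N1
p.K = n.K / N2
p.joint = (n.L + n.K) / (N1 + N2)

# z with the pooled standard error
z = (p.L - p.K) / sqrt(p.joint * (1 - p.joint) * (1/N1 + 1/N2))

# prop.test's X-squared is z^2 when correct=FALSE
test = prop.test(c(n.L, n.K), c(N1, N2), correct = FALSE)

# Effect size
OR = (p.L/(1 - p.L)) / (p.K/(1 - p.K))
```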


Exact test for two proportions from the same variable

If K and L are both values of V, independence for two-sample proportions test fails. Instead, we can view K and L as a restricted sample, only counting the part of V that consists entirely of K and L. Then equal proportions turns into a one-sample proportions test against p.0 = 0.5.

Input Name
Data n.K, n.L counts of labels K and L in a categorical variable V.
Size N or length(V)
Significance level alpha
Confidence level 1-alpha

Write p.K = n.K/N and p.L = n.L/N.

Run a one-sample proportions test with n = n.L, N = n.L + n.K and p.0 = 0.5. Check requirements, and generate confidence intervals etc. as in the one-sample test you choose.

R command

n.K = sum(V == 'K')
n.L = sum(V == 'L')
prop.test(n.L, n.L + n.K)

or

n.K = sum(V == 'K')
n.L = sum(V == 'L')
binom.test(n.L, n.L + n.K)
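A small self-contained demonstration on a made-up vector V:

```r
# Hypothetical categorical data with labels K and L
V = c('K', 'K', 'L', 'K', 'L', 'L', 'L', 'K', 'K', 'L', 'L', 'L')

n.K = sum(V == 'K')   # 5
n.L = sum(V == 'L')   # 7

# Test whether L and K occur equally often among the K/L observations
test = binom.test(n.L, n.L + n.K)
```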

Many sample proportions test

Goodness of fit test

Input Name
Data x categorical or n = table(x) vector of counts; p list of values between 0 and 1.
Size N or length(x)
Significance level alpha
Confidence level 1-alpha

p is the null hypothesis list of expected proportions. We require sum(p) to be equal to 1.

We require length(n) == length(p) and call this number k.

Requirements

  • Values and pairs of values are independent
  • min(N*p) >= 5

Null hypothesis The true proportions of labels in x is given by the vector of proportions p.

Alternative hypothesis The true proportions are significantly different from the ones in p.

Test statistic X.2 = sum( (observed-expected)^2/expected ) = sum( (n - p*N)^2/(p*N) )

p-value pchisq(X.2, df=k-1, lower.tail=FALSE)

Confidence intervals

For the goodness of fit test, there are no sensible confidence intervals.

R command

chisq.test(x)

or

chisq.test(n)

Extra parameter Values Default value
p p All values set to 1/k

Effect size measure: phi as in

test = chisq.test(n)
phi = sqrt(test$statistic / N)
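A sketch on made-up counts showing that the hand-computed X.2 and p-value match chisq.test:

```r
# Hypothetical observed counts and null proportions
n = c(18, 22, 30, 30)
p = c(0.2, 0.2, 0.3, 0.3)
N = sum(n)
k = length(n)

# Chi-square statistic and p-value by hand
expected = N * p
X.2 = sum((n - expected)^2 / expected)
p.value = pchisq(X.2, df = k - 1, lower.tail = FALSE)

# Built-in test and effect size
test = chisq.test(n, p = p)
phi = sqrt(unname(test$statistic) / N)
```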


Two way table independence test / chi-squared test

Input Name
Data x, y both categorical, or observed a two-way table
Size N or length(x)
Significance level alpha
Confidence level 1-alpha

We write r for the number of labels in x and s for the number of labels in y.

Requirements

  • Values and pairs of values are independent
  • All expected counts are at least 5

Easiest way to check expected counts is to

  1. Run the test using the command below, say test = chisq.test(x,y)
  2. Check the expected counts in test$expected

Null hypothesis The variables x and y are independent.

Alternative hypothesis There is some dependency between the variables x and y.

Test statistic

To explain this code requires a bit of linear algebra: %*% is matrix multiplication, and t() transposes a vector into a row vector, so the product below is the outer product of the two margin vectors.

observed = table(x,y)
expected = margin.table(observed, 1) %*% t( margin.table(observed, 2) ) / margin.table(observed)
X.2 = sum( (observed - expected)^2/expected )

p-value pchisq(X.2, df=(r-1)*(s-1), lower.tail=FALSE)

Confidence intervals

For the chi square test, there are no sensible confidence intervals.

R command

chisq.test(x, y)

or

chisq.test(observed)
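A sketch on a made-up 2×3 table verifying that the hand-computed expected counts and X.2 agree with chisq.test. (For 2×2 tables you would also need correct=FALSE, since chisq.test then applies a continuity correction by default.)

```r
# Hypothetical paired categorical data
x = rep(c('a', 'a', 'a', 'b', 'b', 'b'), times = c(20, 15, 25, 10, 30, 20))
y = rep(c('u', 'v', 'w', 'u', 'v', 'w'), times = c(20, 15, 25, 10, 30, 20))

# Expected counts and chi-square statistic by hand
observed = table(x, y)
expected = margin.table(observed, 1) %*% t(margin.table(observed, 2)) /
           margin.table(observed)
X.2 = sum((observed - expected)^2 / expected)

# Built-in test reports the same statistic and expected counts
test = chisq.test(x, y)
```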

Effect size measure: phi in

test = chisq.test(observed)
phi = sqrt(test$statistic / N)

Linear regression and correlation test

Test name: Linear regression

Input Name
Data data$v, data$w0 (and data$w1 and ...) all numeric and paired
Size N or length(x)
Significance level alpha
Confidence level 1-alpha

Requirements

Condition Meaning How to test
Linearity Data should follow a linear trend. If data is "bendy", other methods are needed. Scatterplot. Residual plot (residuals vs. fitted values).
Normal residuals Residuals should be approximately normal. Outliers may indicate highly influential points that overshadow any information in the regression. QQ-plot of residuals.
Constant variability Variability of points is roughly constant across different values of the explanatory variable(s). Residual plot.
Independent observations Observations need to be independent. For instance, timeseries are not suitable since subsequent observations are highly correlated. Argue about source of data.

Null hypothesis \(\beta_k=0\) for the coefficients \(\beta_k\) of the regression.

Alternative hypothesis

  1. Not all \(\beta_k\) are 0. [F-test, ANOVA table]
  2. One specific \(\beta_k\) is [greater / less / not equal] to a specific value [t-test]

Test statistic F or T depending on which test is interesting. Both reported in summary(model).

p-values

As for t-test and as for ANOVA.

Confidence intervals

As for t-test.

R command

model = lm(v ~ w0, data)

or

model = lm(v ~ w0 + w1 + ... + wk, data)

Both F- and t-tests are reported by summary(model).

Model info model$
Residuals residuals
Coefficients coefficients
Fitted values fitted.values
Model info summary(model)$
Coefficient t-test estimate coefficients[,"Estimate"]
Coefficient t-test std.err. coefficients[,"Std. Error"]
Coefficient t-test T coefficients[,"t value"]
Coefficient t-test p coefficients[,"Pr(>|t|)"]
Correlation square \(r^2\) (use for effect size) r.squared

You can use the estimate and standard error to generate confidence intervals for each coefficient

Model info anova(model)$
F `F value`
F-test p-value `Pr(>F)`
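The coefficient estimates and standard errors from summary(model) turn into confidence intervals exactly as in the one-sample t-test; confint does the same calculation. A sketch on made-up data with a linear trend:

```r
# Hypothetical data following a rough linear trend
set.seed(1)
w0 = 1:20
v = 3 + 0.5 * w0 + rnorm(20, sd = 0.8)
data = data.frame(v, w0)

model = lm(v ~ w0, data)
coefs = summary(model)$coefficients

# 95% confidence interval for each coefficient: estimate +/- t_{alpha/2} * std.err
t.half = qt(0.975, df = model$df.residual)
ci.from = coefs[, "Estimate"] - t.half * coefs[, "Std. Error"]
ci.to   = coefs[, "Estimate"] + t.half * coefs[, "Std. Error"]

# Cohen's f^2 from the reported r^2
r.2 = summary(model)$r.squared
f.2 = r.2 / (1 - r.2)
```

ci.from and ci.to reproduce the two columns of confint(model).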

Effect size measure: Cohen's \(f^2 = r^2 / (1-r^2)\)


Test name: Correlation test

Input Name
Data x, y both numeric and paired; or data$v, data$w both numeric and paired
Size N or length(x)
Significance level alpha
Confidence level 1-alpha

Requirements

  • Values and pairs of values are independent
  • For each x, the corresponding y are normal with equal variances
  • For each y, the corresponding x are normal with equal variances
  • Either x is a linear function of y, or y is linear function of x

If you know which is the response, the linear regression is better. If you don't know, the correlation test lets you test for the presence of an effect without committing to an explanatory and response variable.

Null hypothesis True correlation of x and y is 0

Alternative hypothesis True correlation is [> or < or ≠] 0

Test statistic Write \(r_{xy}\) for the correlation.

\(T = \frac{r_{xy}\sqrt{N-2}}{\sqrt{1-r_{xy}^2}}\)

Alternative p-value
> pt(T, df=N-2, lower.tail=FALSE)
≠ 2*pt(abs(T), df=N-2, lower.tail=FALSE)
< pt(T, df=N-2)

Confidence intervals

Complicated: cor.test derives them via Fisher's transformation \(\operatorname{atanh}(r_{xy})\), which is approximately normal with standard error \(1/\sqrt{N-3}\). Use cor.test to calculate.

R command

cor.test(~ v + w, data, subset.condition) where subset.condition is optional but useful to pick out good subsets

cor.test(x, y)

Extra parameter Values Default value
alternative two.sided or greater or less two.sided
conf.level 1-alpha 0.95

Effect size measure: \(f^2 = r_{xy}^2 / (1-r_{xy}^2)\) where \(r_{xy}\) = cor(x, y).
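The T statistic formula can be checked against cor.test on made-up data:

```r
# Hypothetical correlated data
set.seed(2)
x = rnorm(30)
y = x + rnorm(30)
N = length(x)

# Test statistic by hand
r.xy = cor(x, y)
T = r.xy * sqrt(N - 2) / sqrt(1 - r.xy^2)

# Built-in test reports the same t statistic and correlation estimate
test = cor.test(x, y)

# Effect size
f.2 = r.xy^2 / (1 - r.xy^2)
```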