Lecture 19

MVJ

12 April, 2018

Testing for population means

We will be looking at a family of tests built to test hypotheses about the mean of a population.

Null hypotheses for these look like \[ H_0: \mu = \mu_0 \] for some given value \(\mu_0\).


Normal distributions, T distributions and beer

If we knew the population standard deviation \(\sigma\), we could do these tests using the central limit theorem and the normal distribution.

In practice, we almost never know the population standard deviation \(\sigma\).

In 1906-1907, William S. Gosset, an employee of the Guinness brewery in Dublin, worked with the statistician Karl Pearson and studied barley varieties to find the best-yielding type. His samples were usually very small: too small for the normal distribution to work.

Together with Pearson, Gosset worked out the distribution of the t-statistic

\[ T = \frac{\overline x-\mu_0}{s_x/\sqrt{n}} \]

which is almost like a \(z\)-score, but uses the sample standard deviation instead of the population standard deviation.
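As a quick illustration, the statistic is easy to compute by hand (the sample and null value here are made up; any numeric vector works):

x = c(5.2, 4.8, 5.5, 5.1, 4.9)   # made-up sample
mu.0 = 5                         # hypothesized population mean
N = length(x)
(mean(x) - mu.0) / (sd(x) / sqrt(N))   # the t-statistic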

Normal distributions, T distributions and beer

After discovering it, Gosset tried to publish his results – but Guinness had had problems with another engineer publishing trade secrets, and had a blanket ban on research publications for their employees.

Gosset managed to argue that the discovery, while important for the world, contained nothing critical about their brewing process. To avoid complaints from other engineers still under the publication ban, Gosset agreed to publish pseudonymously. The chosen pseudonym was Student, and the distribution has become known as Student’s t-distribution.

The \(t\)-distribution is different for each sample size. It has heavier tails than the normal distribution (extreme values are more likely), but approaches the normal distribution as the sample size increases.

\[ T = \frac{\overline x - \mu}{s_x/\sqrt{n}}\sim T(n-1) \]

The value \(n-1\) is called the degrees of freedom for the \(t\)-distribution.
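We can check both claims numerically: the tail probability beyond 2 is larger for the \(t\)-distribution than for the normal distribution, and it approaches the normal value as the degrees of freedom grow.

1 - pnorm(2)        # normal tail probability
1 - pt(2, df=4)     # heavier tail with few degrees of freedom
1 - pt(2, df=100)   # close to the normal value with many degrees of freedom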

The t-distribution

In R, the \(t\)-distribution is available through the commands rt, dt, pt and qt. These take the additional mandatory argument df.

pt(2.5, df=5)
## [1] 0.972755
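qt goes the other way, from probabilities back to quantiles:

qt(0.972755, df=5)   # recovers (approximately) 2.5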

t confidence intervals

Suppose \(x\) is a simple random sample of size \(n\) from a population with unknown mean \(\mu\). A confidence interval for \(\mu\) at confidence level \(1-\alpha\) is \[ \overline x \pm t^*\frac{s}{\sqrt{n}} \] where \(t^*=\)qt(1-alpha/2, df=n-1)

If the population is normal, this confidence interval is exact. If the population is not normal but the sample is large enough, the confidence interval is approximately correct.

Here, the standard error of the sample mean is \(s/\sqrt{n}\) and the margin of error is \(t^*\cdot s/\sqrt{n}\).
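As a sketch (with made-up data; alpha = 0.05 is an arbitrary choice), the interval can be computed by hand and checked against t.test:

x = rnorm(30, mean=10, sd=2)   # made-up data
alpha = 0.05
N = length(x)
t.star = qt(1 - alpha/2, df=N-1)
mean(x) + c(-1, 1) * t.star * sd(x)/sqrt(N)   # lower and upper limits
# matches t.test(x, conf.level=1-alpha)$conf.int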

One-sample T-test

Input: Vector x, or variable dataset$x, of size \(N\)
Null hypothesis: Population mean is equal to mu.0
Alternative hypothesis: Population mean is [less than / not equal / greater than] mu.0
Test statistic: T = (mean(x) - mu.0)/( sd(x)/sqrt(N) )

Requirements:

| Dataset size | Requires | How to check |
|--------------|----------|--------------|
| \(N\leq 15\) | Normal distribution | gf_qq |
| \(15<N\leq40\) | No skew, no outliers | gf_histogram and gf_boxplot |
| \(40<N\leq\)hundreds | No outliers | gf_boxplot |
| hundreds\(<N\) | Few outliers | gf_boxplot |

One-sample T-test

Command: t.test with arguments mu, alternative, conf.level

test = t.test(~cty, data=mpg, mu=17, alternative="two.sided", conf.level=0.99)
test
## 
##  One Sample t-test
## 
## data:  cty
## t = -0.50689, df = 233, p-value = 0.6127
## alternative hypothesis: true mean is not equal to 17
## 99 percent confidence interval:
##  16.13641 17.58154
## sample estimates:
## mean of x 
##  16.85897
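The object returned by t.test keeps the individual pieces accessible, which is convenient for reporting:

test$statistic   # the t-statistic
test$p.value     # the p-value
test$conf.int    # the confidence interval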

One-sample T-test

Command: t.test with arguments mu, alternative, conf.level

x = mpg$hwy   # for example; any plain numeric vector works
test = t.test(x, mu=17, alternative="two.sided", conf.level=0.99)
test
## 
##  One Sample t-test
## 
## data:  x
## t = 16.544, df = 233, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 17
## 99 percent confidence interval:
##  22.42921 24.45113
## sample estimates:
## mean of x 
##  23.44017

One-sample T-test

Effect size: Cohen’s \(d\) - library effsize, command cohen.d

d = cohen.d(mpg$cty, c(17,17))   # hack: a constant "second group" at mu.0
d = cohen.d(x, c(17,17))         # the same hack for a plain vector x
d
## 
## Cohen's d
## 
## d estimate: 1.083856 (large)
## 95 percent confidence interval:
##       inf       sup 
## -0.318642  2.486354
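Since the one-sample Cohen's \(d\) is just the standardized mean difference, the result of the hack can be verified by hand (up to a small correction from pooling with the constant group):

(mean(x) - 17) / sd(x)   # approximately the d estimate above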

One-sample T-test

To check validity, check the data size first, then use a QQ-plot or a boxplot to check the distribution:

NROW(x)
## [1] 234
gf_boxploth("x" ~ x)

Two-sample testing

Very commonly, we are less interested in comparing data to a fixed value and more interested in comparing groups to each other.

This is so common that many functions in R do not even consider the one-sample case. This is the case for cohen.d, where we used a hack to get it to work for a one-sample value.

Two-sample testing

The basic null hypothesis for two-sample testing is that the population means of the two groups are equal.

More generally, the null hypothesis can state that the difference in means is some particular known value.

The test statistic is similar to the one-sample \(t\)-statistic, but the standard deviation estimate changes a bit: the two samples have separate standard deviations that need to be combined into one estimate.

The formulas simplify if we can assume the samples have the same standard deviation, or if we have samples of equal size. In full generality, the degrees of freedom are determined by a complex formula due to Welch.

Two-sample testing: test statistic

The generic form of the test statistic for the two-sample case is

\[ T = \frac{\overline x-\overline y}{\sqrt{\frac{s_x^2}{N_x}+\frac{s_y^2}{N_y}}} \]

This follows approximately a \(t\)-distribution. The Welch complexity comes in the determination of degrees of freedom for the \(t\)-distribution. Ask me if you really want to know how.

You can approximate the true degrees of freedom with \(\min(N_x-1,N_y-1)\).
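For those who really do want to know: the Welch–Satterthwaite approximation, the one t.test uses, is

\[ \mathrm{df} \approx \frac{\left(\frac{s_x^2}{N_x}+\frac{s_y^2}{N_y}\right)^2}{\frac{(s_x^2/N_x)^2}{N_x-1}+\frac{(s_y^2/N_y)^2}{N_y-1}} \]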

The test is more robust if \(N_x = N_y = N\). In this case, the formula is

\[ T = \frac{\overline x-\overline y}{\sqrt{s_x^2+s_y^2}{\Big/}\sqrt{N}} \]
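With made-up samples of equal size, this statistic can be checked by hand against t.test:

x = rnorm(50, mean=12, sd=3)   # made-up samples of equal size
y = rnorm(50, mean=10, sd=3)
N = length(x)
(mean(x) - mean(y)) / (sqrt(sd(x)^2 + sd(y)^2) / sqrt(N))
# agrees with t.test(x, y)$statistic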

Two-sample testing: pooled two-sample test

If we have reason to believe the population standard deviations are equal, then the common variance can be estimated as

\[ s^2_p = \frac{\sum_{x_i}(x_i-\overline x)^2 + \sum_{y_i}(y_i-\overline y)^2}{n_x+n_y-2} \qquad s_p = \sqrt{s^2_p} \]

Using the addition rule for variances, the estimated variance of the sampling distribution of \(\overline x-\overline y\) is \[ s^2_{\overline x-\overline y} = \frac{s_p^2}{n_x} + \frac{s_p^2}{n_y} = s_p^2\left(\frac{1}{n_x}+\frac{1}{n_y}\right) \]

The pooled \(t\)-statistic is

\[ T = \frac{\overline x - \overline y}{s_p\sqrt{\frac{1}{n_x}+\frac{1}{n_y}}} \sim T(n_x+n_y-2) \]

A pooled test is performed by providing the argument var.equal=TRUE to the function t.test.
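A sketch of the pooled computation by hand, with made-up data:

x = rnorm(40, mean=10, sd=2)   # made-up samples
y = rnorm(60, mean=11, sd=2)
n.x = length(x); n.y = length(y)
s.p = sqrt(((n.x-1)*var(x) + (n.y-1)*var(y)) / (n.x + n.y - 2))   # pooled sd
(mean(x) - mean(y)) / (s.p * sqrt(1/n.x + 1/n.y))   # pooled t-statistic
# agrees with t.test(x, y, var.equal=TRUE)$statistic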

Two-sample T-test

Input: Vectors x, y, of sizes \(N_x\) and \(N_y\), or variables key, value in a data frame dataset where dataset$key only has two values
Null hypothesis: Difference in population means \(\mu_x-\mu_y\) is equal to mu.0 (usually 0)
Alternative hypothesis: Difference in population means \(\mu_x-\mu_y\) is [less than / not equal / greater than] mu.0
Test statistic: \(T = {\overline x-\overline y}{\Big/}{\sqrt{\frac{s_x^2}{N_x}+\frac{s_y^2}{N_y}}}\)

Requirements: Both x and y need to be appropriate for a one-sample test; use \(N=N_x+N_y\) for the conditions; better if \(N_x\approx N_y\):

| Dataset size | Requires | How to check |
|--------------|----------|--------------|
| \(N\leq 15\) | Normal distribution | gf_qq |
| \(15<N\leq40\) | No skew, no outliers | gf_histogram and gf_boxplot |
| \(40<N\leq\)hundreds | No outliers | gf_boxplot |
| hundreds\(<N\) | Few outliers | gf_boxplot |

Two-sample T-test

Command: t.test with arguments mu, alternative, conf.level

test = t.test(mpg~type, data=mpg.cty.hwy, mu=5, 
              alternative="two.sided", conf.level=0.99)
test
## 
##  Welch Two Sample t-test
## 
## data:  mpg by type
## t = 3.3047, df = 421.79, p-value = 0.001032
## alternative hypothesis: true difference in means is not equal to 5
## 99 percent confidence interval:
##  5.343134 7.819259
## sample estimates:
## mean in group hwy mean in group cty 
##          23.44017          16.85897

Two-sample T-test

Command: t.test with arguments mu, alternative, conf.level

x = mpg$hwy; y = mpg$cty   # the two groups as plain vectors
test = t.test(x, y, mu=5, alternative="two.sided", conf.level=0.99)
test
## 
##  Welch Two Sample t-test
## 
## data:  x and y
## t = 3.3047, df = 421.79, p-value = 0.001032
## alternative hypothesis: true difference in means is not equal to 5
## 99 percent confidence interval:
##  5.343134 7.819259
## sample estimates:
## mean of x mean of y 
##  23.44017  16.85897

Two-sample T-test

Effect size: Cohen’s \(d\) - library effsize, command cohen.d

d = cohen.d(mpg$cty, mpg$hwy)             # two vectors...
d = cohen.d(mpg~type, data=mpg.cty.hwy)   # ...or a formula with a grouping variable
d
## 
## Cohen's d
## 
## d estimate: 1.271615 (large)
## 95 percent confidence interval:
##      inf      sup 
## 1.072429 1.470801

Two-sample T-test

To check validity, check the data size first, then use a QQ-plot or a boxplot to check the distribution:

favstats(mpg~type, data=mpg.cty.hwy)
##   type min Q1 median Q3 max     mean       sd   n missing
## 1  hwy  12 18     24 27  44 23.44017 5.954643 234       0
## 2  cty   9 14     17 19  35 16.85897 4.255946 234       0
gf_boxploth(type ~ mpg, data=mpg.cty.hwy)

Paired data

A crucial assumption behind the two-sample \(t\)-tests is that the two samples are independent of each other.

Paired variables are not independent.

Instead, for paired data we need to do a different test: the paired two-sample t-test.

For the paired two-sample t-test, instead of comparing \(\overline x\) with \(\overline y\), we compare \(\overline{x-y}\): the mean of the differences instead of the difference of means.

This works even if the two samples do not come from nice distributions, as long as their differences do.
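In other words, the paired test is just the one-sample test applied to the differences; for vectors x and y of equal length, these two commands give identical results:

t.test(x, y, paired=TRUE, mu=0)
t.test(x - y, mu=0)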

Paired two-sample T-test

Input: Vectors x, y of equal length, or variables dataset$x, dataset$y
Null hypothesis: Population mean difference \(\mu_{x-y}\) is equal to mu.0 (usually 0)
Alternative hypothesis: Population mean difference \(\mu_{x-y}\) is [less than / not equal / greater than] mu.0
Test statistic: T = (mean(x-y) - mu.0)/( sd(x-y)/sqrt(N) )

Requirements: The differences x-y need to be appropriate for a one-sample test:

| Dataset size | Requires | How to check |
|--------------|----------|--------------|
| \(N\leq 15\) | Normal distribution | gf_qq |
| \(15<N\leq40\) | No skew, no outliers | gf_histogram and gf_boxplot |
| \(40<N\leq\)hundreds | No outliers | gf_boxplot |
| hundreds\(<N\) | Few outliers | gf_boxplot |

Paired two-sample T-test

Command: t.test with arguments mu, alternative, conf.level

test = t.test(x, y, mu=5, alternative="two.sided", conf.level=0.99, paired=TRUE)
test
## 
##  Paired t-test
## 
## data:  x and y
## t = 10.69, df = 233, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 5
## 99 percent confidence interval:
##  6.197035 6.965358
## sample estimates:
## mean of the differences 
##                6.581197

Paired two-sample T-test

Effect size: Cohen’s \(d\) - library effsize, command cohen.d

d = cohen.d(x, y, paired=TRUE)
d
## 
## Cohen's d
## 
## d estimate: 2.908509 (large)
## 95 percent confidence interval:
##      inf      sup 
## 2.647925 3.169092

Paired two-sample T-test

To check validity, check the data size first, then use a QQ-plot or a boxplot to check the distribution:

NROW(mpg)
## [1] 234
gf_boxploth("hwy-cty" ~ hwy-cty, data=mpg)