Lecture 19

MVJ

12 April, 2018

Testing for population means

We will be looking at a family of tests built to test hypotheses about the mean of a population.

Null hypotheses for these look like \[ H_0: \mu = \mu_0 \] for some given value \(\mu_0\).


Normal distributions, T distributions and beer

If we knew the population standard deviation \(\sigma\), we could do these tests using the central limit theorem and the normal distribution.

In practice, we almost never know the population standard deviation \(\sigma\).

In 1906-1907, William S. Gosset, an employee of the Guinness brewery in Dublin, worked with the statistician Karl Pearson and studied barley varieties to find the best-yielding type. His samples were usually very small: too small for the normal distribution to work.

Together with Pearson, Gosset worked out the distribution of the t-statistic

\[ T = \frac{\overline x-\mu_0}{s_x/\sqrt{n}} \]

which is almost like a \(z\)-score, but uses the sample standard deviation instead of the population standard deviation.
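As a quick illustration, the statistic is easy to compute by hand (the sample and null value here are made up; any numeric vector works):

x = c(5.2, 4.8, 5.5, 5.1, 4.9)   # made-up sample
mu.0 = 5                         # hypothesized population mean
N = length(x)
(mean(x) - mu.0) / (sd(x) / sqrt(N))   # the t-statistic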

Normal distributions, T distributions and beer

After discovering it, Gosset tried to publish his results – but Guinness had had problems with another engineer publishing trade secrets, and had a blanket ban on research publications for their employees.

Gosset managed to argue that the discovery, while important for the world, contained nothing critical about their brewing process. To avoid complaints from other engineers still under the publication ban, Gosset agreed to publish pseudonymously. The chosen pseudonym was Student, and the distribution has become known as Student’s t-distribution.

The \(t\)-distribution is different for each sample size. It has heavier tails than the normal distribution (extreme values are more likely), but approaches the normal distribution as the sample size increases.

\[ T = \frac{\overline x - \mu}{s_x/\sqrt{n}}\sim T(n-1) \]

The value \(n-1\) is called the degrees of freedom for the \(t\)-distribution.
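We can check both claims numerically: the tail probability beyond 2 is larger for the \(t\)-distribution than for the normal distribution, and it approaches the normal value as the degrees of freedom grow.

1 - pnorm(2)        # normal tail probability
1 - pt(2, df=4)     # heavier tail with few degrees of freedom
1 - pt(2, df=100)   # close to the normal value with many degrees of freedom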

The t-distribution

In R, the \(t\)-distribution is available through the commands rt, dt, pt and qt. These take the additional mandatory argument df.

pt(2.5, df=5)
## [1] 0.972755
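qt goes the other way, from probabilities back to quantiles:

qt(0.972755, df=5)   # recovers (approximately) 2.5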

t confidence intervals

Suppose \(x\) is a simple random sample of size \(n\) from a population with unknown mean \(\mu\). A confidence interval for \(\mu\) at confidence level \(1-\alpha\) is \[ \overline x \pm t^*\frac{s}{\sqrt{n}} \] where \(t^*=\)qt(1-alpha/2, df=n-1)

If the population is normal, this confidence interval is exact. If the population is not normal but the sample is large enough, the confidence interval is approximately correct.

Here, the standard error of the sample mean is \(s/\sqrt{n}\) and the margin of error is \(t^*\cdot s/\sqrt{n}\).
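As a sketch (with made-up data; alpha = 0.05 is an arbitrary choice), the interval can be computed by hand and checked against t.test:

x = rnorm(30, mean=10, sd=2)   # made-up data
alpha = 0.05
N = length(x)
t.star = qt(1 - alpha/2, df=N-1)
mean(x) + c(-1, 1) * t.star * sd(x)/sqrt(N)   # lower and upper limits
# matches t.test(x, conf.level=1-alpha)$conf.int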

One-sample T-test

Input: Vector x, or variable dataset$x, of size \(N\)
Null hypothesis: Population mean is equal to mu.0
Alternative hypothesis: Population mean is [less than / not equal / greater than] mu.0
Test statistic: T = (mean(x) - mu.0)/( sd(x)/sqrt(N) )

Requirements:

| Dataset size | Requires | How to check |
|--------------|----------|--------------|
| \(N\leq 15\) | Normal distribution | gf_qq |
| \(15<N\leq40\) | No skew, no outliers | gf_histogram and gf_boxplot |
| \(40<N\leq\)hundreds | No outliers | gf_boxplot |
| hundreds\(<N\) | Few outliers | gf_boxplot |

One-sample T-test

Command: t.test with arguments mu, alternative, conf.level

test = t.test(~cty, data=mpg, mu=17, alternative="two.sided", conf.level=0.99)
test
## 
##  One Sample t-test
## 
## data:  cty
## t = -0.50689, df = 233, p-value = 0.6127
## alternative hypothesis: true mean is not equal to 17
## 99 percent confidence interval:
##  16.13641 17.58154
## sample estimates:
## mean of x 
##  16.85897
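The object returned by t.test keeps the individual pieces accessible, which is convenient for reporting:

test$statistic   # the t-statistic
test$p.value     # the p-value
test$conf.int    # the confidence interval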

One-sample T-test

Command: t.test with arguments mu, alternative, conf.level

x = mpg$hwy   # for example; any plain numeric vector works
test = t.test(x, mu=17, alternative="two.sided", conf.level=0.99)
test
## 
##  One Sample t-test
## 
## data:  x
## t = 16.544, df = 233, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 17
## 99 percent confidence interval:
##  22.42921 24.45113
## sample estimates:
## mean of x 
##  23.44017

One-sample T-test

Effect size: Cohen’s \(d\) - library effsize, command cohen.d

d = cohen.d(mpg$cty, c(17,17))   # hack: a constant "second group" at mu.0
d = cohen.d(x, c(17,17))         # the same hack for a plain vector x
d
## 
## Cohen's d
## 
## d estimate: 1.083856 (large)
## 95 percent confidence interval:
##       inf       sup 
## -0.318642  2.486354
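Since the one-sample Cohen's \(d\) is just the standardized mean difference, the result of the hack can be verified by hand (up to a small correction from pooling with the constant group):

(mean(x) - 17) / sd(x)   # approximately the d estimate above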

One-sample T-test

To check validity, check the data size first, then use a QQ-plot or a boxplot to check the distribution:

NROW(x)
## [1] 234
gf_boxploth("x" ~ x)

Two-sample testing

Very commonly, we are less interested in comparing data to a fixed value and more interested in comparing groups to each other.

This is so common that many functions in R do not even consider the one-sample case. This is the case for cohen.d, where we used a hack to get it to work for a one-sample value.

Two-sample testing

The basic null hypothesis for two-sample testing is that the population means of the two groups are equal.

More generally, the null hypothesis can state that the difference in means is some particular known value.

The test statistic is similar to the one-sample \(t\)-statistic, but the standard deviation estimate changes a bit: the two samples have separate standard deviations that need to be combined into one estimate.

The formulas simplify if we can assume the samples have the same standard deviation, or if we have samples of equal size. In full generality, the degrees of freedom are determined by a complex formula due to Welch.

Two-sample testing: test statistic

The generic form of the test statistic for the two-sample case is

\[ T = \frac{\overline x-\overline y}{\sqrt{\frac{s_x^2}{N_x}+\frac{s_y^2}{N_y}}} \]

This follows approximately a \(t\)-distribution. The Welch complexity comes in the determination of degrees of freedom for the \(t\)-distribution. Ask me if you really want to know how.

You can approximate the true degrees of freedom with \(\min(N_x-1,N_y-1)\).
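For those who really do want to know: the Welch–Satterthwaite approximation, the one t.test uses, is

\[ \mathrm{df} \approx \frac{\left(\frac{s_x^2}{N_x}+\frac{s_y^2}{N_y}\right)^2}{\frac{(s_x^2/N_x)^2}{N_x-1}+\frac{(s_y^2/N_y)^2}{N_y-1}} \]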

The test is more robust if \(N_x = N_y = N\). In this case, the formula is

\[ T = \frac{\overline x-\overline y}{\sqrt{s_x^2+s_y^2}{\Big/}\sqrt{N}} \]
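With made-up samples of equal size, this statistic can be checked by hand against t.test:

x = rnorm(50, mean=12, sd=3)   # made-up samples of equal size
y = rnorm(50, mean=10, sd=3)
N = length(x)
(mean(x) - mean(y)) / (sqrt(sd(x)^2 + sd(y)^2) / sqrt(N))
# agrees with t.test(x, y)$statistic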

Two-sample testing: pooled two-sample test

If we have reason to believe the population standard deviations are equal, then the common variance can be estimated as

\[ s^2_p = \frac{\sum_{x_i}(x_i-\overline x)^2 + \sum_{y_i}(y_i-\overline y)^2}{n_x+n_y-2} \qquad s_p = \sqrt{s^2_p} \]

Using the addition rule for variances, the estimated variance of the sampling distribution of \(\overline x-\overline y\) is \[ s^2_{\overline x-\overline y} = \frac{s_p^2}{n_x} + \frac{s_p^2}{n_y} = s_p^2\left(\frac{1}{n_x}+\frac{1}{n_y}\right) \]

The pooled \(t\)-statistic is

\[ T = \frac{\overline x - \overline y}{s_p\sqrt{\frac{1}{n_x}+\frac{1}{n_y}}} \sim T(n_x+n_y-2) \]

A pooled test is performed by providing the argument var.equal=TRUE to the function t.test.
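A sketch of the pooled computation by hand, with made-up data:

x = rnorm(40, mean=10, sd=2)   # made-up samples
y = rnorm(60, mean=11, sd=2)
n.x = length(x); n.y = length(y)
s.p = sqrt(((n.x-1)*var(x) + (n.y-1)*var(y)) / (n.x + n.y - 2))   # pooled sd
(mean(x) - mean(y)) / (s.p * sqrt(1/n.x + 1/n.y))   # pooled t-statistic
# agrees with t.test(x, y, var.equal=TRUE)$statistic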

Two-sample T-test

Input: Vectors x, y, of sizes \(N_x\) and \(N_y\), or variables key, value in a data frame dataset where dataset$key only has two values
Null hypothesis: Difference in population means \(\mu_x-\mu_y\) is equal to mu.0 (usually 0)
Alternative hypothesis: Difference in population means \(\mu_x-\mu_y\) is [less than / not equal / greater than] mu.0
Test statistic: \(T = {\overline x-\overline y}{\Big/}{\sqrt{\frac{s_x^2}{N_x}+\frac{s_y^2}{N_y}}}\)

Requirements: Both x and y need to be appropriate for a one-sample test; use \(N=N_x+N_y\) for the conditions; better if \(N_x\approx N_y\):

| Dataset size | Requires | How to check |
|--------------|----------|--------------|
| \(N\leq 15\) | Normal distribution | gf_qq |
| \(15<N\leq40\) | No skew, no outliers | gf_histogram and gf_boxplot |
| \(40<N\leq\)hundreds | No outliers | gf_boxplot |
| hundreds\(<N\) | Few outliers | gf_boxplot |

Two-sample T-test

Command: t.test with arguments mu, alternative, conf.level

test = t.test(mpg~type, data=mpg.cty.hwy, mu=5, 
              alternative="two.sided", conf.level=0.99)
test
## 
##  Welch Two Sample t-test
## 
## data:  mpg by type
## t = 3.3047, df = 421.79, p-value = 0.001032
## alternative hypothesis: true difference in means is not equal to 5
## 99 percent confidence interval:
##  5.343134 7.819259
## sample estimates:
## mean in group hwy mean in group cty 
##          23.44017          16.85897

Two-sample T-test

Command: t.test with arguments mu, alternative, conf.level

x = mpg$hwy; y = mpg$cty   # the two groups as plain vectors
test = t.test(x, y, mu=5, alternative="two.sided", conf.level=0.99)
test
## 
##  Welch Two Sample t-test
## 
## data:  x and y
## t = 3.3047, df = 421.79, p-value = 0.001032
## alternative hypothesis: true difference in means is not equal to 5
## 99 percent confidence interval:
##  5.343134 7.819259
## sample estimates:
## mean of x mean of y 
##  23.44017  16.85897

Two-sample T-test

Effect size: Cohen’s \(d\) - library effsize, command cohen.d

d = cohen.d(mpg$cty, mpg$hwy)             # two vectors...
d = cohen.d(mpg~type, data=mpg.cty.hwy)   # ...or a formula with a grouping variable
d
## 
## Cohen's d
## 
## d estimate: 1.271615 (large)
## 95 percent confidence interval:
##      inf      sup 
## 1.072429 1.470801

Two-sample T-test

To check validity, check the data size first, then use a QQ-plot or a boxplot to check the distribution:

favstats(mpg~type, data=mpg.cty.hwy)
##   type min Q1 median Q3 max     mean       sd   n missing
## 1  hwy  12 18     24 27  44 23.44017 5.954643 234       0
## 2  cty   9 14     17 19  35 16.85897 4.255946 234       0
gf_boxploth(type ~ mpg, data=mpg.cty.hwy)

Paired data

A crucial assumption behind the two-sample \(t\)-tests is that the two samples are independent of each other.

Paired variables are not independent.

Instead, for paired data we need to do a different test: the paired two-sample t-test.

For the paired two-sample t-test, instead of comparing \(\overline x\) with \(\overline y\), we compare \(\overline{x-y}\): the mean of the differences instead of the difference of means.

This works even if the two samples do not come from nice distributions, as long as their differences do.
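In other words, the paired test is just the one-sample test applied to the differences; for vectors x and y of equal length, these two commands give identical results:

t.test(x, y, paired=TRUE, mu=0)
t.test(x - y, mu=0)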

Paired two-sample T-test

Input: Vectors x, y of equal length, or variables dataset$x, dataset$y
Null hypothesis: Population mean difference \(\mu_{x-y}\) is equal to mu.0 (usually 0)
Alternative hypothesis: Population mean difference \(\mu_{x-y}\) is [less than / not equal / greater than] mu.0
Test statistic: T = (mean(x-y) - mu.0)/( sd(x-y)/sqrt(N) )

Requirements: The differences x-y need to be appropriate for a one-sample test:

| Dataset size | Requires | How to check |
|--------------|----------|--------------|
| \(N\leq 15\) | Normal distribution | gf_qq |
| \(15<N\leq40\) | No skew, no outliers | gf_histogram and gf_boxplot |
| \(40<N\leq\)hundreds | No outliers | gf_boxplot |
| hundreds\(<N\) | Few outliers | gf_boxplot |

Paired two-sample T-test

Command: t.test with arguments mu, alternative, conf.level

test = t.test(x, y, mu=5, alternative="two.sided", conf.level=0.99, paired=TRUE)
test
## 
##  Paired t-test
## 
## data:  x and y
## t = 10.69, df = 233, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 5
## 99 percent confidence interval:
##  6.197035 6.965358
## sample estimates:
## mean of the differences 
##                6.581197

Paired two-sample T-test

Effect size: Cohen’s \(d\) - library effsize, command cohen.d

d = cohen.d(x, y, paired=TRUE)
d
## 
## Cohen's d
## 
## d estimate: 2.908509 (large)
## 95 percent confidence interval:
##      inf      sup 
## 2.647925 3.169092

Paired two-sample T-test

To check validity, check the data size first, then use a QQ-plot or a boxplot to check the distribution:

NROW(mpg)
## [1] 234
gf_boxploth("hwy-cty" ~ hwy-cty, data=mpg)