MVJ
12 April, 2018
We will be looking at a family of tests built to test hypotheses concerning a population mean.
Null hypotheses for these look like \(H_0: \mu = \mu_0\) for some hypothesized value \(\mu_0\).
If we knew the population standard deviation \(\sigma\), we could do these tests using the central limit theorem and the normal distribution.
We usually never know the population standard deviation \(\sigma\).
In 1906–1907, William S. Gosset, an employee of the Guinness brewery in Dublin, worked with the statistician Karl Pearson and studied barley varieties to find the best-yielding type. His samples were usually very small: too small for the normal distribution to work.
Together with Pearson, Gosset worked out the distribution of the t-statistic
\[ T = \frac{\overline x-\mu_0}{s_x/\sqrt{n}} \]
which is almost like a \(z\)-score, but uses the sample standard deviation instead of the population standard deviation.
After discovering it, Gosset tried to publish his results – but Guinness had had problems with another engineer publishing trade secrets, and had a blanket ban on research publications for their employees.
Gosset managed to argue that the discovery, while important for the world, contained nothing critical about their brewing process. To avoid complaints from other engineers still under the publication ban, Gosset agreed to publish pseudonymously. The chosen pseudonym was Student, and the distribution has become known as Student’s t-distribution.
The \(t\)-distribution is different for each sample size. It has heavier tails than the normal distribution (extreme values more likely), but approaches the normal distribution as the sample size increases.
\[ T = \frac{\overline x - \mu}{s_x/\sqrt{n}}\sim T(n-1) \]
The value \(n-1\) is called the degrees of freedom for the \(t\)-distribution.
In R, the \(t\)-distribution is available through the commands rt, dt, pt and qt. These take the additional mandatory argument df.
pt(2.5, df=5)
## [1] 0.972755
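The quantile function qt inverts pt. As a quick base-R illustration, the critical value cutting off the upper 2.5% of a \(t\)-distribution with 5 degrees of freedom:

```r
tstar <- qt(0.975, df = 5)   # upper 2.5% critical value
tstar                        # about 2.571, larger than the normal 1.96
pt(tstar, df = 5)            # pt inverts qt: recovers 0.975
```

That the critical value exceeds the normal 1.96 reflects the heavier tails of the \(t\)-distribution.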
Suppose \(x\) is a simple random sample of size \(n\) from a population with unknown mean \(\mu\). A confidence interval for \(\mu\) with confidence level \(1-\alpha\) is \[
\overline x \pm t^*\frac{s}{\sqrt{n}}
\] where \(t^*=\)qt(1-alpha/2, df=n-1).
If the population is normal, this confidence interval is exact. If the population is not normal, but the sample large enough, the confidence interval is approximately correct.
Here, the standard error of the sample mean is \(s/\sqrt{n}\) and the margin of error is \(t^*\cdot s/\sqrt{n}\).
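The interval can be computed directly from this formula and checked against t.test. A sketch with simulated data (the specific numbers depend on the random seed):

```r
set.seed(1)                      # reproducible example data
x <- rnorm(30, mean = 10, sd = 2)
n <- length(x)
alpha <- 0.05
tstar <- qt(1 - alpha/2, df = n - 1)
sem <- sd(x) / sqrt(n)           # standard error of the sample mean
c(mean(x) - tstar * sem, mean(x) + tstar * sem)
t.test(x, conf.level = 0.95)$conf.int   # same interval
```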
Input | Vector x , or variable dataset$x , of size \(N\)
---|---|
Null hypothesis | Population mean is equal to mu.0
Alternative hypothesis | Population mean is [less than / not equal / greater than] mu.0
Test statistic | T = (mean(x) - mu.0)/( sd(x)/sqrt(N) )
Requirements

Dataset size | Requires | How to check
---|---|---|
\(N\leq 15\) | Normal distribution | gf_qq
\(15<N\leq40\) | No skew, no outliers | gf_histogram and gf_boxplot
\(40<N\leq\)hundreds | No outliers | gf_boxplot
hundreds\(<N\) | Few outliers | gf_boxplot
Command: t.test with arguments mu, alternative, conf.level
test = t.test(~cty, data=mpg, mu=17, alternative="two.sided", conf.level=0.99)
test
##
## One Sample t-test
##
## data: cty
## t = -0.50689, df = 233, p-value = 0.6127
## alternative hypothesis: true mean is not equal to 17
## 99 percent confidence interval:
## 16.13641 17.58154
## sample estimates:
## mean of x
## 16.85897
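The reported statistic can be reproduced straight from the \(t\)-statistic formula, using only the summary numbers (mean 16.85897 and \(n = 234\) from the output above; the sd 4.255946 is taken from the favstats table later in these notes):

```r
n    <- 234
xbar <- 16.85897   # sample mean of cty (from the output above)
s    <- 4.255946   # sample sd of cty (from favstats)
mu0  <- 17
T <- (xbar - mu0) / (s / sqrt(n))
round(T, 5)        # close to the reported t = -0.50689
```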
Command: t.test with arguments mu, alternative, conf.level
test = t.test(x, mu=17, alternative="two.sided", conf.level=0.99)
test
##
## One Sample t-test
##
## data: x
## t = 16.544, df = 233, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 17
## 99 percent confidence interval:
## 22.42921 24.45113
## sample estimates:
## mean of x
## 23.44017
Effect size: Cohen’s \(d\) – library effsize, command cohen.d
d = cohen.d(mpg$cty, c(17,17))
d = cohen.d(x, c(17,17))
d
##
## Cohen's d
##
## d estimate: 1.083856 (large)
## 95 percent confidence interval:
## inf sup
## -0.318642 2.486354
To check validity, check the data size first, then use a QQ-plot or a boxplot to check the distribution:
NROW(x)
## [1] 234
gf_boxploth("x" ~ x)
Very commonly, we are less interested in comparing data to a fixed value and more interested in comparing groups.
This is so common that for many functions in R, the one-sample case is not even considered. This is the case for cohen.d, where we used a hack to get it to work for a one-sample value.
The basic null hypothesis for two-sample testing is that the population means of the two groups are equal; more generally, that their difference is some particular known value.
The test statistic is similar to the one-sample \(t\)-statistic, but the standard deviation estimation changes a bit: the two samples have different standard deviations that need to be used to estimate a total standard deviation.
Formulas are simplified if we can assume the samples to have the same standard deviation, or if we have samples of equal size. For full generality, the denominator is determined by a complex formula due to Welch.
The generic form of the test statistic for the two-sample case is
\[ T = \frac{\overline x-\overline y}{\sqrt{\frac{s_x^2}{N_x}+\frac{s_y^2}{N_y}}} \]
This follows approximately a \(t\)-distribution. The Welch complexity comes in the determination of degrees of freedom for the \(t\)-distribution. Ask me if you really want to know how.
You can approximate the true degrees of freedom with \(\min(N_x-1,N_y-1)\).
The test is more robust if \(N_x = N_y = N\). In this case, the formula is
\[ T = \frac{\overline x-\overline y}{\sqrt{\frac{s_x^2+s_y^2}{N}}} \]
If we have reason to believe the population standard deviation to be equal, then the variance can be estimated as
\[ s^2_p = \frac{\sum_{x_i}(x_i-\overline x)^2 + \sum_{y_i}(y_i-\overline y)^2}{n_x+n_y-2} \qquad s_p = \sqrt{s^2_p} \]
Using the addition rule for variances, this gives the estimated variance of the sampling distribution of \(\overline x-\overline y\): \[ s^2_{\overline x-\overline y} = \frac{s_p^2}{n_x} + \frac{s_p^2}{n_y} = s_p^2\left(\frac{1}{n_x}+\frac{1}{n_y}\right) \]
The pooled \(t\)-statistic is
\[ T = \frac{\overline x - \overline y}{s_p\sqrt{\frac{1}{n_x}+\frac{1}{n_y}}} \sim T(n_x+n_y-2) \]
A pooled test is performed by providing the argument var.equal=TRUE to the function t.test.
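A sketch of the pooled computation on simulated data, checked against the var.equal=TRUE output (base R only; the numbers depend on the seed):

```r
set.seed(2)
x <- rnorm(25, mean = 5, sd = 2)
y <- rnorm(30, mean = 4, sd = 2)
nx <- length(x); ny <- length(y)
# pooled variance: combine the squared deviations from both samples
sp2 <- (sum((x - mean(x))^2) + sum((y - mean(y))^2)) / (nx + ny - 2)
T <- (mean(x) - mean(y)) / (sqrt(sp2) * sqrt(1/nx + 1/ny))
T
t.test(x, y, var.equal = TRUE)$statistic   # same value
```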
Input | Vectors x, y , of sizes \(N_x\) and \(N_y\), or variables key , value in a data frame dataset where dataset$key only has two values
---|---|
Null hypothesis | Difference in population means \(\mu_x-\mu_y\) is equal to mu.0 (usually 0)
Alternative hypothesis | Difference in population means \(\mu_x-\mu_y\) is [less than / not equal / greater than] mu.0
Test statistic | \(T = ({\overline x-\overline y-\mu_0}){\Big/}{\sqrt{\frac{s_x^2}{N_x}+\frac{s_y^2}{N_y}}}\)
Requirements: Both x and y need to be appropriate for a one-sample test; use \(N=N_x+N_y\) for the conditions; better if \(N_x\approx N_y\):
Dataset size | Requires | How to check
---|---|---|
\(N\leq 15\) | Normal distribution | gf_qq
\(15<N\leq40\) | No skew, no outliers | gf_histogram and gf_boxplot
\(40<N\leq\)hundreds | No outliers | gf_boxplot
hundreds\(<N\) | Few outliers | gf_boxplot
Command: t.test with arguments mu, alternative, conf.level
test = t.test(mpg~type, data=mpg.cty.hwy, mu=5,
alternative="two.sided", conf.level=0.99)
test
##
## Welch Two Sample t-test
##
## data: mpg by type
## t = 3.3047, df = 421.79, p-value = 0.001032
## alternative hypothesis: true difference in means is not equal to 5
## 99 percent confidence interval:
## 5.343134 7.819259
## sample estimates:
## mean in group hwy mean in group cty
## 23.44017 16.85897
Command: t.test with arguments mu, alternative, conf.level
test = t.test(x, y, mu=5, alternative="two.sided", conf.level=0.99)
test
##
## Welch Two Sample t-test
##
## data: x and y
## t = 3.3047, df = 421.79, p-value = 0.001032
## alternative hypothesis: true difference in means is not equal to 5
## 99 percent confidence interval:
## 5.343134 7.819259
## sample estimates:
## mean of x mean of y
## 23.44017 16.85897
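The Welch statistic can likewise be reproduced from the group summaries alone (means, sds and group sizes as reported by favstats later in these notes):

```r
nx <- 234; xbar <- 23.44017; sx <- 5.954643   # hwy summaries
ny <- 234; ybar <- 16.85897; sy <- 4.255946   # cty summaries
mu0 <- 5
T <- (xbar - ybar - mu0) / sqrt(sx^2/nx + sy^2/ny)
round(T, 4)   # close to the reported t = 3.3047
```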
Effect size: Cohen’s \(d\) – library effsize, command cohen.d
d = cohen.d(mpg$cty, mpg$hwy)
d = cohen.d(mpg~type, data=mpg.cty.hwy)
d
##
## Cohen's d
##
## d estimate: 1.271615 (large)
## 95 percent confidence interval:
## inf sup
## 1.072429 1.470801
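For two independent samples, Cohen’s \(d\) is the difference in means divided by the pooled standard deviation; the estimate above can be reproduced from the same group summaries:

```r
nx <- 234; ny <- 234
sx <- 5.954643; sy <- 4.255946    # group sds from favstats
sp <- sqrt(((nx - 1)*sx^2 + (ny - 1)*sy^2) / (nx + ny - 2))
d  <- (23.44017 - 16.85897) / sp  # difference in group means over pooled sd
round(d, 4)    # close to the reported estimate 1.271615
```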
To check validity, check the data size first, then use a QQ-plot or a boxplot to check the distribution:
favstats(mpg~type, data=mpg.cty.hwy)
type | min | Q1 | median | Q3 | max | mean | sd | n | missing
---|---|---|---|---|---|---|---|---|---|
hwy | 12 | 18 | 24 | 27 | 44 | 23.44017 | 5.954643 | 234 | 0
cty | 9 | 14 | 17 | 19 | 35 | 16.85897 | 4.255946 | 234 | 0
gf_boxploth(type ~ mpg, data=mpg.cty.hwy)
A crucial assumption behind the two-sample \(t\)-tests is that the two samples are independent of each other.
Paired variables are not independent.
Instead, for paired data we need a different test: the paired two-sample t-test.
For the paired two-sample t-test, instead of comparing \(\overline x\) with \(\overline y\), we compare \(\overline{x-y}\): the mean of the differences instead of the difference of means.
This works even if the two samples do not come from nice distributions, as long as their differences do.
Input | Vectors x, y of equal length, or variables dataset$x , dataset$y
---|---|
Null hypothesis | Population mean difference \(\mu_{x-y}\) is equal to mu.0 (usually 0)
Alternative hypothesis | Population mean difference \(\mu_{x-y}\) is [less than / not equal / greater than] mu.0
Test statistic | T = (mean(x-y) - mu.0)/( sd(x-y)/sqrt(N) ), where \(N\) is the number of pairs
Requirements: The differences x-y need to be appropriate for a one-sample test:
Dataset size | Requires | How to check
---|---|---|
\(N\leq 15\) | Normal distribution | gf_qq
\(15<N\leq40\) | No skew, no outliers | gf_histogram and gf_boxplot
\(40<N\leq\)hundreds | No outliers | gf_boxplot
hundreds\(<N\) | Few outliers | gf_boxplot
Command: t.test with arguments mu, alternative, conf.level, paired
test = t.test(x, y, mu=5, alternative="two.sided", conf.level=0.99, paired=TRUE)
test
##
## Paired t-test
##
## data: x and y
## t = 10.69, df = 233, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 5
## 99 percent confidence interval:
## 6.197035 6.965358
## sample estimates:
## mean of the differences
## 6.581197
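Since the paired test is just a one-sample test on the differences, t.test(x, y, paired=TRUE) and t.test(x - y) give identical results; a sketch with simulated paired data:

```r
set.seed(3)
x <- rnorm(40, mean = 12, sd = 3)
y <- x - rnorm(40, mean = 5, sd = 1)   # paired with x by construction
paired    <- t.test(x, y, mu = 5, paired = TRUE)
onesample <- t.test(x - y, mu = 5)
paired$statistic
onesample$statistic   # identical statistic, df and p-value
```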
Effect size: Cohen’s \(d\) - library effsize
, command cohen.d
d = cohen.d(x, y, paired=TRUE)
d
##
## Cohen's d
##
## d estimate: 2.908509 (large)
## 95 percent confidence interval:
## inf sup
## 2.647925 3.169092
To check validity, check the data size first, then use a QQ-plot or a boxplot to check the distribution of the differences:
NROW(mpg)
## [1] 234
gf_boxploth("hwy-cty" ~ hwy-cty, data=mpg)