\[ \def\RR{{\mathbb{R}}} \def\PP{{\mathbb{P}}} \def\EE{{\mathbb{E}}} \def\VV{{\mathbb{V}}} \]
Recall from MLE that we defined the score as \(U(\theta)=\frac{d\ell(\theta)}{d\theta}\) and the Fisher information as \(I(\theta)=-\EE\left[\frac{d^2}{d\theta^2}\ell(\theta)\right]\).
Theorem (Rao)
Asymptotically, as \(n\to\infty\), the score statistic converges in distribution: \[ S(\theta_0) = \frac{U(\theta_0)^2}{\mathcal{I}(\theta_0)} \overset\to\sim \chi^2(1) \qquad \text{or equivalently} \qquad \sqrt{S(\theta_0)} \overset\to\sim \mathcal{N}(0,1) \]
This gives rise to
The Rao Test / The Score Test / The Lagrange Multiplier Test
Consider Neyman-Pearson: a most powerful test for \(H_0:\theta=\theta_0\) vs. \(H_a:\theta>\theta_0\) is one that rejects if \[ \frac{\mathcal{L}(\theta_0+h|x)}{\mathcal{L}(\theta_0 | x)} \geq K \quad\text{or equivalently}\quad \ell(\theta_0+h|x) - \ell(\theta_0|x) \geq \log K \]
For small \(h\), we can make a Taylor series approximation: \[ \ell(\theta_0+h|x) \approx \ell(\theta_0|x) + h\cdot\left(\frac{d}{d\theta}\ell(\theta)\right)_{\theta=\theta_0} \]
So we would get a most powerful test using \(U(\theta_0|x)\geq\frac{\log K}{h}\). If we knew the sampling distribution of \(U(\theta_0|x)\), we would be done here. For large samples, however, we do know that \(U(\theta_0|x)\overset\approx\sim\mathcal{N}(0,\mathcal{I}_n(\theta_0))\).
Hence \(\frac{U(\theta_0|x)}{\sqrt{\mathcal{I}_n(\theta_0)}}\overset\approx\sim\mathcal{N}(0,1)\), and the test follows immediately.
From \(\mathcal{L}(p|k) = {n\choose k}p^k(1-p)^{n-k}\) follows \[ \begin{align*} U(p) = \frac{d}{dp}\log\mathcal{L}(p|k) &= \frac{d}{dp}\left(\log{n\choose k} + k\log p + (n-k)\log(1-p)\right) \\ &= \frac{k}{p} - \frac{n-k}{1-p} = \frac{k-np}{p-p^2}\\ \mathcal{I}(p) = -\EE_k\left[\frac{d^2}{dp^2}\log PDF(k|p)\right] &= -\EE_k\left[\frac{d}{dp}\left(U(p)\right)\right] = -\EE_k\left[\frac{k-n}{(1-p)^2} - \frac{k}{p^2}\right] \\ \text{because }\EE[k] = np\qquad\qquad &=-\frac{np-n}{(1-p)^2} + \frac{np}{p^2} = \frac{n}{1-p} + \frac{n}{p} \\ S = \frac{U^2}{\mathcal{I}} &= \left(\frac{k-np_0}{p_0(1-p_0)}\right)^2\cdot\frac{p_0(1-p_0)}{n} = \frac{(k-np_0)^2}{np_0(1-p_0)} \\ \sqrt{S} = \frac{k-np_0}{\sqrt{np_0(1-p_0)}} &= \frac{\frac{k}{n}-p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}} \end{align*} \]
…which we recognize from earlier proportions tests.
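As a quick sketch of the computation (the counts `k`, `n`, and the null value `p0` below are hypothetical), the score test for a proportion can be carried out directly:

```python
import numpy as np
import scipy.stats

# Hypothetical data: k successes out of n trials, testing H0: p = p0
k, n, p0 = 37, 100, 0.5
p_hat = k / n

# Score statistic sqrt(S) = (p_hat - p0) / sqrt(p0 (1-p0) / n)
z = (p_hat - p0) / np.sqrt(p0 * (1 - p0) / n)

# Two-tailed p-value from N(0,1); comparing z**2 to chi^2(1) gives the same answer
p_norm = 2 * scipy.stats.norm.sf(abs(z))
p_chi2 = scipy.stats.chi2(1).sf(z**2)
print(f"z = {z:.4f}, p = {p_norm:.4f} (chi2 check: {p_chi2:.4f})")
```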
So far most of our null hypotheses have been of the shape
Some parameter of the population from which we take a single sample takes a single specific value.
However, there is frequent need to compare not just one sample against our expectations, but two samples against each other.
For possibly the primary source of why we care about comparing two samples, we turn to causality.
Causality is the (difficult) study of whether one thing causes another thing. It is notoriously difficult to establish, primarily because we cannot usually observe the exact same process once with and once without a potential cause present.
You have likely all heard at some point:
Correlation is not causation.
There are (philosophically) four possible causes for correlation:
Sir Austin Bradford Hill proposed 9 criteria to provide epidemiological evidence for a causal relationship between a potential cause and an effect:
We will work with counterfactual hypotheses - about potential outcomes if some treatment was taken or not.
Consider a set of experimental units \(I\). For any \(i\in I\), we define:

* \(Y_{0i}\), the potential outcome for unit \(i\) without the treatment.
* \(Y_{1i}\), the potential outcome for unit \(i\) with the treatment.
* \(T_i\), a treatment indicator, so that \(T_i=1\) if unit \(i\) received the treatment, and \(T_i=0\) if it did not.
We can define the individual treatment effect \(Y_{1i}-Y_{0i}\), but this cannot actually be measured. Instead we may focus on the average treatment effect \(ATE = \EE[Y_1-Y_0]\), or, even easier to work with, the ATE on the treated, \(ATT=\EE[Y_1-Y_0|T=1]\).
We still, however, only have quantities we are not able to actually measure - since we cannot simultaneously measure \(Y_{0i}\) and \(Y_{1i}\) for any one unit.
\[ \begin{align*} \EE[Y|T=1] - \EE[Y|T=0] &= \EE[Y_1|T=1] - \EE[Y_0|T=0] \\ &= \underbrace{\EE[Y_1|T=1] \color{SteelBlue}{-\EE[Y_0|T=1]}}_{ATT} + \underbrace{\color{SteelBlue}{\EE[Y_0|T=1]} - \EE[Y_0|T=0]}_{\text{Bias}} \end{align*} \]
ATT measures the effect of the treatment on the treated, and Bias measures the extent to which the treated and control groups differ without the treatment.
This means that if we could eliminate the Bias term, we would be measuring precisely the effect of the treatment on the treated. And eliminating Bias means the two groups are comparable before the treatment.
And if the two groups are comparable before the treatment, they remain exchangeable after the treatment. So not only does the association measure the ATT, it also measures the ATE.
Randomization of the treatment annihilates bias by making the potential outcomes independent of the treatment assignment. In other words, knowing who was assigned which treatment should give us no information about what the outcomes would have been without the treatment.
Or in other words, that on average, the groups are indistinguishable.
Independence implies precisely that \(\EE[Y_0|T=0] = \EE[Y_0] = \EE[Y_0|T=1]\).
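A minimal simulation sketch (all potential outcomes invented) illustrates this: under random assignment, the difference in observed group means recovers the ATE.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Invented potential outcomes with a constant treatment effect of 2
Y0 = rng.normal(10, 3, size=n)
Y1 = Y0 + 2

# Randomized treatment, independent of the potential outcomes
T = rng.integers(0, 2, size=n)
Y = np.where(T == 1, Y1, Y0)

# E[Y|T=1] - E[Y|T=0] approximates ATE = E[Y1 - Y0] = 2, since the bias term vanishes
print(Y[T == 1].mean() - Y[T == 0].mean())
```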
Gold Standard for Causality
The Randomized Controlled Trial (RCT) is the best way to eliminate Bias from group comparisons; as such, a carefully designed RCT is our best access to causal inference.
In an RCT, we would split our experimental units \(I\) into two groups: \(I_1\) and \(I_0\) (or \(I_T\) and \(I_C\)), apply treatment to the units in \(I_1\), and then compare the expected values between the two groups.
In other words, the underlying null hypothesis for an RCT is \(\boxed{H_0: \mu_{X} = \mu_{Y}}\).
We know how this goes by now… We know a good estimator for \(\mu_X - \mu_Y\), namely \(\overline{X}-\overline{Y}\).
If \(X = X_1,\dots,X_{n_X} \sim \mathcal{N}(\mu_X, \sigma^2_X)\) and \(Y = Y_1,\dots,Y_{n_Y}\sim \mathcal{N}(\mu_Y, \sigma^2_Y)\) are independent iid samples, then by the rules for computing expected values and variances of sums of random variables, and by the theorem that a sum of independent normal variables is again normal, it follows:
\[ \begin{align*} \EE[\overline{X}-\overline{Y}] &= \EE\left[\frac{\sum X_i}{n_X} - \frac{\sum Y_i}{n_Y}\right] & \VV[\overline{X}-\overline{Y}] &= \VV\left[\frac{\sum X_i}{n_X} - \frac{\sum Y_i}{n_Y}\right] \\ &= \frac{\sum \EE[X_i]}{n_X} - \frac{\sum \EE[Y_i]}{n_Y} & &= \frac{1}{n_X^2}\sum\VV[X_i] + \frac{1}{n_Y^2}\sum\VV[Y_i] \\ &= \frac{n_X\mu_X}{n_X} - \frac{n_Y\mu_Y}{n_Y} = \mu_X-\mu_Y& &= \frac{n_X\sigma_X^2}{n_X^2} + \frac{n_Y\sigma_Y^2}{n_Y^2} = \frac{\sigma_X^2}{n_X} + \frac{\sigma_Y^2}{n_Y} \end{align*} \]
We are ignoring all the \(Cov(X_i,X_j)\) and \(Cov(Y_i,Y_j)\) terms in the variance computation because each sample was assumed to be iid, and we are ignoring all the \(Cov(X_i,Y_j)\) terms in the variance because the two samples were assumed to be independent.
So \(\overline{X}-\overline{Y}\sim\mathcal{N}\left(\mu_X-\mu_Y, \frac{\sigma_X^2}{n_X}+\frac{\sigma_Y^2}{n_Y}\right)\) and we get
Two-sample test for population means with known variances
With test statistic \(z=\frac{(\overline{X}-\overline{Y})-\Delta_0}{\sqrt{\frac{\sigma^2_X}{n_X}+\frac{\sigma^2_Y}{n_Y}}}\) for \(H_0:\mu_X-\mu_Y=\Delta_0\), and writing \(\Delta=\mu_X-\mu_Y\) for the true difference:

Alternative Hypothesis | Rejection Region for Level \(\alpha\) | Power at \(\Delta\) | Sample size needed for power \(1-\beta\) at \(\Delta\) with balanced groups |
---|---|---|---|
Upper Tail | \(z \geq z_\alpha\) | \(1-\Phi\left(z_\alpha-\frac{\Delta-\Delta_0}{\sqrt{\frac{\sigma^2_X}{n_X}+\frac{\sigma^2_Y}{n_Y}}}\right)\) | \(n_X=n_Y=\frac{(\sigma_X^2+\sigma_Y^2)(z_\alpha+z_\beta)^2}{(\Delta-\Delta_0)^2}\) |
Two-tailed | \(|z| \geq z_{\alpha/2}\) | \(1-\Phi\left(z_{\alpha/2}-\frac{\Delta-\Delta_0}{\sqrt{\frac{\sigma^2_X}{n_X}+\frac{\sigma^2_Y}{n_Y}}}\right)+\Phi\left(-z_{\alpha/2}-\frac{\Delta-\Delta_0}{\sqrt{\frac{\sigma^2_X}{n_X}+\frac{\sigma^2_Y}{n_Y}}}\right)\) | \(n_X=n_Y=\frac{(\sigma_X^2+\sigma_Y^2)(z_{\alpha/2}+z_\beta)^2}{(\Delta-\Delta_0)^2}\) |
Lower Tail | \(z \leq -z_\alpha\) | \(\Phi\left(-z_\alpha-\frac{\Delta-\Delta_0}{\sqrt{\frac{\sigma^2_X}{n_X}+\frac{\sigma^2_Y}{n_Y}}}\right)\) | \(n_X=n_Y=\frac{(\sigma_X^2+\sigma_Y^2)(z_\alpha+z_\beta)^2}{(\Delta-\Delta_0)^2}\) |
If \(n_X>40\) and \(n_Y>40\), we can get away with estimating \(\sigma_X^2\approx S_X^2\) and \(\sigma_Y^2\approx S_Y^2\).
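As a sketch of how these formulas translate into code (the samples and all parameter values below are made up), we can compute the test statistic, p-values, and a required sample size:

```python
import numpy as np
import scipy.stats

# Hypothetical large samples and null value Delta0 for H0: mu_X - mu_Y = Delta0
rng = np.random.default_rng(1)
x = rng.normal(5.2, 2.0, size=60)
y = rng.normal(4.8, 1.5, size=50)
Delta0 = 0

# Test statistic with plug-in sample variances
z = ((x.mean() - y.mean()) - Delta0) / np.sqrt(x.var(ddof=1)/len(x) + y.var(ddof=1)/len(y))
p_upper = scipy.stats.norm.sf(z)          # upper-tail p-value
p_two = 2 * scipy.stats.norm.sf(abs(z))   # two-tailed p-value

# Balanced sample size for power 1-beta at a true difference Delta (variances taken as known)
alpha, beta, Delta = 0.05, 0.2, 0.5
sigma2_X, sigma2_Y = 4.0, 2.25
z_a, z_b = scipy.stats.norm.ppf(1 - alpha), scipy.stats.norm.ppf(1 - beta)
n_per_group = (sigma2_X + sigma2_Y) * (z_a + z_b)**2 / (Delta - Delta0)**2
print(f"z = {z:.4f}, upper p = {p_upper:.4f}, two-tailed p = {p_two:.4f}, n per group = {np.ceil(n_per_group):.0f}")
```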
Knowing a pivot quantity allows us to derive a confidence interval - and \(\frac{(\overline{X}-\overline{Y})-(\mu_X-\mu_Y)}{\sqrt{\frac{\sigma^2_X}{n_X}+\frac{\sigma^2_Y}{n_Y}}}\) is exactly the pivot we need.
Solve the simultaneous inequalities: \[ \PP\left(-z_{\alpha/2} \leq \frac{(\overline{X}-\overline{Y})-(\mu_X-\mu_Y)}{\sqrt{\frac{\sigma^2_X}{n_X}+\frac{\sigma^2_Y}{n_Y}}} \leq z_{\alpha/2}\right) \\ \PP\left( -(\overline{X}-\overline{Y}) -z_{\alpha/2}\sqrt{\frac{\sigma^2_X}{n_X}+\frac{\sigma^2_Y}{n_Y}} \leq -(\mu_X-\mu_Y) \leq -(\overline{X}-\overline{Y}) +z_{\alpha/2}\sqrt{\frac{\sigma^2_X}{n_X}+\frac{\sigma^2_Y}{n_Y}} \right) \\ \PP\left( (\overline{X}-\overline{Y}) +z_{\alpha/2}\sqrt{\frac{\sigma^2_X}{n_X}+\frac{\sigma^2_Y}{n_Y}} \geq \mu_X-\mu_Y \geq (\overline{X}-\overline{Y}) -z_{\alpha/2}\sqrt{\frac{\sigma^2_X}{n_X}+\frac{\sigma^2_Y}{n_Y}} \right) \]
Again, the Central Limit Theorem allows us to replace \(\sigma_X^2\approx s_X^2\) and \(\sigma_Y^2\approx s_Y^2\).
As usual, ensure that both your samples look normal with probability plots or QQ-plots before relying on this.
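For that check, scipy.stats.probplot produces a normal probability plot for each sample; a sketch (with placeholder arrays x and y standing in for your data):

```python
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats

# Placeholder samples; substitute the actual data here
rng = np.random.default_rng(2)
x = rng.normal(5.2, 2.0, size=60)
y = rng.normal(4.8, 1.5, size=50)

fig, axes = plt.subplots(1, 2, figsize=(8, 3))
scipy.stats.probplot(x, dist="norm", plot=axes[0])
axes[0].set_title("Sample X")
scipy.stats.probplot(y, dist="norm", plot=axes[1])
axes[1].set_title("Sample Y")
plt.tight_layout()
plt.show()
```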
Theorem
With \(X_1,\dots,X_{n_X}\sim\mathcal{N}(\mu_X,\sigma_X^2)\) and \(Y_1,\dots,Y_{n_Y}\sim\mathcal{N}(\mu_Y,\sigma_Y^2)\), the standardized variable \[ T = \frac{(\overline{X}-\overline{Y})-(\mu_X-\mu_Y)}{\sqrt{se_X^2+se_Y^2}} \sim T(\nu) \quad\text{where}\quad se_X = \frac{s_X}{\sqrt{n_X}}, se_Y=\frac{s_Y}{\sqrt{n_Y}} \quad\text{and}\quad \nu \approx \frac{(se_X^2+se_Y^2)^2}{\frac{se_X^4}{n_X-1}+\frac{se_Y^4}{n_Y-1}} \]
To see this, start by dividing numerator and denominator of \(T\) by the standard deviation of the numerator (…just like for the one-sample case…) to get
\[ \frac{\frac{(\overline{X}-\overline{Y})-(\mu_X-\mu_Y)}{\sqrt{\frac{\sigma^2_X}{n_X}+\frac{\sigma^2_Y}{n_Y}}}}{\frac{\sqrt{se_X^2+se_Y^2}}{\sqrt{\frac{\sigma^2_X}{n_X}+\frac{\sigma^2_Y}{n_Y}} }} \]
The numerator is a standard normal, but the square of this denominator is not usually going to be a \(\chi^2\)-variable divided by its degrees of freedom.
But suppose there was some \(W\sim\chi^2(\nu)\) such that \[ \frac{S_X^2}{n_X}+\frac{S_Y^2}{n_Y} = \left(\frac{\sigma^2_X}{n_X}+\frac{\sigma^2_Y}{n_Y}\right)\frac{W}{\nu} \]
We could try to determine a good value for \(\nu\) by matching expected values and variances.
We want to find \(W\sim\chi^2(\nu)\) such that \[ \frac{S_X^2}{n_X}+\frac{S_Y^2}{n_Y} = \left(\frac{\sigma^2_X}{n_X}+\frac{\sigma^2_Y}{n_Y}\right)\frac{W}{\nu} \]
We know many of the component variables and their behaviors here: \[ \begin{align*} \EE[W] &= \nu & \VV[W] &= 2\nu & \frac{(n_X-1)S_X^2}{\sigma_X^2}&\sim\chi^2(n_X-1) & \frac{(n_Y-1)S_Y^2}{\sigma_Y^2}&\sim\chi^2(n_Y-1) \\ \EE\left[\frac{W}{\nu}\right] &= 1 & && \EE\left[\frac{S_X^2}{n_X}+\frac{S_Y^2}{n_Y}\right] &= \frac{\sigma^2_X}{n_X}+\frac{\sigma^2_Y}{n_Y} \end{align*} \]
So expected values agree between the two sides. (bear in mind that \(\sigma_X^2\) and \(\sigma_Y^2\) are deterministic, not random)
\[ 2(n_X-1) = \VV\left[\frac{(n_X-1)S_X^2}{\sigma_X^2}\right] = \frac{(n_X-1)^2}{\sigma_X^4}\VV[S_X^2] \\ \VV[S_X^2] = \frac{2\sigma_X^4}{n_X-1} \quad\text{and similar for }\VV[S_Y^2] \\ \VV\left[\frac{S_X^2}{n_X}+\frac{S_Y^2}{n_Y}\right] = \frac{1}{n_X^2}\VV[S_X^2] + \frac{1}{n_Y^2}\VV[S_Y^2] = \frac{2\sigma_X^4}{(n_X-1)n_X^2} + \frac{2\sigma_Y^4}{(n_Y-1)n_Y^2} \]
And for the other side of the equation, \[ \VV\left[\left(\frac{\sigma_X^2}{n_X}+\frac{\sigma_Y^2}{n_Y}\right)\frac{W}{\nu}\right] = \left(\frac{\sigma_X^2}{n_X}+\frac{\sigma_Y^2}{n_Y}\right)^2\frac{1}{\nu^2}\VV[W] = \left(\frac{\sigma_X^2}{n_X}+\frac{\sigma_Y^2}{n_Y}\right)^2\frac{2\nu}{\nu^2} \]
Equating the two variances and solving for \(\nu\) finishes the computation of an appropriate choice of Degrees of Freedom: \[ \nu = \frac{\left(\frac{\sigma_X^2}{n_X}+\frac{\sigma_Y^2}{n_Y}\right)^2}{\frac{\sigma_X^4}{(n_X-1)n_X^2}+\frac{\sigma_Y^4}{(n_Y-1)n_Y^2}} \] Replacing the unknown \(\sigma_X^2, \sigma_Y^2\) with \(S_X^2, S_Y^2\) gives the \(\nu\) from the theorem above.
The resulting hypothesis test and confidence interval constructions are known as Welch’s Two Sample T-Test after the statistician who derived the approximation for \(\nu\).
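In practice, scipy.stats.ttest_ind with equal_var=False performs Welch's test; a sketch with invented samples, also computing \(\nu\) by hand for comparison:

```python
import numpy as np
import scipy.stats

rng = np.random.default_rng(3)
x = rng.normal(10.0, 2.0, size=12)   # hypothetical sample X
y = rng.normal(9.0, 3.5, size=20)    # hypothetical sample Y

# Welch's two-sample t-test (unequal variances)
welch = scipy.stats.ttest_ind(x, y, equal_var=False)

# The Welch-Satterthwaite degrees of freedom from the theorem above
se2_x = x.var(ddof=1) / len(x)
se2_y = y.var(ddof=1) / len(y)
nu = (se2_x + se2_y)**2 / (se2_x**2/(len(x)-1) + se2_y**2/(len(y)-1))
print(f"T = {welch.statistic:.4f}, p = {welch.pvalue:.4f}, nu = {nu:.2f}")
```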
We might have reason to believe the two data sets may have the same variance. If we do, both samples contribute to estimating the same parameter - so by rights, we should be able to get a better estimate of that parameter that way.
A first approach could be to take the average of \(S_X^2\) and \(S_Y^2\) as an estimator: if \(\hat\theta_1\) and \(\hat\theta_2\) are independent unbiased estimators of \(\theta\), then so is their average, with variance: \[ \VV\left[\frac{\hat\theta_1+\hat\theta_2}{2}\right] = \frac{1}{4}\VV[\hat\theta_1] + \frac{1}{4}\VV[\hat\theta_2] \]
So the variance is half of the average of the variances.
This works for equal sample sizes, fine - but if sample sizes are unequal, then the information contained in either estimator varies. Instead we’d want a weighted average.
Remember that \(\frac{(n-1)S^2}{\sigma^2}\) is \(\chi^2\)-distributed. But this is just our way of rewriting the claim that \(\frac{\sum(X_i-\overline{X})^2}{\sigma^2}\) is \(\chi^2\)-distributed.
Suppose we pick some \(S_p^2 = wS_X^2 + (1-w)S_Y^2\). We would like to pick \(w\) to minimize the variance of this (unbiased) estimator.
\[ \begin{align*} \VV[S_p^2] &= \VV[wS_X^2 + (1-w)S_Y^2] & \frac{d}{dw}\VV[S_p^2] = 2w\VV[S_X^2] - 2(1-w)\VV[S_Y^2] &= 0 \\ &= w^2\VV[S_X^2] + (1-w)^2\VV[S_Y^2] & \text{precisely when }\qquad \frac{w}{1-w} &= \frac{\VV[S_Y^2]}{\VV[S_X^2]} \end{align*} \]
The variance \(\VV[S^2]\) can be shown to take the value \(\frac{\mu_4}{n}-\frac{n-3}{n(n-1)}\sigma^4\) (where \(\mu_4\) is the 4th central moment).
For most distributions, this makes the weighting very complicated. However, for the normal distribution, \(\mu_4 = 3\sigma^4\), and thus
\[ \frac{\mu_4}{n}-\frac{n-3}{n(n-1)}\sigma^4 = \frac{3\sigma^4}{n}-\frac{n-3}{n(n-1)}\sigma^4 = \frac{2}{n-1}\sigma^4\\ \text{and therefore, since we assume equal variances}\\ \frac{w}{1-w} = \frac{\VV[S_Y^2]}{\VV[S_X^2]} = \frac{2\sigma^4/(n_Y-1)}{2\sigma^4/(n_X-1)} = \frac{n_X-1}{n_Y-1} \]
so we choose some \(w\) such that
\[ \begin{align*} w(n_Y-1) &= (1-w)(n_X-1) \\ w((n_Y-1) + (n_X-1)) &= n_X-1 \\ w &= \frac{n_X-1}{n_X+n_Y-2} \end{align*} \]
To sum up all this work, we can make a better estimator for \(\sigma^2\) by taking the weighted average, or pooled estimator
\[ S_P^2 = \frac{(n_X-1)S_X^2 + (n_Y-1)S_Y^2}{n_X+n_Y-2} \]
Note that \[ (n_X+n_Y-2)\frac{S_P^2}{\sigma^2} = \frac{(n_X-1)S_X^2}{\sigma^2} + \frac{(n_Y-1)S_Y^2}{\sigma^2} \]
is a sum of two independent variables distributed as \(\chi^2(n_X-1)\) and \(\chi^2(n_Y-1)\) respectively. It follows that their sum is distributed as a \(\chi^2(n_X-1+n_Y-1)=\chi^2(n_X+n_Y-2)\) variable.
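A small simulation sketch (sample sizes and \(\sigma\) invented) supports this: the scaled pooled variance has the mean and variance of a \(\chi^2(n_X+n_Y-2)\) variable.

```python
import numpy as np

rng = np.random.default_rng(4)
n_x, n_y, sigma = 8, 5, 2.0
df = n_x + n_y - 2

reps = 50_000
w = np.empty(reps)
for i in range(reps):
    x = rng.normal(0.0, sigma, size=n_x)
    y = rng.normal(0.0, sigma, size=n_y)
    sp2 = ((n_x - 1)*x.var(ddof=1) + (n_y - 1)*y.var(ddof=1)) / df
    w[i] = df * sp2 / sigma**2

# A chi^2(df) variable has mean df and variance 2*df
print(f"mean: {w.mean():.2f} (expect {df}), variance: {w.var():.2f} (expect {2*df})")
```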
Deriving a T-distributed test statistic and confidence interval with this variance estimator goes exactly as for every other time we have done it.
Caveat!
Since the derivation crucially involved the behavior of the 4th central moment of a normal distribution, this approach is more sensitive to the normality assumption than most of our other procedures. And because it works with ratios of the shape \(\sigma_X^4/\sigma_Y^4\), even small deviations from equality of the two variances get amplified a lot.
This approach used to be widely recommended, in large part because the degrees of freedom are so much easier to compute, and because when the assumptions hold we do get sharper estimates and more precise statements (both in terms of p-values and in terms of confidence intervals).
However, it is easy to be led astray by small deviations from the assumptions, and the Welch T-test works regardless of whether the variances do or do not match. We could imagine first testing for equal variances, but the F-test that can do that is even more sensitive to data being normally distributed.
The safe bet is to only use pooled variances when you have very compelling evidence of equal variances.
Use the pooled \(T\) pivot variable to obtain a pooled \(T\) confidence interval:
\[ \PP\left( -t_{\alpha/2, m+n-2} \leq \frac{(\overline{X}-\overline{Y})-(\mu_1-\mu_2)}{S_p\sqrt{\frac{1}{m}+\frac{1}{n}}} \leq t_{\alpha/2, m+n-2} \right) = 1-\alpha \\ -t_{\alpha/2, m+n-2}S_p\sqrt{\frac{1}{m}+\frac{1}{n}} \leq (\overline{X}-\overline{Y})-(\mu_1-\mu_2) \leq t_{\alpha/2, m+n-2}S_p\sqrt{\frac{1}{m}+\frac{1}{n}} \\ -(\overline{X}-\overline{Y}) - t_{\alpha/2, m+n-2}S_p\sqrt{\frac{1}{m}+\frac{1}{n}} \leq -(\mu_1-\mu_2) \leq -(\overline{X}-\overline{Y}) + t_{\alpha/2, m+n-2}S_p\sqrt{\frac{1}{m}+\frac{1}{n}} \\ (\overline{X}-\overline{Y}) + t_{\alpha/2, m+n-2}S_p\sqrt{\frac{1}{m}+\frac{1}{n}} \geq (\mu_1-\mu_2) \geq (\overline{X}-\overline{Y}) - t_{\alpha/2, m+n-2}S_p\sqrt{\frac{1}{m}+\frac{1}{n}} \]
So we get the pooled \(T\) confidence interval
\[ \mu_1-\mu_2 \in (\overline{X}-\overline{Y}) \pm t_{\alpha/2, m+n-2}S_p\sqrt{\frac{1}{m}+\frac{1}{n}} \]
Samples of maximum moisture output were taken from ultrasonic humidifiers of two particular brands, yielding values:
We compute:
import numpy as np
import scipy.stats

# moisture_1, moisture_2 are the two samples of maximum moisture output
X_Y = moisture_1.mean() - moisture_2.mean()
df = len(moisture_1) + len(moisture_2) - 2
# pooled variance = weighted average of the sample variances; S_p is its square root
Sp = np.sqrt(((len(moisture_1)-1)*moisture_1.var(ddof=1) + (len(moisture_2)-1)*moisture_2.var(ddof=1))/df)
tadf = scipy.stats.t(df).ppf(0.975)  # critical value t_{0.025, df}
margin = tadf * Sp * np.sqrt(1/len(moisture_1) + 1/len(moisture_2))
print(f"""
$\overline{{X}}-\overline{{Y}}$ | $S_p$ | $t_{{0.025,{df}}}$ | Margin of Error | CI
-|-|-|-|-
{X_Y:.4f} | {Sp:.4f} | {tadf:.4f} | {margin:.4f} | ({X_Y-margin:.4f}, {X_Y+margin:.4f})
""")
\(\overline{X}-\overline{Y}\) | \(S_p\) | \(t_{0.025,6}\) | Margin of Error | CI |
---|---|---|---|---|
1.7000 | 0.9677 | 2.4469 | 1.6743 | (0.0257, 3.3743) |
Using Welch’s T Confidence Interval instead, we compute:
sesq_1 = moisture_1.var(ddof=1)/len(moisture_1)  # estimated squared standard error se_1^2
sesq_2 = moisture_2.var(ddof=1)/len(moisture_2)  # estimated squared standard error se_2^2
# Welch-Satterthwaite approximation for the degrees of freedom
df_W = (sesq_1 + sesq_2)**2/((sesq_1**2)/(len(moisture_1)-1)+(sesq_2**2)/(len(moisture_2)-1))
tadf_W = scipy.stats.t(df_W).ppf(0.975)  # critical value t_{0.025, df_W}
margin_W = tadf_W*np.sqrt(sesq_1+sesq_2)
print(f"""
$\overline{{X}}-\overline{{Y}}$ | $se_1^2$ | $se_2^2$ | $t_{{0.025,{df_W:.4f}}}$ | Margin of Error | CI
-|-|-|-|-|-
{X_Y:.4f} | {sesq_1:.4f} | {sesq_2:.4f} | {tadf_W:.4f} | {margin_W:.4f} | ({X_Y-margin_W:.4f}, {X_Y+margin_W:.4f})
""")
\(\overline{X}-\overline{Y}\) | \(se_1^2\) | \(se_2^2\) | \(t_{0.025,5.7899}\) | Margin of Error | CI |
---|---|---|---|---|---|
1.7000 | 0.2813 | 0.1913 | 2.4686 | 1.6969 | (0.0031, 3.3969) |
We can see that the margin of error is slightly bigger for the Welch method, yielding a confidence interval that only just excludes \(0\).
Pooled T-Test for Difference of Means
With test statistic \(T=\frac{(\overline{X}-\overline{Y})-\Delta_0}{S_p\sqrt{\frac{1}{m}+\frac{1}{n}}}\) for \(H_0:\mu_1-\mu_2=\Delta_0\):

Alternative Hypothesis | Rejection Region for Level \(\alpha\) | P-value |
---|---|---|
Upper Tail | \(T>t_{\alpha,m+n-2}\) | \(1-CDF_{T(m+n-2)}(T)\) |
Two-tailed | \(|T|>t_{\alpha/2,m+n-2}\) | \(2\min\{1-CDF_{T(m+n-2)}(T), CDF_{T(m+n-2)}(T)\}\) |
Lower Tail | \(T<-t_{\alpha,m+n-2}\) | \(CDF_{T(m+n-2)}(T)\) |
For our data on moisture output, we compute:
\(\overline{X}-\overline{Y}\) | \(\Delta_0\) | \(S_p\) | \(S_p\sqrt{\frac1m+\frac1n}\) | \(T\) | df | \(p\) |
---|---|---|---|---|---|---|
1.7000 | 0 | 0.9677 | 0.6842 | 2.4845 | 6 | 0.0475 |
A two-tailed test can conclude at a 5% significance level that there is a difference in moisture output between the two brands.
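As a cross-check (not how the numbers above were produced), scipy.stats.ttest_ind can compute both the pooled and the Welch version directly, reusing the moisture_1 and moisture_2 arrays from the example:

```python
import scipy.stats

# equal_var=True gives the pooled two-sample t-test, equal_var=False gives Welch's test
pooled = scipy.stats.ttest_ind(moisture_1, moisture_2, equal_var=True)
welch = scipy.stats.ttest_ind(moisture_1, moisture_2, equal_var=False)
print(f"pooled: T = {pooled.statistic:.4f}, p = {pooled.pvalue:.4f}")
print(f"Welch:  T = {welch.statistic:.4f}, p = {welch.pvalue:.4f}")
```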