28 February, 2018

Composite hypotheses

As we recall, a simple hypothesis \(\Omega_j\) is one where \(|\Omega_j|=1\). By contrast, a composite hypothesis has more than one candidate distribution in its family.

When \(\Omega\subset\mathbb R\), we distinguish between

  • Lower tail: \(\Omega_1 = \{\theta: \theta<\theta_0\}\)
  • Upper tail: \(\Omega_1 = \{\theta: \theta>\theta_0\}\)
  • Two tailed: \(\Omega_1 = \{\theta: \theta\neq\theta_0\}\)

Composite tests

Hypothesis space \(\Omega=\Omega_0\cup\Omega_1\). The Neyman-Pearson test generalizes to the likelihood ratio test.

Define \[ \lambda = \lambda(X) = \frac{\mathcal L(\Omega_0|X)}{\mathcal L(\Omega|X)} = \frac{\sup_{\theta\in\Omega_0}\mathcal L(\theta|X)} {\sup_{\theta\in\Omega}\mathcal L(\theta|X)} \\ \phi(X) = \begin{cases} 1 & \lambda(X) < k \\ p & \lambda(X) = k \\ 0 & \lambda(X) > k \end{cases} \]

Since an overall most likely model is at least as likely as the most likely null model, \(\lambda\in[0,1]\). High values indicate that the null hypothesis explains the data well.
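As a concrete (made-up) illustration of these definitions, the following Python sketch computes \(\lambda(X)\) and \(\phi(X)\) for an \(\mathcal N(\theta,1)\) sample with \(\Omega_0=\{\theta\leq 0\}\) and \(\Omega=\mathbb R\); the sample, the cutoff `k`, and the randomization probability `p` are arbitrary choices, not part of the notes.

```python
import numpy as np
from scipy import stats

def log_lik(theta, x):
    # log-likelihood of an iid N(theta, 1) sample
    return stats.norm.logpdf(x, loc=theta, scale=1.0).sum()

def lr_statistic(x, null_upper=0.0):
    # lambda(X) = sup over the null / sup over the whole parameter space
    theta0_hat = min(x.mean(), null_upper)   # constrained MLE on {theta <= null_upper}
    theta_hat = x.mean()                     # unrestricted MLE
    return np.exp(log_lik(theta0_hat, x) - log_lik(theta_hat, x))

def phi(x, k=0.1, p=0.0, null_upper=0.0):
    # the test function: reject (1), randomize (p), or keep the null (0)
    lam = lr_statistic(x, null_upper)
    if lam < k:
        return 1
    if lam == k:
        return p
    return 0

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=1.0, size=20)  # hypothetical sample
print(lr_statistic(x), phi(x))
```

Since \(\lambda(X)\) is continuous here, the event \(\lambda(X)=k\) has probability zero, so the randomization branch only matters for discrete statistics.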

Uniformly most powerful tests

A test \(\phi^*\) of composite (or simple) hypotheses is uniformly most powerful at its level \(\alpha\) if, for every test \(\phi\) with level at most \(\alpha\), \[ \mathbb E_\theta\phi^* \geq \mathbb E_\theta\phi \qquad\forall\theta\in\Omega_1 \]

A family of densities \(p_\theta, \theta\in\Omega\subset\mathbb R\) has monotonic likelihood ratios if there is a statistic \(T\) such that the likelihood ratio is a monotone function of \(T\); in other words, \[ \theta_1<\theta_2 \qquad\text{implies}\qquad \frac{\mathcal{L}(\theta_2|x)}{\mathcal{L}(\theta_1|x)} \text{ is non-decreasing in $T(x)$} \] for almost every \(x\).

Example

For an exponential family \[ p_\theta(x) = \exp\left[ \color{green}{\eta(\theta)}\color{orange}{T(x)} - \color{blue}{B(\theta)} + A(x) \right] \] with strictly increasing \(\eta(\theta)\), the likelihood ratio for \(\theta_1<\theta_2\) can be written \[ \frac{p_{\theta_2}(x)}{p_{\theta_1}(x)} = \exp\left[ \color{green}{(\eta(\theta_2)-\eta(\theta_1))}\color{orange}{T(x)} - \color{blue}{(B(\theta_2)-B(\theta_1))} \right] \] which is increasing in \(T\), so the family has monotonic likelihood ratios.
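As a quick numerical sanity check (illustrative, not from the notes), the monotonicity can be verified for a concrete exponential family such as the Poisson, where \(\eta(\lambda)=\log\lambda\) and \(T(x)=x\); the two parameter values below are arbitrary.

```python
import numpy as np
from scipy import stats

# Poisson(lambda): p_lambda(x) = exp(x*log(lambda) - lambda - log(x!)),
# an exponential family with eta(lambda) = log(lambda) (increasing) and T(x) = x.
lam1, lam2 = 2.0, 5.0                 # lam1 < lam2, hypothetical values
x = np.arange(0, 15)
ratio = stats.poisson.pmf(x, lam2) / stats.poisson.pmf(x, lam1)
assert np.all(np.diff(ratio) > 0)     # p_{lam2}/p_{lam1} increases with T(x) = x
print(ratio)
```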

Uniformly most powerful test

Theorem If the densities \(p_\theta\) have monotonic likelihood ratios, then \[ \phi^*(x) = \begin{cases} 1 & T(x) > c \\ p & T(x) = c \\ 0 & T(x) < c \\ \end{cases} \]

is uniformly most powerful for \(H_0:\theta\leq\theta_0\) and \(H_1:\theta>\theta_0\) for the level \(\alpha=\mathbb E_{\theta_0}\phi^*\). The values \(c\) and \(p\) can be adjusted to achieve any desired level.

The power function \(\beta(\theta)=\mathbb E_\theta\phi^*\) is non-decreasing, and strictly increasing wherever \(0<\beta(\theta)<1\).

If \(\theta_1<\theta_0\), then \(\phi^*\) minimizes \(\mathbb E_{\theta_1}\phi\) among all tests with \(\mathbb E_{\theta_0}\phi=\alpha\).
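To see how \(c\) and \(p\) can be chosen to hit an exact level for a discrete statistic, here is a sketch (not from the notes) that assumes \(T\sim\mathrm{Binomial}(n,\theta_0)\) at the boundary of the null; the helper name `randomized_cutoff` and the values of `n`, `theta0`, and `alpha` are made up for this illustration.

```python
from scipy import stats

def randomized_cutoff(n, theta0, alpha):
    """Find c and p so that E_{theta0}[phi*] = alpha exactly, where phi*
    rejects when T > c and randomizes with probability p at T = c,
    assuming T ~ Binomial(n, theta0) at the null boundary."""
    for c in range(n + 1):
        tail = stats.binom.sf(c, n, theta0)       # P(T > c)
        if tail <= alpha:                          # smallest such c
            at_c = stats.binom.pmf(c, n, theta0)   # P(T = c)
            p = (alpha - tail) / at_c if at_c > 0 else 0.0
            return c, p
    return n, 0.0

c, p = randomized_cutoff(n=20, theta0=0.5, alpha=0.05)   # hypothetical values
print(c, p)   # check: p * P(T = c) + P(T > c) equals alpha
```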

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), both parameters unknown.

\(H_0:\mu\leq\mu_0\), and \(H_1:\mu>\mu_0\).

Find the uniformly most powerful test.

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), both parameters unknown.

\(H_0:\mu\leq\mu_0\), and \(H_1:\mu>\mu_0\).

By the preceding theorem, we should focus on likelihood ratio tests.

From the normal likelihood we get: \[ \lambda(X) = \frac {\max_{\mu\leq\mu_0;\,\sigma^2>0}(2\pi\sigma^2)^{-n/2}\exp[-\sum(x_i-\mu)^2/2\sigma^2]} {\max_{\mu;\,\sigma^2>0}(2\pi\sigma^2)^{-n/2}\exp[-\sum(x_i-\mu)^2/2\sigma^2]} \]

We need \(\hat\mu\) and \(\hat\sigma^2\) that attain the maxima to continue.

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), both parameters unknown. Take derivatives of \(\log\mathcal L\).

\[ \frac{\partial\log\mathcal L}{\partial\mu} = -\frac{1}{2\sigma^2}\sum(-2)(x_i-\mu) = \frac{1}{\sigma^2}\left(\sum x_i-n\mu\right) \\ \frac{\partial\log\mathcal L}{\partial\sigma^2} = -\frac12\left[ \frac{n}{\sigma^2} - \frac{\sum(x_i-\mu)^2}{(\sigma^2)^2} \right] \]

Setting \(\partial\log\mathcal L/\partial\mu=0\) shows that \(\hat\mu=\overline x\) maximizes the likelihood, regardless of the value of \(\sigma^2\). Under the null constraint \(\mu\leq\mu_0\) the maximizer is \(\hat\mu_0=\min(\overline x, \mu_0)\). If \(\overline x\leq\mu_0\) then \(\lambda=1\) and we never reject, so from here on assume \(\overline x>\mu_0\), in which case \(\hat\mu_0=\mu_0\).

Setting \(\partial\log\mathcal L/\partial\sigma^2=0\) shows that \(\hat\sigma^2=\sum(x_i-\hat\mu)^2/n\) maximizes the likelihood, with \(\hat\mu\) the corresponding mean estimate.

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), both parameters unknown.

Since \(\sum(x_i-\hat\mu)^2/\hat\sigma^2 = \sum(x_i-\hat\mu)^2\big/\left(\sum(x_i-\hat\mu)^2/n\right) = n\), both exponentials reduce to \(e^{-n/2}\), and with \(\hat\sigma_0^2=\sum(x_i-\mu_0)^2/n\) and \(\hat\sigma^2=\sum(x_i-\overline x)^2/n\) the likelihood ratio is \[ \lambda(x) = \frac{(2\pi\hat\sigma_0^2)^{-n/2}e^{-n/2}} {(2\pi\hat\sigma^2)^{-n/2}e^{-n/2}} = \left(\frac{\hat\sigma_0^2}{\hat\sigma^2}\right)^{-n/2} \]

It is enough to compare \(\hat\sigma_0^2/\hat\sigma^2\) to a cutoff constant: if \(\lambda(x)\) is small, then \(\hat\sigma_0^2/\hat\sigma^2\) is large, so our test rejects when \(\hat\sigma_0^2/\hat\sigma^2>k''\).

To pick the cutoff \(k''\) we need to be able to calculate \(\mathbb{P}(\hat\sigma_0^2/\hat\sigma^2>k'')\) under the null hypothesis.

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), both parameters unknown. Since \[ \sum(x_i-\mu_0)^2 = \sum[(x_i-\overline x)+(\overline x-\mu_0)]^2 = \sum (x_i-\overline x)^2 + n(\overline x-\mu_0)^2, \] the condition \[ k'' < \frac{\hat\sigma_0^2}{\hat\sigma^2} = \frac{\sum (x_i-\overline x)^2 + n(\overline x-\mu_0)^2} {\sum(x_i-\overline x)^2}= 1+\frac{n(\overline x-\mu_0)^2}{\sum (x_i-\overline x)^2} \qquad\text{is equivalent to}\qquad \frac{n(\overline x-\mu_0)^2}{\sum (x_i-\overline x)^2} > k''-1=k' \]

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), both parameters unknown.

Dividing \(\sum(x_i-\overline x)^2\) by \(n-1\) turns it into the sample variance \(S^2=\sum(x_i-\overline x)^2/(n-1)\): \[ \frac{n(\overline x-\mu_0)^2}{\sum(x_i-\overline x)^2/(n-1)} > (n-1)k' \\ \sqrt{ \frac{n(\overline x-\mu_0)^2}{\sum(x_i-\overline x)^2/(n-1)}} > \sqrt{(n-1)k'} = k \] So the test we derived (recalling that \(\overline x>\mu_0\) in the rejection region) consists of checking whether \[ \frac{\overline x-\mu_0}{S/\sqrt{n}} > k \]

Example

Theorem (Student [W. S. Gosset]) If \(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), then \[ \frac{\overline X-\mu}{S/\sqrt{n}}\sim T(n-1) \] In particular, under \(H_0:\mu=\mu_0\) the statistic \((\overline X-\mu_0)/(S/\sqrt{n})\) follows \(T(n-1)\), which is exactly what we need in order to choose the cutoff \(k\).

The likelihood ratio test for normal variables with unknown mean and variance is the \(t\)-test.

This construction fits the uniformly most powerful test setup: \((\overline x-\mu_0)/(S/\sqrt{n})\) is monotone in \(\overline x\): if \(\overline x\) increases (while the independent \(S^2\) stays unchanged), the statistic increases.
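A minimal sketch of the resulting one-sided \(t\)-test on made-up data; the cross-check uses `scipy.stats.ttest_1samp`, whose `alternative` argument requires a reasonably recent SciPy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=5.3, scale=2.0, size=25)   # hypothetical sample
mu0 = 5.0

n = len(x)
t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
p_value = stats.t.sf(t_stat, df=n - 1)        # one-sided: P(T(n-1) > t_stat)
cutoff = stats.t.ppf(0.95, df=n - 1)          # reject at level 5% if t_stat > cutoff
print(t_stat, p_value, t_stat > cutoff)

# cross-check against scipy's built-in one-sample t-test
print(stats.ttest_1samp(x, popmean=mu0, alternative="greater"))
```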

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), with known variance \(\sigma^2\). \(H_0:\mu=\mu_0\) and \(H_1:\mu>\mu_0\). Here the null hypothesis is simple; the alternative is composite.

The likelihood ratio is \[ \lambda(x) = \frac{(2\pi\sigma^2)^{-n/2}\exp\left[-\sum(x_i-\mu_0)^2/2\sigma^2\right]} {\max_{\mu>\mu_0}(2\pi\sigma^2)^{-n/2}\exp\left[-\sum(x_i-\mu)^2/2\sigma^2\right]} = \\ \exp\left[ \frac{\sum(x_i-\hat\mu)^2-\sum(x_i-\mu_0)^2}{2\sigma^2} \right] \] where \(\hat\mu=\max(\mu_0,\overline x)\) like before.

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), with known variance \(\sigma^2\).

By the same decomposition as before, \(\sum(x_i-\mu_0)^2 = \sum(x_i-\overline x)^2 + n(\overline x-\mu_0)^2\); when \(\overline x>\mu_0\) (so \(\hat\mu=\overline x\)) this gives \[ \log\lambda(x) = \frac{\sum(x_i-\overline x)^2-\sum(x_i-\mu_0)^2}{2\sigma^2} = -\frac{n(\overline x-\mu_0)^2}{2\sigma^2} \]

Our test rejects when \(\log\lambda(x) < k'\). Multiplying by \(-2\): \[ \frac{n(\overline x-\mu_0)^2}{\sigma^2} > -2k' = k'' \]

Taking square roots (with \(\overline x>\mu_0\) in the rejection region), we reject when \[ \frac{\overline x-\mu_0}{\sigma/\sqrt{n}} > \sqrt{k''} = k \]
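The known-variance test is just as easy to carry out; a sketch on hypothetical data, with the cutoff taken from the standard normal quantile:

```python
import numpy as np
from scipy import stats

sigma = 2.0                                    # known standard deviation (assumed)
mu0 = 5.0
rng = np.random.default_rng(2)
x = rng.normal(loc=5.8, scale=sigma, size=30)  # hypothetical sample

z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))
k = stats.norm.ppf(0.95)                       # cutoff for a 5% level test
print(z, k, z > k)                             # reject H0: mu = mu0 in favor of mu > mu0 if z > k
```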

Large sample likelihood ratios

In both the preceding examples, the most difficult step was to find a distribution for \(\lambda(X)\) so that we could choose a cutoff value \(k\).

Theorem Let \(X_1,\dots,X_n\) be iid. Let \(r_0\) be the number of parameters fixed (specified) by \(H_0:\theta\in\Omega_0\), and let \(r\) be the number of parameters fixed by \(\theta\in\Omega\). Then, for large \(n\), the statistic \(-2\log\lambda(X)\) is approximately \(\chi^2(r_0-r)\)-distributed.
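To make the approximation concrete, here is a small simulation sketch (not from the notes): for \(\mathcal N(\mu,1)\) data with \(H_0:\mu=0\) against an unrestricted mean, \(-2\log\lambda(X)=n\overline X^2\), and under the null its quantiles should match those of \(\chi^2(1)\).

```python
import numpy as np
from scipy import stats

# Simulate -2 log lambda for H0: mu = 0 vs unrestricted mu, with N(mu, 1) data.
rng = np.random.default_rng(3)
n, reps = 50, 10_000
x = rng.normal(loc=0.0, scale=1.0, size=(reps, n))
xbar = x.mean(axis=1)
# log lambda = sum log f(x_i | 0) - sum log f(x_i | xbar) = -n * xbar^2 / 2
neg2loglam = n * xbar**2

# compare empirical quantiles with chi^2(1) quantiles
qs = [0.5, 0.9, 0.95, 0.99]
print(np.quantile(neg2loglam, qs))
print(stats.chi2.ppf(qs, df=1))
```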

Example

Tree counts in 50 different 1-hectare plots on the Barro Colorado Island.

Turpinia occidentalis (Muttonwood) and Aspidosperma desmanthum (Aracaranga) are two of the tree species counted.

Example

Tree counts in 50 different 1-hectare plots on the Barro Colorado Island.

These counts can be modeled as realizations of Poisson random variables. \(H_0:\lambda_T=\lambda_A\), \(H_1:\lambda_T\neq\lambda_A\).

\[ \lambda(t,a) = \frac {\sup_{\lambda_T=\lambda_A=\lambda}\prod p(t_i|\lambda)\prod p(a_i|\lambda)} {\sup_{\lambda_T,\lambda_A}\prod p(t_i|\lambda_T)\prod p(a_i|\lambda_A)} = \\ \frac {\hat\lambda^{\sum t_i+\sum a_i}\exp[-2n\hat\lambda]/\prod t_i!\prod a_i!} {\hat\lambda_T^{\sum t_i}\hat\lambda_A^{\sum a_i}\exp[-n\hat\lambda_T-n\hat\lambda_A]/\prod t_i!\prod a_i!} = \\ \exp\left[ (\log\hat\lambda-\log\hat\lambda_T)\sum t_i+ (\log\hat\lambda-\log\hat\lambda_A)\sum a_i+ \\ n(\hat\lambda_T+\hat\lambda_A-2\hat\lambda) \right] \]

Example

Tree counts in 50 different 1-hectare plots on the Barro Colorado Island.

The sample mean is a complete sufficient statistic (and the maximum likelihood estimator) for \(\lambda\); this gives \(\hat\lambda=\left(\sum t_i+\sum a_i\right)/2n\), \(\hat\lambda_T=\overline t\), \(\hat\lambda_A=\overline a\). Thus

\[ -2\log\lambda(t,a) = -2\left[ (\log\hat\lambda-\log\hat\lambda_T)\sum t_i+ (\log\hat\lambda-\log\hat\lambda_A)\sum a_i+ \\ n(\hat\lambda_T+\hat\lambda_A-2\hat\lambda) \right] \]

Example

Tree counts in 50 different 1-hectare plots on the Barro Colorado Island.

We have two free parameters (\(\lambda_T\) and \(\lambda_A\)). The null hypothesis fixes one of them, so \(r_0=1\); the full model fixes none, so \(r=0\).

\[ -2\log\lambda(t,a) = -2n\left[ (\log\hat\lambda-\log \overline t)\,\overline t + (\log\hat\lambda-\log \overline a)\,\overline a + \overline t + \overline a - 2\hat\lambda \right] \\ = 2n\left[\,\overline t\log(\overline t/\hat\lambda) + \overline a\log(\overline a/\hat\lambda)\right] \qquad\text{since } 2\hat\lambda=\overline t+\overline a \\ \sim\chi^2(r_0-r) = \chi^2(1) \]

Example

Tree counts in 50 different 1-hectare plots on the Barro Colorado Island.

In the data, we get

\(\overline t = 1.16\) (Turpinia occidentalis), \(\overline a = 1.04\) (Aspidosperma desmanthum), and \(\hat\lambda = 1.1\).

From this we get \(-2\log\lambda(t,a) \approx 2.2336388\). The \(\chi^2(1)\) probability of a value at least this large is about \(14\%\).

At a significance level of, say, \(5\%\), we would not reject the null hypothesis of equal density for the two tree species.
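Here is a sketch of the whole calculation in code. The actual plot counts are not reproduced in these notes, so the two count vectors below are simulated stand-ins, and `poisson_lrt` is a name chosen for this illustration.

```python
import numpy as np
from scipy import stats

def poisson_lrt(t, a):
    """-2 log lambda for H0: lambda_T = lambda_A (equal sample sizes),
    together with its chi^2(1) p-value (large-sample approximation)."""
    t, a = np.asarray(t, float), np.asarray(a, float)
    n = len(t)
    lam_t, lam_a = t.mean(), a.mean()
    lam0 = (t.sum() + a.sum()) / (2 * n)
    stat = 2 * n * (lam_t * np.log(lam_t / lam0) + lam_a * np.log(lam_a / lam0))
    return stat, stats.chi2.sf(stat, df=1)

# hypothetical counts standing in for the 50 BCI plot counts
rng = np.random.default_rng(4)
t = rng.poisson(1.2, size=50)
a = rng.poisson(1.0, size=50)
print(poisson_lrt(t, a))
```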

12.8:14

\(X\) is the number of tails flipped until the first head, where \(p\) is the probability of heads. \(H_0: p=1/2\), \(H_1: p < 1/2\).

Determine a most powerful test with level \(5\)%. What is the power if \(p=40\)%?

12.8:14

\(X\) is the number of tails flipped until the first head, where \(p\) is the probability of heads. \(H_0: p=1/2\), \(H_1: p < 1/2\). Determine a most powerful test with level \(5\)%.

This is the geometric distribution; the likelihood is \(\mathcal L(p|x) = p(1-p)^{x} = \exp[\color{green}{\log(1-p)}\color{orange}{\cdot x}+\color{blue}{\log p}]\). Since \(\eta(p)=\log(1-p)\) is decreasing in \(p\), the family has monotonic likelihood ratios in \(x\): for \(p_2<p_1\), the ratio \(\mathcal L(p_2|x)/\mathcal L(p_1|x)\) is increasing in \(x\), so large values of \(x\) favor small \(p\).

The uniformly most powerful test has the form \[ \phi(x) = \begin{cases} 1 & x > k \\ \gamma & x = k \\ 0 & x < k \end{cases} \]

12.8:14

\(X\) is the number of tails flipped until the first head, where \(p\) is the probability of heads. \(H_0: p=1/2\), \(H_1: p < 1/2\). Determine a most powerful test with level \(5\)%.

For \(p=1/2\), \(\mathbb{P}(X\leq 3) = 93.75\%\) and \(\mathbb P(X=4)=3.125\%\), so we set \(k=4\) and solve for \(\gamma\) in \[ \alpha=\mathbb{E}_0\phi = \gamma\,\mathbb P(X=4) + \mathbb P(X>4) = \gamma\cdot\frac{1}{32} + \frac{1}{32} \\ \frac{1}{20} = \frac{1}{32}(\gamma+1) \\ \gamma = \frac{32}{20}-1 = \frac{12}{20} = \frac35 \]

12.8:14

\(X\) is the number of tails flipped until the first head, where \(p\) is the probability of heads. \(H_0: p=1/2\), \(H_1: p < 1/2\). What is the power if \(p=40\)%?

The power at \(p=40\%\) is \[ \mathbb{E}(\phi \mid p=0.4) = \gamma\,\mathbb P(X=4) + \mathbb P(X>4) = \frac35\cdot 0.6^4\cdot 0.4 + 0.6^5 \approx \frac35\cdot 0.0518 + 0.0778 \approx 0.1089 \]
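These numbers are easy to check numerically; the sketch below is illustrative only, with helper functions named for this example and the parametrization \(\mathbb P(X=x)=(1-p)^x\,p\), \(x=0,1,2,\dots\)

```python
def pmf(x, p):
    # X = number of tails before the first head: P(X = x) = (1 - p)^x * p
    return (1 - p) ** x * p

def tail_prob(k, p):
    # P(X > k) = (1 - p)^(k + 1)
    return (1 - p) ** (k + 1)

# level calibration under H0: p = 1/2, rejecting for X > 4, randomizing at X = 4
p0, alpha, k = 0.5, 0.05, 4
gamma = (alpha - tail_prob(k, p0)) / pmf(k, p0)
print(gamma)                                   # 0.6 = 3/5

# power at p = 0.4
p1 = 0.4
print(gamma * pmf(k, p1) + tail_prob(k, p1))   # about 0.1089
```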

12.8:27

\(X_1,\dots,X_n \sim \mathcal N(0,\sigma^2)\). \(H_0: \sigma=\sigma_0\). \(H_1: \sigma>\sigma_0\). Find a uniformly most powerful test.

12.8:27

\(X_1,\dots,X_n \sim \mathcal N(0,\sigma^2)\). \(H_0: \sigma=\sigma_0\). \(H_1: \sigma>\sigma_0\). Find a uniformly most powerful test.

The likelihood ratio is \[ \frac{\mathcal{L}(\sigma_2^2|x)}{\mathcal{L}(\sigma_1^2|x)} = \frac {(\sigma_2^2)^{-n/2}\exp\left[-\sum x_i^2/2\sigma_2^2\right]} {(\sigma_1^2)^{-n/2}\exp\left[-\sum x_i^2/2\sigma_1^2\right]} = \\ \exp\left[ -\frac12\left( n\log\sigma_2^2-n\log\sigma_1^2 +\left(\frac{1}{\sigma_2^2}-\frac{1}{\sigma_1^2}\right)\sum x_i^2 \right) \right] \] If \(\sigma_2^2 > \sigma_1^2\), the difference of reciprocals is negative, so the statistic \(\sum x_i^2\) is multiplied by a positive quantity, and the likelihood ratio is increasing in \(T=\sum x_i^2\). The uniformly most powerful test therefore compares \(T\) to a constant.

12.8:27

\(X_1,\dots,X_n \sim \mathcal N(0,\sigma^2)\). \(H_0: \sigma=\sigma_0\). \(H_1: \sigma>\sigma_0\). A uniformly most powerful test compares \(\sum x_i^2\) to a constant. To find the constant, consider \(X_i^2/\sigma_0^2\). Under \(H_0\), these are squares of standard normals, so

\[ \sum\frac{x_i^2}{\sigma_0^2} \sim \chi^2(n) \]

So for \(\chi_0 = \text{CDF}^{-1}_{\chi^2(n)}(0.95)\), our test rejects the null if \(\sum{x_i^2} > \sigma_0^2\chi_0\).

Note that the \(\chi^2\) distribution here has \(n\) degrees of freedom, not \(n-1\). This is because the population mean is given, not estimated.
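A sketch of the resulting test on hypothetical data: compare \(\sum x_i^2\) with \(\sigma_0^2\) times the \(\chi^2(n)\) quantile.

```python
import numpy as np
from scipy import stats

sigma0 = 1.0
n = 40
rng = np.random.default_rng(5)
x = rng.normal(loc=0.0, scale=1.3, size=n)   # hypothetical sample; mean known to be 0

T = np.sum(x**2)
chi_cut = stats.chi2.ppf(0.95, df=n)         # n degrees of freedom: the mean is known, not estimated
print(T, sigma0**2 * chi_cut, T > sigma0**2 * chi_cut)   # reject H0 if T exceeds the cutoff
```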

Two sample tests

A common distinction when applying statistics is between one-sample tests and two-sample tests.

For a one-sample test, one sample \(X_1,\dots,X_n\) is given, and hypotheses tend to be a simple \(H_0\) vs a composite \(H_1\).

For a two-sample test, two samples \(X_1,\dots,X_n\) and \(Y_1,\dots,Y_m\), possibly from different distributions, are given; instead of checking alternatives of the form \(\theta>\theta_0\), the test checks a comparison such as \(\theta_X > \theta_Y\).

By taking the joint hypothesis space to be \(\Omega_{X,Y} = \Omega_X\times\Omega_Y\), two-sample tests fit into the same framework: the null and alternative hypotheses form a partition of \(\Omega_{X,Y}\), and the test proceeds as above.

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu_X,\sigma^2)\) and \(Y_1,\dots,Y_n\sim\mathcal N(\mu_Y,\sigma^2)\). Known and identical variances, same sample size. Test for \(H_0:\mu_X=\mu_Y\) and \(H_1:\mu_X>\mu_Y\).

Using \(\Delta\mu=\mu_X-\mu_Y\) we can rewrite the hypotheses to \(H_0:\Delta\mu=0\) and \(H_1:\Delta\mu>0\).

Since \(\overline X\sim\mathcal N(\mu_X,\sigma^2/n)\) and \(\overline Y\sim\mathcal N(\mu_Y,\sigma^2/n)\), \[ \overline X-\overline Y \sim \mathcal N(\Delta\mu,2\sigma^2/n) \]

So we can build a test on the two-sample z-score, which under \(H_0\) satisfies \[ \frac{\overline X-\overline Y}{\sigma\sqrt{2/n}} \sim \mathcal N(0,1) \]
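A sketch of this two-sample \(z\)-test on made-up data, with \(\sigma\) assumed known:

```python
import numpy as np
from scipy import stats

sigma = 1.5                                     # known common standard deviation (assumed)
n = 40
rng = np.random.default_rng(6)
x = rng.normal(loc=10.4, scale=sigma, size=n)   # hypothetical samples
y = rng.normal(loc=10.0, scale=sigma, size=n)

z = (x.mean() - y.mean()) / (sigma * np.sqrt(2 / n))
p_value = stats.norm.sf(z)                      # one-sided: H1 is mu_X > mu_Y
print(z, p_value)
```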