28 February, 2018

Composite hypotheses

As we recall, a simple hypothesis \(\Omega_j\) is one where \(|\Omega_j|=1\). By contrast, a composite hypothesis has more than one candidate distribution in its family.

When \(\Omega\subset\mathbb R\), we distinguish between

  • Lower tail: \(\Omega_1 = \{\theta: \theta<\theta_0\}\)
  • Upper tail: \(\Omega_1 = \{\theta: \theta>\theta_0\}\)
  • Two tailed: \(\Omega_1 = \{\theta: \theta\neq\theta_0\}\)

Composite tests

Hypothesis space \(\Omega=\Omega_0\cup\Omega_1\). The Neyman-Pearson test generalizes to the likelihood ratio test.

Define \[ \lambda = \lambda(X) = \frac{\mathcal L(\Omega_0|X)}{\mathcal L(\Omega|X)} = \frac{\sup_{\theta\in\Omega_0}\mathcal L(\theta|X)} {\sup_{\theta\in\Omega}\mathcal L(\theta|X)} \\ \phi(X) = \begin{cases} 1 & \lambda(X) < k \\ p & \lambda(X) = k \\ 0 & \lambda(X) > k \end{cases} \]

Since an overall most likely model is at least as likely as the most likely null model, \(\lambda\in[0,1]\). High values indicate that the null hypothesis explains the data well.
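As a concrete (made-up) illustration of these definitions, the following Python sketch computes \(\lambda(X)\) and \(\phi(X)\) for an \(\mathcal N(\theta,1)\) sample with \(\Omega_0=\{\theta\leq 0\}\) and \(\Omega=\mathbb R\); the sample, the cutoff `k`, and the randomization probability `p` are arbitrary choices, not part of the notes.

```python
import numpy as np
from scipy import stats

def log_lik(theta, x):
    # log-likelihood of an iid N(theta, 1) sample
    return stats.norm.logpdf(x, loc=theta, scale=1.0).sum()

def lr_statistic(x, null_upper=0.0):
    # lambda(X) = sup over the null / sup over the whole parameter space
    theta0_hat = min(x.mean(), null_upper)   # constrained MLE on {theta <= null_upper}
    theta_hat = x.mean()                     # unrestricted MLE
    return np.exp(log_lik(theta0_hat, x) - log_lik(theta_hat, x))

def phi(x, k=0.1, p=0.0, null_upper=0.0):
    # the test function: reject (1), randomize (p), or keep the null (0)
    lam = lr_statistic(x, null_upper)
    if lam < k:
        return 1
    if lam == k:
        return p
    return 0

rng = np.random.default_rng(0)
x = rng.normal(loc=0.5, scale=1.0, size=20)  # hypothetical sample
print(lr_statistic(x), phi(x))
```

Since \(\lambda(X)\) is continuous here, the event \(\lambda(X)=k\) has probability zero, so the randomization branch only matters for discrete statistics.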

Uniformly most powerful tests

A test \(\phi^*\) of composite (or simple) hypotheses is uniformly most powerful at its level \(\alpha\) if, for every test \(\phi\) with level at most \(\alpha\), \[ \mathbb E_\theta\phi^* \geq \mathbb E_\theta\phi \qquad\forall\theta\in\Omega_1 \]

A family of densities \(p_\theta, \theta\in\Omega\subset\mathbb R\) has monotonic likelihood ratios if there is a statistic \(T\) such that the likelihood ratio is a monotone function of \(T\); in other words, \[ \theta_1<\theta_2 \qquad\text{implies}\qquad \frac{\mathcal{L}(\theta_2|x)}{\mathcal{L}(\theta_1|x)} \text{ is non-decreasing in $T(x)$} \] for almost every \(x\).

Example

For an exponential family \[ p_\theta(x) = \exp\left[ \color{green}{\eta(\theta)}\color{orange}{T(x)} - \color{blue}{B(\theta)} + A(x) \right] \] with strictly increasing \(\eta(\theta)\), the likelihood ratio for \(\theta_1<\theta_2\) can be written \[ \frac{p_{\theta_2}(x)}{p_{\theta_1}(x)} = \exp\left[ \color{green}{(\eta(\theta_2)-\eta(\theta_1))}\color{orange}{T(x)} - \color{blue}{(B(\theta_2)-B(\theta_1))} \right] \] which is increasing in \(T\), so the family has monotonic likelihood ratios.
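As a quick numerical sanity check (illustrative, not from the notes), the monotonicity can be verified for a concrete exponential family such as the Poisson, where \(\eta(\lambda)=\log\lambda\) and \(T(x)=x\); the two parameter values below are arbitrary.

```python
import numpy as np
from scipy import stats

# Poisson(lambda): p_lambda(x) = exp(x*log(lambda) - lambda - log(x!)),
# an exponential family with eta(lambda) = log(lambda) (increasing) and T(x) = x.
lam1, lam2 = 2.0, 5.0                 # lam1 < lam2, hypothetical values
x = np.arange(0, 15)
ratio = stats.poisson.pmf(x, lam2) / stats.poisson.pmf(x, lam1)
assert np.all(np.diff(ratio) > 0)     # p_{lam2}/p_{lam1} increases with T(x) = x
print(ratio)
```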

Uniformly most powerful test

Theorem If the densities \(p_\theta\) have monotonic likelihood ratios, then \[ \phi^*(x) = \begin{cases} 1 & T(x) > c \\ p & T(x) = c \\ 0 & T(x) < c \\ \end{cases} \]

is uniformly most powerful for \(H_0:\theta\leq\theta_0\) and \(H_1:\theta>\theta_0\) for the level \(\alpha=\mathbb E_{\theta_0}\phi^*\). The values \(c\) and \(p\) can be adjusted to achieve any desired level.

The power function \(\beta(\theta)=\mathbb E_\theta\phi^*\) is non-decreasing, and strictly increasing wherever \(0<\beta(\theta)<1\).

If \(\theta_1<\theta_0\), then \(\phi^*\) minimizes \(\mathbb E_{\theta_1}\phi\) among all tests with \(\mathbb E_{\theta_0}\phi=\alpha\).
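To see how \(c\) and \(p\) can be chosen to hit an exact level for a discrete statistic, here is a sketch (not from the notes) that assumes \(T\sim\mathrm{Binomial}(n,\theta_0)\) at the boundary of the null; the helper name `randomized_cutoff` and the values of `n`, `theta0`, and `alpha` are made up for this illustration.

```python
from scipy import stats

def randomized_cutoff(n, theta0, alpha):
    """Find c and p so that E_{theta0}[phi*] = alpha exactly, where phi*
    rejects when T > c and randomizes with probability p at T = c,
    assuming T ~ Binomial(n, theta0) at the null boundary."""
    for c in range(n + 1):
        tail = stats.binom.sf(c, n, theta0)       # P(T > c)
        if tail <= alpha:                          # smallest such c
            at_c = stats.binom.pmf(c, n, theta0)   # P(T = c)
            p = (alpha - tail) / at_c if at_c > 0 else 0.0
            return c, p
    return n, 0.0

c, p = randomized_cutoff(n=20, theta0=0.5, alpha=0.05)   # hypothetical values
print(c, p)   # check: p * P(T = c) + P(T > c) equals alpha
```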

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), both parameters unknown.

\(H_0:\mu\leq\mu_0\), and \(H_1:\mu>\mu_0\).

Find the uniformly most powerful test.

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), both parameters unknown.

\(H_0:\mu\leq\mu_0\), and \(H_1:\mu>\mu_0\).

By the preceding theorem, we should focus on likelihood ratio tests.

From the normal likelihood we get: \[ \lambda(X) = \frac {\max_{\mu\leq\mu_0;\,\sigma^2>0}(2\pi\sigma^2)^{-n/2}\exp[-\sum(x_i-\mu)^2/2\sigma^2]} {\max_{\mu;\,\sigma^2>0}(2\pi\sigma^2)^{-n/2}\exp[-\sum(x_i-\mu)^2/2\sigma^2]} \]

We need \(\hat\mu\) and \(\hat\sigma^2\) that attain the maxima to continue.

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), both parameters unknown. Take derivatives of \(\log\mathcal L\).

\[ \frac{\partial\log\mathcal L}{\partial\mu} = -\frac{1}{2\sigma^2}\sum(-2)(x_i-\mu) = \frac{1}{\sigma^2}\left(\sum x_i-n\mu\right) \\ \frac{\partial\log\mathcal L}{\partial\sigma^2} = -\frac12\left[ \frac{n}{\sigma^2} - \frac{\sum(x_i-\mu)^2}{(\sigma^2)^2} \right] \]

Setting \(\partial\log\mathcal L/\partial\mu=0\) shows that \(\hat\mu=\overline x\) maximizes the likelihood, regardless of the value of \(\sigma^2\). Under the null constraint \(\mu\leq\mu_0\) the maximizer is \(\hat\mu_0=\min(\overline x, \mu_0)\). If \(\overline x\leq\mu_0\) then \(\lambda=1\) and we never reject, so from here on assume \(\overline x>\mu_0\), in which case \(\hat\mu_0=\mu_0\).

Setting \(\partial\log\mathcal L/\partial\sigma^2=0\) shows that \(\hat\sigma^2=\sum(x_i-\hat\mu)^2/n\) maximizes the likelihood, with \(\hat\mu\) the corresponding mean estimate.

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), both parameters unknown.

Since \(\sum(x_i-\hat\mu)^2/\hat\sigma^2 = \sum(x_i-\hat\mu)^2\big/\left(\sum(x_i-\hat\mu)^2/n\right) = n\), both exponentials reduce to \(e^{-n/2}\), and with \(\hat\sigma_0^2=\sum(x_i-\mu_0)^2/n\) and \(\hat\sigma^2=\sum(x_i-\overline x)^2/n\) the likelihood ratio is \[ \lambda(x) = \frac{(2\pi\hat\sigma_0^2)^{-n/2}e^{-n/2}} {(2\pi\hat\sigma^2)^{-n/2}e^{-n/2}} = \left(\frac{\hat\sigma_0^2}{\hat\sigma^2}\right)^{-n/2} \]

It is enough to compare \(\hat\sigma_0^2/\hat\sigma^2\) to a cutoff constant: if \(\lambda(x)\) is small, then \(\hat\sigma_0^2/\hat\sigma^2\) is large, so our test rejects when \(\hat\sigma_0^2/\hat\sigma^2>k''\).

To pick the cutoff \(k''\) we need to be able to calculate \(\mathbb{P}(\hat\sigma_0^2/\hat\sigma^2>k'')\) under the null hypothesis.

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), both parameters unknown. Since \[ \sum(x_i-\mu_0)^2 = \sum[(x_i-\overline x)+(\overline x-\mu_0)]^2 = \sum (x_i-\overline x)^2 + n(\overline x-\mu_0)^2, \] the condition \[ k'' < \frac{\hat\sigma_0^2}{\hat\sigma^2} = \frac{\sum (x_i-\overline x)^2 + n(\overline x-\mu_0)^2} {\sum(x_i-\overline x)^2}= 1+\frac{n(\overline x-\mu_0)^2}{\sum (x_i-\overline x)^2} \qquad\text{is equivalent to}\qquad \frac{n(\overline x-\mu_0)^2}{\sum (x_i-\overline x)^2} > k''-1=k' \]

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), both parameters unknown.

Dividing \(\sum(x_i-\overline x)^2\) by \(n-1\) turns it into the sample variance \(S^2=\sum(x_i-\overline x)^2/(n-1)\): \[ \frac{n(\overline x-\mu_0)^2}{\sum(x_i-\overline x)^2/(n-1)} > (n-1)k' \\ \sqrt{ \frac{n(\overline x-\mu_0)^2}{\sum(x_i-\overline x)^2/(n-1)}} > \sqrt{(n-1)k'} = k \] So the test we derived (recalling that \(\overline x>\mu_0\) in the rejection region) consists of checking whether \[ \frac{\overline x-\mu_0}{S/\sqrt{n}} > k \]

Example

Theorem (Student [W. S. Gosset]) If \(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), then \[ \frac{\overline X-\mu}{S/\sqrt{n}}\sim T(n-1) \] In particular, under \(H_0:\mu=\mu_0\) the statistic \((\overline X-\mu_0)/(S/\sqrt{n})\) follows \(T(n-1)\), which is exactly what we need in order to choose the cutoff \(k\).

The likelihood ratio test for normal variables with unknown mean and variance is the \(t\)-test.

This construction fits the uniformly most powerful test setup: \((\overline x-\mu_0)/(S/\sqrt{n})\) is monotone in \(\overline x\): if \(\overline x\) increases (while the independent \(S^2\) stays unchanged), the statistic increases.
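A minimal sketch of the resulting one-sided \(t\)-test on made-up data; the cross-check uses `scipy.stats.ttest_1samp`, whose `alternative` argument requires a reasonably recent SciPy.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=5.3, scale=2.0, size=25)   # hypothetical sample
mu0 = 5.0

n = len(x)
t_stat = (x.mean() - mu0) / (x.std(ddof=1) / np.sqrt(n))
p_value = stats.t.sf(t_stat, df=n - 1)        # one-sided: P(T(n-1) > t_stat)
cutoff = stats.t.ppf(0.95, df=n - 1)          # reject at level 5% if t_stat > cutoff
print(t_stat, p_value, t_stat > cutoff)

# cross-check against scipy's built-in one-sample t-test
print(stats.ttest_1samp(x, popmean=mu0, alternative="greater"))
```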

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), with known variance \(\sigma^2\). \(H_0:\mu=\mu_0\) and \(H_1:\mu>\mu_0\). Here the null hypothesis is simple; the alternative is composite.

The likelihood ratio is \[ \lambda(x) = \frac{(2\pi\sigma^2)^{-n/2}\exp\left[-\sum(x_i-\mu_0)^2/2\sigma^2\right]} {\max_{\mu>\mu_0}(2\pi\sigma^2)^{-n/2}\exp\left[-\sum(x_i-\mu)^2/2\sigma^2\right]} = \\ \exp\left[ \frac{\sum(x_i-\hat\mu)^2-\sum(x_i-\mu_0)^2}{2\sigma^2} \right] \] where \(\hat\mu=\max(\mu_0,\overline x)\) like before.

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\), with known variance \(\sigma^2\).

By the same decomposition as before, \(\sum(x_i-\mu_0)^2 = \sum(x_i-\overline x)^2 + n(\overline x-\mu_0)^2\); when \(\overline x>\mu_0\) (so \(\hat\mu=\overline x\)) this gives \[ \log\lambda(x) = \frac{\sum(x_i-\overline x)^2-\sum(x_i-\mu_0)^2}{2\sigma^2} = -\frac{n(\overline x-\mu_0)^2}{2\sigma^2} \]

Our test rejects when \(\log\lambda(x) < k'\). Multiplying by \(-2\): \[ \frac{n(\overline x-\mu_0)^2}{\sigma^2} > -2k' = k'' \]

Taking square roots (with \(\overline x>\mu_0\) in the rejection region), we reject when \[ \frac{\overline x-\mu_0}{\sigma/\sqrt{n}} > \sqrt{k''} = k \]
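The known-variance test is just as easy to carry out; a sketch on hypothetical data, with the cutoff taken from the standard normal quantile:

```python
import numpy as np
from scipy import stats

sigma = 2.0                                    # known standard deviation (assumed)
mu0 = 5.0
rng = np.random.default_rng(2)
x = rng.normal(loc=5.8, scale=sigma, size=30)  # hypothetical sample

z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))
k = stats.norm.ppf(0.95)                       # cutoff for a 5% level test
print(z, k, z > k)                             # reject H0: mu = mu0 in favor of mu > mu0 if z > k
```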

Large sample likelihood ratios

In both the preceding examples, the most difficult step was to find a distribution for \(\lambda(X)\) so that we could choose a cutoff value \(k\).

Theorem Let \(X_1,\dots,X_n\) be iid. Let \(r_0\) be the number of parameters fixed (specified) by \(H_0:\theta\in\Omega_0\), and let \(r\) be the number of parameters fixed by \(\theta\in\Omega\). Then, for large \(n\), the statistic \(-2\log\lambda(X)\) is approximately \(\chi^2(r_0-r)\)-distributed.
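To make the approximation concrete, here is a small simulation sketch (not from the notes): for \(\mathcal N(\mu,1)\) data with \(H_0:\mu=0\) against an unrestricted mean, \(-2\log\lambda(X)=n\overline X^2\), and under the null its quantiles should match those of \(\chi^2(1)\).

```python
import numpy as np
from scipy import stats

# Simulate -2 log lambda for H0: mu = 0 vs unrestricted mu, with N(mu, 1) data.
rng = np.random.default_rng(3)
n, reps = 50, 10_000
x = rng.normal(loc=0.0, scale=1.0, size=(reps, n))
xbar = x.mean(axis=1)
# log lambda = sum log f(x_i | 0) - sum log f(x_i | xbar) = -n * xbar^2 / 2
neg2loglam = n * xbar**2

# compare empirical quantiles with chi^2(1) quantiles
qs = [0.5, 0.9, 0.95, 0.99]
print(np.quantile(neg2loglam, qs))
print(stats.chi2.ppf(qs, df=1))
```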

Example

Tree counts in 50 different 1-hectare plots on the Barro Colorado Island.

Turpinia occidentalis (Muttonwood) and Aspidosperma desmanthum (Aracaranga) are two of the tree species counted.

Example

Tree counts in 50 different 1-hectare plots on the Barro Colorado Island.

These counts can be modeled as realizations of Poisson random variables. \(H_0:\lambda_T=\lambda_A\), \(H_1:\lambda_T\neq\lambda_A\).

\[ \lambda(t,a) = \frac {\sup_{\lambda_T=\lambda_A=\lambda}\prod p(t_i|\lambda)\prod p(a_i|\lambda)} {\sup_{\lambda_T,\lambda_A}\prod p(t_i|\lambda_T)\prod p(a_i|\lambda_A)} = \\ \frac {\hat\lambda^{\sum t_i+\sum a_i}\exp[-2n\hat\lambda]/\prod t_i!\prod a_i!} {\hat\lambda_T^{\sum t_i}\hat\lambda_A^{\sum a_i}\exp[-n\hat\lambda_T-n\hat\lambda_A]/\prod t_i!\prod a_i!} = \\ \exp\left[ (\log\hat\lambda-\log\hat\lambda_T)\sum t_i+ (\log\hat\lambda-\log\hat\lambda_A)\sum a_i+ \\ n(\hat\lambda_T+\hat\lambda_A-2\hat\lambda) \right] \]

Example

Tree counts in 50 different 1-hectare plots on the Barro Colorado Island.

The sample mean is a complete sufficient statistic (and the maximum likelihood estimator) for \(\lambda\); this gives \(\hat\lambda=\left(\sum t_i+\sum a_i\right)/2n\), \(\hat\lambda_T=\overline t\), \(\hat\lambda_A=\overline a\). Thus

\[ -2\log\lambda(t,a) = -2\left[ (\log\hat\lambda-\log\hat\lambda_T)\sum t_i+ (\log\hat\lambda-\log\hat\lambda_A)\sum a_i+ \\ n(\hat\lambda_T+\hat\lambda_A-2\hat\lambda) \right] \]

Example

Tree counts in 50 different 1-hectare plots on the Barro Colorado Island.

We have two free parameters (\(\lambda_T\) and \(\lambda_A\)). The null hypothesis fixes one of them, so \(r_0=1\); the full model fixes none, so \(r=0\).

\[ -2\log\lambda(t,a) = -2n\left[ (\log\hat\lambda-\log \overline t)\,\overline t + (\log\hat\lambda-\log \overline a)\,\overline a + \overline t + \overline a - 2\hat\lambda \right] \\ = 2n\left[\,\overline t\log(\overline t/\hat\lambda) + \overline a\log(\overline a/\hat\lambda)\right] \qquad\text{since } 2\hat\lambda=\overline t+\overline a \\ \sim\chi^2(r_0-r) = \chi^2(1) \]

Example

Tree counts in 50 different 1-hectare plots on the Barro Colorado Island.

In the data, we get

\(\overline t = 1.16\) (Turpinia occidentalis), \(\overline a = 1.04\) (Aspidosperma desmanthum), and \(\hat\lambda = 1.1\).

From this we get \(-2\log\lambda(t,a) \approx 2.2336388\). The \(\chi^2(1)\) probability of a value at least this large is about \(14\%\).

At a significance level of, say, \(5\%\), we would not reject the null hypothesis of equal density for the two tree species.
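Here is a sketch of the whole calculation in code. The actual plot counts are not reproduced in these notes, so the two count vectors below are simulated stand-ins, and `poisson_lrt` is a name chosen for this illustration.

```python
import numpy as np
from scipy import stats

def poisson_lrt(t, a):
    """-2 log lambda for H0: lambda_T = lambda_A (equal sample sizes),
    together with its chi^2(1) p-value (large-sample approximation)."""
    t, a = np.asarray(t, float), np.asarray(a, float)
    n = len(t)
    lam_t, lam_a = t.mean(), a.mean()
    lam0 = (t.sum() + a.sum()) / (2 * n)
    stat = 2 * n * (lam_t * np.log(lam_t / lam0) + lam_a * np.log(lam_a / lam0))
    return stat, stats.chi2.sf(stat, df=1)

# hypothetical counts standing in for the 50 BCI plot counts
rng = np.random.default_rng(4)
t = rng.poisson(1.2, size=50)
a = rng.poisson(1.0, size=50)
print(poisson_lrt(t, a))
```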

12.8:14

\(X\) is the number of tails flipped until the first head, where \(p\) is the probability of heads. \(H_0: p=1/2\), \(H_1: p < 1/2\).

Determine a most powerful test with level \(5\)%. What is the power if \(p=40\)%?

12.8:14

\(X\) is the number of tails flipped until the first head, where \(p\) is the probability of heads. \(H_0: p=1/2\), \(H_1: p < 1/2\). Determine a most powerful test with level \(5\)%.

This is the geometric distribution; the likelihood is \(\mathcal L(p|x) = p(1-p)^{x} = \exp[\color{green}{\log(1-p)}\color{orange}{\cdot x}+\color{blue}{\log p}]\). Since \(\eta(p)=\log(1-p)\) is decreasing in \(p\), the family has monotonic likelihood ratios in \(x\): for \(p_2<p_1\), the ratio \(\mathcal L(p_2|x)/\mathcal L(p_1|x)\) is increasing in \(x\), so large values of \(x\) favor small \(p\).

The uniformly most powerful test has the form \[ \phi(x) = \begin{cases} 1 & x > k \\ \gamma & x = k \\ 0 & x < k \end{cases} \]

12.8:14

\(X\) is the number of tails flipped until the first head, where \(p\) is the probability of heads. \(H_0: p=1/2\), \(H_1: p < 1/2\). Determine a most powerful test with level \(5\)%.

For \(p=1/2\), \(\mathbb{P}(X\leq 3) = 93.75\%\) and \(\mathbb P(X=4)=3.125\%\), so we set \(k=4\) and solve for \(\gamma\) in \[ \alpha=\mathbb{E}_0\phi = \gamma\,\mathbb P(X=4) + \mathbb P(X>4) = \gamma\cdot\frac{1}{32} + \frac{1}{32} \\ \frac{1}{20} = \frac{1}{32}(\gamma+1) \\ \gamma = \frac{32}{20}-1 = \frac{12}{20} = \frac35 \]

12.8:14

\(X\) is the number of tails flipped until the first head, where \(p\) is the probability of heads. \(H_0: p=1/2\), \(H_1: p < 1/2\). What is the power if \(p=40\)%?

The power at \(p=40\%\) is \[ \mathbb{E}(\phi \mid p=0.4) = \gamma\,\mathbb P(X=4) + \mathbb P(X>4) = \frac35\cdot 0.6^4\cdot 0.4 + 0.6^5 \approx \frac35\cdot 0.0518 + 0.0778 \approx 0.1089 \]
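These numbers are easy to check numerically; the sketch below is illustrative only, with helper functions named for this example and the parametrization \(\mathbb P(X=x)=(1-p)^x\,p\), \(x=0,1,2,\dots\)

```python
def pmf(x, p):
    # X = number of tails before the first head: P(X = x) = (1 - p)^x * p
    return (1 - p) ** x * p

def tail_prob(k, p):
    # P(X > k) = (1 - p)^(k + 1)
    return (1 - p) ** (k + 1)

# level calibration under H0: p = 1/2, rejecting for X > 4, randomizing at X = 4
p0, alpha, k = 0.5, 0.05, 4
gamma = (alpha - tail_prob(k, p0)) / pmf(k, p0)
print(gamma)                                   # 0.6 = 3/5

# power at p = 0.4
p1 = 0.4
print(gamma * pmf(k, p1) + tail_prob(k, p1))   # about 0.1089
```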

12.8:27

\(X_1,\dots,X_n \sim \mathcal N(0,\sigma^2)\). \(H_0: \sigma=\sigma_0\). \(H_1: \sigma>\sigma_0\). Find a uniformly most powerful test.

12.8:27

\(X_1,\dots,X_n \sim \mathcal N(0,\sigma^2)\). \(H_0: \sigma=\sigma_0\). \(H_1: \sigma>\sigma_0\). Find a uniformly most powerful test.

The likelihood ratio is \[ \frac{\mathcal{L}(\sigma_2^2|x)}{\mathcal{L}(\sigma_1^2|x)} = \frac {(\sigma_2^2)^{-n/2}\exp\left[-\sum x_i^2/2\sigma_2^2\right]} {(\sigma_1^2)^{-n/2}\exp\left[-\sum x_i^2/2\sigma_1^2\right]} = \\ \exp\left[ -\frac12\left( n\log\sigma_2^2-n\log\sigma_1^2 +\left(\frac{1}{\sigma_2^2}-\frac{1}{\sigma_1^2}\right)\sum x_i^2 \right) \right] \] If \(\sigma_2^2 > \sigma_1^2\), the difference of reciprocals is negative, so the statistic \(\sum x_i^2\) is multiplied by a positive quantity, and the likelihood ratio is increasing in \(T=\sum x_i^2\). The uniformly most powerful test therefore compares \(T\) to a constant.

12.8:27

\(X_1,\dots,X_n \sim \mathcal N(0,\sigma^2)\). \(H_0: \sigma=\sigma_0\). \(H_1: \sigma>\sigma_0\). A uniformly most powerful test compares \(\sum x_i^2\) to a constant. To find the constant, consider \(X_i^2/\sigma_0^2\). Under \(H_0\), these are squares of standard normals, so

\[ \sum\frac{x_i^2}{\sigma_0^2} \sim \chi^2(n) \]

So for \(\chi_0 = \text{CDF}^{-1}_{\chi^2(n)}(0.95)\), our test rejects the null if \(\sum{x_i^2} > \sigma_0^2\chi_0\).

Note that the \(\chi^2\) distribution here has \(n\) degrees of freedom, not \(n-1\). This is because the population mean is given, not estimated.
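A sketch of the resulting test on hypothetical data: compare \(\sum x_i^2\) with \(\sigma_0^2\) times the \(\chi^2(n)\) quantile.

```python
import numpy as np
from scipy import stats

sigma0 = 1.0
n = 40
rng = np.random.default_rng(5)
x = rng.normal(loc=0.0, scale=1.3, size=n)   # hypothetical sample; mean known to be 0

T = np.sum(x**2)
chi_cut = stats.chi2.ppf(0.95, df=n)         # n degrees of freedom: the mean is known, not estimated
print(T, sigma0**2 * chi_cut, T > sigma0**2 * chi_cut)   # reject H0 if T exceeds the cutoff
```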

Two sample tests

A common distinction when applying statistics is between one-sample tests and two-sample tests.

For a one-sample test, one sample \(X_1,\dots,X_n\) is given, and hypotheses tend to be a simple \(H_0\) vs a composite \(H_1\).

For a two-sample test, two samples \(X_1,\dots,X_n\) and \(Y_1,\dots,Y_m\), possibly from different distributions, are given; instead of checking alternatives of the form \(\theta>\theta_0\), the test checks a comparison such as \(\theta_X > \theta_Y\).

By taking the joint hypothesis space to be \(\Omega_{X,Y} = \Omega_X\times\Omega_Y\), two-sample tests fit into the same framework: the null and alternative hypotheses form a partition of \(\Omega_{X,Y}\), and the test proceeds as above.

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu_X,\sigma^2)\) and \(Y_1,\dots,Y_n\sim\mathcal N(\mu_Y,\sigma^2)\). Known and identical variances, same sample size. Test for \(H_0:\mu_X=\mu_Y\) and \(H_1:\mu_X>\mu_Y\).

Using \(\Delta\mu=\mu_X-\mu_Y\) we can rewrite the hypotheses to \(H_0:\Delta\mu=0\) and \(H_1:\Delta\mu>0\).

Since \(\overline X\sim\mathcal N(\mu_X,\sigma^2/n)\) and \(\overline Y\sim\mathcal N(\mu_Y,\sigma^2/n)\), \[ \overline X-\overline Y \sim \mathcal N(\Delta\mu,2\sigma^2/n) \]

So we can build a test on the two-sample z-score, which under \(H_0\) satisfies \[ \frac{\overline X-\overline Y}{\sigma\sqrt{2/n}} \sim \mathcal N(0,1) \]
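A sketch of this two-sample \(z\)-test on made-up data, with \(\sigma\) assumed known:

```python
import numpy as np
from scipy import stats

sigma = 1.5                                     # known common standard deviation (assumed)
n = 40
rng = np.random.default_rng(6)
x = rng.normal(loc=10.4, scale=sigma, size=n)   # hypothetical samples
y = rng.normal(loc=10.0, scale=sigma, size=n)

z = (x.mean() - y.mean()) / (sigma * np.sqrt(2 / n))
p_value = stats.norm.sf(z)                      # one-sided: H1 is mu_X > mu_Y
print(z, p_value)
```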