5 March, 2018

Confidence intervals

An estimator \(\delta\) of a parameter \(g(\theta)\) provides no information on the precision of the estimate.

Definition The random interval \((\delta_0,\delta_1)\) determined by a pair of statistics \(\delta_0,\delta_1\) is a \(1-\alpha\) confidence interval for \(g(\theta)\) if \[ \mathbb P(\delta_0<g(\theta)<\delta_1 | \theta) \geq 1-\alpha \qquad\forall\theta\in\Omega \]

A random set \(S(X)\) constructed from data is called a \(1-\alpha\) confidence region for \(g(\theta)\) if \[ \mathbb P(g(\theta)\in S)\geq1-\alpha \qquad\forall\theta\in\Omega \]

The confidence interval is the connected, one-dimensional special case of a confidence region.

Confidence intervals

A confidence interval (region) is called an exact confidence interval (region) if \(\mathbb P(g(\theta)\in S) = 1-\alpha\) for all \(\theta\in\Omega\).

Confidence intervals are not unique – which features of a confidence interval are important depends on the application.

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\). We know \(\overline X\) to be an unbiased estimator of \(\mu\).

\[ \overline X \sim \mathcal N(\mu,\sigma^2/n) \qquad Z = \frac{\overline X-\mu}{\sigma/\sqrt{n}} \sim \mathcal N(0,1) \] \(Z\) is called the z-score of \(\overline X\).

Pick \(z\) to capture the central 95% of the standard normal distribution.

Example

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\).

With probability 95%, \[ -z < \frac{\overline X-\mu}{\sigma/\sqrt{n}} < z \\ -z\sigma/\sqrt{n} < \overline X-\mu < z\sigma/\sqrt{n} \\ -\overline X-z\sigma/\sqrt{n} < -\mu < -\overline X+z\sigma/\sqrt{n} \\ \overline X+z\sigma/\sqrt{n} > \mu > \overline X-z\sigma/\sqrt{n} \]
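A minimal numerical sketch of this interval in Python, with made-up data (`scipy` supplies the normal quantile):

```python
import numpy as np
from scipy.stats import norm

# Hypothetical sample from a normal with known sigma.
rng = np.random.default_rng(0)
sigma, n = 2.0, 25
x = rng.normal(loc=10.0, scale=sigma, size=n)

z = norm.ppf(0.975)            # captures the central 95% of N(0,1)
xbar = x.mean()
half = z * sigma / np.sqrt(n)
print(f"95% CI for mu: ({xbar - half:.3f}, {xbar + half:.3f})")
```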

Example

We could alternatively pick \(u\) and \(v\) asymmetrically

With probability 95%, \[ u < \frac{\overline X-\mu}{\sigma/\sqrt{n}} < v \\ \overline X-v\sigma/\sqrt{n} < \mu < \overline X-u\sigma/\sqrt{n} \]
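The same computation with asymmetric tails might look as follows; the 4%/1% tail split and the sample values are arbitrary choices for illustration:

```python
import numpy as np
from scipy.stats import norm

sigma, n, xbar = 2.0, 25, 10.1   # assumed values for illustration

# Split alpha = 0.05 asymmetrically: 4% below u, 1% above v.
u = norm.ppf(0.04)               # lower quantile (negative)
v = norm.ppf(0.99)               # upper quantile
lo = xbar - v * sigma / np.sqrt(n)
hi = xbar - u * sigma / np.sqrt(n)
print(f"95% CI for mu: ({lo:.3f}, {hi:.3f})")
```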

How to create confidence intervals

A random variable \(T\) is a pivot with respect to a parameter \(\theta\) if its distribution does not depend on \(\theta\).

Note that \(T\) itself is allowed to depend on \(\theta\) – what is important is that its distribution does not.

Example \(Z=(\overline X-\mu)/(\sigma/\sqrt{n})\sim\mathcal N(0,1)\) is a pivot for both \(\mu\) and \(\sigma^2\), for a normal sample \(X\).

How to create confidence intervals

A random variable \(T\) is a pivot with respect to a parameter \(\theta\) if its distribution does not depend on \(\theta\).

We can use a pivot that involves only a single unknown parameter to create confidence intervals: pick \(\epsilon_\ell, \epsilon_u\) such that \(\epsilon_\ell+\epsilon_u = \alpha\). Then, writing \(F\) for the cumulative distribution function of the pivot \(T\), \[ \mathbb P(F^{-1}(\epsilon_\ell) < T < F^{-1}(1-\epsilon_u)) = 1-\alpha \] Solving for \(\theta\) in the two inequalities \[ F^{-1}(\epsilon_\ell) < T \qquad\qquad T < F^{-1}(1-\epsilon_u) \] produces the confidence interval we seek.

Example

\(Y\sim\text{Exponential}(\theta)\) so that \(p(y)=\frac{e^{-y/\theta}}{\theta}\) for non-negative \(y\).

\(Y/\theta\sim\text{Exponential}(1)\) is a pivot.

For a confidence interval with symmetric tails, we could pick \(\ell\) and \(u\) so that \(\int_0^\ell e^{-x}dx=0.05\) and \(\int_{u}^\infty e^{-x}dx = 0.05\). We get \[ 1-e^{-\ell} = 0.05 \qquad\qquad e^{-u} = 0.05 \\ \ell = -\log(1-0.05) \qquad\qquad u = -\log(0.05) \\ \ell \approx 0.051 \qquad\qquad u \approx 2.996 \\ \frac{Y}{u} < \theta < \frac{Y}{\ell} \]
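A minimal sketch of this computation, assuming a single hypothetical observation `y` (5% in each tail gives a 90% interval):

```python
import numpy as np

ell = -np.log(1 - 0.05)      # lower cut-off, ~0.051
u = -np.log(0.05)            # upper cut-off, ~2.996

y = 4.2                      # a single hypothetical observation
print(f"90% CI for theta: ({y / u:.3f}, {y / ell:.3f})")
```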

Example

\(X\sim\mathcal N(\mu,1)\) a single observation from a normal distribution.

Find a 95% confidence interval for \(\mu\).

Example

\(X\sim\mathcal N(\mu,1)\) a single observation from a normal distribution. Find a 95% confidence interval for \(\mu\).

Since \(X\sim\mathcal N(\mu,1)\) it follows \(X-\mu\sim\mathcal N(0,1)\) is a pivot. Write \(z\) for the 97.5th percentile of the standard normal (ie \(z=\text{CDF}_{\mathcal N(0,1)}^{-1}(0.975)\)). Then with probability 95%, \(X-\mu\in[-z,z]\).

So \[ -z < X-\mu < z \\ -z < \mu-X < z \\ X-z < \mu < X+z \]

Large-sample confidence intervals

For large enough samples – 40 is a commonly used threshold – the normal approximation from the central limit theorem is good to within a few percentage points or better.

This provides a range of tests and confidence intervals from one common source. Write \(z=\text{CDF}_{\mathcal N(0,1)}^{-1}(1-\alpha/2)\); then, approximately, \[ \hat\theta\sim\mathcal N(\theta,\sigma_{\hat\theta}^2) \\ \frac{\hat\theta-\theta}{\sigma_{\hat\theta}} \sim \mathcal N(0,1) \\ \hat\theta-z\sigma_{\hat\theta} < \theta < \hat\theta+z\sigma_{\hat\theta} \]
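A generic helper along these lines, assuming the estimate and its approximate standard error are already in hand (`large_sample_ci` is a name chosen here, not a library function):

```python
from scipy.stats import norm

def large_sample_ci(theta_hat, se, alpha=0.05):
    """Approximate 1-alpha CI: theta_hat +/- z * se, via the CLT."""
    z = norm.ppf(1 - alpha / 2)
    return theta_hat - z * se, theta_hat + z * se

# Hypothetical estimate and standard error.
print(large_sample_ci(3.2, 0.4))
```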

Large-sample confidence intervals: one-sample mean

Suppose \(X_1,\dots,X_n\sim \mathcal D\) with mean \(\mu\) and variance \(\sigma^2\), and \(z=\text{CDF}_{\mathcal N(0,1)}^{-1}(1-\alpha/2)\). By the central limit theorem, the sample mean is approximately \[ \overline X = \frac1n\sum X_i \sim \mathcal N(\mu,\sigma^2/n) \]

So \[ \overline X-z\sigma/\sqrt{n} < \mu < \overline X+z\sigma/\sqrt{n} \]
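In practice \(\sigma\) is usually unknown; a common large-sample move is to plug in the sample standard deviation. A sketch with simulated (deliberately non-normal) data:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.exponential(scale=3.0, size=100)   # any distribution; n is large

z = norm.ppf(0.975)
xbar, s = x.mean(), x.std(ddof=1)          # plug in the sample sd for sigma
half = z * s / np.sqrt(len(x))
print(f"approx. 95% CI for mu: ({xbar - half:.3f}, {xbar + half:.3f})")
```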

Large-sample confidence intervals: one-sample proportion

Suppose \(X\sim\text{Binomial}(n,p)\), and \(z=\text{CDF}_{\mathcal N(0,1)}^{-1}(1-\alpha/2)\). Write \(q=1-p\). Recall the binomial distribution has mean \(np\) and variance \(npq\).

\[ \hat p=\frac{X}{n}\sim\mathcal N(p,pq/n) \\ \hat p-z\sqrt{\frac{pq}{n}} <p< \hat p+z\sqrt{\frac{pq}{n}} \]

But we don't know \(p\)! We could pretend \(\hat p\) is good enough (it often is).

Large-sample confidence intervals: one-sample proportion

Suppose \(X\sim\text{Binomial}(n,p)\), and \(z=\text{CDF}_{\mathcal N(0,1)}^{-1}(1-\alpha/2)\). \(\hat p=X/n\) and \(q=1-p\).

\[ \hat p-z\sqrt{\frac{pq}{n}} <p< \hat p+z\sqrt{\frac{pq}{n}} \]

We can bound \(pq \leq 1/4\). Therefore \[ \hat p - z \frac{1}{2\sqrt{n}} \leq \hat p - z\sqrt{\frac{pq}{n}} < p < \hat p + z\sqrt{\frac{pq}{n}} \leq \hat p + z \frac{1}{2\sqrt{n}} \]
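A sketch comparing the two intervals on hypothetical counts:

```python
import numpy as np
from scipy.stats import norm

n, x = 200, 84               # hypothetical: 84 successes out of 200
p_hat = x / n
z = norm.ppf(0.975)

half_plug = z * np.sqrt(p_hat * (1 - p_hat) / n)   # plug-in p-hat
half_cons = z / (2 * np.sqrt(n))                   # bound pq <= 1/4
print(f"plug-in:      ({p_hat - half_plug:.3f}, {p_hat + half_plug:.3f})")
print(f"conservative: ({p_hat - half_cons:.3f}, {p_hat + half_cons:.3f})")
```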

Large-sample confidence intervals: two-sample mean

Suppose \(X_1,\dots,X_n\sim\mathcal D_X\) with mean and variance \(\mu_X,\sigma_X^2\); \(Y_1,\dots,Y_m\sim\mathcal D_Y\), with mean and variance \(\mu_Y,\sigma_Y^2\), and \(z=\text{CDF}_{\mathcal N(0,1)}^{-1}(1-\alpha/2)\).

\[ \sum X_i \sim\mathcal N(n\mu_X, n\sigma_X^2) \qquad \sum Y_j \sim\mathcal N(m\mu_Y, m\sigma_Y^2) \\ \overline X \sim\mathcal N(\mu_X, \sigma_X^2/n) \qquad \overline Y \sim\mathcal N(\mu_Y, \sigma_Y^2/m) \\ \overline X-\overline Y\sim \mathcal N\left(\mu_X-\mu_Y, \frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}\right) \\ \overline X-\overline Y-z\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}} < \mu_X-\mu_Y < \overline X-\overline Y+z\sqrt{\frac{\sigma_X^2}{n} + \frac{\sigma_Y^2}{m}} \]
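A sketch with simulated samples, plugging in the sample variances for \(\sigma_X^2,\sigma_Y^2\):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.normal(5.0, 2.0, size=80)    # hypothetical samples
y = rng.normal(4.5, 3.0, size=60)

z = norm.ppf(0.975)
diff = x.mean() - y.mean()
se = np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))
print(f"approx. 95% CI for mu_X - mu_Y: "
      f"({diff - z*se:.3f}, {diff + z*se:.3f})")
```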

Large-sample confidence intervals: two-sample proportion

Suppose \(X\sim\text{Binomial}(n,p_1)\), \(Y\sim\text{Binomial}(m,p_2)\), and \(z=\text{CDF}_{\mathcal N(0,1)}^{-1}(1-\alpha/2)\). Write \(q_i=1-p_i\).

\[ \hat p_1 = \frac{X}{n} \sim \mathcal N(p_1,p_1q_1/n) \qquad \hat p_2 = \frac{Y}{m} \sim \mathcal N(p_2,p_2q_2/m) \\ \Delta\hat p = \hat p_1 -\hat p_2 \sim \mathcal N\left( p_1-p_2, \frac{p_1q_1}{n} + \frac{p_2q_2}{m} \right) \\ \Delta\hat p - z\sqrt{\frac{p_1q_1}{n} + \frac{p_2q_2}{m}} < p_1-p_2 < \Delta\hat p + z\sqrt{\frac{p_1q_1}{n} + \frac{p_2q_2}{m}} \]

The same substitution \(p_iq_i\leq1/4\) works here.
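A sketch on hypothetical counts, using the plug-in variances:

```python
import numpy as np
from scipy.stats import norm

n, x = 150, 90               # hypothetical counts for the two samples
m, y = 200, 104
p1, p2 = x / n, y / m

z = norm.ppf(0.975)
se = np.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / m)   # plug-in variances
d = p1 - p2
print(f"approx. 95% CI for p1 - p2: ({d - z*se:.3f}, {d + z*se:.3f})")
```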

Estimating variance for a proportion

Let's use a simulation to see how well our two choices for estimating \(\mathbb VX\) for \(X\sim\text{Binomial}(n,p)\) – the plug-in \(n\hat p\hat q\) and the bound \(n/4\) – work.
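A sketch of such a simulation, estimating empirical coverage of the two intervals (the sample size, \(p\), and replication count are arbitrary choices):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, p, reps = 50, 0.3, 10_000
z = norm.ppf(0.975)

x = rng.binomial(n, p, size=reps)
p_hat = x / n
half_plug = z * np.sqrt(p_hat * (1 - p_hat) / n)   # plug-in variance
half_cons = z / (2 * np.sqrt(n))                   # conservative bound

cover_plug = np.mean((p_hat - half_plug < p) & (p < p_hat + half_plug))
cover_cons = np.mean((p_hat - half_cons < p) & (p < p_hat + half_cons))
print(f"plug-in coverage:      {cover_plug:.3f}")
print(f"conservative coverage: {cover_cons:.3f}")
```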

Confidence intervals for \(\sigma^2\)

If \(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\) then \[ \frac{X_i-\mu}{\sigma}\sim\mathcal N(0,1) \qquad \sum\frac{(X_i-\mu)^2}{\sigma^2} \sim\chi^2(n) \] so if we know \(\mu\) and write \(\chi^2_{\alpha}=\text{CDF}_{\chi^2(n)}^{-1}(\alpha)\), then \[ \chi^2_{\alpha/2} < \frac{\sum(X_i-\mu)^2}{\sigma^2} <\chi^2_{1-\alpha/2} \\ \sigma^2\chi^2_{\alpha/2} < \sum(X_i-\mu)^2 < \sigma^2\chi^2_{1-\alpha/2} \\ \frac{\sum(X_i-\mu)^2}{\chi^2_{1-\alpha/2}} <\sigma^2< \frac{\sum(X_i-\mu)^2}{\chi^2_{\alpha/2}} \]
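A sketch of the known-\(\mu\) interval on simulated data:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
mu, sigma, n = 0.0, 2.0, 30
x = rng.normal(mu, sigma, size=n)

ss = np.sum((x - mu) ** 2)          # mu is known here
lo = ss / chi2.ppf(0.975, df=n)     # chi^2(n) quantiles
hi = ss / chi2.ppf(0.025, df=n)
print(f"95% CI for sigma^2: ({lo:.3f}, {hi:.3f})")
```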

Confidence intervals for \(\sigma^2\)

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\).

Theorem \[ \frac{\sum(X_i-\overline X)^2}{\sigma^2} \sim\chi^2(n-1) \]

Proof For the case \(n=2\):

\[ \left(X_1 - \frac{X_1+X_2}{2}\right)^2 + \left(X_2 - \frac{X_1+X_2}{2}\right)^2 = \\ \left(\frac{X_1-X_2}{2}\right)^2 + \left(-\frac{X_1-X_2}{2}\right)^2 = \\ \frac{(X_1-X_2)^2}{2} \]

Confidence intervals for \(\sigma^2\)

Theorem \(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\). \(T=\sum(X_i-\overline X)^2/\sigma^2\sim\chi^2(n-1)\).

Proof… For \(n=2\), \(T=(X_1-X_2)^2/(2\sigma^2)\).

Note \(X_1-X_2\sim\mathcal N(0,2\sigma^2)\) so \[ \frac{X_1-X_2}{\sqrt{2\sigma^2}} \sim\mathcal N(0,1) \] and its square – which is exactly \(T\) – has a \(\chi^2(1)\) distribution.

Confidence intervals for \(\sigma^2\)

Theorem \(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\). \(T=\sum(X_i-\overline X)^2/\sigma^2\sim\chi^2(n-1)\).

Proof… In general, \[ \sum X_i^2 = n\overline X^2 + \left(\color{green}{\frac{X_1-X_2}{\sqrt{2}}}\right)^2 + \left(\color{green}{\frac{X_1+X_2-2X_3}{\sqrt{2\cdot 3}}}\right)^2 + \dots+ \\ \left(\color{green}{\frac{X_1+\dots+X_{n-1}-(n-1)X_n}{\sqrt{n(n-1)}}}\right)^2 \]

Write \(U_i\) for the green parts. Then these are independent and normal, and \(\sum(X_i-\overline X)^2=\sum X_i^2-n\overline X^2=\sum U_i^2\).

Confidence intervals for \(\sigma^2\)

Theorem \(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\). \(T=\sum(X_i-\overline X)^2/\sigma^2\sim\chi^2(n-1)\).

Proof… Write \[ U_i = \frac{X_1+X_2+\dots+X_{i-1}-(i-1)X_i}{\sqrt{i(i-1)}} \\ X_1+\dots+X_{i-1}-(i-1)X_i\sim \mathcal N\left(0, (i-1)\sigma^2+(i-1)^2\sigma^2 \right) = \mathcal N\left(0, i(i-1)\sigma^2 \right) \] Therefore \(U_i\sim\mathcal N(0,\sigma^2)\), and \[ \frac{\sum_{i=1}^n(X_i-\overline X)^2}{\sigma^2} = \sum_{i=2}^{n}\left(\frac{U_i}{\sigma}\right)^2 \sim\chi^2(n-1) \]
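A quick numerical check of the identity \(\sum(X_i-\overline X)^2=\sum U_i^2\) (a verification, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=10)
n = len(x)

# U_i for i = 2..n, as defined above.
u = np.array([(x[:i-1].sum() - (i - 1) * x[i-1]) / np.sqrt(i * (i - 1))
              for i in range(2, n + 1)])
print(np.sum((x - x.mean()) ** 2), np.sum(u ** 2))   # should agree
```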

Confidence intervals for \(\sigma^2\)

\(X_1,\dots,X_n\sim\mathcal N(\mu,\sigma^2)\).

Write \(\chi^2_\alpha = \text{CDF}_{\chi^2(n-1)}^{-1}(\alpha)\). Then \[ \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1) \\ \chi^2_{\alpha/2} < \frac{(n-1)S^2}{\sigma^2} < \chi^2_{1-\alpha/2} \\ \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}} <\sigma^2< \frac{(n-1)S^2}{\chi^2_{\alpha/2}} \]

Here, we pick a confidence interval with symmetric tail probabilities. The shortest possible interval is more difficult to find.
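A sketch of this interval on simulated data:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(6)
x = rng.normal(10.0, 2.0, size=30)
n = len(x)
s2 = x.var(ddof=1)                  # sample variance S^2

lo = (n - 1) * s2 / chi2.ppf(0.975, df=n - 1)
hi = (n - 1) * s2 / chi2.ppf(0.025, df=n - 1)
print(f"95% CI for sigma^2: ({lo:.3f}, {hi:.3f})")
```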

Estimation vs prediction

Closely related to confidence intervals are prediction intervals: a probabilistic model can be used for making predictions, and a prediction is then given together with an interval that with high probability contains the next observation.

Normal Prediction: known \(\mu\) and \(\sigma^2\)

\(X\sim\mathcal N(\mu,\sigma^2)\). We seek \(\ell, u\) such that \[ \mathbb P(\ell < X < u) = 1-\alpha \]

Since we know both \(\mu\) and \(\sigma^2\), we can rewrite the inequalities to inequalities of a standard normal random variable. With \(z=\text{CDF}_{\mathcal N(0,1)}^{-1}(0.975)\),

\[ \ell < X < u \\ (\ell-\mu)/\sigma < (X-\mu)/\sigma < (u-\mu)/\sigma \\ \ell = \mu-z\sigma \qquad\qquad u = \mu+z\sigma \]

Normal Prediction: unknown \(\mu\)

When the mean is unknown, we can use a sample mean \(\overline X\) to estimate it. The sample mean is \(\overline X\sim\mathcal N(\mu,\sigma^2/n)\) whereas the next observation is \(X_{n+1}\sim\mathcal N(\mu, \sigma^2)\).

The difference is \(X_{n+1}-\overline X\sim\mathcal N(0,(1+1/n)\sigma^2)\), and so we get a pivot quantity \[ \frac{X_{n+1}-\overline X}{\sigma\sqrt{1+1/n}}\sim\mathcal N(0,1) \] producing a prediction interval, with \(z=\text{CDF}_{\mathcal N(0,1)}^{-1}(0.975)\), \[ \overline X-z\sigma\sqrt{1+\frac1n} < X_{n+1} < \overline X+z\sigma\sqrt{1+\frac1n} \]

Normal Prediction: unknown \(\sigma^2\)

Estimated variance; known or estimated mean: \[ \frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1) \\ X_{n+1}-\mu \sim\mathcal N(0,\sigma^2) \qquad X_{n+1}-\overline X\sim\mathcal N(0, \sigma^2(1+1/n)) \]

Since a standard normal divided by the square root of an independent \(\chi^2(n-1)/(n-1)\) follows a \(T(n-1)\) distribution, we get \[ \frac{X_{n+1}-\mu}{S}\sim T(n-1)\qquad \frac{X_{n+1}-\overline X}{S\sqrt{1+1/n}}\sim T(n-1) \]

Write \(t = \text{CDF}_{T(n-1)}^{-1}(0.975)\), and the prediction intervals are \[ \mu-tS < X_{n+1} < \mu+tS \\ \overline X-tS\sqrt{1+1/n} < X_{n+1} < \overline X+tS\sqrt{1+1/n} \]
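A sketch of the second interval (unknown \(\mu\) and \(\sigma^2\)) on simulated data:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(7)
x = rng.normal(10.0, 2.0, size=20)
n = len(x)
xbar, s = x.mean(), x.std(ddof=1)

tq = t.ppf(0.975, df=n - 1)
half = tq * s * np.sqrt(1 + 1 / n)
print(f"95% prediction interval: ({xbar - half:.3f}, {xbar + half:.3f})")
```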

Non-parametric prediction intervals

Suppose, first, that \(X_1,X_2\) are iid and distinct with probability 1.

\[ \mathbb P(X_1>X_2) \]

Non-parametric prediction intervals

Suppose, first, that \(X_1,X_2\) are iid and distinct with probability 1.

\[ \mathbb P(X_1>X_2) = 1/2 \]

How about if \(X_1,\dots,X_n\) are iid, all unique? Recall the order statistics \(X_{(1)} < \dots < X_{(n)}\).

\[ \mathbb P(X_1 = X_{(1)}) \]

Non-parametric prediction intervals

\(X_1,\dots,X_n\) iid, all unique. Recall the order statistics \(X_{(1)} < \dots < X_{(n)}\).

\[ \mathbb P(X_1 = X_{(1)}) = \frac{(n-1)!}{n!} = \frac{1}{n} \]

Similarly, \(\mathbb{P}(X_1=X_{(n)}) = 1/n\).

Non-parametric prediction intervals

\(X_1,\dots,X_n\) iid, all unique. Recall the order statistics \(X_{(1)} < \dots < X_{(n)}\).

\[ \mathbb P(X_1 = X_{(1)}) = \frac{(n-1)!}{n!} = \frac{1}{n} \]

Similarly, \(\mathbb{P}(X_1=X_{(k)}) = 1/n\) for any fixed \(k\).

Since \(X_{n+1}\) is equally likely to take any of the \(n+1\) ranks among \(X_1,\dots,X_{n+1}\), a \(\frac{n-1}{n+1}\) prediction interval for the next observation \(X_{n+1}\) is given by \(X_{(1)} < X_{n+1} < X_{(n)}\).

Non-parametric prediction intervals

\(X_1,\dots,X_n\) iid, all unique. Recall the order statistics \(X_{(1)} < \dots < X_{(n)}\).

A \((n+1-2k)/(n+1)\) prediction interval is given by \[ X_{(k)} < X_{n+1} < X_{(n-k+1)} \]

Requiring iid is more than this derivation needs: it is enough that every reordering is equally probable, ie that the sample is exchangeable.
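A sketch of the order-statistic interval, with \(n\) and \(k\) chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(8)
x = np.sort(rng.exponential(scale=3.0, size=19))   # n = 19
n, k = len(x), 2

level = (n + 1 - 2 * k) / (n + 1)                  # here 16/20 = 80%
# X_(k) and X_(n-k+1) in 0-based indexing:
print(f"{level:.0%} prediction interval: ({x[k-1]:.3f}, {x[n-k]:.3f})")
```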