14 February, 2018

Ancillary vs. Minimal sufficient

Recall:

Ancillary statistics contain no information about a parameter.

Sufficient statistics contain all information about a parameter.

Minimally sufficient statistics contain just enough information about a parameter: they are sufficient and can be written as a function of every other sufficient statistic.

Ancillary vs. Minimal sufficient

A reasonable guess could be that minimally sufficient statistics are independent of ancillary statistics.

Example Let \(X_1, \dots, X_n\) be iid from the uniform distribution on the interval \([\theta, \theta+1]\).

The order statistics \(X_{(1)}, \dots, X_{(n)}\) of a sample are the sample values arranged in sorted order. So \(X_{(1)}=\min_i X_i\) and \(X_{(n)}=\max_i X_i\); the max, median, quartiles, etc. are all (functions of) order statistics.

Let \(R=X_{(n)}-X_{(1)}\) be the range and \(M=(X_{(1)}+X_{(n)})/2\) be the midpoint.
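As a small illustration (a minimal sketch in Python with numpy; the parameter value, sample size, and seed are arbitrary choices for this example), the order statistics, range, and midpoint can be computed directly from a simulated sample:

    import numpy as np

    rng = np.random.default_rng(0)
    theta = 3.0                                   # the unknown shift parameter
    x = rng.uniform(theta, theta + 1, size=10)    # sample from Uniform([theta, theta+1])

    x_sorted = np.sort(x)                         # order statistics X_(1), ..., X_(n)
    R = x_sorted[-1] - x_sorted[0]                # range X_(n) - X_(1)
    M = (x_sorted[0] + x_sorted[-1]) / 2          # midpoint (X_(1) + X_(n)) / 2
    print(R, M)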

Ancillary vs. Minimal sufficient

Example \(X_1, \dots, X_n\sim\text{Uniform}([\theta,\theta+1])\) iid. \(R=X_{(n)}-X_{(1)}\) and \(M=(X_{(1)}+X_{(n)})/2\).

Claim \(R\) is ancillary for \(\theta\).

The joint PDF for \(X_1, \dots, X_n\) is \[ p(x_1,\dots,x_n|\theta) = \begin{cases} 1 & \text{if } \theta < x_{(1)}, x_{(n)} < \theta+1 \\ 0 & \text{otherwise} \end{cases} \]

Ancillary vs. Minimal sufficient

Example \(X_1, \dots, X_n\sim\text{Uniform}([\theta,\theta+1])\) iid. \(R=X_{(n)}-X_{(1)}\) and \(M=(X_{(1)}+X_{(n)})/2\).

Claim \(R\) is ancillary for \(\theta\).

The joint PDF for \(X_{(1)}, X_{(n)}\) is \[ p(x_{(1)},x_{(n)}|\theta) = \begin{cases} n(n-1)(x_{(n)}-x_{(1)})^{n-2} & \text{if } \theta < x_{(1)}, x_{(n)} < \theta+1 \\ 0 & \text{otherwise} \end{cases} \]

because the remaining \(n-2\) observations must each fall between \(x_{(1)}\) and \(x_{(n)}\), an interval of length \(x_{(n)}-x_{(1)}\), and there are \(n(n-1)\) ways to choose which two observations are the minimum and the maximum.
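As a sanity check on this density (a small numerical sketch, assuming scipy is available; \(n=5\) and \(\theta=0\) are arbitrary), it should integrate to 1 over the region \(\theta < x_{(1)} < x_{(n)} < \theta+1\):

    from scipy.integrate import dblquad

    n, theta = 5, 0.0

    # Integrate n(n-1)(x_max - x_min)^(n-2) over theta < x_min < x_max < theta + 1.
    # dblquad integrates func(y, x): the outer variable x (here x_max) runs from
    # theta to theta + 1, the inner variable y (here x_min) from theta to x_max.
    total, _ = dblquad(
        lambda x_min, x_max: n * (n - 1) * (x_max - x_min) ** (n - 2),
        theta, theta + 1,
        lambda x_max: theta,
        lambda x_max: x_max,
    )
    print(total)  # close to 1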

Ancillary vs. Minimal sufficient

Example \(X_1, \dots, X_n\sim\text{Uniform}([\theta,\theta+1])\) iid. \(R=X_{(n)}-X_{(1)}\) and \(M=(X_{(1)}+X_{(n)})/2\).

Claim \(R\) is ancillary for \(\theta\).

The joint PDF for \(R,M\) is \[ p(r,m|\theta) = \begin{cases} n(n-1)r^{n-2} & \text{if } \theta < m-r/2, m+r/2 < \theta+1 \\ 0 & \text{otherwise} \end{cases} \]

Ancillary vs. Minimal sufficient

Example \(X_1, \dots, X_n\sim\text{Uniform}([\theta,\theta+1])\) iid. \(R=X_{(n)}-X_{(1)}\) and \(M=(X_{(1)}+X_{(n)})/2\).

Claim \(R\) is ancillary for \(\theta\).

The marginal PDF for \(R\) is \[ p(r|\theta) = \int_{\theta+r/2}^{\theta+1-r/2}n(n-1)r^{n-2}dm = n(n-1)r^{n-2}(1-r) \]

This does not depend on \(\theta\), so \(R\) is ancillary.
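This can also be checked by simulation (a minimal sketch with numpy; \(n\), the two values of \(\theta\), and the number of replications are arbitrary): the distribution of \(R\) looks the same for different \(\theta\), and its mean matches \(\mathbb{E}[R]=(n-1)/(n+1)\), which follows by integrating \(r\) against the density above.

    import numpy as np

    rng = np.random.default_rng(1)
    n, reps = 5, 100_000

    def simulate_R(theta):
        # reps samples of size n from Uniform([theta, theta+1]); return their ranges
        x = rng.uniform(theta, theta + 1, size=(reps, n))
        return x.max(axis=1) - x.min(axis=1)

    r_a, r_b = simulate_R(theta=0.0), simulate_R(theta=5.0)
    print(r_a.mean(), r_b.mean())   # nearly identical for both values of theta
    print((n - 1) / (n + 1))        # = 2/3, the mean of the density above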

Ancillary vs. Minimal sufficient

Example \(X_1, \dots, X_n\sim\text{Uniform}([\theta,\theta+1])\) iid. \(R=X_{(n)}-X_{(1)}\) and \(M=(X_{(1)}+X_{(n)})/2\).

Claim \((R, M)\) is a sufficient statistic.

Notice that the joint PDF for the sample is

\[ p(x_1,\dots,x_n|\theta) = \begin{cases} 1 & \text{if } \theta < x_{(1)}, x_{(n)} < \theta+1 \\ 0 & \text{otherwise} \end{cases} \]

which depends on the sample only through \(x_{(1)}=M-R/2\) and \(x_{(n)}=M+R/2\).

Ancillary vs. Minimal sufficient

Example \(X_1, \dots, X_n\sim\text{Uniform}([\theta,\theta+1])\) iid. \(R=X_{(n)}-X_{(1)}\) and \(M=(X_{(1)}+X_{(n)})/2\).

Claim \((R, M)\) is a minimally sufficient statistic.

For any two samples \(x_1,\dots,x_n\) and \(y_1,\dots,y_n\), the likelihood ratio \(p(x|\theta)/p(y|\theta)\) is constant as a function of \(\theta\) exactly when \(x_{(1)}=y_{(1)}\) and \(x_{(n)}=y_{(n)}\), i.e. exactly when the two samples give the same value of \((R,M)\).

Ancillary vs. Minimal sufficient

In conclusion:

  • the range and midpoint together form a minimally sufficient statistic
  • the range alone is ancillary.

So when are ancillary and minimally sufficient statistics independent?

The answer lies in completeness of a statistic (or equivalently of a family of distributions for that statistic).

Definition \(T\) is a complete statistic if for any \(g\) such that \(\mathbb{E}[g(T)|\theta]=0\) for all \(\theta\), the function value \(g(T)\) is 0 (almost always).

Complete statistics

Example Let \(T\sim\text{Binomial}(n,p)\), and let \(g\) be a function with \(\mathbb{E}[g(T)|p]=0\) for all \(p\).

Then \[ 0 = \mathbb{E}[g(T)|p] = \sum_{t=0}^n g(t){n\choose t}p^t(1-p)^{n-t} = \\ (1-p)^n\sum g(t){n\choose t}\left(\frac{p}{1-p}\right)^t \]

Set \(r=p/(1-p)\). As \(p\) ranges over \((0,1)\), \(r\) ranges over \((0,\infty)\), and the last sum is a polynomial of degree \(n\) in \(r\). A polynomial that vanishes on an interval has all coefficients equal to 0, so \(g(t){n\choose t}=0\), and hence \(g(t)=0\), for every \(t=0,\dots,n\). Since \(T\) takes values in \(\{0,\dots,n\}\), \(g(T)=0\).

Hence \(T\) is a complete statistic.
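To see the polynomial argument concretely for a small \(n\) (a sketch with sympy; \(n=3\) is an arbitrary choice), expand \(\mathbb{E}[g(T)|p]\) as a polynomial in \(p\) and force every coefficient to vanish; the only solution is \(g(0)=\dots=g(n)=0\):

    import sympy as sp

    n = 3
    p = sp.symbols('p')
    g = sp.symbols(f'g0:{n + 1}')    # unknowns g(0), ..., g(n)

    # E[g(T) | p] as a polynomial in p
    expectation = sum(g[t] * sp.binomial(n, t) * p**t * (1 - p)**(n - t)
                      for t in range(n + 1))

    # The polynomial vanishes identically in p only if every coefficient is 0
    coeffs = sp.Poly(sp.expand(expectation), p).all_coeffs()
    print(sp.solve(coeffs, g))       # {g0: 0, g1: 0, g2: 0, g3: 0}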

Completeness and minimal sufficiency

Theorem If \(T\) is complete and sufficient, \(T\) is minimally sufficient.

Proof Let \(T'\) be a minimally sufficient statistic. Then there is some function \(f\) s.t. \(T'=f(T)\).

Define \(g(T')=\mathbb{E}_\theta[T|T']\). Because \(T'\) is sufficient, this conditional expectation does not depend on \(\theta\), so \(g\) is a well-defined function of \(T'\).

By smoothing, \(\mathbb{E}_\theta[g(T')]=\mathbb{E}_\theta[\mathbb{E}_\theta[T|T']]=\mathbb{E}_\theta[T]\), so \(\mathbb{E}_\theta[T-g(T')]=0\) for all \(\theta\). Since \(T'=f(T)\), this says \(\mathbb{E}_\theta[T-g(f(T))]=0\), so by completeness \(T-g(f(T))=0\), i.e. \(T=g(T')\) (almost always).

So \(T\) is (almost always) a function of the minimally sufficient statistic \(T'\). Since \(T'\) is a function of every sufficient statistic, so is \(T\), and hence \(T\) is minimally sufficient.

Example

\(X_1,\dots,X_n\sim\text{Uniform}(0,\theta)\) and \(T(X)=\max_i X_i\).

Claim \(T\) is a complete statistic.

The PDF of \(T\) is \[ p(t|\theta)= \begin{cases} nt^{n-1}\theta^{-n} & \text{if } 0 < t < \theta \\ 0 & \text{otherwise} \end{cases} \]

Example

\(X_1,\dots,X_n\sim\text{Uniform}(0,\theta)\) and \(T(X)=\max_i X_i\).

Claim \(T\) is a complete statistic…

Suppose \(g(t)\) is a function s.t. \(\mathbb{E}_\theta g(T)=0\) for all \(\theta\). Then \[ 0 = \mathbb{E}_\theta g(T) = \int_0^\theta g(t)nt^{n-1}\theta^{-n}dt = n\theta^{-n}\int_0^\theta g(t)t^{n-1}dt, \] so \(\int_0^\theta g(t)t^{n-1}dt=0\) for all \(\theta>0\). Differentiating in \(\theta\) gives \[ 0 = \frac{d}{d\theta}\int_0^\theta g(t)t^{n-1}dt = g(\theta)\theta^{n-1} \]

Since \(\theta^{n-1}\neq0\) for \(\theta>0\), it follows that \(g(\theta)=0\), so \(T\) is complete.

Example

\(X_1,\dots,X_n\sim\text{Uniform}(0,\theta)\) and \(T(X)=\max_i X_i\).

Claim \(T\) is a sufficient statistic.

The likelihood function is \[ \mathcal{L}(\theta|x) = \prod_i\mathbb{1}_{(0,\theta)}(x_i)/\theta = \color{green}{\theta^{-n}} \color{blue}{\mathbb{1}_{(0,\infty)}(\min_i x_i)}\cdot \color{green}{\mathbb{1}_{(0,\theta)}(T)} \]

From the factorization theorem, sufficiency follows.

Since \(T\) is sufficient and complete, \(T\) is minimally sufficient.

Exponential families

An exponential family with pdf \[ p(x|\theta)\propto\exp\left[\eta(\theta)^T T(x)+A(\theta)+B(x)\right] \] has full rank if the set \(\eta(\Omega)\) of possible natural parameter vectors has non-empty interior in \(\mathbb{R}^k\) and the components of \(T\) satisfy no linear constraint \(a^TT(x)=c\) (almost surely).

Theorem A statistic \(T\) defining an exponential family of full rank is complete.
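For instance, rearranging the Binomial\((n,p)\) pmf from the earlier completeness example into this form (colors mark the natural parameter, the statistic, and the remaining terms, as in the normal example below):

\[ p(x|p) = {n\choose x}p^x(1-p)^{n-x} = \exp\left[ \color{blue}{\log\tfrac{p}{1-p}}\cdot \color{green}{x} \color{orange}{+n\log(1-p)}+ \color{black}{\log{n\choose x}} \right] \]

As \(p\) ranges over \((0,1)\), the natural parameter \(\log\frac{p}{1-p}\) ranges over all of \(\mathbb{R}\), so the family has full rank and the theorem gives the completeness of \(T(X)=X\), in agreement with the direct calculation above.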

Basu's theorem

Theorem If \(T\) is a complete and minimally sufficient statistic, then \(T\) is independent of every ancillary statistic. In other words, if \(S\) is ancillary, then \(T\) and \(S\) are conditionally independent: \[ p(T,S|\theta) = p(T|\theta)p(S|\theta) \]

Proof Since \(S\) is ancillary, \(\mathbb{P}(S=s)\) doesn't depend on \(\theta\). Since \(T\) is sufficient, the conditional distribution of the sample given \(T=t\) doesn't depend on \(\theta\), and since \(S\) is a function of the sample, neither does \(\mathbb{P}(S=s|T=t)\).

To show independence it suffices to show \(\mathbb{P}(s|t) = \mathbb{P}(s)\).

Basu's theorem

Theorem If \(T\) is complete and minimally sufficient and \(S\) is ancillary, then \(T\) and \(S\) are conditionally independent.

Proof… we need to show \(\mathbb{P}(s|t)=\mathbb{P}(s)\); neither probability depends on \(\theta\), so conditioning on \(\theta\) is irrelevant.

By smoothing \[ \mathbb{P}(s) = \sum_t \mathbb{P}(s|t)\mathbb{P}(t|\theta) \]

Define \(g(t)=\mathbb{P}(s|t)-\mathbb{P}(s)\). Then \[ \mathbb{E}_\theta g(T) = \sum_t g(t)\mathbb{P}_\theta(t) = \sum_t \mathbb{P}(s|t)\mathbb{P}_\theta(t) - \mathbb{P}(s)\sum_t\mathbb{P}_\theta(t) = \\ \mathbb{P}(s)-\mathbb{P}(s) = 0 \]

So because \(T\) is complete, \(g(T)=0\) almost surely, i.e. \(\mathbb{P}(s|t)=\mathbb{P}(s)\).

Example

\(X_1,\dots,X_n\sim\mathcal{N}(\mu,\sigma^2)\) and consider \(\sigma\) fixed. Write the sample mean as \(\overline x\). The joint density of the sample is a full-rank exponential family:

\[ p(x_1,\dots,x_n|\mu) \propto \exp\left[ \color{blue}{n\mu\sigma^{-2}}\cdot \color{green}{\overline x} \color{orange}{-n\mu^2\sigma^{-2}/2} \color{black}{-\tfrac{1}{2}\sigma^{-2}\sum x_i^2} \right] \]

Since the family has full rank, \(\overline x\) is complete. By the factorization theorem it is sufficient, and therefore minimally sufficient.

Example

\(X_1,\dots,X_n\sim\mathcal{N}(\mu,\sigma^2)\) and consider \(\sigma\) fixed. The sample mean \(\overline x\) is minimally sufficient and complete. Write \(S^2=\left(\sum (x_i-\overline x)^2\right)/(n-1)\) for the sample variance.

Claim \(S^2\) is ancillary.

Let \(Y_i=X_i-\mu\). Then \(\mathbb{P}_\mu(Y_i\leq y) = \mathbb{P}_\mu(X_i\leq y+\mu) =\) \[ \int_{-\infty}^{y+\mu}\exp\left[-\frac{1}{2\sigma^2}(x-\mu)^2\right]\frac{dx}{\sqrt{2\pi\sigma^2}} = \\ \int_{-\infty}^y \exp\left[-\frac{1}{2\sigma^2}u^2\right]\frac{du}{\sqrt{2\pi\sigma^2}} \] substituting \(u=x-\mu\). The latter integrand is the density of \(\mathcal{N}(0,\sigma^2)\), so \(Y_i\sim\mathcal{N}(0,\sigma^2)\) whatever \(\mu\) is.

Example

\(X_1,\dots,X_n\sim\mathcal{N}(\mu,\sigma^2)\) and consider \(\sigma\) fixed. The sample mean \(\overline x\) is minimally sufficient and complete. Write \(S^2=\left(\sum (x_i-\overline x)^2\right)/(n-1)\) for the sample variance.

Claim \(S^2\) is ancillary.

Since \(Y_i=X_i-\mu\), it follows \(\overline Y = \overline X - \mu\). So \(Y_i-\overline Y = (X_i-\mu) - (\overline X-\mu) = X_i-\overline X\).

Zero-centering doesn't change the sample variance.

This shows that \(S^2\) has a distribution that does not depend on \(\mu\): \[ S^2 = \frac{1}{n-1}\sum (X_i-\overline X)^2 = \frac{1}{n-1}\sum (Y_i-\overline Y)^2 \] is a function of \(Y_1,\dots,Y_n\) alone, and the \(Y_i\) are iid \(\mathcal{N}(0,\sigma^2)\) regardless of \(\mu\). Hence \(S^2\) is ancillary.

Example

\(X_1,\dots,X_n\sim\mathcal{N}(\mu,\sigma^2)\) and consider \(\sigma\) fixed. The sample mean \(\overline x\) is minimally sufficient and complete. Write \(S^2=\left(\sum (x_i-\overline x)^2\right)/(n-1)\) for the sample variance.

\(S^2\) is ancillary. By Basu's theorem \(\overline x\) and \(S^2\) are independent.
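A quick simulation illustrates this independence (a minimal sketch with numpy; \(\mu\), \(\sigma\), \(n\), and the number of replications are arbitrary): across many samples the correlation between \(\overline x\) and \(S^2\) is near zero, and the distribution of \(S^2\) looks the same whether \(\overline x\) is small or large.

    import numpy as np

    rng = np.random.default_rng(2)
    mu, sigma, n, reps = 1.0, 2.0, 10, 100_000

    x = rng.normal(mu, sigma, size=(reps, n))
    xbar = x.mean(axis=1)              # sample means
    s2 = x.var(axis=1, ddof=1)         # sample variances (divide by n - 1)

    print(np.corrcoef(xbar, s2)[0, 1])         # close to 0
    low = xbar < np.median(xbar)
    print(s2[low].mean(), s2[~low].mean())     # both close to sigma^2 = 4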

Completeness implies uniqueness

Theorem Suppose \(U\) is sufficient for \(\theta\) and both \(g_1(U)\) and \(g_2(U)\) are unbiased estimators of \(\theta\).

If \(U\) is complete, then \(g_1(U)=g_2(U)\) (almost always).

Completeness implies uniqueness

Theorem Suppose \(U\) is sufficient for \(\theta\) and both \(g_1(U)\) and \(g_2(U)\) are unbiased estimators of \(\theta\).

If \(U\) is complete, then \(g_1(U)=g_2(U)\) (almost always).

Proof Let \(g(U)=g_1(U)-g_2(U)\). Then \[ \mathbb{E}_\theta g(U) = \mathbb{E}_\theta g_1(U) - \mathbb{E}_\theta g_2(U) = \theta - \theta = 0 \]

By completeness, \(g(U)=0\) so \(g_1(U)=g_2(U)\).
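As an illustration with the earlier Uniform\((0,\theta)\) example (a sketch; the estimator \((n+1)T/n\) is introduced here for illustration, and it is unbiased because \(\mathbb{E}_\theta T = n\theta/(n+1)\) under the density given above): since \(T=\max_i X_i\) is complete and sufficient, the theorem says \((n+1)T/n\) is, up to almost-sure equality, the only unbiased estimator of \(\theta\) that is a function of \(T\).

    import numpy as np

    rng = np.random.default_rng(3)
    theta, n, reps = 4.0, 10, 200_000

    x = rng.uniform(0.0, theta, size=(reps, n))
    t = x.max(axis=1)                  # T = max_i X_i, complete and sufficient
    g = (n + 1) / n * t                # an unbiased estimator built from T
    print(g.mean())                    # close to theta = 4.0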