Recall:
Ancillary statistics contain no information about a parameter.
Sufficient statistics contain all information about a parameter.
Minimally sufficient statistics contain just enough information about a parameter.
14 February, 2018
A reasonable guess could be that minimally sufficient statistics are independent of ancillary statistics.
Example Let \(X_1, \dots, X_n\) be iid from a uniform distribution on the interval \([\theta, \theta+1]\).
The order statistics \(X_{(1)}, \dots, X_{(n)}\) of a sample are its values in sorted order. So \(X_{(1)}=\min_i X_i\) and \(X_{(n)}=\max_i X_i\); the median, quartiles, etc. are also order statistics.
Let \(R=X_{(n)}-X_{(1)}\) be the range and \(M=(X_{(1)}+X_{(n)})/2\) be the midpoint.
Claim \(R\) is ancillary for \(\theta\).
The joint PDF for \(X_1, \dots, X_n\) is \[ p(x_1,\dots,x_n|\theta) = \begin{cases} 1 & \text{if } \theta < x_{(1)}, x_{(n)} < \theta+1 \\ 0 & \text{otherwise} \end{cases} \]
The joint PDF for \(X_{(1)}, X_{(n)}\) is \[ p(x_{(1)},x_{(n)}|\theta) = \begin{cases} n(n-1)(x_{(n)}-x_{(1)})^{n-2} & \text{if } \theta < x_{(1)}, x_{(n)} < \theta+1 \\ 0 & \text{otherwise} \end{cases} \]
because the remaining \(n-2\) observations must each fall in the interval of length \(x_{(n)}-x_{(1)}\) between the extremes, in any order, and there are \(n(n-1)\) ways to choose which two of the \(n\) observations are the minimum and the maximum.
The joint PDF for \(R,M\) is \[ p(r,m|\theta) = \begin{cases} n(n-1)r^{n-2} & \text{if } \theta < m-r/2, m+r/2 < \theta+1 \\ 0 & \text{otherwise} \end{cases} \]
The marginal PDF for \(R\) is \[ p(r|\theta) = \int_{\theta+r/2}^{\theta+1-r/2}n(n-1)r^{n-2}dm = n(n-1)r^{n-2}(1-r) \]
This does not depend on \(\theta\), so \(R\) is ancillary.
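One way to sanity-check ancillarity is by simulation, a sketch using numpy and not part of the notes: integrating \(r\,p(r|\theta)\) over \([0,1]\) gives \(\mathbb{E}[R]=(n-1)/(n+1)\), a constant that should come out the same for every \(\theta\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 200_000

def sample_R(theta):
    # reps samples of size n from Uniform([theta, theta+1]); return the range R
    x = rng.uniform(theta, theta + 1, size=(reps, n))
    return x.max(axis=1) - x.min(axis=1)

# E[R] = integral of r * n(n-1) r^(n-2) (1-r) over [0,1] = (n-1)/(n+1)
expected = (n - 1) / (n + 1)
for theta in (-3.0, 0.0, 10.0):
    # the empirical mean of R is the same regardless of theta
    assert abs(sample_R(theta).mean() - expected) < 0.01
```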
Claim \(R, M\) is a sufficient statistic.
Notice that the joint PDF for the sample is
\[ p(x_1,\dots,x_n|\theta) = \begin{cases} 1 & \text{if } \theta < x_{(1)}, x_{(n)} < \theta+1 \\ 0 & \text{otherwise} \end{cases} \]
which only depends on the sample at all through \(x_{(1)}=M-R/2\) and \(x_{(n)}=M+R/2\).
Claim \(R, M\) is a minimally sufficient statistic.
For any two samples \(x_1,\dots,x_n\) and \(y_1,\dots,y_n\), the likelihood ratio is well defined and free of \(\theta\) precisely when the two likelihoods are non-zero for the same values of \(\theta\), which happens exactly when \(x_{(1)}=y_{(1)}\) and \(x_{(n)}=y_{(n)}\).
In conclusion: \((R,M)\) is minimally sufficient while \(R\) is ancillary, yet \(R\) is a function of \((R,M)\), so they are certainly not independent. So when are ancillary and minimally sufficient statistics independent?
The answer lies in completeness of a statistic (or equivalently of a family of distributions for that statistic).
Definition \(T\) is a complete statistic if for any \(g\) such that \(\mathbb{E}[g(T)|\theta]=0\) for all \(\theta\), we have \(g(T)=0\) almost surely.
Example Let \(T\sim\text{Binomial}(n,p)\), and let \(g\) be a function with \(\mathbb{E}[g(T)|p]=0\) for all \(p\).
Then \[ 0 = \mathbb{E}[g(T)|p] = \sum_{t=0}^n g(t){n\choose t}p^t(1-p)^{n-t} = \\ (1-p)^n\sum g(t){n\choose t}\left(\frac{p}{1-p}\right)^t \]
Set \(r=p/(1-p)\); as \(p\) ranges over \((0,1)\), \(r\) ranges over \((0,\infty)\). The last sum is a polynomial of degree \(n\) in \(r\); if it vanishes for all such \(r\), all its coefficients \(g(t){n\choose t}\) are 0, so \(g(t)=0\) for \(t=0,\dots,n\). Since \(T\) is (almost) surely one of \(0,\dots,n\), \(g(T)=0\).
Hence \(T\) is a complete statistic.
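For a small \(n\) this argument can be checked numerically, a sketch not part of the notes: demanding \(\mathbb{E}[g(T)|p]=0\) at \(n+1\) distinct values of \(p\) already forces \(g\equiv0\), because the corresponding linear system in \(g(0),\dots,g(n)\) has full rank.

```python
import numpy as np
from math import comb

n = 4
ps = np.linspace(0.1, 0.9, n + 1)  # n+1 distinct values of p

# Row for each p: the coefficients of g(0),...,g(n) in E[g(T)|p]
A = np.array([[comb(n, t) * p**t * (1 - p) ** (n - t) for t in range(n + 1)]
              for p in ps])

# Full rank => the only g with E[g(T)|p]=0 at these p is g = 0
assert np.linalg.matrix_rank(A) == n + 1
g = np.linalg.solve(A, np.zeros(n + 1))
assert np.allclose(g, 0)
```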
Theorem If \(T\) is complete and sufficient, \(T\) is minimally sufficient.
Proof Let \(T'\) be a minimally sufficient statistic. Then there is some function \(f\) s.t. \(T'=f(T)\).
Define \(g(T')=\mathbb{E}[T|T']\). Because \(T'\) is sufficient, this conditional expectation does not depend on \(\theta\), so \(g\) is a genuine function of \(T'\).
By smoothing, \(\mathbb{E}_\theta[g(T')]=\mathbb{E}_\theta[\mathbb{E}[T|T']]=\mathbb{E}_\theta[T]\), so \(\mathbb{E}_\theta[T-g(T')]=0\) for all \(\theta\). Since \(T'=f(T)\), this reads \(\mathbb{E}_\theta[T-g(f(T))]=0\), and by completeness \(T-g(f(T))=0\), i.e. \(T=g(T')\) (almost surely).
So \(T\) is a function of the minimally sufficient statistic \(T'\) (and \(T'\) is a function of \(T\)), hence \(T\) is itself minimally sufficient.
\(X_1,\dots,X_n\sim\text{Uniform}(0,\theta)\) and \(T(X)=\max_i X_i\).
Claim \(T\) is a complete statistic.
The PDF of \(T\) is \[ p(t|\theta)= \begin{cases} nt^{n-1}\theta^{-n} & \text{if } 0 < t < \theta \\ 0 & \text{otherwise} \end{cases} \]
Suppose \(g(t)\) is a function s.t. \(\mathbb{E}_\theta g(T)=0\) for all \(\theta\). Then \[ 0 = \mathbb{E}_\theta g(T) = \int_0^\theta g(t)nt^{n-1}\theta^{-n}dt = n\theta^{-n}\int_0^\theta g(t)t^{n-1}dt \] so \(\int_0^\theta g(t)t^{n-1}dt=0\) for all \(\theta>0\). Differentiating in \(\theta\) gives \(g(\theta)\theta^{n-1}=0\), and since \(\theta^{n-1}\neq0\), \(g(\theta)=0\) for all \(\theta>0\).
Claim \(T\) is a sufficient statistic.
The likelihood function is \[ \mathcal{L}(\theta|x) = \prod_i\frac{\mathbb{1}_{(0,\theta)}(x_i)}{\theta} = \color{blue}{\mathbb{1}_{(0,\infty)}(\min_i x_i)}\cdot \color{green}{\theta^{-n}\,\mathbb{1}_{(0,\theta)}(T)} \] where the blue factor is free of \(\theta\) and the green factor depends on the data only through \(T\).
From the factorization theorem, sufficiency follows.
Since \(T\) is sufficient and complete, \(T\) is minimally sufficient.
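A quick simulation, a numpy sketch not part of the notes, is consistent with the pdf of \(T\) above: it gives \(\mathbb{E}[T]=\int_0^\theta t\,nt^{n-1}\theta^{-n}dt=n\theta/(n+1)\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta, reps = 6, 2.0, 200_000

# T = max of n iid Uniform(0, theta) draws, repeated reps times
T = rng.uniform(0, theta, size=(reps, n)).max(axis=1)

# From p(t|theta) = n t^(n-1) / theta^n on (0, theta): E[T] = n*theta/(n+1)
assert abs(T.mean() - n * theta / (n + 1)) < 0.01
```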
An exponential family with pdf \[ p(x|\theta)=\exp\left[\eta(\theta)^T T(x)+A(\theta)+B(x)\right] \] has full rank if the interior of the set \(\eta(\Omega)\) of possible natural-parameter vectors is non-empty and the components of \(T\) are affinely independent (no non-trivial linear combination of them is constant).
Theorem A statistic \(T\) defining an exponential family of full rank is complete.
Theorem (Basu) If \(T\) is a complete and minimally sufficient statistic, then \(T\) is independent of every ancillary statistic. In other words, if \(S\) is ancillary, then under every \(\theta\) \[ p(T,S|\theta) = p(T|\theta)p(S|\theta) \]
Proof Since \(S\) is ancillary, \(\mathbb{P}(S=s)\) doesn't depend on \(\theta\). Since \(T\) is sufficient and \(S\) is a function of the data \(X\), \(\mathbb{P}(S=s|T=t)\) doesn't depend on \(\theta\) either.
To show independence it suffices to show \(\mathbb{P}(s|t) = \mathbb{P}(s)\).
As noted, neither \(\mathbb{P}(s|t)\) nor \(\mathbb{P}(s)\) depends on \(\theta\).
By smoothing \[ \mathbb{P}(s) = \sum_t \mathbb{P}(s|t)\mathbb{P}(t|\theta) \]
Define \(g(t)=\mathbb{P}(s|t)-\mathbb{P}(s)\). Then \[ \mathbb{E}_\theta g(T) = \sum_tg(t)\mathbb{P}_\theta(t) = \sum_t\mathbb{P}(s|t)\mathbb{P}_\theta(t) - \mathbb{P}(s)\sum_t\mathbb{P}_\theta(t) = \\ \mathbb{P}(s)-\mathbb{P}(s) = 0 \]
So because \(T\) is complete, \(g(t)=0\) so \(\mathbb{P}(s|t) = \mathbb{P}(s)\).
\(X_1,\dots,X_n\sim\mathcal{N}(\mu,\sigma^2)\) and consider \(\sigma\) fixed. Write the sample mean as \(\overline x\). The joint density of the sample forms a full-rank exponential family:
\[ p(x_1,\dots,x_n|\mu) = \exp\left[ \color{blue}{n\mu\sigma^{-2}}\cdot \color{green}{\overline x} \color{orange}{-n\mu^2\sigma^{-2}/2} \color{black}{-\frac{1}{2\sigma^{2}}\sum x_i^2-\frac{n}{2}\log(2\pi\sigma^2)} \right] \]
Since the family has full rank, \(\overline x\) is complete. By the factorization theorem it is sufficient, and therefore minimally sufficient.
\(X_1,\dots,X_n\sim\mathcal{N}(\mu,\sigma^2)\) and consider \(\sigma\) fixed. The sample mean \(\overline x\) is minimally sufficient and complete. Write \(S^2=\left(\sum (x_i-\overline x)^2\right)/(n-1)\) for the sample variance.
Claim \(S^2\) is ancillary.
Let \(Y_i=X_i-\mu\). Then \(\mathbb{P}_\mu(Y_i\leq y) = \mathbb{P}_\mu(X_i\leq y+\mu) =\) \[ \int_{-\infty}^{y+\mu}\exp\left[-\frac{1}{2\sigma^2}(x-\mu)^2\right]\frac{dx}{\sqrt{2\pi\sigma^2}} = \\ \int_{-\infty}^y \exp\left[-\frac{1}{2\sigma^2}u^2\right]\frac{du}{\sqrt{2\pi\sigma^2}} \] The latter integrand is the density of \(\mathcal{N}(0,\sigma^2)\).
Since \(Y_i=X_i-\mu\), it follows \(\overline Y = \overline X - \mu\). So \(Y_i-\overline Y = (X_i-\mu) - (\overline X-\mu) = X_i-\overline X\).
Zero-centering doesn't change the sample variance.
This shows that \(S^2\) has a distribution that does not depend on \(\mu\): \[ S^2 = \frac{1}{n-1}\sum (X_i-\overline X)^2 = \frac{1}{n-1}\sum (Y_i-\overline Y)^2 \] and the \(Y_i\sim\mathcal{N}(0,\sigma^2)\) do not involve \(\mu\). Hence \(S^2\) is ancillary (for fixed \(\sigma\)).
\(S^2\) is ancillary. By Basu's theorem \(\overline x\) and \(S^2\) are independent.
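Basu's conclusion can be illustrated by simulation, a numpy sketch not part of the notes: independence of \(\overline x\) and \(S^2\) implies zero correlation between them, which is what we test.

```python
import numpy as np

rng = np.random.default_rng(2)
n, mu, sigma, reps = 10, 3.0, 2.0, 100_000

# reps independent normal samples of size n
x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)  # sample variance with the 1/(n-1) convention

# Basu: xbar and S^2 are independent, so their correlation should be ~ 0
corr = np.corrcoef(xbar, s2)[0, 1]
assert abs(corr) < 0.02
```

Zero correlation is of course weaker than independence; the simulation corroborates rather than proves the theorem.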
Theorem Suppose \(U\) is sufficient for \(\theta\) and both \(g_1(U)\) and \(g_2(U)\) are unbiased estimators of \(\theta\).
If \(U\) is complete, then \(g_1(U)=g_2(U)\) almost surely.
Proof Let \(g(U)=g_1(U)-g_2(U)\). Then \[ \mathbb{E}_\theta g(U) = \mathbb{E}_\theta g_1(U) - \mathbb{E}_\theta g_2(U) = \theta - \theta = 0 \]
By completeness, \(g(U)=0\) so \(g_1(U)=g_2(U)\).
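As an illustration, a numpy sketch assuming the Uniform\((0,\theta)\) example above: the complete sufficient statistic \(U=\max_i X_i\) has \(\mathbb{E}[U]=n\theta/(n+1)\), so \(g(U)=(n+1)U/n\) is unbiased for \(\theta\), and by the theorem it is the essentially unique unbiased estimator based on \(U\).

```python
import numpy as np

rng = np.random.default_rng(3)
n, theta, reps = 5, 4.0, 200_000

# U = max of n iid Uniform(0, theta) draws, repeated reps times
U = rng.uniform(0, theta, size=(reps, n)).max(axis=1)
g = (n + 1) / n * U  # unbiased estimator of theta based on U

# E[g(U)] = theta, up to simulation error
assert abs(g.mean() - theta) < 0.02
```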