3.7:2

We are given the CDFs \(F_i(x) = x^{t_i\theta}\); since \(F_i(0)=0\) and \(F_i(1)=1\), these are supported on \([0,1]\). The corresponding densities are the derivatives \[ p_i(x) = \frac{d}{dx} x^{t_i\theta} = t_i\theta x^{t_i\theta-1} = t_i\theta x^{t_i\theta}/x \]

The joint density function is \[ \prod_i t_i\theta x_i^{t_i\theta}/x_i = \theta^n\prod\frac{t_i}{x_i}\left(\prod x_i^{t_i}\right)^\theta \]

The \(t_i\) are known, so \(\theta\) is the only parameter here. The factor \(\prod\frac{t_i}{x_i}\) depends only on the data, while \(\theta^n\left(\prod x_i^{t_i}\right)^\theta\) depends on the data only through \(\prod x_i^{t_i}\), so the factorization theorem makes this last product a sufficient statistic.
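To make the factorization explicit, write \(T(x)=\prod_i x_i^{t_i}\); then \[ \mathcal{L}(x|\theta) = \underbrace{\left(\prod_i \frac{t_i}{x_i}\right)}_{h(x)} \cdot \underbrace{\theta^n\, T(x)^{\theta}}_{g(T(x),\theta)} \] which is exactly the \(h\cdot g\) form the factorization theorem asks for.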

3.7:4

The distribution in question is described by \(p_1, p_2, p_3\), the probabilities of the three outcomes. These are subject to \(p_1+p_2+p_3=1\), hence \(p_3=1-p_1-p_2\), which suggests that a two-dimensional statistic should suffice.

From the sample \(X_1,\dots,X_n\) let \(n_1, n_2, n_3\) be the number of occurrences of each outcome. The probability for a fixed outcome \(X_1=x_1, \dots, X_n=x_n\) is \[ \mathbb{P}(x_1,\dots,x_n|p_1,p_2,p_3) = p_1^{n_1}p_2^{n_2}p_3^{n_3} = p_1^{n_1}p_2^{n_2}(1-p_1-p_2)^{n-n_1-n_2} \]

This expression depends only on the two-dimensional statistic \((n_1,n_2)\) and the parameters, so we can set \(h(x)=1\) and let \(g\) in the factorization theorem be this probability. Since the joint probability factors in this way, the statistic \((n_1,n_2)\) is sufficient.
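For comparison with the exponential family form used in the later exercises, the same probability can be rewritten (using \(n_3 = n-n_1-n_2\)) as \[ p_1^{n_1}p_2^{n_2}p_3^{n_3} = \exp\left( n_1\log\frac{p_1}{p_3} + n_2\log\frac{p_2}{p_3} + n\log p_3 \right) \] a two-parameter exponential family with natural parameter \(\left(\log\frac{p_1}{p_3}, \log\frac{p_2}{p_3}\right)\) and statistic \((n_1,n_2)\), which confirms the sufficiency.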

3.7:7

The likelihood is \[ \mathcal{L}(x|p) = \prod_i p_i^{x_i}(1-p_i)^{1-x_i} \]

The logarithm of the likelihood (the log-likelihood) is \[ \log\mathcal{L}(x|p) = \log\prod_i p_i^{x_i}(1-p_i)^{1-x_i} = \sum\left[ x_i\log p_i + (1-x_i)\log(1-p_i) \right] = \\ \sum\left[ \log(1-p_i) + x_i(\log p_i - \log(1-p_i))\right] = \sum\log(1-p_i) + \sum x_i\log\frac{p_i}{1-p_i} \]

By assumption, the log odds satisfy \(\log\frac{p_i}{1-p_i}=\alpha+\beta t_i\), which we insert: \[ \log\mathcal{L}(x|\alpha,\beta) = \sum\log(1-p_i) + \sum x_i(\alpha+\beta t_i) = \sum\log(1-p_i) + \alpha\sum x_i + \beta\sum x_it_i \]

Translated to log-likelihoods, the factorization theorem requires the log-likelihood to be writeable as a sum of two functions: one depending only on the data, the other only on the parameters and the statistic.

To express the remaining \(\sum\log(1-p_i)\) term in \(\alpha\) and \(\beta\), we solve the odds equation for \(p_i\):

\[ \log\frac{p_i}{1-p_i} = \alpha+\beta t_i \\ \frac{p_i}{1-p_i} = \exp(\alpha+\beta t_i) \\ p_i = \exp(\alpha+\beta t_i) - p_i\exp(\alpha+\beta t_i) \\ p_i(1+\exp(\alpha+\beta t_i)) = \exp(\alpha+\beta t_i) \\ p_i = \frac{\exp(\alpha+\beta t_i)}{1+\exp(\alpha+\beta t_i)} \]

It follows that \(1-p_i = \frac{1}{1+\exp(\alpha+\beta t_i)}\), and hence \(\log(1-p_i) = -\log\left(1+\exp(\alpha+\beta t_i)\right)\).

Hence, one sufficient statistic is the two-dimensional statistic \(T(x)=(\sum x_i, \sum x_it_i)^T\).

Now, since the log-likelihood decomposes to (as vectors) \[ \log\mathcal{L}(x|\alpha,\beta) = \begin{pmatrix}\alpha & \beta\end{pmatrix}\cdot T(x) - \sum_i\log\left(1+\exp(\alpha+\beta t_i)\right) \] we can pick \(B(\theta)=\sum_i\log\left(1+\exp(\alpha+\beta t_i)\right)\), \(\eta(\theta)=\begin{pmatrix}\alpha & \beta\end{pmatrix}\) in the exponential family in Definition 3.18.

As long as not all \(t_i\) are equal, the family has full rank, and thus Theorem 3.19 applies to show that \(T\) is complete. Since \(T\) is complete and sufficient, it is minimal sufficient by Bahadur's theorem.

If the \(t_i\) are all equal, with common value \(t\), then the one-dimensional statistic \(T(x)=\sum x_i\) suffices. The same decomposition and rank argument holds, now with \(\eta(\theta)=\alpha+t\beta\), as the explicit form below shows.
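Explicitly, when every \(t_i=t\) the decomposition collapses to \[ \log\mathcal{L}(x|\alpha,\beta) = (\alpha+\beta t)\sum x_i - n\log\left(1+\exp(\alpha+\beta t)\right) \] so the parameters enter only through the combination \(\alpha+\beta t\), matching the one-dimensional statistic \(\sum x_i\).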

3.7:11

The joint likelihood of the samples \(x_1,\dots,x_n\) and \(y_1,\dots,y_m\) is \[ \mathcal{L}(x,y|\mu,\sigma_x^2,\sigma_y^2) = \prod \frac{1}{\sqrt{2\pi\sigma_x^2}} \exp\left[-\frac{(x_i-\mu)^2}{2\sigma_x^2}\right] \cdot \prod \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left[-\frac{(y_i-\mu)^2}{2\sigma_y^2}\right] \]

This gives \[ \frac{1}{\sqrt{2\pi\sigma_x^2}^n}\cdot \frac{1}{\sqrt{2\pi\sigma_y^2}^m}\cdot \exp\left[ -\frac{1}{2\sigma_x^2}\left(\sum x_i^2 - 2\mu\sum x_i + n\mu^2\right) -\frac{1}{2\sigma_y^2}\left(\sum y_i^2 - 2\mu\sum y_i + m\mu^2\right) \right] \]

If we set \(C=\frac{1}{\sqrt{2\pi}^{n+m}}\), then this can be written

\[ C\cdot \exp\left[ \color{blue}{ -\frac{n}{2}\log\sigma_x^2 -\frac{m}{2}\log\sigma_y^2 -\frac{n\mu^2}{2\sigma_x^2} -\frac{m\mu^2}{2\sigma_y^2} } \color{green}{ +\begin{pmatrix} -\frac{1}{2\sigma_x^2} & \frac{\mu}{\sigma_x^2} & -\frac{1}{2\sigma_y^2} & \frac{\mu}{\sigma_y^2} \end{pmatrix} } \color{orange}{ \cdot\begin{pmatrix} \sum x_i^2 \\ \sum x_i \\ \sum y_i^2 \\ \sum y_i \end{pmatrix} } \right] \]

By the factorization theorem, \(T(x,y)=(\sum x_i^2, \sum x_i, \sum y_i^2, \sum y_i)^T\) is a sufficient statistic, with the green vector above playing the role of \(\eta(\theta)\) in Definition 3.18. Note, however, that the parameter \((\mu,\sigma_x^2,\sigma_y^2)\) is three-dimensional while \(T\) is four-dimensional, so this is a curved exponential family rather than a full rank one, and Theorem 3.19 does not apply. Indeed, \(T\) is not complete: \(\bar{x}-\bar{y}\) is a function of \(T\) with expectation \(\mu-\mu=0\) for all parameter values without being almost surely zero. Minimal sufficiency instead follows from the likelihood-ratio characterization of minimal sufficient statistics.
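A sketch of that argument: writing \(\Delta = T(x,y)-T(x',y')\) for two samples of the same sizes \(n\) and \(m\), the \(\mu^2\)-terms cancel and \[ \log\frac{\mathcal{L}(x,y|\mu,\sigma_x^2,\sigma_y^2)}{\mathcal{L}(x',y'|\mu,\sigma_x^2,\sigma_y^2)} = -\frac{\Delta_1}{2\sigma_x^2} + \frac{\mu\Delta_2}{\sigma_x^2} -\frac{\Delta_3}{2\sigma_y^2} + \frac{\mu\Delta_4}{\sigma_y^2} \] This vanishes for all \((\mu,\sigma_x^2,\sigma_y^2)\) exactly when \(\Delta=0\): varying \(\mu\) forces \(\Delta_2=\Delta_4=0\), and varying \(\sigma_x^2\) and \(\sigma_y^2\) then forces \(\Delta_1=\Delta_3=0\). Two samples yield proportional likelihoods exactly when they share the same value of \(T\), so \(T\) is minimal sufficient.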