Review for the midterm

The midterm will be October 18th. It will be in-person. I still need to get the room number to you!

The test will cover material in chapters 3, 4, 5, 6, and 7.

What the semester has been about so far.

The semester has been building up to just 3 sections; other efforts were to ensure the probabilistic language of those 3 sections was understood.

Section 7.2

This is the critical section so far. It is where we finally get down to doing statistics, though specialized to the case that our sample, $Y_1, Y_2, \dots, Y_n$, is independent, identically distributed, and drawn from a normal distribution.

The latter allows many computations to be done. The first two are what we are now calling i.i.d.

So, just to understand the three things above, we needed the material of the earlier chapters.


Starting with the $Y$s we then considered $\bar{Y} = (1/n) \sum_{i=1}^n Y_i$.

With that we have our first key results in statistics:

$\bar{Y}_n$ has a normal distribution (with mean $\mu$ and variance $\sigma^2/n$), which can be used to find confidence intervals.

That is, we can find an interval around $\bar{Y}$ that contains $\mu$ with a specified probability; for a given observed value $\bar{y}$, this interval is called a confidence interval.
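The coverage claim can be checked by simulation. Here is a minimal sketch using only the standard library; the particular values of $\mu$, $\sigma$, $n$, and the seed are illustrative choices, not from the text:

```python
import random
import statistics
from statistics import NormalDist

random.seed(1)
mu, sigma, n = 5.0, 2.0, 25            # illustrative population and sample size
z = NormalDist().inv_cdf(0.975)        # two-sided 95% critical value
half = z * sigma / n ** 0.5            # half-width of the interval

covered = 0
trials = 2000
for _ in range(trials):
    ybar = statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    if ybar - half <= mu <= ybar + half:
        covered += 1

print(covered / trials)                # should land close to 0.95
```

Roughly 95% of the simulated intervals $\bar{y} \pm z_{0.975}\,\sigma/\sqrt{n}$ do contain $\mu$, which is exactly what "95% confidence" means.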


Next, rather than $(\sum Y_i)/n$, the next distribution to consider was that of $\sum_{i=1}^n Z_i^2$, where each $Z_i$ is a standard normal. This distribution is chi-squared with $n$ degrees of freedom (a parameter).

The key computation involves two things:


With that distribution under your belt, you can show that $(n-1)S^2/\sigma^2$ has a chi-squared distribution with $n-1$ degrees of freedom.
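A quick simulation makes the claim concrete: a chi-squared with $k$ degrees of freedom has mean $k$ and variance $2k$, so for $n = 10$ the statistic $(n-1)S^2/\sigma^2$ should average about 9 with variance about 18. This is a stdlib-only sketch; the parameter values and seed are illustrative:

```python
import random
import statistics

random.seed(2)
mu, sigma, n = 0.0, 3.0, 10
vals = []
for _ in range(20000):
    ys = [random.gauss(mu, sigma) for _ in range(n)]
    s2 = statistics.variance(ys)       # sample variance S^2 (divides by n-1)
    vals.append((n - 1) * s2 / sigma**2)

# Chi-squared with n-1 = 9 degrees of freedom: mean 9, variance 18.
print(statistics.fmean(vals), statistics.variance(vals))
```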

Key to that is a proof we avoided, but in the simple case of just 2 random variables (p358) comes down to the fact that $U_1 = Y_1+Y_2$ and $U_2 = Y_1 - Y_2$ are independent. (And more generally, for a normal i.i.d. sample $\bar{Y}$ and $S$ are independent.)
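Since $\mathrm{Cov}(Y_1+Y_2, Y_1-Y_2) = \mathrm{Var}(Y_1) - \mathrm{Var}(Y_2) = 0$ for an i.i.d. pair, and zero covariance implies independence for jointly normal variables, a simulation should show essentially no covariance between $U_1$ and $U_2$. A small sketch (seed and sample count are arbitrary):

```python
import random
from statistics import fmean

random.seed(3)
pairs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(50000)]
u1 = [y1 + y2 for y1, y2 in pairs]
u2 = [y1 - y2 for y1, y2 in pairs]

# Sample covariance of U1 = Y1 + Y2 and U2 = Y1 - Y2; should be near 0.
m1, m2 = fmean(u1), fmean(u2)
cov = fmean((a - m1) * (b - m2) for a, b in zip(u1, u2))
print(cov)
```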

To see that they are independent, we needed (6.6) giving us a technique to compute the joint pdf of $U_1$ and $U_2$.


The normalized statistic $\sqrt{n}(\bar{Y}_n - \mu)/\sigma$ can be computed assuming both $\mu$ and $\sigma$ are known. We will see soon, in Chapter 8, that it is more useful to have only one unknown parameter. Substituting $S$ for $\sigma$ leads to $T=\sqrt{n}(\bar{Y} - \mu)/S$. It can be shown (could you do that on a test? it appears on p360) that this statistic is of the form $Z/\sqrt{W/\nu}$, where $Z$, as usual, is a standard normal and $W$ is chi-squared with $\nu$ degrees of freedom. The distribution of $Z/\sqrt{W/\nu}$ is called the $t$-distribution with $\nu$ degrees of freedom. (The chi-squared in $T$ has $n-1$ degrees of freedom, so $T$ has a $t$-distribution with $n-1$ degrees of freedom.)
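You can see the heavier tails of $T$ by simulation: a $t$ with $\nu$ degrees of freedom has mean $0$ and variance $\nu/(\nu - 2)$, so for $n = 6$ (so $\nu = 5$) the variance should be about $5/3$, noticeably bigger than the standard normal's 1. A stdlib-only sketch with arbitrary illustrative parameters:

```python
import random
import statistics

random.seed(4)
mu, sigma, n = 10.0, 2.0, 6
ts = []
for _ in range(40000):
    ys = [random.gauss(mu, sigma) for _ in range(n)]
    ybar = statistics.fmean(ys)
    s = statistics.stdev(ys)                  # sample standard deviation S
    ts.append(n ** 0.5 * (ybar - mu) / s)     # T = sqrt(n)(Ybar - mu)/S

# t with nu = n-1 = 5 degrees of freedom: mean 0, variance nu/(nu-2) = 5/3.
print(statistics.fmean(ts), statistics.variance(ts))
```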


Finally, the $F$ distribution was introduced: the distribution of $(W_1/\nu_1)/(W_2/\nu_2)$ for independent chi-squared variables $W_1$ and $W_2$. This is something you need to know, but we didn't do any computations with it, mostly because they don't simplify as nicely as the $T$.

7.3 The CLT

The central limit theorem (CLT) is the major theorem of statistics. It says that the limit of the c.d.f. of $\sqrt{n}(\bar{Y}_n - \mu)/\sigma$ is the c.d.f. of the standard normal, under the simple assumption that the $Y_i$ are an i.i.d. sample from some distribution with mean $\mu$ and variance $\sigma^2$.

This is exactly true when the population is normal; the big deal is that it applies to any population (the common distribution of each $Y_i$), assuming only that the mean and variance are finite.

From a statistical point of view, the CLT describes, in probabilistic language, how far $\bar{Y}$ is likely to be from $\mu$. This allows discussion of confidence intervals.
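To see the "any population" part in action, here is a stdlib-only sketch using a decidedly non-normal (skewed exponential) population; the rate, sample size, and seed are illustrative choices:

```python
import random
import statistics
from statistics import NormalDist

random.seed(5)
lam, n = 1.0, 40                  # exponential population: mean 1, variance 1
mu, sigma = 1 / lam, 1 / lam

zs = []
for _ in range(20000):
    ybar = statistics.fmean(random.expovariate(lam) for _ in range(n))
    zs.append(n ** 0.5 * (ybar - mu) / sigma)

# For a standard normal, P(Z <= 1) is about 0.8413; the simulated
# proportion should be close despite the skewed population.
prop = sum(z <= 1 for z in zs) / len(zs)
print(prop, NormalDist().cdf(1))
```

Even at the modest sample size $n = 40$, the standardized mean already behaves nearly like a standard normal.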

7.5 The normal approximation to the binomial

The binomial distribution (3.4) is generated by a sum of i.i.d. Bernoulli random variables. Bernoulli random variables are also used to define the geometric distribution (3.5) and the negative binomial distribution (3.6). Binomial random variables assume independence. Related random variables, not assuming independence, are characterized by the hypergeometric distribution (3.7).

Section 7.5 shows one case of a limit of binomial random variables; the Poisson in (3.8) was another. Applying the CLT to the binomial yields the normal approximation to the binomial, a useful tool for calculating probabilities.
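Here is one way to see how good the approximation is, comparing an exact binomial probability against a normal c.d.f. with continuity correction; the values $n = 50$, $p = 0.4$ are an illustrative choice:

```python
from math import comb
from statistics import NormalDist

n, p = 50, 0.4
mu = n * p                           # binomial mean np
sigma = (n * p * (1 - p)) ** 0.5     # binomial standard deviation

# Exact P(Y <= 25) versus the normal approximation with continuity correction.
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(26))
approx = NormalDist(mu, sigma).cdf(25.5)
print(exact, approx)
```

The two numbers agree to about two decimal places, which is typical when $np$ and $n(1-p)$ are both reasonably large.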

Sample problems.

Probably the best source of potential test problems is to review the HW problems, but here are a few for you to mull over if you want.

Discrete

Continuous

Transformations

Order statistics

Let $U=X_{(1)}$ and $V=X_{(n)}$ be the smallest and largest values of an iid sequence of Uniform(0,1) random variables.

In 6.10, we learned their joint density is given by

$$~ f(u,v) = n \cdot (n-1) \cdot (v-u)^{n-2}, 0 \leq u \leq v \leq 1 ~$$

Now let $D = V - U$. The density of $D$ can be found using the techniques of chapter 6. It will be

$$~ n \cdot (n-1) \cdot (1-d) d^{n-2}, 0 \leq d \leq 1 ~$$

What is the mean and variance of $D$? (Think before you integrate.)
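Once you have an answer, a simulation is a quick way to check it. This stdlib-only sketch takes $n = 5$ (an arbitrary choice) and estimates the mean and variance of the range $D$ so you can compare them with your integrals:

```python
import random

random.seed(6)
n = 5
ds = []
for _ in range(30000):
    xs = sorted(random.random() for _ in range(n))
    ds.append(xs[-1] - xs[0])        # range D = V - U of a Uniform(0,1) sample

# Simulated mean and variance of D; compare with your computed answers.
mean_d = sum(ds) / len(ds)
var_d = sum((d - mean_d) ** 2 for d in ds) / (len(ds) - 1)
print(mean_d, var_d)
```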

Sampling

Suppose a researcher performs a survey where they call at random 100 people from a list of 10,000 phone numbers, never re-dialing the same number. Suppose, miraculously, that all 100 people answer and respond. Let $\hat{p}$ be the proportion who said "yes." It is not true that the distribution of $\hat{p}$ is normal. However, specify all the steps one needs to argue that it is approximately normal. (I.e., how to apply the CLT to this question.)

Compute

Let $X_1, \dots, X_{9}$ be an iid sample from a normal distribution with mean $\mu$ and standard deviation $\sigma$. Find the value $b$ for which $P(\bar{X} - \mu < b) = 0.95$.
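If you want to check a symbolic answer numerically, here is one way, taking $\sigma = 1$ for concreteness (that value is an assumption for illustration, not part of the problem). Since $\bar{X} - \mu$ is normal with mean $0$ and standard deviation $\sigma/\sqrt{n}$, $b$ is the 0.95 quantile of that distribution:

```python
from statistics import NormalDist

sigma, n = 1.0, 9
# Xbar - mu is normal with mean 0 and standard deviation sigma / sqrt(n),
# so b solves NormalDist(0, sigma / sqrt(n)).cdf(b) = 0.95.
b = NormalDist(0, sigma / n ** 0.5).inv_cdf(0.95)
print(b)   # sigma/3 times the 0.95 standard normal quantile
```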

Gosset

An internet source has this to say about the work of Gosset (aka Student) that drove the investigation of the $T$ statistic:

Young chemists at Guinness Brewery, then the world's largest, designed many field and lab experiments to determine the best barley, best hops, best temperatures for brewing, etc.

They began to accumulate data and, at once, they ran into difficulties because their measurements varied. The effects they were looking for were not usually clearcut or consistent, as they had expected, and they had no way of judging whether the differences they found were effects of treatment or accident.

Two difficulties were confounded: the variation was high and the observations were few. The young research brewers worked well together; some were very close friends. Each seemed to fit into his own role in brewery affairs. And to them it seemed natural to take their numerical problems to Gosset. He had done some mathematics at Oxford and seemed less scared of mathematics than they were. (Gosset was around 23.)

The term "variation was high" speaks to what parameter? The term "observations were few" speaks to what symbol?