The Central Limit Theorem

There are two big theorems from probability that are used in statistics:

The law of large numbers: Assume a population with mean $\mu$. Let $x_1$, $x_2$, … be a random sample from this population. Then, as the sample size $n$ grows, the sample mean $\bar{x}$ of the first $n$ values eventually “approaches” the mean $\mu$.

To visualize this (in the 4th project), we let $n$ increase, plotted the sample mean against $n$, and saw that, despite the randomness, the plot eventually came to hug the line $y=\mu$.
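A minimal sketch of that kind of plot follows. It is not the code from the project; the exponential population, the seed, and the names `N` and `running_mean` are assumptions chosen for illustration.

```r
set.seed(1)                        # arbitrary seed, only so the picture is reproducible
mu = 1                             # mean of the population behind rexp (default rate 1)
N = 5000                           # largest sample size shown (an arbitrary choice)

xs = rexp(N)                       # one long random sample
running_mean = cumsum(xs) / (1:N)  # sample mean of the first n values, for n = 1, ..., N

plot(1:N, running_mean, type="l", log="x", xlab="n", ylab="sample mean")
abline(h = mu, col="red")          # the line y = mu
```

The logarithmic horizontal axis only spreads out the small values of $n$, where the sample mean still jumps around the most.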

The central limit theorem: Let the population have mean $\mu$ and finite standard deviation $\sigma$. Then when the sample size $n$ is large, the sampling distribution of $\bar{x}$ is approximately normal with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.

We can visualize this using the following code snippet as a template. In the following, the population is specified through dexp, and random samples come from the function rexp:

M = 1000   # number of simulated sample means
n = 10     # size of each sample

curve(dexp, -1, 4, ylim=1.1 * c(0, dnorm(0, sd=1/sqrt(n))))  # draw population

cols = rainbow(5)     # one color per sample
for (i in 1:5) {      # show a few samples
  xs = rexp(n)
  points(xs, (i-1)/10 + 0*xs, col=cols[i])    # draw a sample of size n
  abline(v = mean(xs), col=cols[i]) # mark the sample mean with a vertical line
}

xbars = replicate(M, {   # M sample means, each from a fresh sample of size n
   xs = rexp(n)
   mean(xs)
})
lines(density(xbars), lwd=5, col="blue")   # estimated sampling distribution of the sample mean

We want to understand this diagram and how it illustrates the central limit theorem.
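One way to read the thick blue curve is to compare it with the normal density that the central limit theorem suggests. The sketch below is an illustration rather than part of the template: it assumes the default exponential population (rate 1, so $\mu=1$ and $\sigma=1$) and redraws only the density of the sample means with that normal curve dashed on top.

```r
set.seed(2)                 # arbitrary seed for reproducibility
M = 1000                    # number of simulated sample means
n = 10                      # size of each sample
mu = 1; sigma = 1           # mean and sd of the rate-1 exponential population

xbars = replicate(M, mean(rexp(n)))

plot(density(xbars), lwd=5, col="blue", main="")
# normal density with mean mu and standard deviation sigma/sqrt(n)
curve(dnorm(x, mean=mu, sd=sigma/sqrt(n)), add=TRUE, lty=2)
```

With these values, dnorm(0, sd=1/sqrt(n)) is the peak height of the dashed curve, which is why the template uses it to set ylim.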

The use of simulation to understand an unknown distribution

In the example above, we have M=1000. This means that a sample mean is computed 1000 times (the xbars), and the resulting values are visualized through a density plot (the last line).

The moral here is that a large sample can inform us about the shape of the underlying population the sample is from.

Let’s see how large:

QUESTION: Take $M=3$ and compare the two graphs produced below (not shown):

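M = 3                 # number of values sampled from the population
curve(dexp, -1, 4)    # theoretical population density
xs = rexp(M)
lines(density(xs))    # density estimate based on the M sampled values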

Does the density plot look like the theoretical distribution?

QUESTION: Repeat with $M=20$.

QUESTION: Repeat with $M=1000$.
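If retyping the snippet becomes tedious, the three cases can be drawn side by side. This is only a convenience sketch; the three-panel layout and the seed are assumptions, not part of the exercise.

```r
set.seed(3)                                    # arbitrary seed for reproducibility
par(mfrow = c(1, 3))                           # three panels side by side
for (M in c(3, 20, 1000)) {
  curve(dexp, -1, 4, main = paste("M =", M))   # theoretical population density
  xs = rexp(M)
  lines(density(xs), col = "blue")             # density estimate from M sampled values
}
par(mfrow = c(1, 1))                           # restore the single-panel layout
```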

The mean of $\bar{x}$ is $\mu$

QUESTION: Refer to the main figure. What part(s) of the figure represent this statement?

QUESTION: Change to n=2. (Then the sampling distribution will not be bell-shaped.) Is it still the case that the two means are equal?
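An eyeballed answer can be checked numerically. The sketch below assumes the default exponential population, whose mean is 1, and compares the average of the simulated sample means with that value.

```r
set.seed(4)                            # arbitrary seed for reproducibility
M = 1000                               # number of simulated sample means
n = 2                                  # size of each sample
mu = 1                                 # mean of the rate-1 exponential population

xbars = replicate(M, mean(rexp(n)))
mean(xbars)                            # center of the simulated sampling distribution
mu                                     # population mean, for comparison
```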

The standard deviation of $\bar{x}$ is $\sigma/\sqrt{n}$

QUESTION: Take n=10 and n=160 and make the main figure for each. Estimate the standard deviation of the sampling distribution by eye (the distance between the two inflection points of a bell-shaped curve is 2 standard deviations). It should be that the standard deviation for n=160 is smaller. Exactly how much smaller is it (as a ratio)?

QUESTION: Estimate $\sigma$ by taking sd(rexp(M)). (Round to the nearest integer.) What is the ratio between this value and your estimate for the standard deviation of $\bar{x}$ when $n=160$?
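After estimating by eye, the standard deviations can also be computed directly from simulated sample means. In this sketch the helper `sd_xbar` is a hypothetical name introduced for illustration, and the population is assumed to be the default exponential.

```r
set.seed(5)                                             # arbitrary seed for reproducibility
M = 1000                                                # number of simulated sample means

# hypothetical helper: standard deviation of M simulated sample means of size n
sd_xbar = function(n) sd(replicate(M, mean(rexp(n))))

s10 = sd_xbar(10)
s160 = sd_xbar(160)
c(s10, s160, s10 / s160)    # the two estimates and their ratio
sqrt(160 / 10)              # ratio suggested by sigma/sqrt(n)
```

Because the values are simulated, the observed ratio only approximates the theoretical one and changes with the seed.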

The shape of the sampling distribution of $\bar{x}$ is bell-shaped

The dlnorm and rlnorm functions can replace dexp and rexp in the main example, so that a different population is used. This population is more skewed than the exponential, so plot the population over $[-1,7]$ with curve(dlnorm, -1, 7, ylim=1.1*c(0, dnorm(0, sd=1/sqrt(n)))).

QUESTION: Make the main figure using n=10. Is the sampling distribution approximately normal? (You can eyeball, or investigate via qqnorm(xbars).)

QUESTION: Make the main figure using n=100. Is the sampling distribution approximately normal? (You can eyeball, or investigate via qqnorm(xbars).)
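The qqnorm check for both sample sizes might look like the sketch below. The side-by-side layout, the seed, and the added qqline reference line are choices made here for illustration.

```r
set.seed(6)                                # arbitrary seed for reproducibility
M = 1000                                   # number of simulated sample means
par(mfrow = c(1, 2))                       # two panels side by side
for (n in c(10, 100)) {
  xbars = replicate(M, mean(rlnorm(n)))    # sample means from the log-normal population
  qqnorm(xbars, main = paste("n =", n))
  qqline(xbars)                            # points close to this line suggest approximate normality
}
par(mfrow = c(1, 1))                       # restore the single-panel layout
```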
