The Central Limit Theorem

Two big theorems from probability are used throughout statistics:

The law of large numbers: Assume a population with mean \(\mu\). Let \(x_1\), \(x_2\), … be a random sample from this population. Then the sample mean of the first \(n\) values, \(\bar{x}\), “approaches” the mean \(\mu\) as \(n\) increases.

To visualize this (in the 4th project), we let \(n\) increase, plotted the sample mean against \(n\), and saw that, despite the randomness, the plot eventually came to hug the line \(y=\mu\).
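A minimal sketch of such a plot, assuming the rate-1 exponential population used later in this project (so \(\mu = 1\)). The seed, the value of N, and the name running_mean are illustrative choices, not taken from the 4th project:

```r
set.seed(1)                                 # illustrative seed, only for reproducibility
N = 5000                                    # largest sample size considered (an assumption)
xs = rexp(N)                                # rate-1 exponential population, so mu = 1
running_mean = cumsum(xs) / seq_along(xs)   # sample mean of the first n values, n = 1..N

plot(running_mean, type="l", xlab="n", ylab="sample mean")
abline(h = 1, col="red")                    # the line y = mu
```

The early part of the curve wanders noticeably; the wandering shrinks as n grows.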

The central limit theorem: Let the population have mean \(\mu\) and finite standard deviation \(\sigma\). Then when \(n\) is large, the sampling distribution of \(\bar{x}\) for random samples of size \(n\) is approximately normal with mean \(\mu\) and standard deviation \(\sigma/\sqrt{n}\).
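The two numerical claims (mean \(\mu\), standard deviation \(\sigma/\sqrt{n}\)) can be checked roughly by simulation. The following is a sketch only, assuming the rate-1 exponential population, for which \(\mu = 1\) and \(\sigma = 1\); the seed and the choice M = 10000 are illustrative:

```r
set.seed(2)                            # illustrative seed, only for reproducibility
M = 10000                              # number of simulated sample means (an assumption)
n = 10                                 # size of each sample
mu = 1; sigma = 1                      # mean and sd of the rate-1 exponential

xbars = replicate(M, mean(rexp(n)))    # M sample means, each from a fresh sample

c(simulated = mean(xbars), theory = mu)               # compare the means
c(simulated = sd(xbars),   theory = sigma / sqrt(n))  # compare the standard deviations
```

Each pair should agree closely but not exactly, since xbars is itself a random sample.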

We can visualize the central limit theorem using the following code snippet as a template. In the following, the population is specified through dexp, and random samples come from the function rexp:

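M = 1000   # number of simulated sample means
n = 10     # size of each sample

curve(dexp, -1, 4, ylim=1.1 * c(0, dnorm(0, sd=1/sqrt(n))))  # draw population density

cols = rainbow(5)     # one color per sample
for (i in 1:5) {      # show a few samples
  xs = rexp(n)
  points(xs, (i-1)/10 + 0*xs, col=cols[i])    # draw a sample of size n
  abline(v = mean(xs), col=cols[i])           # mark the sample mean with a vertical line
}

xbars = replicate(M, {   # M sample means, each from a fresh sample of size n
   xs = rexp(n)
   mean(xs)
})
lines(density(xbars), lwd=5, col="blue")  # estimated sampling distribution of the sample mean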

We want to understand this diagram and how it illustrates the central limit theorem.

The use of simulation to understand an unknown distribution

In the example above, we have M=1000. This means that the mean of a random sample of size n is computed 1000 times (the xbars), and these 1000 values are visualized through a density plot (the last line).

The moral here is that a large sample can inform us about the shape of the underlying population the sample is drawn from.

Let’s see how large:

QUESTION: Take \(M=3\) and compare the two graphs produced below (not shown):

M = 3
curve(dexp, -1, 4)
xs = rexp(M)
lines(density(xs))

Does the density plot look like the theoretical distribution?

QUESTION: Repeat with \(M=20\).

QUESTION: Repeat with \(M=1000\).

The mean of \(\bar{x}\) is \(\mu\)

QUESTION: Refer to the main figure. What part(s) of the figure represent this statement?

QUESTION: Change to n=2. (Then the sampling distribution will not be bell-shaped.) Is it still the case that the two means are equal?

The standard deviation of \(\bar{x}\) is \(\sigma/\sqrt{n}\)

QUESTION: Take n=10 and n=160 and make the main figure for each. Estimate the standard deviation of the sampling distribution by eye (the distance between the two inflection points is about 2 standard deviations). It should be that the standard deviation for n=160 is smaller. Exactly how much smaller is it (as a ratio)?

QUESTION: Estimate \(\sigma\) by taking sd(rexp(M)). (Round to the nearest integer.) What is the ratio between this value and your estimate for the standard deviation of \(\bar{x}\) when \(n=160\)?

The shape of the sampling distribution of \(\bar{x}\) is bell-shaped

The dlnorm and rlnorm functions can replace dexp and rexp in the main example, so that a different (log-normal) population is used. This population is more skewed than the exponential, so plot the population over the wider interval \([-1,7]\) with curve(dlnorm, -1, 7, ylim=1.1*c(0, dnorm(0, sd=1/sqrt(n)))).

QUESTION: Make the main figure using n=10. Is the sampling distribution normal? (You can eyeball, or investigate via qqnorm(xbars).)

QUESTION: Make the main figure using n=100. Is the sampling distribution normal? (You can eyeball, or investigate via qqnorm(xbars).)
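When reading a normal quantile plot, a reference line helps. A minimal sketch, assuming the log-normal population with R's default parameters (meanlog = 0, sdlog = 1); the seed is illustrative. The qqline function draws a line through the first and third quartiles, and points that bend systematically away from it indicate a departure from normality:

```r
set.seed(3)                             # illustrative seed, only for reproducibility
M = 1000                                # number of simulated sample means
n = 10                                  # size of each sample; change to 100 to compare

xbars = replicate(M, mean(rlnorm(n)))   # sample means from the log-normal population

qqnorm(xbars)                           # sample quantiles against normal quantiles
qqline(xbars, col="red")                # reference line through the quartiles
```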