The Law of Large Numbers

The book discusses the probability of an event in two ways: as a theoretical value and as a long-run proportion.

R can simulate data for specific probabilities, so we can check whether the long-run proportion converges to the theoretical value.

Here is a prototypical example, where we simulate a fair coin toss. We use the fact that a uniform random number will be less than 1/2 with probability 1/2:

runif(1) < 1/2
## [1] TRUE

This is TRUE or FALSE with equal probability. We can make a numeric random variable by wrapping the output in as.numeric:

as.numeric(runif(1) < 1/2)
## [1] 1

To simulate 10 such coin tosses we only have to modify the number of random numbers we ask for:

x = as.numeric(runif(10) < 1/2)
x
##  [1] 1 1 1 1 1 0 1 1 0 1

We stored these as x. We would like to compute a running average, that is, add the first \(2\) values and divide by \(2\), then add the first \(3\) and divide by \(3\), then add the first \(4\) and divide by \(4\), …

We can do this with a cumulative sum, as follows:

y = cumsum(x) / 1:length(x)

The divisor is a short way of generating 1,2,3,4,5,6,7,8,9,10 that will generalize to larger simulations.
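To see that the cumulative-sum trick really computes a running average, here is a quick check on a small fixed vector. (The vector and the seq_along spelling are additions here; seq_along(x) is an equivalent, slightly safer way to write 1:length(x).)

```r
x = c(1, 0, 1, 1, 0)             # a small fixed sample
y = cumsum(x) / seq_along(x)     # running averages: 1/1, 1/2, 2/3, 3/4, 3/5
y
## [1] 1.0000000 0.5000000 0.6666667 0.7500000 0.6000000
sum(x[1:3]) / 3                  # agrees with the third entry of y
```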

We can then plot to visualize:

plot(y, type="l") # draw a line; will vary for different simulations
abline(h = 1/2)

Here we repeat with 10,000 steps, combining all steps at once and modifying \(p\):

n = 10000

p = 2/3
x = as.numeric(runif(n)  < p)

y = cumsum(x) / 1:length(x)
plot(y, type="l")
abline(h=p)

What do you see? The running averages should converge to the probability \(p\).
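To quantify this, we can compare the final running average to \(p\) for a couple of sample sizes. (The set.seed call is an addition here, used only to make the run reproducible; it is not needed for the idea.)

```r
set.seed(123)               # reproducibility only
p = 2/3
for (n in c(100, 10000)) {
  x = as.numeric(runif(n) < p)
  cat("n =", n, " final average =", mean(x), "\n")
}
```

The gap between the final average and \(p\) should tend to shrink as \(n\) grows.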

This can be repeated for other “events”. For example, the rules of thumb for standard normal random variables say that the probability that \(Z > 1\) should be \((1-0.68)/2 = 0.16\). We can check that this is the long-run proportion, using the simulation rnorm(n) > 1 in place of runif(n) < p above:

n = 10000

x = as.numeric(rnorm(n) > 1)

y = cumsum(x) / 1:length(x)
plot(y, type="l")
abline(h=0.16)
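The value \(0.16\) is a rule-of-thumb rounding; the exact tail probability comes from pnorm:

```r
1 - pnorm(1)                    # P(Z > 1)
## [1] 0.1586553
pnorm(1, lower.tail = FALSE)    # same value, without the subtraction
```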

Run a simulation to look at the long-term proportion of heads in coin tossing when the probability of heads is \(p=1/4\).
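One possible simulation for this exercise, reusing the pattern above with \(p = 1/4\):

```r
n = 10000
p = 1/4
x = as.numeric(runif(n) < p)
y = cumsum(x) / 1:length(x)
plot(y, type = "l")
abline(h = p)
```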

Run a simulation to check the long-term proportion of normal numbers less than \(-2\) is roughly \((1-0.95)/2\). (More accurately it is pnorm(-2).)
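A sketch for this exercise; the event is now rnorm(n) < -2, and the reference line can be drawn at the exact value pnorm(-2):

```r
n = 10000
x = as.numeric(rnorm(n) < -2)
y = cumsum(x) / 1:length(x)
plot(y, type = "l")
abline(h = pnorm(-2))    # exact value, roughly (1 - 0.95)/2
```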

The command rt(n, df=3) > 1 will create n events checking whether a \(t\)-distributed value with 3 “degrees of freedom” is more than \(1\). Confirm through a long-run proportion simulation that this proportion is not the same as for the normal distribution (which is technically 0.1586553).
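A sketch of this simulation; the exact \(t\) tail probability for comparison is 1 - pt(1, df = 3):

```r
n = 10000
x = as.numeric(rt(n, df = 3) > 1)
y = cumsum(x) / 1:length(x)
plot(y, type = "l")
abline(h = 1 - pt(1, df = 3))     # t tail probability, about 0.196
abline(h = 0.1586553, lty = 2)    # normal tail probability, for contrast
```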

More on the Law of Large Numbers

At the end of our simulations above, the last number computed is the average of n terms. Call this \(\bar{x}_n\). If we focus our attention on just that value, we can see something interesting. Before formalizing, let’s build some intuition:

  • The value \(\bar{x}_n\) is random but not arbitrary – we expect it to be around \(p\), the theoretical probability of the event.

  • The randomness should be less variable if we have large \(n\).

We will also see, perhaps surprisingly, that the randomness has a particular shape.

How can we quickly simulate just that last value? We can do so with replicate (inefficiently, but that hardly matters here):

p = 1/2
M = 10
n = 1000
y = replicate(M, {
  x = runif(n) < p
  mean(x)
})
hist(y)
abline(v = p, lwd=5)   # lwd=5 draws a thicker line at p

The code above simulates \(\bar{x}_{1000}\) \(10\) times (that is, \(\bar{x}_n\) a total of \(M\) times). Increasing \(M\) to 10,000 and varying \(n\) shows some of the expected behavior.
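For instance, with \(M\) increased to 10,000 (same pattern; note that mean applied directly to the logical vector also computes the proportion, so the as.numeric wrapper can be dropped):

```r
p = 1/2
M = 10000
n = 1000
y = replicate(M, mean(runif(n) < p))
hist(y)
abline(v = p, lwd = 5)
```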

Let M=10000 and n=10. Make the same graphic as above. Is the vertical line at \(p\) in the “center” of the graphic? What is the “spread?” What is the shape of the distribution?

Let p=1/8 and repeat the above question, again with M=10000 and n=10.

Now, again with p=1/2, but this time with n=40, create the graphic and answer the same 3 questions.

Let p=1/8 again, but set n=100. Repeat the above 3 questions.
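One way to run the four cases above is to wrap the pattern in a small helper function (sim_xbar is a name introduced here, not from the text):

```r
sim_xbar = function(p, n, M = 10000) {
  y = replicate(M, mean(runif(n) < p))
  hist(y, main = sprintf("p = %s, n = %d", format(p), n))
  abline(v = p, lwd = 5)        # thick line at the theoretical probability
  invisible(y)
}
sim_xbar(1/2, 10)   # then p = 1/8, n = 10;  p = 1/2, n = 40;  p = 1/8, n = 100
```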

Based on the 4 simulations, does it appear that the mean value of the random values \(\bar{x}_n\) is \(p\)?

Based on the 4 simulations, does it appear that the shape of the distribution describing the randomness of \(\bar{x}_n\) is always normal?

Based on the 4 simulations, does it appear that the shape of the distribution describing the randomness of \(\bar{x}_n\) is not always normal, but does appear to be normal if \(n\) is large enough?

Replace x = runif(n) < p with rnorm(n) > 1 and repeat the simulation of \(\bar{x}_n\) for \(n=25\). Does the shape of the distribution of random values also appear to be bell-shaped?
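A sketch of this last simulation; the vertical reference line can be placed at the true probability, 1 - pnorm(1):

```r
M = 10000
n = 25
y = replicate(M, mean(rnorm(n) > 1))
hist(y)
abline(v = 1 - pnorm(1), lwd = 5)   # true probability, about 0.159
```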