MVJ
12 April, 2018
In parametric statistics, we use statistics calculated from samples to understand parameters that specify the distribution of a population.
The most common distributions we will be interested in are the normal distribution for numeric data, and the binomial distribution for categorical data.
Definition
A parameter is a number that describes some characteristic of a population. Usually we model populations with probability distributions, and the distributions are determined by a collection of parameters.
A statistic is a number that can be computed directly from a sample. We will be using statistics to estimate parameters.
Parameters are usually not available: we usually cannot measure the entire population.
The process of statistical inference uses information from a sample to draw conclusions about a population. Statistical estimation is when the conclusions are proposed values for a specific parameter.
Different samples yield different statistics.
Here are two collections of numbers drawn from the same binomial distribution; the rightmost column is the sample mean of each:
x | 2 | 2 | 0 | 2 | 1 | 1 | 1 | 0 | 1 | 1 | 1.1 |
y | 1 | 1 | 2 | 0 | 1 | 2 | 3 | 0 | 1 | 1 | 1.2 |
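The two sample means can be reproduced directly in R (the language used throughout these notes):

```r
x <- c(2, 2, 0, 2, 1, 1, 1, 0, 1, 1)
y <- c(1, 1, 2, 0, 1, 2, 3, 0, 1, 1)
mean(x)  # 1.1
mean(y)  # 1.2
```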
Since the sample mean yields uncertain individual outcomes, but with a **regular distribution in large numbers of repetitions**, the sample mean is a random phenomenon.
Since each outcome of the sample mean is a number, the sample mean is a random variable.
This holds true for all sample statistics.
Since sample statistics are random values, we can study the distribution of that random variable. This is called the sampling distribution.
One example is the bootstrap from Lab 5; you calculated sampling distributions of means and standard deviations by repeated sampling from a data set.
The Central Limit Theorem allows us to draw conclusions about population parameters from the sampling distribution of specific sample statistics.
In this example, I sample 25 values from a normal distribution with \(\mu=1\) and \(\sigma=0.5\). I repeat this sampling 1000 times.
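This simulation can be sketched in R; `rnorm` draws each sample and `replicate` repeats the sampling (the seed is an arbitrary choice for reproducibility):

```r
set.seed(42)  # arbitrary seed, for reproducibility
# 1000 repetitions of: sample 25 values from N(mu = 1, sigma = 0.5), take the mean
means <- replicate(1000, mean(rnorm(25, mean = 1, sd = 0.5)))
mean(means)  # close to mu = 1
sd(means)    # close to sigma / sqrt(25) = 0.1
```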
The sampling distribution both suggests an estimate of the population parameter and demonstrates the sampling variability inherent in the statistic we are using.
In practice, repeated sampling is often difficult and expensive. Instead, we can study the sampling distribution theoretically.
Bias measures whether the expected value of the statistic is the true value of the parameter.
An unbiased statistic has an expected value of the sampling distribution equal to the true value of the parameter.
Variability measures the spread of the sampling distribution. It is usually determined by sample size, with smaller spreads from larger samples.
Consider variance. The average squared deviation from the sample average gives the formula on the left, which is biased: it systematically underestimates the population variance. We therefore usually use the unbiased formula on the right. \[ \hat\sigma^2 = \frac{1}{N}\sum(x_i-\overline x)^2 \qquad s^2 = \frac{1}{N-1}\sum(x_i-\overline x)^2 \]
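The bias of the \(1/N\) formula shows up in a short simulation (a sketch; the exponential population, which has variance 1, and the sample size are arbitrary choices):

```r
set.seed(7)  # arbitrary seed
N <- 5
reps <- replicate(20000, {
  x <- rexp(N)  # population variance is 1
  c(biased   = mean((x - mean(x))^2),  # divide by N
    unbiased = var(x))                 # R's var() divides by N - 1
})
rowMeans(reps)  # biased ~ 0.8, unbiased ~ 1
```

The biased estimator averages to \((N-1)/N \cdot \sigma^2 = 0.8\), exactly the deficit the \(N-1\) denominator corrects.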
Bias is reduced by using random sampling: by randomizing, over- and under-estimates tend to balance out.
Variability is reduced by using a larger sample.
Note: as long as the population is at least 20x larger than the sample, variability does not depend on population size. Variability depends only on sample size (and on properties of the true distribution).
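A simulation sketch illustrates this (the normal populations and the sizes are arbitrary choices): sampling 100 values from a population of 10,000 and from one of 1,000,000 gives essentially the same spread of sample means.

```r
set.seed(1)  # arbitrary seed
small_pop <- rnorm(10000)    # population 100x the sample size
big_pop   <- rnorm(1000000)  # population 10000x the sample size
sd_small <- sd(replicate(2000, mean(sample(small_pop, 100))))
sd_big   <- sd(replicate(2000, mean(sample(big_pop, 100))))
c(sd_small, sd_big)  # both close to 1 / sqrt(100) = 0.1
```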
No matter what the actual population, we use population distribution to refer to the distribution of the random variable of whatever property we are interested in measuring.
The population need not concretely exist:
Course grades in MTH214 have as their population a hypothetical collection of all students that will take the course in the future.
The sample mean \[ \overline x = \frac{\sum x_i}{N} \] is an unbiased estimate of the population mean \(\mu\).
The Central Limit Theorem tells us that for a population distribution with mean \(\mu\) and standard deviation \(\sigma\), \[ \overline x \sim \mathcal N\left(\mu, \frac{\sigma}{\sqrt{N}}\right) \]
The power of the Central Limit Theorem is that the distribution is approximately normal even if the population is not.
Note: averages vary less than observations.
A task has work times following a distribution with
\[ \mu = 1 \qquad \sigma = 1 \]
Your manager allocates 1.1 hours per task to perform 70 tasks in two weeks: 80 hours of work time. As long as the mean time for these 70 tasks is less than \(80/70\approx 1.143\), you will be able to do it without overtime.
The Central Limit Theorem tells us that \(\overline x\sim\mathcal N(1, 1/\sqrt{70})\approx\mathcal N(1, 0.12)\)
The probability of exceeding the time can be calculated using pnorm:
pnorm(80/70, 1, 1/sqrt(70), lower.tail=FALSE)*100
## [1] 11.59989
There is an 11.6% chance that the allocated time isn’t enough.
For discrete data, we distinguish between two main ways of generating counts:
A binomial setting is when we perform several independent trials of the same process and record the number of times a particular outcome occurs.
A Poisson setting is when we consider the number of successes that occur in a fixed unit of measure. (time, region of space, …)
The difference is that for the binomial case, you specify how many trials you check; for Poisson you specify how long you watch.
To check that the binomial setting applies, we can use a mnemonic device: BINS
Binary: each trial has exactly two outcomes, success and failure.
Independent: the trials are independent of each other.
Number: the number of trials is fixed in advance.
Success: the probability of success is the same for every trial.
Is this a binomial setting? If not, what fails?:
I count whether my students are Freshmen, Sophomores, Juniors or Seniors.
Is this a binomial setting? If not, what fails?:
I check whether the body temperature is higher than 100ºF each hour for a day on flu patients given aspirin.
Is this a binomial setting? If not, what fails?:
I count how many times I win before my money runs out at a casino visit.
Is this a binomial setting? If not, what fails?:
I draw 15 cards from a deck of cards, one after another, and count black cards.
Is this a binomial setting? If not, what fails?:
I grow bacterial cultures in 15 petri dishes and count the number that cover at least half the dish after a week.
Binomial counts follow a binomial distribution. The binomial distribution is determined by the number of trials \(n\) and the probability of success \(p\).
The probability function is \[ \mathbb{P}(m) = {n\choose m}p^m(1-p)^{n-m} = \frac{n!}{m!(n-m)!}p^m(1-p)^{n-m} \]
Example My production process has a failure rate of 5%. I pull out 15 randomly chosen products and count the number \(m\) of broken products.
What is \(\mathbb{P}(m=0)\)?
dbinom(0, 15, 0.05)
## [1] 0.4632912
The binomial distribution on \(n\) trials with probability \(p\) has mean and standard deviation:
\[ \mu = np \qquad \sigma = \sqrt{np(1-p)} \]
(these formulas only work for the binomial distribution)
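These formulas can be checked against the probability function itself; for the production example (\(n=15\), \(p=0.05\)), the weighted sums over all possible counts recover \(np\) and \(\sqrt{np(1-p)}\):

```r
n <- 15; p <- 0.05
m <- 0:n  # all possible counts
mu    <- sum(m * dbinom(m, n, p))                 # mean as a weighted sum
sigma <- sqrt(sum((m - mu)^2 * dbinom(m, n, p)))  # sd as a weighted sum
c(mu, n * p)                       # both 0.75
c(sigma, sqrt(n * p * (1 - p)))    # both ~0.844
```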
From the Central Limit Theorem we can calculate the expected distribution from taking means of several binomial samples.
In this setting we repeat the \(n\)-trial experiment \(n\) times, and then average the counts from each repetition.
The Central Limit Theorem mean is the same as the population mean: \(np\).
The Central Limit Theorem standard deviation is \(\sigma/\sqrt{n} = \sqrt{np(1-p)}/\sqrt{n} = \sqrt{p(1-p)}\)
As \(n\) gets large, the binomial distribution gets more and more similar to a normal distribution.
As a rule of thumb, we require
\[ np \geq 10 \qquad n(1-p)\geq 10 \]
If this is true, then \[ \text{Binomial}(n,p)\approx\mathcal{N}(np, \sqrt{np(1-p)}) \]
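A sketch comparing the exact and approximate probabilities (the choice \(n=100\), \(p=0.5\), which satisfies the rule of thumb, is arbitrary):

```r
n <- 100; p <- 0.5
# P(X <= 55): exactly, and via the normal approximation
pbinom(55, n, p)                          # exact, about 0.86
pnorm(55, n * p, sqrt(n * p * (1 - p)))   # approximate, about 0.84
```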
The sample proportion is the proportion of successes in the sample:
\[ \hat{p} = \frac{\text{number of successes}}{n} = \frac{X}{n} \]
The sample proportion is an unbiased estimator of the binomial probability.
The sample proportion distribution has mean and standard deviation: \[ \mu_{\hat{p}} = p \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}} \]
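A simulation sketch of these facts (the values \(n=50\), \(p=0.3\) are arbitrary; `rbinom` generates the success counts):

```r
set.seed(3)  # arbitrary seed
n <- 50; p <- 0.3
phat <- rbinom(5000, n, p) / n  # 5000 sample proportions
mean(phat)  # close to p = 0.3
sd(phat)    # close to sqrt(p * (1 - p) / n) ~ 0.065
```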
The Poisson setting requires:
Events occur independently of each other.
Events occur one at a time, not in simultaneous batches.
Events occur at a constant average rate per unit of measure.
Is this a Poisson setting? If not, what fails?
I count the number of games I win in an hour at a casino.
Is this a Poisson setting? If not, what fails?
I count the number of ice creams sold in a month.
Is this a Poisson setting? If not, what fails?
I count the number of buses arriving in an hour.
Is this a Poisson setting? If not, what fails?
I count the number of customers at the Doner truck between noon and 1pm.
The Poisson distribution counts events in a Poisson setting. It is determined by the average rate of events per unit \(\lambda\).
\[ \mathbb{P}(m) = \frac{e^{-\lambda}\lambda^m}{m!} \]
The Poisson distribution has
\[ \mu = \lambda \qquad \sigma = \sqrt{\lambda} \]
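In R, `dpois` evaluates this probability function. As a sketch (\(\lambda = 4\) is an arbitrary choice), the mean and standard deviation formulas can be checked by weighted sums over a long enough range of counts:

```r
lambda <- 4
m <- 0:200  # effectively the whole support; the tail beyond is negligible
mu    <- sum(m * dpois(m, lambda))
sigma <- sqrt(sum((m - mu)^2 * dpois(m, lambda)))
c(mu, sigma)      # lambda = 4 and sqrt(lambda) = 2
dpois(2, lambda)  # P(2) = e^-4 * 4^2 / 2! ~ 0.147
```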