Syllabus, Background

Mikael Vejdemo-Johansson

Welcome, Syllabus

Welcome to MTH410

  • I am: Mikael Vejdemo-Johansson
  • Office: 1S-208; Office Hours T Th 11am - noon
  • Lectures: T Th 12.20 - 2.15

Course webpage: https://www.math.csi.cuny.edu/~mvj/MTH410

  • All course details: syllabus, grading scheme, report requirements, schedule, …
  • All additional course content: lecture slides, homework, …
  • Linked from Blackboard

Grading: Semester project

The course will be graded primarily on a written report:

  • critical analysis of statistical methods in a published research paper, or
  • explanation and illustration of a statistical method or concept not covered in the course

Grades are weighted as follows:

  • 60% Written report
  • 20% Presentation
  • 10% Final Exam
  • 10% Homework

Mathematical Statistics

Statistics is…

Statistics is the grammar of science.

Karl Pearson

Statistics is…

Far better an approximate answer to the right question, which is often vague, than the exact answer to the wrong question, which can always be made precise.

John Wilder Tukey

Statistics is…

All models are wrong, but some are useful.

George EP Box

Statistics is…

Prediction is very difficult, especially about the future.

Niels Bohr

Statistics is…

On two occasions I have been asked [by members of Parliament], ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.

Charles Babbage

Statistics is…

…what do you think?

Take 5 minutes to write down what you think statistics is.

Talk in pairs, compare your notes.

Share with the class.

Statistics is…

The complement to probability theory: in probability, a distribution leads to random draws; in statistics, observations lead to a distribution.

A crash course in modern probability theory

The foundation we’re not telling you about

Many statistics textbooks make a big deal out of distinguishing between discrete distributions (characterized by a probability mass function) and continuous distributions (characterized by a cumulative distribution function and a probability density function).

There is a joint abstraction that makes it easier to talk about both settings at the same time: measure theory. Our current book does not use this abstraction.

Probability: Samples, outcomes, events

Definition

At the core of probability lies the experiment. This is any action or process whose outcome is subject to uncertainty and that we want to study.

Definition

The sample space \(\mathcal{S}\) of an experiment is the set of all possible outcomes of the experiment.

Definition

An event is a subset of the sample space. An event is simple if it has size 1, and compound otherwise.

Probability: the probability function

Definition

By a probability, we denote a function from events to the interval \([0,1]\), such that:

  1. For any event \(A\), \(\PP(A)\geq 0\)
  2. \(\PP(\mathcal{S}) = 1\).
  3. For a (possibly infinite) collection of pairwise disjoint events \(A_1, A_2, \dots\), \[ \PP(A_1 \cup A_2 \cup \dots) = \sum_i \PP(A_i) \]

Probability: additional properties

From these axioms, one can also prove:

Properties

  1. \(\PP(\emptyset) = 0\)
  2. \(\PP(A^c) = 1 - \PP(A)\)
  3. \(\PP(A) \leq 1\)
  4. \(\PP(A\cup B) = \PP(A) + \PP(B) - \PP(A\cap B)\)

Probability: the easy case

For finitely many equally likely outcomes, the probability function is easy to define:

\[ \PP(A) = \frac{|A|}{|\mathcal{S}|} \]

Many probability calculations end up using a lot of combinatorics to accurately count these set sizes.
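
For equally likely outcomes this is directly computable. A minimal Python sketch (the two-dice event is my own illustration, not from the course):

    from fractions import Fraction
    from itertools import product

    # Sample space: all ordered outcomes of rolling two fair dice.
    S = list(product(range(1, 7), repeat=2))

    # Event A: the two dice sum to 7.
    A = [s for s in S if sum(s) == 7]

    # P(A) = |A| / |S| when all outcomes are equally likely.
    print(Fraction(len(A), len(S)))  # 1/6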

Conditional Probability

Definition

For two events \(A\), \(B\) with \(\PP(B)>0\), the conditional probability of \(A\) given that \(B\) has occurred is \[ \PP(A|B) = \frac{\PP(A\cap B)}{\PP(B)} \]

Multiplication Rule

\[\PP(A\cap B)=\PP(A|B)\cdot\PP(B)\]
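
Both the definition and the multiplication rule are easy to check by counting in the equally-likely setting. A small Python sketch (the dice events are my own example):

    from fractions import Fraction
    from itertools import product

    S = list(product(range(1, 7), repeat=2))
    A = {s for s in S if sum(s) == 7}   # A: the dice sum to 7
    B = {s for s in S if s[0] == 4}     # B: the first die shows 4

    # P(A | B) = P(A ∩ B) / P(B), computed by counting outcomes.
    p_AB = Fraction(len(A & B), len(S))
    p_B = Fraction(len(B), len(S))
    print(p_AB / p_B)  # 1/6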

Bayes Theorem

The Law of Total Probability

If \(A_1, \dots, A_k\) is a partition of \(\mathcal{S}\) (i.e., pairwise disjoint with union \(\mathcal{S}\)), then for any other event \(B\):

\[ \PP(B) = \sum_{i=1}^k \PP(B|A_i)\cdot\PP(A_i) \]

Bayes Theorem

\(A_i, B\) as above, \(\PP(B)>0\). Then:

\[ \PP(A_j|B) = \frac{\PP(A_j\cap B)}{\PP(B)} = \frac{\PP(B|A_j)\cdot\PP(A_j)}{\sum_{i=1}^k\PP(B|A_i)\cdot\PP(A_i)} \]
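
A classic illustration is a diagnostic test; the numbers in this Python sketch are hypothetical, chosen only to show the mechanics:

    # Partition: A1 = has the condition, A2 = does not.
    p_A1 = 0.01                # prior P(A1); hypothetical
    p_A2 = 1 - p_A1
    p_B_given_A1 = 0.95        # P(B | A1): positive test given condition
    p_B_given_A2 = 0.05        # P(B | A2): false positive rate

    # Law of total probability: P(B) = Σ P(B | A_i) P(A_i)
    p_B = p_B_given_A1 * p_A1 + p_B_given_A2 * p_A2

    # Bayes theorem: P(A1 | B)
    print(p_B_given_A1 * p_A1 / p_B)  # ≈ 0.16

Even with a fairly accurate test, a positive result leaves the posterior probability of the condition low, because the condition itself is rare.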

Independent Events

Independent events are events where conditioning on one event does not change the probability of the other.

Equivalent Definitions

Two events \(A\) and \(B\) are independent if:

  1. \(\PP(A|B)=\PP(A)\) (when \(\PP(B)>0\))
  2. \(\PP(A\cap B) = \PP(A)\cdot\PP(B)\)

Otherwise the events are dependent.

A collection of events is mutually independent if, for every subcollection, the probability of the intersection is the product of the individual probabilities.
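
Independence can be verified by counting. In this Python sketch (my own example), the first die being even turns out to be independent of the sum being 7:

    from fractions import Fraction
    from itertools import product

    S = list(product(range(1, 7), repeat=2))
    A = {s for s in S if s[0] % 2 == 0}   # first die is even
    B = {s for s in S if sum(s) == 7}     # dice sum to 7

    P = lambda E: Fraction(len(E), len(S))
    print(P(A & B) == P(A) * P(B))  # True: A and B are independent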

Random Variables

Definition

A random variable for a given sample space \(\mathcal{S}\) is a function \(\mathcal{S}\to\mathbb{R}\).

A random variable with values in \(T\) is a function \(\mathcal{S}\to T\).

A random variable that takes only values 0 and 1 is a Bernoulli random variable.

Random variables can be discrete or continuous. They are discrete if they take on at most countably many values, and continuous if their possible values form a union of intervals in the reals and no single value has non-zero probability.
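
To make the "random variable is a function" view concrete, a small Python sketch (the coin-flip space is my own example):

    from itertools import product

    # Sample space: ordered outcomes of three coin flips.
    S = list(product("HT", repeat=3))

    # A random variable is a function S -> R.
    X = lambda s: s.count("H")           # X = number of heads
    Y = lambda s: 1 if "H" in s else 0   # Y: Bernoulli (at least one head)

    print(X(("H", "T", "H")), Y(("T", "T", "T")))  # 2 0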

Discrete Random Variables

A discrete random variable is characterized by its probability distribution or probability mass function (pmf):

Definition

The probability mass function (pmf) of a discrete random variable \(X\) is the function \(p(x) = \PP(X=x) = \PP(\{s\in\mathcal{S} : X(s) = x\})\).

The cumulative distribution function (cdf) of a discrete random variable \(X\) is the function \(F(x) = \PP(X\leq x) = \sum_{y\leq x} p(y)\).
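
Continuing the three-coin example from the sketch above (my own illustration), the pmf and cdf of the number of heads can be computed directly from the sample space:

    from collections import Counter
    from fractions import Fraction
    from itertools import product

    S = list(product("HT", repeat=3))
    counts = Counter(s.count("H") for s in S)

    # pmf: p(x) = P(X = x); cdf: F(x) = Σ_{y <= x} p(y)
    p = {x: Fraction(n, len(S)) for x, n in counts.items()}
    F = {x: sum(p[y] for y in p if y <= x) for x in p}
    print(p[2], F[2])  # 3/8 7/8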

Continuous Random Variables

A continuous random variable is characterized primarily by its cumulative distribution function (cdf), but we usually work with its probability distribution or probability density function (pdf):

Definition

The cumulative distribution function (cdf) of a continuous random variable \(X\) is the function \(F(x) = \PP(X\leq x)\).

The probability density function (pdf) of a continuous random variable \(X\) is the derivative of its cdf: \(f(x) = F'(x)\).

Notice the similarity in the definition of the cdf between the two cases.
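
The relationship \(f = F'\) is easy to check numerically; a quick sketch using scipy (assuming it is available), differentiating the standard normal cdf with a central difference:

    from scipy.stats import norm

    # f(x) ≈ (F(x+h) - F(x-h)) / (2h) for small h.
    x, h = 1.0, 1e-5
    deriv = (norm.cdf(x + h) - norm.cdf(x - h)) / (2 * h)
    print(deriv, norm.pdf(x))  # both ≈ 0.2420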

Interlude on measures and integration

Under the hood, the discrete and continuous cases both define a probability measure (a measure with total mass 1) on \(\mathbb{R}\): the measure is concentrated on discrete values for a discrete variable, and spread out continuously for a continuous variable.

The pdf is formally defined as a Radon-Nikodym derivative of this probability measure with respect to the usual Lebesgue measure. The pmf is just the measure itself evaluated at single points.

With this perspective, every time we are summing things over discrete random variables and integrating over continuous random variables, we are really taking integrals with respect to the underlying measure.

Expected Values

Definition

The expected value or mean value of a random variable is \[ \EE[X] = \int_\mathbb{R} x d\PP \]

For discrete random variables, this is \(\EE[X] = \sum_x x\cdot p(x)\).

For continuous random variables, \(\EE[X] = \int_\mathbb{R} x\cdot f(x) dx\).

Note that expectation is a linear operator: \(\EE[aX+bY+c] = a\EE[X]+b\EE[Y]+c\).
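
As a concrete check, here is a Python sketch (the dice example is my own) computing \(\EE[X]\) by the discrete formula and verifying linearity in a simple case:

    from fractions import Fraction
    from itertools import product

    # X = sum of two fair dice; E[X] computed outcome by outcome.
    S = list(product(range(1, 7), repeat=2))
    EX = sum(Fraction(sum(s), len(S)) for s in S)
    print(EX)  # 7

    # Linearity: E[aX + c] = a E[X] + c.
    a, c = 3, 2
    E_lin = sum(Fraction(a * sum(s) + c, len(S)) for s in S)
    print(E_lin == a * EX + c)  # True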

Variance and Standard Deviation

Definition

The variance of a random variable is \(\VV[X] = \sigma^2_X = \EE[(X-\EE[X])^2]\).

The standard deviation of a random variable is \(\sigma_X = \sqrt{\sigma^2_X}\).

Proposition

\(\VV[X] = \EE[X^2] - (\EE[X])^2\)

Note that variance is not linear: for independent \(X\) and \(Y\), \(\VV[aX+bY+c] = a^2\VV[X]+b^2\VV[Y]\)
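
The shortcut formula in the proposition is easy to verify on a small example; a Python sketch for a single fair die (my own illustration):

    from fractions import Fraction

    # One fair die: check V[X] = E[X^2] - (E[X])^2.
    vals, p = range(1, 7), Fraction(1, 6)
    EX = sum(x * p for x in vals)
    EX2 = sum(x**2 * p for x in vals)
    var_def = sum((x - EX) ** 2 * p for x in vals)  # E[(X - E[X])^2]
    print(var_def == EX2 - EX**2, var_def)  # True 35/12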

Generalizing variances and means: moments

Definition

Write \(\mu_X\) for the mean and \(\sigma_X\) for the standard deviation of \(X\).

The \(k\)th moment of \(X\) is \(\EE[X^k]\).

The \(k\)th central moment of \(X\) is \(\EE[(X-\mu_X)^k]\).

The \(k\)th standard moment of \(X\) is \(\EE\left[\left(\frac{X-\mu_X}{\sigma_X}\right)^k\right]\)

The mean is the first moment. The variance is the second central moment. The third standard moment is known as skewness and measures departure from symmetry around the mean.
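
Skewness is straightforward to estimate by simulation. A numpy sketch (assuming numpy; the exponential distribution, whose skewness is 2, is my choice of example):

    import numpy as np

    # Estimate the first three standard moments of an exponential distribution.
    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1.0, size=1_000_000)
    z = (x - x.mean()) / x.std()   # standardize
    print(z.mean(), (z**2).mean(), (z**3).mean())  # ≈ 0, 1, 2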

Moment Generating Function

The moments assemble into the moment generating function:

Definition

The moment generating function (mgf) of \(X\) is \(\EE[e^{tX}]\). The coefficients of the powers of \(t\) are (up to a factorial factor) the moments of \(X\), making the mgf an exponential generating function for the moments.

Theorem

If two distributions have moment generating functions that exist and agree on a neighborhood of \(0\), then the distributions are equal.
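
As an illustration, the standard normal has mgf \(e^{t^2/2}\); a sympy sketch (assuming sympy) expanding it and recovering the moments from the coefficients:

    import sympy as sp

    # Expand exp(t^2/2); the coefficient of t^k times k! is the k-th moment.
    t = sp.symbols("t")
    series = sp.series(sp.exp(t**2 / 2), t, 0, 7).removeO()
    print([series.coeff(t, k) * sp.factorial(k) for k in range(7)])
    # [1, 0, 1, 0, 3, 0, 15]: the moments of the standard normal

The odd moments vanish by symmetry, and the even moments \(1, 3, 15, \dots\) are the double factorials \((k-1)!!\).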

Some Discrete Distributions

| Name | Parameters | Values | PMF | \(\EE\) | \(\VV\) |
|------|------------|--------|-----|---------|---------|
| Bernoulli | \(p\) | \(\{0,1\}\) | \(p^x(1-p)^{1-x}\) | \(p\) | \(p(1-p)\) |
| Binomial | \(p, n\) | \(\{0,1,\dots,n\}\) | \({n\choose x}p^x(1-p)^{n-x}\) | \(np\) | \(np(1-p)\) |
| Poisson | \(\lambda\) | \(\mathbb{N}\) | \(\frac{e^{-\lambda}\lambda^x}{x!}\) | \(\lambda\) | \(\lambda\) |

  • Bernoulli: Single trial, two outcomes, fixed probability of success.
  • Binomial: Fixed number of independent trials, two outcomes each, constant probability of success.
  • Poisson: Observe events during “time” intervals; the time until the next event is independent of previous events, and the expected number of events is proportional to the interval length.
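
The tabulated means and variances can be cross-checked against scipy (assuming scipy.stats; the parameter values are arbitrary):

    from scipy.stats import bernoulli, binom, poisson

    # Each .stats call returns (mean, variance).
    print(bernoulli.stats(p=0.3))    # p, p(1-p)
    print(binom.stats(n=10, p=0.3))  # np, np(1-p)
    print(poisson.stats(mu=4.0))     # λ, λ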

Some Continuous Distributions

| Name | Parameters | Values | PDF | \(\EE\) | \(\VV\) |
|------|------------|--------|-----|---------|---------|
| Normal | \(\mu, \sigma^2\) | \(\mathbb{R}\) | \(\frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x-\mu)^2/(2\sigma^2)}\) | \(\mu\) | \(\sigma^2\) |
| Exponential | \(\lambda\) | \(\mathbb{R}_{\geq 0}\) | \(\lambda e^{-\lambda x}\) | \(1/\lambda\) | \(1/\lambda^2\) |
| Chi-squared | \(\nu\) | \(\mathbb{R}_{\geq 0}\) | \(\frac{1}{2^{\nu/2}\Gamma(\nu/2)}x^{(\nu/2)-1}e^{-x/2}\) | \(\nu\) | \(2\nu\) |
| Student’s T | \(\nu\) | \(\mathbb{R}\) | …complicated | \(0\) | \(\nu/(\nu-2)\) for \(\nu>2\) |
| F | \(\nu_1,\nu_2\) | \(\mathbb{R}_{>0}\) | …complicated | \(\nu_2/(\nu_2-2)\) | \(\frac{2\nu_2^2(\nu_1+\nu_2-2)}{\nu_1(\nu_2-2)^2(\nu_2-4)}\) |
| Gamma | \(\alpha, \beta\) | \(\mathbb{R}_{>0}\) | \(\frac{1}{\beta^\alpha\Gamma(\alpha)}x^{\alpha-1}e^{-x/\beta}\) | \(\alpha\beta\) | \(\alpha\beta^2\) |

  • Normal - the limiting distribution of the sample mean, by the central limit theorem (or more generally of sums of iid (independent, identically distributed) random variables).
  • Exponential - the distribution of waiting time between Poisson events.
  • Chi-squared - the distribution of sums of squares of iid standard normal random variables.
  • Student’s T - the distribution of a sample mean standardized using the sample standard deviation (exact for normal samples).
  • F - the distribution of a ratio of independent chi-squared variables, each divided by its degrees of freedom.
  • Gamma - joint abstraction of several other continuous distributions.

\(\Gamma(\alpha)=\int_0^\infty x^{\alpha-1}e^{-x}dx\) is a continuous version of the factorial: \(\Gamma(n)=(n-1)!\) for positive integers \(n\).
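
As with the discrete table, the means and variances above can be cross-checked with scipy (assuming scipy.stats; note that scipy parameterizes the exponential by scale \(= 1/\lambda\) and the gamma by shape \(a = \alpha\), scale \(= \beta\)):

    from scipy.stats import norm, expon, chi2, t, f, gamma

    # Each .stats call returns (mean, variance).
    print(norm.stats(loc=1, scale=2))  # 1, 4
    print(expon.stats(scale=1 / 2))    # 1/2, 1/4 for λ = 2
    print(chi2.stats(df=5))            # 5, 10
    print(t.stats(df=5))               # 0, 5/3 = ν/(ν-2)
    print(f.stats(dfn=4, dfd=8))       # mean 8/6 = ν₂/(ν₂-2)
    print(gamma.stats(a=2, scale=3))   # 6 = αβ, 18 = αβ²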