\[ \def\RR{{\mathbb{R}}} \def\PP{{\mathbb{P}}} \def\EE{{\mathbb{E}}} \def\VV{{\mathbb{V}}} \]
Course webpage: https://www.math.csi.cuny.edu/~mvj/MTH410
The course will be graded primarily on a written report:
Grades are weighted as follows:
Statistics is the grammar of science.
Karl Pearson
Far better an approximate answer to the right question, which is often vague, than the exact answer to the wrong question, which can always be made precise.
John Wilder Tukey
All models are wrong, but some are useful.
George EP Box
Prediction is very difficult, especially about the future.
Niels Bohr
On two occasions I have been asked [by members of Parliament], ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
Charles Babbage
…what do you think?
Take 5 minutes to write down what you think statistics is.
Talk in pairs, compare your notes.
Share with the class.
The complement to probability theory: in probability, a distribution leads to random draws; in statistics, observations lead to a distribution.
Many statistics textbooks make a big deal out of distinguishing between discrete distributions (characterized by a probability mass function) and continuous distributions (characterized by a cumulative distribution function and a probability density function).
There is a joint abstraction that makes it easier to talk about both settings at the same time: measure theory. Our current book does not use this abstraction.
Definition
At the core of probability lies the experiment. This is any action or process, possibly with uncertainty, that we want to study.
Definition
The sample space \(\mathcal{S}\) of an experiment is the set of all possible outcomes of the experiment.
Definition
An event is a subset of the sample space. An event is simple if it has size 1, and compound otherwise.
Definition
By a probability, we denote a function \(\PP\) from events to the interval \([0,1]\), such that:

- \(\PP(\mathcal{S}) = 1\), and
- for any countable collection \(A_1, A_2, \dots\) of pairwise disjoint events, \(\PP\left(\bigcup_i A_i\right) = \sum_i \PP(A_i)\).
From these axioms, one can also prove:
Properties

- \(\PP(\emptyset) = 0\)
- \(\PP(A') = 1 - \PP(A)\)
- if \(A \subseteq B\), then \(\PP(A) \leq \PP(B)\)
- \(\PP(A\cup B) = \PP(A) + \PP(B) - \PP(A\cap B)\)
For finitely many equally likely outcomes, the probability function is easy to define:
\[ \PP(A) = \frac{|A|}{|\mathcal{S}|} \]
Many probability calculations end up using a lot of combinatorics to measure these set sizes accurately.
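As a quick illustration (a toy example of my own, using only the Python standard library), here is the counting formula applied to two fair dice:

```python
from itertools import product

# Sample space: all 36 ordered outcomes of rolling two fair dice.
S = list(product(range(1, 7), repeat=2))

# Event A: the two dice sum to 7.
A = [s for s in S if sum(s) == 7]

# P(A) = |A| / |S| when all outcomes are equally likely.
print(len(A) / len(S))  # 6/36 ≈ 0.1667
```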
Definition
For two events \(A\), \(B\) with \(\PP(B)>0\), the conditional probability of \(A\) given that \(B\) has occurred is \[ \PP(A|B) = \frac{\PP(A\cap B)}{\PP(B)} \]
Multiplication Rule
\[\PP(A\cap B)=\PP(A|B)\cdot\PP(B)\]
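A small sketch of the definition, again with two dice; the particular events are illustrative choices, not from the text. With equally likely outcomes the \(|\mathcal{S}|\) denominators cancel, so the conditional probability is a ratio of counts:

```python
from itertools import product
from fractions import Fraction

S = list(product(range(1, 7), repeat=2))   # two fair dice
B = [s for s in S if sum(s) >= 10]         # B: the sum is at least 10
AB = [s for s in B if s[0] == 6]           # A ∩ B: ... and the first die shows 6

# P(A|B) = P(A ∩ B) / P(B) = |A ∩ B| / |B| here.
print(Fraction(len(AB), len(B)))  # 3/6 = 1/2
```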
The Law of Total Probability
If \(A_1, \dots, A_k\) is a partition of \(\mathcal{S}\) (i.e., pairwise disjoint with union \(\mathcal{S}\)), then for any other event \(B\):
\[ \PP(B) = \sum_{i=1}^k \PP(B|A_i)\cdot\PP(A_i) \]
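A minimal worked example; the machine shares and defect rates below are made up purely for illustration:

```python
# Hypothetical factory: machines A_1, A_2, A_3 produce 50%, 30%, 20%
# of all items (a partition of S), with defect rates 1%, 2%, 5%.
p_machine = [0.5, 0.3, 0.2]          # P(A_i)
p_defect_given = [0.01, 0.02, 0.05]  # P(B | A_i), B = "item is defective"

# Law of total probability: P(B) = Σ_i P(B|A_i) · P(A_i)
p_defect = sum(pb * pa for pb, pa in zip(p_defect_given, p_machine))
print(p_defect)  # 0.005 + 0.006 + 0.010 = 0.021
```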
Bayes’ Theorem
\(A_i, B\) as above, \(\PP(B)>0\). Then:
\[ \PP(A_j|B) = \frac{\PP(A_j\cap B)}{\PP(B)} = \frac{\PP(B|A_j)\cdot\PP(A_j)}{\sum_{i=1}^k\PP(B|A_i)\cdot\PP(A_i)} \]
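Continuing the hypothetical factory example: given that an item is defective, Bayes’ theorem tells us which machine most likely produced it:

```python
p_machine = [0.5, 0.3, 0.2]          # P(A_i), as before
p_defect_given = [0.01, 0.02, 0.05]  # P(B | A_i)

# Denominator: P(B) by the law of total probability.
p_defect = sum(pb * pa for pb, pa in zip(p_defect_given, p_machine))

# Bayes: P(A_j | B) = P(B|A_j) · P(A_j) / P(B)
posterior = [pb * pa / p_defect for pb, pa in zip(p_defect_given, p_machine)]
print(posterior)  # ≈ [0.238, 0.286, 0.476]: machine 3 is the likeliest culprit
```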
Independent events are events where conditioning on one event does not change the probability.
Equivalent Definitions
Two events \(A\) and \(B\) are independent if:

- \(\PP(A\cap B) = \PP(A)\cdot\PP(B)\), or equivalently,
- \(\PP(A|B) = \PP(A)\) (whenever \(\PP(B) > 0\)).
Otherwise the events are dependent.
A collection of events is mutually independent if, for every finite subcollection, the probability of the intersection equals the product of the individual probabilities; checking only the full intersection is not enough.
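A quick check of the product criterion, again on two dice (the events are illustrative); exact arithmetic with Fraction avoids floating-point noise:

```python
from itertools import product
from fractions import Fraction

S = list(product(range(1, 7), repeat=2))
A = {s for s in S if s[0] % 2 == 0}   # A: first die is even
B = {s for s in S if s[1] == 3}       # B: second die shows 3

pA = Fraction(len(A), len(S))
pB = Fraction(len(B), len(S))
pAB = Fraction(len(A & B), len(S))

print(pAB == pA * pB)  # True: the two dice do not influence each other
```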
Definition
A random variable for a given sample space \(\mathcal{S}\) is a function \(\mathcal{S}\to\mathbb{R}\).
A random variable with values in \(T\) is a function \(\mathcal{S}\to T\).
A random variable that takes only values 0 and 1 is a Bernoulli random variable.
Random variables can be discrete or continuous. A random variable is discrete if it takes on at most countably many values, and continuous if its set of possible values is a union of intervals in the reals and no single value has non-zero probability.
A discrete random variable is characterized by its probability distribution or probability mass function (pmf):
Definition
The probability mass function (pmf) of a discrete random variable \(X\) is the function \(p(x) = \PP(X=x) = \PP(\{s\in\mathcal{S} : X(s) = x\})\).
The cumulative distribution function (cdf) of a discrete random variable \(X\) is the function \(F(x) = \PP(X\leq x) = \sum_{y\leq x} p(y)\).
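A short sketch of the pmf and cdf of a concrete discrete variable; this assumes scipy is available, and the Binomial(10, 0.3) choice is arbitrary:

```python
from scipy.stats import binom

X = binom(10, 0.3)  # a Binomial(n=10, p=0.3) random variable

print(X.pmf(3))                         # p(3) = P(X = 3)
print(X.cdf(3))                         # F(3) = P(X <= 3)
print(sum(X.pmf(k) for k in range(4)))  # Σ_{y<=3} p(y) matches the cdf
```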
A continuous random variable is characterized primarily by its cumulative distribution function (cdf), but we usually work with its probability distribution or probability density function (pdf):
Definition
The cumulative distribution function (cdf) of a continuous random variable \(X\) is the function \(F(x) = \PP(X\leq x)\).
The probability density function (pdf) of a continuous random variable \(X\) is the derivative of its cdf: \(f(x) = F'(x)\).
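The relation \(f = F'\) can be checked numerically. A sketch assuming scipy, using a standard normal and a central finite difference:

```python
from scipy.stats import norm

X = norm(loc=0, scale=1)  # standard normal

# Compare the pdf with a finite-difference derivative of the cdf.
x, h = 1.0, 1e-6
print(X.pdf(x))                                # f(1)
print((X.cdf(x + h) - X.cdf(x - h)) / (2 * h)) # ≈ F'(1), nearly identical
```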
Notice the similarity in the definition of the CDF between the two cases.
Under the hood, the discrete and continuous cases both define a probability measure (a measure with total mass 1) on \(\mathbb{R}\); the measure is concentrated on discrete values for a discrete variable, and spread out continuously for a continuous variable.
The pdf is formally defined as a Radon-Nikodym derivative of this probability measure with respect to the usual Lebesgue measure. The pmf is just the measure itself evaluated at single points.
With this perspective, every time we are summing things over discrete random variables and integrating over continuous random variables, we are really taking integrals with respect to the underlying measure.
Definition
The expected value or mean value of a random variable is \[ \EE[X] = \int_\mathbb{R} x \, d\PP \]
For discrete random variables, this is \(\EE[X] = \sum_x x\cdot p(x)\).
For continuous random variables, \(\EE[X] = \int_\mathbb{R} x\cdot f(x)\,dx\).
Note that expectation is a linear operator: \(\EE[aX+bY+c] = a\EE[X]+b\EE[Y]+c\).
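Both formulas in one sketch, assuming numpy and scipy; the fair die and the Exponential(λ=2) are my own example choices:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import expon

# Discrete: E[X] = Σ_x x · p(x), here for a fair six-sided die.
xs = np.arange(1, 7)
print(np.sum(xs * (1 / 6)))  # 3.5

# Continuous: E[X] = ∫ x · f(x) dx, here for Exponential(λ=2).
lam = 2.0
mean, _ = quad(lambda x: x * lam * np.exp(-lam * x), 0, np.inf)
print(mean, expon(scale=1 / lam).mean())  # both 0.5 = 1/λ
```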
Definition
The variance of a random variable is \(\VV[X] = \sigma^2_X = \EE[(X-\EE[X])^2]\).
The standard deviation of a random variable is \(\sigma_X = \sqrt{\sigma^2_X}\).
Proposition
\(\VV[X] = \EE[X^2] - (\EE[X])^2\)
Note that variance is not linear; for independent \(X\) and \(Y\), \(\VV[aX+bY+c] = a^2\VV[X]+b^2\VV[Y]\)
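A numerical check of the shortcut formula against the definition, on a fair die (numpy assumed):

```python
import numpy as np

xs = np.arange(1, 7)      # fair die values
p = np.full(6, 1 / 6)     # uniform pmf

EX = np.sum(xs * p)                    # E[X]
EX2 = np.sum(xs**2 * p)                # E[X^2]
var_def = np.sum((xs - EX)**2 * p)     # E[(X - E[X])^2]
var_short = EX2 - EX**2                # E[X^2] - (E[X])^2
print(var_def, var_short)              # both 35/12 ≈ 2.9167
```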
Definition
Write \(\mu_X\) for the mean and \(\sigma_X\) for the standard deviation of \(X\).
The \(k\)th moment of \(X\) is \(\EE[X^k]\).
The \(k\)th central moment of \(X\) is \(\EE[(X-\mu_X)^k]\).
The \(k\)th standard moment of \(X\) is \(\EE\left[\left(\frac{X-\mu_X}{\sigma_X}\right)^k\right]\)
The mean is the first moment. The variance is the second central moment. The third standard moment is known as skewness and measures departure from symmetry around the mean.
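For instance, scipy can report the third standardized moment directly (requesting `moments="s"` from a frozen distribution returns its skewness):

```python
from scipy.stats import expon, norm

# Skewness = third standardized moment.
print(norm().stats(moments="s"))   # 0.0: the normal is symmetric
print(expon().stats(moments="s"))  # 2.0: the exponential is right-skewed
```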
The moments assemble into the moment generating function:
Definition
The moment generating function (mgf) of \(X\) is \(\EE[e^{tX}]\). The coefficients of the powers of \(t\) are (up to a factorial factor) the moments of \(X\), making the mgf an exponential generating function for the moments.
Theorem
If two distributions have mgfs that exist and agree on a neighborhood of \(0\), then the distributions themselves are also equal.
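A symbolic sketch with sympy: the mgf of an Exponential(\(\lambda\)) variable is \(\lambda/(\lambda-t)\) for \(t<\lambda\) (a standard fact), and expanding it in \(t\) recovers the moments \(\EE[X^k]=k!/\lambda^k\):

```python
import sympy as sp

t = sp.symbols("t")
lam = sp.symbols("lambda", positive=True)

mgf = lam / (lam - t)  # mgf of Exponential(λ), valid for t < λ

# The coefficient of t^k, times k!, is the k-th moment E[X^k].
poly = sp.series(mgf, t, 0, 4).removeO()
moments = [sp.factorial(k) * poly.coeff(t, k) for k in range(4)]
print(moments)  # [1, 1/lambda, 2/lambda**2, 6/lambda**3]
```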
Name | Parameters | Values | PMF | \(\EE\) | \(\VV\) |
---|---|---|---|---|---|
Bernoulli | \(p\) | \(\{0,1\}\) | \(p^x(1-p)^{1-x}\) | \(p\) | \(p(1-p)\) |
Binomial | \(p, n\) | \(\{0,1,\dots,n\}\) | \({n\choose x}p^x(1-p)^{n-x}\) | \(np\) | \(np(1-p)\) |
Poisson | \(\lambda\) | \(\mathbb{N}\) | \(\frac{e^{-\lambda}\lambda^x}{x!}\) | \(\lambda\) | \(\lambda\) |
Name | Parameters | Values | PDF | \(\EE\) | \(\VV\) |
---|---|---|---|---|---|
Normal | \(\mu, \sigma^2\) | \(\mathbb{R}\) | \(\frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x-\mu)^2/(2\sigma^2)}\) | \(\mu\) | \(\sigma^2\) |
Exponential | \(\lambda\) | \(\mathbb{R}_{\geq 0}\) | \(\lambda e^{-\lambda x}\) | \(1/\lambda\) | \(1/\lambda^2\) |
Chi-squared | \(\nu\) | \(\mathbb{R}_{\geq0}\) | \(\frac{1}{2^{\nu/2}\Gamma(\nu/2)}x^{(\nu/2)-1}e^{-x/2}\) | \(\nu\) | \(2\nu\) |
Student’s T | \(\nu\) | \(\mathbb{R}\) | …complicated | \(0\) | \(\nu/(\nu-2)\) |
F | \(\nu_1,\nu_2\) | \(\mathbb{R}_{>0}\) | …complicated | \(\nu_2/(\nu_2-2)\) | \(\frac{2\nu_2^2(\nu_1+\nu_2-2)}{\nu_1(\nu_2-2)^2(\nu_2-4)}\) |
Gamma | \(\alpha, \beta\) | \(\mathbb{R}_{>0}\) | \(\frac{1}{\beta^\alpha\Gamma(\alpha)}x^{\alpha-1}e^{-x/\beta}\) | \(\alpha\beta\) | \(\alpha\beta^2\) |
\(\Gamma(\alpha)=\int_0^\infty x^{\alpha-1}e^{-x}\,dx\) is a continuous version of the factorial: \(\Gamma(n) = (n-1)!\) for positive integers \(n\).
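These table entries can be spot-checked against scipy.stats; note that scipy parameterizes the exponential by scale \(=1/\lambda\) and the gamma by shape \(a=\alpha\) and scale \(=\beta\), matching the tables above:

```python
from scipy.stats import binom, chi2, expon, gamma, poisson, t

# .stats() returns (mean, variance) for a frozen distribution.
print(binom(10, 0.3).stats())      # (np, np(1-p))   = (3.0, 2.1)
print(poisson(4).stats())          # (λ, λ)          = (4.0, 4.0)
print(expon(scale=1 / 2).stats())  # (1/λ, 1/λ²)     = (0.5, 0.25)
print(chi2(5).stats())             # (ν, 2ν)         = (5.0, 10.0)
print(t(5).stats())                # (0, ν/(ν-2))    = (0.0, 1.667)
print(gamma(a=2, scale=3).stats()) # (αβ, αβ²)       = (6.0, 18.0)
```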