\[ \def\RR{{\mathbb{R}}} \def\PP{{\mathbb{P}}} \def\EE{{\mathbb{E}}} \def\VV{{\mathbb{V}}} \]
Course webpage: https://www.math.csi.cuny.edu/~mvj/MTH410
The course will be graded primarily on a written report.
Grades are weighted as follows:
Statistics is the grammar of science.
Karl Pearson
Far better an approximate answer to the right question, which is often vague, than the exact answer to the wrong question, which can always be made precise.
John Wilder Tukey
All models are wrong, but some are useful.
George E. P. Box
Prediction is very difficult, especially about the future.
Niels Bohr
On two occasions I have been asked [by members of Parliament], ‘Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?’ I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question.
Charles Babbage
…what do you think?
Take 5 minutes to write down what you think statistics is.
Talk in pairs, compare your notes.
Share with the class.
The complement to probability theory: in probability, a distribution leads to random draws; in statistics, observations lead to a distribution.
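A minimal sketch in Python of both directions (the normal distribution and the sample size here are arbitrary choices for illustration): a known distribution produces random draws, and the observations then lead back to an estimated distribution.

```python
import numpy as np

rng = np.random.default_rng(0)

# Probability: a distribution, here Normal(5, 2^2), produces random draws.
draws = rng.normal(loc=5.0, scale=2.0, size=10_000)

# Statistics: the observations lead back to a distribution,
# e.g. by estimating its parameters from the sample.
mu_hat = draws.mean()
sigma_hat = draws.std(ddof=1)
print(f"estimated mu = {mu_hat:.3f}, estimated sigma = {sigma_hat:.3f}")
```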
Many statistics textbooks make a big deal out of distinguishing between discrete distributions (characterized by a probability mass function) and continuous distributions (characterized by a cumulative distribution function and a probability density function).
There is a joint abstraction that makes it easier to talk about both settings at the same time: measure theory. Our current book does not use this abstraction.
From these axioms, one can also prove: \(\PP(\emptyset) = 0\); \(\PP(A^c) = 1 - \PP(A)\); monotonicity, \(\PP(A) \leq \PP(B)\) whenever \(A \subseteq B\); and inclusion–exclusion, \(\PP(A\cup B) = \PP(A) + \PP(B) - \PP(A\cap B)\).
For finitely many equally likely outcomes, the probability function is easy to define:
\[ \PP(A) = \frac{|A|}{|\mathcal{S}|} \]
Many probability calculations end up using a lot of combinatorics to accurately measure these set sizes.
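For small sample spaces we can skip the combinatorics and count directly; a quick sketch (the two-dice example is an arbitrary illustration):

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two dice.
S = list(product(range(1, 7), repeat=2))

# Event A: the dice sum to 7.
A = [(a, b) for (a, b) in S if a + b == 7]

# P(A) = |A| / |S|
print(Fraction(len(A), len(S)))  # 1/6
```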
Two events \(A\) and \(B\) are independent if conditioning on one does not change the probability of the other: \(\PP(A\mid B) = \PP(A)\), or equivalently \(\PP(A\cap B) = \PP(A)\PP(B)\).
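A quick check of the product rule by exhaustive counting (the choice of events is only illustrative):

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))
A = {(a, b) for (a, b) in S if a % 2 == 0}  # first die is even
B = {(a, b) for (a, b) in S if a + b == 7}  # dice sum to 7

def P(E):
    return Fraction(len(E), len(S))

print(P(A & B) == P(A) * P(B))  # True: the events are independent
```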
Random variables can be discrete or continuous. A random variable is discrete if it takes on at most countably many values, and continuous if it takes values in a union of intervals of the reals and no single value has positive probability.
A discrete random variable is characterized by its probability distribution or probability mass function (pmf), \(p(x) = \PP(X = x)\); its cdf is \(F(x) = \PP(X \leq x) = \sum_{t \leq x} p(t)\).
A continuous random variable is characterized primarily by its cumulative distribution function (cdf), \(F(x) = \PP(X \leq x)\), but we usually work with its probability distribution or probability density function (pdf), \(f(x) = F'(x)\), so that \(\PP(a \leq X \leq b) = \int_a^b f(x)\,dx\).
Notice the similarity in the definition of the CDF between the two cases.
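Both objects are available directly in scipy.stats; a small sketch (the distributions and parameters are chosen only for illustration):

```python
from scipy import stats

# Discrete: Binomial(n=10, p=0.3) has a pmf and a cdf.
X = stats.binom(n=10, p=0.3)
print(X.pmf(3))  # P(X = 3)
print(X.cdf(3))  # P(X <= 3), a sum of pmf values

# Continuous: the standard normal has a pdf and a cdf.
Z = stats.norm(loc=0, scale=1)
print(Z.pdf(0.5))  # density at 0.5, not a probability
print(Z.cdf(0.5))  # P(Z <= 0.5), an integral of the pdf
```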
Under the hood, the discrete and continuous cases both define a probability measure (a measure with total mass 1) on \(\mathbb{R}\), where the measure is concentrated on discrete values for a discrete variable, and spread out continuously for a continuous variable.
The pdf is formally defined as a Radon-Nikodym derivative of this probability measure with respect to the usual Lebesgue measure. The pmf is just the measure itself evaluated at single points.
With this perspective, every time we are summing things over discrete random variables and integrating over continuous random variables, we are really taking integrals with respect to the underlying measure.
For a discrete random variable, the expectation is \(\EE[X] = \sum_x x\cdot p(x)\).
For a continuous random variable, \(\EE[X] = \int_{\mathbb{R}} x\cdot f(x)\,dx\).
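Both formulas are easy to evaluate numerically; a sketch (the Binomial(3, 1/2) and Exponential(2) examples are arbitrary):

```python
import numpy as np
from scipy import integrate

# Discrete: E[X] = sum_x x * p(x), here for Binomial(3, 1/2).
xs = np.array([0, 1, 2, 3])
p = np.array([1, 3, 3, 1]) / 8
print(np.sum(xs * p))  # 1.5 = np

# Continuous: E[X] = integral of x * f(x) dx, here for Exponential(rate=2).
f = lambda x: 2 * np.exp(-2 * x)
mean, _ = integrate.quad(lambda x: x * f(x), 0, np.inf)
print(mean)  # ~0.5 = 1/lambda
```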
Note that expectation is a linear operator: \(\EE[aX+bY+c] = a\EE[X]+b\EE[Y]+c\); this holds with no independence assumption on \(X\) and \(Y\).
Note that variance is not linear: for independent (or merely uncorrelated) \(X\) and \(Y\), \(\VV[aX+bY+c] = a^2\VV[X]+b^2\VV[Y]\); the constant \(c\) drops out entirely.
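A simulation sketch of both facts (the distributions and coefficients are arbitrary; note that \(X\) and \(Y\) are drawn independently, as the variance formula requires):

```python
import numpy as np

rng = np.random.default_rng(1)
a, b, c = 2.0, -3.0, 7.0

# Independent X ~ Exponential(1) and Y ~ Uniform(0, 1).
X = rng.exponential(scale=1.0, size=1_000_000)
Y = rng.uniform(size=1_000_000)

# Linearity of expectation holds regardless of independence.
print(np.mean(a * X + b * Y + c), a * np.mean(X) + b * np.mean(Y) + c)

# Variance: the constant c drops out; coefficients come out squared.
print(np.var(a * X + b * Y + c), a**2 * np.var(X) + b**2 * np.var(Y))
```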
The mean is the first moment. The variance is the second central moment. The third standardized moment is known as skewness and measures departure from symmetry around the mean.
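scipy.stats reports these moments together via its `moments` argument; a quick illustration (distribution choices arbitrary):

```python
from scipy import stats

# (mean, variance, skewness): the exponential is right-skewed...
print(stats.expon().stats(moments="mvs"))  # (1.0, 1.0, 2.0)
# ...while a symmetric distribution has skewness 0.
print(stats.norm().stats(moments="mvs"))   # (0.0, 1.0, 0.0)
```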
The moments assemble into the moment generating function: \(M_X(t) = \EE[e^{tX}] = \sum_{k\geq 0} \frac{\EE[X^k]}{k!} t^k\), so the \(k\)-th moment is \(M_X^{(k)}(0)\).
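For example, differentiating the Poisson MGF \(M(t) = e^{\lambda(e^t - 1)}\) at \(t = 0\) recovers its moments; a symbolic sketch with sympy:

```python
import sympy as sp

t, lam = sp.symbols("t lambda", positive=True)

# Known MGF of a Poisson(lambda) random variable.
M = sp.exp(lam * (sp.exp(t) - 1))

EX = sp.diff(M, t).subs(t, 0)      # first moment: lambda
EX2 = sp.diff(M, t, 2).subs(t, 0)  # second moment: lambda + lambda^2
print(sp.simplify(EX))             # mean: lambda
print(sp.simplify(EX2 - EX**2))    # variance: lambda
```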
Name | Parameters | Values | PMF | \(\EE\) | \(\VV\) |
---|---|---|---|---|---|
Bernoulli | \(p\) | \(\{0,1\}\) | \(p^x(1-p)^{1-x}\) | \(p\) | \(p(1-p)\) |
Binomial | \(p, n\) | \(\{0,1,\dots,n\}\) | \({n\choose x}p^x(1-p)^{n-x}\) | \(np\) | \(np(1-p)\) |
Poisson | \(\lambda\) | \(\{0,1,2,\dots\}\) | \(\frac{e^{-\lambda}\lambda^x}{x!}\) | \(\lambda\) | \(\lambda\) |
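The mean and variance columns can be cross-checked against scipy.stats (parameter values arbitrary):

```python
from scipy import stats

n, p, lam = 10, 0.3, 4.0
print(stats.bernoulli(p).stats())  # (0.3, 0.21) = (p, p(1-p))
print(stats.binom(n, p).stats())   # (3.0, 2.1)  = (np, np(1-p))
print(stats.poisson(lam).stats())  # (4.0, 4.0)  = (lambda, lambda)
```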
Name | Parameters | Values | PDF | \(\EE\) | \(\VV\) |
---|---|---|---|---|---|
Normal | \(\mu, \sigma^2\) | \(\mathbb{R}\) | \(\frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x-\mu)^2/(2\sigma^2)}\) | \(\mu\) | \(\sigma^2\) |
Exponential | \(\lambda\) | \(\mathbb{R}_{\geq 0}\) | \(\lambda e^{-\lambda x}\) | \(1/\lambda\) | \(1/\lambda^2\) |
Chi-squared | \(\nu\) | \(\mathbb{R}_{\geq0}\) | \(\frac{1}{2^{\nu/2}\Gamma(\nu/2)}x^{(\nu/2)-1}e^{-x/2}\) | \(\nu\) | \(2\nu\) |
Student’s t | \(\nu\) | \(\mathbb{R}\) | …complicated | \(0\) (for \(\nu>1\)) | \(\nu/(\nu-2)\) (for \(\nu>2\)) |
F | \(\nu_1,\nu_2\) | \(\mathbb{R}_{>0}\) | …complicated | \(\nu_2/(\nu_2-2)\) (for \(\nu_2>2\)) | \(\frac{2\nu_2^2(\nu_1+\nu_2-2)}{\nu_1(\nu_2-2)^2(\nu_2-4)}\) (for \(\nu_2>4\)) |
Gamma | \(\alpha, \beta\) | \(\mathbb{R}_{>0}\) | \(\frac{1}{\beta^\alpha\Gamma(\alpha)}x^{\alpha-1}e^{-x/\beta}\) | \(\alpha\beta\) | \(\alpha\beta^2\) |
\(\Gamma(\alpha)=\int_0^\infty x^{\alpha-1}e^{-x}\,dx\) is a continuous version of the factorial: \(\Gamma(n) = (n-1)!\) for positive integers \(n\).
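A quick numerical check of both the factorial identity and the Gamma row of the table (parameter values arbitrary):

```python
from scipy import stats
from scipy.special import gamma

# Gamma(n) = (n-1)! for positive integers n.
print(gamma(5))  # 24.0 = 4!

# Gamma(alpha=3, beta=2): mean alpha*beta = 6, variance alpha*beta^2 = 12.
print(stats.gamma(a=3.0, scale=2.0).stats())  # (6.0, 12.0)
```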