Course information will come through Blackboard. A detailed syllabus is available now.
Read now: Efron & Hastie, Part I.
Big Data is when you have to think about handling your data:
Based on observations \(x_i\) from a random variable \(X\), describe \(X\) sufficiently well to enable inferences and predictions.
Estimate the mean \(\mu_X\) based on repeated observations \(x_1,\dots,x_n\):
Check whether a mean \(\mu_X\) is significantly different from a hypothesized mean \(\mu_0\), based on repeated observations \(x_1,\dots,x_n\):
Check whether two means \(\mu_X\) and \(\mu_Y\) are significantly different from each other, based on repeated observations \(x_1,\dots,x_n\) and \(y_1,\dots,y_m\):
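The three tasks above map onto standard R calls. A minimal sketch on simulated data; the means, standard deviation and sample sizes are illustrative assumptions, not values from the course:

```r
set.seed(1)                          # reproducible illustration
x <- rnorm(50, mean = 0.2, sd = 1)   # observations x_1, ..., x_n (assumed parameters)
y <- rnorm(60, mean = 0,   sd = 1)   # observations y_1, ..., y_m (assumed parameters)

mu_hat <- mean(x)                    # estimate of the mean mu_X

one_sample <- t.test(x, mu = 0)      # is mu_X different from a hypothesized mu_0 = 0?
two_sample <- t.test(x, y)           # are mu_X and mu_Y different? (Welch test by default)

c(estimate = mu_hat,
  p_one = one_sample$p.value,
  p_two = two_sample$p.value)
```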
Fundamental issue: as data sizes grow, the \(1/\sqrt{n}\) factor in the standard error will dominate everything else.
Almost every difference, however small, becomes statistically significant.
Standard suggestion from Stats 101: look at effect sizes, and at domain-specific concepts of significance.
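Denote by \(\alpha\) the significance level, by \(1-\beta\) the power, by \(n\) the sample size, by \(\Delta\) the effect size in units of the standard deviation, by \(F_\mathcal{N}\) the standard normal CDF, and by \(z_q\) its \(q\)-quantile.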
Then the power of the t-test is the probability of detecting an effect of size \(\Delta\):
\[ 1-\beta \approx F_\mathcal{N}(\Delta\sqrt{n}-z_{1-\alpha/2}) \]
With a large sample size, we can fix the desired power and effect size and solve for \(\alpha\) to find a significance cutoff value.
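A minimal sketch of this inversion, based only on the normal approximation above. The function names are hypothetical, and \(\Delta\) is assumed to be measured in units of the standard deviation:

```r
# Normal approximation to the power of a two-sided test (formula above).
# delta: effect size in sd units, n: sample size, alpha: significance level.
approx_power <- function(delta, n, alpha) {
  pnorm(delta * sqrt(n) - qnorm(alpha / 2, lower.tail = FALSE))
}

# Invert the approximation: which alpha gives the requested power?
# Upper-tail calls keep the result accurate even when alpha is extremely small.
alpha_for_power <- function(delta, n, power) {
  2 * pnorm(delta * sqrt(n) - qnorm(power), lower.tail = FALSE)
}

alpha <- alpha_for_power(delta = 0.5, n = 50, power = 0.8)
alpha
approx_power(delta = 0.5, n = 50, alpha = alpha)   # recovers 0.8 up to rounding
```

For a two-sample comparison with \(n\) observations per group, the same approximation applies with \(n/2\) in place of \(n\), since the variance of a difference of two means is \(2\sigma^2/n\).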
100k samples each from two normal distributions:
Quantity | Estimate |
---|---|
\(\overline x\) | \(-0.0141262\) |
\(\overline y\) | \(0.0085098\) |
\(s_X\) | \(1.0032245\) |
\(s_Y\) | \(1.0022946\) |
t-test \(p\) | \(4.4774123\times 10^{-7}\) |
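
A table like this can be produced along the following lines. The true means (0 and 0.02) and the seed are assumptions for illustration; the parameters behind the numbers above are not stated here, so the output will not reproduce them.

```r
set.seed(42)                          # arbitrary seed for reproducibility
n <- 100000
x <- rnorm(n, mean = 0,    sd = 1)    # assumed parameters
y <- rnorm(n, mean = 0.02, sd = 1)    # assumed tiny shift in the mean

result <- t.test(x, y)                # two-sample t-test on the difference in means
c(xbar = mean(x), ybar = mean(y),
  s_x = sd(x), s_y = sd(y),
  p = result$p.value)
```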
Suppose we consider a difference to be practically significant only if it is greater than \(0.1\).
power.t.test(n=100000, delta=0.1, sd=1, power=0.8, sig.level=NULL)
##
## Two-sample t test power calculation
##
## n = 1e+05
## delta = 0.1
## sd = 1
## sig.level = 1.030365e-102
## power = 0.8
## alternative = two.sided
##
## NOTE: n is number in *each* group
At \(p = 4.4774123\times 10^{-7}\), far above the cutoff \(1.030365\times 10^{-102}\), we are not able to reject the null.
With large scale data, even VERY small subpopulations can be studied.
Example (ongoing research): study only taxi rides in NYC that start and end at the same position.
Basic division: Frequentism vs. Bayesianism
 | Frequentist | Bayesian |
---|---|---|
Probability is… | …asymptotic proportion of successes in repeated trials | …measure of synthesized belief |
Parametric inference is… | …estimating value of parameters from data | …updating probability distributions on parameters from data |
Core interest: estimating some value \(\theta\) related to some real probability distribution on \(X\), using some estimator \(\hat\theta = t(X)\).
Bias - Variance trade-off: \[ \text{MSE} = \text{Bias}^2 + \text{Variance} \]
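The decomposition follows by adding and subtracting \(\mathbb{E}\hat\theta\) inside the square; the cross term vanishes because \(\mathbb{E}[\hat\theta - \mathbb{E}\hat\theta] = 0\):
\[ \begin{aligned} \text{MSE} &= \mathbb{E}[(\hat\theta-\theta)^2] \\ &= \mathbb{E}[(\hat\theta-\mathbb{E}\hat\theta)^2] + (\mathbb{E}\hat\theta-\theta)^2 \\ &= \text{Variance} + \text{Bias}^2 \end{aligned} \]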
We use standard error to refer to \(\sqrt{\mathbb{V}(\hat\theta)}\) - the standard deviation of an estimator.
To derive quantities describing an estimator we can use:
Given a formula relating a quantity to parameters, plug in an estimator directly.
Example: The sample mean \(\overline{X} = \sum X_i/n\) has standard error \[ \text{se}(\overline{X}) = \sqrt{\mathbb{V}(X)/n} \]
We can estimate \(\mathbb{V}(X)\) using the sample variance \(\hat{\mathbb{V}}(X) = \sum (x_i-\overline{x})^2/(n-1)\). This yields an estimated standard error \[ \widehat{\text{se}}(\overline{X}) = \sqrt{\sum(x_i-\overline x)^2/(n(n-1))} \]
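The plug-in formula can be checked directly in R. A small sketch with made-up data; `se_mean` is a hypothetical helper name:

```r
# Plug-in estimate of the standard error of the sample mean.
se_mean <- function(x) {
  n <- length(x)
  sqrt(sum((x - mean(x))^2) / (n * (n - 1)))
}

x <- c(2.1, 3.4, 1.9, 4.0, 2.8)   # made-up data for illustration
se_mean(x)
sd(x) / sqrt(length(x))           # same value: sd() uses the n - 1 denominator
```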
More complicated statistics can be related back using linear approximations. For a function \(s(\hat\theta)\) we can Taylor expand around \(\theta=\mathbb{E}\hat\theta\): \[ s(\hat\theta) - s(\theta) \approx s'(\theta)(\hat\theta-\theta) \]
So \(\mathbb{V}[s(\hat\theta)] = \mathbb{E}[(s(\hat\theta)-s(\theta))^2] \approx |s'(\theta)|^2\,\mathbb{V}(\hat\theta)\).
Example: \(\hat\theta = \overline{x}^2\). Then \(d\hat\theta/d\overline{x} = 2\overline{x}\). Plugging into the Taylor expansion, we get
\[ \widehat{\text{se}}(\overline{x}^2) \approx 2|\overline{x}|\,\widehat{\text{se}}(\overline{x}) \]
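One way to sanity-check the Taylor approximation is to compare it with a Monte Carlo estimate. A sketch under an assumed normal population (mean 3, sd 2, both chosen for illustration):

```r
set.seed(7)
n     <- 400
mu    <- 3                                # assumed population mean
sigma <- 2                                # assumed population sd

x <- rnorm(n, mu, sigma)
se_xbar  <- sd(x) / sqrt(n)               # plug-in standard error of the mean
se_delta <- 2 * abs(mean(x)) * se_xbar    # Taylor-expansion standard error of xbar^2

# Monte Carlo check: spread of xbar^2 over many fresh samples of the same size
sims  <- replicate(5000, mean(rnorm(n, mu, sigma))^2)
se_mc <- sd(sims)

c(taylor = se_delta, monte_carlo = se_mc)
```

The two values should agree closely here because \(\overline{x}\) is far from zero relative to its standard error; the linear approximation degrades when \(\overline{x}\) is near zero.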
We define the likelihood function as a function on parameter values:
\[ \mathcal{L}(\theta | x) = \mathbb{P}(x | \theta) \]
Neyman-Pearson Lemma: When constructing a statistical testing rule to pick between two distributions \(f_0\) and \(f_1\), the smallest errors are achieved by \[ t_c(x) = \begin{cases} 1 & \text{if $\log(\mathcal{L}_1/\mathcal L_0) \geq c$} \\ 0 & \text{if $\log(\mathcal{L}_1/\mathcal L_0) < c$} \end{cases} \] for \(c\) chosen to achieve the desired significance level.
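A minimal sketch of such a rule in R. The two candidate distributions, \(f_0 = \mathcal{N}(0,1)\) and \(f_1 = \mathcal{N}(1,1)\), and the function names are assumptions chosen for illustration:

```r
# Log-likelihood ratio log(L_1 / L_0) for i.i.d. data under two normal models.
# f0 = N(mu0, 1) and f1 = N(mu1, 1) are illustrative assumptions.
log_lr <- function(x, mu0 = 0, mu1 = 1) {
  sum(dnorm(x, mean = mu1, sd = 1, log = TRUE)) -
    sum(dnorm(x, mean = mu0, sd = 1, log = TRUE))
}

# Testing rule t_c(x): pick f1 (return 1) when the log ratio reaches the cutoff c.
t_c <- function(x, cutoff, ...) as.integer(log_lr(x, ...) >= cutoff)

t_c(c(1, 1, 1, 1), cutoff = 0)   # log ratio is  2, so the rule picks f1
t_c(c(0, 0, 0, 0), cutoff = 0)   # log ratio is -2, so the rule picks f0
```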
The Maximum Likelihood Estimator tends to be nearly unbiased, with close to the least possible variance, in large samples; even when it is not, it tends to work very well.
\[ \hat\theta_{MLE} = \operatorname{arg\,max}_{\theta} \mathcal{L}(\theta | x) \]
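When no closed form is available, the maximization can be done numerically. A sketch using an exponential model, an assumption chosen because its MLE is known to be `1 / mean(x)`, which lets us check the numerical answer:

```r
# Log-likelihood of an exponential rate parameter for i.i.d. data x.
log_lik <- function(rate, x) sum(dexp(x, rate = rate, log = TRUE))

x <- c(1, 2, 3)   # made-up data for illustration

# One-dimensional numerical maximization over an assumed search interval.
fit <- optimize(log_lik, interval = c(1e-6, 10), x = x,
                maximum = TRUE, tol = 1e-8)

fit$maximum       # numerical MLE, close to 1 / mean(x) = 0.5
```

Working with the log-likelihood rather than the likelihood avoids numerical underflow for larger samples and does not change the maximizer.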
Frequentism wants us to focus on repeated experiments. …so let’s repeat some experiments.
Use the dataset \(x\) as a probability distribution itself; sample \(x^{(1)}, \dots, x^{(B)}\) repeatedly from \(x\) (with replacement). This approximates the true distribution, and we can use the observed distributions of \(t(x^{(k)})\) to study the statistical behavior of \(t(x)\).
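The resampling scheme above can be sketched in a few lines of R. The data, the choice of the median as the statistic \(t\), and \(B = 2000\) are assumptions for illustration:

```r
set.seed(123)
x <- rexp(200, rate = 1)        # made-up skewed sample standing in for the data
B <- 2000                       # number of bootstrap resamples

# t(x^(k)): here the statistic is the median of each resample,
# drawn from x with replacement and of the same size as x.
boot_stats <- replicate(B, median(sample(x, replace = TRUE)))

se_boot <- sd(boot_stats)       # bootstrap estimate of the standard error of the median
se_boot
```

The median is used here because, unlike the mean, it has no simple plug-in formula for its standard error.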
(Small data) example: mpg dataset
manufacturer | model | displ | year | cyl | trans | drv | cty | hwy | fl | class |
---|---|---|---|---|---|---|---|---|---|---|
audi | a4 | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | compact |
audi | a4 | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | compact |
audi | a4 | 2.0 | 2008 | 4 | manual(m6) | f | 20 | 31 | p | compact |
audi | a4 | 2.0 | 2008 | 4 | auto(av) | f | 21 | 30 | p | compact |
audi | a4 | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | compact |
audi | a4 | 2.8 | 1999 | 6 | manual(m5) | f | 18 | 26 | p | compact |
audi | a4 | 3.1 | 2008 | 6 | auto(av) | f | 18 | 27 | p | compact |
audi | a4 quattro | 1.8 | 1999 | 4 | manual(m5) | 4 | 18 | 26 | p | compact |
audi | a4 quattro | 1.8 | 1999 | 4 | auto(l5) | 4 | 16 | 25 | p | compact |
audi | a4 quattro | 2.0 | 2008 | 4 | manual(m6) | 4 | 20 | 28 | p | compact |
audi | a4 quattro | 2.0 | 2008 | 4 | auto(s6) | 4 | 19 | 27 | p | compact |
audi | a4 quattro | 2.8 | 1999 | 6 | auto(l5) | 4 | 15 | 25 | p | compact |
audi | a4 quattro | 2.8 | 1999 | 6 | manual(m5) | 4 | 17 | 25 | p | compact |
audi | a4 quattro | 3.1 | 2008 | 6 | auto(s6) | 4 | 17 | 25 | p | compact |
audi | a4 quattro | 3.1 | 2008 | 6 | manual(m6) | 4 | 15 | 25 | p | compact |
audi | a6 quattro | 2.8 | 1999 | 6 | auto(l5) | 4 | 15 | 24 | p | midsize |
audi | a6 quattro | 3.1 | 2008 | 6 | auto(s6) | 4 | 17 | 25 | p | midsize |
audi | a6 quattro | 4.2 | 2008 | 8 | auto(s6) | 4 | 16 | 23 | p | midsize |
chevrolet | c1500 suburban 2wd | 5.3 | 2008 | 8 | auto(l4) | r | 14 | 20 | r | suv |
chevrolet | c1500 suburban 2wd | 5.3 | 2008 | 8 | auto(l4) | r | 11 | 15 | e | suv |
chevrolet | c1500 suburban 2wd | 5.3 | 2008 | 8 | auto(l4) | r | 14 | 20 | r | suv |
chevrolet | c1500 suburban 2wd | 5.7 | 1999 | 8 | auto(l4) | r | 13 | 17 | r | suv |
chevrolet | c1500 suburban 2wd | 6.0 | 2008 | 8 | auto(l4) | r | 12 | 17 | r | suv |
chevrolet | corvette | 5.7 | 1999 | 8 | manual(m6) | r | 16 | 26 | p | 2seater |
chevrolet | corvette | 5.7 | 1999 | 8 | auto(l4) | r | 15 | 23 | p | 2seater |
chevrolet | corvette | 6.2 | 2008 | 8 | manual(m6) | r | 16 | 26 | p | 2seater |
chevrolet | corvette | 6.2 | 2008 | 8 | auto(s6) | r | 15 | 25 | p | 2seater |
chevrolet | corvette | 7.0 | 2008 | 8 | manual(m6) | r | 15 | 24 | p | 2seater |
chevrolet | k1500 tahoe 4wd | 5.3 | 2008 | 8 | auto(l4) | 4 | 14 | 19 | r | suv |
chevrolet | k1500 tahoe 4wd | 5.3 | 2008 | 8 | auto(l4) | 4 | 11 | 14 | e | suv |
chevrolet | k1500 tahoe 4wd | 5.7 | 1999 | 8 | auto(l4) | 4 | 11 | 15 | r | suv |
chevrolet | k1500 tahoe 4wd | 6.5 | 1999 | 8 | auto(l4) | 4 | 14 | 17 | d | suv |
chevrolet | malibu | 2.4 | 1999 | 4 | auto(l4) | f | 19 | 27 | r | midsize |
chevrolet | malibu | 2.4 | 2008 | 4 | auto(l4) | f | 22 | 30 | r | midsize |
chevrolet | malibu | 3.1 | 1999 | 6 | auto(l4) | f | 18 | 26 | r | midsize |
chevrolet | malibu | 3.5 | 2008 | 6 | auto(l4) | f | 18 | 29 | r | midsize |
chevrolet | malibu | 3.6 | 2008 | 6 | auto(s6) | f | 17 | 26 | r | midsize |
dodge | caravan 2wd | 2.4 | 1999 | 4 | auto(l3) | f | 18 | 24 | r | minivan |
dodge | caravan 2wd | 3.0 | 1999 | 6 | auto(l4) | f | 17 | 24 | r | minivan |
dodge | caravan 2wd | 3.3 | 1999 | 6 | auto(l4) | f | 16 | 22 | r | minivan |
dodge | caravan 2wd | 3.3 | 1999 | 6 | auto(l4) | f | 16 | 22 | r | minivan |
dodge | caravan 2wd | 3.3 | 2008 | 6 | auto(l4) | f | 17 | 24 | r | minivan |
dodge | caravan 2wd | 3.3 | 2008 | 6 | auto(l4) | f | 17 | 24 | r | minivan |
dodge | caravan 2wd | 3.3 | 2008 | 6 | auto(l4) | f | 11 | 17 | e | minivan |
dodge | caravan 2wd | 3.8 | 1999 | 6 | auto(l4) | f | 15 | 22 | r | minivan |
dodge | caravan 2wd | 3.8 | 1999 | 6 | auto(l4) | f | 15 | 21 | r | minivan |
dodge | caravan 2wd | 3.8 | 2008 | 6 | auto(l6) | f | 16 | 23 | r | minivan |
dodge | caravan 2wd | 4.0 | 2008 | 6 | auto(l6) | f | 16 | 23 | r | minivan |
dodge | dakota pickup 4wd | 3.7 | 2008 | 6 | manual(m6) | 4 | 15 | 19 | r | pickup |
dodge | dakota pickup 4wd | 3.7 | 2008 | 6 | auto(l4) | 4 | 14 | 18 | r | pickup |
dodge | dakota pickup 4wd | 3.9 | 1999 | 6 | auto(l4) | 4 | 13 | 17 | r | pickup |
dodge | dakota pickup 4wd | 3.9 | 1999 | 6 | manual(m5) | 4 | 14 | 17 | r | pickup |
dodge | dakota pickup 4wd | 4.7 | 2008 | 8 | auto(l5) | 4 | 14 | 19 | r | pickup |
dodge | dakota pickup 4wd | 4.7 | 2008 | 8 | auto(l5) | 4 | 14 | 19 | r | pickup |
dodge | dakota pickup 4wd | 4.7 | 2008 | 8 | auto(l5) | 4 | 9 | 12 | e | pickup |
dodge | dakota pickup 4wd | 5.2 | 1999 | 8 | manual(m5) | 4 | 11 | 17 | r | pickup |
dodge | dakota pickup 4wd | 5.2 | 1999 | 8 | auto(l4) | 4 | 11 | 15 | r | pickup |
dodge | durango 4wd | 3.9 | 1999 | 6 | auto(l4) | 4 | 13 | 17 | r | suv |
dodge | durango 4wd | 4.7 | 2008 | 8 | auto(l5) | 4 | 13 | 17 | r | suv |
dodge | durango 4wd | 4.7 | 2008 | 8 | auto(l5) | 4 | 9 | 12 | e | suv |
dodge | durango 4wd | 4.7 | 2008 | 8 | auto(l5) | 4 | 13 | 17 | r | suv |
dodge | durango 4wd | 5.2 | 1999 | 8 | auto(l4) | 4 | 11 | 16 | r | suv |
dodge | durango 4wd | 5.7 | 2008 | 8 | auto(l5) | 4 | 13 | 18 | r | suv |
dodge | durango 4wd | 5.9 | 1999 | 8 | auto(l4) | 4 | 11 | 15 | r | suv |
dodge | ram 1500 pickup 4wd | 4.7 | 2008 | 8 | manual(m6) | 4 | 12 | 16 | r | pickup |
dodge | ram 1500 pickup 4wd | 4.7 | 2008 | 8 | auto(l5) | 4 | 9 | 12 | e | pickup |
dodge | ram 1500 pickup 4wd | 4.7 | 2008 | 8 | auto(l5) | 4 | 13 | 17 | r | pickup |
dodge | ram 1500 pickup 4wd | 4.7 | 2008 | 8 | auto(l5) | 4 | 13 | 17 | r | pickup |
dodge | ram 1500 pickup 4wd | 4.7 | 2008 | 8 | manual(m6) | 4 | 12 | 16 | r | pickup |
dodge | ram 1500 pickup 4wd | 4.7 | 2008 | 8 | manual(m6) | 4 | 9 | 12 | e | pickup |
dodge | ram 1500 pickup 4wd | 5.2 | 1999 | 8 | auto(l4) | 4 | 11 | 15 | r | pickup |
dodge | ram 1500 pickup 4wd | 5.2 | 1999 | 8 | manual(m5) | 4 | 11 | 16 | r | pickup |
dodge | ram 1500 pickup 4wd | 5.7 | 2008 | 8 | auto(l5) | 4 | 13 | 17 | r | pickup |
dodge | ram 1500 pickup 4wd | 5.9 | 1999 | 8 | auto(l4) | 4 | 11 | 15 | r | pickup |
ford | expedition 2wd | 4.6 | 1999 | 8 | auto(l4) | r | 11 | 17 | r | suv |
ford | expedition 2wd | 5.4 | 1999 | 8 | auto(l4) | r | 11 | 17 | r | suv |
ford | expedition 2wd | 5.4 | 2008 | 8 | auto(l6) | r | 12 | 18 | r | suv |
ford | explorer 4wd | 4.0 | 1999 | 6 | auto(l5) | 4 | 14 | 17 | r | suv |
ford | explorer 4wd | 4.0 | 1999 | 6 | manual(m5) | 4 | 15 | 19 | r | suv |
ford | explorer 4wd | 4.0 | 1999 | 6 | auto(l5) | 4 | 14 | 17 | r | suv |
ford | explorer 4wd | 4.0 | 2008 | 6 | auto(l5) | 4 | 13 | 19 | r | suv |
ford | explorer 4wd | 4.6 | 2008 | 8 | auto(l6) | 4 | 13 | 19 | r | suv |
ford | explorer 4wd | 5.0 | 1999 | 8 | auto(l4) | 4 | 13 | 17 | r | suv |
ford | f150 pickup 4wd | 4.2 | 1999 | 6 | auto(l4) | 4 | 14 | 17 | r | pickup |
ford | f150 pickup 4wd | 4.2 | 1999 | 6 | manual(m5) | 4 | 14 | 17 | r | pickup |
ford | f150 pickup 4wd | 4.6 | 1999 | 8 | manual(m5) | 4 | 13 | 16 | r | pickup |
ford | f150 pickup 4wd | 4.6 | 1999 | 8 | auto(l4) | 4 | 13 | 16 | r | pickup |
ford | f150 pickup 4wd | 4.6 | 2008 | 8 | auto(l4) | 4 | 13 | 17 | r | pickup |
ford | f150 pickup 4wd | 5.4 | 1999 | 8 | auto(l4) | 4 | 11 | 15 | r | pickup |
ford | f150 pickup 4wd | 5.4 | 2008 | 8 | auto(l4) | 4 | 13 | 17 | r | pickup |
ford | mustang | 3.8 | 1999 | 6 | manual(m5) | r | 18 | 26 | r | subcompact |
ford | mustang | 3.8 | 1999 | 6 | auto(l4) | r | 18 | 25 | r | subcompact |
ford | mustang | 4.0 | 2008 | 6 | manual(m5) | r | 17 | 26 | r | subcompact |
ford | mustang | 4.0 | 2008 | 6 | auto(l5) | r | 16 | 24 | r | subcompact |
ford | mustang | 4.6 | 1999 | 8 | auto(l4) | r | 15 | 21 | r | subcompact |
ford | mustang | 4.6 | 1999 | 8 | manual(m5) | r | 15 | 22 | r | subcompact |
ford | mustang | 4.6 | 2008 | 8 | manual(m5) | r | 15 | 23 | r | subcompact |
ford | mustang | 4.6 | 2008 | 8 | auto(l5) | r | 15 | 22 | r | subcompact |
ford | mustang | 5.4 | 2008 | 8 | manual(m6) | r | 14 | 20 | p | subcompact |
honda | civic | 1.6 | 1999 | 4 | manual(m5) | f | 28 | 33 | r | subcompact |
honda | civic | 1.6 | 1999 | 4 | auto(l4) | f | 24 | 32 | r | subcompact |
honda | civic | 1.6 | 1999 | 4 | manual(m5) | f | 25 | 32 | r | subcompact |
honda | civic | 1.6 | 1999 | 4 | manual(m5) | f | 23 | 29 | p | subcompact |
honda | civic | 1.6 | 1999 | 4 | auto(l4) | f | 24 | 32 | r | subcompact |
honda | civic | 1.8 | 2008 | 4 | manual(m5) | f | 26 | 34 | r | subcompact |
honda | civic | 1.8 | 2008 | 4 | auto(l5) | f | 25 | 36 | r | subcompact |
honda | civic | 1.8 | 2008 | 4 | auto(l5) | f | 24 | 36 | c | subcompact |
honda | civic | 2.0 | 2008 | 4 | manual(m6) | f | 21 | 29 | p | subcompact |
hyundai | sonata | 2.4 | 1999 | 4 | auto(l4) | f | 18 | 26 | r | midsize |
hyundai | sonata | 2.4 | 1999 | 4 | manual(m5) | f | 18 | 27 | r | midsize |
hyundai | sonata | 2.4 | 2008 | 4 | auto(l4) | f | 21 | 30 | r | midsize |
hyundai | sonata | 2.4 | 2008 | 4 | manual(m5) | f | 21 | 31 | r | midsize |
hyundai | sonata | 2.5 | 1999 | 6 | auto(l4) | f | 18 | 26 | r | midsize |
hyundai | sonata | 2.5 | 1999 | 6 | manual(m5) | f | 18 | 26 | r | midsize |
hyundai | sonata | 3.3 | 2008 | 6 | auto(l5) | f | 19 | 28 | r | midsize |
hyundai | tiburon | 2.0 | 1999 | 4 | auto(l4) | f | 19 | 26 | r | subcompact |
hyundai | tiburon | 2.0 | 1999 | 4 | manual(m5) | f | 19 | 29 | r | subcompact |
hyundai | tiburon | 2.0 | 2008 | 4 | manual(m5) | f | 20 | 28 | r | subcompact |
hyundai | tiburon | 2.0 | 2008 | 4 | auto(l4) | f | 20 | 27 | r | subcompact |
hyundai | tiburon | 2.7 | 2008 | 6 | auto(l4) | f | 17 | 24 | r | subcompact |
hyundai | tiburon | 2.7 | 2008 | 6 | manual(m6) | f | 16 | 24 | r | subcompact |
hyundai | tiburon | 2.7 | 2008 | 6 | manual(m5) | f | 17 | 24 | r | subcompact |
jeep | grand cherokee 4wd | 3.0 | 2008 | 6 | auto(l5) | 4 | 17 | 22 | d | suv |
jeep | grand cherokee 4wd | 3.7 | 2008 | 6 | auto(l5) | 4 | 15 | 19 | r | suv |
jeep | grand cherokee 4wd | 4.0 | 1999 | 6 | auto(l4) | 4 | 15 | 20 | r | suv |
jeep | grand cherokee 4wd | 4.7 | 1999 | 8 | auto(l4) | 4 | 14 | 17 | r | suv |
jeep | grand cherokee 4wd | 4.7 | 2008 | 8 | auto(l5) | 4 | 9 | 12 | e | suv |
jeep | grand cherokee 4wd | 4.7 | 2008 | 8 | auto(l5) | 4 | 14 | 19 | r | suv |
jeep | grand cherokee 4wd | 5.7 | 2008 | 8 | auto(l5) | 4 | 13 | 18 | r | suv |
jeep | grand cherokee 4wd | 6.1 | 2008 | 8 | auto(l5) | 4 | 11 | 14 | p | suv |
land rover | range rover | 4.0 | 1999 | 8 | auto(l4) | 4 | 11 | 15 | p | suv |
land rover | range rover | 4.2 | 2008 | 8 | auto(s6) | 4 | 12 | 18 | r | suv |
land rover | range rover | 4.4 | 2008 | 8 | auto(s6) | 4 | 12 | 18 | r | suv |
land rover | range rover | 4.6 | 1999 | 8 | auto(l4) | 4 | 11 | 15 | p | suv |
lincoln | navigator 2wd | 5.4 | 1999 | 8 | auto(l4) | r | 11 | 17 | r | suv |
lincoln | navigator 2wd | 5.4 | 1999 | 8 | auto(l4) | r | 11 | 16 | p | suv |
lincoln | navigator 2wd | 5.4 | 2008 | 8 | auto(l6) | r | 12 | 18 | r | suv |
mercury | mountaineer 4wd | 4.0 | 1999 | 6 | auto(l5) | 4 | 14 | 17 | r | suv |
mercury | mountaineer 4wd | 4.0 | 2008 | 6 | auto(l5) | 4 | 13 | 19 | r | suv |
mercury | mountaineer 4wd | 4.6 | 2008 | 8 | auto(l6) | 4 | 13 | 19 | r | suv |
mercury | mountaineer 4wd | 5.0 | 1999 | 8 | auto(l4) | 4 | 13 | 17 | r | suv |
nissan | altima | 2.4 | 1999 | 4 | manual(m5) | f | 21 | 29 | r | compact |
nissan | altima | 2.4 | 1999 | 4 | auto(l4) | f | 19 | 27 | r | compact |
nissan | altima | 2.5 | 2008 | 4 | auto(av) | f | 23 | 31 | r | midsize |
nissan | altima | 2.5 | 2008 | 4 | manual(m6) | f | 23 | 32 | r | midsize |
nissan | altima | 3.5 | 2008 | 6 | manual(m6) | f | 19 | 27 | p | midsize |
nissan | altima | 3.5 | 2008 | 6 | auto(av) | f | 19 | 26 | p | midsize |
nissan | maxima | 3.0 | 1999 | 6 | auto(l4) | f | 18 | 26 | r | midsize |
nissan | maxima | 3.0 | 1999 | 6 | manual(m5) | f | 19 | 25 | r | midsize |
nissan | maxima | 3.5 | 2008 | 6 | auto(av) | f | 19 | 25 | p | midsize |
nissan | pathfinder 4wd | 3.3 | 1999 | 6 | auto(l4) | 4 | 14 | 17 | r | suv |
nissan | pathfinder 4wd | 3.3 | 1999 | 6 | manual(m5) | 4 | 15 | 17 | r | suv |
nissan | pathfinder 4wd | 4.0 | 2008 | 6 | auto(l5) | 4 | 14 | 20 | p | suv |
nissan | pathfinder 4wd | 5.6 | 2008 | 8 | auto(s5) | 4 | 12 | 18 | p | suv |
pontiac | grand prix | 3.1 | 1999 | 6 | auto(l4) | f | 18 | 26 | r | midsize |
pontiac | grand prix | 3.8 | 1999 | 6 | auto(l4) | f | 16 | 26 | p | midsize |
pontiac | grand prix | 3.8 | 1999 | 6 | auto(l4) | f | 17 | 27 | r | midsize |
pontiac | grand prix | 3.8 | 2008 | 6 | auto(l4) | f | 18 | 28 | r | midsize |
pontiac | grand prix | 5.3 | 2008 | 8 | auto(s4) | f | 16 | 25 | p | midsize |
subaru | forester awd | 2.5 | 1999 | 4 | manual(m5) | 4 | 18 | 25 | r | suv |
subaru | forester awd | 2.5 | 1999 | 4 | auto(l4) | 4 | 18 | 24 | r | suv |
subaru | forester awd | 2.5 | 2008 | 4 | manual(m5) | 4 | 20 | 27 | r | suv |
subaru | forester awd | 2.5 | 2008 | 4 | manual(m5) | 4 | 19 | 25 | p | suv |
subaru | forester awd | 2.5 | 2008 | 4 | auto(l4) | 4 | 20 | 26 | r | suv |
subaru | forester awd | 2.5 | 2008 | 4 | auto(l4) | 4 | 18 | 23 | p | suv |
subaru | impreza awd | 2.2 | 1999 | 4 | auto(l4) | 4 | 21 | 26 | r | subcompact |
subaru | impreza awd | 2.2 | 1999 | 4 | manual(m5) | 4 | 19 | 26 | r | subcompact |
subaru | impreza awd | 2.5 | 1999 | 4 | manual(m5) | 4 | 19 | 26 | r | subcompact |
subaru | impreza awd | 2.5 | 1999 | 4 | auto(l4) | 4 | 19 | 26 | r | subcompact |
subaru | impreza awd | 2.5 | 2008 | 4 | auto(s4) | 4 | 20 | 25 | p | compact |
subaru | impreza awd | 2.5 | 2008 | 4 | auto(s4) | 4 | 20 | 27 | r | compact |
subaru | impreza awd | 2.5 | 2008 | 4 | manual(m5) | 4 | 19 | 25 | p | compact |
subaru | impreza awd | 2.5 | 2008 | 4 | manual(m5) | 4 | 20 | 27 | r | compact |
toyota | 4runner 4wd | 2.7 | 1999 | 4 | manual(m5) | 4 | 15 | 20 | r | suv |
toyota | 4runner 4wd | 2.7 | 1999 | 4 | auto(l4) | 4 | 16 | 20 | r | suv |
toyota | 4runner 4wd | 3.4 | 1999 | 6 | auto(l4) | 4 | 15 | 19 | r | suv |
toyota | 4runner 4wd | 3.4 | 1999 | 6 | manual(m5) | 4 | 15 | 17 | r | suv |
toyota | 4runner 4wd | 4.0 | 2008 | 6 | auto(l5) | 4 | 16 | 20 | r | suv |
toyota | 4runner 4wd | 4.7 | 2008 | 8 | auto(l5) | 4 | 14 | 17 | r | suv |
toyota | camry | 2.2 | 1999 | 4 | manual(m5) | f | 21 | 29 | r | midsize |
toyota | camry | 2.2 | 1999 | 4 | auto(l4) | f | 21 | 27 | r | midsize |
toyota | camry | 2.4 | 2008 | 4 | manual(m5) | f | 21 | 31 | r | midsize |
toyota | camry | 2.4 | 2008 | 4 | auto(l5) | f | 21 | 31 | r | midsize |
toyota | camry | 3.0 | 1999 | 6 | auto(l4) | f | 18 | 26 | r | midsize |
toyota | camry | 3.0 | 1999 | 6 | manual(m5) | f | 18 | 26 | r | midsize |
toyota | camry | 3.5 | 2008 | 6 | auto(s6) | f | 19 | 28 | r | midsize |
toyota | camry solara | 2.2 | 1999 | 4 | auto(l4) | f | 21 | 27 | r | compact |
toyota | camry solara | 2.2 | 1999 | 4 | manual(m5) | f | 21 | 29 | r | compact |
toyota | camry solara | 2.4 | 2008 | 4 | manual(m5) | f | 21 | 31 | r | compact |
toyota | camry solara | 2.4 | 2008 | 4 | auto(s5) | f | 22 | 31 | r | compact |
toyota | camry solara | 3.0 | 1999 | 6 | auto(l4) | f | 18 | 26 | r | compact |
toyota | camry solara | 3.0 | 1999 | 6 | manual(m5) | f | 18 | 26 | r | compact |
toyota | camry solara | 3.3 | 2008 | 6 | auto(s5) | f | 18 | 27 | r | compact |
toyota | corolla | 1.8 | 1999 | 4 | auto(l3) | f | 24 | 30 | r | compact |
toyota | corolla | 1.8 | 1999 | 4 | auto(l4) | f | 24 | 33 | r | compact |
toyota | corolla | 1.8 | 1999 | 4 | manual(m5) | f | 26 | 35 | r | compact |
toyota | corolla | 1.8 | 2008 | 4 | manual(m5) | f | 28 | 37 | r | compact |
toyota | corolla | 1.8 | 2008 | 4 | auto(l4) | f | 26 | 35 | r | compact |
toyota | land cruiser wagon 4wd | 4.7 | 1999 | 8 | auto(l4) | 4 | 11 | 15 | r | suv |
toyota | land cruiser wagon 4wd | 5.7 | 2008 | 8 | auto(s6) | 4 | 13 | 18 | r | suv |
toyota | toyota tacoma 4wd | 2.7 | 1999 | 4 | manual(m5) | 4 | 15 | 20 | r | pickup |
toyota | toyota tacoma 4wd | 2.7 | 1999 | 4 | auto(l4) | 4 | 16 | 20 | r | pickup |
toyota | toyota tacoma 4wd | 2.7 | 2008 | 4 | manual(m5) | 4 | 17 | 22 | r | pickup |
toyota | toyota tacoma 4wd | 3.4 | 1999 | 6 | manual(m5) | 4 | 15 | 17 | r | pickup |
toyota | toyota tacoma 4wd | 3.4 | 1999 | 6 | auto(l4) | 4 | 15 | 19 | r | pickup |
toyota | toyota tacoma 4wd | 4.0 | 2008 | 6 | manual(m6) | 4 | 15 | 18 | r | pickup |
toyota | toyota tacoma 4wd | 4.0 | 2008 | 6 | auto(l5) | 4 | 16 | 20 | r | pickup |
volkswagen | gti | 2.0 | 1999 | 4 | manual(m5) | f | 21 | 29 | r | compact |
volkswagen | gti | 2.0 | 1999 | 4 | auto(l4) | f | 19 | 26 | r | compact |
volkswagen | gti | 2.0 | 2008 | 4 | manual(m6) | f | 21 | 29 | p | compact |
volkswagen | gti | 2.0 | 2008 | 4 | auto(s6) | f | 22 | 29 | p | compact |
volkswagen | gti | 2.8 | 1999 | 6 | manual(m5) | f | 17 | 24 | r | compact |
volkswagen | jetta | 1.9 | 1999 | 4 | manual(m5) | f | 33 | 44 | d | compact |
volkswagen | jetta | 2.0 | 1999 | 4 | manual(m5) | f | 21 | 29 | r | compact |
volkswagen | jetta | 2.0 | 1999 | 4 | auto(l4) | f | 19 | 26 | r | compact |
volkswagen | jetta | 2.0 | 2008 | 4 | auto(s6) | f | 22 | 29 | p | compact |
volkswagen | jetta | 2.0 | 2008 | 4 | manual(m6) | f | 21 | 29 | p | compact |
volkswagen | jetta | 2.5 | 2008 | 5 | auto(s6) | f | 21 | 29 | r | compact |
volkswagen | jetta | 2.5 | 2008 | 5 | manual(m5) | f | 21 | 29 | r | compact |
volkswagen | jetta | 2.8 | 1999 | 6 | auto(l4) | f | 16 | 23 | r | compact |
volkswagen | jetta | 2.8 | 1999 | 6 | manual(m5) | f | 17 | 24 | r | compact |
volkswagen | new beetle | 1.9 | 1999 | 4 | manual(m5) | f | 35 | 44 | d | subcompact |
volkswagen | new beetle | 1.9 | 1999 | 4 | auto(l4) | f | 29 | 41 | d | subcompact |
volkswagen | new beetle | 2.0 | 1999 | 4 | manual(m5) | f | 21 | 29 | r | subcompact |
volkswagen | new beetle | 2.0 | 1999 | 4 | auto(l4) | f | 19 | 26 | r | subcompact |
volkswagen | new beetle | 2.5 | 2008 | 5 | manual(m5) | f | 20 | 28 | r | subcompact |
volkswagen | new beetle | 2.5 | 2008 | 5 | auto(s6) | f | 20 | 29 | r | subcompact |
volkswagen | passat | 1.8 | 1999 | 4 | manual(m5) | f | 21 | 29 | p | midsize |
volkswagen | passat | 1.8 | 1999 | 4 | auto(l5) | f | 18 | 29 | p | midsize |
volkswagen | passat | 2.0 | 2008 | 4 | auto(s6) | f | 19 | 28 | p | midsize |
volkswagen | passat | 2.0 | 2008 | 4 | manual(m6) | f | 21 | 29 | p | midsize |
volkswagen | passat | 2.8 | 1999 | 6 | auto(l5) | f | 16 | 26 | p | midsize |
volkswagen | passat | 2.8 | 1999 | 6 | manual(m5) | f | 18 | 26 | p | midsize |
volkswagen | passat | 3.6 | 2008 | 6 | auto(s6) | f | 17 | 26 | p | midsize |
(Small data) example: mpg dataset, cty variable
Mean: \(16.8589744\), standard deviation: \(4.2559457\)
Mean: \(16.8589744\), standard error: \(0.2782199\)
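The standard error here agrees with the plug-in value computed from the standard deviation of cty and the number of rows in the dataset, \(n = 234\):
\[ \widehat{\text{se}}(\overline{x}) = \frac{s}{\sqrt{n}} = \frac{4.2559457}{\sqrt{234}} \approx 0.2782 \]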
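The fundamental building block is Bayes' Theorem. Let \(g(\mu)\) be the prior density on the parameter \(\mu\), \(f(x | \mu)\) the density of the data given \(\mu\), and \(f(x)\) the marginal density of the data.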
Then \[ g(\mu | x) = \frac{f(x | \mu)g(\mu)}{f(x)} \]
If \(g\) measures our belief about the possible values of \(\mu\), then Bayes' rule provides a systematic update rule: how new information changes that belief.
By changing our notation, Bayes' rule can be rewritten using likelihoods as:
\[ g(\mu | x) = c_x\mathcal L(\mu|x)g(\mu) \]
where \(c_x\) is a constant ensuring \(\int_\Omega g(\mu|x)d\mu = 1\).
When deciding between two specific points,
\[ \frac{g(\mu_1|x)}{g(\mu_2|x)} = \frac{g(\mu_1)}{g(\mu_2)}\cdot\frac{\mathcal L(\mu_1|x)}{\mathcal L(\mu_2|x)} \]
The posterior odds ratio is the prior odds ratio times the likelihood ratio
In Bayesian Inference, instead of a single value \(\hat\theta\) for a parameter, an entire probability distribution is estimated and updated.
Everything starts with the prior: the distribution that gets updated. This prior should preferably encode everything we know about the situation going in.
Even with a badly chosen prior, sufficiently consistent results will often quickly adjust the distribution.
Flip a coin to check if fair. Start with prior belief \(\mathbb{P}(H)\sim\text{Beta}(9,1)\) (mean \(\mathbb{P}(H)=0.9\))
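With a Beta prior on \(\mathbb{P}(H)\), the update has a closed form: observing heads and tails adds their counts to the two Beta parameters. A sketch of how quickly the skewed prior adjusts; the 50/50 flip counts and the helper names are hypothetical:

```r
# Conjugate update: a Beta(a, b) prior on P(H) plus observed coin flips
# gives a Beta(a + heads, b + tails) posterior.
update_beta <- function(a, b, heads, tails) c(a = a + heads, b = b + tails)
beta_mean   <- function(p) unname(p["a"] / (p["a"] + p["b"]))

prior <- c(a = 9, b = 1)                             # prior from the text above, mean 0.9
post  <- update_beta(9, 1, heads = 50, tails = 50)   # hypothetical flips of a fair coin

beta_mean(prior)   # 0.9
beta_mean(post)    # 59 / 110, about 0.536
```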
If not enough information is available at the start, one option is to pick a prior designed not to encode assumptions (an uninformative prior).
An engineer makes 12 measurements using a calibrated voltmeter with normally distributed error (sd = 1).
92.50 | 91.12 | 92.37 | 92.17 | 93.95 | 92.16 |
90.43 | 92.33 | 92.45 | 91.99 | 92.05 | 92.68 |
She calculates \(\overline{x} = 92.18\), an unbiased estimate of the true voltage.
The next day, she discovers the voltmeter truncates measurements at 100 - anything larger is reported as 100.
Is the estimate unbiased?
Frequentist answer: NO - because the probability family has changed.
Bayesian answer: YES - because the update rule only depends on the actual data points, and none of them came near the truncation point.
Total Error = \(\text{Bias}^2\) + Variance + Irreducible Error
Speed can be increased by using more memory.
Memory footprint can be decreased by using more time.
More complex models adapt closer to training data.
More complex models may behave badly out-of-sample.