4/2/2020
Recall that, for normally distributed residuals, the least-squares estimators are themselves normally distributed: \[ \hat\beta_i\sim\mathcal{N}\left(\beta_i,\ \sigma^2c_{ii}\right) \]
We can use this information to construct confidence intervals and hypothesis tests for the coefficients \(\beta_i\).
Large-sample confidence intervals and tests can be constructed from any normally distributed statistic with known variance. Here, we get a test statistic \[ Z = \frac{\hat\beta_i-\beta_{i0}}{\sigma\sqrt{c_{ii}}}\sim\mathcal{N}(0,1) \]
Using this as a pivot and \(z_\alpha=F^{-1}_{\mathcal{N}(0,1)}(1-\alpha)\), we get confidence intervals \[ \beta_i\in\hat\beta_i\pm z_{\alpha/2}\sigma\sqrt{c_{ii}} \]
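As a quick numeric check of this pivot, here is a minimal sketch in R using simulated data, so that the true \(\sigma\) is actually known; the data and all names here are illustrative, not part of the worked example later in these notes.

```r
# Sketch: known-sigma z interval for beta_1 on simulated data.
# Recall c_11 = 1/S_xx, so the standard error of beta_1-hat is sigma/sqrt(S_xx).
set.seed(1)
x <- runif(20, 0, 10)
sigma <- 2                                # true sigma, known by construction
y <- 2 + 3 * x + rnorm(20, sd = sigma)    # true beta_0 = 2, beta_1 = 3

Sxx <- sum((x - mean(x))^2)
c11 <- 1 / Sxx
beta1_hat <- coef(lm(y ~ x))[2]

beta1_hat + c(-1, 1) * qnorm(0.975) * sigma * sqrt(c11)   # 95% z interval
```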
Notice that \(Z\) is normal, \(W=(n-2)S^2/\sigma^2\) is \(\chi^2(n-2)\), and these quantities are independent of each other.
\[ T = \frac{Z}{\sqrt{W/(n-2)}} = \frac{\hat\beta_i-\beta_{i0}}{\sigma\sqrt{c_{ii}}}\bigg/\sqrt{\frac{(n-2)S^2/\sigma^2}{n-2}} = \\ \frac{\hat\beta_i-\beta_{i0}}{\sigma\sqrt{c_{ii}}}\cdot\frac{\sigma}{S} = \frac{\hat\beta_i-\beta_{i0}}{S\sqrt{c_{ii}}}\sim t(n-2) \]
\(T\) works as a test statistic for us, using the \(t(n-2)\) distribution to create cutoff values for the rejection regions. We can also use \(T\) as a pivot; with \(t_\alpha=F^{-1}_{t(n-2)}(1-\alpha)\), we get confidence intervals
\[ \beta_i\in\hat\beta_i\pm t_{\alpha/2}S\sqrt{c_{ii}} \]
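Continuing the simulated sketch above (reusing `x`, `y`, and `c11`), we can compute this t interval by the formula and check it against R's `confint()`:

```r
# Sketch: hand-computed t interval for beta_1, checked against confint().
model <- lm(y ~ x)
n <- length(x)
S <- summary(model)$sigma        # S = sqrt(SSE / (n - 2))
t_cut <- qt(0.975, df = n - 2)   # t_{alpha/2} with alpha = 0.05

coef(model)[2] + c(-1, 1) * t_cut * S * sqrt(c11)   # by the formula above
confint(model, "x", level = 0.95)                   # the same interval from R
```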
We may have reason to seek inferences about linear combinations of the coefficients. One example is to make inferences about \(\beta_0+\beta_1x^*\) for some (new) value \(x^*\).
We can create an estimator \(\hat\theta=a_0\hat\beta_0+a_1\hat\beta_1\). By linearity of \(\mathbb{E}\), this is unbiased for \(\theta=a_0\beta_0+a_1\beta_1\).
We know values for \(\mathbb{V}\hat\beta_0, \mathbb{V}\hat\beta_1, \text{cov}(\hat\beta_0,\hat\beta_1)\); these allow us to calculate \[ \mathbb{V}\hat\theta = a_0^2\mathbb{V}\hat\beta_0 + a_1^2\mathbb{V}\hat\beta_1 + 2a_0a_1\text{cov}(\hat\beta_0,\hat\beta_1) = \\ \sigma^2(a_0^2c_{00} + a_1^2c_{11} + 2a_0a_1c_{01}) \\ c_{00} = \frac{\sum x_i^2}{nS_{xx}} \qquad c_{01} = -\frac{\overline x}{S_{xx}} \qquad c_{11} = \frac{1}{S_{xx}} \]
Knowing \(\mathbb{V}\hat\theta\) means we can create Wald tests and confidence intervals. Using the test statistic \[ Z = \frac{\hat\theta-\theta_0}{\sigma\sqrt{a_0^2c_{00}+a_1^2c_{11}+2a_0a_1c_{01}}} \sim\mathcal{N}(0,1) \] we use the usual rejection regions. Writing \(z_\alpha=F^{-1}_{\mathcal{N}(0,1)}(1-\alpha)\) we get confidence intervals \[ \theta\in\hat\theta\pm z_{\alpha/2}\sigma\sqrt{a_0^2c_{00}+a_1^2c_{11}+2a_0a_1c_{01}} \]
Since the standard error \(\sigma\sqrt{a_0^2c_{00}+a_1^2c_{11}+2a_0a_1c_{01}}\) has \(\sigma\) as a factor, we can do the usual manipulations and compute \[ T = \frac{Z}{\sqrt{W/(n-2)}} = Z\cdot\frac{\sigma}{S} =\\ \frac{\hat\theta-\theta_0}{S\sqrt{a_0^2c_{00}+a_1^2c_{11}+2a_0a_1c_{01}}} \sim t(n-2) \] with the usual rejection regions. Writing \(t_\alpha=F^{-1}_{t(n-2)}(1-\alpha)\) we get confidence intervals \[ \theta\in\hat\theta\pm t_{\alpha/2}S\sqrt{a_0^2c_{00}+a_1^2c_{11}+2a_0a_1c_{01}} \]
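In R, the estimated covariance matrix \(S^2C\) of the coefficients is returned by `vcov()`, which makes the standard error of \(\hat\theta\) a one-liner. A sketch, continuing the simulated example above; the choice \(a_0=1, a_1=4\) is arbitrary:

```r
# Sketch: t interval for theta = a0*beta0 + a1*beta1.
# vcov(model) equals S^2 * C, so t(a) %*% vcov(model) %*% a
# is S^2 * (a0^2 c00 + a1^2 c11 + 2 a0 a1 c01).
a <- c(1, 4)
theta_hat <- sum(a * coef(model))
se_theta  <- sqrt(drop(t(a) %*% vcov(model) %*% a))

theta_hat + c(-1, 1) * qt(0.975, df = n - 2) * se_theta   # 95% t interval
```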
Of particular interest is the linear combination we get by evaluating the regression function at a new observation. In this case, \(a_0=1\) and \(a_1=x^*\). \[ \mathbb{V}\hat\theta = \sigma^2(a_0^2c_{00} + a_1^2c_{11} + 2a_0a_1c_{01}) = \\ \frac{\sigma^2}{S_{xx}}\left(a_0^2\frac{\sum x_i^2}{n}+a_1^2-2a_0a_1\overline x\right) = \\ \frac{\sigma^2}{S_{xx}}\left(\frac{\sum x_i^2}{n}+x^{*2}-2x^*\overline x\right) \]
Add and subtract \(\overline x^2\) to get \[ \frac{\sigma^2}{S_{xx}}\left(\color{green}{\frac{\sum x_i^2}{n}-\frac{n\overline x^2}{n}}+\color{purple}{\overline x^2+x^{*2}-2x^*\overline x}\right) = \\ \frac{\sigma^2}{S_{xx}}\left(\color{green}{\frac{S_{xx}}{n}}+\color{purple}{(x^*-\overline x)^2}\right) = \\ \sigma^2\left(\frac{1}{n}+\frac{(x^*-\overline x)^2}{S_{xx}}\right) \]
We get an estimation confidence interval of \[ Y^*=\beta_0+\beta_1x^*\in \hat\beta_0+\hat\beta_1x^* \pm t_{\alpha/2} S \sqrt{\frac{1}{n}+\frac{(x^*-\overline x)^2}{S_{xx}}} \]
The width of this interval shrinks when:

- the sample size \(n\) grows,
- the spread \(S_{xx}\) of the predictor values grows,
- \(x^*\) lies close to \(\overline x\), and
- the residual standard error \(S\) shrinks.
It can be more interesting to predict a range for the next observation than a range for the long-term average value.
For the new value \(Y^*=\beta_0+\beta_1x^*+\epsilon\) we would want an estimator. One choice is \[ \hat Y^*=\hat\beta_0+\hat\beta_1x^* \]
This is simultaneously an estimator of the expected value \(\mathbb{E}Y\) and a predictor for the next value \(Y^*\).
We would seek to minimize the prediction error \(Y^*-\hat Y^*\).
\[ \mathbb{E}[Y^*-\hat Y^*] = \mathbb{E}Y^* - \mathbb{E}\hat Y^* = (\beta_0+\beta_1x^*) - (\beta_0+\beta_1x^*) = 0 \\ \mathbb{V}(Y^*-\hat Y^*) = \mathbb{V}Y^* + \mathbb{V}\hat Y^* - 2\text{cov}(Y^*,\hat Y^*) \]
Since the new observation is new, it is independent of the values for \(\hat\beta_0\) and \(\hat\beta_1\). Hence the covariance vanishes.
\[ \mathbb{V}(Y^*-\hat Y^*) = \mathbb{V}Y^* + \mathbb{V}\hat Y^* = \sigma^2 + \sigma^2\left(\frac{1}{n}+\frac{(x^*-\overline x)^2}{S_{xx}}\right) =\\ \sigma^2\left(1+\frac{1}{n}+\frac{(x^*-\overline x)^2}{S_{xx}}\right) \]
If \(\epsilon\sim\mathcal{N}(0,\sigma^2)\), then \(Y^*-\hat Y^*\) is also normal. We get a Wald statistic and a T-statistic \[ Z = \frac{Y^*-\hat Y^*}{\sigma\sqrt{1+\frac{1}{n}+\frac{(x^*-\overline x)^2}{S_{xx}}}} \text{ and } T = \frac{Y^*-\hat Y^*}{S\sqrt{1+\frac{1}{n}+\frac{(x^*-\overline x)^2}{S_{xx}}}} \] with \(Z\sim\mathcal{N}(0,1)\) and \(T\sim t(n-2)\). Using \(T\) as a pivot, we get prediction intervals \[ Y^*\in\hat\beta_0+\hat\beta_1x^*\pm t_{\alpha/2}S \sqrt{1+\frac{1}{n}+\frac{(x^*-\overline x)^2}{S_{xx}}} \]
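Continuing the simulated sketch (with an arbitrary new point \(x^*=4\)), the prediction interval can be computed by the formula and checked against R's `predict()`:

```r
# Sketch: 95% prediction interval at a new x*, by hand and via predict().
x_star <- 4
se_pred <- S * sqrt(1 + 1/n + (x_star - mean(x))^2 / Sxx)

(coef(model)[1] + coef(model)[2] * x_star) +
  c(-1, 1) * qt(0.975, df = n - 2) * se_pred                     # by the formula
predict(model, data.frame(x = x_star), interval = "prediction")  # from R
```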
Notice the \(1+\) in the square root here. Estimation intervals can shrink to arbitrarily tight intervals, but prediction interval widths have a lower bound of \(2z_{\alpha/2}\sigma\). Prediction cannot get better than this, no matter how much data is collected. The way to get better prediction is to reduce the inherent uncertainty \(\sigma\).
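A short simulation illustrates this floor; a sketch under the same simulated setup as above, with \(\sigma=2\), so the bound is \(2\cdot 1.96\cdot 2\approx 7.84\):

```r
# Sketch: 95% prediction interval widths approach 2 * z_{alpha/2} * sigma
# as n grows. With sigma = 2 the floor is about 2 * 1.96 * 2 = 7.84.
pi_width <- function(n, sigma = 2) {
  x <- runif(n, 0, 10)
  y <- 2 + 3 * x + rnorm(n, sd = sigma)
  pr <- predict(lm(y ~ x), data.frame(x = 5), interval = "prediction")
  pr[, "upr"] - pr[, "lwr"]
}
set.seed(1)
sapply(c(10, 100, 10000), pi_width)   # widths shrink toward ~7.84
```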
Example: two different methods, flow-through and static, were used to measure LC50 for each of ten toxicants.
```r
library(tidyverse)   # provides tribble() and the %>% pipe

toxicants = tribble(
  ~toxicant, ~flow, ~static,
          1, 23.00,   39.00,
          2, 22.30,   37.50,
          3,  9.40,   22.20,
          4,  9.70,   17.50,
          5,   .15,     .64,
          6,   .28,     .45,
          7,   .75,    2.62,
          8,   .51,    2.36,
          9, 28.00,   32.00,
         10,   .39,     .77
)
```
We are to fit \(\text{flow} = \beta_0 + \beta_1\cdot\text{static} + \epsilon\). This is handled in R by the command `lm` (linear model):
```r
model = lm(response ~ predictor, data = dataset)
```
```r
model = lm(flow ~ static, data = toxicants)
summary(model)
```
```
## 
## Call:
## lm(formula = flow ~ static, data = toxicants)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4356 -1.4347 -0.2905  0.5578  7.7430 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.71104    1.48824  -0.478    0.646    
## static       0.65525    0.06818   9.610 1.14e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.313 on 8 degrees of freedom
## Multiple R-squared:  0.9203, Adjusted R-squared:  0.9103 
## F-statistic: 92.35 on 1 and 8 DF,  p-value: 1.141e-05
```
Recall the assumptions we made when deriving the linear model:

- the mean response is linear in the predictor: \(\mathbb{E}[Y\mid x]=\beta_0+\beta_1x\),
- the errors \(\epsilon_i\) are independent,
- the errors have constant variance \(\sigma^2\) (homoscedasticity), and
- the errors are normally distributed.
We can check for homoscedasticity by examining a scatter plot of residuals vs. predicted values.
We can check for normal residuals by examining a quantile-quantile plot of the residuals.
All of these quantities are available from the fitted model, most easily using `broom::augment`.
```r
library(ggformula)   # provides gf_point(), gf_qq(), gf_qqline()

broom::augment(model) %>% gf_point(.resid ~ .fitted)
```
```r
broom::augment(model) %>% gf_qq(~.resid) %>% gf_qqline()
```
Estimation and prediction are handled in R by the `predict` command:
```r
predict(model, predictor.dataset)
```
The command takes the model and a dataset with values for all the predictors, and returns a vector of predicted values.
Exercise 11.8 asks for an estimate of the flow-through LC50 where the static LC50 has been measured as 12 ppm.
```r
predict(model, data.frame(static = 12))
```
```
##        1 
## 7.151994
```
Exercise 11.37 asks for a 95% confidence interval for the estimated mean flow-through LC50 where the static LC50 has been measured as 12 ppm.
```r
predict(model, data.frame(static = 12), interval = "confidence")
```
```
##        fit      lwr      upr
## 1 7.151994 4.674353 9.629636
```
Exercise 11.45 asks for a 95% prediction interval for the flow-through LC50 where the static LC50 has been measured as 12 ppm.
```r
predict(model, data.frame(static = 12), interval = "prediction")
```
```
##        fit        lwr      upr
## 1 7.151994 -0.8785919 15.18258
```
The prediction interval is quite a bit wider than the confidence interval, reflecting the extra \(1+\) term in the prediction standard error.
By estimating and predicting along a grid of values, we can get the ingredients needed to plot prediction and estimation bands in the scatter plot of the data.
```r
tox_pred = cbind(
  data.frame(static = seq(0, 40)),
  predict(model, data.frame(static = seq(0, 40)), interval = "confidence"),
  predict(model, data.frame(static = seq(0, 40)), interval = "prediction")
)
names(tox_pred) <- c("static", "flow", "lwr_ci", "upr_ci", "fit", "lwr_pi", "upr_pi")
```
`cbind` here adds new columns to a dataset.
The first few rows of `tox_pred`:
| static | flow | lwr_ci | upr_ci | fit | lwr_pi | upr_pi |
|---|---|---|---|---|---|---|
| 0 | -0.7110386 | -4.1429286 | 2.720851 | -0.7110386 | -9.085373 | 7.663296 |
| 1 | -0.0557858 | -3.3778317 | 3.266260 | -0.0557858 | -8.385708 | 8.274136 |
| 2 | 0.5994669 | -2.6166726 | 3.815606 | 0.5994669 | -7.688788 | 8.887721 |
| 3 | 1.2547197 | -1.8598532 | 4.369293 | 1.2547197 | -6.994654 | 9.504094 |
| 4 | 1.9099724 | -1.1078116 | 4.927756 | 1.9099724 | -6.303348 | 10.123293 |
| 5 | 2.5652252 | -0.3610219 | 5.491472 | 2.5652252 | -5.614905 | 10.745355 |
```r
tox_pred %>%
  gf_line(flow ~ static) %>%
  gf_ribbon(lwr_ci + upr_ci ~ static, alpha = 0.5) %>%
  gf_ribbon(lwr_pi + upr_pi ~ static, alpha = 0.5) %>%
  gf_point(flow ~ static, data = toxicants, color = ~"Data")
```