Some data sets to look at:
Florida: see how to predict with outlier removed (Bush vs Buchanan)
Ages: predictor: age, response maxrate.
Aptitude (Kitchens): aptitude predictor, productivity response.
Eggs: feed supplement, number of eggs.
AdSales: ad vs. sales
The basic model
$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$
where the $\epsilon_i$ are i.i.d. normal with mean 0 and variance $\sigma^2$.
We saw that least squares gives the following
$$\hat{y} = b_0 + b_1 x$$
where $b_0$ and $b_1$ estimate $\beta_0$ and $\beta_1$. We had
$$b_1 = s_{xy}/s_x^2, \quad \bar{y} = b_0 + b_1 \bar{x}$$
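The two formulas above can be sketched in a few lines of Python. This is only an illustration: the data are made up (not one of the data sets listed above) and the function name `least_squares` is hypothetical.

```python
from statistics import mean

def least_squares(x, y):
    """Return (b0, b1) using b1 = s_xy / s_x^2 and y_bar = b0 + b1 * x_bar."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    # Sample covariance and sample variance (both divide by n - 1).
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    b1 = s_xy / s_x2
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

The fitted line always passes through $(\bar{x}, \bar{y})$, which is what the second formula says.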
First, how do we check whether the model is accurate? The key is the residual
$$e_i = y_i - \hat{y_i} = y_i - (b_0 + b_1 x_i)$$
Under the model assumptions, the $e_i$ should look approximately like an i.i.d. sample from a normal distribution with mean 0 and variance $\sigma^2$. We can check this graphically with:
normal probability plot of the residuals
histogram of residuals
Plot against $i$ (time dependence)
Plot against $x_i$.
Look for normality in the first two; in the latter two, look for violations of the independence assumption.
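A rough sketch of how the numbers behind these plots could be computed, using only the Python standard library. The data are made up, `b0 = 2.2` and `b1 = 0.6` are the least-squares values for that made-up data, and the plotting positions $(i - 0.5)/n$ are one common convention chosen here as an assumption. With so few points this only shows the mechanics; in practice one would hand the pairs to a plotting tool.

```python
from statistics import NormalDist

# Made-up illustrative data (not one of the data sets listed above).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
# Least-squares coefficients for this made-up data.
b0, b1 = 2.2, 0.6

def residuals(x, y, b0, b1):
    """Residuals e_i = y_i - (b0 + b1 * x_i)."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def normal_plot_points(e):
    """Pair each sorted residual with a standard normal quantile.

    Uses plotting positions (i - 0.5) / n, one common convention (an assumption).
    """
    n = len(e)
    std_normal = NormalDist()
    quantiles = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(quantiles, sorted(e)))

e = residuals(x, y, b0, b1)

# Points for the probability plot: roughly a straight line if the residuals are normal.
for q, r in normal_plot_points(e):
    print(f"normal quantile {q:6.3f}   residual {r:6.3f}")

# Pairs for the plots against i and against x_i: look for trends or patterns.
for i, (xi, ei) in enumerate(zip(x, e), start=1):
    print(f"i={i}  x={xi}  e={ei:6.3f}")
```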
If we are satisfied the model is correct, we can then make inferences about the parameters.
The slope estimate $b_1$ is an unbiased estimator of $\beta_1$, and furthermore
$$t = \frac{b_1 -\beta_1}{SE(b_1)}$$
has a $t$ distribution with $n-2$ d.f., where
$$SE(b_1) = \frac{s}{\sqrt{SS_x}}$$
with $s^2 = \sum e_i^2/(n-2)$ and $SS_x = \sum (x_i - \bar{x})^2 = (n-1)s_x^2$.
MINITAB performs a test of $\beta_1 = 0$ (no slope) in the regression analysis. We could do so ourselves easily enough, knowing the distribution of $t$ above.
A C.I. for $\beta_1$ would then be
$$b_1 \pm t^* SE(b_1)$$
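As a hedged sketch of doing the test and the C.I. by hand: the function below (hypothetical name `slope_inference`) takes the critical value `t_star` as an argument, to be looked up in a $t$ table with $n-2$ d.f. It assumes $SE(b_1) = s/\sqrt{SS_x}$ with $SS_x = \sum (x_i - \bar{x})^2$ and $s^2 = \sum e_i^2/(n-2)$. The data are made up.

```python
from math import sqrt
from statistics import mean

def slope_inference(x, y, t_star, beta1_null=0.0):
    """Return (b1, SE(b1), t statistic, C.I. for beta_1).

    t_star is the t critical value with n - 2 d.f., supplied by the caller.
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    b0 = y_bar - b1 * x_bar
    # s estimates sigma from the residuals, with n - 2 degrees of freedom.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))
    se_b1 = s / sqrt(ss_x)
    t = (b1 - beta1_null) / se_b1
    ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)
    return b1, se_b1, t, ci

# Made-up illustrative data; n = 5, so n - 2 = 3 d.f.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
# 3.182 is the two-sided 95% critical value for 3 d.f. from a t table.
b1, se_b1, t, ci = slope_inference(x, y, t_star=3.182)
print(f"b1 = {b1:.3f}, SE = {se_b1:.3f}, t = {t:.3f}")
print(f"95% C.I. for beta_1: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Comparing $|t|$ with `t_star` gives the two-sided test of $\beta_1 = 0$; equivalently, check whether 0 lies inside the C.I.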
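MINITAB plots two kinds of intervals for $y$ at a given $x$: one for the mean response and one for an individual response. The second is systematically wider.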
If we know $x$ and want to estimate the mean value of the corresponding $y$, then at $x = x'$ the C.I. is
$$b_0 + b_1 x' \pm t^* SE(\hat{y})$$
where
$$SE(\hat{y}) = s \sqrt{ \frac{1}{n} + \frac{(x'-\bar{x})^2}{SS_x}}$$
To predict an individual value of $y$ at $x'$, one gets a similar interval, except for the standard error, which picks up an extra term:
$$b_0 + b_1 x' \pm t^* SE(\hat{y})$$
where
$$SE(\hat{y}) = s \sqrt{ 1 + \frac{1}{n} + \frac{(x'-\bar{x})^2}{SS_x}}$$
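The two standard errors differ only by the extra 1 under the square root, so the interval for an individual value is always the wider one. A minimal sketch, with made-up data, a hypothetical function name `intervals_at`, and `t_star` again supplied from a $t$ table with $n-2$ d.f.:

```python
from math import sqrt
from statistics import mean

def intervals_at(x, y, x_new, t_star):
    """Return (y_hat, interval for the mean of y, interval for an individual y) at x_new."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    b0 = y_bar - b1 * x_bar
    s = sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    y_hat = b0 + b1 * x_new
    # Shared part of both standard errors; grows as x_new moves away from x_bar.
    core = 1 / n + (x_new - x_bar) ** 2 / ss_x
    se_mean = s * sqrt(core)        # for the mean response
    se_indiv = s * sqrt(1 + core)   # for an individual response
    mean_ci = (y_hat - t_star * se_mean, y_hat + t_star * se_mean)
    indiv_pi = (y_hat - t_star * se_indiv, y_hat + t_star * se_indiv)
    return y_hat, mean_ci, indiv_pi

# Made-up illustrative data; 3.182 is the 95% t critical value for 3 d.f.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat, mean_ci, indiv_pi = intervals_at(x, y, x_new=3, t_star=3.182)
print(f"y_hat = {y_hat:.3f}")
print(f"mean response:       ({mean_ci[0]:.3f}, {mean_ci[1]:.3f})")
print(f"individual response: ({indiv_pi[0]:.3f}, {indiv_pi[1]:.3f})")
```

Both intervals are narrowest at $x' = \bar{x}$ and widen as $x'$ moves away from it, which is why the plotted bands curve outward.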