Some data sets to look at:

The basic model

$$Y_i = \beta_0 + \beta_1 X_i + \epsilon_i$$

where the $\epsilon_i$ are i.i.d. normal with mean 0 and variance $\sigma^2$.

We saw that least squares gives the following fitted line

$$\hat{y} = b_0 + b_1 x$$

where $b_0$ and $b_1$ estimate $\beta_0$ and $\beta_1$. We had

$$b_1 = s_{xy}/s_x^2, \quad \bar{y} = b_0 + b_1 \bar{x}$$
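The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function name `least_squares` and the five data points are made up for the example and do not come from the course data sets.

```python
from statistics import mean

def least_squares(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    # Sample covariance s_xy and sample variance s_x^2 (both divide by n - 1).
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    b1 = s_xy / s_x2
    # The fitted line passes through (x_bar, y_bar), which determines b0.
    b0 = y_bar - b1 * x_bar
    return b0, b1

# Made-up illustrative data (an assumption, not from the notes).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

For this data $s_{xy} = 1.5$ and $s_x^2 = 2.5$, so the slope is $0.6$ and the intercept is $4 - 0.6 \cdot 3 = 2.2$.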

First, how do we check whether the model is appropriate? The key is the residual

$$e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i)$$

Under the model assumptions, the $e_i$ should look approximately like an i.i.d. sample from a normal distribution with mean 0 and variance $\sigma^2$. We can check this graphically with:

  1. Normal probability plot of the residuals

  2. Histogram of the residuals

  3. Plot against $i$ (time dependence)

  4. Plot against $x_i$.

In the first two, look for normality; in the latter two, look for patterns or trends that would contradict the independence assumption.
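The numbers behind these plots can be computed directly. The sketch below is hedged: the data and the helper names `residuals` and `normal_scores` are made up for illustration, and the plotting positions $(i - 0.5)/n$ are one common convention, so a statistics package may use a slightly different formula. It pairs the sorted residuals with approximate normal quantiles, which is what a probability plot displays.

```python
from statistics import NormalDist

def residuals(x, y, b0, b1):
    """Residuals e_i = y_i - (b0 + b1 * x_i)."""
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

def normal_scores(n):
    """Approximate standard normal quantiles for an ordered sample of size n."""
    # Plotting positions (i - 0.5) / n: an assumed convention, others exist.
    return [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# Made-up illustrative data with its least-squares fit (b0 = 2.2, b1 = 0.6).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

e = residuals(x, y, b0, b1)
pairs = list(zip(normal_scores(len(e)), sorted(e)))

# A roughly straight line in these pairs is consistent with normality.
for z, r in pairs:
    print(f"normal score {z:6.3f}   residual {r:6.3f}")

# Least-squares residuals always sum to zero (up to rounding).
print(f"sum of residuals = {sum(e):.6f}")
```

Plotting `e` against the index $i$ or against `x` uses the same list of residuals; only the horizontal axis changes.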

If we are satisfied the model is correct, we can then make inferences about the parameters.

$b_1$

This is an unbiased estimator of $\beta_1$ and furthermore

$$t = \frac{b_1 -\beta_1}{SE(b_1)}$$

has a $t$ distribution with $n-2$ d.f., where

$$SE(b_1) = \frac{s}{\sqrt{SS_x}}, \quad SS_x = \sum_i (x_i - \bar{x})^2 = (n-1)s_x^2, \quad s^2 = \frac{1}{n-2}\sum_i e_i^2$$

Minitab performs a test of $\beta_1 = 0$ (no slope) in the regression analysis. We could do so ourselves easily enough, knowing the distribution of $t$ above.

A C.I. for $\beta_1$ is then, with $t^*$ the critical value of the $t$ distribution with $n-2$ d.f.,

$$b_1 \pm t^* SE(b_1)$$
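As a rough sketch of doing the test and interval by hand, the code below computes $SE(b_1) = s/\sqrt{SS_x}$ with $SS_x = \sum_i (x_i - \bar{x})^2$ and $s^2 = \sum_i e_i^2/(n-2)$, then the $t$ statistic for $\beta_1 = 0$ and the interval. The data and the function name `slope_inference` are made up for illustration, and the critical value $t^*$ is typed in from a $t$ table rather than computed.

```python
from math import sqrt
from statistics import mean

def slope_inference(x, y, t_star):
    """Return (b1, SE(b1), t statistic for beta_1 = 0, CI for beta_1)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    b0 = y_bar - b1 * x_bar
    # s estimates sigma from the residuals, with n - 2 degrees of freedom.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = sqrt(sse / (n - 2))
    se_b1 = s / sqrt(ss_x)
    t = (b1 - 0) / se_b1          # test statistic under beta_1 = 0
    half_width = t_star * se_b1
    return b1, se_b1, t, (b1 - half_width, b1 + half_width)

# Made-up illustrative data (an assumption, not from the notes).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t_star = 3.182  # 95% two-sided critical value for n - 2 = 3 d.f., from a t table

b1, se_b1, t, (lo, hi) = slope_inference(x, y, t_star)
print(f"b1 = {b1:.3f}, SE(b1) = {se_b1:.4f}, t = {t:.3f}")
print(f"95% CI for beta_1: ({lo:.3f}, {hi:.3f})")
```

Here $|t|$ is smaller than $t^*$ and the interval contains 0, so with only five points this made-up data would not let us reject $\beta_1 = 0$ at the 5% level.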

Confidence intervals

Minitab plots two of them. One (the interval for predicting a single observation) is systematically wider than the other (the interval for the mean).

Confidence interval for the mean

If we know $x$ and want to estimate the mean value of the corresponding $y$, then at $x = x'$ the CI is

$$b_0 + b_1 x' \pm t^* SE(\hat{y})$$

where

$$SE(\hat{y}) = s \sqrt{ \frac{1}{n} + \frac{(x'-\bar{x})^2}{SS_x}}$$
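A quick check on this formula is the special case $x' = \bar{x}$, where the second term under the square root vanishes:

$$SE(\hat{y})\Big|_{x' = \bar{x}} = s \sqrt{\frac{1}{n}} = \frac{s}{\sqrt{n}}$$

This is the familiar standard error of a sample mean. As $x'$ moves away from $\bar{x}$, the term $(x'-\bar{x})^2/SS_x$ grows, so the interval is narrowest at $\bar{x}$ and widens toward the ends of the data.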

Contrast this with the prediction of a single observation.

Here one gets a similar interval, except that the standard error is larger. The interval is

$$b_0 + b_1 x' \pm t^* SE(\hat{y})$$

where now

$$SE(\hat{y}) = s \sqrt{ 1 + \frac{1}{n} + \frac{(x'-\bar{x})^2}{SS_x}}$$
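The two intervals can be compared numerically. The following is a minimal sketch under the same assumptions as the formulas above: the data and the function name `standard_errors` are made up for illustration, and $t^*$ is again read from a $t$ table.

```python
from math import sqrt
from statistics import mean

def standard_errors(x, y, x_new):
    """Return (fitted value, SE for the mean, SE for a single observation) at x_new."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    ss_x = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ss_x
    b0 = y_bar - b1 * x_bar
    # s estimates sigma from the residuals, with n - 2 degrees of freedom.
    s = sqrt(sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    core = 1 / n + (x_new - x_bar) ** 2 / ss_x
    # The extra 1 accounts for the scatter of one new observation about its mean.
    return b0 + b1 * x_new, s * sqrt(core), s * sqrt(1 + core)

# Made-up illustrative data (an assumption, not from the notes).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
t_star = 3.182  # 95% two-sided critical value for n - 2 = 3 d.f., from a t table

for x_new in (3, 5):
    fit, se_mean, se_pred = standard_errors(x, y, x_new)
    print(f"x' = {x_new}: fit = {fit:.2f}, "
          f"mean CI half-width = {t_star * se_mean:.3f}, "
          f"prediction half-width = {t_star * se_pred:.3f}")
```

At every $x'$ the prediction half-width exceeds the one for the mean, and both grow as $x'$ moves away from $\bar{x} = 3$.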