4/3/2020
The multivariate normal distribution with mean vector \(\mathbf{\mu}\) and covariance matrix \(\mathbf{\Sigma}\) is a vector-valued distribution such that if \(Y\sim\mathcal{N}(\mathbf{\mu},\mathbf{\Sigma})\), then \(\mathbb{E}[Y_i]=\mu_i\), \(\mathrm{Cov}(Y_i,Y_j)=\Sigma_{ij}\), and every linear combination \(a^TY\) follows a (univariate) normal distribution.
The last condition is enough to characterize the multivariate normal distributions.
All conditional and marginal distributions of a multivariate normal distribution are multivariate normal distributions.
The PDF is given by \[ f_Y(y_1,\dots,y_d) = \frac{1}{\sqrt{(2\pi)^d\det\mathbf{\Sigma}}} \exp\left[\frac{-1}{2}(y-\mathbf{\mu})^T\mathbf{\Sigma}^{-1}(y-\mathbf{\mu})\right] \]
If \(d=1\), then \(\mathbf{\mu}=(\mu)\) and \(\mathbf{\Sigma}=(\sigma^2)\), and we recover the usual normal distribution.
If \(d=2\), then \[ \mathbf{\mu}=\begin{pmatrix}\mu_X\\ \mu_Y\end{pmatrix} \quad \mathbf{\Sigma} = \begin{pmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y\\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix} \\ f(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\times\\ \hspace{-1.5em}\times\exp\left[ \frac{-1}{2(1-\rho^2)}\left( \frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} \right) \right] \]
If \((X,Y)\sim\mathcal{N}(\mu,\mathbf{\Sigma})\) is a bivariate normal distribution, then \(\rho=0\) if and only if \(X, Y\) are independent.
\[ f(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1\color{red}{-\rho^2}}}\times\\ \hspace{-1.5em}\times\exp\left[ \frac{-1}{2(1\color{red}{-\rho^2})}\left( \frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \color{red}{- \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}} \right) \right] \] When \(\rho=0\), the highlighted terms vanish and the density factors: \[ f(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y} \exp\left[ \frac{-1}{2}\left( \frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \right) \right] = f_X(x)\cdot f_Y(y) \]
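As a quick numerical sanity check (not from the original notes), we can compare this closed form to the general \(d\)-dimensional density. A minimal sketch, assuming the mvtnorm package is installed; all parameter values are made up:

```r
library(mvtnorm)  # assumed available: provides dmvnorm for the general MVN density

mu_x <- 1;  mu_y <- -2            # made-up parameters
s_x  <- 2;  s_y  <- 0.5; rho <- 0.7
mu    <- c(mu_x, mu_y)
Sigma <- matrix(c(s_x^2,           rho * s_x * s_y,
                  rho * s_x * s_y, s_y^2), nrow = 2)

# The d = 2 closed form from above (note the minus sign on the cross term)
f <- function(x, y) {
  q <- (x - mu_x)^2 / s_x^2 + (y - mu_y)^2 / s_y^2 -
       2 * rho * (x - mu_x) * (y - mu_y) / (s_x * s_y)
  exp(-q / (2 * (1 - rho^2))) / (2 * pi * s_x * s_y * sqrt(1 - rho^2))
}

f(0, 0)                                     # closed form
dmvnorm(c(0, 0), mean = mu, sigma = Sigma)  # general formula; same value
```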
We have been assuming that the \(x_i\) are deterministic, chosen by the researcher. We could instead study a setup where \((X,Y)\) is a random vector. We then use \[ Y = \beta_0+\beta_1X+\epsilon \quad\text{ to signify }\quad \mathbb{E}[Y|X=x] = \beta_0+\beta_1x \]
It is easier if we assume \((X,Y)\sim\mathcal{N}(\mu,\mathbf{\Sigma})\) is bivariate normal. Then, by the formulas for conditional expectations, \[ \mathbb{E}[Y|X=x] = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X) = \\ = \color{green}{\rho\frac{\sigma_Y}{\sigma_X}}\cdot x + \color{purple}{\mu_Y-\rho\frac{\sigma_Y}{\sigma_X}\mu_X} = \color{green}{\beta_1}\cdot x + \color{purple}{\beta_0} \]
The definition of \(\beta_1=\rho\sigma_Y/\sigma_X\) and \(\beta_0=\mu_Y-\beta_1\mu_X\) closely mirrors the development of \(\hat\beta_0\) and \(\hat\beta_1\).
In fact, our work with deterministic \(x_i\) carries over without major changes to random \(X_i\).
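To see this concretely, here is a simulation sketch, assuming the MASS package for sampling; the parameters are made up. Applying lm to a large bivariate normal sample should approximately recover \(\beta_0\) and \(\beta_1\):

```r
library(MASS)  # assumed available: mvrnorm samples from a multivariate normal

set.seed(1)
mu_x <- 1; mu_y <- 2; s_x <- 1.5; s_y <- 3; rho <- 0.6  # made-up parameters
Sigma <- matrix(c(s_x^2,           rho * s_x * s_y,
                  rho * s_x * s_y, s_y^2), nrow = 2)
xy <- mvrnorm(10000, mu = c(mu_x, mu_y), Sigma = Sigma)

beta1 <- rho * s_y / s_x       # theoretical slope
beta0 <- mu_y - beta1 * mu_x   # theoretical intercept
c(beta0, beta1)
coef(lm(xy[, 2] ~ xy[, 1]))    # estimates should be close to (beta0, beta1)
```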
It may be of more interest to study whether \(X\) and \(Y\) are independent than to study the coefficients of \(Y\) as a linear function of \(X\).
For bivariate normal variables, independence is the same as \(\rho=0\).
Hence, hypothesis testing on \(H_0:\rho=0\) is important.
Since \(\beta_1=\rho\sigma_Y/\sigma_X\), it follows that \[ \rho=\beta_1\frac{\sigma_X}{\sigma_Y} = \beta_1\sqrt{\frac{\sigma_X^2}{\sigma_Y^2}} \]
By the method of moments, using \(\hat\beta_1\) to estimate \(\beta_1\), we get an estimator \[ r = \hat\beta_1\sqrt{\frac{S_X^2}{S_Y^2}} = \hat\beta_1\sqrt{\frac{S_{XX}/n}{S_{YY}/n}} = \hat\beta_1\sqrt{\frac{S_{XX}}{S_{YY}}} = \\ \frac{S_{XY}}{S_{XX}}\sqrt{\frac{S_{XX}}{S_{YY}}} = \frac{S_{XY}}{\sqrt{S_{XX}S_{YY}}} \]
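As a sketch on made-up data, computing \(r\) from the sums of squares agrees with R's built-in cor:

```r
# Made-up data vectors, purely for illustration
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)

Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))

Sxy / sqrt(Sxx * Syy)  # r, computed from the sums of squares
cor(x, y)              # R's built-in Pearson correlation; same value
```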
If we knew a distribution for \(r\), we could build a test directly. Instead, we will go through \(\beta_1=\rho\sigma_Y/\sigma_X\). Since \(\sigma_X,\sigma_Y>0\), \(\beta_1\) and \(\rho\) have the same sign, so testing, e.g., for \(H_A:\rho>0\) is equivalent to testing for \(H_A:\beta_1>0\).
We worked out a test statistic for \(H_A:\beta_1>0\):
\[ T = \frac{\hat\beta_1-0}{S/\sqrt{S_{XX}}}\sim t(n-2) \text{ under }H_0:\beta_1=0 \]
Recall that \(SSE=S_{YY}-\hat\beta_1S_{XY}\) and that \(S^2=SSE/(n-2)\). Then,
\[ T = \frac{\hat\beta_1}{\sqrt{\color{purple}{S^2}/S_{XX}}} = \frac{\hat\beta_1\color{purple}{\sqrt{n-2}}}{\sqrt{\color{purple}{SSE}/S_{XX}}} = \frac{\hat\beta_1\sqrt{S_{XX}}\sqrt{n-2}}{\sqrt{\color{green}{SSE}}} = \\ \frac{\hat\beta_1\sqrt{S_{XX}}\sqrt{n-2}}{\sqrt{\color{green}{S_{YY}-\hat\beta_1S_{XY}}}} = \frac{\color{purple}{\hat\beta_1\sqrt{S_{XX}/S_{YY}}}\sqrt{n-2}}{\sqrt{1-\color{green}{\hat\beta_1S_{XY}/S_{YY}}}} = \frac{\color{purple}{r}\sqrt{n-2}}{\sqrt{1-\color{green}{r^2}}} \]
Here, we used \[ \hat\beta_1\sqrt{S_{XX}/S_{YY}} = \frac{S_{XY}}{S_{XX}}\cdot\sqrt{\frac{S_{XX}}{S_{YY}}} = \frac{S_{XY}}{\sqrt{S_{XX}S_{YY}}} = r \\ \hat\beta_1\frac{S_{XY}}{S_{YY}} = \frac{S_{XY}}{S_{XX}}\cdot\frac{S_{XY}}{S_{YY}} = \frac{S_{XY}}{\sqrt{S_{XX}S_{YY}}}\cdot\frac{S_{XY}}{\sqrt{S_{XX}S_{YY}}} = r^2 \]
Since \(\frac{\hat\beta_1}{S/\sqrt{S_{XX}}} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\), it follows that \[ T=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t(n-2) \] can be used to create T-tests for \(\beta_1\), and therefore for \(\rho\).
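A quick sketch on made-up data: computing \(T\) from \(r\) by hand reproduces the p-value that cor.test reports:

```r
x <- c(1, 2, 3, 4, 5)              # made-up data
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)
n <- length(x)
r <- cor(x, y)

t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
2 * pt(-abs(t_stat), df = n - 2)   # two-sided p-value for H_0: rho = 0
cor.test(x, y)$p.value             # cor.test agrees
```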
A confidence interval for \(T\) produces a confidence interval for \(\rho\) by solving for \(r\): \[ \begin{aligned} T &= r\sqrt{n-2}/\sqrt{1-r^2} \\ T^2 &= r^2(n-2)/(1-r^2) \\ T^2(1-r^2) &= r^2(n-2) \\ T^2-r^2T^2 &= r^2(n-2) \\ T^2 &= r^2(T^2+n-2) \\ r &= T/\sqrt{T^2+n-2} \end{aligned} \] Since \(r\) and \(T\) have the same sign, we take the root with the sign of \(T\).
Then, if \(t_{\ell}<T<t_u\) with specified probability, \(\rho\) is between \(t_\ell/\sqrt{t_\ell^2+n-2}\) and \(t_u/\sqrt{t_u^2+n-2}\).
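As an illustration of this computation with placeholder values \(n=10\) and \(\alpha=0.01\):

```r
n <- 10; alpha <- 0.01                      # placeholder sample size and level
t_l <- qt(alpha / 2, df = n - 2)            # lower t-quantile
t_u <- qt(1 - alpha / 2, df = n - 2)        # upper t-quantile
c(t_l, t_u) / sqrt(c(t_l, t_u)^2 + n - 2)   # transformed into bounds for r
```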
Recall \(\hat\beta_1=S_{XY}/S_{XX}\) and \(SSE=S_{YY}-\hat\beta_1S_{XY}\).
In particular, \(\hat\beta_1S_{XY} = S_{XY}^2/S_{XX}\) is a square divided by a sum of squares, and therefore nonnegative. This means \(SSE\leq S_{YY}\).
\(SSE = \sum(y_i-\hat y_i)^2\) measures the variance of the residuals: the variance that is left once the effect of the linear regression has been removed.
\(S_{YY} = \sum(y_i-\overline y)^2\) measures the variance of the \(y_i\): the total variance, ignoring any effect from the linear regression.
So \(SSE/S_{YY}\) is the proportion of variance left in the residuals, not explained by the linear regression.
Now, \[ r^2 = \left(\frac{S_{XY}}{\sqrt{S_{XX}S_{YY}}}\right)^2 = \frac{S_{XY}}{S_{XX}}\cdot\frac{S_{XY}}{S_{YY}} = \frac{\hat\beta_1S_{XY}}{S_{YY}} \]
Since \(SSE = S_{YY}-\hat\beta_1S_{XY}\), it follows \(\hat\beta_1S_{XY} = S_{YY}-SSE\). \[ r^2 = \frac{\hat\beta_1S_{XY}}{S_{YY}} = \frac{S_{YY}-SSE}{S_{YY}} = 1-\frac{SSE}{S_{YY}} \]
So \(r^2\) is the proportion of variance that was explained by the linear regression.
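This is the \(R^2\) that summary reports for a fitted lm model; a quick check on made-up data:

```r
x <- c(1, 2, 3, 4, 5)              # made-up data
y <- c(2.1, 3.9, 6.2, 8.1, 9.8)
fit <- lm(y ~ x)

summary(fit)$r.squared  # the "Multiple R-squared" that summary() prints
cor(x, y)^2             # the same number, r^2
```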
We are continuing with the toxicants data from last week:

```r
library(tibble)  # provides tribble

toxicants = tribble(
  ~toxicant, ~flow, ~static,
  1, 23.00, 39.00,
  2, 22.30, 37.50,
  3,  9.40, 22.20,
  4,  9.70, 17.50,
  5,  0.15,  0.64,
  6,  0.28,  0.45,
  7,  0.75,  2.62,
  8,  0.51,  2.36,
  9, 28.00, 32.00,
 10,  0.39,  0.77
)
```
We are asked to test \(H_0:\rho=0\) against \(H_A:\rho\neq0\) at \(\alpha=0.01\).
This is handled in R by the cor.test function:

```r
cor.test(toxicants$flow, toxicants$static, conf.level=0.99)
```
```
## 
##  Pearson's product-moment correlation
## 
## data:  toxicants$flow and toxicants$static
## t = 9.6099, df = 8, p-value = 1.141e-05
## alternative hypothesis: true correlation is not equal to 0
## 99 percent confidence interval:
##  0.7458948 0.9940916
## sample estimates:
##       cor 
## 0.9593121
```
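As a sanity check on the printed output (using the toxicants tibble from above and the reported t value), the relation between \(r\) and \(T\) derived earlier recovers the sample correlation:

```r
with(toxicants, cor(flow, static))  # 0.9593121, matching the output above

t_obs <- 9.6099                     # the reported t value, df = n - 2 = 8
t_obs / sqrt(t_obs^2 + 10 - 2)      # t/sqrt(t^2 + n - 2) recovers r
```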