
Multivariate and Bivariate Normal Distributions

The multivariate normal distribution

The multivariate normal distribution with mean vector \(\mathbf{\mu}\) and covariance matrix \(\mathbf{\Sigma}\) is a vector-valued distribution such that if \(Y\sim\mathcal{N}(\mathbf{\mu},\mathbf{\Sigma})\), then

  • \(\mathbb{E}(Y_i) = \mathbf{\mu}_i\)
  • \(\text{cov}(Y_i,Y_j) = \mathbf{\Sigma}_{ij}\)
  • All linear combinations of the \(Y_i\) have a normal distribution

The last condition is enough to characterize the multivariate normal distributions.
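These properties are easy to check by simulation. Below is a minimal sketch, assuming the MASS package is available for mvrnorm(); the mean vector, covariance matrix, and sample size are arbitrary choices made for illustration.

library(MASS)   # assumed available; provides mvrnorm()

set.seed(1)
mu    <- c(1, 2, 3)
Sigma <- matrix(c(4, 1, 0,
                  1, 2, 1,
                  0, 1, 3), nrow = 3)

Y <- mvrnorm(n = 10000, mu = mu, Sigma = Sigma)

colMeans(Y)   # close to mu
cov(Y)        # close to Sigma

# a linear combination of the components should again look normal
z <- Y %*% c(1, -2, 0.5)
qqnorm(z); qqline(z)   # points near the line: consistent with normality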

The multivariate normal distribution

All conditional and marginal distributions of a multivariate normal distribution are multivariate normal distributions.

The PDF is given by \[ f_Y(y_1,\dots,y_d) = \frac{1}{\sqrt{(2\pi)^d\det\mathbf{\Sigma}}} \exp\left[\frac{-1}{2}(y-\mathbf{\mu})^T\mathbf{\Sigma}^{-1}(y-\mathbf{\mu})\right] \]

The univariate and bivariate normal distributions

If \(d=1\), then \(\mathbf{\mu}=(\mu)\) and \(\mathbf{\Sigma}=(\sigma^2)\), and we recover the usual normal distribution.

If \(d=2\), then \[ \mathbf{\mu}=\begin{pmatrix}\mu_X\\ \mu_Y\end{pmatrix} \qquad \mathbf{\Sigma} = \begin{pmatrix} \sigma_X^2 & \rho\sigma_X\sigma_Y\\ \rho\sigma_X\sigma_Y & \sigma_Y^2 \end{pmatrix} \] and the density is \[ f(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\left[ \frac{-1}{2(1-\rho^2)}\left( \frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} - \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y} \right) \right] \]
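To see that this is just the \(d=2\) case of the matrix formula, one can evaluate both expressions at a test point; they should agree. A minimal sketch in base R; all parameter values and the point \((x,y)\) are made up for illustration.

mu_X <- 1;    mu_Y <- -1
sigma_X <- 2; sigma_Y <- 3; rho <- 0.5
x <- 0.7; y <- 1.2

# general matrix form of the density
mu    <- c(mu_X, mu_Y)
Sigma <- matrix(c(sigma_X^2,           rho*sigma_X*sigma_Y,
                  rho*sigma_X*sigma_Y, sigma_Y^2), nrow = 2)
v <- c(x, y) - mu
f_matrix <- exp(-0.5 * t(v) %*% solve(Sigma) %*% v) / sqrt((2*pi)^2 * det(Sigma))

# bivariate form written out
q <- (x - mu_X)^2 / sigma_X^2 + (y - mu_Y)^2 / sigma_Y^2 -
     2 * rho * (x - mu_X) * (y - mu_Y) / (sigma_X * sigma_Y)
f_biv <- exp(-q / (2 * (1 - rho^2))) / (2 * pi * sigma_X * sigma_Y * sqrt(1 - rho^2))

c(f_matrix, f_biv)   # the two values agree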

Non-Correlation implies Independence

Theorem

If \((X,Y)\sim\mathcal{N}(\mu,\mathbf{\Sigma})\) is a bivariate normal distribution, then \(\rho=0\) if and only if \(X, Y\) are independent.

Proof

When \(\rho=0\), the highlighted factors vanish and the density factors: \[ f(x,y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1\color{red}{-\rho^2}}} \exp\left[ \frac{-1}{2(1\color{red}{-\rho^2})}\left( \frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \color{red}{- \frac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}} \right) \right] = \\ \frac{1}{2\pi\sigma_X\sigma_Y} \exp\left[ \frac{-1}{2}\left( \frac{(x-\mu_X)^2}{\sigma_X^2} + \frac{(y-\mu_Y)^2}{\sigma_Y^2} \right) \right] = f_X(x)\cdot f_Y(y) \]

so \(X\) and \(Y\) are independent. Conversely, independent random variables are always uncorrelated, so independence implies \(\rho=0\).

Random Predictors

Random Predictors

We have been assuming that the \(x_i\) are deterministic, chosen by the researcher. We could instead study a setup where \((X,Y)\) is a random vector. We then use \[ Y = \beta_0+\beta_1X+\epsilon \quad\text{ to signify }\quad \mathbb{E}[Y|X=x] = \beta_0+\beta_1x \]

Bivariate Normal

It is easier if we assume \((X,Y)\sim\mathcal{N}(\mu,\mathbf{\Sigma})\) is bivariate normal. Then, by the formulas for conditional expectations, \[ \mathbb{E}[Y|X=x] = \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X) = \\ = \color{green}{\rho\frac{\sigma_Y}{\sigma_X}}\cdot x + \color{purple}{\mu_Y-\rho\frac{\sigma_Y}{\sigma_X}\mu_X} = \color{green}{\beta_1}\cdot x + \color{purple}{\beta_0} \]

The definition of \(\beta_1=\rho\sigma_Y/\sigma_X\) and \(\beta_0=\mu_Y-\beta_1\mu_X\) closely mirrors the development of \(\hat\beta_0\) and \(\hat\beta_1\).

In fact, our work with deterministic \(x_i\) carries over without major changes to random \(X_i\).
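A small simulation makes this concrete: draw \((X,Y)\) pairs from a bivariate normal, fit ordinary least squares, and compare the fitted coefficients with \(\rho\sigma_Y/\sigma_X\) and \(\mu_Y-\beta_1\mu_X\). This is a sketch assuming MASS::mvrnorm(); the parameter values are arbitrary.

library(MASS)   # assumed available; provides mvrnorm()

set.seed(1)
mu_X <- 2;      mu_Y <- 5
sigma_X <- 1.5; sigma_Y <- 3; rho <- 0.6

Sigma <- matrix(c(sigma_X^2,           rho*sigma_X*sigma_Y,
                  rho*sigma_X*sigma_Y, sigma_Y^2), nrow = 2)
XY <- mvrnorm(5000, mu = c(mu_X, mu_Y), Sigma = Sigma)

beta1 <- rho * sigma_Y / sigma_X   # theoretical slope
beta0 <- mu_Y - beta1 * mu_X       # theoretical intercept

coef(lm(XY[, 2] ~ XY[, 1]))        # fitted intercept and slope
c(beta0, beta1)                    # should be close to the fitted values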

Inference on Correlations

Correlation for Independence

It may be of more interest to study whether \(X\) and \(Y\) are independent than to study the coefficients of \(Y\) as a linear function of \(X\).

For bivariate normal variables, independence is the same as \(\rho=0\).

Hence, hypothesis testing on \(H_0:\rho=0\) is important.

Estimating \(\rho\)

Since \(\beta_1=\rho\sigma_Y/\sigma_X\), it follows that \[ \rho=\beta_1\frac{\sigma_X}{\sigma_Y} = \beta_1\sqrt{\frac{\sigma_X^2}{\sigma_Y^2}} \]

By the method of moments, using \(\hat\beta_1\) to estimate \(\beta_1\), we get an estimator \[ r = \hat\beta_1\sqrt{\frac{S_X^2}{S_Y^2}} = \hat\beta_1\sqrt{\frac{S_{XX}/n}{S_{YY}/n}} = \hat\beta_1\sqrt{\frac{S_{XX}}{S_{YY}}} = \\ \frac{S_{XY}}{S_{XX}}\sqrt{\frac{S_{XX}}{S_{YY}}} = \frac{S_{XY}}{\sqrt{S_{XX}S_{YY}}} \]
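The estimator is easy to compute directly from the sums of squares; the last line checks the hand computation against R's built-in cor(). The x and y vectors here are made-up example data.

x <- c(1.2, 2.5, 3.1, 4.8, 5.0)   # made-up example data
y <- c(2.0, 2.9, 3.8, 6.1, 5.7)

Sxx <- sum((x - mean(x))^2)
Syy <- sum((y - mean(y))^2)
Sxy <- sum((x - mean(x)) * (y - mean(y)))

r <- Sxy / sqrt(Sxx * Syy)
c(r, cor(x, y))   # the two agree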

Test Statistic

If we knew a distribution for \(r\), we could build a test directly. Instead, we will go through \(\beta_1=\rho\sigma_Y/\sigma_X\). Since \(\beta_1\) and \(\rho\) have the same sign, testing e.g. \(H_A:\rho>0\) is equivalent to testing \(H_A:\beta_1>0\).

We worked out a test statistic for \(H_A:\beta_1>0\):

\[ T = \frac{\hat\beta_1-0}{S/\sqrt{S_{XX}}}\sim t(n-2) \text{ under }H_0:\beta_1=0 \]

T-statistic from \(r\)

Recall that \(SSE=S_{YY}-\hat\beta_1S_{XY}\) and that \(S^2=SSE/(n-2)\). Then,

\[ T = \frac{\hat\beta_1}{\sqrt{\color{purple}{S^2}/S_{XX}}} = \frac{\hat\beta_1\color{purple}{\sqrt{n-2}}}{\sqrt{\color{purple}{SSE}/S_{XX}}} = \frac{\hat\beta_1\sqrt{S_{XX}}\sqrt{n-2}}{\sqrt{\color{green}{SSE}}} = \\ \frac{\hat\beta_1\sqrt{S_{XX}}\sqrt{n-2}}{\sqrt{\color{green}{S_{YY}-\hat\beta_1S_{XY}}}} = \frac{\color{purple}{\hat\beta_1\sqrt{S_{XX}/S_{YY}}}\sqrt{n-2}}{\sqrt{1-\color{green}{\hat\beta_1S_{XY}/S_{YY}}}} = \frac{\color{purple}{r}\sqrt{n-2}}{\sqrt{1-\color{green}{r^2}}} \]

Here, we used \[ \hat\beta_1\sqrt{S_{XX}/S_{YY}} = \frac{S_{XY}}{S_{XX}}\cdot\sqrt{\frac{S_{XX}}{S_{YY}}} = \frac{S_{XY}}{\sqrt{S_{XX}S_{YY}}} = r \\ \hat\beta_1\frac{S_{XY}}{S_{YY}} = \frac{S_{XY}}{S_{XX}}\cdot\frac{S_{XY}}{S_{YY}} = \frac{S_{XY}}{\sqrt{S_{XX}S_{YY}}}\cdot\frac{S_{XY}}{\sqrt{S_{XX}S_{YY}}} = r^2 \]

T-test for correlation

Since \(\frac{\hat\beta_1}{S/\sqrt{S_{XX}}} = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}\), it follows that \[ T=\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t(n-2) \] can be used to create T-tests for \(\beta_1\), and therefore for \(\rho\).
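As a numerical check, the statistic computed from \(r\) matches the t value that lm() reports for the slope. A sketch on the same made-up x and y vectors as above.

x <- c(1.2, 2.5, 3.1, 4.8, 5.0)   # made-up example data
y <- c(2.0, 2.9, 3.8, 6.1, 5.7)
n <- length(x)

r <- cor(x, y)
T_from_r <- r * sqrt(n - 2) / sqrt(1 - r^2)

fit <- lm(y ~ x)
T_from_lm <- summary(fit)$coefficients["x", "t value"]

c(T_from_r, T_from_lm)   # identical up to rounding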

Confidence Interval for \(\rho\)

A confidence interval for \(T\) produces a confidence interval for \(\rho\) by solving for \(r\): \[ \begin{aligned} T &= r\sqrt{n-2}/\sqrt{1-r^2} \\ T^2 &= r^2(n-2)/(1-r^2) \\ T^2-r^2T^2 &= r^2(n-2) \\ T^2 &= r^2(T^2+n-2) \\ r &= T/\sqrt{T^2+n-2} \end{aligned} \]

If \(t_{\ell}<T<t_u\) with specified probability, then \(\rho\) lies between \(t_\ell/\sqrt{t_\ell^2+n-2}\) and \(t_u/\sqrt{t_u^2+n-2}\) with that same probability.
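A quick numerical check of this inversion, with \(r\) and \(n\) chosen arbitrarily:

n <- 10
r <- 0.42                                  # arbitrary value in (-1, 1)
T_val <- r * sqrt(n - 2) / sqrt(1 - r^2)   # T computed from r
T_val / sqrt(T_val^2 + n - 2)              # recovers r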

\(r^2\) - the Coefficient of Determination

Bounding SSE

Recall \(\hat\beta_1=S_{XY}/S_{XX}\) and \(SSE=S_{YY}-\hat\beta_1S_{XY}\).

In particular, \(\hat\beta_1S_{XY} = S_{XY}^2/S_{XX}\) is a square divided by a sum of squares, and therefore nonnegative. This means \(SSE\leq S_{YY}\).

Variances

\(SSE = \sum(y_i-\hat y_i)^2\) measures the variance of the residuals - the variance that is left once the effect of the linear regression has been removed.

\(S_{YY} = \sum(y_i-\overline y)^2\) measures the variance of the \(y_i\) - the total variance ignoring any effect from the linear regression.

So \(SSE/S_{YY}\) is the proportion of variance that remains in the residuals, i.e. the proportion not explained by the linear regression.

Coefficient of Determination

Now, \[ r^2 = \left(\frac{S_{XY}}{\sqrt{S_{XX}S_{YY}}}\right)^2 = \frac{S_{XY}}{S_{XX}}\cdot\frac{S_{XY}}{S_{YY}} = \frac{\hat\beta_1S_{XY}}{S_{YY}} \]

Since \(SSE = S_{YY}-\hat\beta_1S_{XY}\), it follows \(\hat\beta_1S_{XY} = S_{YY}-SSE\). \[ r^2 = \frac{\hat\beta_1S_{XY}}{S_{YY}} = \frac{S_{YY}-SSE}{S_{YY}} = 1-\frac{SSE}{S_{YY}} \]

So \(r^2\) is the proportion of variance that was explained by the linear regression.
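Both expressions for \(r^2\) can be verified numerically, and they agree with the R-squared that summary(lm()) reports. A sketch on the same made-up x and y vectors as before.

x <- c(1.2, 2.5, 3.1, 4.8, 5.0)   # made-up example data
y <- c(2.0, 2.9, 3.8, 6.1, 5.7)

Syy <- sum((y - mean(y))^2)

fit <- lm(y ~ x)
SSE <- sum(resid(fit)^2)

c(cor(x, y)^2,              # r^2 as squared correlation
  1 - SSE / Syy,            # 1 - SSE/S_YY
  summary(fit)$r.squared)   # R's built-in value; all three agree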

Example (11.51)

The Data

We are continuing with the toxicants data from last week:

library(tibble)   # provides tribble()

toxicants = tribble(
  ~toxicant, ~flow, ~static,
  1, 23.00, 39.00,
  2, 22.30, 37.50,
  3,  9.40, 22.20,
  4,  9.70, 17.50,
  5,   .15,   .64,
  6,   .28,   .45,
  7,   .75,  2.62,
  8,   .51,  2.36,
  9, 28.00, 32.00,
  10,  .39,   .77
)

The Question

We are asked to test \(H_0:\rho=0\) against \(H_A:\rho\neq0\) at \(\alpha=0.01\).

This is handled in R by the cor.test function:

cor.test(toxicants$flow, toxicants$static, conf.level=0.99)
## 
##  Pearson's product-moment correlation
## 
## data:  toxicants$flow and toxicants$static
## t = 9.6099, df = 8, p-value = 1.141e-05
## alternative hypothesis: true correlation is not equal to 0
## 99 percent confidence interval:
##  0.7458948 0.9940916
## sample estimates:
##       cor 
## 0.9593121
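The reported numbers can be reproduced from the formulas above: \(r = S_{XY}/\sqrt{S_{XX}S_{YY}}\), \(T = r\sqrt{n-2}/\sqrt{1-r^2}\), and a two-sided p-value from the \(t(n-2)\) distribution. A sketch reusing the toxicants tibble defined earlier.

n <- nrow(toxicants)
r <- with(toxicants, cor(flow, static))
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

r                                    # matches the reported cor
t_stat                               # matches the reported t
2 * pt(-abs(t_stat), df = n - 2)     # matches the reported p-value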