29 January, 2018

Vector valued random variables

Nothing we have done with random variables really requires the variables to take values in \(\mathbb{R}\).

It will often be useful to work with random vectors: \(X:E\to\mathbb{R}^n\).

Here, the random vector produces a probability measure on Borel sets in \(\mathbb{R}^n\).

Each coordinate of the vector is a random variable. The expected value of the vector is the vector of expected values.

Covariance

The covariance matrix of a random vector \(X\) is the matrix \(Cov(X)_{ij} = Cov(X_i,X_j)\) whose entries are covariances between coordinates.

Write \((X-\mathbb{E}X)^T\) for the transpose of \(X-\mathbb{E}X\). The covariance matrix is \[ Cov(X) = \mathbb{E}\left[(X-\mathbb{E}X)(X-\mathbb{E}X)^T\right] \]

By a similar argument to the random variable case, \[ Cov(X) = \mathbb{E}(XX^T) - (\mathbb{E}X)(\mathbb{E}X^T) \]
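A minimal numpy sketch (not part of the notes) checking the two expressions for the covariance matrix against each other on simulated data; the dimensions and parameters are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n samples of a 3-dimensional random vector (rows are samples).
n = 100_000
X = rng.multivariate_normal(mean=[1.0, 2.0, 3.0],
                            cov=[[2.0, 0.5, 0.0],
                                 [0.5, 1.0, 0.3],
                                 [0.0, 0.3, 1.5]],
                            size=n)

# Cov(X) = E[(X - EX)(X - EX)^T], estimated by averaging outer products.
centered = X - X.mean(axis=0)
cov_def = centered.T @ centered / n

# Cov(X) = E[X X^T] - (EX)(EX)^T, the second formula from the notes.
m = X.mean(axis=0)
cov_alt = (X.T @ X) / n - np.outer(m, m)

print(np.allclose(cov_def, cov_alt))   # True, up to floating point error
```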

Linearity

A random matrix is a matrix with random variables as entries, or equivalently \(W:E\to\mathbb{R}^{n\times m}\). Its expectation is the matrix of expectations.

If \(X\) is a random vector, \(W\) a random matrix, \(A, B, C\) constant matrices and \(v\) a constant vector:

\[ \mathbb{E}(v+AX) = v+A\mathbb{E}X \\ \mathbb{E}(A+BWC) = A+B(\mathbb{E}W)C \\ Cov(v+AX) = A\cdot Cov(X)\cdot A^T \]
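As a sanity check (not in the notes), the affine rules can be verified on simulated samples – they even hold exactly for the sample estimates, since the map \(X\mapsto v+AX\) is applied to every sample:

```python
import numpy as np

rng = np.random.default_rng(1)

# 2-dimensional random vector X with correlated coordinates (arbitrary choice).
X = rng.normal(size=(200_000, 2)) @ np.array([[1.0, 0.4],
                                              [0.0, 1.0]])

A = np.array([[2.0, 1.0],
              [0.0, 1.0],
              [1.0, -1.0]])          # constant 3x2 matrix
v = np.array([3.0, -1.0, 0.5])       # constant 3-vector

Y = X @ A.T + v                      # samples of v + AX

# E(v + AX) = v + A E(X)
print(np.allclose(Y.mean(axis=0), v + A @ X.mean(axis=0)))

# Cov(v + AX) = A Cov(X) A^T
print(np.allclose(np.cov(Y, rowvar=False),
                  A @ np.cov(X, rowvar=False) @ A.T))
```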

Product measures

Suppose we have two measure spaces \((E, B, \mu)\) and \((E',B',\nu)\).

The product set \(E\times E'\) has an induced set of measurable sets \(B\vee B'\) consisting of all sets reachable by complement and countable union from the sets \(U\times V\) for \(U\in B\), \(V\in B'\).

This collection can be equipped with a product measure \(\mu\times\nu\), defined on the product sets by \((\mu\times\nu)(U\times V) = \mu(U)\nu(V)\) and extended to all of \(B\vee B'\).

Fubini's theorem

Theorem (Fubini) If either \(f\) is non-negative or \(\int|f|\,d(\mu\times\nu)<\infty\), then integrating \(f\) against the product measure is the same as integrating iteratively, first against one measure and then the other:

\[ \int fd(\mu\times\nu) = \int\left[\int f(x,y)d\nu(y)\right]d\mu(x) \\ = \int\left[\int f(x,y)d\mu(x)\right]d\nu(y) \]
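A quick numerical illustration (assumed example, not from the notes): for an integrable \(f\) on a rectangle with Lebesgue measure on both factors, the two iterated integrals agree. Note that scipy's dblquad evaluates its integrand with the inner variable first.

```python
import numpy as np
from scipy.integrate import dblquad

# f(x, y) = x * exp(-x*y) on [0, 1] x [0, 2]; integrable, so Fubini applies.
f = lambda x, y: x * np.exp(-x * y)

# Integrate over y first, then x.
y_then_x, _ = dblquad(lambda y, x: f(x, y), 0, 1, 0, 2)

# Integrate over x first, then y.
x_then_y, _ = dblquad(lambda x, y: f(x, y), 0, 2, 0, 1)

print(y_then_x, x_then_y)   # the two iterated integrals agree
```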

Joint probability

Any pair of random vectors \(X\in\mathbb{R}^n\) and \(Y\in\mathbb{R}^m\) forms a random vector in \(\mathbb{R}^{n+m}\).

The joint probability of \(X\) and \(Y\) is the probability measure \(\mathbb{P}_{X,Y}\) on \(\mathbb{R}^{n+m}\).

If this probability measure is a product measure, ie \(\mathbb{P}(X\in A, Y\in B)=\mathbb{P}(X\in A)\cdot\mathbb{P}(Y\in B)\), then \(X\) and \(Y\) are independent.

Independent random variables have a joint distribution function given as the product of distribution functions, and a joint density function given as the product of density functions.

Independence

Theorem If \(X_1,\dots,X_n\) are independent random variables, and \(f_1,\dots,f_n\) are measurable functions, then \(f_1(X_1), \dots, f_n(X_n)\) are independent.

Proof For a measurable set \(U_1\times\dots\times U_n\), the set \(f_1^{-1}(U_1)\times\dots\times f_n^{-1}(U_n)\) is also measurable – the same holds for unions and complements of these.

By independence of the \(X_i\), the probability of this product of inverse images is the product of the individual probabilities, and \(\mathbb{P}(f_i(X_i)\in U_i) = \mathbb{P}(X_i\in f_i^{-1}(U_i))\) – so the joint distribution of \(f_1(X_1),\dots,f_n(X_n)\) is again a product measure.

Covariance and independence

If \(X\) and \(Y\) are independent random variables, \[ Cov(X,Y) = \mathbb{E}(XY)-(\mathbb{E}X)(\mathbb{E}Y) \] and since \(\mathbb{P}_{X,Y} = \mathbb{P}_X\times\mathbb{P}_Y\), Fubini gives \[ \mathbb{E}(XY) = \int xy\, d(\mathbb{P}_X\times\mathbb{P}_Y) \\ = \int\int xy\, d\mathbb{P}_X(x)d\mathbb{P}_Y(y) = \int\left(\int x\,d\mathbb{P}_X(x)\right)y\,d\mathbb{P}_Y(y) \\ = \int x\,d\mathbb{P}_X(x) \cdot \int y\,d\mathbb{P}_Y(y) = (\mathbb{E}X)(\mathbb{E}Y) \]

So \(Cov(X,Y) = 0\).
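A small simulation (illustration only, with arbitrary distributions) showing that the sample covariance of independent draws is near zero, while a deliberately dependent pair has visibly nonzero covariance:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1_000_000

# Independent draws: the sample covariance should be close to zero.
x = rng.exponential(scale=2.0, size=n)
y = rng.normal(loc=1.0, scale=3.0, size=n)
print(np.cov(x, y)[0, 1])     # ~0

# A dependent pair for contrast: z = x + noise, so Cov(x, z) ~ Var(x) = 4.
z = x + rng.normal(size=n)
print(np.cov(x, z)[0, 1])     # ~4
```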

Example

We established that for two D6, captured as random variables \(X\) and \(X'\), \[ \mathbb{V}(X+X') = \mathbb{V}X + \mathbb{V}X' + 2\,Cov(X,X') \]

If dice throws are independent, \(Cov(X,X')=0\), and the variance comes out as \[ \mathbb{V}(X+X') = \mathbb{V}X + \mathbb{V}X' = 35/12 + 35/12 = 35/6 \simeq 5.83 \]
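A simulation sketch (not in the notes) of the two-dice variance:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000

# Two independent D6 throws.
x = rng.integers(1, 7, size=n)
xp = rng.integers(1, 7, size=n)

print(np.var(x + xp))    # about 35/6
print(35 / 6)            # 5.8333...
```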

Conditional probability

Let \(X, Y\) be random vectors. Suppose we observe \(X\) – and learn that \(X=x\). This limits the possible events in the product space: the marginal distribution \(\mathbb{P}_Y\) may no longer describe \(Y\) accurately.

For discrete random vectors, the classical formula holds: \[ \mathbb{P}(Y\in B | X=x) = \frac {\mathbb{P}(Y\in B, X=x)} {\mathbb{P}(X=x)} \]

This defines a probability measure on the measure space for \(Y\), called the conditional distribution for \(Y\) given \(X=x\).
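A toy discrete example (assumed here for illustration): let \(X\) be the first of two independent D6 and \(Y\) their sum; the classical formula gives \(\mathbb{P}(Y=8\mid X=3) = (1/36)/(1/6) = 1/6\). The same computation by enumeration:

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely outcomes of two dice; X = first die, Y = sum.
outcomes = list(product(range(1, 7), repeat=2))
P = Fraction(1, 36)

p_joint = sum(P for a, b in outcomes if a == 3 and a + b == 8)   # P(Y=8, X=3)
p_x = sum(P for a, b in outcomes if a == 3)                      # P(X=3)

print(p_joint / p_x)     # 1/6 = P(Y=8 | X=3)
```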

Continuous distributions

For continuous distributions for \(X, Y\) we can define the conditional distribution to be any distribution that does The Right Thing (see Ch 6.2) – if we have density functions, we can compute it.

Let \(p_Z\) be the joint density function. The marginal density is \(p_X(x) = \int p_Z(x,y)\,d\nu(y)\).

For an outcome \(x\), with positive density \(p_X(x) > 0\), we define the conditional density \[ p_{Y|X}(y | x) = \frac{p_Z(x,y)}{p_X(x)} \]
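A sketch with an assumed toy joint density \(p_Z(x,y) = x+y\) on \((0,1)^2\) (not an example from the notes): the marginal is obtained by integrating out \(y\), and the resulting conditional density integrates to 1 in \(y\) for each fixed \(x\).

```python
from scipy.integrate import quad

# Toy joint density on (0,1)^2, assumed for illustration: p_Z(x, y) = x + y.
p_Z = lambda x, y: x + y

def p_X(x):
    # Marginal density: integrate the joint density over y.
    val, _ = quad(lambda y: p_Z(x, y), 0, 1)
    return val

def p_Y_given_X(y, x):
    # Conditional density: ratio of joint to marginal.
    return p_Z(x, y) / p_X(x)

x0 = 0.3
total, _ = quad(lambda y: p_Y_given_X(y, x0), 0, 1)
print(p_X(x0))   # 0.8 = 0.3 + 1/2
print(total)     # 1.0: a genuine probability density in y
```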

Conditioning on independent variable

Notice that if \(X\) and \(Y\) are independent, then \(p_Z(x,y) = p_X(x)\cdot p_Y(y)\), so \[ p_{Y|X}(y | x) = \frac{p_Z(x,y)}{p_X(x)} = \frac{p_X(x)\cdot p_Y(y)}{p_X(x)} = p_Y(y) \]

Conditioning on an independent variable does not change the probability distribution.

Conditional expectation

We get conditional expectations by integrating against conditional distributions:

\[ \mathbb{E}\left[f(X,Y)\middle|X=x\right] = \int f(x,y) p_{Y|X}(y|x)dy \]

Theorem (Law of total expectation; Tower property; Smoothing)

\[ \mathbb{E}f(X,Y) = \mathbb{E}\left[\mathbb{E}(f(X,Y)|X)\right] \]

The overall expectation is the average, over the distribution of \(X\), of the conditional expectations given \(X=x\).
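A Monte Carlo sketch of the tower property with an assumed hierarchical model (uniform \(X\), then \(Y\) exponential with mean \(X\)); both sides estimate \(\mathbb{E}(XY)=\mathbb{E}(X^2)=1/3\):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

# X uniform on (0,1); given X = x, Y is exponential with mean x.
x = rng.uniform(0, 1, size=n)
y = rng.exponential(scale=x)

# Left-hand side: plain Monte Carlo estimate of E[X*Y].
lhs = np.mean(x * y)

# Right-hand side: E[X*Y | X = x] = x * E[Y | X = x] = x^2, then average over X.
rhs = np.mean(x ** 2)

print(lhs, rhs)    # both about 1/3
```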

Some probability distributions

Uniform

Uniform on an interval \((a,b)\). Parameters \(a, b\).

Density function \(\frac{1}{b-a}\mathbb{1}_{(a,b)}\).

Mean and variance are homework.

Bernoulli

Single trial, probability \(p\) of success. Parameter \(p\).

Density function \(\mathbb{P}(1) = p\), \(\mathbb{P}(0) = 1-p\) and 0 otherwise.

Mean \(p\). Variance \(p(1-p)\).

Binomial

\(n\) repeated independent trials, each a Bernoulli trial with success probability \(p\). Parameters \(n, p\).

Density function \(\mathbb{P}(k) = {n\choose k} p^k(1-p)^{n-k}\).

Mean \(np\). Variance \(np(1-p)\).

Poisson

Independently occurring events with constant rate \(\lambda\). The Poisson distribution counts the events in a unit time period. Parameter \(\lambda\).

Density \(\mathbb{P}(k) = \frac{\lambda^k e^{-\lambda}}{k!}\)

Mean \(\lambda\). Variance \(\lambda\).

Exponential

Independently occurring events with constant rate \(\lambda\). The Exponential distribution measures the waiting time until the next event. Parameter \(\lambda\).

Density function \(p(x) = \lambda e^{-\lambda x}\) for non-negative \(x\).

Mean \(1/\lambda\). Variance \(1/\lambda^2\).

Normal distribution

The limit distribution for sums (and averages) of iid random variables. Parameters \(\mu, \sigma^2\).

Density function \(p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]\)

Mean \(\mu\). Variance \(\sigma^2\).
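The stated means and variances can be cross-checked against scipy.stats (the parameter values below are arbitrary); each .stats() call returns the (mean, variance) pair:

```python
from scipy import stats

p, n, lam, mu, sigma = 0.3, 10, 4.0, 1.0, 2.0

print(stats.bernoulli(p).stats())          # mean p, variance p(1-p)
print(stats.binom(n, p).stats())           # mean np, variance np(1-p)
print(stats.poisson(lam).stats())          # mean lambda, variance lambda
print(stats.expon(scale=1/lam).stats())    # mean 1/lambda, variance 1/lambda^2
print(stats.norm(mu, sigma).stats())       # mean mu, variance sigma^2
```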

z-score, t-score

If \(X_1, \dots, X_n \sim \mathcal{N}(\mu, \sigma^2)\), and we write \(\overline{X} = (X_1+\dots+X_n)/n\), then

\(Z = \frac{\overline{X}-\mu}{\sigma/\sqrt{n}}\) is the Z-score. \(Z \sim \mathcal{N}(0,1)\)

Write \(S^2 = \frac{\sum_j(X_j-\overline{X})^2}{n-1}\) for the sample variance. Then

\(T = \frac{\overline{X}-\mu}{S/\sqrt{n}}\) is the T-score. This is not a normal random variable.
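A simulation sketch (arbitrary \(\mu\), \(\sigma\), small \(n\)) contrasting the two: \(Z\) has variance 1, while \(T\) has the heavier-tailed variance \((n-1)/(n-3)\) of a \(T(n-1)\) variable.

```python
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, n, reps = 10.0, 2.0, 5, 200_000

# reps independent samples of size n from N(mu, sigma^2).
samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s = samples.std(axis=1, ddof=1)            # sample standard deviation S

z = (xbar - mu) / (sigma / np.sqrt(n))     # Z-score: uses the true sigma
t = (xbar - mu) / (s / np.sqrt(n))         # T-score: uses the estimate S

print(np.var(z))    # about 1
print(np.var(t))    # about (n-1)/(n-3) = 2 for n = 5
```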


Student's t-distribution

William Gosset derived the distribution of \(T = \frac{\overline{X}-\mu}{S/\sqrt{n}} \sim T(n-1)\). Parameter \(\nu=n-1\), the degrees of freedom.

Density function: Horrible. Check Wikipedia.

Mean 0 for \(\nu>1\). Undefined otherwise.

Variance \(\nu/(\nu-2)\) for \(\nu>2\); \(\infty\) for \(2\geq\nu>1\). Undefined otherwise.

Problem 1.11:21

\(X\) uniform on \((0,1)\) – ie \(p(x) = \mathbb{1}_{(0,1)}\). Let \(Y_1\) and \(Y_2\) be the first two digits of \(X\) written in binary.

Question: what is the joint distribution \(\mathbb{P}(Y_1, Y_2)\)?

Notice that the interval \((0,1)\) splits into four pieces

\[ (Y_1,Y_2) = \begin{cases} (0,0) & 0 < x < 1/4 \\ (0,1) & 1/4 \leq x < 1/2 \\ (1,0) & 1/2 \leq x < 3/4 \\ (1,1) & 3/4 \leq x < 1 \\ \end{cases} \]

Each of these pieces has length \(1/4\) – with the uniform distribution, each has probability \(1/4\).
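A simulation sketch of the problem: draw \(X\) uniformly, read off the first two binary digits, and tabulate the empirical joint distribution – each of the four pairs appears with probability about \(1/4\).

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000

x = rng.uniform(0, 1, size=n)

# First two binary digits of x: Y1 = floor(2x) mod 2, Y2 = floor(4x) mod 2.
y1 = np.floor(2 * x).astype(int) % 2
y2 = np.floor(4 * x).astype(int) % 2

# Empirical joint distribution of (Y1, Y2).
for a in (0, 1):
    for b in (0, 1):
        print((a, b), np.mean((y1 == a) & (y2 == b)))   # each about 0.25
```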