Return to the task of solving a system of equations:
$$~ \begin{align} a_{11} x_1 + a_{12}x_2 + \cdots + a_{1n} x_n &= b_1\\ a_{21} x_1 + a_{22}x_2 + \cdots + a_{2n} x_n &= b_2\\ &\vdots\\ a_{m1} x_1 + a_{m2}x_2 + \cdots + a_{mn} x_n &= b_m \end{align} ~$$which we wrote as $Ax=b$.
If we have equations resulting in $A$ being a (square) diagonal matrix, then we have basically:
$$~ \begin{align} a_{11}x_1 &= b_1\\ a_{22}x_2 &= b_2\\ & \vdots\\ a_{nn}x_n &= b_n \end{align} ~$$This has easy solutions: if $a_{ii} \neq 0$, then $x_i = b_i/a_{ii}$.
If some $a_{ii} = 0$, then the determinant of $A$ is $0$, and there is not a unique solution.
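In code, the diagonal case is just a loop; here is a minimal sketch (the name solve_diagonal is ours, not a built-in):

# solve A*x = b for diagonal A: x_i = b_i / a_ii
function solve_diagonal(A, b)
    n = length(b)
    x = zeros(n)
    for i in 1:n
        A[i,i] == 0 && error("zero diagonal entry: no unique solution")
        x[i] = b[i] / A[i,i]
    end
    x
end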
If $A$ is lower triangular, that is, $a_{ij} = 0$ if $j > i$, or:
$$~ A = \left( \begin{array}{cccc} a_{11} & 0 & \cdots & 0\\ a_{21} & a_{22} & \cdots & 0\\ & \vdots & & \\ a_{n1} & a_{n2} & \cdots & a_{nn}\\ \end{array} \right) ~$$then we can solve recursively: $x_1 = b_1/a_{11}$, and once $x_1, \dots, x_{i-1}$ are known,
$$~ x_i = (b_i - \sum_{j=1}^{i-1} a_{ij} x_j) / a_{ii}. ~$$
It is important that we have $a_{ii} \neq 0$, as otherwise we will have issues dividing. But this will be the case if $\det(A) \neq 0$. (Why?)
This method is called forward substitution.
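A sketch of forward substitution as a Julia function (our own helper, not a built-in):

# forward substitution: solve L*x = b for lower triangular L
# using x_i = (b_i - sum_{j<i} l_ij * x_j) / l_ii
function forward_substitution(L, b)
    n = length(b)
    x = zeros(n)
    for i in 1:n
        s = b[i]
        for j in 1:i-1
            s -= L[i,j] * x[j]
        end
        x[i] = s / L[i,i]
    end
    x
end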
A matrix $U$ is upper triangular if $u_{ij} = 0$ when $i > j$. For example:
$$~ A = \left( \begin{array}{ccccc} a_{11} & a_{12} & \cdots & a_{1, n-1} & a_{1n}\\ 0 & a_{22} & \cdots & a_{2,n-1} & a_{2n}\\ &&\vdots&&\\ 0 & 0 & \cdots & a_{n-1,n-1} & a_{n-1,n}\\ 0 & 0 & \cdots & 0 & a_{nn}\\ \end{array} \right) ~$$Then solving $Ax=b$ can be done by working backwards:
$$~ x_n = b_n / a_{nn} ~$$and from here:
$$~ x_{n-1} = (b_{n-1} - a_{n-1, n}x_n)/a_{n-1, n-1} = (b_{n-1} - a_{n-1, n} \cdot b_n / a_{nn})/a_{n-1, n-1}, ~$$and in general:
$$~ x_i = (b_i - \sum_{j=i+1}^n a_{ij} x_j) / a_{ii}. ~$$Again, we need $a_{ii} \neq 0$. This method is called back substitution.
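A matching sketch of back substitution (again our own helper, mirroring forward_substitution above):

# back substitution: solve U*x = b for upper triangular U
# using x_i = (b_i - sum_{j>i} u_ij * x_j) / u_ii
function back_substitution(U, b)
    n = length(b)
    x = zeros(n)
    for i in n:-1:1
        s = b[i]
        for j in i+1:n
            s -= U[i,j] * x[j]
        end
        x[i] = s / U[i,i]
    end
    x
end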
Consider
$$~ A = \left( \begin{array}{ccc} 1 & 2 & 3\\ 0 & 0 & 4\\ 0 & 5 & 6 \end{array} \right) ~$$Clearly if we permuted rows 2 and 3 this would be upper triangular, so we could solve easily: first solve row 2 for $x_3$, then use that in row 3 to solve for $x_2$, and finally solve row 1 for $x_1$.
Define $p = [p_1, p_2, \cdots, p_n]$, to be a permutation vector if the mapping $i \rightarrow p_i$ maps the set $1, \dots, n$ to itself in a bijective manner and the matrix $(\alpha_{ij}) = (a_{p_i j})$ is either upper or lower triangular.
(In the above we would have $p_1 = 1, p_2 = 3$, and $p_3 = 2$.)
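To illustrate with this example, using Julia's built-in \ solver (discussed below; the right-hand side b here is our own choice):

A = [1 2 3; 0 0 4; 0 5 6]
b = [6, 4, 11]
p = [1, 3, 2]
x = A[p, :] \ b[p]    # the row-permuted system is upper triangular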
Then clearly we could solve the permuted system of equations. For example, in the case that we end up lower triangular, so that forward substitution works, we would have:
$$~ x_i = (b_{p_i} - \sum_{j=1}^{i-1} a_{p_i j} x_j) / a_{p_i i}. ~$$Suppose we knew that $A=LU$ with $L$ lower triangular and $U$ upper triangular. Then we can solve $Ax = b$ easily: writing $Ax = L(Ux) = b$, we first solve $Ly = b$ for $y$ by forward substitution, then solve $Ux = y$ for $x$ by back substitution.
So if we can factorize $A = LU$, we can easily solve $Ax = b$.
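As a sketch of this two-step solve (the small L, U, and b are our own example):

L = [1 0; 2 1]     # lower triangular
U = [1 2; 0 1]     # upper triangular
b = [1, 3]
y = L \ b          # forward substitution: L*y = b
x = U \ y          # back substitution: U*x = y
(L*U)*x - b        # should be the zero vector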
In Julia (and MATLAB) there is a built-in solver for these problems:
U = [1 2; 0 1]
2x2 Array{Int64,2}:
 1  2
 0  1
b = [1, 3]
x = U \ b
2-element Array{Float64,1}:
 -5.0
  3.0
U*x - b
2-element Array{Float64,1}:
 0.0
 0.0
In fact, there are many different methods depending on assumptions. For example, rationals:
U = [1//1 2; 0 1]
U \ b
2-element Array{Rational{Int64},1}:
 -5//1
  3//1
There are special methods for many others...
Suppose $A = LU$ with $L$ lower triangular and $U$ upper triangular. Then we have:
$$~ a_{ij} = l_{i1} u_{1j} + l_{i2} u_{2j} + \cdots + l_{in} u_{nj}. ~$$But $l_{is} = 0$ when $s > i$ and $u_{sj} = 0$ when $s > j$, so the sum stops at $\min(i,j)$:
$$~ a_{ij} = \sum_{s = 1}^{\min(i, j)} l_{is} u_{sj}. ~$$Now we show that we can find $L$ and $U$ with $LU = A$. We will do so by induction. Suppose we know the first $k-1$ columns of $L$ and the first $k-1$ rows of $U$. We then have:
$$~ a_{kk} = \sum_{s=1}^{\min(k,k)} l_{ks}u_{sk} = \sum_{s=1}^{k} l_{ks}u_{sk} = \sum_{s=1}^{k-1} l_{ks}u_{sk} + l_{kk} u_{kk}. ~$$The first part of the right-hand sum involves columns of $L$ for which $s < k$ and rows of $U$ for which $s < k$. So all those values are known by our assumption. So if $l_{kk}$ is known (say assumed to be $1$ or some other non-zero value) we can solve for $u_{kk}$ in terms of known values. To be explicit:
$$~ u_{kk} = (a_{kk} - \sum_{s=1}^{k-1} l_{ks}u_{sk} ) / l_{kk}. ~$$Then to fill out the $k$th row of $U$, we consider for $j > k$ (for which $\min(j,k) = k$):
$$~ a_{kj} = \sum_{s=1}^{k-1} l_{ks} u_{sj} + l_{kk} u_{kj} ~$$The sum is of known values and $l_{kk}$ is known, so for each $j$, as specified, we can solve for $u_{kj}$.
Similarly, for the $k$th column of $L$, we consider for $j > k$:
$$~ a_{jk} = \sum_{s=1}^{k-1} l_{js} u_{sk} + l_{jk} u_{kk}. ~$$As before, the sum is known, and here so is $u_{kk}$, so we can solve for $l_{jk}$ when $j > k$. That is, we can fill in the $k$th column.
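The construction above translates directly into code. Here is a sketch with the conventional choice $l_{kk} = 1$ (Doolittle's method); the function name is ours:

# build L (unit lower triangular) and U one row/column at a time
function doolittle(A)
    n = size(A, 1)
    L = zeros(n, n)
    U = zeros(n, n)
    for k in 1:n
        L[k,k] = 1.0
        for j in k:n                # k-th row of U
            s = 0.0
            for t in 1:k-1
                s += L[k,t] * U[t,j]
            end
            U[k,j] = A[k,j] - s
        end
        for i in k+1:n              # k-th column of L
            s = 0.0
            for t in 1:k-1
                s += L[i,t] * U[t,k]
            end
            L[i,k] = (A[i,k] - s) / U[k,k]
        end
    end
    L, U
end

For the matrix worked out by hand below, doolittle(A) reproduces (in floating point) the same $L$ and $U$.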
Let's look at this matrix:
A = [1 1 1; 1 2 2; 1 2 3]
3x3 Array{Int64,2}:
 1  1  1
 1  2  2
 1  2  3
We need to fill in $U$ and $L$. We start with zero matrices:
L = zero(A)
U = zero(A)
3x3 Array{Int64,2}:
 0  0  0
 0  0  0
 0  0  0
Now we fill in: we have $1 = a_{11} = l_{11} u_{11}$ so we can take each to be 1:
L[1,1] = 1
U[1,1] = 1
1
Then for $U$ we need to fill in $u_{12}$ and $u_{13}$. For these we have
$$~ a_{12} = 1 = (0) + l_{11} u_{12} = 1 u_{12}, \quad a_{13} = 1 = (0) + l_{11} u_{13} = 1 u_{13} ~$$So both are 1:
U[1,2] = 1
U[1,3] = 1
1
And for the first column of $L$ we have:
$$~ a_{21} = 1 = (0) + l_{21}u_{11} = l_{21}, \quad a_{31} = 1 = (0) + l_{31}u_{11} = l_{31} ~$$So ditto:
L[2,1] = 1
L[3,1] = 1
1
Moving on to $k=2$ gives first the diagonal terms:
$$~ a_{22} = (l_{21} u_{12}) + l_{22} u_{22}, \quad\text{or}\quad 2 = (1 \cdot 1) + l_{22} u_{22}. ~$$We can take both to be $1$:
L[2,2] = 1
U[2,2] = 1
1
And to fill in for $j > 2$:
$$~ a_{23} = 2 = (l_{21} u_{13}) + l_{22}u_{23} = (1\cdot 1) + 1 u_{23}, ~$$So $u_{23} = 1$:
U[2,3] = 1
1
And from
$$~ a_{32} = 2 = (l_{31} u_{12}) + l_{32} u_{22} = (1 \cdot 1) + l_{32} \cdot 1 ~$$So $l_{32} = 1$:
L[3,2] = 1
1
Finally, we need to find the last diagonal terms:
$$~ a_{33} = 3 = (l_{31}u_{13} + l_{32}u_{23}) + l_{33} u_{33} = (1\cdot 1 + 1\cdot 1) + l_{33} u_{33} ~$$So we can take each to be $1$:
L[3,3] = 1
U[3,3] = 1
1
L
3x3 Array{Int64,2}:
 1  0  0
 1  1  0
 1  1  1
and
U
3x3 Array{Int64,2}:
 1  1  1
 0  1  1
 0  0  1
And we verify:
A - L*U
3x3 Array{Int64,2}:
 0  0  0
 0  0  0
 0  0  0
There are built-in functions for this:
L, U, p = lu(A)
(
3x3 Array{Float64,2}:
 1.0  0.0  0.0
 1.0  1.0  0.0
 1.0  1.0  1.0,

3x3 Array{Float64,2}:
 1.0  1.0  1.0
 0.0  1.0  1.0
 0.0  0.0  1.0,

[1,2,3])
We already verified that $LU = A$ for this $A$. The p is a permutation vector. In general, we should verify:
A[p,:] - L * U
3x3 Array{Float64,2}:
 0.0  0.0  0.0
 0.0  0.0  0.0
 0.0  0.0  0.0
Define the $k$th leading principal minor of $A$ to be the submatrix $a_{ij}$ for $1 \leq i,j \leq k$. Call this $A_k$.
Thm: If $A$ is $n \times n$ and all $n$ leading principal minors are non-singular, then $A$ has an LU decomposition.
Proof. Suppose by induction this is true for step $k-1$. Then $A_{k-1}$ can be factored: $A_{k-1} = L_{k-1} U_{k-1}$.
We wish to find $L_k$ and $U_k$ which are extensions and satisfy $A_k = L_k U_k$.
Consider the case $1 \leq i \leq k-1$ and
$$~ a_{ik} = \sum_{s = 1}^{k-1} l_{is}u_{sk} ~$$We know the $a_{ik}$; the $l$'s involved are from $L_{k-1}$. The $u$'s we don't know (yet), but as $L_{k-1}$ is non-singular we can solve for the $u$'s: this is just a system of the form $b = L_{k-1} x$. So we can fill out the $k$th column of $U_k$.
Similarly, for $1 \leq j \leq k-1$ we have:
$$~ a_{kj} = \sum_{s=1}^{k-1} l_{ks} u_{sj} ~$$This is of the form $b = U_{k-1}^T x$ (again a triangular system) so can be solved to fill out the $k$th row of $L_k$.
Finally, we need to solve for $l_{kk}$ and $u_{kk}$, but we have:
$$~ a_{kk} = \sum_{s=1}^{k-1} l_{ks} u_{sk} + l_{kk}u_{kk}. ~$$If we set $l_{kk} = 1$, then $u_{kk}$ can be solved for, as the sum is now known. That is, we can fill out $L_k$ and $U_k$ with $A_k = L_k U_k$, as desired.
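The hypothesis is easy to check numerically. A small sketch for the matrix used earlier (in recent Julia versions det requires using LinearAlgebra):

A = [1 1 1; 1 2 2; 1 2 3]
[det(A[1:k, 1:k]) for k in 1:3]   # all non-zero, so A has an LU decomposition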
We know the transpose of a lower triangular matrix is upper and vice versa. This gives hope for a factorization of the form $A = L L^T$, known as the Cholesky factorization. When is this possible?
Thm: If $A$ is real, symmetric, and positive definite, then it has a unique factorization $A=LL^T$ where $L$ has a positive diagonal.
Pf: $Ax=0$ has only the solution $x=0$, as positive definite means $x^T A x > 0$ for non-zero $x$. By considering vectors of the form $x = [x_1\ x_2\ \cdots\ x_k\ 0\ 0\ \cdots\ 0]^T$ we can see that each $A_k$ will also be non-singular.
So by the last theorem $A = LU$ for some $L$ and $U$. But $A^T = A$, so $LU = (LU)^T = U^T L^T$. Multiplying on the left by $L^{-1}$ and on the right by $(L^T)^{-1}$ gives:
$$~ \begin{align} L^{-1} (L U) (L^T)^{-1} &= L^{-1} U^T L^T (L^T)^{-1}\\ U (L^T)^{-1} &= L^{-1} U^T. \end{align} ~$$The left side is upper triangular, the right side lower triangular; hence each must be a diagonal matrix $D$: $D = U (L^T)^{-1}$, and so $L D = (LU)(L^T)^{-1} = A(L^T)^{-1}$, giving $A = L D L^T$.
If we can show that $D$ has all positive diagonal terms, then we can define $D^{1/2}$ by $(\sqrt{d_{ii}})$ and express $A$ as $(LD^{1/2}) (LD^{1/2})^T$ which is what we want.
So, why do we know $D$ has all positive diagonal terms? Because $D$ is positive definite:
Take a non-zero $x$; then:
$$~ \begin{align} x^T D x &= x^T L^{-1} A (L^T)^{-1} x\\ &= (x^T L^{-1}) A ((L^T)^{-1} x)\\ &= ((L^{-1})^T x)^T A ((L^{-1})^T x)\\ &> 0. \end{align} ~$$The last line holds as $A$ is positive definite and $(L^{-1})^T x$ is non-zero. The fact that we can swap the inverse and transpose of a matrix, $(L^T)^{-1} = (L^{-1})^T$, is something to prove.
Here is an alternative proof, perhaps more instructive. It requires a few facts about symmetric, positive definite matrices: their diagonal entries are positive, and any principal submatrix is again symmetric and positive definite.
Suppose we have $A$ as assumed. Then we can write $A$ in the following way where $a_{11} > 0$.
$$~ A = \left[ \begin{array}{cc} a_{11} & w^T\\ w & K \end{array} \right] ~$$By the second fact above, $K$ is symmetric and positive definite.
Now consider the following lower triangular matrix:
$$~ L_1 = \left[ \begin{array}{cc} \sqrt{a_{11}} & 0\\ \frac{w}{\sqrt{a_{11}}} & I \end{array} \right] ~$$Then using block matrix multiplication we get the decomposition $A = L_1 B_1 L_1^T$ where
$$~ B_1 = \left[ \begin{array}{cc} I & 0^T\\ 0 & K - \frac{ww^T}{a_{11}} \end{array} \right]. ~$$To see this:
using SymPy
w, K = symbols("w, K", real=true)
w = [w]                      # a vector
a_11 = symbols("a_11", positive=true)
A = [a_11 w'; w K]
L = [sqrt(a_11) 0; w/sqrt(a_11) 1]
B_1 = [1 0; 0 (K - w*w'/a_11)]
L*B_1*L'
2x2 Array{Any,2}:
 a_11  w
 w     K
(This isn't perfect, as w messes up the automatic conversion to a matrix of symbolic objects.)
Let $A_1 = K - ww^T / a_{11}$. Since $A$ is positive definite and $L_1$ has full rank (why?), it must be that $B_1$ is positive definite. Hence $A_1$ is too. But both $K$ and $ww^T$ are symmetric, so $A_1$ is also symmetric and positive definite.
So we can find $M_2$ and $B_2$ such that $A_1 = M_2 B_2 M_2^T$, where $B_2$ has a symmetric, positive definite submatrix $A_2$. As written, $M_2$ is $(n-1) \times (n-1)$. We embed this into
$$~ L_2 = \left[ \begin{array}{cc} I & 0^T\\ 0 & M_2 \end{array} \right]. ~$$And then $A = L_1 L_2 B L_2^T L_1^T$, where
$$~ B = \left[ \begin{array}{cc} I_2 & 0^T\\ 0 & A_2 \end{array} \right] ~$$and $I_2$ is the $2 \times 2$ identity. Were this repeated, we would eventually get:
$$~ A = L_1 L_2 \cdots L_n \cdot I \cdot L_n^T \cdots L_2^T L_1^T. ~$$Letting $L = L_1 L_2 \cdots L_n$ yields the result, $A=LL^T$, after noting that the product of lower triangular matrices is lower triangular.
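This proof is constructive, so it also yields code. A sketch of the recursion (my_chol is our own name; compare with the built-in chol used below):

# recursive Cholesky: peel off one column at a time, using
# A = [a11 w'; w K] = L1 * [1 0; 0 A1] * L1', with A1 = K - w*w'/a11
function my_chol(A)
    n = size(A, 1)
    n == 1 && return reshape([sqrt(A[1,1])], 1, 1)
    a11 = A[1,1]
    w = A[2:n, 1]
    L = zeros(n, n)
    L[1,1] = sqrt(a11)
    L[2:n, 1] = w / sqrt(a11)
    L[2:n, 2:n] = my_chol(A[2:n, 2:n] - w*w'/a11)  # factor A1 recursively
    L
end

Then L = my_chol(A) satisfies A = L*L' for symmetric, positive definite A.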
This comes from statistics. Consider the overdetermined system:
A = [1 2; 3 5; 4 7; 1 8]
4x2 Array{Int64,2}:
 1  2
 3  5
 4  7
 1  8
b = [1,2,3,4]
4-element Array{Int64,1}:
 1
 2
 3
 4
The system $Ax=b$ has no solutions. However, this system will:
$$~ (A^T A) x = A^T b ~$$(Assuming $A$ has full column rank, $A^TA$ is non-singular, symmetric, and positive definite.)
M = A' *A
2x2 Array{Int64,2}:
 27   53
 53  142
So we can take the Cholesky decomposition:
U = chol(M)   # the default answer is upper triangular, with M = U'U
L = U'
2x2 LowerTriangular{Float64,Array{Float64,2}}:
  5.19615  0.0
 10.1999   6.16141
So we can solve $LL^Tx = A^T b$. First we solve for $y$ in $Ly=A^Tb$ with:
y = L \ (A'*b)
2-element Array{Float64,1}:
 4.42635
 3.22191
And then solve $L^Tx = y$:
x = L' \ y
2-element Array{Float64,1}:
 -0.174634
  0.522927
This answer is not the "answer" (as that doesn't exist):
A*x - b
4-element Array{Float64,1}:
 -0.12878
  0.0907317
 -0.0380488
  0.0087805
However, it has a key property: it is the $x$ with the smallest residual norm $\|Ax - b\|$:
norm(A*x - b)
0.1623005342
sort([norm(A*rand(2) - b) for _ in 1:10])
10-element Array{Any,1}:
  1.11337
  2.49197
  2.50891
  2.8541
  3.89776
  4.96715
  5.42795
  5.71885
  7.3971
 10.695
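As a check, Julia's backslash operator applied to a rectangular system computes this least-squares solution directly (via a QR factorization):

xls = A \ b          # least-squares solution of the overdetermined system
norm(A*xls - b)      # matches the minimal residual norm above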
(P279) Suppose $Ax=b$ has $A$ being $m \times n$ with $m > n$ and $\text{rank}(A) = n$. Then this will typically have no solutions. In that case, what is sought is a best solution in the sense of minimizing $\| b - Ax \|_2$.
Now suppose $x$ solves $A^TAx=A^Tb$, and $y$ is some other value, then
$$~ \begin{align} \|b - Ay\|_2^2 &= \|b - Ax + A(x-y)\|_2^2\\ &= (b - Ax + A(x-y))^T \cdot (b - Ax + A(x-y))\\ &= (b - Ax)^T\cdot (b-Ax) + (b-Ax)^T\cdot (A(x-y)) + (A(x-y))^T \cdot (b-Ax) + (A(x-y))^T \cdot (A(x-y))\\ &= \| b - Ax \|_2^2 + 0 + 0 + \|A(x-y)\|_2^2\\ &\geq \| b - Ax \|_2^2 \end{align} ~$$The latter because $b - Ax$ has a $0$ dot product with vectors in the column space of $A$ (as $A^T(Ax-b)=0$), and $A(x-y)$ is in the column space of $A$. (Any $Az = [A_{\cdot 1} ; A_{\cdot 2} ; \cdots ; A_{\cdot n}] \cdot z = z_1A_{\cdot 1} + z_2A_{\cdot 2} + \cdots + z_n A_{\cdot n}$ is a linear combination of the columns.) So the cross terms are $0$ and the result holds.
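Numerically, the vanishing cross terms can be seen by checking that the residual is orthogonal to the column space of $A$ (a small sketch, reusing the A and b from above):

x = (A' * A) \ (A' * b)   # solve the normal equations
A' * (A*x - b)            # zero, up to floating-point round-off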