Chapter 17 Numerical Linear Algebra
In your algebra courses you would write \(Ax=b\) and solve \(x=A^{-1}b\). This is useful for understanding the algebraic properties of \(x\), but a computer would never recover \(x\) that way: forming the explicit inverse is both slower and less numerically stable than solving the system directly. Even the computation of the sample variance, \(S^2(x)=(n-1)^{-1}\sum (x_i-\bar x )^2\), is not implemented naively as written, because of numerical and speed considerations.
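To make this concrete, here is a minimal illustration in R (the matrix and right-hand side are arbitrary): solve(A, b) solves the system directly via a factorization, while solve(A) %*% b forms the explicit inverse first, which is both slower and less numerically accurate.

```r
set.seed(1)
A <- matrix(rnorm(9), 3, 3)
b <- rnorm(3)

x1 <- solve(A, b)     # solves Ax = b directly (an LU factorization underneath)
x2 <- solve(A) %*% b  # forms A^{-1} explicitly, then multiplies: avoid this
all.equal(x1, as.numeric(x2))  # same answer here, but x1 is cheaper and more stable
```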
In this chapter, we discuss several ways a computer solves systems of linear equations, and their application to statistics, namely to OLS problems.
17.1 LU Factorization
The LU factorization is essentially the matrix notation for the Gaussian elimination you did in your introductory algebra courses: it decomposes a square matrix \(A\) into a lower triangular factor \(L\) and an upper triangular factor \(U\) such that \(A=LU\), after which \(Ax=b\) is solved by one forward and one backward triangular substitution.
For a square \(n \times n\) matrix, the LU factorization requires \(n^3/3\) operations, and stores \(n^2+n\) elements in memory.
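To fix ideas, here is a didactic LU factorization in R, written without row pivoting (real implementations, such as the one underlying base R's solve(), pivot for numerical stability). Once \(L\) and \(U\) are at hand, the two triangular solves recover \(x\).

```r
# A didactic LU factorization without pivoting (the Doolittle scheme);
# production code pivots rows for numerical stability.
lu_decompose <- function(A) {
  n <- nrow(A)
  L <- diag(n)
  U <- A
  for (k in 1:(n - 1)) {
    for (i in (k + 1):n) {
      L[i, k] <- U[i, k] / U[k, k]          # elimination multiplier
      U[i, ] <- U[i, ] - L[i, k] * U[k, ]   # the Gaussian elimination row operation
    }
  }
  list(L = L, U = U)
}

A <- matrix(c(4, 2, 2, 2, 5, 1, 2, 1, 6), 3, 3)
b <- c(1, 2, 3)
f <- lu_decompose(A)
all.equal(f$L %*% f$U, A)                  # TRUE: A = LU
x <- backsolve(f$U, forwardsolve(f$L, b))  # two triangular solves
all.equal(as.numeric(A %*% x), b)          # TRUE: x solves Ax = b
```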
17.2 Cholesky Factorization
Seeing the matrix \(A\) as a function, positive definite matrices can be thought of as functions that generalize the squaring operation. The Cholesky factorization decomposes such a matrix as \(A=R'R\) with \(R\) upper triangular, just as a non-negative scalar decomposes as \(a=r \cdot r\). For this reason, the Cholesky factor is known as the square root of a matrix.
Because Cholesky is less general than LU, it is also more efficient. It can be computed in \(n^3/6\) operations, and requires storing \(n(n+1)/2\) elements.
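In R, the factorization is computed by the base function chol(), which returns the upper triangular factor \(R\) such that \(A=R'R\). A small sketch, using an arbitrary positive definite matrix:

```r
A <- matrix(c(4, 2, 2, 2, 5, 1, 2, 1, 6), 3, 3)  # symmetric positive definite
R <- chol(A)                      # upper triangular Cholesky factor
all.equal(t(R) %*% R, A)          # TRUE: the "square root" property
b <- c(1, 2, 3)
x <- backsolve(R, forwardsolve(t(R), b))  # solve R'(Rx) = b by two triangular solves
all.equal(as.numeric(A %*% x), b)         # TRUE
```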
17.3 QR Factorization
The QR factorization is very useful for solving the OLS problem, as we will see in Section 17.6. The QR factorization takes about \(2n^3/3\) operations to compute. Three major methods for computing the QR factorization exist. These rely on Householder transformations, Givens transformations, and a (modified) Gram-Schmidt procedure (Gentle 2012).
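In base R, qr() computes the factorization (via Householder transformations); a minimal check of its defining properties:

```r
set.seed(1)
X <- matrix(rnorm(20), 5, 4)
f <- qr(X)
Q <- qr.Q(f)                    # the (thin) orthogonal factor
R <- qr.R(f)                    # the upper triangular factor
all.equal(Q %*% R, X)           # TRUE: X = QR
all.equal(t(Q) %*% Q, diag(4))  # TRUE: Q has orthonormal columns
```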
17.4 Singular Value Factorization
The singular value decomposition (SVD) is very useful for algebraic analysis, but less so for computations. This is because computing it is itself (typically) done via the QR factorization.
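The decomposition is nevertheless available in base R as svd(); a quick sanity check of the defining identity \(X = U D V'\):

```r
set.seed(1)
X <- matrix(rnorm(20), 5, 4)
s <- svd(X)  # returns u, d, v with X = U diag(d) V'
all.equal(s$u %*% diag(s$d) %*% t(s$v), X)  # TRUE
```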
17.5 Iterative Methods
The various matrix factorizations above may be used to solve a system of linear equations, and in particular, the OLS problem. There is, however, a very different approach to solving systems of linear equations. This approach relies on the fact that solutions of linear systems of equations can be cast as optimization problems: simply find \(x\) by minimizing \(\Vert Ax-b \Vert\), as in the sketch below.
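The following gradient descent on \(\Vert Ax-b \Vert^2_2\) converges to the solution of the system. This is an illustration only: the step size and iteration count are ad hoc, and practical iterative solvers use conjugate gradients or other Krylov methods.

```r
# Minimize ||Ax - b||^2 by gradient descent; illustration only.
grad_solve <- function(A, b, step = 1e-3, iters = 1e4) {
  x <- rep(0, ncol(A))
  for (i in 1:iters) {
    g <- 2 * t(A) %*% (A %*% x - b)  # gradient of the squared error
    x <- x - step * as.numeric(g)
  }
  x
}

A <- matrix(c(4, 1, 1, 3), 2, 2)
b <- c(1, 2)
all.equal(grad_solve(A, b), solve(A, b))  # TRUE: agrees with the direct solver
```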
Some methods for solving (convex) optimization problems are reviewed in the Convex Optimization Chapter 18. For our purposes we will just mention that historically (this means in the lm function, and in the LAPACK numerical libraries) the factorization approach was preferred, while nowadays optimization approaches are often preferred, because they are more numerically stable and easier to parallelize.
17.6 Solving the OLS Problem
Recalling the OLS problem in Eq.(6.5): we wish to find \(\beta\) such that
\[\begin{align}
\hat \beta := \operatorname*{argmin}_\beta \{ \Vert y-X\beta \Vert^2_2 \}.
\end{align}\]
The solution \(\hat \beta\) of this problem has to satisfy \[\begin{align} X'X \beta = X'y, \tag{17.1} \end{align}\] which is obtained by differentiating \(\Vert y-X\beta \Vert^2_2\) with respect to \(\beta\) and equating to zero. The equations in Eq.(17.1) are known as the normal equations. The normal equations are the link between the OLS problem and the matrix factorizations discussed above.
Using the QR decomposition \(X=QR\) in the normal equations we have that \[\begin{align*} \hat \beta = R_{(1:p,1:p)}^{-1} (Q'y)_{(1:p)}, \end{align*}\] where \(R_{n\times p}=\begin{pmatrix} R_{(1:p,1:p)} \\ 0_{(p+1:n,1:p)} \end{pmatrix}\) is the partition of \(R\) into its upper triangular block and a block of zeros. Indeed, \(X'X=R'R\) and \(X'y=R'Q'y\), so Eq.(17.1) reduces to \(R_{(1:p,1:p)} \beta = (Q'y)_{(1:p)}\). Since \(R_{(1:p,1:p)}\) is triangular, \(\hat \beta\) is recovered by a single back substitution, with no explicit matrix inversion.
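Here is a minimal R sketch of this recipe, checked against lm.fit(), the workhorse underlying lm(); the simulated data is arbitrary. Base R also exposes the same computation directly as qr.coef(f, y).

```r
set.seed(1)
n <- 100; p <- 3
X <- cbind(1, matrix(rnorm(n * (p - 1)), n, p - 1))
y <- as.numeric(X %*% c(1, 2, 3) + rnorm(n))

f <- qr(X)
Qty <- t(qr.Q(f)) %*% y             # Q'y, using the thin Q
beta_qr <- backsolve(qr.R(f), Qty)  # back substitution: solve R beta = Q'y
all.equal(as.numeric(beta_qr), unname(lm.fit(X, y)$coefficients))  # TRUE
```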
17.7 Numerical Libraries for Linear Algebra
TODO. In the meantime: a comparison of numerical libraries; installing MKL on Ubuntu; how to speed up linear algebra in R; and another; installing OpenBLAS.
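Until this section is written, two things are easy to check from within R: which BLAS/LAPACK the session is linked against (recent R versions report the library paths in sessionInfo()), and how fast the current backend multiplies large matrices, which gives a crude before/after benchmark when switching to OpenBLAS or MKL.

```r
sessionInfo()  # recent R versions report the BLAS/LAPACK libraries in use

# A crude benchmark: time a large matrix product, then repeat after
# switching the BLAS backend (e.g., to OpenBLAS or MKL) and compare.
n <- 2000
A <- matrix(rnorm(n^2), n, n)
system.time(A %*% A)
```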
17.7.1 OpenBlas
17.7.2 MKL
17.8 Bibliographic Notes
For an excellent introduction to numerical algorithms in statistics, see Weihs, Mersmann, and Ligges (2013). For an emphasis on numerical linear algebra, see Gentle (2012), and Golub and Van Loan (2012).
17.9 Practice Yourself
References
Gentle, James E. 2012. Numerical Linear Algebra for Applications in Statistics. Springer Science & Business Media.
Golub, Gene H., and Charles F. Van Loan. 2012. Matrix Computations. Vol. 3. JHU Press.
Weihs, Claus, Olaf Mersmann, and Uwe Ligges. 2013. Foundations of Statistical Algorithms: With References to R Packages. CRC Press.