Matrix Factorizations in ML (LU, QR, Cholesky)
Matrix factorizations break down a matrix into simpler building blocks.
They are not just abstract math — they are workhorses of numerical linear algebra that make solving systems, regression, and probabilistic ML efficient and stable.
In this lesson, we cover three fundamental factorizations:
- LU Decomposition → solving linear systems efficiently
- QR Decomposition → numerical stability in least squares
- Cholesky Decomposition → covariance matrices, Gaussian processes
1. LU Decomposition
Definition: Any square matrix $A$ (with row permutations for pivoting, if needed) can be factored as

$A = LU$

where:
- $L$ is a lower triangular matrix (ones on the diagonal)
- $U$ is an upper triangular matrix

This is extremely useful for solving systems of equations $Ax = b$:
- Compute $A = LU$ once.
- Solve $Ly = b$ (forward substitution).
- Solve $Ux = y$ (back substitution).

Much faster than computing $A^{-1}$ explicitly.
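To make the two substitution steps concrete, here is a minimal sketch using SciPy's lu and solve_triangular (the matrix and right-hand side are made up for illustration):

import numpy as np
from scipy.linalg import lu, solve_triangular

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])
b = np.array([1.0, 2.0])

# Factor once: A = P @ L @ U (P is a permutation matrix)
P, L, U = lu(A)

# Forward substitution: L y = P^T b
y = solve_triangular(L, P.T @ b, lower=True)
# Back substitution: U x = y
x = solve_triangular(U, y)

print(np.allclose(A @ x, b))  # True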
ML relevance
- Linear regression can involve solving the normal equations $(X^\top X)\beta = X^\top y$; LU factorization speeds this up (see the sketch after this list).
- Appears in optimization routines and numerical solvers.
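For instance, a least-squares fit via the normal equations can factor $X^\top X$ once and reuse it. A minimal sketch, with synthetic data made up for illustration:

import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))            # synthetic feature matrix
true_beta = np.array([1.0, -2.0, 0.5])
y = X @ true_beta + 0.01 * rng.normal(size=100)

# Factor the normal-equations matrix once, then solve
lu_piv = lu_factor(X.T @ X)
beta = lu_solve(lu_piv, X.T @ y)
print(beta)  # close to [ 1.  -2.   0.5]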
2. QR Decomposition
Definition: Any (rectangular) matrix $A$ can be factored as

$A = QR$

where:
- $Q$ is an orthogonal matrix ($Q^\top Q = I$)
- $R$ is an upper triangular matrix
Why useful?
Instead of solving the normal equations $X^\top X \beta = X^\top y$ directly, we factor $X = QR$ and solve $R\beta = Q^\top y$ by back substitution. This is more numerically stable, because it never forms $X^\top X$, which squares the condition number of the problem.
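A minimal sketch of least squares via QR (the synthetic features and targets are made up for illustration):

import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))   # synthetic feature matrix
y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=50)

# Reduced QR: X = Q R, then back-substitute R beta = Q^T y
Q, R = np.linalg.qr(X)
beta = solve_triangular(R, Q.T @ y)
print(beta)  # close to [ 2. -1.]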
ML relevance
- QR is widely used in least squares regression solvers.
- Preferred when feature matrices are ill-conditioned (highly correlated features); the demo below shows why.
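To see the effect, compare condition numbers when two features are nearly collinear (synthetic data, made up for illustration): forming $X^\top X$ squares the condition number.

import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=1000)
x2 = x1 + 1e-4 * rng.normal(size=1000)   # nearly collinear with x1
X = np.column_stack([x1, x2])

print(np.linalg.cond(X))         # large
print(np.linalg.cond(X.T @ X))   # roughly the square of the above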
3. Cholesky Decomposition
Definition: A symmetric, positive-definite matrix $A$ can be factored as

$A = LL^\top$

where $L$ is a lower triangular matrix with positive diagonal entries.
Why useful?
- Efficient way to invert covariance matrices (or, better, to solve systems with them without forming an explicit inverse; see the sketch after this list).
- Roughly twice as fast as LU for positive-definite systems.
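A minimal sketch of solving a positive-definite system with SciPy's cho_factor and cho_solve (the matrix and right-hand side are made up for illustration):

import numpy as np
from scipy.linalg import cho_factor, cho_solve

A = np.array([[4.0, 2.0],
              [2.0, 3.0]])   # symmetric positive-definite
b = np.array([1.0, 2.0])

c_and_lower = cho_factor(A)    # factor once
x = cho_solve(c_and_lower, b)  # two triangular solves, no explicit inverse
print(np.allclose(A @ x, b))   # True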
ML relevance
- Gaussian Processes (GPs): covariance kernel matrices are symmetric positive-definite → use Cholesky for efficient inference (sketched after this list).
- Optimization: Cholesky is used in second-order methods where the Hessian is positive definite.
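As a sketch of the GP case, the standard posterior-mean computation factors the kernel matrix once with Cholesky. The RBF kernel and the toy data below are made up for illustration:

import numpy as np
from scipy.linalg import cho_factor, cho_solve

def rbf(a, b, length=1.0):
    # Hypothetical squared-exponential kernel between two 1-D point sets
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

X_train = np.array([0.0, 1.0, 2.0])
y_train = np.sin(X_train)
X_test = np.array([0.5, 1.5])

# Factor K once (jitter on the diagonal for numerical stability)
K = rbf(X_train, X_train) + 1e-6 * np.eye(len(X_train))
alpha = cho_solve(cho_factor(K), y_train)   # alpha = K^{-1} y

mean = rbf(X_test, X_train) @ alpha         # GP posterior mean at X_test
print(mean)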
Hands-on with Python and Rust
Python:

import numpy as np
from scipy.linalg import lu

# Symmetric positive-definite example matrix
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

# LU decomposition (SciPy returns P, L, U with A = P @ L @ U)
P, L, U = lu(A)
print("LU Decomposition:")
print("P=\n", P, "\nL=\n", L, "\nU=\n", U)

# QR decomposition
Q, R = np.linalg.qr(A)
print("\nQR Decomposition:")
print("Q=\n", Q, "\nR=\n", R)

# Cholesky decomposition (requires symmetric positive-definite input)
L_chol = np.linalg.cholesky(A)
print("\nCholesky Decomposition:")
print("L=\n", L_chol)
Rust (using the nalgebra crate, which provides the lu()/qr()/cholesky() methods called below):

use nalgebra::Matrix2;

fn main() {
    // Symmetric positive-definite example matrix
    let a = Matrix2::new(4.0, 2.0,
                         2.0, 3.0);

    // LU decomposition (with partial pivoting: PA = LU)
    let lu = a.lu();
    println!("LU Decomposition:\nL =\n{}\nU =\n{}", lu.l(), lu.u());

    // QR decomposition
    let qr = a.qr();
    println!("QR Decomposition:\nQ =\n{}\nR =\n{}", qr.q(), qr.r());

    // Cholesky decomposition (None if the matrix is not positive-definite)
    let chol = a.cholesky().expect("matrix is not positive-definite");
    println!("Cholesky Decomposition:\nL =\n{}", chol.l());
}
Summary
- LU → solve systems efficiently, appears in regression & optimization
- QR → stable least squares solutions
- Cholesky → Gaussian processes & covariance matrices
Next Steps
Continue to Pseudo-Inverse & Ill-Conditioned Systems.