Matrix Factorizations in ML (LU, QR, Cholesky)
Matrix factorizations break down a matrix into simpler building blocks.
They are not just abstract math — they are workhorses of numerical linear algebra that make solving systems, regression, and probabilistic ML efficient and stable.
In this lesson, we cover three fundamental factorizations:
- LU Decomposition → solving linear systems efficiently
- QR Decomposition → numerical stability in least squares
- Cholesky Decomposition → covariance matrices, Gaussian processes
1. LU Decomposition
Definition: Any square matrix $A$ can (under certain conditions) be decomposed as:

$$A = LU$$

where:
- $L$ is a lower triangular matrix (ones on the diagonal)
- $U$ is an upper triangular matrix

This is extremely useful for solving systems of equations $Ax = b$:
- Compute $A = LU$ once.
- Solve $Ly = b$ for $y$ (forward substitution).
- Solve $Ux = y$ for $x$ (back substitution).

Much faster and more stable than computing $A^{-1}$ explicitly.
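The factor-once, solve-twice recipe above can be sketched with SciPy's `lu_factor` / `lu_solve` (the matrix and right-hand side here are made-up examples):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Example system A x = b (values chosen only for illustration)
A = np.array([[4.0, 2.0, 1.0],
              [2.0, 3.0, 0.5],
              [1.0, 0.5, 2.0]])
b = np.array([1.0, 2.0, 3.0])

# Factor once: lu stores L and U compactly, piv records row pivots
lu, piv = lu_factor(A)

# Reuse the factorization for each right-hand side (forward + back substitution)
x = lu_solve((lu, piv), b)

print(x)
print(np.allclose(A @ x, b))  # True: the solution satisfies A x = b
```

The payoff is that `lu_factor` costs $O(n^3)$ but each subsequent `lu_solve` is only $O(n^2)$, so solving against many right-hand sides amortizes the expensive step.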
::: info ML relevance
- Linear regression can involve solving the normal equations $X^\top X \beta = X^\top y$. LU factorization speeds this up.
- Appears in optimization routines and numerical solvers.
:::
2. QR Decomposition
Definition: Any rectangular matrix $A \in \mathbb{R}^{m \times n}$ can be factored as:

$$A = QR$$

where:
- $Q$ is an orthogonal matrix ($Q^\top Q = I$)
- $R$ is an upper triangular matrix
Why useful?
Instead of solving the normal equations $X^\top X \beta = X^\top y$ (which may be unstable if $X$ is ill-conditioned), we can solve least squares via QR:

$$R\beta = Q^\top y$$

This is more numerically stable, since it avoids forming $X^\top X$, which squares the condition number.
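As a small sketch of QR-based least squares (the data below is synthetic, generated only for illustration):

```python
import numpy as np

# Synthetic overdetermined problem: 100 samples, 3 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
beta_true = np.array([1.0, -2.0, 0.5])
y = X @ beta_true + 0.01 * rng.normal(size=100)

# Least squares via QR: factor X = QR, then solve R beta = Q^T y
Q, R = np.linalg.qr(X)          # reduced QR: Q is 100x3, R is 3x3
beta = np.linalg.solve(R, Q.T @ y)  # R is triangular, so this is cheap

print(beta)  # close to beta_true
```

Note that $X^\top X$ is never formed, which is exactly why this route behaves better on ill-conditioned feature matrices.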
::: info ML relevance
- QR is widely used in least squares regression solvers.
- Preferred when feature matrices are ill-conditioned (highly correlated features).
:::
3. Cholesky Decomposition
Definition: A symmetric, positive-definite matrix $A$ can be decomposed as:

$$A = LL^\top$$

where $L$ is lower triangular.
Why useful?
- Efficient way to solve systems involving covariance matrices without forming an explicit inverse.
- Roughly twice as fast as LU for positive-definite systems.
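Both points can be sketched with SciPy's `cho_factor` / `cho_solve` (the covariance matrix below is a made-up positive-definite example):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Example symmetric positive-definite "covariance" matrix
K = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.5, 0.2],
              [0.3, 0.2, 1.0]])
y = np.array([1.0, 2.0, 3.0])

# Factor K = L L^T once, then solve K alpha = y via two triangular solves
c, low = cho_factor(K)
alpha = cho_solve((c, low), y)

# The log-determinant (needed e.g. in Gaussian log-likelihoods) comes
# almost for free from the diagonal of the Cholesky factor
L = np.linalg.cholesky(K)
logdet = 2.0 * np.sum(np.log(np.diag(L)))

print(alpha, logdet)
```

This solve-plus-log-determinant pattern is the core of Gaussian process inference, where `K` would be the kernel matrix.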
::: info ML relevance
- Gaussian Processes (GPs): covariance kernel matrices are symmetric positive-definite → use Cholesky for efficient inference.
- Optimization: Cholesky is used in second-order (Newton-type) methods where Hessians are positive definite.
:::
Hands-on with Python and Rust
::: code-group

```python [Python]
import numpy as np
from scipy.linalg import lu

# Example matrix
A = np.array([[4, 2], [2, 3]])

# LU decomposition
P, L, U = lu(A)
print("LU Decomposition:")
print("P=\n", P, "\nL=\n", L, "\nU=\n", U)

# QR decomposition
Q, R = np.linalg.qr(A)
print("\nQR Decomposition:")
print("Q=\n", Q, "\nR=\n", R)

# Cholesky decomposition
L = np.linalg.cholesky(A)
print("\nCholesky Decomposition:")
print("L=\n", L)
```

```rust [Rust]
use ndarray::array;
use ndarray_linalg::{LU, QR, Cholesky};

fn main() {
    let a = array![[4.0, 2.0], [2.0, 3.0]];

    // LU decomposition
    let lu = a.clone().lu().unwrap();
    let (l, u) = (lu.l().to_owned(), lu.u().to_owned());
    println!("LU Decomposition:\nL=\n{:?}\nU=\n{:?}", l, u);

    // QR decomposition
    let qr = a.clone().qr().unwrap();
    let (q, r) = (qr.q().unwrap(), qr.r().unwrap());
    println!("QR Decomposition:\nQ=\n{:?}\nR=\n{:?}", q, r);

    // Cholesky decomposition
    let chol = a.cholesky().unwrap();
    println!("Cholesky Decomposition:\nL=\n{:?}", chol);
}
```

:::
Summary
- LU → solve systems efficiently, appears in regression & optimization
- QR → stable least squares solutions
- Cholesky → Gaussian processes & covariance matrices
Next Steps
Continue to Pseudo-Inverse & Ill-Conditioned Systems.