
Matrix Factorizations in ML (LU, QR, Cholesky)

Matrix factorizations break down a matrix into simpler building blocks.
They are not just abstract math — they are workhorses of numerical linear algebra that make solving systems, regression, and probabilistic ML efficient and stable.

In this lesson, we cover three fundamental factorizations:

  • LU Decomposition → solving linear systems efficiently
  • QR Decomposition → numerical stability in least squares
  • Cholesky Decomposition → covariance matrices, Gaussian processes

1. LU Decomposition

Definition: Any square matrix A can (under certain conditions) be decomposed as:

A=LU

where:

  • L is a lower triangular matrix (ones on the diagonal)
  • U is an upper triangular matrix

This is extremely useful for solving systems of equations Ax=b:

  1. Compute LU=A once.
  2. Solve Ly=b (forward substitution).
  3. Solve Ux=y (back substitution).

Much faster, and more numerically stable, than explicitly computing A⁻¹.
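
As a quick sketch of this workflow, SciPy's lu_factor and lu_solve perform exactly these steps; the small system below is an assumed example, not part of the lesson.

python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

# Assumed toy system Ax = b
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])
b = np.array([1.0, 2.0])

# Factor once (L and U are stored packed in `lu`, pivot indices in `piv`)
lu, piv = lu_factor(A)

# Reuse the factorization: forward substitution, then back substitution
x = lu_solve((lu, piv), b)

print("x =", x)
print("check:", np.allclose(A @ x, b))

Because the factorization is computed once, solving for additional right-hand sides b costs only the two triangular solves.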

ML relevance

  • Linear regression can involve solving (XᵀX)w=Xᵀy. LU factorization speeds this up.
  • Appears in optimization routines and numerical solvers.

2. QR Decomposition

Definition: Any (rectangular) matrix A can be factored as:

A=QR

where:

  • Q is an orthogonal matrix (QᵀQ=I)
  • R is an upper triangular matrix

Why useful?
Instead of solving the normal equations AᵀAw=Aᵀy (which may be unstable if AᵀA is ill-conditioned), we can solve least squares via QR:

Aw≈y → Rw=Qᵀy

This is more numerically stable.
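
A minimal NumPy sketch of this comparison (the feature matrix, targets, and random seed below are assumed for illustration):

python
import numpy as np
from scipy.linalg import solve_triangular

# Assumed tall feature matrix X and targets y
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.normal(size=100)

# Reduced QR: Q has orthonormal columns, R is upper triangular
Q, R = np.linalg.qr(X)

# Least-squares weights from Rw = Qᵀy (back substitution)
w_qr = solve_triangular(R, Q.T @ y)

# Same solution as the normal equations, but without forming XᵀX
w_normal = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(w_qr, w_normal))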

ML relevance

  • QR is widely used in least squares regression solvers.
  • Preferred when feature matrices are ill-conditioned (highly correlated features).

3. Cholesky Decomposition

Definition: A symmetric, positive-definite matrix A can be decomposed as:

A=LLᵀ

where L is lower triangular.

Why useful?

  • Efficient way to solve systems with (or effectively invert) covariance matrices.
  • Roughly twice as fast as LU for symmetric positive-definite systems.

ML relevance

  • Gaussian Processes (GPs): covariance kernel matrices are symmetric positive-definite → use Cholesky for efficient inference.
  • Optimization: Cholesky is used in second-order methods where the Hessian (or an approximation to it) is positive definite.
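
As a hedged sketch of the Gaussian-process use case, the snippet below builds an assumed RBF kernel matrix and solves Kα = y with a Cholesky factorization instead of forming K⁻¹; the kernel, jitter value, and data are illustrative assumptions, not part of the lesson.

python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

# Assumed 1-D inputs and noisy targets
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=50)
y = np.sin(x) + 0.1 * rng.normal(size=50)

# Assumed RBF kernel matrix; the jitter term keeps it positive definite
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
K += 1e-6 * np.eye(len(x))

# Cholesky factorization K = LLᵀ, then two triangular solves for K alpha = y
c, low = cho_factor(K)
alpha = cho_solve((c, low), y)

print("max residual:", np.max(np.abs(K @ alpha - y)))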

Hands-on with Python and Rust

python
import numpy as np
from scipy.linalg import lu

# Symmetric positive-definite example matrix
A = np.array([[4.0, 2.0],
              [2.0, 3.0]])

# LU decomposition (with partial pivoting: A = P @ L @ U)
P, L, U = lu(A)
print("LU Decomposition:")
print("P=\n", P, "\nL=\n", L, "\nU=\n", U)

# QR decomposition
Q, R = np.linalg.qr(A)
print("\nQR Decomposition:")
print("Q=\n", Q, "\nR=\n", R)

# Cholesky decomposition (A must be symmetric positive definite)
L_chol = np.linalg.cholesky(A)
print("\nCholesky Decomposition:")
print("L=\n", L_chol)
rust
use ndarray::array;
use ndarray_linalg::*;

fn main() {
    let a = array![[4.0, 2.0],
                   [2.0, 3.0]];
    let b = array![1.0, 2.0];

    // LU-based solve: the Solve trait factorizes A (LU with pivoting) under
    // the hood; ndarray-linalg does not hand back explicit L and U matrices.
    let x = a.solve(&b).unwrap();
    println!("LU solve of Ax = b:\nx =\n{:?}", x);

    // QR decomposition
    let (q, r) = a.qr().unwrap();
    println!("QR Decomposition:\nQ =\n{:?}\nR =\n{:?}", q, r);

    // Cholesky decomposition (lower-triangular factor)
    let l = a.cholesky(UPLO::Lower).unwrap();
    println!("Cholesky Decomposition:\nL =\n{:?}", l);
}

Summary

  • LU → solve systems efficiently, appears in regression & optimization
  • QR → stable least squares solutions
  • Cholesky → Gaussian processes & covariance matrices

Next Steps

Continue to Pseudo-Inverse & Ill-Conditioned Systems.