Pseudo-Inverse & Ill-Conditioned Systems

In machine learning, we often need to invert matrices (e.g., in linear regression: (XᵀX)⁻¹).
But what if the matrix is not invertible or is ill-conditioned (unstable for inversion)?
This is where the pseudo-inverse and the concept of numerical stability come in.


1. The Moore–Penrose Pseudo-Inverse

If A is not square or not invertible, we use the Moore–Penrose inverse A⁺.

Definition:
A⁺ is the unique matrix satisfying the four Penrose conditions:

A A⁺ A = A,  A⁺ A A⁺ = A⁺,  (A A⁺)ᵀ = A A⁺,  (A⁺ A)ᵀ = A⁺ A

In regression:
Instead of solving

w = (XᵀX)⁻¹ Xᵀy

we use

w = X⁺y

where X⁺ is the pseudo-inverse (often computed via SVD).

ML relevance

  • Works even if XᵀX is singular (e.g., correlated features, or fewer samples than features).
  • Used in regularized regression and in pseudo-inverse-based neural network training.
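
As a quick sanity check, the four defining conditions can be verified numerically. A minimal sketch with NumPy's `np.linalg.pinv`, using a deliberately rank-deficient matrix:

```python
import numpy as np

# Rank-deficient matrix (second column is 2x the first)
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
A_pinv = np.linalg.pinv(A)

# Verify the four Penrose conditions
print(np.allclose(A @ A_pinv @ A, A))            # A A+ A  = A
print(np.allclose(A_pinv @ A @ A_pinv, A_pinv))  # A+ A A+ = A+
print(np.allclose((A @ A_pinv).T, A @ A_pinv))   # (A A+)^T = A A+
print(np.allclose((A_pinv @ A).T, A_pinv @ A))   # (A+ A)^T = A+ A
```

All four checks pass even though A has no ordinary inverse, which is exactly the point of the Moore–Penrose construction.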

2. Handling Non-Invertible Matrices in Regression

Situations where XᵀX is not invertible:

  • Multicollinearity: features are linearly dependent.
  • Underdetermined systems: more features than samples.

Solutions:

  • Use pseudo-inverse.
  • Add regularization (Ridge regression: w = (XᵀX + λI)⁻¹ Xᵀy).
  • Reduce dimensionality (PCA).
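
The Ridge fix can be sketched directly: adding λI makes XᵀX invertible even when the columns are exactly collinear (λ = 0.1 below is an illustrative choice, not a tuned value).

```python
import numpy as np

X = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])  # collinear columns: X^T X is singular
y = np.array([1.0, 2.0, 3.0])

lam = 0.1  # illustrative regularization strength
# Ridge normal equations: w = (X^T X + lam*I)^{-1} X^T y
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
print("Ridge solution:", w_ridge)
```

Note that `np.linalg.solve` succeeds here, whereas `np.linalg.inv(X.T @ X)` alone would raise a `LinAlgError`.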

3. Condition Number & Numerical Stability

The condition number of a matrix A (with respect to inversion) is:

κ(A) = ‖A‖ · ‖A⁻¹‖

  • If κ(A) is large → small input errors cause large output errors.
  • High condition number → matrix is ill-conditioned.

ML relevance

  • Ill-conditioned XᵀX means regression weights are highly unstable.
  • Regularization (Ridge) reduces condition number.
  • QR or SVD are often used instead of direct inversion for stability.
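
The instability can be seen directly. In this sketch (the feature values are illustrative), two nearly collinear columns make XᵀX ill-conditioned, and a perturbation of y on the order of 1e-4 shifts the fitted weights by roughly 0.5:

```python
import numpy as np

# Nearly collinear columns -> X^T X is ill-conditioned
X = np.array([[1.0, 1.0],
              [1.0, 1.0001],
              [1.0, 0.9999]])
y = np.array([2.0, 2.0, 2.0])

print("cond(X^T X):", np.linalg.cond(X.T @ X))  # very large

# SVD-based least squares, before and after a tiny perturbation of y
w1 = np.linalg.lstsq(X, y, rcond=None)[0]
w2 = np.linalg.lstsq(X, y + np.array([0.0, 1e-4, 0.0]), rcond=None)[0]
print("w1:", w1)
print("w2:", w2)  # large weight change despite a 1e-4 input change
```

This is the practical meaning of a large κ: the amplification from input error to output error is on the order of the condition number.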

Hands-on with Python and Rust

```python
import numpy as np

# Feature matrix with collinearity
X = np.array([[1, 2], [2, 4], [3, 6]])  # second column is 2x first
y = np.array([1, 2, 3])

# Direct normal equation (fails: X^T X not invertible)
try:
    w = np.linalg.inv(X.T @ X) @ X.T @ y
except np.linalg.LinAlgError:
    print("Matrix is singular, cannot invert.")

# Use pseudo-inverse instead
w_pinv = np.linalg.pinv(X) @ y

# Condition number
cond_num = np.linalg.cond(X)

print("Pseudo-inverse solution:", w_pinv)
print("Condition number of X:", cond_num)
```

```rust
use ndarray::{array, Array1, Array2};
use ndarray_linalg::SVD;

fn main() {
    // Feature matrix with collinearity (second column is 2x the first)
    let x: Array2<f64> = array![
        [1.0, 2.0],
        [2.0, 4.0],
        [3.0, 6.0]
    ];
    let y: Array1<f64> = array![1.0, 2.0, 3.0];

    // Pseudo-inverse via SVD: X^+ = V Sigma^+ U^T, inverting only
    // singular values above a small tolerance
    let (u, s, vt) = x.svd(true, true).unwrap();
    let (u, vt) = (u.unwrap(), vt.unwrap());
    let (m, n) = x.dim();
    let tol = 1e-8 * s[0];
    let mut sigma_pinv = Array2::<f64>::zeros((n, m));
    for (i, &sv) in s.iter().enumerate() {
        if sv > tol {
            sigma_pinv[[i, i]] = 1.0 / sv;
        }
    }
    let x_pinv = vt.t().dot(&sigma_pinv).dot(&u.t());
    let w = x_pinv.dot(&y);

    // Condition number: ratio of largest to smallest singular value
    let cond_num = s[0] / s[s.len() - 1];

    println!("Pseudo-inverse solution: {:?}", w);
    println!("Condition number of X: {}", cond_num);
}
```

Summary

  • Pseudo-inverse (Moore–Penrose) solves regression when XᵀX is not invertible.
  • Ill-conditioning → unstable solutions due to large condition numbers.
  • Fixes → pseudo-inverse, regularization, SVD/QR-based methods.

Next Steps

Continue to Block Matrices and Kronecker Products.