Projections and Least Squares
In this lesson, we connect orthogonality with one of the most important tools in machine learning: least squares regression.
The key idea: when fitting models, we are often projecting data onto a subspace that best approximates it.
Projection of a Vector onto Another
Given a vector $\mathbf{b}$ and a nonzero vector $\mathbf{a}$, the projection of $\mathbf{b}$ onto $\mathbf{a}$ is

$$\mathrm{proj}_{\mathbf{a}}(\mathbf{b}) = \frac{\mathbf{a}^\top \mathbf{b}}{\mathbf{a}^\top \mathbf{a}}\,\mathbf{a}$$

- This is the component of $\mathbf{b}$ along $\mathbf{a}$.
- The residual $\mathbf{b} - \mathrm{proj}_{\mathbf{a}}(\mathbf{b})$ is orthogonal to $\mathbf{a}$.
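To make this concrete, here is a minimal NumPy sketch of the projection formula; the vectors a and b are arbitrary example values, not data from the lesson.

import numpy as np
# Arbitrary example vectors
a = np.array([1.0, 2.0, 2.0])
b = np.array([3.0, 1.0, 0.0])
# proj_a(b) = (a . b / a . a) * a
proj = (a @ b) / (a @ a) * a
residual = b - proj
print("Projection:", proj)
print("Dot(a, residual) =", a @ residual)  # ~0: residual is orthogonal to a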
Why this matters
In ML, when we approximate a target vector $\mathbf{y}$ using our features, the best approximation is exactly such a projection: the prediction is the closest point we can reach with the features, and the error is orthogonal to it.
Projection onto a Subspace
If we have multiple feature vectors (columns of a matrix $X$), the projection of $\mathbf{y}$ onto the column space of $X$ is

$$\hat{\mathbf{y}} = X(X^\top X)^{-1}X^\top \mathbf{y}$$

- $X$ is the design matrix (features).
- $\hat{\mathbf{y}}$ is the projection of $\mathbf{y}$ onto the column space of $X$.
- The residual $\mathbf{y} - \hat{\mathbf{y}}$ is orthogonal to all columns of $X$.
The inner factor $(X^\top X)^{-1}X^\top \mathbf{y}$ is exactly the ordinary least squares (OLS) solution for the weights.
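As a sketch in code, this projection can be computed by forming the hat matrix $P = X(X^\top X)^{-1}X^\top$ explicitly; the two-column $X$ below is an illustrative example (intercept column plus one feature), paired with the same target used in the hands-on section further down.

import numpy as np
# Example design matrix (intercept column + one feature) and target
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([2.0, 2.9, 4.1])
# Hat matrix P = X (X^T X)^{-1} X^T projects onto the column space of X
P = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = P @ y
residual = y - y_hat
print("Projection of y:", y_hat)
print("X^T residual (should be ~0):", X.T @ residual)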
Least Squares Problem
We want to solve:

$$\min_{\mathbf{w}} \; \|\mathbf{y} - X\mathbf{w}\|^2$$

Expanding and differentiating w.r.t. $\mathbf{w}$ gives the normal equations:

$$X^\top X \mathbf{w} = X^\top \mathbf{y}$$

Solving yields:

$$\mathbf{w} = (X^\top X)^{-1} X^\top \mathbf{y}$$
This is the same closed-form solution we saw in linear regression.
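In practice, explicitly inverting $X^\top X$ can be numerically fragile when columns of $X$ are nearly collinear. As a sketch, the same solution can be obtained with np.linalg.lstsq, which solves the least squares problem directly; the X and y here match the hands-on example below.

import numpy as np
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 2.9, 4.1])
# Solve min_w ||y - Xw||^2 without forming (X^T X)^{-1}
w, sum_sq_residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)
print("Weight:", w)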
Geometric interpretation
OLS finds the vector $\hat{\mathbf{y}} = X\mathbf{w}$ in the column space of $X$ that is closest to $\mathbf{y}$.
The difference $\mathbf{y} - \hat{\mathbf{y}}$ is the residual, and it is orthogonal to the column space of $X$.
Hands-on with Python and Rust
import numpy as np
# Feature matrix X and target y
X = np.array([[1], [2], [3]])
y = np.array([2, 2.9, 4.1])
# Closed-form solution (normal equation)
w = np.linalg.inv(X.T @ X) @ X.T @ y
# Predictions (projection of y onto span(X))
y_hat = X @ w
# Residual (orthogonal to X)
residual = y - y_hat
print("Weight:", w)
print("Predictions:", y_hat)
print("Residual (should be orthogonal):", residual)
print("Dot(X[:,0], residual) =", np.dot(X[:,0], residual))
use ndarray::{array, Array1, Array2};
use ndarray_linalg::Inverse;
fn main() {
// Feature matrix X and target y
let x: Array2<f64> = array![[1.0], [2.0], [3.0]];
let y: Array1<f64> = array![2.0, 2.9, 4.1];
// Compute (X^T X)^{-1} X^T y
let xtx = x.t().dot(&x);
let xtx_inv = xtx.inv().unwrap();
let xty = x.t().dot(&y);
let w = xtx_inv.dot(&xty);
// Predictions
let y_hat = x.dot(&w);
// Residual
let residual = &y - &y_hat;
let dot = x.column(0).dot(&residual);
println!("Weight: {:?}", w);
println!("Predictions: {:?}", y_hat);
println!("Residual: {:?}", residual);
println!("Dot(X[:,0], residual) = {}", dot);
}
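Note: the Rust version assumes the ndarray and ndarray-linalg crates; ndarray-linalg typically also needs a LAPACK backend feature (e.g. an OpenBLAS or Intel MKL feature flag) enabled in Cargo.toml.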
Connection to Machine Learning
- Linear regression is solving a least squares problem.
- PCA can also be seen as a projection (onto directions of maximum variance); see the sketch after this list.
- Many ML methods boil down to finding the “best projection” of data onto a simpler subspace.
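As a minimal sketch of PCA as a projection (the small data matrix below is an arbitrary example): center the data, take the top right singular vector from the SVD, and project each sample onto the line it spans.

import numpy as np
# Arbitrary example data: 5 samples, 2 features
data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0]])
# Center, then take the direction of maximum variance (first right singular vector)
centered = data - data.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
v = Vt[0]
# Project each centered sample onto the line spanned by v
projected = centered @ np.outer(v, v)
print("Projection onto the first principal direction:\n", projected)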