Projections and Least Squares 
In this lesson, we connect orthogonality with one of the most important tools in machine learning: least squares regression.
 The key idea: when fitting models, we are often projecting data onto a subspace that best approximates it.
Projection of a Vector onto Another 
Given a vector $\mathbf{b}$ and a nonzero vector $\mathbf{a}$, the projection of $\mathbf{b}$ onto $\mathbf{a}$ is

$$\operatorname{proj}_{\mathbf{a}}(\mathbf{b}) = \frac{\mathbf{a}^\top \mathbf{b}}{\mathbf{a}^\top \mathbf{a}}\,\mathbf{a}$$

- This is the component of $\mathbf{b}$ along $\mathbf{a}$.
- The residual $\mathbf{b} - \operatorname{proj}_{\mathbf{a}}(\mathbf{b})$ is orthogonal to $\mathbf{a}$.
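As a quick sanity check, here is a minimal NumPy sketch of this one-dimensional projection; the vectors `a` and `b` are made up purely for illustration.

```python
import numpy as np

a = np.array([3.0, 0.0, 4.0])
b = np.array([2.0, 1.0, 2.0])

# Projection of b onto a: (a·b / a·a) a
proj = (a @ b) / (a @ a) * a

# Residual: the part of b orthogonal to a
residual = b - proj

print("Projection:", proj)
print("Dot(a, residual) =", np.dot(a, residual))  # ~0 up to floating-point error
```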
Why this matters
In ML, when we approximate a target vector $\mathbf{y}$ with a prediction $\hat{\mathbf{y}}$ built from our features, we are doing exactly this: $\hat{\mathbf{y}}$ is the projection of $\mathbf{y}$ onto the space spanned by the features, and the error $\mathbf{y} - \hat{\mathbf{y}}$ is orthogonal to that space.
Projection onto a Subspace 
If we have multiple feature vectors (columns of a matrix $X$), the projection of $\mathbf{y}$ onto the column space of $X$ is

$$\hat{\mathbf{y}} = X(X^\top X)^{-1}X^\top \mathbf{y}$$

where:

- $X$ is the design matrix (features).
- $\hat{\mathbf{y}}$ is the projection of $\mathbf{y}$ onto the column space of $X$.
- The residual $\mathbf{y} - \hat{\mathbf{y}}$ is orthogonal to all columns of $X$.
This formula is exactly the ordinary least squares (OLS) solution.
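One way to see the projection explicitly is to form the "hat" matrix $P = X(X^\top X)^{-1}X^\top$. The sketch below, with a small made-up design matrix, checks two defining properties: $P$ is idempotent ($PP = P$) and the residual $\mathbf{y} - P\mathbf{y}$ is orthogonal to every column of $X$ (this assumes $X$ has full column rank, so $X^\top X$ is invertible).

```python
import numpy as np

# Made-up design matrix (intercept column + one feature) and target
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 2.9, 4.1])

# Projection (hat) matrix onto the column space of X
P = X @ np.linalg.inv(X.T @ X) @ X.T

y_hat = P @ y  # projection of y onto the column space of X

print(np.allclose(P @ P, P))              # True: projecting twice changes nothing
print(np.allclose(X.T @ (y - y_hat), 0))  # True: residual is orthogonal to the columns of X
```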
Least Squares Problem 
We want to solve:

$$\min_{\mathbf{w}} \|\mathbf{y} - X\mathbf{w}\|^2$$

Expanding and differentiating with respect to $\mathbf{w}$ gives the normal equations:

$$X^\top X \mathbf{w} = X^\top \mathbf{y}$$

Solving yields:

$$\mathbf{w} = (X^\top X)^{-1} X^\top \mathbf{y}$$
This is the same closed-form solution we saw in linear regression.
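As a sanity check, the closed-form weights should agree with what a library least-squares solver returns. Here is a minimal sketch on a made-up dataset using NumPy's `lstsq`, which is also numerically safer than forming the inverse explicitly:

```python
import numpy as np

X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 2.9, 4.1])

# Closed-form solution from the normal equations
w_closed = np.linalg.inv(X.T @ X) @ X.T @ y

# Library least-squares solver
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.allclose(w_closed, w_lstsq))  # True
```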
Geometric interpretation
OLS finds the vector $\hat{\mathbf{y}} = X\mathbf{w}$ in the column space of $X$ that is closest to $\mathbf{y}$. The difference $\mathbf{y} - \hat{\mathbf{y}}$ is the residual, and it is orthogonal to that subspace.
Hands-on with Python and Rust 
In Python (NumPy):

```python
import numpy as np
# Feature matrix X and target y
X = np.array([[1], [2], [3]])
y = np.array([2, 2.9, 4.1])
# Closed-form solution (normal equation)
w = np.linalg.inv(X.T @ X) @ X.T @ y
# Predictions (projection of y onto span(X))
y_hat = X @ w
# Residual (orthogonal to X)
residual = y - y_hat
print("Weight:", w)
print("Predictions:", y_hat)
print("Residual (should be orthogonal):", residual)
print("Dot(X[:,0], residual) =", np.dot(X[:,0], residual))use ndarray::{array, Array1, Array2};
use ndarray_linalg::Inverse;
fn main() {
    // Feature matrix X and target y
    let x: Array2<f64> = array![[1.0], [2.0], [3.0]];
    let y: Array1<f64> = array![2.0, 2.9, 4.1];
    // Compute (X^T X)^{-1} X^T y
    let xtx = x.t().dot(&x);
    let xtx_inv = xtx.inv().unwrap();
    let xty = x.t().dot(&y);
    let w = xtx_inv.dot(&xty);
    // Predictions
    let y_hat = x.dot(&w);
    // Residual
    let residual = &y - &y_hat;
    let dot = x.column(0).dot(&residual);
    println!("Weight: {:?}", w);
    println!("Predictions: {:?}", y_hat);
    println!("Residual: {:?}", residual);
    println!("Dot(X[:,0], residual) = {}", dot);
}
```

Connection to Machine Learning
- Linear regression is solving a least squares problem.
- PCA can also be seen as a projection (onto directions of maximum variance); see the sketch after this list.
- Many ML methods boil down to finding the “best projection” of data onto a simpler subspace.
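To make the PCA bullet concrete, here is a minimal sketch (with made-up, centered 2-D data and NumPy's SVD) that projects each point onto the top principal direction; the projection step is the same single-vector formula from the start of this lesson.

```python
import numpy as np

rng = np.random.default_rng(0)
# Made-up 2-D data with most of its variance along the first axis
data = rng.normal(size=(100, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
data -= data.mean(axis=0)  # center the data

# Top principal direction = first right-singular vector of the centered data
_, s, vt = np.linalg.svd(data, full_matrices=False)
v1 = vt[0]  # unit vector

# Project every point onto span(v1): (x·v1) v1
projected = (data @ v1)[:, None] * v1

print("Fraction of variance captured:", s[0]**2 / (s**2).sum())
```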