Johnson–Lindenstrauss Lemma & Random Projections
The Johnson–Lindenstrauss (JL) Lemma is a cornerstone of high-dimensional geometry: it states that points in a high-dimensional space can be projected into a much lower-dimensional space while approximately preserving all pairwise distances. Random projections, the technique behind the lemma, enable efficient dimensionality reduction, which is critical in machine learning (ML) for handling large-scale data in tasks like clustering, classification, and data compression. By leveraging randomness, these methods mitigate the curse of dimensionality at minimal computational cost.
This lecture in the “Foundations for AI/ML” series (misc-math cluster) builds on the curse of dimensionality and stochastic processes, exploring the JL Lemma, random projections, their mathematical foundations, proofs, and ML applications. We’ll provide intuitive explanations, derivations, and practical implementations in Python and Rust, offering tools to apply random projections in AI.
1. Intuition Behind Johnson-Lindenstrauss Lemma
In high-dimensional spaces, data points are sparse and computations are costly (the curse of dimensionality). The JL Lemma guarantees that n points in ℝ^d can be projected to ℝ^k with k ≪ d using a random matrix, preserving pairwise Euclidean distances within a factor of (1 ± ε) with high probability. Random projections achieve this by mapping the data onto a random low-dimensional subspace, exploiting the concentration of measure.
ML Connection
- Dimensionality Reduction: Compress high-dimensional data for faster ML algorithms.
- Clustering/Classification: Preserve distances for k-NN, SVMs.
- Data Compression: Reduce storage in large datasets.
::: info The JL Lemma is like compressing a high-resolution image into a smaller file that still captures the essential shapes—random projections keep the “essence” of data with fewer dimensions. :::
Example
- Project 1000-dimensional data to 50 dimensions while approximately preserving pairwise distances for clustering; the worked check below shows what the bound says about this choice.
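It is worth checking what the bound of Section 2 actually certifies for this example. Inverting k ≥ 8 ln n / ε² gives the guaranteed distortion for a fixed target dimension (taking n = 1000 points as an illustration):

\[ \varepsilon \;=\; \sqrt{\frac{8 \ln n}{k}} \;\approx\; \sqrt{\frac{8 \ln 1000}{50}} \;\approx\; 1.05 , \]

so the worst-case guarantee alone does not certify such an aggressive projection; in practice, random projections typically distort far less than the bound suggests.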
2. Johnson-Lindenstrauss Lemma: Formal Statement
For any n points in ℝ^d and any ε ∈ (0, 1), there exists a linear map f: ℝ^d → ℝ^k with
\[ k \geq \frac{8 \ln n}{\varepsilon^2} \]
such that for all points x, y in the set:
\[ (1 - \varepsilon)\, \|x - y\|_2^2 \;\leq\; \|f(x) - f(y)\|_2^2 \;\leq\; (1 + \varepsilon)\, \|x - y\|_2^2 . \]
Moreover, a suitably scaled random projection satisfies these inequalities with high probability, which is what makes the lemma constructive in practice.
Key Insight
- k depends only on n and ε, not on the ambient dimension d, which makes the bound especially attractive in very high dimensions (see the worked computation below).
ML Insight
- Reduces dimensionality while preserving geometric structure.
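A quick computation makes the independence from d concrete. For n = 10⁶ points at distortion ε = 0.1:

\[ k \;\geq\; \frac{8 \ln 10^6}{(0.1)^2} \;=\; \frac{8 \times 13.82}{0.01} \;\approx\; 11{,}052 , \]

and this holds whether the ambient dimension d is 10⁴ or 10⁹.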
3. Random Projections: Mechanism
Random Projection Matrix: R ∈ ℝ^{k×d}, with entries drawn i.i.d. from one of:
- N(0, 1/k) (Gaussian).
- ±1/√k, each with probability 1/2 (Rademacher).
- Sparse (Achlioptas): ±√(3/k) with probability 1/6 each, 0 with probability 2/3.
Projection: For x ∈ ℝ^d, compute y = Rx ∈ ℝ^k.
Properties
- Preserves squared norms, and hence pairwise distances, in expectation (numeric check below).
- Projection onto a uniformly random k-dimensional subspace (suitably scaled) achieves the JL guarantee; i.i.d. random matrices are an easier-to-generate substitute with essentially the same behavior.
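As a numeric check of the expectation property, the following sketch (assuming NumPy; the sizes are arbitrary) averages ‖Rx‖² over many independent Gaussian draws of R and compares against ‖x‖²:

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, trials = 500, 20, 2000
x = rng.normal(size=d)

# Average ||Rx||^2 over many independent Gaussian projections R
sq_norms = []
for _ in range(trials):
    R = rng.normal(size=(k, d)) / np.sqrt(k)   # entries ~ N(0, 1/k)
    sq_norms.append(np.sum((R @ x) ** 2))

print("mean ||Rx||^2 :", np.mean(sq_norms))   # close to ||x||^2
print("||x||^2       :", np.sum(x ** 2))
```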
ML Connection
- Random projections for fast PCA, k-NN.
4. Proof of JL Lemma (Simplified)
Gaussian Projection:
- R_{ij} ~ N(0, 1/k), i.i.d.
- For x ∈ ℝ^d and y = Rx, E[\|y\|_2^2] = \|x\|_2^2 (derivation below).
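The expectation is a one-line variance computation. Writing y_i = Σ_j R_{ij} x_j, the cross terms vanish because the entries are independent with mean zero, leaving:

\[ \mathbb{E}[y_i^2] \;=\; \sum_{j=1}^{d} \mathbb{E}[R_{ij}^2]\, x_j^2 \;=\; \frac{\|x\|_2^2}{k}, \qquad \mathbb{E}\big[\|y\|_2^2\big] \;=\; \sum_{i=1}^{k} \mathbb{E}[y_i^2] \;=\; \|x\|_2^2 . \]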
Concentration: \|Rx\|_2^2 is (up to scaling) a chi-square random variable with k degrees of freedom, and standard concentration bounds give:
\[ P\Big( \big|\, \|Rx\|_2^2 - \|x\|_2^2 \,\big| > \varepsilon \|x\|_2^2 \Big) \;\leq\; 2\, e^{-k(\varepsilon^2/4 \,-\, \varepsilon^3/6)} . \]
For n points, a union bound over all pairs yields k = O(ln n / ε²).
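One way to finish the argument, following Dasgupta and Gupta: apply the tail bound to each of the \(\binom{n}{2}\) difference vectors x − y, and choose k large enough that k(ε²/4 − ε³/6) ≥ 2 ln n, so each pair fails with probability at most 2/n². Then:

\[ P(\text{some pair is distorted}) \;\leq\; \binom{n}{2} \cdot \frac{2}{n^2} \;=\; \frac{n-1}{n} \;<\; 1 , \]

so a distance-preserving map exists, and redrawing R independently drives the failure probability down geometrically.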
Derivation
- The projection preserves the length of any fixed vector via Gaussian concentration.
- Pairwise distances are preserved by applying the length bound to each difference vector x − y and taking a union bound.
ML Insight
- k is logarithmic in n and quadratic in 1/ε, independent of d; this is the source of the method's efficiency.
5. Variants and Extensions
Sparse Projections: Achlioptas' matrix, which is mostly zeros, allowing faster multiplication with the same guarantee (see the sketch at the end of this section).
Database-Friendly Projections: Use ±1 or sparse entries.
Subspace Embeddings: Extend to preserving entire subspaces.
In ML: Sparse projections for large-scale data.
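As a sketch of how a database-friendly sparse projection might look in code (SciPy assumed; the ±√(3/k) scaling follows Achlioptas' construction, approximated here by placing a fixed 1/3 fraction of nonzeros):

```python
import numpy as np
from scipy import sparse

def sparse_jl_matrix(k, d, seed=None):
    """Approximate Achlioptas matrix: ~2/3 zeros, nonzeros = +/-sqrt(3/k)."""
    rng = np.random.default_rng(seed)
    signs = lambda size: rng.choice([-1.0, 1.0], size=size) * np.sqrt(3.0 / k)
    return sparse.random(k, d, density=1/3, format="csr", data_rvs=signs)

k, d = 200, 10_000
R = sparse_jl_matrix(k, d, seed=0)
x = np.random.default_rng(1).normal(size=d)
print("||x||  =", np.linalg.norm(x))
print("||Rx|| =", np.linalg.norm(R @ x))   # close to ||x||
```

Only about kd/3 entries are stored and multiplied, cutting both memory and projection time by roughly a factor of three.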
6. Applications in Machine Learning
- Dimensionality Reduction: Compress features for SVM, neural nets.
- Clustering: Preserve distances for k-means, spectral clustering.
- Approximate Nearest Neighbors: Speed up searches with random projections (see the sketch after this list).
- Data Compression: Reduce storage in embeddings.
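A sketch of the nearest-neighbor use case (scikit-learn assumed; sizes are hypothetical): build the index in the k-dimensional projected space rather than the original d-dimensional one.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
n, d, k = 5000, 2000, 128
X = rng.normal(size=(n, d))

# Gaussian JL projection, then exact search in the reduced space
R = rng.normal(size=(k, d)) / np.sqrt(k)
X_red = X @ R.T

nn = NearestNeighbors(n_neighbors=10).fit(X_red)
dist, idx = nn.kneighbors(X_red[:5])   # neighbors of the first 5 points
print(idx.shape)                       # (5, 10)
```

Since distances survive up to (1 ± ε), the true near neighbors are recovered with high probability at roughly k/d of the per-query cost.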
Challenges
- Loss of Information: Some distortion (up to the ε factor) is inevitable.
- Nonlinear Structures: JL preserves Euclidean distances, not manifold or geodesic structure.
- Randomness Overhead: Generating and storing R can be costly for large d (mitigated by sparse or structured projections).
7. Numerical Implementations of Random Projections
Below we implement JL random projections in Python and Rust and test distance preservation empirically.
::: code-group
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neighbors import KNeighborsClassifier

# Random projection: Gaussian or Achlioptas-style sparse matrix
def random_projection(X, k, method='gaussian'):
    d = X.shape[1]
    if method == 'gaussian':
        # Entries ~ N(0, 1/k), so E[||Rx||^2] = ||x||^2
        R = np.random.randn(k, d) / np.sqrt(k)
    elif method == 'sparse':
        # Achlioptas: 0 w.p. 2/3, +/-sqrt(3/k) w.p. 1/6 each
        R = np.random.choice([0.0, 1.0, -1.0], size=(k, d),
                             p=[2/3, 1/6, 1/6]) * np.sqrt(3.0 / k)
    else:
        raise ValueError(f"unknown method: {method}")
    return X @ R.T

# Generate high-dimensional data
n, d = 100, 1000
X = np.random.rand(n, d)

# JL target dimension; eps = 0.5 keeps k < d for this small demo
# (eps = 0.1 would require k ~ 3700 > d = 1000)
eps = 0.5
k = int(np.ceil(8 * np.log(n) / eps**2))
X_red = random_projection(X, k)

# Check distance preservation for one pair
orig_dist = np.linalg.norm(X[0] - X[1])
proj_dist = np.linalg.norm(X_red[0] - X_red[1])
print("Original distance:", orig_dist, "Projected:", proj_dist)

# ML: k-NN before and after projection (training accuracy as a sanity check)
y = (X[:, 0] + X[:, 1] > 1).astype(int)
knn_orig = KNeighborsClassifier(n_neighbors=5).fit(X, y)
knn_red = KNeighborsClassifier(n_neighbors=5).fit(X_red, y)
print("Original k-NN accuracy:", knn_orig.score(X, y))
print("Projected k-NN accuracy:", knn_red.score(X_red, y))

# Visualize pairwise distance preservation
dists_orig = [np.linalg.norm(X[i] - X[j]) for i in range(n) for j in range(i + 1, n)]
dists_red = [np.linalg.norm(X_red[i] - X_red[j]) for i in range(n) for j in range(i + 1, n)]
plt.scatter(dists_orig, dists_red, alpha=0.5)
plt.plot([min(dists_orig), max(dists_orig)],
         [min(dists_orig), max(dists_orig)], 'r--')
plt.title("Distance Preservation")
plt.xlabel("Original Distances")
plt.ylabel("Projected Distances")
plt.show()
```

```rust
// Requires the nalgebra, rand, and rand_distr crates
// (rand_distr supplies proper Gaussian samples).
use nalgebra::DMatrix;
use rand::Rng;
use rand_distr::{Distribution, StandardNormal};

fn random_projection(x: &DMatrix<f64>, k: usize, method: &str) -> DMatrix<f64> {
    let d = x.ncols();
    let mut rng = rand::thread_rng();
    let r = match method {
        // Entries ~ N(0, 1/k)
        "gaussian" => DMatrix::from_fn(k, d, |_, _| {
            let z: f64 = StandardNormal.sample(&mut rng);
            z / (k as f64).sqrt()
        }),
        // Achlioptas: 0 w.p. 2/3, +/-sqrt(3/k) w.p. 1/6 each
        "sparse" => DMatrix::from_fn(k, d, |_, _| {
            let p: f64 = rng.gen();
            let s = (3.0 / k as f64).sqrt();
            if p < 2.0 / 3.0 { 0.0 } else if p < 5.0 / 6.0 { s } else { -s }
        }),
        _ => panic!("Unknown method"),
    };
    x * r.transpose()
}

fn main() {
    let n = 100;
    let d = 1000;
    let mut rng = rand::thread_rng();
    let x = DMatrix::from_fn(n, d, |_, _| rng.gen::<f64>());

    // eps = 0.5 keeps k < d for this small demo
    let eps: f64 = 0.5;
    let k = (8.0 * (n as f64).ln() / eps.powi(2)).ceil() as usize;
    let x_red = random_projection(&x, k, "gaussian");

    let dist_orig = (&x.row(0) - &x.row(1)).norm();
    let dist_red = (&x_red.row(0) - &x_red.row(1)).norm();
    println!("Original distance: {} Projected: {}", dist_orig, dist_red);
    // k-NN comparison omitted (no standard k-NN crate assumed)
}
```
:::
Both implementations build the projection matrix explicitly and verify distance preservation on sample pairs.
8. Symbolic Derivations with SymPy
We express the JL bound symbolically.
::: code-group
```python
from sympy import symbols, ln

# JL target dimension as a symbolic function of n and epsilon
n, epsilon = symbols('n epsilon', positive=True)
k = 8 * ln(n) / epsilon**2
print("JL k:", k)

# Expected norm preservation: E[||Rx||^2] equals the squared input norm
x_norm = symbols('x_norm', positive=True)
expected_norm = x_norm**2
print("E[||Rx||^2] =", expected_norm)
```

```rust
fn main() {
    println!("JL k: 8 ln(n) / ε²");
    println!("E[||Rx||^2] = ||x||^2");
}
```
:::
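Going one step further, SymPy can solve for the threshold k at which the union-bound failure probability from Section 4 drops below 1 (a sketch; the tail exponent is the one stated there):

```python
from sympy import symbols, exp, solve, simplify

n, eps, k = symbols('n epsilon k', positive=True)

# Union bound over ~n^2 pairs, each failing with prob exp(-k*(eps^2/4 - eps^3/6))
fail_prob = n**2 * exp(-k * (eps**2 / 4 - eps**3 / 6))

# Threshold dimension: smallest k with fail_prob <= 1
k_min = solve(fail_prob - 1, k)[0]
print(simplify(k_min))   # 2*log(n) / (eps**2/4 - eps**3/6), i.e. O(log(n)/eps^2)
```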
9. Challenges in ML Applications
- Distortion: the ε factor introduces controlled error into downstream models.
- Nonlinear Data: JL preserves only Euclidean distances, not nonlinear structure.
- Random Matrix Generation: an explicit R is costly to generate and store for very large d.
10. Key ML Takeaways
- JL reduces dimensions while preserving pairwise distances.
- Random projections are efficient: k grows only logarithmically in n.
- Direct applications: clustering, k-NN, and data compression.
- Scales to high dimensions, mitigating the curse of dimensionality.
- The code above provides practical, reusable projections.
The JL Lemma thus underpins efficient high-dimensional ML.
11. Summary
We explored the Johnson–Lindenstrauss Lemma and random projections, their proofs, and their ML applications in dimensionality reduction. Worked examples and Python/Rust code bridge theory to practice and strengthen high-dimensional ML solutions.
Further Reading
- Dasgupta, S., Gupta, A., "An Elementary Proof of a Theorem of Johnson and Lindenstrauss," Random Structures & Algorithms, 2003.
- Vempala, S., The Random Projection Method, 2004.
- Hastie, T., Tibshirani, R., Friedman, J., The Elements of Statistical Learning (Ch. 14).
- Rust: nalgebra for linear algebra, rand and rand_distr for sampling.