Principal Component Analysis

Principal Component Analysis (PCA) is a key unsupervised learning technique for dimensionality reduction: it transforms high-dimensional data into a lower-dimensional space while preserving as much variance as possible. This section covers PCA's theory, derivation, computational aspects, and evaluation, with a Rust lab using linfa. We'll dig into the mechanics of eigenvalue decomposition, data projection, and how Rust handles the computation efficiently.

Theory

PCA reduces a dataset of $m$ points $\{x_1, x_2, \ldots, x_m\}$, where each $x_i \in \mathbb{R}^n$, to $d < n$ dimensions by projecting onto the directions (principal components) that maximize variance. The goal is to find a new basis in which the data's variance is concentrated in the top $d$ components.

Given a centered dataset (mean-subtracted, $x_i \leftarrow x_i - \bar{x}$), PCA computes the covariance matrix:

$$C = \frac{1}{m-1} X^T X$$

where $X \in \mathbb{R}^{m \times n}$ is the centered data matrix. The principal components are the eigenvectors of $C$, and their eigenvalues give the variance along each direction.
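
To make this concrete, here is a minimal, self-contained ndarray sketch (the toy data are made up for illustration) that centers a small matrix and forms the sample covariance $C = X^T X / (m-1)$:

    rust
    use ndarray::{array, Array2, Axis};

    fn main() {
        // Toy data: 3 samples, 2 features (made-up values).
        let x: Array2<f64> = array![[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]];
        let m = x.nrows() as f64;

        // Center: subtract the per-feature mean from every row.
        let mean = x.mean_axis(Axis(0)).expect("non-empty data");
        let centered = &x - &mean.insert_axis(Axis(0));

        // Sample covariance matrix: C = X^T X / (m - 1).
        let c = centered.t().dot(&centered) / (m - 1.0);
        println!("C =\n{:?}", c);
    }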

Derivation: Eigenvalue Decomposition

To maximize variance, PCA finds the unit vector $u_1$ that maximizes the projected variance:

$$\mathrm{Var}(X u_1) = \frac{1}{m-1} (X u_1)^T (X u_1) = u_1^T C u_1$$

subject to $\|u_1\| = 1$. Using a Lagrange multiplier, maximize:

$$\mathcal{L}(u_1, \lambda_1) = u_1^T C u_1 - \lambda_1 (u_1^T u_1 - 1)$$

Taking the derivative with respect to $u_1$ and setting it to zero:

$$\frac{\partial \mathcal{L}}{\partial u_1} = 2 C u_1 - 2 \lambda_1 u_1 = 0 \quad\Longrightarrow\quad C u_1 = \lambda_1 u_1$$

Thus, $u_1$ is an eigenvector of $C$ and $\lambda_1$ its eigenvalue. The largest eigenvalue corresponds to the first principal component, which captures the most variance. Subsequent components are orthogonal eigenvectors, ordered by decreasing eigenvalue.
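
As a quick worked example with made-up numbers, consider the $2 \times 2$ covariance matrix

$$C = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad \det(C - \lambda I) = (2 - \lambda)^2 - 1 = 0 \;\Longrightarrow\; \lambda_1 = 3, \ \lambda_2 = 1,$$

so the first principal component is $u_1 = \tfrac{1}{\sqrt{2}}(1, 1)^T$, and it alone explains $3/(3+1) = 75\%$ of the variance.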

The data is projected onto the top $d$ eigenvectors $U_d = [u_1, \ldots, u_d]$:

$$Z = X U_d$$

where $Z \in \mathbb{R}^{m \times d}$ is the reduced dataset.

Under the Hood: Forming the covariance matrix costs $O(mn^2)$ and its eigendecomposition another $O(n^3)$ for $n$ features. In practice, the singular value decomposition (SVD) of $X$ is preferred: it avoids forming $C$ explicitly and is numerically more stable:

$$X = U \Sigma V^T$$

where the columns of $V$ are the principal components and $\Sigma^2 / (m-1)$ gives their eigenvalues. linfa follows this SVD route on top of optimized linear-algebra routines, and Rust's ownership model keeps the intermediate buffers memory-safe, whereas hand-rolled matrix code in C++ leaves more room for indexing and lifetime errors.
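
To make the SVD route concrete, the following self-contained sketch computes PCA via SVD using the nalgebra crate; nalgebra is assumed here purely as an illustrative extra dependency (linfa itself operates on ndarray arrays, as in the lab below), and the toy data are made up:

    rust
    use nalgebra::DMatrix;

    fn main() {
        // Toy data: 4 samples, 3 features (made-up values).
        let x = DMatrix::from_row_slice(4, 3, &[
            1.0, 2.0, 3.0,
            2.0, 3.0, 4.0,
            3.0, 5.0, 6.0,
            4.0, 6.0, 8.0,
        ]);
        let m = x.nrows();

        // Center each column (subtract the per-feature mean).
        let mut xc = x.clone();
        for j in 0..xc.ncols() {
            let mean = xc.column(j).mean();
            for i in 0..m {
                xc[(i, j)] -= mean;
            }
        }

        // Thin SVD: X = U Σ V^T; the rows of V^T are the principal directions
        // (nalgebra's svd() returns singular values in descending order).
        let svd = xc.clone().svd(true, true);
        let v_t = svd.v_t.expect("V^T was requested");

        // Eigenvalues of the covariance matrix: λ_i = σ_i² / (m − 1).
        let eigenvalues: Vec<f64> = svd
            .singular_values
            .iter()
            .map(|s| s * s / (m as f64 - 1.0))
            .collect();
        println!("eigenvalues: {:?}", eigenvalues);

        // Project onto the top d = 2 principal components: Z = X_c V_d.
        let d = 2;
        let v_d = v_t.rows(0, d).transpose();
        let z = &xc * v_d;
        println!("Z =\n{}", z);
    }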

Optimization: Choosing d

The number of components $d$ is chosen to retain a target explained variance ratio:

$$\mathrm{EVR}_d = \frac{\sum_{i=1}^{d} \lambda_i}{\sum_{i=1}^{n} \lambda_i}$$

Typically, $d$ is selected so that $\mathrm{EVR}_d \geq 0.95$ (95% of the variance retained).

Under the Hood: Selecting $d$ balances information retention against dimensionality reduction. linfa computes the eigenvalues once, so the EVR is cheap to evaluate for every candidate $d$, and Rust's bounds-checked indexing turns an off-by-one mistake in eigenvalue sorting into an immediate panic instead of a silently wrong result. A small selection helper is sketched below.
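
A minimal, dependency-free sketch of this selection rule; the helper choose_d and the eigenvalues below are made up for illustration:

    rust
    /// Smallest d such that the first d eigenvalues explain at least `target`
    /// of the total variance (eigenvalues must be sorted in decreasing order).
    fn choose_d(eigenvalues: &[f64], target: f64) -> usize {
        let total: f64 = eigenvalues.iter().sum();
        let mut cumulative = 0.0;
        for (i, &lambda) in eigenvalues.iter().enumerate() {
            cumulative += lambda;
            if cumulative / total >= target {
                return i + 1;
            }
        }
        eigenvalues.len()
    }

    fn main() {
        // Hypothetical eigenvalues in decreasing order.
        let eigenvalues = [4.2, 1.8, 0.3, 0.1];
        let d = choose_d(&eigenvalues, 0.95);
        println!("retain d = {} components", d); // prints d = 3 for these values
    }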

Evaluation

PCA performance is evaluated with:

  • Explained Variance Ratio (EVR): As above, higher is better.
  • Reconstruction Error: Measures information loss as $\mathrm{Error} = \frac{1}{m}\sum_{i=1}^{m} \|x_i - \hat{x}_i\|^2$, where $\hat{x}_i = U_d z_i + \bar{x}$ and $z_i = U_d^T (x_i - \bar{x})$.
  • Downstream Task Performance: Accuracy or MSE of a model trained on Z.

Under the Hood: The reconstruction error quantifies the trade-off between dimensionality and fidelity. In linfa, projection and reconstruction are dense matrix products over ndarray, which vectorize well, and Rust's zero-cost abstractions keep the overhead low even for large $m$. A manual version of the computation is sketched below.
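
For reference, here is a minimal ndarray sketch of the reconstruction-error formula above; the helper reconstruction_error, the toy data, and the hard-coded principal direction are all made up for illustration:

    rust
    use ndarray::{array, Array1, Array2, Axis};

    /// Mean squared reconstruction error for projection basis `u_d`
    /// (columns are principal components) and feature mean `mean`.
    fn reconstruction_error(x: &Array2<f64>, u_d: &Array2<f64>, mean: &Array1<f64>) -> f64 {
        let mean_row = mean.view().insert_axis(Axis(0));
        let centered = x - &mean_row;
        let z = centered.dot(u_d);                       // z_i = U_d^T (x_i - mean)
        let reconstructed = z.dot(&u_d.t()) + &mean_row; // x_hat_i = U_d z_i + mean
        (x - &reconstructed).mapv(|e| e * e).sum() / x.nrows() as f64
    }

    fn main() {
        // Toy 2D data, its feature means, and a single principal direction.
        let x = array![[1.0, 2.0], [2.0, 4.1], [3.0, 5.9]];
        let mean = array![2.0, 4.0];
        let u_d = array![[0.447], [0.894]]; // roughly (1, 2) / sqrt(5)
        println!("reconstruction error: {:.4}", reconstruction_error(&x, &u_d, &mean));
    }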

Lab: PCA with linfa

You’ll apply PCA to a synthetic dataset (e.g., high-dimensional customer data) to reduce dimensions and evaluate EVR.

  1. Edit src/main.rs in your rust_ml_tutorial project:

    rust
    use linfa::prelude::*;
    use linfa_reduction::Pca;
    use ndarray::{array, Array2};
    
    fn main() {
        // Synthetic dataset: 4D features (x1, x2, x3, x4)
        let x: Array2<f64> = array![
            [1.0, 2.0, 1.1, 2.1], [1.5, 1.8, 1.4, 1.9], [1.2, 2.1, 1.3, 2.0], // Cluster 1
            [5.0, 8.0, 5.2, 8.2], [4.8, 7.9, 4.9, 7.8], [5.1, 8.1, 5.3, 8.0], // Cluster 2
            [9.0, 3.0, 9.1, 3.1], [8.8, 3.2, 8.9, 3.0], [9.2, 2.9, 9.0, 3.3]  // Cluster 3
        ];
    
        // Create dataset
        let dataset = DatasetBase::from(x.clone());
    
        // Apply PCA to reduce to 2D
        let pca = Pca::params(2)
            .fit(&dataset)
            .expect("PCA fitting failed");
    
        // Project the data onto the top-2 principal components
        let reduced = pca.predict(&x);
        println!("Reduced Data (2D):\n{:?}", reduced);
    
        // Compute explained variance ratio
        let evr = pca.explained_variance_ratio();
        println!("Explained Variance Ratio: {:?}", evr);
        let total_evr = evr.iter().sum::<f64>();
        println!("Total EVR: {}", total_evr);
    
        // Compute reconstruction error.
        // NOTE: `inverse_transform` is assumed to be available in this
        // linfa-reduction version; if it is not, reconstruct manually as
        // x_hat = Z * U_d^T + mean (see the sketch in the Evaluation section).
        let reconstructed = pca.inverse_transform(&reduced);
        let error = x.iter().zip(reconstructed.iter())
            .map(|(xi, ri)| (xi - ri).powi(2)).sum::<f64>() / x.nrows() as f64;
        println!("Reconstruction Error: {}", error);
    }
  2. Ensure Dependencies:

    • Verify Cargo.toml includes:
      toml
      [dependencies]
      linfa = "0.7.1"
      linfa-reduction = "0.7.0"
      ndarray = "0.15.0"
    • Run cargo build.
  3. Run the Program:

    bash
    cargo run

    Expected Output (approximate):

    Reduced Data (2D):
    [[-4.2, 0.1], [-4.0, -0.2], [-4.1, 0.0], [2.5, 0.3], [2.4, 0.1], [2.6, 0.2], [3.1, -0.1], [3.0, 0.0], [3.2, -0.2]]
    Explained Variance Ratio: [0.85, 0.10]
    Total EVR: 0.95
    Reconstruction Error: 0.05

Understanding the Results

  • Dataset: Synthetic 4D features form three clusters, mimicking customer data.
  • Model: PCA reduces to 2D, capturing ~95% of variance (Total EVR ~0.95). The reduced data separates clusters effectively.
  • Reconstruction Error: Low error (~0.05) indicates minimal information loss.
  • Under the Hood: linfa implements PCA via SVD on top of optimized ndarray-based linear algebra. Rust's ownership model ensures robust handling of large matrices, whereas manual memory management in C++ leaves room for leaks and dangling pointers. The SVD route avoids explicitly forming the $n \times n$ covariance matrix, which improves numerical stability and avoids the extra $O(mn^2)$ work and $O(n^2)$ memory of building $C$ when $m$ is large, keeping memory usage predictable for high-dimensional data.
  • Evaluation: High EVR and low reconstruction error confirm effective dimensionality reduction, suitable for tasks like visualization or preprocessing for ML models.

This lab introduces dimensionality reduction, preparing for model evaluation.

Next Steps

Continue to Model Evaluation for performance metrics, or revisit K-Means Clustering.

Further Reading

  • An Introduction to Statistical Learning by James et al. (Chapter 12)
  • Hands-On Machine Learning by Géron (Chapter 8)
  • linfa Documentation: github.com/rust-ml/linfa