
Support Vector Machines

Support Vector Machines (SVMs) are powerful supervised learning models for classification, built around finding the maximum-margin decision boundary between classes. This section covers their theory, derivations, kernel methods, and evaluation, with a Rust lab using linfa. We’ll also explore the computational details and Rust’s role in making SVM training efficient.

Theory

SVMs classify data by finding the maximum-margin hyperplane that separates classes (e.g., spam vs. not spam). For a feature vector $x = [x_1, x_2, \dots, x_n]$ and binary labels $y \in \{-1, 1\}$, the hyperplane is defined as:

$$w^T x + b = 0$$

where $w$ is the weight vector and $b$ is the bias. The margin is the distance from the hyperplane to the nearest data points, maximized by minimizing $\|w\|^2$ subject to:

$$y_i(w^T x_i + b) \ge 1, \quad \forall i$$
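
To see why this works: points on the margin satisfy the constraint with equality, $y_i(w^T x_i + b) = 1$, so their distance to the hyperplane is

$$\frac{|w^T x_i + b|}{\|w\|} = \frac{1}{\|w\|}$$

and the full gap between the two classes is $\frac{2}{\|w\|}$. Maximizing that gap is therefore equivalent to minimizing $\frac{1}{2}\|w\|^2$.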

Derivation: Hard-Margin SVM

For linearly separable data, the hard-margin SVM optimizes:

$$\min_{w,b} \; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1, \; \forall i$$

This is a constrained optimization problem solved using Lagrange multipliers. The Lagrangian is:

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y_i(w^T x_i + b) - 1 \right]$$

where $\alpha_i \ge 0$ are the Lagrange multipliers. Taking partial derivatives and setting them to zero:

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{m} \alpha_i y_i x_i = 0 \;\Rightarrow\; w = \sum_{i=1}^{m} \alpha_i y_i x_i, \qquad \frac{\partial L}{\partial b} = -\sum_{i=1}^{m} \alpha_i y_i = 0$$

Substituting back, the dual problem maximizes:

$$W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j x_i^T x_j$$

subject to $\alpha_i \ge 0$ and $\sum_{i=1}^{m} \alpha_i y_i = 0$. The non-zero $\alpha_i$ correspond to the support vectors, the points lying on the margin.
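
To make the substitution step explicit: inserting $w = \sum_i \alpha_i y_i x_i$ into the Lagrangian, the $b$ term drops out because $\sum_i \alpha_i y_i = 0$, leaving

$$L = \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j x_i^T x_j - \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j x_i^T x_j + \sum_{i=1}^{m} \alpha_i = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j x_i^T x_j$$

which is exactly $W(\alpha)$.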

Under the Hood: The dual problem depends only on the dot products $x_i^T x_j$, which is what makes kernel methods possible. Solving it is a quadratic programming problem; linfa implements an efficient solver on top of Rust’s ndarray-based linear algebra, and Rust’s ownership model rules out the memory-safety bugs that can slip into hand-written C or C++ solvers.
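
To make the “only dot products” point concrete, the sketch below evaluates the dual objective $W(\alpha)$ for a given set of multipliers, with the kernel passed in as a function, so swapping the plain dot product for any other kernel changes nothing else. The function, labels, and multiplier values are illustrative only, not part of linfa’s solver.

```rust
use ndarray::{array, Array1, Array2};

// Dual objective W(α) = Σ αᵢ − ½ Σᵢⱼ αᵢ αⱼ yᵢ yⱼ k(xᵢ, xⱼ).
// The kernel is an ordinary function parameter, so the same code covers
// the linear case (dot product) and any other kernel.
fn dual_objective(
    alpha: &Array1<f64>,
    y: &Array1<f64>,
    x: &Array2<f64>,
    kernel: impl Fn(&[f64], &[f64]) -> f64,
) -> f64 {
    let m = alpha.len();
    let mut quad = 0.0;
    for i in 0..m {
        for j in 0..m {
            let k = kernel(
                x.row(i).to_slice().unwrap(),
                x.row(j).to_slice().unwrap(),
            );
            quad += alpha[i] * alpha[j] * y[i] * y[j] * k;
        }
    }
    alpha.sum() - 0.5 * quad
}

fn main() {
    // Tiny illustrative problem: labels in {−1, +1} and hypothetical
    // multipliers α chosen so that Σ αᵢ yᵢ = 0 holds.
    let x = array![[1.0, 2.0], [2.0, 1.0], [6.0, 1.0], [7.0, 2.0]];
    let y = array![-1.0, -1.0, 1.0, 1.0];
    let alpha = array![0.1, 0.1, 0.1, 0.1];

    // Linear kernel: a plain dot product between two feature vectors.
    fn dot(a: &[f64], b: &[f64]) -> f64 {
        a.iter().zip(b).map(|(u, v)| u * v).sum()
    }

    println!("W(alpha) = {:.4}", dual_objective(&alpha, &y, &x, dot));
}
```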

Soft-Margin SVM

For non-separable data, the soft-margin SVM introduces slack variables $\xi_i \ge 0$ to allow margin violations:

$$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i$$

subject to:

$$y_i(w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$$

where C controls the trade-off between margin width and classification errors. The dual problem has the same form, with the multiplier constraint tightened to $0 \le \alpha_i \le C$.
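
At the optimum each slack variable equals the hinge loss, $\xi_i = \max(0, 1 - y_i(w^T x_i + b))$, so the soft-margin objective can be evaluated directly from $w$, $b$, and the data. The sketch below does exactly that for a few values of C; the weights and data are made up purely for illustration.

```rust
use ndarray::{array, Array1, Array2};

// Soft-margin primal objective: ½‖w‖² + C Σ max(0, 1 − yᵢ(wᵀxᵢ + b)).
// At the optimum, each slack ξᵢ equals the hinge term for point i.
fn soft_margin_objective(
    w: &Array1<f64>,
    b: f64,
    c: f64,
    x: &Array2<f64>,
    y: &Array1<f64>,
) -> f64 {
    let hinge: f64 = x
        .rows()
        .into_iter()
        .zip(y.iter())
        .map(|(xi, &yi)| (1.0 - yi * (xi.dot(w) + b)).max(0.0))
        .sum();
    0.5 * w.dot(w) + c * hinge
}

fn main() {
    // Hypothetical weights and bias over a tiny dataset with labels in {−1, +1}
    let x = array![[1.0, 2.0], [2.0, 1.0], [6.0, 1.0], [7.0, 2.0]];
    let y = array![-1.0, -1.0, 1.0, 1.0];
    let w = array![0.3, 0.0];
    let b = -1.2;
    for c in [0.1, 1.0, 10.0] {
        // Larger C penalizes margin violations more heavily
        println!("C = {:>4}: objective = {:.3}", c, soft_margin_objective(&w, b, c, &x, &y));
    }
}
```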

Kernel Methods

For non-linear boundaries, SVMs use the kernel trick, implicitly mapping data to a higher-dimensional space via a kernel function $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$. Common kernels include:

  • Linear: $K(x_i, x_j) = x_i^T x_j$.
  • Polynomial: $K(x_i, x_j) = (x_i^T x_j + c)^d$.
  • Radial Basis Function (RBF): $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$.

Under the Hood: The kernel trick avoids computing $\phi(x)$ explicitly; dot products in the transformed space are evaluated directly through $K$. linfa’s SVM implementation keeps kernel evaluations efficient, and Rust’s memory safety helps when handling large kernel matrices, whose size grows quadratically with the number of samples regardless of the library used.
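
As a concrete illustration (separate from linfa’s internal kernel machinery), the sketch below builds an RBF kernel matrix by hand over the rows of a small feature matrix; note that the result is $m \times m$, which is where the quadratic memory cost comes from.

```rust
use ndarray::{array, Array2};

// RBF (Gaussian) kernel matrix: K[i][j] = exp(−γ ‖xᵢ − xⱼ‖²).
fn rbf_kernel_matrix(x: &Array2<f64>, gamma: f64) -> Array2<f64> {
    let m = x.nrows();
    let mut k = Array2::<f64>::zeros((m, m));
    for i in 0..m {
        for j in 0..m {
            // Squared Euclidean distance between rows i and j
            let diff = x.row(i).to_owned() - x.row(j);
            let sq_dist = diff.dot(&diff);
            k[[i, j]] = (-gamma * sq_dist).exp();
        }
    }
    k
}

fn main() {
    let x = array![[1.0, 2.0], [2.0, 1.0], [6.0, 1.0]];
    let k = rbf_kernel_matrix(&x, 0.1);
    // Diagonal entries are 1 (zero distance); off-diagonal entries shrink
    // as points move further apart.
    println!("{:.3}", k);
}
```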

Evaluation

Performance is assessed with:

  • Accuracy: Proportion of correct predictions, $\frac{\text{correct}}{m}$.
  • Precision, Recall, F1-Score: For imbalanced classes, as in logistic regression.
  • ROC-AUC: Area under the ROC curve, measuring class separation.

Under the Hood: SVMs excel in high-dimensional spaces thanks to the kernel trick, but tuning C and the kernel parameters (e.g., γ) is critical. Parameter sweeps are straightforward to write around linfa, and Rust’s compile-time checks catch many type and shape mistakes before a long sweep ever runs.
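
The metrics above are easy to compute by hand from boolean predictions and labels. The sketch below is a hand-rolled illustration of accuracy, precision, recall, and F1; it shows the definitions directly rather than relying on any library helpers.

```rust
// Hand-rolled binary-classification metrics from boolean predictions.
fn metrics(predicted: &[bool], actual: &[bool]) -> (f64, f64, f64, f64) {
    // fnn = false negatives ("fn" is a Rust keyword)
    let (mut tp, mut fp, mut tn, mut fnn) = (0.0, 0.0, 0.0, 0.0);
    for (&p, &t) in predicted.iter().zip(actual) {
        match (p, t) {
            (true, true) => tp += 1.0,
            (true, false) => fp += 1.0,
            (false, false) => tn += 1.0,
            (false, true) => fnn += 1.0,
        }
    }
    let accuracy = (tp + tn) / (tp + tn + fp + fnn);
    let precision = tp / (tp + fp);
    let recall = tp / (tp + fnn);
    let f1 = 2.0 * precision * recall / (precision + recall);
    (accuracy, precision, recall, f1)
}

fn main() {
    let predicted = [true, true, false, false, true, false];
    let actual = [true, false, false, false, true, true];
    let (acc, prec, rec, f1) = metrics(&predicted, &actual);
    println!("accuracy={acc:.2} precision={prec:.2} recall={rec:.2} f1={f1:.2}");
}
```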

Lab: SVM Classification with linfa

You’ll train an SVM classifier with an RBF kernel on a small synthetic dataset (two features predicting a binary class) and evaluate its accuracy.

  1. Edit src/main.rs in your rust_ml_tutorial project:

```rust
use linfa::prelude::*;
use linfa_svm::Svm;
use ndarray::{array, Array1, Array2};

fn main() {
    // Synthetic dataset: features (x1, x2), binary target (false or true)
    let x: Array2<f64> = array![
        [1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 5.0], [5.0, 4.0],
        [6.0, 1.0], [7.0, 2.0], [8.0, 3.0], [9.0, 4.0], [10.0, 5.0]
    ];
    // linfa-svm expects boolean targets for binary classification
    let y: Array1<bool> = array![false, false, false, false, false, true, true, true, true, true];

    // Create dataset
    let dataset = Dataset::new(x.clone(), y.clone());

    // Train SVM with RBF (Gaussian) kernel; pos_neg_weights sets the C penalty
    // for the positive and negative classes
    let model = Svm::<f64, bool>::params()
        .gaussian_kernel(0.1)
        .pos_neg_weights(1.0, 1.0)
        .fit(&dataset)
        .unwrap();

    // Predict classes for the training features
    let predictions = model.predict(&x);
    println!("Predictions: {:?}", predictions.to_vec());

    // Compute accuracy against the true labels
    let accuracy = predictions.iter().zip(y.iter())
        .filter(|(p, t)| p == t).count() as f64 / y.len() as f64;
    println!("Accuracy: {}", accuracy);
}
```
  2. Ensure Dependencies:

    • Verify Cargo.toml includes:

```toml
[dependencies]
linfa = "0.7.1"
linfa-svm = "0.7.0"
ndarray = "0.15.0"
```

    • Run cargo build.
  3. Run the Program:

```bash
cargo run
```

    Expected Output (approximate):

```text
Predictions: [false, false, false, false, false, true, true, true, true, true]
Accuracy: 1
```

Understanding the Results

  • Dataset: Synthetic features (x1, x2) predict a binary class (encoded as false/true in the code); the two groups are cleanly separable, so the model can fit them exactly.
  • Model: The SVM with an RBF kernel learns a non-linear boundary, achieving perfect accuracy.
  • Under the Hood: linfa solves the dual problem using sequential minimal optimization (SMO), implemented in safe, performant Rust. The RBF kernel implicitly maps the data into an infinite-dimensional space, enabling complex boundaries, and ndarray keeps the kernel computations efficient. Because the kernel matrix grows quadratically with the number of samples, careful memory handling matters at scale, and Rust’s ownership model makes those large allocations easier to manage safely.
  • Evaluation: Perfect accuracy suggests good separation, but real-world data requires tuning C and γ via cross-validation; a minimal parameter-sweep sketch follows below.
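
A simple starting point for that tuning is a manual sweep over the kernel width and C, refitting the model for each pair. The sketch below reuses the lab’s dataset and the same linfa calls, and it reports training accuracy only; a real workflow would score each candidate on held-out folds instead.

```rust
use linfa::prelude::*;
use linfa_svm::Svm;
use ndarray::{array, Array1, Array2};

fn main() {
    // Same synthetic dataset as the lab above
    let x: Array2<f64> = array![
        [1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 5.0], [5.0, 4.0],
        [6.0, 1.0], [7.0, 2.0], [8.0, 3.0], [9.0, 4.0], [10.0, 5.0]
    ];
    let y: Array1<bool> = array![false, false, false, false, false, true, true, true, true, true];
    let dataset = Dataset::new(x.clone(), y.clone());

    // Sweep a few kernel widths and C values, reporting training accuracy
    for eps in [0.01, 0.1, 1.0, 10.0] {
        for c in [0.1, 1.0, 10.0] {
            let model = Svm::<f64, bool>::params()
                .gaussian_kernel(eps)
                .pos_neg_weights(c, c)
                .fit(&dataset)
                .unwrap();
            let predictions = model.predict(&x);
            let accuracy = predictions
                .iter()
                .zip(y.iter())
                .filter(|(p, t)| p == t)
                .count() as f64
                / y.len() as f64;
            println!("eps={eps:<5} C={c:<4} train accuracy={accuracy:.2}");
        }
    }
}
```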

This lab introduces advanced classification, preparing for clustering and PCA.

Next Steps

Continue to K-Means Clustering for unsupervised learning, or revisit Decision Trees.

Further Reading

  • An Introduction to Statistical Learning by James et al. (Chapter 9)
  • Hands-On Machine Learning by Géron (Chapter 5)
  • linfa Documentation: github.com/rust-ml/linfa