
Support Vector Machines

Support Vector Machines (SVMs) are powerful supervised learning models for classification, built around finding the maximum-margin decision boundary between classes. This section covers their theory, derivations, kernel methods, and evaluation, with a Rust lab using linfa. We’ll also explore the computational details and Rust’s role in making SVM training efficient.

Theory

SVMs classify data by finding the maximum-margin hyperplane that separates classes (e.g., spam vs. not spam). For a feature vector $x = [x_1, x_2, \dots, x_n]$ and binary labels $y \in \{-1, 1\}$, the hyperplane is defined as:

$$w^T x + b = 0$$

where $w$ is the weight vector and $b$ is the bias. The margin is the distance from the hyperplane to the nearest data points, maximized by minimizing $\|w\|^2$ subject to:

$$y_i(w^T x_i + b) \ge 1, \quad \forall i$$
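
To see why this works: points on the margin satisfy the constraint with equality, $y_i(w^T x_i + b) = 1$, so their distance to the hyperplane is

$$\frac{|w^T x_i + b|}{\|w\|} = \frac{1}{\|w\|}$$

and the full gap between the two classes is $\frac{2}{\|w\|}$. Maximizing that gap is therefore equivalent to minimizing $\frac{1}{2}\|w\|^2$.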

Derivation: Hard-Margin SVM

For linearly separable data, the hard-margin SVM optimizes:

$$\min_{w,b} \; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1, \; \forall i$$

This is a constrained optimization problem solved using Lagrange multipliers. The Lagrangian is:

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{m} \alpha_i \left[ y_i(w^T x_i + b) - 1 \right]$$

where $\alpha_i \ge 0$ are the Lagrange multipliers. Taking partial derivatives and setting them to zero:

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{m} \alpha_i y_i x_i = 0 \;\Rightarrow\; w = \sum_{i=1}^{m} \alpha_i y_i x_i, \qquad \frac{\partial L}{\partial b} = -\sum_{i=1}^{m} \alpha_i y_i = 0$$

Substituting back, the dual problem maximizes:

$$W(\alpha) = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j x_i^T x_j$$

subject to $\alpha_i \ge 0$ and $\sum_{i=1}^{m} \alpha_i y_i = 0$. The non-zero $\alpha_i$ correspond to the support vectors, the points lying on the margin.
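
To make the substitution step explicit: inserting $w = \sum_i \alpha_i y_i x_i$ into the Lagrangian, the $b$ term drops out because $\sum_i \alpha_i y_i = 0$, leaving

$$L = \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j x_i^T x_j - \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j x_i^T x_j + \sum_{i=1}^{m} \alpha_i = \sum_{i=1}^{m} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{m} \alpha_i \alpha_j y_i y_j x_i^T x_j$$

which is exactly $W(\alpha)$.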

Under the Hood: The dual problem depends only on the dot products $x_i^T x_j$, which is what makes kernel methods possible. Solving it is a quadratic programming problem; linfa implements an efficient solver on top of Rust’s ndarray-based linear algebra, and Rust’s ownership model rules out the memory-safety bugs that can slip into hand-written C or C++ solvers.
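
To make the “only dot products” point concrete, the sketch below evaluates the dual objective $W(\alpha)$ for a given set of multipliers, with the kernel passed in as a function, so swapping the plain dot product for any other kernel changes nothing else. The function, labels, and multiplier values are illustrative only, not part of linfa’s solver.

```rust
use ndarray::{array, Array1, Array2};

// Dual objective W(α) = Σ αᵢ − ½ Σᵢⱼ αᵢ αⱼ yᵢ yⱼ k(xᵢ, xⱼ).
// The kernel is an ordinary function parameter, so the same code covers
// the linear case (dot product) and any other kernel.
fn dual_objective(
    alpha: &Array1<f64>,
    y: &Array1<f64>,
    x: &Array2<f64>,
    kernel: impl Fn(&[f64], &[f64]) -> f64,
) -> f64 {
    let m = alpha.len();
    let mut quad = 0.0;
    for i in 0..m {
        for j in 0..m {
            let k = kernel(
                x.row(i).to_slice().unwrap(),
                x.row(j).to_slice().unwrap(),
            );
            quad += alpha[i] * alpha[j] * y[i] * y[j] * k;
        }
    }
    alpha.sum() - 0.5 * quad
}

fn main() {
    // Tiny illustrative problem: labels in {−1, +1} and hypothetical
    // multipliers α chosen so that Σ αᵢ yᵢ = 0 holds.
    let x = array![[1.0, 2.0], [2.0, 1.0], [6.0, 1.0], [7.0, 2.0]];
    let y = array![-1.0, -1.0, 1.0, 1.0];
    let alpha = array![0.1, 0.1, 0.1, 0.1];

    // Linear kernel: a plain dot product between two feature vectors.
    fn dot(a: &[f64], b: &[f64]) -> f64 {
        a.iter().zip(b).map(|(u, v)| u * v).sum()
    }

    println!("W(alpha) = {:.4}", dual_objective(&alpha, &y, &x, dot));
}
```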

Soft-Margin SVM

For non-separable data, the soft-margin SVM introduces slack variables $\xi_i \ge 0$ to allow margin violations:

$$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i$$

subject to:

$$y_i(w^T x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0$$

where C controls the trade-off between margin width and classification errors. The dual problem has the same form, with the multiplier constraint tightened to $0 \le \alpha_i \le C$.
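
At the optimum each slack variable equals the hinge loss, $\xi_i = \max(0, 1 - y_i(w^T x_i + b))$, so the soft-margin objective can be evaluated directly from $w$, $b$, and the data. The sketch below does exactly that for a few values of C; the weights and data are made up purely for illustration.

```rust
use ndarray::{array, Array1, Array2};

// Soft-margin primal objective: ½‖w‖² + C Σ max(0, 1 − yᵢ(wᵀxᵢ + b)).
// At the optimum, each slack ξᵢ equals the hinge term for point i.
fn soft_margin_objective(
    w: &Array1<f64>,
    b: f64,
    c: f64,
    x: &Array2<f64>,
    y: &Array1<f64>,
) -> f64 {
    let hinge: f64 = x
        .rows()
        .into_iter()
        .zip(y.iter())
        .map(|(xi, &yi)| (1.0 - yi * (xi.dot(w) + b)).max(0.0))
        .sum();
    0.5 * w.dot(w) + c * hinge
}

fn main() {
    // Hypothetical weights and bias over a tiny dataset with labels in {−1, +1}
    let x = array![[1.0, 2.0], [2.0, 1.0], [6.0, 1.0], [7.0, 2.0]];
    let y = array![-1.0, -1.0, 1.0, 1.0];
    let w = array![0.3, 0.0];
    let b = -1.2;
    for c in [0.1, 1.0, 10.0] {
        // Larger C penalizes margin violations more heavily
        println!("C = {:>4}: objective = {:.3}", c, soft_margin_objective(&w, b, c, &x, &y));
    }
}
```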

Kernel Methods

For non-linear boundaries, SVMs use the kernel trick, implicitly mapping data to a higher-dimensional space via a kernel function $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$. Common kernels include:

  • Linear: $K(x_i, x_j) = x_i^T x_j$.
  • Polynomial: $K(x_i, x_j) = (x_i^T x_j + c)^d$.
  • Radial Basis Function (RBF): $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$.

Under the Hood: The kernel trick avoids computing $\phi(x)$ explicitly; dot products in the transformed space are evaluated directly through $K$. linfa’s SVM implementation keeps kernel evaluations efficient, and Rust’s memory safety helps when handling large kernel matrices, whose size grows quadratically with the number of samples regardless of the library used.
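
As a concrete illustration (separate from linfa’s internal kernel machinery), the sketch below builds an RBF kernel matrix by hand over the rows of a small feature matrix; note that the result is $m \times m$, which is where the quadratic memory cost comes from.

```rust
use ndarray::{array, Array2};

// RBF (Gaussian) kernel matrix: K[i][j] = exp(−γ ‖xᵢ − xⱼ‖²).
fn rbf_kernel_matrix(x: &Array2<f64>, gamma: f64) -> Array2<f64> {
    let m = x.nrows();
    let mut k = Array2::<f64>::zeros((m, m));
    for i in 0..m {
        for j in 0..m {
            // Squared Euclidean distance between rows i and j
            let diff = x.row(i).to_owned() - x.row(j);
            let sq_dist = diff.dot(&diff);
            k[[i, j]] = (-gamma * sq_dist).exp();
        }
    }
    k
}

fn main() {
    let x = array![[1.0, 2.0], [2.0, 1.0], [6.0, 1.0]];
    let k = rbf_kernel_matrix(&x, 0.1);
    // Diagonal entries are 1 (zero distance); off-diagonal entries shrink
    // as points move further apart.
    println!("{:.3}", k);
}
```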

Evaluation

Performance is assessed with:

  • Accuracy: Proportion of correct predictions, $\frac{\text{correct}}{m}$.
  • Precision, Recall, F1-Score: For imbalanced classes, as in logistic regression.
  • ROC-AUC: Area under the ROC curve, measuring class separation.

Under the Hood: SVMs excel in high-dimensional spaces thanks to the kernel trick, but tuning C and the kernel parameters (e.g., γ) is critical. Parameter sweeps are straightforward to write around linfa, and Rust’s compile-time checks catch many type and shape mistakes before a long sweep ever runs.
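
The metrics above are easy to compute by hand from boolean predictions and labels. The sketch below is a hand-rolled illustration of accuracy, precision, recall, and F1; it shows the definitions directly rather than relying on any library helpers.

```rust
// Hand-rolled binary-classification metrics from boolean predictions.
fn metrics(predicted: &[bool], actual: &[bool]) -> (f64, f64, f64, f64) {
    // fnn = false negatives ("fn" is a Rust keyword)
    let (mut tp, mut fp, mut tn, mut fnn) = (0.0, 0.0, 0.0, 0.0);
    for (&p, &t) in predicted.iter().zip(actual) {
        match (p, t) {
            (true, true) => tp += 1.0,
            (true, false) => fp += 1.0,
            (false, false) => tn += 1.0,
            (false, true) => fnn += 1.0,
        }
    }
    let accuracy = (tp + tn) / (tp + tn + fp + fnn);
    let precision = tp / (tp + fp);
    let recall = tp / (tp + fnn);
    let f1 = 2.0 * precision * recall / (precision + recall);
    (accuracy, precision, recall, f1)
}

fn main() {
    let predicted = [true, true, false, false, true, false];
    let actual = [true, false, false, false, true, true];
    let (acc, prec, rec, f1) = metrics(&predicted, &actual);
    println!("accuracy={acc:.2} precision={prec:.2} recall={rec:.2} f1={f1:.2}");
}
```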

Lab: SVM Classification with linfa

You’ll train an SVM classifier with an RBF kernel on a small synthetic dataset (two features predicting a binary class) and evaluate its accuracy.

  1. Edit src/main.rs in your rust_ml_tutorial project:

```rust
use linfa::prelude::*;
use linfa_svm::Svm;
use ndarray::{array, Array1, Array2};

fn main() {
    // Synthetic dataset: features (x1, x2), binary target (false or true)
    let x: Array2<f64> = array![
        [1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 5.0], [5.0, 4.0],
        [6.0, 1.0], [7.0, 2.0], [8.0, 3.0], [9.0, 4.0], [10.0, 5.0]
    ];
    // linfa-svm expects boolean targets for binary classification
    let y: Array1<bool> = array![false, false, false, false, false, true, true, true, true, true];

    // Create dataset
    let dataset = Dataset::new(x.clone(), y.clone());

    // Train SVM with RBF (Gaussian) kernel; pos_neg_weights sets the C penalty
    // for the positive and negative classes
    let model = Svm::<f64, bool>::params()
        .gaussian_kernel(0.1)
        .pos_neg_weights(1.0, 1.0)
        .fit(&dataset)
        .unwrap();

    // Predict classes for the training features
    let predictions = model.predict(&x);
    println!("Predictions: {:?}", predictions.to_vec());

    // Compute accuracy against the true labels
    let accuracy = predictions.iter().zip(y.iter())
        .filter(|(p, t)| p == t).count() as f64 / y.len() as f64;
    println!("Accuracy: {}", accuracy);
}
```
  2. Ensure Dependencies:

    • Verify Cargo.toml includes:

```toml
[dependencies]
linfa = "0.7.1"
linfa-svm = "0.7.0"
ndarray = "0.15.0"
```

    • Run cargo build.
  3. Run the Program:

```bash
cargo run
```

    Expected Output (approximate):

```text
Predictions: [false, false, false, false, false, true, true, true, true, true]
Accuracy: 1
```

Understanding the Results

  • Dataset: Synthetic features (x1, x2) predict a binary class (encoded as false/true in the code); the two groups are cleanly separable, so the model can fit them exactly.
  • Model: The SVM with an RBF kernel learns a non-linear boundary, achieving perfect accuracy.
  • Under the Hood: linfa solves the dual problem using sequential minimal optimization (SMO), implemented in safe, performant Rust. The RBF kernel implicitly maps the data into an infinite-dimensional space, enabling complex boundaries, and ndarray keeps the kernel computations efficient. Because the kernel matrix grows quadratically with the number of samples, careful memory handling matters at scale, and Rust’s ownership model makes those large allocations easier to manage safely.
  • Evaluation: Perfect accuracy suggests good separation, but real-world data requires tuning C and γ via cross-validation; a minimal parameter-sweep sketch follows below.
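
A simple starting point for that tuning is a manual sweep over the kernel width and C, refitting the model for each pair. The sketch below reuses the lab’s dataset and the same linfa calls, and it reports training accuracy only; a real workflow would score each candidate on held-out folds instead.

```rust
use linfa::prelude::*;
use linfa_svm::Svm;
use ndarray::{array, Array1, Array2};

fn main() {
    // Same synthetic dataset as the lab above
    let x: Array2<f64> = array![
        [1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 5.0], [5.0, 4.0],
        [6.0, 1.0], [7.0, 2.0], [8.0, 3.0], [9.0, 4.0], [10.0, 5.0]
    ];
    let y: Array1<bool> = array![false, false, false, false, false, true, true, true, true, true];
    let dataset = Dataset::new(x.clone(), y.clone());

    // Sweep a few kernel widths and C values, reporting training accuracy
    for eps in [0.01, 0.1, 1.0, 10.0] {
        for c in [0.1, 1.0, 10.0] {
            let model = Svm::<f64, bool>::params()
                .gaussian_kernel(eps)
                .pos_neg_weights(c, c)
                .fit(&dataset)
                .unwrap();
            let predictions = model.predict(&x);
            let accuracy = predictions
                .iter()
                .zip(y.iter())
                .filter(|(p, t)| p == t)
                .count() as f64
                / y.len() as f64;
            println!("eps={eps:<5} C={c:<4} train accuracy={accuracy:.2}");
        }
    }
}
```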

This lab introduces advanced classification, preparing for clustering and PCA.

Next Steps

Continue to K-Means Clustering for unsupervised learning, or revisit Decision Trees.

Further Reading

  • An Introduction to Statistical Learning by James et al. (Chapter 9)
  • Hands-On Machine Learning by Géron (Chapter 5)
  • linfa Documentation: github.com/rust-ml/linfa