Neural Networks
Neural networks are the foundation of deep learning, modeling complex patterns for tasks like classification and regression. This section provides a comprehensive exploration of feedforward neural networks, including architecture, backpropagation, and optimization, with a Rust lab using tch-rs. We’ll delve into computational details, gradient computation, and Rust’s performance advantages, starting the Deep Learning module.
Theory
A feedforward neural network consists of layers of interconnected nodes (neurons), processing an input $x$ to produce an output $\hat{y}$. Each layer applies a linear transformation followed by a non-linear activation function. For a network with $L$ layers, the output of layer $l$ is:

$$a^{(l)} = g\left(W^{(l)} a^{(l-1)} + b^{(l)}\right)$$

where $W^{(l)}$ is the weight matrix, $b^{(l)}$ is the bias vector, $a^{(l-1)}$ is the previous layer’s activation (with $a^{(0)} = x$), and $g$ is the activation function (e.g., ReLU, $g(z) = \max(0, z)$, or sigmoid, $g(z) = 1/(1 + e^{-z})$). The final layer produces the prediction $\hat{y} = a^{(L)}$.
For classification, the output layer uses a softmax function to produce class probabilities:

$$\hat{y}_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}$$

where $z_k$ is the $k$-th output logit and $K$ is the number of classes.
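To make the forward pass concrete, here is a minimal sketch (not part of the lab) of one hidden layer with ReLU followed by a softmax output, using the ndarray crate; the layer sizes and weight values are invented purely for illustration:

```rust
use ndarray::{array, Array1, Array2};

/// ReLU applied element-wise: g(z) = max(0, z).
fn relu(z: &Array1<f64>) -> Array1<f64> {
    z.mapv(|v| v.max(0.0))
}

/// Softmax over the output logits (subtracting the max for numerical stability).
fn softmax(z: &Array1<f64>) -> Array1<f64> {
    let max = z.fold(f64::NEG_INFINITY, |a, &b| a.max(b));
    let exp = z.mapv(|v| (v - max).exp());
    let sum = exp.sum();
    exp / sum
}

fn main() {
    // Illustrative 2-input, 3-unit hidden layer: a1 = relu(W1 x + b1).
    let x: Array1<f64> = array![1.0, 2.0];
    let w1: Array2<f64> = array![[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]];
    let b1: Array1<f64> = array![0.0, 0.1, -0.1];
    let a1 = relu(&(w1.dot(&x) + &b1));

    // Output layer with 2 classes: probabilities via softmax(W2 a1 + b2).
    let w2: Array2<f64> = array![[0.2, -0.5, 0.7], [-0.3, 0.6, 0.1]];
    let b2: Array1<f64> = array![0.05, -0.05];
    let probs = softmax(&(w2.dot(&a1) + &b2));
    println!("hidden activation: {a1}, class probabilities: {probs}");
}
```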
Derivation: Backpropagation
The network is trained to minimize a loss function, such as the cross-entropy loss for classification:

$$L(\theta) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{k=1}^{K} y_{ik} \log \hat{y}_{ik}$$

where $\theta$ includes all weights and biases, $y_{ik}$ is 1 if sample $i$ is class $k$ and 0 otherwise, $\hat{y}_{ik}$ is the predicted probability, and $n$ is the number of samples. Backpropagation computes the gradients of this loss using the chain rule.
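As a quick illustration of the loss itself, the following sketch computes the averaged cross-entropy for a tiny batch with one-hot targets; the probability values are made up for the example:

```rust
/// Averaged cross-entropy over n samples with one-hot targets:
/// L = -(1/n) * sum_i sum_k y_ik * ln(y_hat_ik).
fn cross_entropy(targets: &[Vec<f64>], probs: &[Vec<f64>]) -> f64 {
    let n = targets.len() as f64;
    let total: f64 = targets
        .iter()
        .zip(probs)
        .map(|(y, p)| {
            y.iter()
                .zip(p)
                // Clamp probabilities away from zero so ln() stays finite.
                .map(|(&yik, &pik)| -yik * pik.max(1e-12).ln())
                .sum::<f64>()
        })
        .sum();
    total / n
}

fn main() {
    // Two samples, two classes: the first prediction is confident and correct,
    // the second is less certain.
    let targets = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    let probs = vec![vec![0.9, 0.1], vec![0.4, 0.6]];
    println!("cross-entropy: {:.4}", cross_entropy(&targets, &probs));
}
```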
For a single sample $i$, consider the loss $L_i$. The gradient for the final layer’s weights follows from the chain rule:

$$\frac{\partial L_i}{\partial W^{(L)}} = \frac{\partial L_i}{\partial z^{(L)}} \cdot \frac{\partial z^{(L)}}{\partial W^{(L)}}$$

where $z^{(L)} = W^{(L)} a^{(L-1)} + b^{(L)}$ is the final layer’s pre-activation. The error term is:

$$\delta^{(L)} = \frac{\partial L_i}{\partial z^{(L)}} = \hat{y} - y$$

for cross-entropy with softmax. The weight gradient is:

$$\frac{\partial L_i}{\partial W^{(L)}} = \delta^{(L)} \left(a^{(L-1)}\right)^\top$$

For earlier layers, propagate the error backward:

$$\delta^{(l)} = \left(\left(W^{(l+1)}\right)^\top \delta^{(l+1)}\right) \odot g'\left(z^{(l)}\right)$$

where $g'$ is the derivative of the activation function (e.g., for ReLU, $g'(z) = 1$ if $z > 0$, else 0). Gradients are averaged over the batch:

$$\frac{\partial L}{\partial W^{(l)}} = \frac{1}{n} \sum_{i=1}^{n} \delta_i^{(l)} \left(a_i^{(l-1)}\right)^\top$$
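The sketch below walks through these formulas for a single sample and one hidden layer, computing the output error $\delta^{(L)} = \hat{y} - y$ and propagating it back through the ReLU derivative; the weights and activations are invented for illustration:

```rust
/// Derivative of ReLU: 1 if z > 0, else 0.
fn relu_prime(z: &[f64]) -> Vec<f64> {
    z.iter().map(|&v| if v > 0.0 { 1.0 } else { 0.0 }).collect()
}

/// delta_hidden = (W_out^T * delta_out) ⊙ g'(z_hidden)
fn hidden_delta(w_out: &[Vec<f64>], delta_out: &[f64], z_hidden: &[f64]) -> Vec<f64> {
    let gp = relu_prime(z_hidden);
    (0..z_hidden.len())
        .map(|j| {
            // Column j of the output weight matrix, dotted with the output error.
            let back: f64 = w_out.iter().zip(delta_out).map(|(row, &d)| row[j] * d).sum();
            back * gp[j]
        })
        .collect()
}

fn main() {
    // Output error for softmax + cross-entropy: delta_out = y_hat - y.
    let y_hat = [0.7, 0.3];
    let y = [1.0, 0.0];
    let delta_out: Vec<f64> = y_hat.iter().zip(&y).map(|(p, t)| p - t).collect();

    // Illustrative output weights (2 classes x 3 hidden units) and hidden pre-activations.
    let w_out = vec![vec![0.2, -0.5, 0.7], vec![-0.3, 0.6, 0.1]];
    let z_hidden = [0.4, -0.2, 1.1];

    let delta_hidden = hidden_delta(&w_out, &delta_out, &z_hidden);
    // The output-layer weight gradient would be delta_out (a_hidden)^T for this sample.
    println!("delta_out: {delta_out:?}, delta_hidden: {delta_hidden:?}");
}
```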
Under the Hood: Backpropagation requires efficient matrix operations, costing roughly $O(n \, d_{l-1} d_l)$ per layer for a batch of $n$ samples and layer widths $d_{l-1}$ and $d_l$. Rust’s tch-rs, built on PyTorch’s C++ backend, dispatches these to optimized BLAS routines, and Rust’s memory safety prevents leaks during gradient updates, unlike raw C++ where pointer errors are common. The computational graph tracks dependencies between operations, enabling automatic differentiation, a feature tch-rs inherits from PyTorch while avoiding much of Python’s dynamic-dispatch overhead for large networks.
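As a small taste of that automatic differentiation, the following sketch (assuming the same tch crate used in the lab below) records a scalar computation and reads back the gradient:

```rust
use tch::Tensor;

fn main() {
    // A leaf tensor tracked by the autograd graph.
    let x = Tensor::from_slice(&[3.0f32]).set_requires_grad(true);

    // y = x*x + 2x, built from differentiable tensor operations.
    let y = &x * &x + &x * 2.0;

    // Backpropagate from the single-element output; dy/dx = 2x + 2 = 8 at x = 3.
    y.backward();
    println!("dy/dx at x = 3: {}", x.grad().double_value(&[0]));
}
```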
Optimization
Gradient descent updates the weights:

$$\theta \leftarrow \theta - \eta \nabla_\theta L(\theta)$$

where $\eta$ is the learning rate. Variants like stochastic gradient descent (SGD) use mini-batches, and Adam adapts $\eta$ per parameter using momentum and variance estimates. Regularization (e.g., an $L_2$ penalty, $\lambda \lVert \theta \rVert_2^2$) prevents overfitting.
Under the Hood: Adam combines momentum and adaptive scaling, often converging faster than SGD but requiring careful tuning of the decay rates $\beta_1$ and $\beta_2$ alongside the learning rate. tch-rs implements Adam on top of Rust’s zero-cost abstractions, ensuring high performance without Python’s interpreter overhead. Rust’s ownership model guarantees safe tensor handling, critical for large networks where hand-written C++ risks memory corruption.
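To see what Adam’s momentum and variance estimates look like in code, here is a minimal single-parameter sketch of the bias-corrected update; the hyperparameter values are the common defaults and the objective is a toy quadratic, both chosen only for illustration:

```rust
/// One Adam step for a single scalar parameter theta.
/// m and v are running estimates of the gradient's first and second moments.
fn adam_step(theta: &mut f64, grad: f64, m: &mut f64, v: &mut f64, t: i32, lr: f64) {
    // Common default decay rates and epsilon (illustrative values).
    const BETA1: f64 = 0.9;
    const BETA2: f64 = 0.999;
    const EPS: f64 = 1e-8;

    // Update biased moment estimates.
    *m = BETA1 * *m + (1.0 - BETA1) * grad;
    *v = BETA2 * *v + (1.0 - BETA2) * grad * grad;

    // Bias correction compensates for the zero initialization of m and v.
    let m_hat = *m / (1.0 - BETA1.powi(t));
    let v_hat = *v / (1.0 - BETA2.powi(t));

    // Adaptive update: step size scaled by the per-parameter variance estimate.
    *theta -= lr * m_hat / (v_hat.sqrt() + EPS);
}

fn main() {
    // Minimize f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
    let (mut theta, mut m, mut v) = (0.0_f64, 0.0, 0.0);
    for t in 1..=200 {
        let grad = 2.0 * (theta - 3.0);
        adam_step(&mut theta, grad, &mut m, &mut v, t, 0.1);
    }
    println!("theta after 200 Adam steps: {theta:.4}"); // should approach 3.0
}
```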
Evaluation
Performance is evaluated with:
- Classification: Accuracy, Precision, Recall, F1-Score, ROC-AUC (as in prior modules).
- Regression: MSE, RMSE, MAE, $R^2$.
- Training/Validation Loss: Monitor on training and validation sets to detect overfitting.
Under the Hood: Validation loss guides hyperparameter tuning (e.g., number of layers and neurons). tch-rs computes metrics efficiently and uses GPU acceleration when available; because the training loop is compiled Rust, it avoids Python’s interpreter overhead in CPU-bound workloads. Rust’s type system catches many tensor-handling mistakes at compile time, avoiding runtime errors common in dynamic languages.
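One simple way to act on the training/validation comparison is an early-stopping rule; the following sketch (plain Rust, with an invented validation-loss history) stops once validation loss has not improved for a fixed number of epochs:

```rust
/// Returns the epoch at which to stop: validation loss has not improved
/// by at least `min_delta` for `patience` consecutive epochs.
fn early_stop_epoch(val_losses: &[f64], patience: usize, min_delta: f64) -> Option<usize> {
    let mut best = f64::INFINITY;
    let mut since_best = 0;
    for (epoch, &loss) in val_losses.iter().enumerate() {
        if loss < best - min_delta {
            best = loss;
            since_best = 0;
        } else {
            since_best += 1;
            if since_best >= patience {
                return Some(epoch);
            }
        }
    }
    None
}

fn main() {
    // Illustrative loss curve: validation loss bottoms out and then rises (overfitting).
    let val = [0.60, 0.45, 0.38, 0.35, 0.36, 0.37, 0.39, 0.41];
    match early_stop_epoch(&val, 3, 1e-3) {
        Some(e) => println!("stop at epoch {e}"),
        None => println!("no early stop triggered"),
    }
}
```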
Lab: Neural Network with tch-rs
You’ll train a feedforward neural network on a synthetic dataset for binary classification, evaluating accuracy and loss.
- Edit src/main.rs in your rust_ml_tutorial project:

  ```rust
  use ndarray::{array, Array1, Array2};
  use tch::{nn, nn::Module, nn::OptimizerConfig, Device, Kind, Tensor};

  fn main() -> Result<(), tch::TchError> {
      // Synthetic dataset: features (x1, x2), binary target (0 or 1).
      let x: Array2<f64> = array![
          [1.0, 2.0], [2.0, 1.0], [3.0, 3.0], [4.0, 5.0], [5.0, 4.0],
          [6.0, 1.0], [7.0, 2.0], [8.0, 3.0], [9.0, 4.0], [10.0, 5.0]
      ];
      let y: Array1<f64> = array![0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0];

      // Convert to f32 tensors shaped [n, 2] and [n, 1].
      let device = Device::Cpu;
      let xs = Tensor::from_slice(x.as_slice().unwrap())
          .to_kind(Kind::Float)
          .view([-1, 2])
          .to_device(device);
      let ys = Tensor::from_slice(y.as_slice().unwrap())
          .to_kind(Kind::Float)
          .view([-1, 1])
          .to_device(device);

      // Define the network: 2 inputs -> 10 hidden (ReLU) -> 1 output logit.
      let vs = nn::VarStore::new(device);
      let net = nn::seq()
          .add(nn::linear(&vs.root() / "layer1", 2, 10, Default::default()))
          .add_fn(|xs| xs.relu())
          .add(nn::linear(&vs.root() / "layer2", 10, 1, Default::default()));

      // Optimizer (Adam).
      let mut opt = nn::Adam::default().build(&vs, 0.01)?;

      // Training loop: binary cross-entropy computed on the raw logits.
      for epoch in 1..=100 {
          let logits = net.forward(&xs);
          let loss = logits.binary_cross_entropy_with_logits::<Tensor>(
              &ys, None, None, tch::Reduction::Mean,
          );
          opt.zero_grad();
          loss.backward();
          opt.step();
          if epoch % 20 == 0 {
              println!("Epoch: {}, Loss: {:.4}", epoch, f64::from(&loss));
          }
      }

      // Evaluate accuracy: sigmoid turns logits into probabilities, threshold at 0.5.
      let preds = net.forward(&xs).sigmoid().ge(0.5).to_kind(Kind::Float);
      let correct = preds.eq_tensor(&ys).sum(Kind::Int64);
      let accuracy = f64::from(&correct) / y.len() as f64;
      println!("Accuracy: {}", accuracy);
      Ok(())
  }
  ```

- Ensure Dependencies:

  - Verify Cargo.toml includes:

    ```toml
    [dependencies]
    tch = "0.17.0"
    ndarray = "0.15.0"
    ```

  - Run cargo build.

- Run the Program:

  ```bash
  cargo run
  ```

  Expected Output (approximate):

  ```
  Epoch: 20, Loss: 0.45
  Epoch: 40, Loss: 0.30
  Epoch: 60, Loss: 0.22
  Epoch: 80, Loss: 0.18
  Epoch: 100, Loss: 0.15
  Accuracy: 0.90
  ```
Understanding the Results
- Dataset: Synthetic features ($x_1$, $x_2$) predict binary classes (0 or 1), as in prior labs.
- Model: A 2-layer neural network (2 input neurons, 10 hidden neurons with ReLU, 1 output unit passed through a sigmoid for prediction) learns a non-linear boundary, achieving ~90% accuracy.
- Loss: The binary cross-entropy loss decreases to ~0.15, indicating convergence.
- Under the Hood: tch-rs uses PyTorch’s C++ backend for automatic differentiation, computing gradients via backpropagation. Rust’s memory safety ensures robust tensor operations, avoiding leaks common in C++ during graph construction. The Adam optimizer adapts learning rates, often converging faster than SGD, and the compiled Rust training loop avoids Python’s interpreter overhead for CPU-bound tasks.
- Evaluation: High accuracy confirms effective learning, though a held-out validation set would be needed to detect overfitting in practice.
This lab introduces deep learning, preparing for convolutional neural networks.
Next Steps
Further Reading
- Deep Learning by Goodfellow et al. (Chapter 6)
- Hands-On Machine Learning by Géron (Chapter 10)
- tch-rs Documentation: github.com/LaurentMazare/tch-rs