Convolutional Neural Networks
Convolutional Neural Networks
Section titled “Convolutional Neural Networks”Convolutional Neural Networks (CNNs) are specialized neural networks for processing structured data, such as images, excelling in tasks like image classification and object detection. This section provides a comprehensive exploration of CNN architecture, convolution, pooling, and optimization, with a Rust lab using tch-rs. We’ll delve into computational details, gradient computation, and Rust’s performance advantages, building on feedforward neural networks.
Theory
Section titled “Theory”CNNs process input data (e.g., an image , where is height, is width, is channels) through layers: convolutional, pooling, and fully connected. Each layer extracts features, from low-level (edges) to high-level (objects).
Convolution Layer
Section titled “Convolution Layer”A convolution layer applies filters to extract features. A filter slides over the input, computing a feature map:
where is the output, is the bias, and is the number of filters. The output dimensions are:
where is padding, is stride.
Derivation: The convolution operation is a linear transformation, but with shared weights across spatial locations, reducing parameters compared to fully connected layers. For backpropagation, the gradient of the loss with respect to the filter is:
The input gradient is:
This requires a “flipped” convolution with the rotated kernel.
Under the Hood: Convolution is computationally intensive ( per layer). tch-rs optimizes this with GPU-accelerated FFT or Winograd algorithms, leveraging Rust’s integration with PyTorch’s C++ backend. Rust’s memory safety prevents buffer overflows during kernel sliding, a risk in C++ implementations, while outperforming Python’s pytorch for CPU tasks due to compiled efficiency.
Pooling Layer
Section titled “Pooling Layer”Pooling reduces spatial dimensions, enhancing translation invariance. Max-pooling selects the maximum value in a region:
Output dimensions are similar to convolution, with . The gradient passes to the max input during backpropagation.
Under the Hood: Max-pooling reduces computation and overfitting but may lose information. tch-rs implements pooling with optimized strided operations, ensuring cache efficiency, unlike Python’s dynamic memory allocation, which can fragment memory for large feature maps.
Fully Connected Layer
Section titled “Fully Connected Layer”The final layers are fully connected, mapping flattened feature maps to outputs (e.g., class probabilities via softmax), as in feedforward networks.
Optimization
Section titled “Optimization”CNNs are trained with backpropagation and gradient descent, minimizing a loss (e.g., cross-entropy):
where includes weights and biases. Adam optimizer is common, with regularization or dropout to prevent overfitting.
Under the Hood: CNNs require significant memory for feature maps and gradients. tch-rs uses Rust’s ownership model to manage tensor lifecycles, avoiding leaks common in C++ during backpropagation. The Adam optimizer’s adaptive updates converge quickly, with Rust’s compiled performance reducing training time compared to Python’s pytorch for CPU-bound tasks.
Evaluation
Section titled “Evaluation”Performance is evaluated with:
- Classification: Accuracy, Precision, Recall, F1-Score, ROC-AUC.
- Training/Validation Loss: Monitors overfitting.
- Inference Time: Critical for real-time applications.
Under the Hood: CNNs are computationally heavy, with inference time dominated by convolutions. tch-rs optimizes inference with batched tensor operations, leveraging Rust’s zero-cost abstractions for efficiency, unlike Python’s interpreter overhead.
Lab: CNN with tch-rs
Section titled “Lab: CNN with tch-rs”You’ll train a simple CNN on a synthetic image dataset for binary classification, evaluating accuracy and loss.
-
Edit
src/main.rsin yourrust_ml_tutorialproject:use tch::{nn, nn::Module, nn::OptimizerConfig, Device, Tensor};use ndarray::{array, Array4, Array1};fn main() -> Result<(), tch::TchError> {// Synthetic dataset: 8x8x1 images, binary target (0 or 1)let x: Array4<f64> = array![ // 10 samples, 8x8x1// Class 0: "dark" images[[[0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.2, 0.1]; 8]; 1]; 5,// Class 1: "bright" images[[[0.9, 0.8, 0.9, 0.7, 0.8, 0.9, 0.8, 0.7]; 8]; 1]; 5,];let y: Array1<f64> = array![0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0];// Convert to tensorslet device = Device::Cpu;let xs = Tensor::from_slice(x.as_slice().unwrap()).to_device(device).reshape(&[10, 1, 8, 8]);let ys = Tensor::from_slice(y.as_slice().unwrap()).to_device(device);// Define CNNlet vs = nn::VarStore::new(device);let net = nn::seq().add(nn::conv2d(&vs.root() / "conv1", 1, 16, 3, nn::ConvConfig { stride: 1, padding: 1, ..Default::default() })).add_fn(|xs| xs.relu()).add_fn(|xs| xs.max_pool2d_default(2)).add_fn(|xs| xs.flatten(1, -1)).add(nn::linear(&vs.root() / "fc", 16 * 4 * 4, 1, Default::default())).add_fn(|xs| xs.sigmoid());// Optimizer (Adam)let mut opt = nn::Adam::default().build(&vs, 0.01)?;// Training loopfor epoch in 1..=100 {let logits = net.forward(&xs);let loss = logits.binary_cross_entropy_with_logits::<Tensor>(&ys, None, None, tch::Reduction::Mean);opt.zero_grad();loss.backward();opt.step();if epoch % 20 == 0 {println!("Epoch: {}, Loss: {}", epoch, f64::from(loss));}}// Evaluate accuracylet preds = net.forward(&xs).ge(0.5).to_kind(tch::Kind::Float);let correct = preds.eq_tensor(&ys).sum(tch::Kind::Int64);let accuracy = f64::from(&correct) / y.len() as f64;println!("Accuracy: {}", accuracy);Ok(())} -
Ensure Dependencies:
- Verify
Cargo.tomlincludes:[dependencies]tch = "0.17.0"ndarray = "0.15.0" - Run
cargo build.
- Verify
-
Run the Program:
Terminal window cargo runExpected Output (approximate):
Epoch: 20, Loss: 0.40Epoch: 40, Loss: 0.25Epoch: 60, Loss: 0.15Epoch: 80, Loss: 0.10Epoch: 100, Loss: 0.08Accuracy: 0.95
Understanding the Results
Section titled “Understanding the Results”- Dataset: Synthetic 8x8x1 images (10 samples) represent two classes (dark vs. bright), mimicking simple image data.
- Model: A CNN with one convolutional layer (16 filters, 3x3 kernel), ReLU, max-pooling, and a fully connected layer achieves ~95% accuracy.
- Loss: The cross-entropy loss decreases (~0.08), indicating convergence.
- Under the Hood:
tch-rsleverages PyTorch’s optimized convolution routines, with Rust’s memory safety preventing tensor mismanagement, a risk in C++ during backpropagation. The CNN’s sparse connectivity reduces parameters compared to fully connected networks, with Rust’s compiled performance outpacing Python’spytorchfor CPU tasks. Max-pooling enhances robustness to translations, critical for image tasks. - Evaluation: High accuracy confirms effective learning, though real-world image datasets require validation to detect overfitting.
This lab introduces CNNs, preparing for recurrent neural networks.
Next Steps
Section titled “Next Steps”Continue to Recurrent Neural Networks for sequence modeling, or revisit Neural Networks.
Further Reading
Section titled “Further Reading”- Deep Learning by Goodfellow et al. (Chapter 9)
- Hands-On Machine Learning by Géron (Chapter 14)
tch-rsDocumentation: github.com/LaurentMazare/tch-rs