First ML Lab
This section introduces your first machine learning (ML) task: linear regression using either the linfa
library in Rust or scikit-learn
in Python.
You’ll train a model to predict a continuous output, learning the basics of supervised learning. No prior ML experience is required.
What is Linear Regression? (Detailed)
Linear regression models the relationship between one or more input features and a continuous target variable by fitting a linear function. It is one of the simplest and most interpretable supervised learning algorithms — a great first lab for understanding ML end-to-end.
Model and Prediction
With input features $\mathbf{x} = (x_1, \dots, x_d)$, the model predicts
$$\hat{y} = w_1 x_1 + \dots + w_d x_d + b = \mathbf{w}^\top \mathbf{x} + b,$$
where $\mathbf{w}$ are the learned weights and $b$ is the intercept (bias).
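As a quick illustration of the prediction rule, here is a minimal NumPy sketch; the weight and bias values are made up purely for illustration, not learned from data.
import numpy as np

# Hypothetical parameters for a 2-feature model (illustrative values)
w = np.array([1.5, -0.7])   # weights
b = 0.3                     # intercept (bias)

x = np.array([2.0, 4.0])    # one input example
y_hat = w @ x + b           # y_hat = w^T x + b
print(y_hat)                # 1.5*2.0 - 0.7*4.0 + 0.3 = 0.5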
Loss Function (Mean Squared Error)
The most common objective is to minimize the Mean Squared Error (MSE) over the training set of size $n$:
$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2$$
Minimizing MSE gives the ordinary least squares solution.
Quick intuition: The MSE penalizes large errors more than small ones (because of the square), and averaging over $n$ samples makes the value comparable across datasets of different sizes.
Explanation of MSE
This formula computes the average squared difference between the actual value $y_i$ and the predicted value $\hat{y}_i$.
- Squaring ensures errors don’t cancel out and emphasizes larger errors.
- Averaging over $n$ samples gives a single-number summary of performance.
- Smaller MSE indicates better fit.
Mini numerical example: suppose the actual values are $y = (2.0,\ 4.0,\ 6.0)$ and the predictions are $\hat{y} = (2.5,\ 3.5,\ 6.5)$.
Compute squared errors: $(2.0-2.5)^2 = 0.25$, $(4.0-3.5)^2 = 0.25$, $(6.0-6.5)^2 = 0.25$.
MSE = $(0.25 + 0.25 + 0.25) / 3 = 0.25$.
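The same computation as a minimal NumPy sketch, using the numbers from the example above:
import numpy as np

y_true = np.array([2.0, 4.0, 6.0])
y_pred = np.array([2.5, 3.5, 6.5])

squared_errors = (y_true - y_pred) ** 2   # [0.25, 0.25, 0.25]
mse = squared_errors.mean()               # 0.25
print(mse)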
Closed-form Solution (Normal Equation)
For linear regression without regularization, the weights can be computed in closed form using the normal equation. If $X$ is the $n \times d$ design matrix (with a column of ones for the intercept) and $y$ is the vector of targets, then
$$\hat{w} = (X^\top X)^{-1} X^\top y.$$
This is efficient for small-to-medium problems but can be numerically unstable or expensive when $X^\top X$ is ill-conditioned or the number of features $d$ is large.
Quick intuition: The normal equation solves the linear system $X^\top X\, w = X^\top y$ obtained by setting the gradient of the MSE to zero.
Explanation of Normal Equation
Start with the (generally overdetermined) system of linear equations: $X w = y$.
Pre-multiply both sides by $X^\top$: $X^\top X\, w = X^\top y$.
Assuming $X^\top X$ is invertible, solve for the weights: $\hat{w} = (X^\top X)^{-1} X^\top y$.
Why this helps: pre-multiplying by $X^\top$ turns $n$ equations in $d$ unknowns into a square $d \times d$ system, and its solution is exactly the weight vector that minimizes the squared error.
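A small NumPy sketch of the normal equation on the lab's synthetic data. It builds the design matrix with a column of ones for the intercept; note that `np.linalg.solve` is used instead of forming the inverse explicitly, which is the numerically safer choice.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.2, 6.1, 8.3, 10.0])

# Design matrix with an intercept column of ones
X = np.column_stack([np.ones_like(x), x])

# Solve X^T X w = X^T y rather than inverting X^T X
w = np.linalg.solve(X.T @ X, X.T @ y)
print("intercept, slope:", w)   # roughly [0.17, 1.99]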
Optimization (Gradient Descent)
When data is large or when using regularization, iterative optimization like gradient descent is used. For MSE, the gradient with respect to the weights is:
$$\nabla_w\,\mathrm{MSE} = \frac{2}{n}\, X^\top (X w - y)$$
and weights are updated by stepping against the gradient (e.g., $w \leftarrow w - \eta\, \nabla_w\,\mathrm{MSE}$, where $\eta$ is the learning rate).
Quick intuition: The gradient points to the direction of greatest increase in loss; moving opposite the gradient reduces loss.
Explanation of Gradient Descent
The gradient $\nabla_w\,\mathrm{MSE} = \frac{2}{n}\, X^\top (X w - y)$ tells us how the loss changes as each weight changes.
- The factor $\frac{2}{n}$ scales the gradient according to dataset size.
- In practice we use a learning rate $\eta$, so updates look like $w \leftarrow w - \eta\, \nabla_w\,\mathrm{MSE}$.
- For large datasets, variants like SGD (stochastic gradient descent) or mini-batch SGD are used.
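A minimal batch gradient descent sketch for this objective on the lab data; the learning rate and iteration count below are arbitrary illustrative choices, not tuned values.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.2, 6.1, 8.3, 10.0])
X = np.column_stack([np.ones_like(x), x])   # intercept column + feature
n = len(y)

w = np.zeros(2)        # [intercept, slope], initialized at zero
eta = 0.02             # learning rate (illustrative choice)
for _ in range(5000):
    grad = (2.0 / n) * X.T @ (X @ w - y)   # gradient of the MSE
    w -= eta * grad                        # step against the gradient
print(w)   # approaches the normal-equation solution (~[0.17, 1.99])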
Regularization
To avoid overfitting, regularization penalizes large weights. Two common variants:
- Ridge (L2): adds $\lambda \lVert w \rVert_2^2$ to the loss. Closed-form: $\hat{w} = (X^\top X + \lambda I)^{-1} X^\top y$.
- Lasso (L1): adds $\lambda \lVert w \rVert_1$; encourages sparsity but requires iterative solvers.
Explanation of Regularization
Adding a penalty term with coefficient $\lambda$ discourages large weights, trading a small increase in bias for a reduction in variance:
- Ridge (L2) shrinks weights smoothly and is effective when many small contributions exist.
- Lasso (L1) can set some weights exactly to zero, performing feature selection automatically.
- The hyperparameter $\lambda$ controls the strength of the penalty; larger $\lambda$ ⇒ stronger shrinkage.
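A short scikit-learn sketch comparing Ridge and Lasso on the lab's synthetic data. Note that scikit-learn names the penalty strength `alpha` rather than $\lambda$; the values below are arbitrary illustrative choices.
import numpy as np
from sklearn.linear_model import Ridge, Lasso

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 4.2, 6.1, 8.3, 10.0])

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty: shrinks weights smoothly
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty: can zero out weights

print("ridge:", ridge.intercept_, ridge.coef_)
print("lasso:", lasso.intercept_, lasso.coef_)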
Assumptions and Limitations
Linear regression assumes:
- Linearity between features and target (or transformed features).
- Errors are independent and identically distributed with zero mean.
- No (strong) multicollinearity among features.
Violations lead to biased or high-variance estimates. We'll discuss diagnostics (residual plots, multicollinearity) in later modules.
Practical Considerations (Before Training)
- Train/Test Split: Always evaluate on unseen data. Common splits: 70/30 or 80/20.
- Feature Scaling: Standardize features when using gradient-based solvers or regularization.
- Multicollinearity: Highly correlated features inflate variance — consider PCA or feature selection.
- Outliers: Can strongly affect least-squares solutions; consider robust methods if needed.
- Evaluation Metrics: MSE, RMSE (square root of MSE), MAE (mean absolute error), and $R^2$ (coefficient of determination). A short sketch combining several of these points follows this list.
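The sketch below ties together a few items from the list above: a train/test split, feature standardization inside a Pipeline, and several evaluation metrics. It reuses the lab's synthetic data; standardization is not strictly needed for plain least squares on one feature and is shown only for the pattern.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 4.2, 6.1, 8.3, 10.0])

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)

# Standardize, then fit; scaling matters most for gradient-based or regularized models
model = make_pipeline(StandardScaler(), LinearRegression()).fit(X_train, y_train)
y_pred = model.predict(X_test)

print("MSE :", mean_squared_error(y_test, y_pred))
print("RMSE:", mean_squared_error(y_test, y_pred) ** 0.5)
print("MAE :", mean_absolute_error(y_test, y_pred))
print("R^2 :", r2_score(y_test, y_pred))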
Lab: Linear Regression (Code + Explanation)
You’ll train a linear regression model on a small synthetic dataset and inspect the learned parameters and predictions. Both Rust and Python examples are provided — use the tab UI to switch between them in the docs.
# first_lab.py - scikit-learn example
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
# Synthetic dataset: feature (x) and target (y)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([2.1, 4.2, 6.1, 8.3, 10.0])
# Train/test split (tiny dataset — in practice use larger data)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=42)
# Train linear regression model
model = LinearRegression().fit(X_train, y_train)
# Predict on test set and new data
y_pred = model.predict(X_test)
prediction_for_6 = model.predict(np.array([[6.0]]))[0]
# Metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Model intercept:", model.intercept_)
print("Model weights:", model.coef_)
print("Test MSE:", mse)
print("Test R2:", r2)
print("Prediction for x=6:", prediction_for_6)
// main.rs - linfa example
use linfa::prelude::*;
use linfa::dataset::Dataset;
use linfa_linear::LinearRegression;
use ndarray::array;

fn main() {
    // Synthetic dataset: feature (x) and target (y)
    let x = array![[1.0], [2.0], [3.0], [4.0], [5.0]];
    let y = array![2.1, 4.2, 6.1, 8.3, 10.0];

    // Create dataset (linfa expects an Array2 for the features)
    let dataset = Dataset::new(x, y);

    // Split dataset: 60% train / 40% test
    let (train, test) = dataset.split_with_ratio(0.6);

    // Train model
    let model = LinearRegression::default().fit(&train).unwrap();

    // Predict on the test set and on new data
    let y_pred = model.predict(&test.records);
    let prediction_for_6 = model.predict(&array![[6.0]]);

    // Compute test MSE manually
    let mse: f64 = (&y_pred - &test.targets).mapv(|v| v.powi(2)).mean().unwrap();

    println!("Intercept: {}", model.intercept());
    println!("Weights: {:?}", model.params());
    println!("Test MSE: {}", mse);
    println!("Prediction for x=6: {}", prediction_for_6[0]);
}
Dependencies
pip install numpy scikit-learn
[dependencies]
linfa = "0.7.1"
linfa-linear = "0.7.0"
ndarray = "0.15.0"
linfa-datasets = "0.7.0"
Run the Program
python first_lab.py
cargo run
Interpreting Results (Deeper)
- Intercept and Weights: The intercept is the model's baseline; weights show how much the target changes per unit change in each feature.
- Goodness of Fit: Use $R^2$ to measure the fraction of variance explained by the model: $R^2 = 1 - \frac{\mathrm{SS}_{\mathrm{res}}}{\mathrm{SS}_{\mathrm{tot}}}$.
- Bias–Variance Tradeoff: Simple models (high bias) underfit; very flexible models (high variance) overfit. Linear regression is low-variance if features are few and samples many.
- Residual Analysis: Plot residuals to check homoscedasticity (constant variance) and patterns indicating model misspecification.
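To act on the residual-analysis point above, here is a minimal plotting sketch. It assumes matplotlib is installed (it is not in the listed dependencies), and the test values below are hard-coded stand-ins so the snippet runs on its own; in practice you would use the `y_test` and `y_pred` arrays from the lab script.
import numpy as np
import matplotlib.pyplot as plt

# Illustrative stand-ins for the lab's test targets and predictions
y_test = np.array([4.2, 10.0])
y_pred = np.array([4.1, 10.2])

residuals = y_test - y_pred
plt.scatter(y_pred, residuals)
plt.axhline(0.0, linestyle="--")
plt.xlabel("Predicted value")
plt.ylabel("Residual (actual - predicted)")
plt.title("Residuals vs. predictions")
plt.show()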
Explanation of R^2
$\mathrm{SS}_{\mathrm{res}} = \sum_i (y_i - \hat{y}_i)^2$ is the sum of squared residuals (errors) from your model. $\mathrm{SS}_{\mathrm{tot}} = \sum_i (y_i - \bar{y})^2$ is the total sum of squared deviations from the mean of $y$.
Mini numerical example: for $\mathrm{SS}_{\mathrm{res}} = 2$ and $\mathrm{SS}_{\mathrm{tot}} = 8$, $R^2 = 1 - \frac{2}{8} = 0.75$.
So $R^2 = 0.75$ (the model explains 75% of the variance).
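The same computation in NumPy; the target and prediction vectors are hypothetical values chosen to reproduce the numbers in the example above.
import numpy as np

y_true = np.array([2.0, 4.0, 4.0, 6.0])
y_hat  = np.array([1.0, 4.0, 4.0, 7.0])   # hypothetical predictions

ss_res = np.sum((y_true - y_hat) ** 2)          # 1 + 0 + 0 + 1 = 2
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # mean 4 -> 4 + 0 + 0 + 4 = 8
r2 = 1.0 - ss_res / ss_tot                      # 1 - 2/8 = 0.75
print(r2)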
Practical Extensions
- Polynomial Features: Model non-linear relationships by adding polynomial terms (e.g., $x^2$, $x^3$); see the sketch after this list.
- Regularized Regression: Use Ridge or Lasso to penalize large weights.
- Cross-validation: Use K-fold CV to better estimate performance on small datasets.
- Feature Engineering: Create meaningful features, handle categorical variables (one-hot encoding), and impute missing values.
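A short scikit-learn sketch of two of these extensions, polynomial features and K-fold cross-validation. The dataset, polynomial degree, and fold count below are arbitrary illustrative choices.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score, KFold

# Synthetic, roughly quadratic data (made up for illustration)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([1.2, 4.1, 9.3, 15.8, 25.1, 35.9])

# Degree-2 polynomial features turn x into [1, x, x^2] before the linear fit
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())

# 3-fold cross-validation; scoring is negative MSE by scikit-learn convention
scores = cross_val_score(model, X, y, cv=KFold(n_splits=3), scoring="neg_mean_squared_error")
print("CV MSE per fold:", -scores)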
Next Steps
- Proceed to Mathematical Foundations to go deeper into the math behind these models.
- Or revisit Rust Basics / Python Basics.
Further Reading
- An Introduction to Statistical Learning by James et al. (Chapter 3)
- Andrew Ng’s Machine Learning Specialization (Course 1, Week 1)
- linfa documentation: https://github.com/rust-ml/linfa
- scikit-learn documentation: https://scikit-learn.org