
Probability

Probability is fundamental to machine learning (ML), enabling models to handle uncertainty and make predictions. This section introduces probability basics, distributions, and Bayes’ theorem, with a Rust lab using the rand and rand_distr crates.

Probability Basics

Probability measures the likelihood of an event, ranging from 0 (impossible) to 1 (certain). For a random variable X, the probability of an outcome x is denoted P(X=x).

Example: For a fair die, P(X=3) = 1/6.
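
To make this concrete, here is a small simulation sketch (separate from the lab below) that estimates P(X=3) empirically with the rand crate; the roll count of 100,000 is an arbitrary illustrative choice.

    rust
    use rand::Rng;

    fn main() {
        let mut rng = rand::thread_rng();
        let rolls = 100_000;

        // Count how often a fair six-sided die shows a 3
        let threes = (0..rolls).filter(|_| rng.gen_range(1..=6) == 3).count();

        // The empirical frequency should approach 1/6 ≈ 0.1667
        println!("Estimated P(X=3): {:.4}", threes as f64 / rolls as f64);
    }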

Probability Distributions

A probability distribution describes how probabilities are distributed over a random variable’s values.

  • Discrete: For finite outcomes (e.g., die rolls). The probability mass function (PMF) gives P(X=x).
  • Continuous: For infinite outcomes (e.g., heights). The probability density function (PDF) defines probabilities over intervals.
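
For the discrete case, a PMF can be written down explicitly. The sketch below (an illustration, not part of the lab) stores the PMF of a fair die in a map and checks that the probabilities sum to 1.

    rust
    use std::collections::HashMap;

    fn main() {
        // PMF of a fair six-sided die: each face has probability 1/6
        let pmf: HashMap<u8, f64> = (1..=6).map(|face| (face, 1.0 / 6.0)).collect();

        // A valid PMF assigns non-negative probabilities that sum to 1
        let total: f64 = pmf.values().sum();
        println!("P(X=3) = {:.4}", pmf[&3u8]);
        println!("Sum of probabilities = {:.4}", total);
    }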

Normal Distribution: A common continuous distribution with PDF:

f(x) = 1/√(2πσ²) · exp(−(x − μ)² / (2σ²))

where μ is the mean and σ is the standard deviation. It is widely used in ML for modeling data.
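
To connect the formula to code, the sketch below (illustrative, not part of the lab) evaluates this density directly; the function name normal_pdf and the chosen evaluation points are assumptions for demonstration.

    rust
    use std::f64::consts::PI;

    // Density of N(mu, sigma^2) at x, following the formula above
    fn normal_pdf(x: f64, mu: f64, sigma: f64) -> f64 {
        let coeff = 1.0 / (2.0 * PI * sigma * sigma).sqrt();
        let exponent = -((x - mu).powi(2)) / (2.0 * sigma * sigma);
        coeff * exponent.exp()
    }

    fn main() {
        // For the standard normal N(0,1), the density peaks at x = 0 with value 1/√(2π) ≈ 0.3989
        println!("f(0) = {:.4}", normal_pdf(0.0, 0.0, 1.0));
        println!("f(1) = {:.4}", normal_pdf(1.0, 0.0, 1.0));
    }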

Bayes’ Theorem

Bayes’ theorem updates probabilities based on new evidence:

P(A|B) = P(B|A) · P(A) / P(B)

In ML, it’s used for classification (e.g., naive Bayes).
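
As a worked example (a sketch with hypothetical numbers, in a toy spam-filter setting), the snippet below applies the formula directly; the prior, likelihoods, and event names are assumptions chosen only for illustration.

    rust
    // Bayes' theorem: P(A|B) = P(B|A) · P(A) / P(B)
    // Hypothetical setup: A = "email is spam", B = "email contains the word 'offer'"
    fn main() {
        let p_a = 0.2;              // prior P(spam)
        let p_b_given_a = 0.6;      // likelihood P("offer" | spam)
        let p_b_given_not_a = 0.05; // P("offer" | not spam)

        // Evidence P(B) via the law of total probability
        let p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a);

        // Posterior P(spam | "offer") = 0.12 / 0.16 = 0.75
        let p_a_given_b = p_b_given_a * p_a / p_b;
        println!("P(spam | \"offer\") = {:.2}", p_a_given_b);
    }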

Lab: Simulating a Normal Distribution with rand and rand_distr

You’ll simulate samples from a normal distribution using the rand crate together with rand_distr (which provides the Normal distribution), illustrating probability in ML.

  1. Edit src/main.rs in your rust_ml_tutorial project:

    rust
    use rand::thread_rng;
    use rand_distr::{Distribution, Normal};
    
    fn main() {
        // Define normal distribution (mean=0, std=1); Normal::new returns a Result
        let normal = Normal::new(0.0, 1.0).expect("valid mean and standard deviation");
        let mut rng = thread_rng();
    
        // Generate 5 samples
        let samples: Vec<f64> = (0..5).map(|_| normal.sample(&mut rng)).collect();
        println!("Samples from N(0,1): {:?}", samples);
    
        // Compute sample mean
        let mean = samples.iter().sum::<f64>() / samples.len() as f64;
        println!("Sample Mean: {}", mean);
    }
  2. Ensure Dependencies:

    • Verify Cargo.toml includes:
      toml
      [dependencies]
      rand = "0.8.5"
      rand_distr = "0.4"
    • Run cargo build.
  3. Run the Program:

    bash
    cargo run

    Expected Output (values vary due to randomness):

    Samples from N(0,1): [0.123, -0.456, 1.789, -0.234, 0.678]
    Sample Mean: ~0.38

Understanding the Results

  • Distribution: The Normal type from rand_distr, driven by rand’s thread_rng, draws samples from the standard normal distribution N(0,1).
  • Mean: The sample mean approximates the true mean (μ=0) and converges toward it as the number of samples grows, as the sketch below illustrates.
  • ML Relevance: Distributions model data uncertainty, used in algorithms like Gaussian naive Bayes.
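
The convergence is easy to see by increasing the sample size. The sketch below (an optional extension of the lab, using the same crates) prints the sample mean for a few sizes; the chosen sizes are arbitrary.

    rust
    use rand::thread_rng;
    use rand_distr::{Distribution, Normal};

    fn main() {
        let normal = Normal::new(0.0, 1.0).expect("valid mean and standard deviation");
        let mut rng = thread_rng();

        // The sample mean should drift toward the true mean of 0 as n grows
        for &n in &[10, 1_000, 100_000] {
            let sum: f64 = (0..n).map(|_| normal.sample(&mut rng)).sum();
            println!("n = {:>6}: sample mean = {:+.4}", n, sum / n as f64);
        }
    }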

This lab prepares you for probabilistic ML models.

Learning from Official Resources

Deepen Rust skills with:

  • The Rust Programming Language (The Book): Free at doc.rust-lang.org/book.
  • Programming Rust: By Blandy, Orendorff, and Tindall.

Next Steps

Continue to Statistics for the statistical methods used in ML, or revisit Calculus.

Further Reading

  • Deep Learning by Goodfellow et al. (Chapter 3)
  • An Introduction to Statistical Learning by James et al. (Prerequisites)
  • rand Documentation: docs.rs/rand