Conditional Probability & Bayes' Theorem
Conditional probability and Bayes’ theorem are pivotal tools in probability theory, enabling reasoning under uncertainty by updating beliefs based on new evidence. In machine learning (ML), they form the backbone of algorithms like Naive Bayes classifiers, Bayesian networks, and probabilistic inference, allowing models to handle noisy data, make predictions, and incorporate prior knowledge. These concepts are essential for tasks ranging from spam filtering to medical diagnosis and generative modeling.
This fourth lecture in the “Probability Foundations for AI/ML” series builds on expectation, variance, and covariance, exploring conditional probability, Bayes’ theorem, their properties, and their applications in ML. We’ll provide intuitive explanations, rigorous derivations, and practical implementations in Python and Rust, preparing you for advanced topics like independence and maximum likelihood estimation.
1. Intuition Behind Conditional Probability
Conditional probability measures the likelihood of an event given that another has occurred. If you know it’s raining, what’s the chance you’ll need an umbrella? This is P(umbrella|rain).
Formally, for events A and B with P(B)>0:
[ P(A|B) = \frac{P(A \cap B)}{P(B)} ]
It’s the fraction of B’s probability where A also happens.
ML Connection
- Classification: P(class|features) predicts labels given data.
- Inference: Update model beliefs with new observations.
::: info Conditional probability shrinks the world to what’s known, like zooming into a subset of possibilities. :::
Example
- Dice roll: P(sum=7|first die=3) = P({(3,4)})/P(first=3) = (1/36)/(1/6) = 1/6.
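The dice example can be verified by enumerating the sample space directly (a small sketch using exact fractions):

```python
from fractions import Fraction

# Enumerate all 36 equally likely outcomes of two fair dice.
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]

# B: first die shows 3; A ∩ B: first die is 3 AND the sum is 7.
p_B = Fraction(sum(1 for a, b in outcomes if a == 3), 36)
p_A_and_B = Fraction(sum(1 for a, b in outcomes if a == 3 and a + b == 7), 36)

# P(sum=7 | first=3) = P(A ∩ B) / P(B)
print(p_A_and_B / p_B)  # 1/6
```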
2. Formal Definition and Properties
Conditional Probability: P(A|B) = P(A∩B)/P(B).
Properties:
- 0≤P(A|B)≤1.
- P(Ω|B)=1.
- Additivity: P(A∪C|B)=P(A|B)+P(C|B) if A,C disjoint.
Multiplication Rule: P(A∩B) = P(A|B)P(B) = P(B|A)P(A).
ML Insight
- Joint probabilities in graphical models decompose via conditionals.
Example: P(spam ∩ keyword) = P(keyword|spam)P(spam).
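The spam example plugs straight into the multiplication rule; the numbers below are illustrative, not measured from real mail:

```python
# Illustrative figures: 30% of mail is spam; a keyword appears in 60% of spam.
p_spam = 0.3
p_keyword_given_spam = 0.6

# Multiplication rule: P(spam ∩ keyword) = P(keyword|spam) P(spam)
p_joint = p_keyword_given_spam * p_spam
print(p_joint)  # ≈ 0.18
```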
3. Bayes’ Theorem: Updating Beliefs
[ P(A|B) = \frac{P(B|A)P(A)}{P(B)} ]
Here P(A) is the prior, P(B|A) the likelihood, P(B) the evidence, and P(A|B) the posterior.
Derivation
From the multiplication rule, P(A∩B) = P(A|B)P(B) = P(B|A)P(A); dividing both sides by P(B) yields Bayes’ theorem.
Law of Total Probability
P(B) = ∑ P(B|A_i)P(A_i) over a partition {A_i} of the sample space.
[ P(A|B) = \frac{P(B|A)P(A)}{\sum_i P(B|A_i)P(A_i)} ]
ML Connection
- Naive Bayes: Assumes feature independence for P(features|class).
- Bayesian Inference: Update model params with data.
Example: Medical test, P(disease|positive) = P(positive|disease)P(disease)/P(positive).
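The medical-test example can be computed end to end with the evidence expanded via the law of total probability. The figures below (1% prevalence, 99% sensitivity, 4% false-positive rate) are assumed for the sketch, not clinical data:

```python
# Assumed test characteristics.
p_disease = 0.01        # prior P(disease)
p_pos_given_d = 0.99    # sensitivity P(positive|disease)
p_pos_given_nd = 0.04   # false-positive rate P(positive|no disease)

# Evidence via the law of total probability.
p_pos = p_pos_given_d * p_disease + p_pos_given_nd * (1 - p_disease)

# Bayes' theorem: posterior P(disease|positive).
posterior = p_pos_given_d * p_disease / p_pos
print(round(posterior, 3))  # 0.2 — most positives are still false positives
```

Despite the accurate test, the low prior keeps the posterior at only 20%, which is the classic base-rate effect.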
4. Conditional Distributions for Random Variables
For random variables X,Y:
- Discrete: P(X=x|Y=y) = P(X=x,Y=y)/P(Y=y).
- Continuous: f_{X|Y}(x|y) = f(x,y)/f_Y(y).
Conditional Expectation: E[X|Y=y] = ∑ x P(x|y) in the discrete case, or ∫ x f(x|y) dx in the continuous case.
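In the discrete case, E[X|Y=y] falls out directly from a joint table. A small sketch with an assumed joint pmf:

```python
# Assumed joint pmf P(X=x, Y=y) over x ∈ {0,1,2}, y ∈ {0,1}.
joint = {
    (0, 0): 0.1, (1, 0): 0.2, (2, 0): 0.1,
    (0, 1): 0.2, (1, 1): 0.1, (2, 1): 0.3,
}

def cond_expectation(joint, y):
    # Marginal P(Y=y), then E[X|Y=y] = sum_x x * P(X=x, Y=y) / P(Y=y).
    p_y = sum(p for (x, yy), p in joint.items() if yy == y)
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y

print(cond_expectation(joint, 1))  # (0·0.2 + 1·0.1 + 2·0.3) / 0.6 ≈ 1.167
```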
ML Application
- Regression: E[Y|X=x] as prediction.
- Hidden Markov Models: Conditional state probs.
5. Independence and Conditional Independence
X and Y are independent if P(X,Y)=P(X)P(Y), which implies P(X|Y)=P(X).
Conditional Independence: X and Y are independent given Z if P(X,Y|Z)=P(X|Z)P(Y|Z).
In ML: Naive Bayes assumes features are conditionally independent given the class.
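The distinction between marginal and conditional independence can be checked by simulation. Below, an assumed generative story has a latent Z driving two noisy observations: X and Y factorize once Z is known, yet are dependent marginally because Z couples them:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Latent Z; given Z, X and Y are independent coin flips whose bias depends on Z.
Z = rng.integers(0, 2, n)
p = np.where(Z == 1, 0.8, 0.2)
X = rng.random(n) < p
Y = rng.random(n) < p

mask = Z == 1
# Conditional independence: P(X,Y|Z=1) ≈ P(X|Z=1) P(Y|Z=1)
lhs = np.mean(X[mask] & Y[mask])
rhs = np.mean(X[mask]) * np.mean(Y[mask])
print(abs(lhs - rhs) < 0.02)  # True: the joint factorizes given Z

# Marginally, X and Y are NOT independent: P(X,Y) ≈ 0.34 vs P(X)P(Y) ≈ 0.25.
print(abs(np.mean(X & Y) - np.mean(X) * np.mean(Y)) > 0.05)  # True
```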
6. Bayes’ Theorem in Practice
Steps:
- Define prior P(A).
- Compute likelihood P(B|A).
- Calculate evidence P(B) via total prob.
- Compute posterior.
Example: Spam filter, P(spam|keyword) using word frequencies.
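The four steps above can be sketched end to end for the spam filter. All frequencies below are illustrative assumptions:

```python
# Step 1: prior — assume 40% of mail is spam.
p_spam = 0.4

# Step 2: likelihoods — keyword frequency in each class (assumed).
p_kw_given_spam = 0.7
p_kw_given_ham = 0.05

# Step 3: evidence via the law of total probability.
p_kw = p_kw_given_spam * p_spam + p_kw_given_ham * (1 - p_spam)

# Step 4: posterior by Bayes' theorem.
p_spam_given_kw = p_kw_given_spam * p_spam / p_kw
print(round(p_spam_given_kw, 3))  # 0.903
```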
ML Insight
- Bayesian networks: chain Bayes’ theorem across nodes for structured inference.
7. Applications in Machine Learning
- Naive Bayes: Classify emails as spam/non-spam.
- Bayesian Optimization: Tune hyperparameters.
- Diagnostics: P(disease|symptom) in medical ML.
- Uncertainty: Posterior for confidence intervals.
8. Numerical Computations with Bayes
Simulate conditional probabilities and estimate posteriors numerically.
::: code-group
```python
import numpy as np

# Conditional probability: estimate P(A|B) by simulation
def cond_prob_sim(n_trials=10000):
    # A: X>0, B: X+Y>0, with X, Y ~ N(0,1)
    X = np.random.normal(0, 1, n_trials)
    Y = np.random.normal(0, 1, n_trials)
    B = X + Y > 0
    A_and_B = (X > 0) & B
    return np.sum(A_and_B) / np.sum(B)

print("P(X>0|X+Y>0) ≈", cond_prob_sim())  # ~0.75

# Naive Bayes classifier with Bernoulli features
def naive_bayes(X, y, test):
    n_classes = len(np.unique(y))
    n_features = X.shape[1]
    priors = [np.mean(y == c) for c in range(n_classes)]
    likelihoods = []
    for c in range(n_classes):
        X_c = X[y == c]
        p = [np.mean(X_c[:, i]) for i in range(n_features)]  # Bernoulli parameters
        likelihoods.append([p[i] if test[i] == 1 else 1 - p[i] for i in range(n_features)])
    probs = np.array([priors[c] * np.prod(likelihoods[c]) for c in range(n_classes)])
    return probs / np.sum(probs)

X = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y = np.array([1, 0, 1, 0])
test = [1, 1]
print("P(class|test):", naive_bayes(X, y, test))
```

```rust
use rand_distr::{Distribution, Normal};

fn cond_prob_sim(n_trials: usize) -> f64 {
    let normal = Normal::new(0.0, 1.0).unwrap();
    let mut rng = rand::thread_rng();
    let mut count_ab = 0;
    let mut count_b = 0;
    for _ in 0..n_trials {
        let x = normal.sample(&mut rng);
        let y = normal.sample(&mut rng);
        if x + y > 0.0 {
            count_b += 1;
            if x > 0.0 {
                count_ab += 1;
            }
        }
    }
    count_ab as f64 / count_b as f64
}

fn naive_bayes(x: &[[u8; 2]], y: &[u8], test: &[u8]) -> [f64; 2] {
    let mut priors = [0.0; 2];
    for &c in y {
        priors[c as usize] += 1.0;
    }
    priors[0] /= y.len() as f64;
    priors[1] /= y.len() as f64;
    let mut likelihoods = [[0.0; 2]; 2];
    let mut counts = [0; 2];
    for (xi, &yi) in x.iter().zip(y.iter()) {
        counts[yi as usize] += 1;
        for j in 0..2 {
            likelihoods[yi as usize][j] += xi[j] as f64;
        }
    }
    for i in 0..2 {
        for j in 0..2 {
            likelihoods[i][j] /= counts[i] as f64;
        }
    }
    let mut probs = [0.0; 2];
    for c in 0..2 {
        probs[c] = priors[c]
            * test
                .iter()
                .zip(likelihoods[c].iter())
                .map(|(&t, &p)| if t == 1 { p } else { 1.0 - p })
                .product::<f64>();
    }
    let sum = probs.iter().sum::<f64>();
    [probs[0] / sum, probs[1] / sum]
}

fn main() {
    println!("P(X>0|X+Y>0) ≈ {}", cond_prob_sim(10000));

    let x = [[1, 0], [0, 1], [1, 1], [0, 0]];
    let y = [1, 0, 1, 0];
    let test = [1, 1];
    println!("P(class|test): {:?}", naive_bayes(&x, &y, &test));
}
```
:::
The Python and Rust versions simulate the conditional probability and implement a Bernoulli Naive Bayes classifier. By symmetry, the two half-planes {X>0} and {X+Y>0} overlap in a 135° wedge of directions, so the exact value of P(X>0|X+Y>0) is (3/8)/(1/2) = 3/4.
9. Symbolic Bayes with SymPy
Exact calculations.
::: code-group
```python
from sympy import symbols, Rational

p_A, p_B_A, p_B = symbols('p_A p_B_A p_B', positive=True)
p_A_B = (p_B_A * p_A) / p_B
print("Bayes P(A|B):", p_A_B)

# Example: medical test
p_disease = Rational(1, 100)
p_pos_disease = Rational(99, 100)
p_pos = Rational(5, 100)
p_disease_pos = (p_pos_disease * p_disease) / p_pos
print("P(disease|positive):", p_disease_pos)
```

```rust
fn main() {
    println!("Bayes P(A|B): p(B|A)p(A)/p(B)");
    println!("P(disease|positive): 0.198");
}
```
:::
10. Challenges in ML Applications
- High dimensions: the evidence P(B) is often intractable, and products of many small likelihoods underflow floating point.
- Independence assumptions: Naive Bayes’ conditional-independence assumption rarely holds exactly, though the classifier often works well regardless.
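A standard mitigation for the numerical side is to work in log space: sum log-probabilities and normalize with the log-sum-exp trick instead of multiplying raw likelihoods. A sketch with randomly generated, assumed per-feature likelihoods:

```python
import numpy as np

rng = np.random.default_rng(1)

# 1000 per-feature log-likelihoods for each of 2 classes; the raw product
# of this many small probabilities is far below float64's smallest value.
log_lik = np.log(rng.uniform(0.1, 0.9, size=(2, 1000)))  # log P(x_i|c), assumed
log_prior = np.log([0.5, 0.5])

log_post_unnorm = log_prior + log_lik.sum(axis=1)

# log-sum-exp: subtract the max before exponentiating for stability.
m = log_post_unnorm.max()
post = np.exp(log_post_unnorm - m)
post /= post.sum()
print(post)  # a valid posterior over the two classes
```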
11. Key ML Takeaways
- Conditionals update beliefs: Core to inference.
- Bayes reverses probs: Prior to posterior.
- ML relies on conditionals: Classification, networks.
- Numerical sims practical: For complex probs.
- Code implementations: Bayes in action.
Bayes powers probabilistic reasoning.
12. Summary
This lecture explored conditional probability, Bayes’ theorem, their properties, and ML applications such as classification. Examples and Python/Rust code bridge theory to practice, preparing for independence and the law of large numbers (LLN).
Further Reading
- Wasserman, All of Statistics (Ch. 2).
- Bishop, Pattern Recognition (Ch. 1.2, 2.1).
- 3Blue1Brown: Bayes’ theorem videos.
- Rust: the rand_distr crate for sampling.