Conditional Probability & Bayes' Theorem
Conditional probability and Bayes’ theorem are pivotal tools in probability theory, enabling reasoning under uncertainty by updating beliefs based on new evidence. In machine learning (ML), they form the backbone of algorithms like Naive Bayes classifiers, Bayesian networks, and probabilistic inference, allowing models to handle noisy data, make predictions, and incorporate prior knowledge. These concepts are essential for tasks ranging from spam filtering to medical diagnosis and generative modeling.
This fourth lecture in the “Probability Foundations for AI/ML” series builds on expectation, variance, and covariance, exploring conditional probability, Bayes’ theorem, their properties, and their applications in ML. We’ll provide intuitive explanations, rigorous derivations, and practical implementations in Python and Rust, preparing you for advanced topics like independence and maximum likelihood estimation.
1. Intuition Behind Conditional Probability
Conditional probability measures the likelihood of an event given that another has occurred. If you know it’s raining, what’s the chance you’ll need an umbrella? This is P(umbrella|rain).
Formally, for events A and B with P(B)>0:
[ P(A|B) = \frac{P(A \cap B)}{P(B)} ]
It’s the fraction of B’s probability where A also happens.
ML Connection
- Classification: P(class|features) predicts labels given data.
- Inference: Update model beliefs with new observations.
::: info Conditional probability shrinks the world to what’s known, like zooming into a subset of possibilities. :::
Example
- Dice roll: P(sum=7|first die=3) = P({(3,4)})/P(first=3) = (1/36)/(1/6) = 1/6.
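The dice example can be verified by enumerating the sample space directly (a small sketch using exact fractions):

```python
from fractions import Fraction

# Enumerate all 36 equally likely outcomes of two fair dice.
outcomes = [(a, b) for a in range(1, 7) for b in range(1, 7)]

# B: first die shows 3; A ∩ B: first die is 3 AND the sum is 7.
p_B = Fraction(sum(1 for a, b in outcomes if a == 3), 36)
p_A_and_B = Fraction(sum(1 for a, b in outcomes if a == 3 and a + b == 7), 36)

# P(sum=7 | first=3) = P(A ∩ B) / P(B)
print(p_A_and_B / p_B)  # 1/6
```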
2. Formal Definition and Properties
Conditional Probability: P(A|B) = P(A∩B)/P(B).
Properties:
- 0≤P(A|B)≤1.
- P(Ω|B)=1.
- Additivity: P(A∪C|B)=P(A|B)+P(C|B) if A,C disjoint.
Multiplication Rule: P(A∩B) = P(A|B)P(B) = P(B|A)P(A).
ML Insight
- Joint probabilities in graphical models decompose via conditionals.
Example: P(spam ∩ keyword) = P(keyword|spam)P(spam).
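The spam example plugs straight into the multiplication rule; the numbers below are illustrative, not measured from real mail:

```python
# Illustrative figures: 30% of mail is spam; a keyword appears in 60% of spam.
p_spam = 0.3
p_keyword_given_spam = 0.6

# Multiplication rule: P(spam ∩ keyword) = P(keyword|spam) P(spam)
p_joint = p_keyword_given_spam * p_spam
print(p_joint)  # ≈ 0.18
```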
3. Bayes’ Theorem: Updating Beliefs
[ P(A|B) = \frac{P(B|A)P(A)}{P(B)} ]
Here P(A) is the prior, P(B|A) the likelihood, P(B) the evidence, and P(A|B) the posterior.
Derivation
From the multiplication rule, P(A∩B) = P(A|B)P(B) = P(B|A)P(A); dividing both sides by P(B) yields Bayes’ theorem.
Law of Total Probability
P(B) = ∑ P(B|A_i)P(A_i) over a partition {A_i} of the sample space.
[ P(A|B) = \frac{P(B|A)P(A)}{\sum_i P(B|A_i)P(A_i)} ]
ML Connection
- Naive Bayes: Assumes feature independence for P(features|class).
- Bayesian Inference: Update model params with data.
Example: Medical test, P(disease|positive) = P(positive|disease)P(disease)/P(positive).
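The medical-test example can be computed end to end with the evidence expanded via the law of total probability. The figures below (1% prevalence, 99% sensitivity, 4% false-positive rate) are assumed for the sketch, not clinical data:

```python
# Assumed test characteristics.
p_disease = 0.01        # prior P(disease)
p_pos_given_d = 0.99    # sensitivity P(positive|disease)
p_pos_given_nd = 0.04   # false-positive rate P(positive|no disease)

# Evidence via the law of total probability.
p_pos = p_pos_given_d * p_disease + p_pos_given_nd * (1 - p_disease)

# Bayes' theorem: posterior P(disease|positive).
posterior = p_pos_given_d * p_disease / p_pos
print(round(posterior, 3))  # 0.2 — most positives are still false positives
```

Despite the accurate test, the low prior keeps the posterior at only 20%, which is the classic base-rate effect.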
4. Conditional Distributions for Random Variables
For random variables X,Y:
- Discrete: P(X=x|Y=y) = P(X=x,Y=y)/P(Y=y).
- Continuous: f_{X|Y}(x|y) = f(x,y)/f_Y(y).
Conditional Expectation: E[X|Y=y] = ∑ x P(x|y) in the discrete case, or ∫ x f(x|y) dx in the continuous case.
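In the discrete case, E[X|Y=y] falls out directly from a joint table. A small sketch with an assumed joint pmf:

```python
# Assumed joint pmf P(X=x, Y=y) over x ∈ {0,1,2}, y ∈ {0,1}.
joint = {
    (0, 0): 0.1, (1, 0): 0.2, (2, 0): 0.1,
    (0, 1): 0.2, (1, 1): 0.1, (2, 1): 0.3,
}

def cond_expectation(joint, y):
    # Marginal P(Y=y), then E[X|Y=y] = sum_x x * P(X=x, Y=y) / P(Y=y).
    p_y = sum(p for (x, yy), p in joint.items() if yy == y)
    return sum(x * p for (x, yy), p in joint.items() if yy == y) / p_y

print(cond_expectation(joint, 1))  # (0·0.2 + 1·0.1 + 2·0.3) / 0.6 ≈ 1.167
```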
ML Application
- Regression: E[Y|X=x] as prediction.
- Hidden Markov Models: Conditional state probs.
5. Independence and Conditional Independence
X and Y are independent if P(X,Y)=P(X)P(Y), which implies P(X|Y)=P(X).
Conditional Independence: X and Y are independent given Z if P(X,Y|Z)=P(X|Z)P(Y|Z).
In ML: Naive Bayes assumes features are conditionally independent given the class.
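The distinction between marginal and conditional independence can be checked by simulation. Below, an assumed generative story has a latent Z driving two noisy observations: X and Y factorize once Z is known, yet are dependent marginally because Z couples them:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Latent Z; given Z, X and Y are independent coin flips whose bias depends on Z.
Z = rng.integers(0, 2, n)
p = np.where(Z == 1, 0.8, 0.2)
X = rng.random(n) < p
Y = rng.random(n) < p

mask = Z == 1
# Conditional independence: P(X,Y|Z=1) ≈ P(X|Z=1) P(Y|Z=1)
lhs = np.mean(X[mask] & Y[mask])
rhs = np.mean(X[mask]) * np.mean(Y[mask])
print(abs(lhs - rhs) < 0.02)  # True: the joint factorizes given Z

# Marginally, X and Y are NOT independent: P(X,Y) ≈ 0.34 vs P(X)P(Y) ≈ 0.25.
print(abs(np.mean(X & Y) - np.mean(X) * np.mean(Y)) > 0.05)  # True
```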
6. Bayes’ Theorem in Practice
Steps:
- Define prior P(A).
- Compute likelihood P(B|A).
- Calculate evidence P(B) via total prob.
- Compute posterior.
Example: Spam filter, P(spam|keyword) using word frequencies.
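The four steps above can be sketched end to end for the spam filter. All frequencies below are illustrative assumptions:

```python
# Step 1: prior — assume 40% of mail is spam.
p_spam = 0.4

# Step 2: likelihoods — keyword frequency in each class (assumed).
p_kw_given_spam = 0.7
p_kw_given_ham = 0.05

# Step 3: evidence via the law of total probability.
p_kw = p_kw_given_spam * p_spam + p_kw_given_ham * (1 - p_spam)

# Step 4: posterior by Bayes' theorem.
p_spam_given_kw = p_kw_given_spam * p_spam / p_kw
print(round(p_spam_given_kw, 3))  # 0.903
```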
ML Insight
- Bayesian networks: chain Bayes’ theorem across nodes for structured inference.
7. Applications in Machine Learning
- Naive Bayes: Classify emails as spam/non-spam.
- Bayesian Optimization: Tune hyperparameters.
- Diagnostics: P(disease|symptom) in medical ML.
- Uncertainty: Posterior for confidence intervals.
8. Numerical Computations with Bayes
Simulate conditional probabilities and estimate posteriors numerically.
::: code-group
```python
import numpy as np

# Conditional probability: estimate P(A|B) by simulation
def cond_prob_sim(n_trials=10000):
    # A: X>0, B: X+Y>0, with X, Y ~ N(0,1)
    X = np.random.normal(0, 1, n_trials)
    Y = np.random.normal(0, 1, n_trials)
    B = X + Y > 0
    A_and_B = (X > 0) & B
    return np.sum(A_and_B) / np.sum(B)

print("P(X>0|X+Y>0) ≈", cond_prob_sim())  # ~0.75

# Naive Bayes classifier with Bernoulli features
def naive_bayes(X, y, test):
    n_classes = len(np.unique(y))
    n_features = X.shape[1]
    priors = [np.mean(y == c) for c in range(n_classes)]
    likelihoods = []
    for c in range(n_classes):
        X_c = X[y == c]
        p = [np.mean(X_c[:, i]) for i in range(n_features)]  # Bernoulli parameters
        likelihoods.append([p[i] if test[i] == 1 else 1 - p[i] for i in range(n_features)])
    probs = np.array([priors[c] * np.prod(likelihoods[c]) for c in range(n_classes)])
    return probs / np.sum(probs)

X = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
y = np.array([1, 0, 1, 0])
test = [1, 1]
print("P(class|test):", naive_bayes(X, y, test))
```

```rust
use rand_distr::{Distribution, Normal};

fn cond_prob_sim(n_trials: usize) -> f64 {
    let normal = Normal::new(0.0, 1.0).unwrap();
    let mut rng = rand::thread_rng();
    let mut count_ab = 0;
    let mut count_b = 0;
    for _ in 0..n_trials {
        let x = normal.sample(&mut rng);
        let y = normal.sample(&mut rng);
        if x + y > 0.0 {
            count_b += 1;
            if x > 0.0 {
                count_ab += 1;
            }
        }
    }
    count_ab as f64 / count_b as f64
}

fn naive_bayes(x: &[[u8; 2]], y: &[u8], test: &[u8]) -> [f64; 2] {
    let mut priors = [0.0; 2];
    for &c in y {
        priors[c as usize] += 1.0;
    }
    priors[0] /= y.len() as f64;
    priors[1] /= y.len() as f64;
    let mut likelihoods = [[0.0; 2]; 2];
    let mut counts = [0; 2];
    for (xi, &yi) in x.iter().zip(y.iter()) {
        counts[yi as usize] += 1;
        for j in 0..2 {
            likelihoods[yi as usize][j] += xi[j] as f64;
        }
    }
    for i in 0..2 {
        for j in 0..2 {
            likelihoods[i][j] /= counts[i] as f64;
        }
    }
    let mut probs = [0.0; 2];
    for c in 0..2 {
        probs[c] = priors[c]
            * test
                .iter()
                .zip(likelihoods[c].iter())
                .map(|(&t, &p)| if t == 1 { p } else { 1.0 - p })
                .product::<f64>();
    }
    let sum = probs.iter().sum::<f64>();
    [probs[0] / sum, probs[1] / sum]
}

fn main() {
    println!("P(X>0|X+Y>0) ≈ {}", cond_prob_sim(10000));

    let x = [[1, 0], [0, 1], [1, 1], [0, 0]];
    let y = [1, 0, 1, 0];
    let test = [1, 1];
    println!("P(class|test): {:?}", naive_bayes(&x, &y, &test));
}
```
:::
The Python and Rust versions simulate the conditional probability and implement a Bernoulli Naive Bayes classifier. By symmetry, the two half-planes {X>0} and {X+Y>0} overlap in a 135° wedge of directions, so the exact value of P(X>0|X+Y>0) is (3/8)/(1/2) = 3/4.
9. Symbolic Bayes with SymPy
Exact calculations.
::: code-group
```python
from sympy import symbols, Rational

p_A, p_B_A, p_B = symbols('p_A p_B_A p_B', positive=True)
p_A_B = (p_B_A * p_A) / p_B
print("Bayes P(A|B):", p_A_B)

# Example: medical test
p_disease = Rational(1, 100)
p_pos_disease = Rational(99, 100)
p_pos = Rational(5, 100)
p_disease_pos = (p_pos_disease * p_disease) / p_pos
print("P(disease|positive):", p_disease_pos)
```

```rust
fn main() {
    println!("Bayes P(A|B): p(B|A)p(A)/p(B)");
    println!("P(disease|positive): 0.198");
}
```
:::
10. Challenges in ML Applications
- High dimensions: the evidence P(B) is often intractable, and products of many small likelihoods underflow floating point.
- Independence assumptions: Naive Bayes’ conditional-independence assumption rarely holds exactly, though the classifier often works well regardless.
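A standard mitigation for the numerical side is to work in log space: sum log-probabilities and normalize with the log-sum-exp trick instead of multiplying raw likelihoods. A sketch with randomly generated, assumed per-feature likelihoods:

```python
import numpy as np

rng = np.random.default_rng(1)

# 1000 per-feature log-likelihoods for each of 2 classes; the raw product
# of this many small probabilities is far below float64's smallest value.
log_lik = np.log(rng.uniform(0.1, 0.9, size=(2, 1000)))  # log P(x_i|c), assumed
log_prior = np.log([0.5, 0.5])

log_post_unnorm = log_prior + log_lik.sum(axis=1)

# log-sum-exp: subtract the max before exponentiating for stability.
m = log_post_unnorm.max()
post = np.exp(log_post_unnorm - m)
post /= post.sum()
print(post)  # a valid posterior over the two classes
```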
11. Key ML Takeaways
- Conditionals update beliefs: Core to inference.
- Bayes reverses probs: Prior to posterior.
- ML relies on conditionals: Classification, networks.
- Numerical sims practical: For complex probs.
- Code implementations: Bayes in action.
Bayes powers probabilistic reasoning.
12. Summary
This lecture explored conditional probability, Bayes’ theorem, their properties, and ML applications such as classification. Examples and Python/Rust code bridge theory to practice, preparing for independence and the law of large numbers (LLN).
Further Reading
- Wasserman, All of Statistics (Ch. 2).
- Bishop, Pattern Recognition (Ch. 1.2, 2.1).
- 3Blue1Brown: Bayes’ theorem videos.
- Rust: the rand_distr crate for sampling.