Measure Theory Lite - Probability on Solid Ground
Measure theory provides the rigorous foundation for modern probability, extending integration to abstract spaces and enabling precise handling of continuous and discrete uncertainties. In artificial intelligence and machine learning, it underpins advanced concepts like stochastic processes, infinite-dimensional optimization, and probabilistic programming, ensuring consistency in models dealing with uncountable outcomes. By “lite,” we focus on key ideas without full abstraction, emphasizing intuition and applications.
This final lecture in the calculus series synthesizes prior topics, exploring measure spaces, sigma-algebras, Lebesgue measures and integrals, probability measures, and their implications for ML. We’ll blend accessible mathematics with practical insights, including examples and implementations in Python and Rust, to ground your understanding of probability’s solid mathematical base for AI innovations.
1. Why Measure Theory? Intuition and Motivation
Probability intuitively assigns “sizes” to events, but in continuous spaces, naive notions of length fail for pathological sets. Measure theory generalizes “size” (measure) to subsets via sigma-algebras, ensuring countable additivity and handling uncountable spaces.
Think of it as upgrading from Riemann (intervals) to Lebesgue (more sets), allowing integration over complex domains.
ML Connection
- Handles infinite data in Bayesian nonparametrics (e.g., Gaussian processes).
- Rigorous expectations in reinforcement learning value functions.
::: info Measure theory makes probability “bulletproof,” like reinforcing a building to withstand any load—essential for uncountable AI uncertainties. :::
Historical Context
- Lebesgue (1902): Solved integration issues with measures.
- Kolmogorov (1933): Axiomatized probability as measure.
2. Sigma-Algebras: The Measurable Sets
A sigma-algebra Σ on a set Ω is a collection of subsets that contains ∅ and Ω and is closed under complementation and countable unions (hence also countable intersections).
These are exactly the events to which we can assign probabilities.
Borel sigma-algebra: Generated by open intervals on R—standard for reals.
Properties
- The full power set is usually too large to carry a well-behaved measure (non-measurable sets such as Vitali sets exist on ℝ); a generated sigma-algebra is the smallest collection closed under the limit operations probability requires.
ML Insight
- Feature spaces: must be measurable so that losses and risks can be written as integrals.
Example: On [0,1], the Borel sigma-algebra contains all intervals, and even the Cantor set is unproblematic: it is Borel measurable with Lebesgue measure zero.
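To make the axioms concrete, here is a minimal sketch that checks the sigma-algebra conditions on a finite Ω, where countable unions reduce to finite ones; the sets and the helper name `is_sigma_algebra` are illustrative, not from the text.

```python
# Check the sigma-algebra axioms directly on a finite Ω (illustrative sketch;
# on a finite set, countable unions reduce to finite unions).
omega = frozenset({1, 2, 3, 4})

def is_sigma_algebra(sigma, omega):
    """Contains ∅ and Ω, and is closed under complement and union."""
    if frozenset() not in sigma or omega not in sigma:
        return False
    for a in sigma:
        if omega - a not in sigma:      # closed under complement
            return False
        for b in sigma:
            if a | b not in sigma:      # closed under union
                return False
    return True

# The sigma-algebra generated by the partition {{1,2}, {3,4}}
generated = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), omega}
print(is_sigma_algebra(generated, omega))   # True

# Fails: the complement of {1} is missing from the collection
print(is_sigma_algebra({frozenset(), frozenset({1}), omega}, omega))  # False
```

The second collection illustrates why closure matters: without the complement {2,3,4}, we could assign a probability to {1} but not to “not {1}”.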
3. Measures: Generalizing Size
A measure μ: Σ → [0,∞] satisfies μ(∅) = 0 and countable additivity: μ(∪_i A_i) = Σ_i μ(A_i) for disjoint A_i.
Probability measure: μ(Ω)=1.
Lebesgue measure on R: μ([a,b])=b-a.
Complete Measures
A measure is complete if every subset of a null set (a set of measure zero) is itself measurable; completing a measure adjoins these subsets.
ML Application
- Data manifolds: real data often concentrates on sets of Lebesgue measure zero in high-dimensional ambient space, one facet of the curse of dimensionality.
Example: Lebesgue on R^n generalizes volume.
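Countable additivity can be sanity-checked numerically. The sketch below (an illustration, not part of the text) decomposes (0, 1] into disjoint dyadic intervals and sums their Lebesgue measures.

```python
# Countable additivity of Lebesgue measure, checked on a disjoint dyadic
# decomposition of (0, 1]: (0, 1] = ∪_n (2^-(n+1), 2^-n].
def lebesgue_length(a, b):
    """Lebesgue measure of an interval (a, b]: its length."""
    return b - a

pieces = [(2.0**-(n + 1), 2.0**-n) for n in range(60)]   # first 60 pieces
total = sum(lebesgue_length(a, b) for a, b in pieces)
print(total)  # ≈ 1.0, the measure of (0, 1]
```

The partial sums approach μ((0,1]) = 1 from below, exactly as countable additivity demands.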
4. Measurable Functions and Random Variables
A function f: Ω → ℝ is measurable if the preimage of every Borel set belongs to Σ.
A random variable is simply a measurable function on a probability space.
The pushforward measure P_f(B) = P(f^{-1}(B)) is the distribution of f.
ML Connection
- Neural networks: compositions of measurable functions are measurable, so nets with continuous activations define genuine random variables when fed random inputs.
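A pushforward measure can be estimated by sampling: push samples of X through f and take empirical frequencies. The example and tolerance below are illustrative.

```python
import numpy as np

# Pushforward measure by sampling: P_f(B) = P(f^{-1}(B)).
# Take f(x) = x^2 with X ~ Uniform(0, 1); then P(f(X) ≤ t) = P(X ≤ √t) = √t.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200_000)
y = x**2                      # samples from the pushforward measure

t = 0.25
empirical = np.mean(y <= t)   # estimates P_f((-∞, t])
print(empirical)              # ≈ √0.25 = 0.5
```

This is precisely how generative models are evaluated in practice: we rarely know the pushforward law in closed form, but we can always sample from it.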
5. Lebesgue Integration: Beyond Riemann
The Lebesgue integral ∫ f dμ is built from simple functions: for f ≥ 0, it is the supremum of integrals of simple functions lying below f.
If X has density g, expectations become ordinary integrals: E[f(X)] = ∫ f(x) g(x) dx.
Properties: the monotone and dominated convergence theorems justify exchanging limits and integrals.
Versus Riemann: Lebesgue handles wildly discontinuous functions; the indicator of the rationals is Lebesgue integrable (with integral 0) but not Riemann integrable.
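The Lebesgue construction slices the range of f rather than its domain. A sketch, using the layer-cake formula ∫ f dμ = ∫₀^∞ μ({f > c}) dc; estimating μ({f > c}) on a grid is an assumption of this sketch, not the rigorous construction.

```python
import numpy as np

# Lebesgue-style integration: slice the *range* of f into levels and sum
# level-width times the measure of the set where f exceeds each level
# (the layer-cake formula). The measure of {x : f(x) > c} is estimated
# on a fine grid here, which is an approximation.
def lebesgue_integral(f, a, b, levels=1000, grid=100_000):
    x = np.linspace(a, b, grid)
    fx = f(x)
    cs = np.linspace(0.0, fx.max(), levels)
    dc = cs[1] - cs[0]
    return sum(np.mean(fx > c) * (b - a) for c in cs) * dc

val = lebesgue_integral(lambda x: np.sin(x)**2, 0.0, np.pi)
print(val)  # ≈ π/2
```

Compare with slicing the domain (Riemann): for smooth f both agree, but the range-slicing view is what survives on wildly discontinuous functions.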
Expectation
E[X] = ∫ X dP.
Variance: E[(X-μ)^2].
ML Insight
- Stochastic gradients: SGD minimizes an expectation it can only sample; Monte Carlo estimates stand in for the true integral.
6. Radon-Nikodym Theorem and Densities
If ν ≪ μ (ν is absolutely continuous with respect to μ), the Radon–Nikodym derivative dν/dμ exists and ν(A) = ∫_A (dν/dμ) dμ.
In probability: a random variable has a PDF exactly when its law satisfies P ≪ Lebesgue measure.
Likelihoods in ML are precisely such derivatives.
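One concrete use of dν/dμ is importance sampling: E_ν[h(X)] = E_μ[h(X)·(dν/dμ)(X)]. A sketch with two Gaussians, where the derivative is just a ratio of PDFs; the seed and sample size are arbitrary.

```python
import numpy as np
from scipy.stats import norm

# Radon–Nikodym derivative between two Gaussians: dν/dμ = pdf_ν / pdf_μ.
rng = np.random.default_rng(1)
mu_samples = rng.normal(0.0, 1.0, size=200_000)        # draws from μ = N(0,1)

def rn_derivative(x):
    return norm.pdf(x, loc=1.0, scale=1.0) / norm.pdf(x, loc=0.0, scale=1.0)

# Estimate E_ν[X] for ν = N(1,1) using only μ-samples; the true value is 1.
estimate = np.mean(mu_samples * rn_derivative(mu_samples))
print(estimate)  # ≈ 1.0
```

The weights rn_derivative(x) are exactly the likelihood ratios that appear in off-policy evaluation and variational inference.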
7. Product Measures and Fubini’s Theorem
For joint distributions: the product measure μ×ν lives on the product sigma-algebra Σ⊗T.
Fubini: ∫∫ f(x,y) d(μ×ν) = ∫ [∫ f(x,y) dν(y)] dμ(x) whenever f is integrable.
Independent variables in ML have joint laws that are exactly product measures.
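Fubini can be verified numerically: the joint integral should match the iterated one. A sketch with SciPy quadrature; the integrand and rectangle are chosen for illustration.

```python
import numpy as np
from scipy.integrate import quad, dblquad

# Fubini's theorem numerically: double integral vs. iterated integrals
# of f(x, y) = exp(-(x^2 + y^2)) over [0,1] x [0,2].
# Note dblquad's integrand takes arguments as (y, x).
joint, _ = dblquad(lambda y, x: np.exp(-(x**2 + y**2)), 0, 1, 0, 2)

inner = lambda x: quad(lambda y: np.exp(-(x**2 + y**2)), 0, 2)[0]
iterated, _ = quad(inner, 0, 1)

print(joint, iterated)  # the two agree to quadrature precision
```

Integrability is the hypothesis that makes this safe; for non-integrable f the two iterated integrals can genuinely disagree.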
8. Change of Variables in Multidim
∫ f(x) dx = ∫ f(g(u)) |det Dg(u)| du for a smooth invertible change of variables g.
In ML: Normalizing flows for density estimation.
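A one-dimensional flow makes the Jacobian factor tangible. A sketch (the map Y = exp(X) and tolerances are illustrative): the transformed density is p_X(log y)·|d(log y)/dy| = p_X(log y)/y, the log-normal density.

```python
import numpy as np
from scipy.stats import norm

# Change of variables as in a normalizing flow: Y = exp(X), X ~ N(0,1).
# p_Y(y) = p_X(log y) * |d(log y)/dy| = p_X(log y) / y.
rng = np.random.default_rng(2)
y = np.exp(rng.normal(0.0, 1.0, size=500_000))

def p_y(y):
    return norm.pdf(np.log(y)) / y

# Compare the formula with a histogram estimate of the density near y = 1
lo, hi = 0.9, 1.1
hist_mass = np.mean((y > lo) & (y < hi)) / (hi - lo)
print(hist_mass, p_y(1.0))  # both ≈ 1/√(2π) ≈ 0.399
```

Normalizing flows stack many such invertible maps and accumulate the log-Jacobian terms to get exact densities for flexible distributions.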
9. Convergence Concepts: In Measure, Almost Everywhere
Almost everywhere (a.e.): a property holds except on a set of measure zero.
In ML: Ignore pathological data points.
10. Probability Spaces and Kolmogorov Axioms
A probability space is a triple (Ω, Σ, P) with P a measure satisfying P(Ω) = 1.
Independence and conditional probability are then defined purely in terms of the measure P.
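On a finite Ω the axioms can be checked exhaustively. A sketch with a fair die (the uniform choice of P is an assumption of the example).

```python
from fractions import Fraction
from itertools import chain, combinations

# A finite probability space for a fair die: Ω = {1..6}, Σ = the power set,
# P(A) = |A| / 6. Verify the Kolmogorov axioms over every event.
omega = {1, 2, 3, 4, 5, 6}
events = [frozenset(s) for s in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))]

def P(a):
    return Fraction(len(a), 6)

assert P(frozenset()) == 0 and P(frozenset(omega)) == 1
# Additivity for every pair of disjoint events
for a in events:
    for b in events:
        if not (a & b):
            assert P(a | b) == P(a) + P(b)
print("Kolmogorov axioms hold for the fair die")
```

Exact rational arithmetic (Fraction) avoids floating-point noise in the additivity check.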
11. Borel-Cantelli and Laws of Large Numbers
The Borel–Cantelli lemmas control which events happen infinitely often; the strong law of large numbers then gives almost sure convergence of sample means.
In ML: Consistency of estimators.
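The strong law is visible in simulation: along a single sample path, the running mean settles onto the true expectation. The Bernoulli parameter and seed below are illustrative.

```python
import numpy as np

# Strong law of large numbers: running means of Bernoulli(0.3) draws
# converge (almost surely) to 0.3 along one sample path.
rng = np.random.default_rng(3)
draws = rng.random(1_000_000) < 0.3
running_mean = np.cumsum(draws) / np.arange(1, len(draws) + 1)

for n in (100, 10_000, 1_000_000):
    print(n, running_mean[n - 1])   # fluctuations shrink as n grows
```

Estimator consistency in ML is this phenomenon: the empirical risk is a sample mean converging to the true risk.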
12. Central Limit Theorem Measure-Theoretically
The central limit theorem is a statement about convergence in distribution, i.e., weak convergence of the underlying measures.
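Weak convergence means the CDFs converge at continuity points, which is easy to see in simulation. A sketch with standardized sums of uniforms (n, reps, and seed are arbitrary choices).

```python
import numpy as np

# CLT in simulation: standardized sums of Uniform(0,1) draws approach N(0,1).
# A Uniform(0,1) variable has mean 1/2 and variance 1/12.
rng = np.random.default_rng(4)
n, reps = 50, 100_000
sums = rng.random((reps, n)).sum(axis=1)
z = (sums - n * 0.5) / np.sqrt(n / 12.0)

print(np.mean(z <= 0.0))   # ≈ Φ(0) = 0.5
print(np.mean(z <= 1.0))   # ≈ Φ(1) ≈ 0.8413
```

The empirical CDF of z matches the standard normal CDF Φ even though each summand is far from Gaussian.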
13. ML Applications: GPs, Prob Prog, etc.
- Gaussian processes: probability measures on function spaces.
- Probabilistic programming: inference as computation over measures.
- Differential privacy: bounds on how much the output measure can shift when one record changes.
14. Numerical Aspects in Code
Approximate integrals, simulate measures.
::: code-group
```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Lebesgue integral approximation (agrees with Riemann for this smooth f)
def f(x):
    return np.sin(x)**2

integral, _ = quad(f, 0, np.pi)
print("∫ sin^2 [0,π]:", integral)  # π/2

# Monte Carlo integration against the uniform (scaled Lebesgue) measure
def mc_integral(f, a, b, n=10000):
    x = np.random.uniform(a, b, n)
    return (b - a) * np.mean(f(x))

print("MC:", mc_integral(f, 0, np.pi))

# ML: expectation under a standard normal
samples = norm.rvs(loc=0, scale=1, size=10000)
exp = np.mean(np.sin(samples))
print("E[sin(X)] X~N(0,1):", exp)
```

```rust
use rand::Rng;
use rand_distr::{Distribution, Normal};

fn f(x: f64) -> f64 {
    x.sin().powi(2)
}

fn mc_integral(f: fn(f64) -> f64, a: f64, b: f64, n: usize) -> f64 {
    let mut rng = rand::thread_rng();
    let mut sum = 0.0;
    for _ in 0..n {
        let x = rng.gen_range(a..b);
        sum += f(x);
    }
    (b - a) * sum / n as f64
}

fn main() {
    println!("MC ∫ sin^2 [0,π]: {}", mc_integral(f, 0.0, std::f64::consts::PI, 10000));

    // Normal expectation by simulation
    let normal = Normal::new(0.0, 1.0).unwrap();
    let mut rng = rand::thread_rng();
    let mut sum = 0.0;
    for _ in 0..10000 {
        let x: f64 = normal.sample(&mut rng);
        sum += x.sin();
    }
    println!("E[sin(X)]: {}", sum / 10000.0);
}
```
:::
These snippets approximate a Lebesgue integral by quadrature and expectations by Monte Carlo.
15. Symbolic Measures and Integrals
SymPy for densities.
::: code-group
```python
from sympy import symbols, integrate, exp, oo, pi, sqrt

x = symbols('x')
pdf = 1/sqrt(2*pi) * exp(-x**2/2)
int_pdf = integrate(pdf, (x, -oo, oo))
print("∫ normal:", int_pdf)

exp_x = integrate(x * pdf, (x, -oo, oo))
print("E[X]:", exp_x)
```

```rust
// Hardcoded: the closed forms for the standard normal
fn main() {
    println!("∫ normal: 1");
    println!("E[X]: 0");
}
```
:::
16. Advanced Topics: Signed Measures, Hahn Decomposition
Signed measures allow negative values; the Hahn decomposition splits Ω into a positive set and a negative set for the measure.
In ML: rare, but signed densities arise in some kernel constructions.
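On a finite Ω defined by atoms, the Hahn decomposition is simply a sign split, and the Jordan decomposition ν = ν⁺ − ν⁻ follows. The atom names and weights below are illustrative.

```python
# Hahn decomposition for a signed measure on a finite set: split Ω into a
# positive part P and a negative part N so the measure is ≥ 0 on subsets of P
# and ≤ 0 on subsets of N. With atoms, this is a sign split per atom.
weights = {"a": 2.0, "b": -1.5, "c": 0.5, "d": -0.25}   # signed measure by atom

positive = {w for w, v in weights.items() if v >= 0}
negative = set(weights) - positive
print("P:", sorted(positive), " N:", sorted(negative))

# Jordan decomposition: ν = ν+ − ν−, both ordinary (nonnegative) measures
nu_plus = sum(v for v in weights.values() if v > 0)
nu_minus = -sum(v for v in weights.values() if v < 0)
print("ν+(Ω) =", nu_plus, " ν−(Ω) =", nu_minus, " ν(Ω) =", nu_plus - nu_minus)
```

The total variation |ν| = ν⁺ + ν⁻ is the measure-theoretic version of the total-variation distance used throughout ML.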
17. Key ML Takeaways
- Sigma-algebras define events: the sets to which probabilities can be assigned.
- Lebesgue integration generalizes Riemann: it handles irregular functions and domains.
- Measures abstract “size”: they extend to complex, even infinite-dimensional spaces.
- Measurable functions make random variables rigorous.
- Code approximates: quadrature and Monte Carlo turn the theory into practice.
Measure theory solidifies prob for AI.
18. Summary
This lecture presented a “lite” tour of measure theory, from sigma-algebras to Lebesgue integration and probability measures, with ML connections and worked examples in Python and Rust. It concludes the calculus series, grounding the probability that underlies AI.
Further Reading
- Billingsley, Probability and Measure.
- Murphy, Probabilistic ML (Ch. 21 advanced).
- Folland, Real Analysis.
- Rust: ‘rand’ for simulations, ‘statrs’ for stats.