
Measure Theory Lite - Probability on Solid Ground

Measure theory provides the rigorous foundation for modern probability, extending integration to abstract spaces and enabling precise handling of continuous and discrete uncertainties. In artificial intelligence and machine learning, it underpins advanced concepts like stochastic processes, infinite-dimensional optimization, and probabilistic programming, ensuring consistency in models dealing with uncountable outcomes. By “lite,” we focus on key ideas without full abstraction, emphasizing intuition and applications.

This final lecture in the calculus series synthesizes prior topics, exploring measure spaces, sigma-algebras, Lebesgue measures and integrals, probability measures, and their implications for ML. We’ll blend accessible mathematics with practical insights, including examples and implementations in Python and Rust, to ground your understanding of probability’s solid mathematical base for AI innovations.


1. Why Measure Theory? Intuition and Motivation

Probability intuitively assigns “sizes” to events, but in continuous spaces naive length cannot be extended consistently to every subset: pathological (non-measurable) sets break additivity. Measure theory generalizes “size” to a measure defined on a sigma-algebra of subsets, which guarantees countable additivity and works on uncountable spaces.

Think of it as upgrading from Riemann integration (which partitions the domain into intervals) to Lebesgue integration (which partitions the range and measures the resulting sets), allowing integration over more complex functions and domains.

  • Handles infinite-dimensional models in Bayesian nonparametrics (e.g., Gaussian processes).
  • Makes expectations in reinforcement-learning value functions rigorous.

::: info
Measure theory makes probability “bulletproof,” like reinforcing a building to withstand any load, which is essential for the uncountable outcome spaces that arise in AI.
:::

  • Lebesgue (1902): Solved integration issues with measures.
  • Kolmogorov (1933): Axiomatized probability as measure.

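A sigma-algebra Σ on a set Ω is a collection of subsets of Ω that contains Ω and is closed under complements and countable unions. Closure under countable intersections and ∅ ∈ Σ then follow from De Morgan's laws.

Its members are the events we can assign probabilities to.

Borel sigma-algebra: the smallest sigma-algebra containing the open intervals of R; it is the standard choice for the reals.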

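  • The power set is too large: no countably additive, translation-invariant measure that gives intervals their length can be defined on every subset of R (Vitali sets are the standard counterexample). A sigma-algebra has exactly the closure needed for countable limits.
  • Feature spaces: features must be measurable functions so that expected losses are well-defined integrals.

Example: on [0,1], the Borel sigma-algebra contains every interval, and also the Cantor set, which is closed and therefore Borel; it causes no problems.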
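On a finite Ω the closure rules can be carried out by brute force, which makes the definition concrete. The sketch below closes a family of generating sets under complement and pairwise union; on a finite set, countable unions reduce to finite ones. The helper name `generate_sigma_algebra` is illustrative and not taken from any library.

```python
from itertools import combinations

def generate_sigma_algebra(omega, generators):
    """Smallest family containing `generators` that is closed under
    complement and union inside the finite set `omega`."""
    omega = frozenset(omega)
    sets = {frozenset(), omega} | {frozenset(g) for g in generators}
    changed = True
    while changed:
        changed = False
        current = list(sets)
        # Close under complement.
        for a in current:
            comp = omega - a
            if comp not in sets:
                sets.add(comp)
                changed = True
        # Close under pairwise union (enough on a finite set).
        for a, b in combinations(current, 2):
            union = a | b
            if union not in sets:
                sets.add(union)
                changed = True
    return sets

omega = {1, 2, 3, 4}
sigma = generate_sigma_algebra(omega, [{1}, {2}])
print(len(sigma))  # 8: all unions of the atoms {1}, {2}, {3, 4}
```

Note that {3} is not an event here: the generators never separate 3 from 4, so only {3, 4} is measurable.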


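Measure μ: Σ → [0,∞] with μ(∅) = 0 and countable additivity, μ(∪_i A_i) = ∑_i μ(A_i) for pairwise disjoint A_i ∈ Σ.

Probability measure: μ(Ω) = 1.

Lebesgue measure on R: μ([a,b]) = b − a.

Completion: adding every subset of a measure-zero (null) set to the Borel sigma-algebra yields the Lebesgue sigma-algebra.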

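  • Data manifolds: a lower-dimensional manifold has Lebesgue measure zero in a high-dimensional ambient space, so data concentrated on it has no density with respect to Lebesgue measure there, which has implications for the curse of dimensionality.

Example: Lebesgue measure on R^n generalizes length, area, and volume.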
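Countable additivity can be illustrated with disjoint dyadic intervals whose lengths sum to (almost) 1. This is a minimal sketch; `lebesgue_measure` is a hypothetical helper that only handles lists of pairwise disjoint intervals.

```python
def lebesgue_measure(intervals):
    """Total length of pairwise disjoint intervals given as (a, b) pairs."""
    return sum(b - a for a, b in intervals)

# The pieces [2^-(k+1), 2^-k) for k = 0..49 are disjoint and tile
# [2^-50, 1), so their lengths add up to 1 - 2^-50.
pieces = [(2.0 ** -(k + 1), 2.0 ** -k) for k in range(50)]
total = lebesgue_measure(pieces)
print(total)
```

Letting the number of pieces grow without bound recovers μ((0,1)) = 1 as the sum of a countable series.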


4. Measurable Functions and Random Variables

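f: Ω → R is measurable if the preimage f^{-1}(B) of every Borel set B lies in Σ.

Random variable: a measurable function on a probability space.

Pushforward measure: P_f(B) = P(f^{-1}(B)); for a random variable this is its distribution.

  • Neural nets: a network built from affine layers and continuous activations is continuous, hence Borel measurable, and compositions of measurable maps stay measurable.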
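For a finite probability space the pushforward P_f(B) = P(f^{-1}(B)) amounts to summing the mass of all outcomes that map to the same value. A small sketch, with `pushforward` as an illustrative helper name:

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(P, f):
    """Law of f under P, where P maps each outcome to its probability."""
    law = defaultdict(Fraction)
    for outcome, p in P.items():
        law[f(outcome)] += p  # mass of the preimage of the point f(outcome)
    return dict(law)

# Fair die, random variable X(w) = w mod 3.
P = {w: Fraction(1, 6) for w in range(1, 7)}
law = pushforward(P, lambda w: w % 3)
print(law)
```

Each residue 0, 1, 2 has a two-point preimage, so the distribution of X is uniform on three values.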

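For measurable f ≥ 0, ∫ f dμ is the supremum of ∫ s dμ over simple functions 0 ≤ s ≤ f, where a simple function s = ∑_k a_k 1_{A_k} has ∫ s dμ = ∑_k a_k μ(A_k). Equivalently, it is the limit of ∫ s_n dμ for any sequence of simple functions s_n increasing to f. A general f is handled by splitting it into positive and negative parts, f = f⁺ − f⁻.

If μ has density g (a PDF) with respect to Lebesgue measure, then ∫ f dμ = ∫ f(x) g(x) dx.

Properties: the monotone convergence and dominated convergence theorems justify exchanging limits and integrals.

Vs Riemann: handles discontinuities better. The indicator of the rationals on [0,1] is not Riemann integrable, but its Lebesgue integral is 0 because the rationals form a null set.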
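The simple-function construction can be run numerically for f(x) = x² on [0,1]. The sketch below uses the dyadic simple function s_n = floor(2^n f) / 2^n, whose integral is a sum of level heights times the measure of the sets {f ≥ k/2^n}. It assumes that measure is known in closed form (here 1 − √t), which holds for this particular f but not in general.

```python
import math

def lebesgue_lower_sum(n):
    """Integral of s_n = floor(2^n f) / 2^n for f(x) = x^2 on [0, 1]."""
    levels = 2 ** n
    total = 0.0
    for k in range(1, levels + 1):
        t = k / levels
        # mu({x in [0,1] : x^2 >= t}) = 1 - sqrt(t) for this specific f.
        total += (1.0 - math.sqrt(t)) / levels
    return total

approximations = [lebesgue_lower_sum(n) for n in (2, 6, 12)]
print(approximations)  # increases towards 1/3 from below
```

The partition is taken over the range of f rather than its domain, which is the key difference from a Riemann sum.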

E[X] = ∫ X dP.

Variance: Var(X) = E[(X − E[X])^2].

  • Stochastic gradients: expectations of gradients are estimated by averaging over samples (Monte Carlo).

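Radon-Nikodym: if ν ≪ μ (ν is absolutely continuous with respect to μ, i.e. μ(A) = 0 implies ν(A) = 0) and both measures are σ-finite, then a density dν/dμ exists with ν(A) = ∫_A (dν/dμ) dμ.

In probability: P has a PDF exactly when P ≪ Lebesgue measure.

In ML: likelihoods are densities of this kind, taken with respect to a reference measure (Lebesgue or counting).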
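On a finite space the Radon-Nikodym derivative is just a pointwise ratio of masses, which makes the identity ν(A) = ∫_A (dν/dμ) dμ easy to check. The measures `mu` and `nu` below are made-up illustrative values; the same reweighting idea underlies importance sampling.

```python
from fractions import Fraction

mu = {"a": Fraction(1, 2), "b": Fraction(1, 4), "c": Fraction(1, 4)}
nu = {"a": Fraction(1, 8), "b": Fraction(7, 8), "c": Fraction(0)}

# nu << mu holds because mu gives positive mass to every point,
# so the derivative is the pointwise ratio of masses.
density = {w: nu[w] / mu[w] for w in mu}

def nu_via_density(event):
    """nu(A) recovered as the integral of d(nu)/d(mu) over A w.r.t. mu."""
    return sum(density[w] * mu[w] for w in event)

# Change of measure for expectations: E_nu[f] = E_mu[f * d(nu)/d(mu)].
f = {"a": 1, "b": 2, "c": 3}
expect_nu = sum(f[w] * nu[w] for w in nu)
expect_reweighted = sum(f[w] * density[w] * mu[w] for w in mu)
print(expect_nu, expect_reweighted)  # 15/8 15/8
```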


7. Product Measures and Fubini’s Theorem

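For joint distributions: the product measure μ×ν lives on the product sigma-algebra Σ⊗Τ and satisfies (μ×ν)(A×B) = μ(A) ν(B).

Fubini: ∫ f d(μ×ν) = ∫ [∫ f(x,y) dν(y)] dμ(x) = ∫ [∫ f(x,y) dμ(x)] dν(y), provided μ and ν are σ-finite and f is integrable with respect to μ×ν.

In ML: independent variables have a joint law equal to the product of their marginals.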
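For discrete measures Fubini's theorem reduces to swapping the order of a double sum. The following sketch checks this with exact arithmetic; the measures and the integrand `f` are arbitrary illustrative choices.

```python
from fractions import Fraction

mu = {0: Fraction(1, 3), 1: Fraction(2, 3)}
nu = {0: Fraction(1, 4), 1: Fraction(1, 4), 2: Fraction(1, 2)}

def f(x, y):
    return x + y * y

# Integral against the product measure, then the two iterated integrals.
product_integral = sum(f(x, y) * mu[x] * nu[y] for x in mu for y in nu)
x_outer = sum(mu[x] * sum(f(x, y) * nu[y] for y in nu) for x in mu)
y_outer = sum(nu[y] * sum(f(x, y) * mu[x] for x in mu) for y in nu)
print(product_integral, x_outer, y_outer)  # 35/12 35/12 35/12
```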


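For a continuously differentiable, injective map g on an open set U: ∫_{g(U)} f(x) dx = ∫_U f(g(u)) |det Dg(u)| du.

In ML: normalizing flows use this formula to compute densities for density estimation.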
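A one-dimensional "flow" shows the formula at work. Take U ~ N(0,1) and X = g(U) = exp(U); the density of X is p_U(g^{-1}(x)) · |d g^{-1}/dx| = p_U(log x) / x. The sketch below checks, with a simple midpoint rule, that P(1 ≤ X ≤ e) equals P(0 ≤ U ≤ 1). The function names are illustrative only.

```python
import math

def normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def flow_density(x):
    """Density of X = exp(U), U ~ N(0,1), via the change-of-variables formula."""
    return normal_pdf(math.log(x)) / x  # inverse map log(x), Jacobian 1/x

def midpoint_integral(func, a, b, steps=10_000):
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

prob_x = midpoint_integral(flow_density, 1.0, math.e)  # P(1 <= X <= e)
prob_u = midpoint_integral(normal_pdf, 0.0, 1.0)       # P(0 <= U <= 1)
print(prob_x, prob_u)
```

Both integrals approximate the same probability, about 0.3413, because the Jacobian factor exactly compensates for the stretching of the interval.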


9. Convergence Concepts: In Measure, Almost Everywhere

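a.e. (almost everywhere): a property holds except on a set of measure zero. f_n → f a.e. means the set of points where f_n(x) fails to converge to f(x) is null.

In measure: f_n → f in measure if μ({|f_n − f| > ε}) → 0 for every ε > 0.

In ML: sets of data points with probability zero do not affect expectations, so they can be ignored.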
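The two notions of convergence genuinely differ. The classic "typewriter" sequence of indicator functions on [0,1) converges to 0 in measure, because the supports shrink, yet converges at no point, because every x is hit once in each dyadic sweep. A minimal sketch with illustrative helper names:

```python
def typewriter_interval(n):
    """Support [j/2^k, (j+1)/2^k) of f_n, where n = 2^k + j and 0 <= j < 2^k."""
    k = n.bit_length() - 1
    j = n - (1 << k)
    return j / 2 ** k, (j + 1) / 2 ** k

def f(n, x):
    a, b = typewriter_interval(n)
    return 1.0 if a <= x < b else 0.0

x = 0.3
# Indices n < 1024 (sweeps k = 0..9) at which f_n(x) = 1: one per sweep.
hits = [n for n in range(1, 1024) if f(n, x) == 1.0]
# Measure of the support {f_n != 0} shrinks like 2^-k.
support_lengths = [b - a for a, b in map(typewriter_interval, (1, 8, 512))]
print(len(hits), support_lengths)
```

The sequence f_n(0.3) keeps returning to 1 however far out one looks, while the measure of the set where f_n is non-zero tends to 0.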


10. Probability Spaces and Kolmogorov Axioms

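A probability space is a triple (Ω, Σ, P), where P is a measure on (Ω, Σ) with P(Ω) = 1. Kolmogorov's axioms are non-negativity, normalization, and countable additivity.

Independence and conditioning are defined through the measure: A and B are independent if P(A ∩ B) = P(A) P(B), and P(A | B) = P(A ∩ B) / P(B) when P(B) > 0.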
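A finite probability space can be written out in full. The sketch below models two fair dice with the uniform measure and checks independence and a conditional probability directly from the measure; the event names are illustrative.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # outcomes of two fair dice

def P(event):
    """Uniform probability measure; `event` is a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def first_even(w):
    return w[0] % 2 == 0

def sum_is_seven(w):
    return w[0] + w[1] == 7

def both(w):
    return first_even(w) and sum_is_seven(w)

independent = P(both) == P(first_even) * P(sum_is_seven)
conditional = P(both) / P(first_even)  # P(sum is 7 | first die is even)
print(independent, conditional)  # True 1/6
```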


11. Borel-Cantelli and Laws of Large Numbers

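Borel-Cantelli: if ∑_n P(A_n) < ∞, then with probability 1 only finitely many of the events A_n occur.

Strong law of large numbers: for i.i.d. X_i with E|X_1| < ∞, the sample mean converges to E[X_1] almost surely.

In ML: consistency of estimators.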
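Almost sure convergence of the sample mean can be observed along a single simulated path. This sketch uses Uniform(0,1) draws with a fixed seed for reproducibility; the seed and sample size are arbitrary choices.

```python
import random

def running_means(n, seed=0):
    """Running sample means of n Uniform(0,1) draws (true mean 0.5)."""
    rng = random.Random(seed)
    total, means = 0.0, []
    for i in range(1, n + 1):
        total += rng.random()
        means.append(total / i)
    return means

means = running_means(100_000)
print(means[9], means[999], means[-1])  # drifts towards 0.5
```

One simulated path cannot prove an almost-sure statement, but it shows the behaviour the strong law guarantees for almost every path.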


12. Central Limit Theorem Measure-Theoretically

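Convergence in distribution: for i.i.d. X_i with mean m and finite variance s^2 > 0, √n (X̄_n − m) / s converges in distribution to N(0,1), meaning its CDF converges to the standard normal CDF at every point.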
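Convergence in distribution can be probed empirically by comparing a probability under the standardized sample mean with its standard normal value. The sketch below uses Uniform(0,1) draws (mean 1/2, variance 1/12); the sample sizes and seed are arbitrary.

```python
import math
import random

def standardized_mean(n, rng):
    """sqrt(n) * (sample mean - m) / s for n Uniform(0,1) draws."""
    xbar = sum(rng.random() for _ in range(n)) / n
    return math.sqrt(n) * (xbar - 0.5) / math.sqrt(1.0 / 12.0)

rng = random.Random(0)
z = [standardized_mean(50, rng) for _ in range(5_000)]

# Under N(0,1), P(|Z| <= 1.96) is about 0.95.
coverage = sum(abs(v) <= 1.96 for v in z) / len(z)
print(coverage)
```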


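  • Gaussian processes (GPs): probability measures on function spaces.
  • Probabilistic programming languages (PPLs): inference over measures.
  • Differential privacy: sensitivity measures.

The examples below approximate integrals and simulate measures numerically.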

::: code-group

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Numerical quadrature of ∫ sin^2(x) dx on [0, π]
# (for a continuous integrand the Riemann and Lebesgue integrals agree)
def f(x):
    return np.sin(x) ** 2

integral, _ = quad(f, 0, np.pi)
print("∫ sin^2 [0,π]:", integral)  # exact value is π/2

# Monte Carlo estimate: (b - a) times the average of f over uniform samples
def mc_integral(f, a, b, n=10000):
    x = np.random.uniform(a, b, n)
    return (b - a) * np.mean(f(x))

print("MC:", mc_integral(f, 0, np.pi))

# ML: expectation under a standard normal, estimated from samples
samples = norm.rvs(loc=0, scale=1, size=10000)
exp = np.mean(np.sin(samples))
print("E[sin(X)] X~N(0,1):", exp)  # exact value is 0 by symmetry
```

```rust
// Written against the rand 0.8 / rand_distr 0.4 APIs.
use rand::Rng;
use rand_distr::{Distribution, Normal};

fn f(x: f64) -> f64 {
    x.sin().powi(2)
}

// Monte Carlo estimate of the integral of f over [a, b)
fn mc_integral(f: fn(f64) -> f64, a: f64, b: f64, n: usize) -> f64 {
    let mut rng = rand::thread_rng();
    let mut sum = 0.0;
    for _ in 0..n {
        let x = rng.gen_range(a..b);
        sum += f(x);
    }
    (b - a) * sum / n as f64
}

fn main() {
    println!("MC ∫ sin^2 [0,π]: {}", mc_integral(f, 0.0, std::f64::consts::PI, 10000));

    // Normal expectation sim
    let normal = Normal::new(0.0, 1.0).unwrap();
    let mut rng = rand::thread_rng();
    let mut sum = 0.0;
    for _ in 0..10000 {
        let x: f64 = normal.sample(&mut rng);
        sum += x.sin();
    }
    println!("E[sin(X)]: {}", sum / 10000.0);
}
```

:::

These snippets approximate an integral by quadrature and by Monte Carlo, and estimate an expectation from samples.


SymPy can integrate densities symbolically.

::: code-group

```python
from sympy import symbols, integrate, exp, oo, pi, sqrt

x = symbols('x')
pdf = 1 / sqrt(2 * pi) * exp(-x**2 / 2)  # standard normal density

int_pdf = integrate(pdf, (x, -oo, oo))
print("∫ normal:", int_pdf)

exp_x = integrate(x * pdf, (x, -oo, oo))
print("E[X]:", exp_x)
```

```rust
// Hardcoded: no symbolic engine is used here, the exact values are printed directly.
fn main() {
    println!("∫ normal: 1");
    println!("E[X]: 0");
}
```

:::


16. Advanced Topics: Signed Measures, Hahn Decomposition

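Signed measures: allow negative values while keeping countable additivity. The Hahn decomposition splits Ω into a positive set Ω⁺ and a negative set Ω⁻, so that ν = ν⁺ − ν⁻ with ν⁺ concentrated on Ω⁺ and ν⁻ on Ω⁻ (the Jordan decomposition).

In ML: rare, but signed densities appear in some kernel methods.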
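On a finite set a Hahn decomposition can be read off from the sign of each point mass. The sketch below builds the positive and negative sets for a made-up signed measure `nu` and checks that ν = ν⁺ − ν⁻ on an event; all names and values are illustrative.

```python
from fractions import Fraction

nu = {"a": Fraction(1, 2), "b": Fraction(-1, 5), "c": Fraction(0),
      "d": Fraction(-7, 10), "e": Fraction(2, 5)}

# Hahn decomposition: points with non-negative mass vs. negative mass.
positive_set = {w for w, mass in nu.items() if mass >= 0}
negative_set = set(nu) - positive_set

def nu_plus(event):
    """Positive part: mass of the event inside the positive set."""
    return sum(nu[w] for w in event if w in positive_set)

def nu_minus(event):
    """Negative part: minus the mass of the event inside the negative set."""
    return -sum(nu[w] for w in event if w in negative_set)

event = {"a", "b", "d"}
value = sum(nu[w] for w in event)
total_variation = nu_plus(set(nu)) + nu_minus(set(nu))
print(value, nu_plus(event) - nu_minus(event), total_variation)  # -2/5 -2/5 9/5
```

The decomposition is not unique: the null point "c" could be placed in either set without changing ν⁺ or ν⁻.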


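  • Sigma-algebras define events: the measurable sets are the ones that receive probabilities.
  • The Lebesgue integral generalizes integration: it handles irregular functions and sets.
  • Measures abstract the notion of size: they extend it to complex spaces.
  • Measurable functions make random variables rigorous.
  • Code approximates the theory: quadrature and Monte Carlo give practical probabilities.

Measure theory puts probability for AI on solid ground.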


This lecture covered measure theory lite, from sigma-algebras to the Lebesgue integral and probability measures, with ties to ML and code examples in Python and Rust. It concludes the series by grounding calculus in probability for AI.


  • Billingsley, Probability and Measure.
  • Murphy, Probabilistic ML (Ch. 21 advanced).
  • Folland, Real Analysis.
  • Rust: ‘rand’ for simulations, ‘statrs’ for stats.