Law of Large Numbers & Central Limit Theorem

The Law of Large Numbers (LLN) and Central Limit Theorem (CLT) are foundational results in probability that explain how sample averages behave as the sample size grows. The LLN guarantees that sample means converge to the expected value, while the CLT states that the suitably scaled fluctuations of those means around the expected value are approximately normal. In machine learning (ML), these theorems underpin empirical risk minimization, bootstrap methods, and confidence intervals, which in turn support reliable model training and evaluation from data.

This sixth lecture in the “Probability Foundations for AI/ML” series builds on independence and correlation, delving into LLN (weak and strong forms), CLT, their proofs, conditions, and ML applications. We’ll provide intuitive explanations, mathematical derivations, and implementations in Python and Rust, preparing you for advanced topics like maximum likelihood estimation.


LLN: As sample size grows, the average converges to the true mean—like flipping a coin many times, the proportion of heads approaches 0.5.

CLT: For large n, the distribution of the sample mean is approximately normal whatever the underlying distribution, provided its variance is finite. This is what licenses Gaussian approximations.

  • LLN: Justifies training on large data; loss averages converge.
  • CLT: Confidence intervals for model performance.

::: info LLN stabilizes averages; CLT makes their fluctuations approximately normal. Together, they make large datasets predictable. :::

  • LLN: The sample mean of fair-die rolls converges to 3.5.
  • CLT: The mean of 100 rolls is approximately N(3.5, σ^2/100), where σ^2 = 35/12 is the variance of a single roll.
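
The numbers in the dice example can be verified with exact arithmetic. The sketch below uses only the Python standard library; the variable names are illustrative and not part of the original lecture code.

```python
from fractions import Fraction

# Faces of a fair six-sided die, each with probability 1/6.
faces = range(1, 7)
mean = sum(Fraction(f) for f in faces) / 6                 # E[X]
var = sum((Fraction(f) - mean) ** 2 for f in faces) / 6    # Var(X)

n = 100                       # rolls per sample mean, as in the example
var_of_mean = var / n         # Var of the sample mean = sigma^2 / n
std_of_mean = float(var_of_mean) ** 0.5

print("E[X] =", mean, " Var(X) =", var)
print("Variance of the mean of", n, "rolls =", var_of_mean)
print("Standard deviation of that mean ~", round(std_of_mean, 4))
```

So the mean of 100 rolls typically lands within a few tenths of 3.5, which is the CLT approximation stated above with the variance filled in.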

2. Law of Large Numbers: Weak and Strong Forms

Let X_1, X_2, … be i.i.d. with finite mean E[X_i] = μ, and let \bar{X}_n = (1/n) Σ_{i=1}^n X_i denote the sample mean.

Weak LLN (WLLN): Sample mean \bar{X}_n →^P μ (in probability).

Strong LLN (SLLN): \bar{X}_n →^a.s. μ (almost surely).

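  • WLLN: Holds under finite variance (proved via Chebyshev's inequality) or, for i.i.d. variables, under a finite mean alone (Khintchine).
  • SLLN: Holds for i.i.d. variables with finite expectation (Kolmogorov).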

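Proof sketch (WLLN, assuming finite variance σ^2): Var(\bar{X}_n) = σ^2/n → 0, so by Chebyshev's inequality P(|\bar{X}_n - μ| > ε) ≤ Var(\bar{X}_n)/ε^2 = σ^2/(nε^2) → 0 for every ε > 0.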
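
The Chebyshev bound above can be checked numerically. The following is a minimal sketch for fair-die rolls using only the standard library; the tolerance `EPS`, the trial count, and the sample sizes are arbitrary choices made for illustration.

```python
import random
import statistics

random.seed(0)

MU, SIGMA2 = 3.5, 35 / 12   # mean and variance of one fair-die roll
EPS = 0.25                  # tolerance epsilon (illustrative choice)
TRIALS = 1000               # independent sample means per value of n


def tail_prob(n):
    """Monte Carlo estimate of P(|sample mean - mu| > eps) for n rolls."""
    hits = 0
    for _ in range(TRIALS):
        xbar = statistics.fmean(random.randint(1, 6) for _ in range(n))
        if abs(xbar - MU) > EPS:
            hits += 1
    return hits / TRIALS


results = []
for n in (50, 200, 800):
    bound = min(1.0, SIGMA2 / (n * EPS ** 2))   # Chebyshev: sigma^2 / (n eps^2)
    est = tail_prob(n)
    results.append((n, est, bound))
    print(f"n={n:4d}  estimated tail={est:.4f}  Chebyshev bound={bound:.4f}")
```

The estimated tail probability should sit below the bound for each n and shrink as n grows. The bound is loose, which is expected: Chebyshev uses only the variance.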

  • For a fixed model, the empirical loss converges to the true risk as n→∞.

3. Central Limit Theorem: Statement and Conditions

If the X_i are i.i.d. with E[X_i]=μ and 0 < Var(X_i)=σ^2 < ∞, then:

\sqrt{n} (\bar{X}_n - μ) →^d N(0,σ^2) as n→∞.

Standardized: Z_n = \sqrt{n} (\bar{X}_n - μ)/σ → N(0,1).
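
To see the standardized statement at work on a clearly non-normal distribution, the sketch below draws Exponential(1) samples (mean 1, standard deviation 1) and standardizes their means. The sample size `N` and repetition count `REPS` are assumptions chosen for illustration, not values from the lecture.

```python
import math
import random
import statistics

random.seed(1)

MU, SIGMA = 1.0, 1.0    # Exponential(rate=1): mean 1, standard deviation 1
N = 50                  # observations per sample mean
REPS = 5000             # number of standardized means to draw


def standardized_mean(n):
    """Z_n = sqrt(n) * (sample mean - mu) / sigma for one sample of size n."""
    xbar = statistics.fmean(random.expovariate(1.0) for _ in range(n))
    return math.sqrt(n) * (xbar - MU) / SIGMA


z = [standardized_mean(N) for _ in range(REPS)]

z_mean = statistics.fmean(z)
z_std = statistics.pstdev(z)
# A standard normal puts about 95% of its mass in [-1.96, 1.96].
within_196 = sum(abs(v) <= 1.96 for v in z) / REPS

print(f"mean of Z_n ~ {z_mean:.3f}, std of Z_n ~ {z_std:.3f}")
print(f"fraction with |Z_n| <= 1.96: {within_196:.3f}")
```

Even though each observation is strongly skewed, the standardized means should have mean near 0, standard deviation near 1, and roughly 95% of their mass within ±1.96.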

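  • Lyapunov: Covers independent, non-identically distributed terms, given a finite (2+δ)-th moment condition.
  • Lindeberg: A weaker condition than Lyapunov's for independent, non-identically distributed terms.
  • Proof via moment generating functions: the MGF of Z_n converges to the normal MGF (when the MGF exists).
  • Proof via characteristic functions: the same argument without assuming the MGF exists.
  • Bootstrap: Resample the data to estimate the sampling distribution, which is approximately normal for means.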

  • In probability: P(|X_n - X| >ε) →0.
  • Almost surely: P(X_n → X)=1.
  • In distribution: CDF F_n → F.
  • L^p: E[|X_n - X|^p] →0.
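  • Implications: a.s. ⇒ in probability ⇒ in distribution; L^p ⇒ in probability.
  • The CLT is a statement about convergence in distribution; the weak LLN holds in probability and the strong LLN almost surely.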
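
The implications in this list do not reverse in general. A standard counterexample, sketched below with the standard library, takes X ~ N(0, 1) and sets X_n = -X for every n: each X_n has the same distribution as X, so X_n converges to X in distribution, yet |X_n - X| = 2|X| never shrinks. The helper `ecdf`, the grid, and `EPS` are illustrative choices.

```python
import random

random.seed(2)

EPS = 0.5
x = [random.gauss(0.0, 1.0) for _ in range(20000)]   # draws of X ~ N(0, 1)
x_n = [-v for v in x]                                # X_n = -X for every n


def ecdf(sample, t):
    """Empirical CDF: fraction of the sample at or below t."""
    return sum(v <= t for v in sample) / len(sample)


# Same distribution: the empirical CDFs of X and X_n agree up to sampling noise.
grid = [-2.0, -1.0, 0.0, 1.0, 2.0]
cdf_gap = max(abs(ecdf(x, t) - ecdf(x_n, t)) for t in grid)

# No convergence in probability: |X_n - X| = 2|X| exceeds EPS most of the time.
far_fraction = sum(abs(a - b) > EPS for a, b in zip(x_n, x)) / len(x)

print(f"largest CDF gap on the grid: {cdf_gap:.4f}")
print(f"fraction with |X_n - X| > {EPS}: {far_fraction:.3f}")
```

The CDF gap should be close to zero while the fraction of large differences stays far from zero, so convergence in distribution alone says nothing about the random variables being close.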

In ML: Convergence guarantees for stochastic optimization.


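  1. Empirical Risk Minimization: For a fixed model, LLN ensures the empirical loss approaches the expected loss; extending this to the trained model requires uniform convergence over the hypothesis class.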
  2. Confidence Intervals: CLT for model accuracy bounds.
  3. Batch Normalization: Means/vars stabilize via LLN.
  4. Monte Carlo Methods: Averages converge by LLN.
  • Non-i.i.d. data: Violates the assumptions above; results under mixing conditions are needed instead.
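
As one concrete instance of the confidence-interval application, the sketch below computes a normal-approximation (Wald) interval for classifier accuracy. It assumes test examples are i.i.d., so accuracy is a sample mean of 0/1 outcomes. The function name `accuracy_ci` and the counts (870 correct out of 1000) are hypothetical.

```python
import math
from statistics import NormalDist


def accuracy_ci(n_correct, n_total, level=0.95):
    """Normal-approximation (Wald) interval for a classifier's true accuracy."""
    p_hat = n_correct / n_total
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for a 95% level
    # By the CLT, p_hat is roughly N(p, p(1-p)/n); plug in p_hat for p.
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n_total)
    return max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Hypothetical evaluation result: 870 of 1000 test examples classified correctly.
low, high = accuracy_ci(n_correct=870, n_total=1000)
print(f"accuracy = {870 / 1000:.3f}, 95% CI ~ [{low:.3f}, {high:.3f}]")
```

The Wald interval is the simplest CLT-based choice; it can be inaccurate for small test sets or accuracies near 0 or 1.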

The code below simulates the LLN and the CLT and computes a bootstrap confidence interval for a mean.

::: code-group

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm


# LLN: running means of dice rolls converge to 3.5
def lln_sim(n_samples=1000, n_rolls=100):
    # Each row is one independent sequence of fair-die rolls (values 1..6)
    rolls = np.random.randint(1, 7, size=(n_samples, n_rolls))
    # Running mean after 1, 2, ..., n_rolls rolls
    means = np.cumsum(rolls, axis=1) / np.arange(1, n_rolls + 1)
    return means


means = lln_sim()
plt.plot(np.arange(1, 101), means.T, color='blue', alpha=0.01)
plt.axhline(3.5, color='red', label='True Mean')
plt.title("LLN: Convergence of Sample Means")
plt.xlabel("Number of Rolls")
plt.ylabel("Sample Mean")
plt.legend()
plt.show()  # Means approach 3.5


# CLT: distribution of sample means
# Note: the draws here are already normal, so the means are exactly normal;
# swap in a skewed distribution to see the approximation emerge.
def clt_sim(n_samples=10000, sample_size=30):
    sample_means = [np.mean(np.random.normal(0, 1, sample_size)) for _ in range(n_samples)]
    return sample_means


sample_means = clt_sim()
plt.hist(sample_means, bins=50, density=True, alpha=0.6, color='blue')
x = np.linspace(-2, 2, 100)
plt.plot(x, norm.pdf(x, 0, np.sqrt(1 / 30)), color='red', label='N(0,1/30)')
plt.title("CLT: Distribution of Sample Means")
plt.xlabel("Sample Mean")
plt.ylabel("Density")
plt.legend()
plt.show()


# ML: percentile bootstrap CI for the mean
def bootstrap_ci(data, n_boots=1000, ci=95):
    # Resample with replacement and record the mean of each resample
    boot_means = [np.mean(np.random.choice(data, len(data))) for _ in range(n_boots)]
    low = np.percentile(boot_means, (100 - ci) / 2)
    high = np.percentile(boot_means, 100 - (100 - ci) / 2)
    return low, high


data = np.random.normal(5, 2, 100)
ci_low, ci_high = bootstrap_ci(data)
print("95% Bootstrap CI for mean:", ci_low, ci_high)
```
```rust
// Uses the rand 0.8 API (thread_rng, gen_range) with rand_distr and plotters.
use plotters::prelude::*;
use rand::Rng;
use rand_distr::{Distribution, Normal, Uniform};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let root = BitMapBackend::new("lln.png", (800, 600)).into_drawing_area();
    root.fill(&WHITE)?;
    let mut chart = ChartBuilder::on(&root)
        .caption("LLN: Convergence of Sample Means", ("sans-serif", 50))
        .margin(5)
        .x_label_area_size(30)
        .y_label_area_size(30)
        .build_cartesian_2d(0f64..100f64, 1f64..6f64)?;
    chart.configure_mesh().draw()?;

    let n_samples = 1000;
    let n_rolls = 100;
    let mut rng = rand::thread_rng();
    // Fair die: integers 1..=6 (the upper bound is exclusive)
    let uniform: Uniform<i32> = Uniform::new(1, 7);

    for _ in 0..n_samples {
        let mut sum = 0.0;
        let mut means = vec![];
        for _ in 0..n_rolls {
            sum += uniform.sample(&mut rng) as f64;
            means.push(sum / (means.len() as f64 + 1.0));
        }
        chart.draw_series(LineSeries::new(
            (0..n_rolls).map(|i| (i as f64 + 1.0, means[i])),
            // Set transparency with `mix`, then convert to a stroke style
            BLUE.mix(0.01).stroke_width(1),
        ))?;
    }
    // Horizontal reference line at the true mean
    chart.draw_series(LineSeries::new((0..=100).map(|x| (x as f64, 3.5)), &RED))?;
    root.present()?; // flush the drawing to lln.png

    // CLT histogram omitted for brevity

    // Bootstrap CI
    let normal_data = Normal::new(5.0, 2.0).unwrap();
    let data: Vec<f64> = (0..100).map(|_| normal_data.sample(&mut rng)).collect();
    let mut boot_means = vec![0.0; 1000];
    for boot_mean in boot_means.iter_mut() {
        let mut sum = 0.0;
        for _ in 0..100 {
            let idx = rng.gen_range(0..100);
            sum += data[idx];
        }
        *boot_mean = sum / 100.0;
    }
    boot_means.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // Approximate 2.5th and 97.5th percentiles of the 1000 sorted means
    let ci_low = boot_means[25];
    let ci_high = boot_means[975];
    println!("95% Bootstrap CI for mean: {} - {}", ci_low, ci_high);
    Ok(())
}
```

:::

The Python code simulates LLN convergence, a CLT histogram, and a bootstrap CI; the Rust version omits the CLT histogram. Note: the Rust plot requires the ‘plotters’ crate and is saved to lln.png.


SymPy can evaluate the corresponding sums and limits symbolically.

::: code-group

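```python
from sympy import symbols, limit, Sum, sqrt, oo

n = symbols('n', positive=True, integer=True)
k = symbols('k', positive=True, integer=True)

# Mean of the grid points k/n for k = 1..n: a Riemann sum for the integral of x over [0, 1]
p = Sum(k / n, (k, 1, n)).doit() / n
print("Sample mean:", p.simplify())
print("Limit n→∞:", limit(p, n, oo))  # 1/2

# CLT: the standard deviation of the sample mean shrinks like 1/sqrt(n)
var = symbols('var', positive=True)
std_n = sqrt(var / n)
print("Std of mean:", limit(std_n, n, oo))  # 0
```

```rust
// No symbolic algebra here: this only prints the limits derived in the SymPy version.
fn main() {
    println!("Sample mean limit: 0.5");
    println!("Std of mean limit: 0");
}
```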

:::


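  • Finite samples: LLN and CLT are asymptotic statements; with small samples the normal approximation can be poor.
  • Dependence: For non-i.i.d. data the i.i.d. versions do not apply directly; ergodic theorems cover some dependent cases.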
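
The finite-sample caveat can be made concrete by measuring how often a nominal 95% normal-approximation interval actually covers the true mean. The sketch below uses Exponential(1) data (true mean 1) with the standard library only; the sample sizes 5 and 200 and the repetition count are illustrative assumptions.

```python
import math
import random
import statistics

random.seed(3)

TRUE_MEAN = 1.0   # mean of Exponential(rate=1)


def coverage(n, reps=2000, z=1.96):
    """Fraction of nominal 95% z-intervals that contain the true mean."""
    hits = 0
    for _ in range(reps):
        sample = [random.expovariate(1.0) for _ in range(n)]
        xbar = statistics.fmean(sample)
        half = z * statistics.stdev(sample) / math.sqrt(n)
        if xbar - half <= TRUE_MEAN <= xbar + half:
            hits += 1
    return hits / reps


small = coverage(5)
large = coverage(200)
print(f"coverage with n=5:   {small:.3f}")
print(f"coverage with n=200: {large:.3f}")
```

With only 5 skewed observations the interval should cover the true mean noticeably less often than 95%, while at n=200 the coverage should be close to nominal.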

  • LLN justifies sampling: averages converge to expectations.
  • CLT enables normal approximations, such as confidence intervals.
  • Convergence types matter: each gives a different strength of guarantee.
  • ML relies on these theorems for training and evaluation.
  • Simulations make the theorems concrete.

LLN/CLT empower data-driven AI.


This lecture explored the LLN (weak and strong forms), the CLT, and modes of convergence, with ML applications. The examples and Python/Rust code illustrate the concepts and prepare for MLE and MAP.


  • Wasserman, All of Statistics (Ch. 5).
  • Murphy, Machine Learning (Ch. 5).
  • Khan Academy: LLN/CLT videos.
  • Rust: ‘plotters’ for viz, ‘rand_distr’ for sampling.