
Martingales & Random Walks

Martingales and random walks are fundamental stochastic processes that model sequential random phenomena, widely used in probability theory and machine learning (ML). A martingale represents a sequence where the expected future value, given the past, equals the current value, modeling "fair" processes. Random walks describe sequences of steps with random increments, often used in algorithms and simulations. In ML, these concepts underpin reinforcement learning, stochastic optimization, and convergence analysis of algorithms like stochastic gradient descent (SGD).

This lecture in the "Foundations for AI/ML" series (misc-math cluster) builds on concentration inequalities and information geometry, exploring martingales, random walks, their properties, convergence theorems, and ML applications. We'll provide intuitive explanations, mathematical derivations, and practical implementations in Python and Rust, offering tools to analyze stochastic processes in AI.


1. Intuition Behind Martingales and Random Walks

Martingales: Model processes where the expected next step, given all prior information, is the current state—like a fair game where you neither gain nor lose on average.

Random Walks: Model a path of random steps, like a drunkard wandering left or right. They generalize to many ML scenarios, such as exploration in reinforcement learning.

ML Connection

  • Reinforcement Learning: Martingales model value functions in MDPs.
  • Stochastic Optimization: Random walks describe SGD trajectories.
  • Algorithm Analysis: Martingale bounds for convergence.

INFO

Martingales are like a gambler's fair bet, keeping expectations steady; random walks are like a wanderer's unpredictable path, guiding ML algorithms through randomness.

Example

  • Martingale: Cumulative rewards in a fair game, E[X_{t+1}|X_1,...,X_t] = X_t.
  • Random Walk: SGD parameter updates with random gradients.

2. Definitions and Properties

Martingales

A sequence {X_t} is a martingale with respect to a filtration {F_t} if:

  • E[|X_t|] < ∞ (finite expectation).
  • X_t is F_t-measurable (adapted).
  • E[X_{t+1}|F_t] = X_t.

Submartingale: E[X_{t+1}|F_t] ≥ X_t.

Supermartingale: E[X_{t+1}|F_t] ≤ X_t.
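
A quick Monte Carlo check of the defining property, as a minimal sketch: take a multiplicative "fair game" X_t = U_1 ⋯ U_t with i.i.d. mean-one factors (uniform on [0, 2] here, an illustrative choice), fix one realized history, and average X_{t+1} over fresh draws of the next factor; the average should match X_t.

python
import numpy as np

rng = np.random.default_rng(0)

# Multiplicative "fair game": X_t = U_1 * ... * U_t with i.i.d. U_i ~ Uniform[0, 2],
# so E[U_i] = 1 and E[X_{t+1} | F_t] = X_t * E[U_{t+1}] = X_t.
t = 5
history = rng.uniform(0, 2, size=t)        # one realized path up to time t
X_t = history.prod()

# Hold the history fixed and average X_{t+1} over many draws of the next factor
next_factors = rng.uniform(0, 2, size=200_000)
cond_mean = (X_t * next_factors).mean()

print(f"X_t = {X_t:.4f}   Monte Carlo E[X_(t+1) | F_t] = {cond_mean:.4f}")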

Random Walks

A random walk is S_t = sum_{i=1}^t Z_i, where the Z_i are i.i.d. increments (e.g., Z_i = ±1 each with probability 0.5).

Properties:

  • Martingale if E[Z_i] = 0.
  • Variance: Var(S_t) = t Var(Z_i) for i.i.d.
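
Both properties are easy to verify by simulation; the sketch below (horizon and path count chosen only for illustration) estimates E[S_t] and Var(S_t) for the ±1 walk.

python
import numpy as np

rng = np.random.default_rng(1)
t, n_paths = 500, 20_000

# Simple symmetric walk: Z_i = ±1 with probability 1/2, S_t = Z_1 + ... + Z_t
S_t = rng.choice([-1, 1], size=(n_paths, t)).sum(axis=1)

print("E[S_t]   ≈", S_t.mean(), "  (theory: 0)")
print("Var(S_t) ≈", S_t.var(), f"  (theory: t·Var(Z) = {t})")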

ML Insight

  • Martingales model fair processes in RL; random walks model SGD noise.

3. Key Theorems and Bounds

Martingale Convergence Theorem

If {X_t} is a martingale with sup_t E[|X_t|] < ∞, then X_t converges almost surely to a limit X with E[|X|] < ∞.

Optional Stopping Theorem

For a martingale {X_t} and a bounded stopping time τ, E[X_τ] = E[X_0].
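
As a sanity check, the sketch below stops a ±1 walk at the first time it hits ±10 (capped at a maximum number of steps so τ is bounded) and averages the stopped value over many runs; it should be close to E[X_0] = 0. The barrier and cap are illustrative choices.

python
import numpy as np

rng = np.random.default_rng(2)

def stopped_value(barrier=10, max_steps=5_000):
    """±1 walk stopped when |S| hits the barrier or max_steps elapse (so τ is bounded)."""
    s = 0
    for _ in range(max_steps):
        s += 1 if rng.random() < 0.5 else -1
        if abs(s) >= barrier:
            break
    return s

stopped = np.array([stopped_value() for _ in range(5_000)])
print("E[X_τ] ≈", stopped.mean(), "  (theory: E[X_0] = 0)")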

Azuma-Hoeffding Inequality

For a martingale {X_t} with bounded differences |X_t - X_{t-1}| ≤ c_t:

[ P(|X_n - X_0| \geq a) \leq 2 \exp\left(-\frac{a^2}{2\sum_{t=1}^{n} c_t^2}\right) ]

Random Walk Behavior

  • Simple symmetric walk (Z_i = ±1): E[S_t] = 0, Var(S_t) = t.
  • Recurrence: the simple symmetric walk is recurrent in 1D and 2D (it returns to the origin with probability 1) but transient in 3D and higher.
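
Recurrence is an asymptotic statement, but a finite-horizon simulation hints at it: the sketch below estimates the fraction of walks that revisit the origin within T steps in 1, 2, and 3 dimensions (each step moves ±1 along one uniformly chosen axis). The horizon and path counts are arbitrary, and 2D recurrence shows up only very slowly, so treat the numbers as suggestive rather than definitive.

python
import numpy as np

rng = np.random.default_rng(3)

def return_frequency(dim, T=1_000, n_paths=2_000):
    """Fraction of d-dimensional walks that revisit the origin within T steps."""
    axes = rng.integers(0, dim, size=(n_paths, T))            # which coordinate moves
    signs = rng.choice([-1, 1], size=(n_paths, T))            # direction of the move
    steps = np.zeros((n_paths, T, dim), dtype=np.int32)
    np.put_along_axis(steps, axes[..., None], signs[..., None], axis=2)
    positions = steps.cumsum(axis=1)
    returned = (np.abs(positions).sum(axis=2) == 0).any(axis=1)
    return returned.mean()

for d in (1, 2, 3):
    print(f"dim {d}: revisited the origin within T steps in {return_frequency(d):.1%} of paths")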

ML Application

  • Azuma for bounding SGD convergence.

4. Martingales in Algorithm Analysis

Martingales bound deviations in stochastic algorithms.

Example: SGD updates θ_{t+1} = θ_t - η ∇L(θ_t, ξ_t), where ξ_t is a random data sample. The cumulative gradient noise (the noisy gradient minus its conditional mean, summed over steps) forms a martingale.
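
A minimal sketch of that claim, under illustrative assumptions (quadratic loss L(θ) = ½θ² and Gaussian gradient noise, neither fixed by the text): run one SGD history, form the cumulative noise M_t, and check that averaging M_{t+1} over fresh noise, with the history held fixed, returns M_t.

python
import numpy as np

rng = np.random.default_rng(4)

# SGD on L(θ) = 0.5 θ² with noisy gradient g_t = θ_t + ξ_t, ξ_t ~ N(0, 1).
# The cumulative noise M_t = ξ_1 + ... + ξ_t is a martingale: its increments
# have zero mean given the past.
eta, T = 0.1, 50
theta, M = 2.0, 0.0
for _ in range(T):                      # one realized SGD history
    xi = rng.normal()
    theta -= eta * (theta + xi)
    M += xi

# Hold the history fixed and average M_{T+1} over fresh draws of the next noise
next_M = M + rng.normal(size=200_000)
print(f"M_T = {M:.4f}   Monte Carlo E[M_(T+1) | F_T] = {next_M.mean():.4f}")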

Doob's Decomposition: any adapted, integrable process splits uniquely into a martingale plus a predictable process.
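
For the ±1 walk this decomposition is explicit: S_t² is a submartingale with E[S_{t+1}² | F_t] = S_t² + 1, so the predictable part is A_t = t and S_t² - t is the martingale part. The sketch below checks this numerically (path counts are arbitrary).

python
import numpy as np

rng = np.random.default_rng(5)
t, n_paths = 200, 20_000

# For the ±1 walk, S_t² is a submartingale with E[S_{t+1}² | F_t] = S_t² + 1,
# so Doob's decomposition is S_t² = M_t + A_t with predictable part A_t = t.
S = rng.choice([-1, 1], size=(n_paths, t)).cumsum(axis=1)
M = S[:, -1] ** 2 - t                   # candidate martingale part at time t

print("E[S_t² - t] ≈", M.mean(), "  (theory: 0)")
print("E[S_t²]     ≈", (S[:, -1] ** 2).mean(), f"  (theory: t = {t})")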

In ML: Analyze convergence of online learning.


5. Random Walks in ML Algorithms

Random walks model:

  • SGD: Parameter updates as noisy steps.
  • Exploration: RL agents navigating state spaces.
  • MCMC: Sampling via random walks (e.g., Metropolis).
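
A minimal random-walk Metropolis sketch, targeting a standard normal for concreteness (the target, step size, and chain length are illustrative choices, not taken from any particular library):

python
import numpy as np

rng = np.random.default_rng(6)

def rw_metropolis(log_target, x0=0.0, step=1.0, n=50_000):
    """Random-walk Metropolis: propose x + step·N(0,1), accept with prob min(1, ratio)."""
    x, samples = x0, []
    for _ in range(n):
        prop = x + step * rng.normal()
        if np.log(rng.random()) < log_target(prop) - log_target(x):
            x = prop
        samples.append(x)
    return np.array(samples)

# Target: standard normal, up to an additive constant in the log density
samples = rw_metropolis(lambda x: -0.5 * x**2)
print("mean ≈", samples.mean(), "  var ≈", samples.var(), "  (theory: 0, 1)")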

ML Connection

  • Random walk convergence informs MCMC efficiency.

6. Applications in Machine Learning

  1. Reinforcement Learning: Martingales for value function convergence in MDPs.
  2. Stochastic Optimization: Random walks model SGD trajectories.
  3. MCMC Sampling: Random walks for posterior sampling in Bayesian ML.
  4. Bandits: Martingale bounds for regret analysis.
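
For the bandit item above, the relevant tool is an Azuma/Hoeffding-style confidence radius around an arm's empirical mean; the sketch below (Bernoulli rewards, all parameters chosen for illustration) checks that the radius ε = sqrt(ln(2/δ) / (2t)) covers the true mean at least as often as advertised.

python
import numpy as np

rng = np.random.default_rng(7)

# Hoeffding/Azuma confidence radius for an arm with rewards in [0, 1]:
# P(|mean_t - μ| ≥ ε) ≤ 2 exp(-2 t ε²) = δ  ⇒  ε = sqrt(ln(2/δ) / (2 t)).
mu, t, delta, n_runs = 0.3, 200, 0.05, 20_000
eps = np.sqrt(np.log(2 / delta) / (2 * t))

rewards = (rng.random((n_runs, t)) < mu).astype(float)   # Bernoulli(μ) rewards
violations = np.abs(rewards.mean(axis=1) - mu) >= eps
print(f"ε = {eps:.3f}   empirical violation rate = {violations.mean():.4f}  (≤ δ = {delta})")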

Challenges

  • Non-i.i.d. data breaks martingale assumptions.
  • High-dim walks complex to analyze.

7. Numerical Simulations of Martingales and Random Walks

Simulate a random walk and a martingale, and check the Azuma-Hoeffding bound empirically.

python
import numpy as np
import matplotlib.pyplot as plt

# Random walk simulation
def random_walk(n_steps):
    steps = np.random.choice([-1, 1], size=n_steps)
    return np.cumsum(steps)

n_steps = 1000
walk = random_walk(n_steps)
plt.plot(walk)
plt.title("Random Walk")
plt.show()

# Martingale (cumulative sum of zero-mean increments)
def martingale(n_steps):
    increments = np.random.normal(0, 1, n_steps)
    return np.cumsum(increments)

mart = martingale(n_steps)
plt.plot(mart)
plt.title("Martingale")
plt.show()

# Azuma-Hoeffding bound: P(|X_n - X_0| >= a) <= 2 exp(-a^2 / (2 n c^2)) for |increments| <= c
def azuma_bound(a, n, c=1):
    return 2 * np.exp(-a**2 / (2 * n * c**2))

# Empirical check: the ±1 walk has bounded increments (c = 1), so Azuma applies.
# Estimate P(|S_n| >= a) over many independent walks.
a = 100
n = 1000
bound = azuma_bound(a, n)
finals = np.array([random_walk(n)[-1] for _ in range(5_000)])
emp_prob = np.mean(np.abs(finals) >= a)
print("Azuma bound P(|X_n|≥100):", bound, "Empirical:", emp_prob)

# ML: SGD as random walk
def sgd_simulation(theta_init, n_steps, eta=0.01):
    theta = theta_init
    trajectory = [theta]
    for _ in range(n_steps):
        grad = np.random.normal(0, 1)  # Noisy gradient
        theta -= eta * grad
        trajectory.append(theta)
    return np.array(trajectory)

theta_init = 0
sgd_traj = sgd_simulation(theta_init, n_steps)
plt.plot(sgd_traj)
plt.title("SGD as Random Walk")
plt.show()
rust
use rand::Rng;
use rand_distr::Distribution; // brings `sample` into scope for rand_distr::Normal
use plotters::prelude::*;

fn random_walk(n_steps: usize) -> Vec<f64> {
    let mut rng = rand::thread_rng();
    let mut walk = vec![0.0];
    for _ in 0..n_steps {
        let step = if rng.gen_bool(0.5) { 1.0 } else { -1.0 };
        walk.push(walk.last().unwrap() + step);
    }
    walk
}

fn martingale(n_steps: usize) -> Vec<f64> {
    let mut rng = rand::thread_rng();
    let normal = rand_distr::Normal::new(0.0, 1.0).unwrap();
    let mut mart = vec![0.0];
    for _ in 0..n_steps {
        let increment = normal.sample(&mut rng);
        mart.push(mart.last().unwrap() + increment);
    }
    mart
}

fn azuma_bound(a: f64, n: f64, c: f64) -> f64 {
    // P(|X_n - X_0| >= a) <= 2 exp(-a^2 / (2 n c^2)) for increments bounded by c
    2.0 * (-a.powi(2) / (2.0 * n * c.powi(2))).exp()
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let n_steps = 1000;
    let walk = random_walk(n_steps);
    let root = BitMapBackend::new("random_walk.png", (800, 600)).into_drawing_area();
    root.fill(&WHITE)?;
    let mut chart = ChartBuilder::on(&root)
        .caption("Random Walk", ("sans-serif", 50))
        .build_cartesian_2d(0f64..n_steps as f64, -50f64..50f64)?;
    chart.draw_series(LineSeries::new(
        (0..=n_steps).map(|i| (i as f64, walk[i])),
        &BLUE,
    ))?;

    let mart = martingale(n_steps);
    let root_m = BitMapBackend::new("martingale.png", (800, 600)).into_drawing_area();
    root_m.fill(&WHITE)?;
    let mut chart_m = ChartBuilder::on(&root_m)
        .caption("Martingale", ("sans-serif", 50))
        .build_cartesian_2d(0f64..n_steps as f64, -50f64..50f64)?;
    chart_m.draw_series(LineSeries::new(
        (0..=n_steps).map(|i| (i as f64, mart[i])),
        &BLUE,
    ))?;

    let a = 100.0;
    let n = n_steps as f64;
    let bound = azuma_bound(a, n, 1.0);
    // Empirical estimate of P(|S_n| >= a) over many ±1 walks (bounded increments, c = 1)
    let n_trials = 5000;
    let emp_prob = (0..n_trials)
        .filter(|_| random_walk(n_steps).last().unwrap().abs() >= a)
        .count() as f64 / n_trials as f64;
    println!("Azuma bound P(|X_n|≥100): {} Empirical: {}", bound, emp_prob);

    // SGD as random walk
    let mut rng = rand::thread_rng();
    let normal = rand_distr::Normal::new(0.0, 1.0).unwrap();
    let mut theta = 0.0;
    let eta = 0.01;
    let mut traj = vec![theta];
    for _ in 0..n_steps {
        let grad = normal.sample(&mut rng);
        theta -= eta * grad;
        traj.push(theta);
    }
    let root_sgd = BitMapBackend::new("sgd_walk.png", (800, 600)).into_drawing_area();
    root_sgd.fill(&WHITE)?;
    let mut chart_sgd = ChartBuilder::on(&root_sgd)
        .caption("SGD as Random Walk", ("sans-serif", 50))
        .build_cartesian_2d(0f64..n_steps as f64, -5f64..5f64)?;
    chart_sgd.draw_series(LineSeries::new(
        (0..=n_steps).map(|i| (i as f64, traj[i])),
        &BLUE,
    ))?;

    Ok(())
}

The code simulates random walks and martingales, checks the Azuma-Hoeffding bound against empirical deviation frequencies, and models SGD updates as a noisy walk.


8. Symbolic Derivations with SymPy

State the martingale property and the Azuma-Hoeffding bound symbolically.

python
from sympy import Function, exp, limit, oo, symbols

# Unevaluated symbolic operator standing in for conditional expectation
E = Function('E')
X_t, X_t1, F_t = symbols('X_t X_t1 F_t')
mart_prop = E(X_t1, F_t) - X_t  # equals 0 for a martingale
print("Martingale property: E[X_{t+1}|F_t] - X_t =", mart_prop, "(= 0)")

# Azuma-Hoeffding bound: 2 exp(-a^2 / (2 n c^2)) for increments bounded by c
a, n, c = symbols('a n c', positive=True)
azuma = 2 * exp(-a**2 / (2 * n * c**2))
print("Azuma bound:", azuma)
print("Limit as a -> oo:", limit(azuma, a, oo))  # large deviations are exponentially unlikely
rust
fn main() {
    println!("Martingale: E[X_{t+1}|F_t] = X_t");
    println!("Azuma bound: 2 exp(-2 a² / (n c²))");
}

9. Challenges in ML Applications

  • Non-Martingale Processes: Non-i.i.d. data in ML.
  • High-Dim Walks: Complex convergence analysis.
  • Stopping Times: Hard to define in RL.

10. Key ML Takeaways

  • Martingales model fair processes: the expected future value equals the current one.
  • Random walks model noise and exploration: SGD trajectories, RL agents, MCMC.
  • Azuma-Hoeffding bounds deviations: a workhorse of stochastic algorithm analysis.
  • Convergence and stopping theorems give guarantees: almost-sure limits, E[X_τ] = E[X_0].
  • Code makes the processes concrete: simulate, then check the bound.

Martingales and random walks enhance ML analysis.


11. Summary

This lecture explored martingales and random walks: their defining properties, the convergence, optional stopping, and Azuma-Hoeffding theorems, and their applications in RL and stochastic optimization. Worked examples and Python/Rust code bridge theory to practice and strengthen the stochastic analysis of ML algorithms.



Further Reading

  • Williams, Probability with Martingales.
  • Grimmett and Stirzaker, Probability and Random Processes.
  • Sutton and Barto, Reinforcement Learning: An Introduction (Ch. 5).
  • Rust crates: plotters for visualization, rand and rand_distr for sampling.