Martingales & Random Walks
Martingales and random walks are fundamental stochastic processes that model sequential random phenomena, widely used in probability theory and machine learning (ML). A martingale represents a sequence where the expected future value, given the past, equals the current value, modeling "fair" processes. Random walks describe sequences of steps with random increments, often used in algorithms and simulations. In ML, these concepts underpin reinforcement learning, stochastic optimization, and convergence analysis of algorithms like stochastic gradient descent (SGD).
This lecture in the "Foundations for AI/ML" series (misc-math cluster) builds on concentration inequalities and information geometry, exploring martingales, random walks, their properties, convergence theorems, and ML applications. We'll provide intuitive explanations, mathematical derivations, and practical implementations in Python and Rust, offering tools to analyze stochastic processes in AI.
1. Intuition Behind Martingales and Random Walks
Martingales: Model processes where the expected next step, given all prior information, is the current state—like a fair game where you neither gain nor lose on average.
Random Walks: Model a path of random steps, like a drunkard wandering left or right. They generalize to many ML scenarios, such as exploration in reinforcement learning.
ML Connection
- Reinforcement Learning: Martingales model value functions in MDPs.
- Stochastic Optimization: Random walks describe SGD trajectories.
- Algorithm Analysis: Martingale bounds for convergence.
INFO
Martingales are like a gambler's fair bet, keeping expectations steady; random walks are like a wanderer's unpredictable path, guiding ML algorithms through randomness.
Example
- Martingale: Cumulative rewards in a fair game, E[X_{t+1}|X_1,...,X_t] = X_t.
- Random Walk: SGD parameter updates with random gradients.
2. Definitions and Properties
Martingales
A sequence {X_t} is a martingale with respect to filtration {F_t} if:
- E[|X_t|] < ∞ (finite expectation).
- X_t is F_t-measurable (adapted).
- E[X_{t+1}|F_t] = X_t.
Submartingale: E[X_{t+1}|F_t] ≥ X_t.
Supermartingale: E[X_{t+1}|F_t] ≤ X_t.
Random Walks
A random walk is S_t = sum_{i=1}^t Z_i, where the Z_i are i.i.d. increments (e.g., Z_i = ±1 with probability 1/2 each).
Properties:
- Martingale if E[Z_i] = 0.
- Variance: Var(S_t) = t Var(Z_i) for i.i.d.
ML Insight
- Martingales model fair processes in RL; random walks model SGD noise.
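The definitions above are easy to check numerically. A minimal sketch (the seed and sample sizes are arbitrary choices): simulate many simple symmetric walks and verify that E[S_t] = 0 (the martingale property, since E[Z_i] = 0) and Var(S_t) = t Var(Z_i) = t.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate many simple symmetric +/-1 walks in parallel.
n_walks, n_steps = 10_000, 100
steps = rng.choice((-1, 1), size=(n_walks, n_steps))
S = steps.cumsum(axis=1)  # S[k, t-1] is walk k at time t

# Check E[S_t] = 0 and Var(S_t) = t at the final time t = n_steps.
print("E[S_t]   ~ 0:", S[:, -1].mean())
print("Var(S_t) ~ t:", S[:, -1].var(), "vs t =", n_steps)
```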
3. Key Theorems and Bounds
Martingale Convergence Theorem
If {X_t} is a martingale with sup_t E[|X_t|] < ∞, then X_t converges almost surely to an integrable limit X_∞.
Optional Stopping Theorem
For a martingale {X_t} and bounded stopping time τ, E[X_τ] = E[X_0].
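The optional stopping theorem can be illustrated with a stopped walk. A sketch, with the stopping time capped at T steps so it is bounded as the theorem requires (the barrier and cap values are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stop a simple symmetric walk when it first hits +/-10; cap at T steps so
# the stopping time tau is bounded, as the optional stopping theorem requires.
def stopped_value(barrier=10, T=10_000):
    s = 0
    for _ in range(T):
        s += rng.choice((-1, 1))
        if abs(s) == barrier:
            break
    return s

vals = np.array([stopped_value() for _ in range(2000)])
print("E[X_tau] ~ E[X_0] = 0:", vals.mean())
```

The stopped values are almost all ±10, yet their average stays near E[X_0] = 0: stopping a fair game does not change its expectation.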
Azuma-Hoeffding Inequality
For a martingale {X_t} with bounded differences |X_t - X_{t-1}| ≤ c_t:
[ P(|X_n - X_0| \geq a) \leq 2 \exp\left(-\frac{a^2}{2\sum_{t=1}^{n} c_t^2}\right) ]
Random Walk Behavior
- Simple symmetric walk (Z_i = ±1): E[S_t] = 0, Var(S_t) = t.
- Recurrence: simple walks in 1D and 2D return to the origin with probability 1 (recurrent); in 3D and higher the return probability is strictly less than 1 (transient, by Pólya's theorem).
ML Application
- Azuma for bounding SGD convergence.
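Recurrence versus transience is easy to see empirically. A rough sketch (a finite horizon can only approximate the asymptotic statement; the dimensions, horizon, and trial counts here are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Fraction of finite-horizon lattice walks that revisit the origin.
# Each step moves +/-1 along one uniformly chosen coordinate axis.
def returns_to_origin(dim, n_steps=500, n_walks=500):
    count = 0
    for _ in range(n_walks):
        axes = rng.integers(0, dim, n_steps)
        signs = rng.choice((-1, 1), n_steps)
        steps = np.zeros((n_steps, dim))
        steps[np.arange(n_steps), axes] = signs
        path = steps.cumsum(axis=0)
        if np.any(np.all(path == 0, axis=1)):
            count += 1
    return count / n_walks

print("1D return fraction:", returns_to_origin(1))  # close to 1 (recurrent)
print("3D return fraction:", returns_to_origin(3))  # well below 1 (transient)
```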
4. Martingales in Algorithm Analysis
Martingales bound deviations in stochastic algorithms.
Example: SGD updates θ_{t+1} = θ_t - η ∇L(θ_t, ξ_t), where ξ_t is a random data sample. The cumulative gradient noise (the sum of gradients minus their conditional expectations) forms a martingale.
Doob's Decomposition: Split process into martingale + predictable part.
In ML: Analyze convergence of online learning.
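Doob's decomposition can be made concrete for the simple symmetric walk: S_t² is a submartingale, and S_t² = M_t + A_t with martingale part M_t = S_t² - t and predictable part A_t = t. A quick numerical check (sample sizes are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# Doob decomposition of the submartingale S_t^2 for a simple symmetric walk:
# S_t^2 = M_t + A_t, with martingale M_t = S_t^2 - t and predictable A_t = t.
n_walks, n_steps = 5000, 200
steps = rng.choice((-1, 1), size=(n_walks, n_steps))
S = steps.cumsum(axis=1)
M = S**2 - np.arange(1, n_steps + 1)  # martingale part; A_t = t is deterministic

# E[M_t] should stay at E[M_0] = 0 for every t.
print("max_t |E[M_t]|:", np.abs(M.mean(axis=0)).max())
```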
5. Random Walks in ML Algorithms
Random walks model:
- SGD: Parameter updates as noisy steps.
- Exploration: RL agents navigating state spaces.
- MCMC: Sampling via random walks (e.g., Metropolis).
ML Connection
- Random walk convergence informs MCMC efficiency.
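A minimal random-walk Metropolis sampler makes the MCMC connection concrete. A sketch targeting a standard normal (the step size, chain length, and burn-in are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# Random-walk Metropolis targeting pi(x) proportional to exp(-x^2 / 2):
# propose x' = x + step * N(0, 1), accept with probability min(1, pi(x')/pi(x)).
def metropolis(n_samples, step=1.0):
    x = 0.0
    samples = np.empty(n_samples)
    for i in range(n_samples):
        prop = x + step * rng.normal()
        # Compare on the log scale: log pi(x') - log pi(x) = (x^2 - x'^2) / 2.
        if np.log(rng.random()) < 0.5 * (x**2 - prop**2):
            x = prop
        samples[i] = x
    return samples

s = metropolis(20_000)[5_000:]  # discard burn-in
print("mean ~ 0:", s.mean(), "var ~ 1:", s.var())
```

The proposal is exactly a random-walk step; the accept/reject rule biases the walk toward the target density, which is why random-walk mixing rates translate directly into MCMC efficiency.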
6. Applications in Machine Learning
- Reinforcement Learning: Martingales for value function convergence in MDPs.
- Stochastic Optimization: Random walks model SGD trajectories.
- MCMC Sampling: Random walks for posterior sampling in Bayesian ML.
- Bandits: Martingale bounds for regret analysis.
Challenges
- Non-i.i.d. data breaks martingale assumptions.
- High-dim walks complex to analyze.
7. Numerical Simulations of Martingales and Random Walks
Simulate a random walk and a Gaussian martingale, then compare empirical deviation frequencies against the Azuma-Hoeffding bound.
import numpy as np
import matplotlib.pyplot as plt

# Random walk simulation: cumulative sum of +/-1 steps
def random_walk(n_steps):
    steps = np.random.choice([-1, 1], size=n_steps)
    return np.cumsum(steps)

n_steps = 1000
walk = random_walk(n_steps)
plt.plot(walk)
plt.title("Random Walk")
plt.show()

# Martingale (cumulative sum of zero-mean Gaussian increments)
def martingale(n_steps):
    increments = np.random.normal(0, 1, n_steps)
    return np.cumsum(increments)

mart = martingale(n_steps)
plt.plot(mart)
plt.title("Martingale")
plt.show()

# Azuma-Hoeffding bound: P(|X_n - X_0| >= a) <= 2 exp(-a^2 / (2 n c^2))
def azuma_bound(a, n, c=1):
    return 2 * np.exp(-a**2 / (2 * n * c**2))

a = 100  # a = 100 makes the bound nontrivial (less than 1) for n = 1000
n = 1000
bound = azuma_bound(a, n)
# Empirical estimate: final values of many independent +/-1 walks,
# whose increments satisfy |Z_t| <= 1 as Azuma-Hoeffding requires.
finals = np.array([random_walk(n)[-1] for _ in range(2000)])
emp_prob = np.mean(np.abs(finals) >= a)
print("Azuma bound P(|S_n| >= 100):", bound, "Empirical:", emp_prob)

# ML: SGD as a random walk driven by noisy gradients
def sgd_simulation(theta_init, n_steps, eta=0.01):
    theta = theta_init
    trajectory = [theta]
    for _ in range(n_steps):
        grad = np.random.normal(0, 1)  # noisy gradient
        theta -= eta * grad
        trajectory.append(theta)
    return np.array(trajectory)

theta_init = 0
sgd_traj = sgd_simulation(theta_init, n_steps)
plt.plot(sgd_traj)
plt.title("SGD as Random Walk")
plt.show()
use rand::Rng;
use rand_distr::{Distribution, Normal};
use plotters::prelude::*;

fn random_walk(n_steps: usize) -> Vec<f64> {
    let mut rng = rand::thread_rng();
    let mut walk = vec![0.0];
    for _ in 0..n_steps {
        let step = if rng.gen_bool(0.5) { 1.0 } else { -1.0 };
        walk.push(walk.last().unwrap() + step);
    }
    walk
}

fn martingale(n_steps: usize) -> Vec<f64> {
    let mut rng = rand::thread_rng();
    let normal = Normal::new(0.0, 1.0).unwrap();
    let mut mart = vec![0.0];
    for _ in 0..n_steps {
        let increment = normal.sample(&mut rng);
        mart.push(mart.last().unwrap() + increment);
    }
    mart
}

// Azuma-Hoeffding bound: P(|X_n - X_0| >= a) <= 2 exp(-a^2 / (2 n c^2))
fn azuma_bound(a: f64, n: f64, c: f64) -> f64 {
    2.0 * (-a.powi(2) / (2.0 * n * c.powi(2))).exp()
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let n_steps = 1000;
    let walk = random_walk(n_steps);
    let root = BitMapBackend::new("random_walk.png", (800, 600)).into_drawing_area();
    root.fill(&WHITE)?;
    let mut chart = ChartBuilder::on(&root)
        .caption("Random Walk", ("sans-serif", 50))
        .build_cartesian_2d(0f64..n_steps as f64, -100f64..100f64)?;
    chart.draw_series(LineSeries::new(
        (0..=n_steps).map(|i| (i as f64, walk[i])),
        &BLUE,
    ))?;

    let mart = martingale(n_steps);
    let root_m = BitMapBackend::new("martingale.png", (800, 600)).into_drawing_area();
    root_m.fill(&WHITE)?;
    let mut chart_m = ChartBuilder::on(&root_m)
        .caption("Martingale", ("sans-serif", 50))
        .build_cartesian_2d(0f64..n_steps as f64, -100f64..100f64)?;
    chart_m.draw_series(LineSeries::new(
        (0..=n_steps).map(|i| (i as f64, mart[i])),
        &BLUE,
    ))?;

    let a = 100.0; // a = 100 makes the bound nontrivial for n = 1000
    let n = n_steps as f64;
    let bound = azuma_bound(a, n, 1.0);
    // Empirical estimate: final values of many independent +/-1 walks
    let trials = 2000;
    let hits = (0..trials)
        .filter(|_| random_walk(n_steps).last().unwrap().abs() >= a)
        .count();
    let emp_prob = hits as f64 / trials as f64;
    println!("Azuma bound P(|S_n| >= 100): {} Empirical: {}", bound, emp_prob);

    // SGD as random walk
    let mut rng = rand::thread_rng();
    let normal = Normal::new(0.0, 1.0).unwrap();
    let mut theta = 0.0;
    let eta = 0.01;
    let mut traj = vec![theta];
    for _ in 0..n_steps {
        let grad = normal.sample(&mut rng);
        theta -= eta * grad;
        traj.push(theta);
    }
    let root_sgd = BitMapBackend::new("sgd_walk.png", (800, 600)).into_drawing_area();
    root_sgd.fill(&WHITE)?;
    let mut chart_sgd = ChartBuilder::on(&root_sgd)
        .caption("SGD as Random Walk", ("sans-serif", 50))
        .build_cartesian_2d(0f64..n_steps as f64, -5f64..5f64)?;
    chart_sgd.draw_series(LineSeries::new(
        (0..=n_steps).map(|i| (i as f64, traj[i])),
        &BLUE,
    ))?;
    Ok(())
}
The code above simulates random walks, Gaussian martingales, and SGD trajectories, and checks the Azuma-Hoeffding bound empirically.
8. Symbolic Derivations with SymPy
Express the martingale property and the Azuma-Hoeffding bound symbolically.
from sympy import symbols, exp

# Conditional expectation is a statement about sigma-algebras that SymPy does
# not manipulate directly, so we state the martingale property and work with
# the Azuma-Hoeffding bound as a symbolic expression.
print("Martingale property: E[X_{t+1} | F_t] = X_t")

a, n, c = symbols('a n c', positive=True)
azuma = 2 * exp(-a**2 / (2 * n * c**2))
print("Azuma bound:", azuma)
print("At a=100, n=1000, c=1:", azuma.subs({a: 100, n: 1000, c: 1}).evalf())
fn main() {
    // Braces in format strings must be escaped as {{ }}
    println!("Martingale: E[X_{{t+1}}|F_t] = X_t");
    println!("Azuma bound: 2 exp(-a² / (2 n c²))");
}
9. Challenges in ML Applications
- Non-Martingale Processes: Non-i.i.d. data in ML.
- High-Dim Walks: Complex convergence analysis.
- Stopping Times: Hard to define in RL.
10. Key ML Takeaways
- Martingales model fair processes: expectations stay constant given the past.
- Random walks model exploration: SGD noise, RL state-space traversal.
- Azuma-Hoeffding bounds deviations: a workhorse of algorithm analysis.
- Convergence and stopping theorems supply ML guarantees.
- Simulation code makes these processes concrete for practice.
Martingales and random walks enhance ML analysis.
11. Summary
We explored martingales and random walks: their definitions and properties, the key theorems (martingale convergence, optional stopping, Azuma-Hoeffding), and their applications in RL, stochastic optimization, and MCMC. Worked examples and Python/Rust code bridge theory to practice and strengthen the analysis of stochastic ML algorithms.
Further Reading
- Williams, Probability with Martingales.
- Grimmett, Stirzaker, Probability and Random Processes.
- Sutton, Barto, Reinforcement Learning (Ch. 5).
- Rust: 'plotters' for visualization, 'rand_distr' for sampling.