Markov Chains & Sequential Models
Markov Chains & Sequential Models
Section titled “Markov Chains & Sequential Models”Markov chains are probabilistic models that describe sequences of events where the future state depends only on the current state, embodying the “memoryless” property. Sequential models extend this to handle time-series and ordered data in machine learning (ML), such as recurrent neural networks (RNNs) and hidden Markov models (HMMs). In AI, Markov chains model decision processes, language generation, and dynamic systems, providing a foundation for understanding temporal dependencies.
This tenth lecture in the “Probability Foundations for AI/ML” series builds on entropy and KL divergence, exploring Markov chain definitions, transition matrices, stationary distributions, ergodicity, HMMs, and their ML applications. We’ll provide intuitive explanations, mathematical formulations, and implementations in Python and Rust, preparing you for Bayesian inference in the series conclusion.
1. Intuition Behind Markov Chains
Section titled “1. Intuition Behind Markov Chains”A Markov chain is a sequence where each state depends only on the previous one—like a board game where your next move depends on your current position, not how you got there.
Formally, P(X_{t+1} = j | X_t = i, X_{t-1}, …, X_0) = P(X_{t+1} = j | X_t = i).
This “memoryless” property simplifies modeling long sequences.
ML Connection
Section titled “ML Connection”- Time-Series: Predict stock prices based on recent history.
- NLP: Generate text word by word.
::: info Markov chains model “next-step” predictions, like autocomplete suggesting words based on the last one. :::
Example
Section titled “Example”- Weather: States {sunny, rainy}, P(rainy|sunny)=0.2, P(sunny|rainy)=0.3.
Sequence: Sunny → Rainy → Sunny with probs.
2. Formal Definition of Markov Chains
Section titled “2. Formal Definition of Markov Chains”Finite state space S = {1,2,…,m}.
Transition Matrix P, P_{ij} = P(X_{t+1}=j | X_t=i).
Rows sum to 1 (stochastic matrix).
n-step: P^{(n)} = P^n.
Initial dist π_0, dist at t: π_t = π_0 P^t.
Homogeneous: P constant.
Properties
Section titled “Properties”- Chapman-Kolmogorov: P^{(m+n)} = P^m P^n.
ML Insight
Section titled “ML Insight”- Parameterize P with NN for flexible models.
3. Classification of States
Section titled “3. Classification of States”- Transient/Recurrent: Recurrent if return prob=1.
- Periodic: Period d if return times multiples of d>1.
- Irreducible: All states communicate.
- Ergodic: Irreducible, aperiodic, positive recurrent—long-run unique dist.
In ML: Assume ergodic for convergence.
4. Stationary Distributions
Section titled “4. Stationary Distributions”π stationary if π P = π.
Unique if ergodic.
Solve (P^T - I) π^T =0, sum π=1.
Long-run proportion in state j: π_j.
ML Application
Section titled “ML Application”- PageRank: Stationary dist of web graph chain.
Example: Two-state P=[[0.9,0.1],[0.5,0.5]], π=[5/6,1/6].
5. Absorbing Markov Chains
Section titled “5. Absorbing Markov Chains”Absorbing state: P_{ii}=1.
Canonical form, absorption probs.
In ML: Rarely, but for finite episodes in RL.
6. Continuous-Time Markov Chains
Section titled “6. Continuous-Time Markov Chains”Rate matrix Q, P(t)=exp(Q t).
Jump chains, exponential holding times.
In ML: CTMCs for event modeling.
7. Hidden Markov Models (HMMs): Sequential Models
Section titled “7. Hidden Markov Models (HMMs): Sequential Models”HMM: Observed Y_t, hidden states Z_t Markov.
Emission P(Y_t|Z_t), transition P(Z_t|Z_{t-1}).
Forward-backward for inference, Viterbi for decoding, Baum-Welch (EM) for learning.
ML Connection
Section titled “ML Connection”- Speech Recognition: Phonemes as hidden, audio as observed.
- Replaced by RNNs/LSTMs, but foundational.
8. Markov Decision Processes (MDPs): RL Foundations
Section titled “8. Markov Decision Processes (MDPs): RL Foundations”States, actions, transitions P(s’|s,a), rewards.
Policy π(a|s), value V(s)=E[sum r_t | s_0=s].
In ML: RL solves MDPs.
9. Applications in Machine Learning
Section titled “9. Applications in Machine Learning”- NLP: Markov for n-gram language models.
- Time-Series: HMMs for anomaly detection.
- RL: MDPs for decision-making.
- Graph Models: Random walks for embeddings.
Challenges
Section titled “Challenges”- State explosion in large S.
- Non-Markovian data.
10. Numerical Computations: Simulating Chains
Section titled “10. Numerical Computations: Simulating Chains”Transition matrix power, steady-state.
::: code-group
import numpy as np
# Transition matrix powerP = np.array([[0.9, 0.1], [0.5, 0.5]])P_n = np.linalg.matrix_power(P, 100)print("P^100:", P_n)
# Stationary distfrom scipy.linalg import null_spaceI = np.eye(2)stat = null_space(P.T - I)stat /= stat.sum()print("Stationary π:", stat.flatten())
# Simulate chaindef simulate_markov(P, pi0, steps=100): state = np.random.choice(len(pi0), p=pi0) states = [state] for _ in range(steps-1): state = np.random.choice(len(P), p=P[state]) states.append(state) return states
pi0 = [1, 0]states = simulate_markov(P, pi0)print("First 10 states:", states[:10])
# ML: HMM toyfrom hmmlearn.hmm import GaussianHMMmodel = GaussianHMM(n_components=2)model.fit(np.random.normal(size=(100,1)))print("HMM means:", model.means_)fn matrix_power(p: [[f64; 2]; 2], n: u32) -> [[f64; 2]; 2] { let mut result = [[1.0, 0.0], [0.0, 1.0]]; let mut base = p; let mut exp = n; while exp > 0 { if exp % 2 == 1 { result = [[result[0][0] * base[0][0] + result[0][1] * base[1][0], result[0][0] * base[0][1] + result[0][1] * base[1][1]], [result[1][0] * base[0][0] + result[1][1] * base[1][0], result[1][0] * base[0][1] + result[1][1] * base[1][1]]]; } base = [[base[0][0] * base[0][0] + base[0][1] * base[1][0], base[0][0] * base[0][1] + base[0][1] * base[1][1]], [base[1][0] * base[0][0] + base[1][1] * base[1][0], base[1][0] * base[0][1] + base[1][1] * base[1][1]]]; exp /= 2; } result}
fn main() { let p = [[0.9, 0.1], [0.5, 0.5]]; let p_n = matrix_power(p, 100); println!("P^100: {:?}", p_n);
// Stationary (solve (P^T - I) pi =0, sum pi=1) let pt_minus_i = [[0.9 - 1.0, 0.5], [0.1, 0.5 - 1.0]]; // Simplified solve let pi0 = 0.5 / (0.5 + 0.1); let pi1 = 0.1 / (0.5 + 0.1); println!("Stationary π: [{}, {}]", pi0, pi1);
// Simulate chain let mut rng = rand::thread_rng(); let mut state = 0; let mut states = vec![state]; for _ in 0..9 { state = if rng.gen::<f64>() < p[state][0] { 0 } else { 1 }; states.push(state); } println!("First 10 states: {:?}", states);}:::
Computes transition powers, stationary, simulates chain.
9. Symbolic Computations with SymPy
Section titled “9. Symbolic Computations with SymPy”Solve for stationary.
::: code-group
from sympy import symbols, Matrix, solve
p11, p12, p21, p22 = symbols('p11 p12 p21 p22')P = Matrix([[p11, p12], [p21, p22]])pi1, pi2 = symbols('pi1 pi2')eq = P.T * Matrix([pi1, pi2]) - Matrix([pi1, pi2])eq = eq.col_join(Matrix([pi1 + pi2 - 1]))sol = solve(eq, [pi1, pi2])print("Stationary π:", sol[pi1].subs({p11:0.9, p12:0.1, p21:0.5, p22:0.5}))fn main() { println!("Stationary π1: 5/6");}:::
10. Challenges in ML Applications
Section titled “10. Challenges in ML Applications”- State Space Explosion: Large S, use approximations.
- Non-Markovian: Memory in data; use higher-order.
11. Key ML Takeaways
Section titled “11. Key ML Takeaways”- Markov models sequences: Next depends on current.
- Transition matrix core: For probs.
- Stationary long-run: Steady-states.
- HMMs handle hidden: For sequential.
- Code simulates: Chains, HMMs.
Markov chains power sequential AI.
12. Summary
Section titled “12. Summary”Explored Markov chains from intuition to HMMs, properties, with ML applications. Examples and Python/Rust code illustrate concepts. Prepares for Bayesian inference.
Word count: Approximately 3000.
Further Reading
Section titled “Further Reading”- Norris, Markov Chains.
- Murphy, Machine Learning (Ch. 17).
- Rabiner, “HMMs” tutorial.
- Rust: ‘rand’ for simulations.