
Markov Chains & Sequential Models

Markov chains are probabilistic models that describe sequences of events where the future state depends only on the current state, embodying the “memoryless” property. Sequential models extend this to handle time-series and ordered data in machine learning (ML), such as recurrent neural networks (RNNs) and hidden Markov models (HMMs). In AI, Markov chains model decision processes, language generation, and dynamic systems, providing a foundation for understanding temporal dependencies.

This tenth lecture in the “Probability Foundations for AI/ML” series builds on entropy and KL divergence, exploring Markov chain definitions, transition matrices, stationary distributions, ergodicity, HMMs, and their ML applications. We’ll provide intuitive explanations, mathematical formulations, and implementations in Python and Rust, preparing you for Bayesian inference in the series conclusion.


1. Intuition: The Memoryless Property

A Markov chain is a sequence where each state depends only on the previous one, like a board game where your next move depends on your current position, not on how you got there.

Formally, P(X_{t+1} = j | X_t = i, X_{t-1} = i_{t-1}, …, X_0 = i_0) = P(X_{t+1} = j | X_t = i).

This “memoryless” property simplifies modeling long sequences.

  • Time-Series: Predict stock prices based on recent history.
  • NLP: Generate text word by word.

::: info
Markov chains model “next-step” predictions, like autocomplete suggesting words based on the last one.
:::

  • Weather: States {sunny, rainy}, P(rainy|sunny)=0.2, P(sunny|rainy)=0.3.

Sequence: starting from Sunny, the path Sunny → Rainy → Sunny has probability P(rainy|sunny) · P(sunny|rainy) = 0.2 × 0.3 = 0.06.
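A minimal simulation sketch of this two-state weather chain (the probabilities are those from the example above; the 10-step horizon is an arbitrary choice):

```python
import random

# Two-state weather chain: P(rainy | sunny) = 0.2, P(sunny | rainy) = 0.3
transitions = {"sunny": {"sunny": 0.8, "rainy": 0.2},
               "rainy": {"sunny": 0.3, "rainy": 0.7}}

state, path = "sunny", ["sunny"]
for _ in range(10):
    # The next state depends only on the current one (memoryless property)
    nxt = transitions[state]
    state = random.choices(list(nxt), weights=list(nxt.values()))[0]
    path.append(state)

print(" → ".join(path))
```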


2. Formal Definitions: Transition Matrices

Consider a finite state space S = {1, 2, …, m}.

The transition matrix P has entries P_{ij} = P(X_{t+1} = j | X_t = i).

Each row sums to 1 (a stochastic matrix).

n-step transitions: P^{(n)} = P^n, the n-th matrix power.

Given an initial distribution π_0, the distribution at time t is π_t = π_0 P^t.

The chain is (time-)homogeneous when P does not change with t.

  • Chapman-Kolmogorov: P^{(m+n)} = P^{(m)} P^{(n)}.
  • In ML, P can be parameterized by a neural network to build flexible sequence models.
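For the weather example above, Chapman-Kolmogorov gives the two-step transition matrix directly as a matrix product:

$$
P = \begin{pmatrix} 0.8 & 0.2 \\ 0.3 & 0.7 \end{pmatrix}, \qquad
P^{(2)} = P^2 =
\begin{pmatrix} 0.8\cdot 0.8 + 0.2\cdot 0.3 & 0.8\cdot 0.2 + 0.2\cdot 0.7 \\
                0.3\cdot 0.8 + 0.7\cdot 0.3 & 0.3\cdot 0.2 + 0.7\cdot 0.7 \end{pmatrix}
= \begin{pmatrix} 0.70 & 0.30 \\ 0.45 & 0.55 \end{pmatrix},
$$

so the probability of rain two days after a sunny day is 0.30.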

3. State Classification: Recurrence, Periodicity, Ergodicity

  • Transient vs. recurrent: a state is recurrent if the probability of eventually returning to it is 1, and transient otherwise.
  • Periodic: a state has period d > 1 if returns can only occur at multiples of d steps; otherwise it is aperiodic.
  • Irreducible: all states communicate (each is reachable from every other).
  • Ergodic: irreducible, aperiodic, and positive recurrent; such a chain has a unique long-run distribution.

In ML: ergodicity is usually assumed so that long-run averages and sampling algorithms converge.
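These properties can be checked numerically for small chains. A minimal sketch (the deterministic "flip" chain below is chosen because it is irreducible yet periodic with period 2):

```python
import numpy as np
from math import gcd
from functools import reduce

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # always switches state: irreducible but periodic

# Irreducibility: every state reaches every other in at most m-1 steps
A = (P > 0).astype(int)
reach = np.linalg.matrix_power(np.eye(len(P), dtype=int) + A, len(P) - 1) > 0
print("Irreducible:", bool(reach.all()))

# Period of state 0: gcd of all n with P^n[0, 0] > 0
return_times = [n for n in range(1, 20) if np.linalg.matrix_power(P, n)[0, 0] > 0]
print("Period of state 0:", reduce(gcd, return_times))   # 2 -> not aperiodic
```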


4. Stationary Distributions: Long-Run Behavior

A distribution π is stationary if π P = π.

It is unique when the chain is ergodic.

To find it, solve (P^T - I) π^T = 0 together with Σ_j π_j = 1.

π_j is the long-run proportion of time the chain spends in state j.

  • PageRank: the stationary distribution of a random walk on the web graph.

Example: for the two-state chain P = [[0.9, 0.1], [0.5, 0.5]], the stationary distribution is π = [5/6, 1/6].
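Deriving that example by hand from π P = π and the normalization constraint:

$$
\pi_1 = 0.9\,\pi_1 + 0.5\,\pi_2 \;\Rightarrow\; 0.1\,\pi_1 = 0.5\,\pi_2 \;\Rightarrow\; \pi_1 = 5\,\pi_2,
\qquad \pi_1 + \pi_2 = 1 \;\Rightarrow\; \pi = \left(\tfrac{5}{6},\ \tfrac{1}{6}\right).
$$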


5. Absorbing Chains: Terminal States

An absorbing state satisfies P_{ii} = 1: once entered, it is never left.

Reordering states into canonical form yields absorption probabilities (and expected times to absorption) in closed form.

In ML: less common, but useful for modeling terminal states in finite-length episodes in RL.
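A minimal sketch of the canonical-form computation, using a fair gambler's-ruin chain on {0, 1, 2, 3} as an assumed example (states 0 and 3 are absorbing):

```python
import numpy as np

# Canonical-form blocks for the transient states {1, 2}
Q = np.array([[0.0, 0.5],
              [0.5, 0.0]])   # transient -> transient
R = np.array([[0.5, 0.0],
              [0.0, 0.5]])   # transient -> absorbing (columns: absorbed at 0, at 3)

N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix: expected visits to transient states
B = N @ R                          # absorption probabilities
print("Absorption probabilities:\n", B)            # from state 1: [2/3, 1/3]
print("Expected steps to absorption:", N.sum(axis=1))
```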


6. Continuous-Time Markov Chains (CTMCs)

A rate (generator) matrix Q, with rows summing to 0, gives transition probabilities P(t) = exp(Qt).

The embedded jump chain describes which state is visited next; holding times in each state are exponentially distributed.

In ML: CTMCs model event sequences in continuous time.
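A quick sketch with SciPy's matrix exponential (the rates below reuse the weather example's numbers as per-day rates, an assumption made purely for illustration):

```python
import numpy as np
from scipy.linalg import expm

# Generator matrix: rows sum to 0; off-diagonal entries are transition rates
Q = np.array([[-0.2,  0.2],    # sunny -> rainy at rate 0.2 / day
              [ 0.3, -0.3]])   # rainy -> sunny at rate 0.3 / day

print("P(1 day):\n", expm(Q * 1.0))        # transition probabilities over one day
print("P(1000 days):\n", expm(Q * 1000.0)) # rows converge to the stationary dist ≈ [0.6, 0.4]
```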


7. Hidden Markov Models (HMMs): Sequential Models


In an HMM, we observe Y_t while the hidden states Z_t form a Markov chain.

The model is specified by emission probabilities P(Y_t | Z_t) and transition probabilities P(Z_t | Z_{t-1}).

The forward-backward algorithm handles inference, Viterbi decoding finds the most likely hidden state sequence, and Baum-Welch (EM) learns the parameters.

  • Speech Recognition: Phonemes as hidden, audio as observed.
  • Largely superseded by RNNs/LSTMs in modern systems, but still foundational.
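For intuition, here is a minimal sketch of the forward recursion on a toy discrete HMM (all numbers below are made-up illustrative parameters, not taken from the text):

```python
import numpy as np

# Toy discrete HMM: 2 hidden states, 2 possible observation symbols
A = np.array([[0.8, 0.2],      # transition P(Z_t | Z_{t-1})
              [0.3, 0.7]])
B = np.array([[0.9, 0.1],      # emission P(Y_t | Z_t); rows = hidden state, cols = symbol
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])      # initial distribution over hidden states
obs = [0, 0, 1, 1]             # observed symbol sequence

# Forward recursion: alpha[k] = P(y_1..y_t, Z_t = k)
alpha = pi * B[:, obs[0]]
for y in obs[1:]:
    alpha = (alpha @ A) * B[:, y]

print("Likelihood P(y_1..y_T):", alpha.sum())
print("Filtered P(Z_T | y_1..y_T):", alpha / alpha.sum())
```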

8. Markov Decision Processes (MDPs): RL Foundations


An MDP consists of states, actions, transition probabilities P(s' | s, a), and rewards.

A policy π(a | s) selects actions; its value is V^π(s) = E[Σ_t γ^t r_t | S_0 = s] for a discount factor γ ∈ [0, 1).

In ML: reinforcement learning is the study of solving MDPs from experience.
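As a sketch of how an MDP is solved, here is value iteration on a tiny two-state, two-action MDP (the transition probabilities, rewards, and discount γ = 0.9 are made up for illustration):

```python
import numpy as np

# P[a, s, s2] = P(s2 | s, a); R[s, a] = expected immediate reward
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],   # action 0
              [[0.5, 0.5],
               [0.1, 0.9]]])  # action 1
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

# Value iteration: V(s) <- max_a [ R(s, a) + gamma * sum_s2 P(s2 | s, a) V(s2) ]
V = np.zeros(2)
for _ in range(200):
    Q = R + gamma * (P @ V).T   # Q[s, a]
    V = Q.max(axis=1)

print("Optimal values V*:", V)
print("Greedy policy:", Q.argmax(axis=1))
```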


9. ML Applications

  1. NLP: n-gram language models are (higher-order) Markov chains over words; a bigram sketch follows this list.
  2. Time-Series: HMMs for anomaly detection and segmentation.
  3. RL: MDPs formalize sequential decision-making.
  4. Graph models: random walks are used to learn node embeddings.

Challenges:

  • State explosion when the state space S is large.
  • Real data is often non-Markovian.
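A minimal bigram language-model sketch (the tiny corpus below is made up; real n-gram models add smoothing and larger context):

```python
import random
from collections import defaultdict

corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Count word -> next-word transitions (a first-order Markov chain over words)
counts = defaultdict(lambda: defaultdict(int))
for w, w_next in zip(corpus, corpus[1:]):
    counts[w][w_next] += 1

# Generate text by sampling the next word proportionally to its count
word, out = "the", ["the"]
for _ in range(8):
    nxt = counts[word]
    if not nxt:                      # dead end: last corpus word has no successor
        break
    word = random.choices(list(nxt), weights=list(nxt.values()))[0]
    out.append(word)

print(" ".join(out))
```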

10. Numerical Computations: Simulating Chains


We compute transition matrix powers and the steady-state (stationary) distribution, and simulate the chain.

::: code-group

```python
import numpy as np

# Transition matrix power
P = np.array([[0.9, 0.1], [0.5, 0.5]])
P_n = np.linalg.matrix_power(P, 100)
print("P^100:", P_n)

# Stationary dist
from scipy.linalg import null_space
I = np.eye(2)
stat = null_space(P.T - I)
stat /= stat.sum()
print("Stationary π:", stat.flatten())

# Simulate chain
def simulate_markov(P, pi0, steps=100):
    state = np.random.choice(len(pi0), p=pi0)
    states = [state]
    for _ in range(steps - 1):
        state = np.random.choice(len(P), p=P[state])
        states.append(state)
    return states

pi0 = [1, 0]
states = simulate_markov(P, pi0)
print("First 10 states:", states[:10])

# ML: HMM toy
from hmmlearn.hmm import GaussianHMM
model = GaussianHMM(n_components=2)
model.fit(np.random.normal(size=(100, 1)))
print("HMM means:", model.means_)
```

```rust
use rand::Rng;

// 2x2 matrix power by repeated squaring
fn matrix_power(p: [[f64; 2]; 2], n: u32) -> [[f64; 2]; 2] {
    let mut result = [[1.0, 0.0], [0.0, 1.0]];
    let mut base = p;
    let mut exp = n;
    while exp > 0 {
        if exp % 2 == 1 {
            result = [[result[0][0] * base[0][0] + result[0][1] * base[1][0],
                       result[0][0] * base[0][1] + result[0][1] * base[1][1]],
                      [result[1][0] * base[0][0] + result[1][1] * base[1][0],
                       result[1][0] * base[0][1] + result[1][1] * base[1][1]]];
        }
        base = [[base[0][0] * base[0][0] + base[0][1] * base[1][0],
                 base[0][0] * base[0][1] + base[0][1] * base[1][1]],
                [base[1][0] * base[0][0] + base[1][1] * base[1][0],
                 base[1][0] * base[0][1] + base[1][1] * base[1][1]]];
        exp /= 2;
    }
    result
}

fn main() {
    let p = [[0.9, 0.1], [0.5, 0.5]];
    let p_n = matrix_power(p, 100);
    println!("P^100: {:?}", p_n);

    // Stationary: solve (P^T - I) π = 0, Σπ = 1; for a two-state chain, π ∝ [p21, p12]
    let pi0 = 0.5 / (0.5 + 0.1);
    let pi1 = 0.1 / (0.5 + 0.1);
    println!("Stationary π: [{}, {}]", pi0, pi1);

    // Simulate the chain for 10 steps
    let mut rng = rand::thread_rng();
    let mut state = 0;
    let mut states = vec![state];
    for _ in 0..9 {
        state = if rng.gen::<f64>() < p[state][0] { 0 } else { 1 };
        states.push(state);
    }
    println!("First 10 states: {:?}", states);
}
```

:::

The code computes transition matrix powers and the stationary distribution, and simulates the chain (the Python version also fits a toy HMM).


11. Symbolic Computations: Stationary Distributions

We can also solve for the stationary distribution symbolically.

::: code-group

```python
from sympy import symbols, Matrix, solve

p11, p12, p21, p22 = symbols('p11 p12 p21 p22')
P = Matrix([[p11, p12], [p21, p22]])
pi1, pi2 = symbols('pi1 pi2')

# Balance equations P^T π = π; since rows of P sum to 1, the second equation is
# redundant, so use the first one plus the normalization π1 + π2 = 1
balance = (P.T * Matrix([pi1, pi2]) - Matrix([pi1, pi2]))[0]
sol = solve([balance, pi1 + pi2 - 1], [pi1, pi2])
print("Stationary π1:", sol[pi1])  # p21 / (1 - p11 + p21)
print("Numeric:", sol[pi1].subs({p11: 0.9, p12: 0.1, p21: 0.5, p22: 0.5}))  # 5/6
```
```rust
fn main() {
    // No standard symbolic algebra crate in Rust; for the two-state example the
    // closed form is π1 = p21 / (p12 + p21) = 0.5 / 0.6 = 5/6.
    println!("Stationary π1: 5/6");
}
```

:::


12. Challenges and Pitfalls

  • State space explosion: a large S makes exact computation intractable; use approximations or structured models.
  • Non-Markovian data: real sequences carry longer memory; use higher-order chains or richer sequence models.

Key Takeaways

  • Markov models describe sequences in which the next state depends only on the current one.
  • The transition matrix is the core object: it encodes all one-step probabilities.
  • Stationary distributions capture long-run, steady-state behavior.
  • HMMs add hidden states, making them a workhorse for sequential data.
  • The code simulates chains, computes stationary distributions, and fits a toy HMM.

Markov chains power sequential AI.


Summary

This lecture explored Markov chains from intuition through transition matrices, stationary distributions, and state classification, to HMMs and MDPs, highlighting ML applications along the way. Worked examples and Python/Rust code illustrate the concepts and prepare the ground for the Bayesian inference lecture that concludes the series.



References

  • Norris, J. R., Markov Chains.
  • Murphy, K. P., Machine Learning: A Probabilistic Perspective (Ch. 17).
  • Rabiner, L. R., “A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition.”
  • Rust: the `rand` crate for simulations.