Bayesian Inference for Machine Learning
Bayesian inference treats parameters as random variables, updating prior beliefs with data to form posterior distributions. In machine learning (ML), it enables uncertainty quantification, regularization, and flexible modeling, as seen in Bayesian neural networks, Gaussian processes, and probabilistic programming. Unlike frequentist methods, Bayesian approaches provide full distributions over parameters, allowing better decision-making under uncertainty.
This eleventh and final lecture in the “Probability Foundations for AI/ML” series builds on entropy, Markov chains, and estimation techniques, exploring Bayesian basics, conjugate priors, approximate inference methods (MCMC, variational), and ML applications. We’ll provide intuitive explanations, mathematical derivations, and practical implementations in Python and Rust, concluding the series with a solid foundation in probabilistic ML.
1. Intuition Behind Bayesian Inference
Bayesian inference uses Bayes’ theorem to update beliefs: Posterior ∝ Likelihood × Prior.
Prior: Initial belief about parameters. Likelihood: Data’s probability given parameters. Posterior: Updated belief.
Think of it as learning from experience—start with assumptions, revise with evidence.
ML Connection
- Uncertainty: Posteriors quantify confidence (e.g., credible intervals).
- Regularization: Priors prevent overfitting.
::: info
Bayesian inference turns “gut feelings” (priors) into data-driven knowledge (posteriors).
:::
Example
- Coin bias θ: uniform Beta(1,1) prior, 3 heads in 5 flips → posterior Beta(4,3), mean 4/7 ≈ 0.57.
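The coin-flip update can be checked numerically with a minimal sketch (assuming SciPy is available; the counts are the ones from the example):

```python
from scipy import stats

# Beta(1,1) prior; data: 3 heads in 5 flips -> posterior Beta(1+3, 1+2) = Beta(4,3)
alpha, beta = 1, 1
heads, flips = 3, 5
posterior = stats.beta(alpha + heads, beta + flips - heads)
print(posterior.mean())  # 4/7 ≈ 0.571
```

The posterior mean (α + k)/(α + β + n) = 4/7 matches the hand calculation.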
2. Formal Bayesian Framework
P(θ|D) = P(D|θ) P(θ) / P(D)
Marginal Likelihood (Evidence): P(D) = ∫ P(D|θ) P(θ) dθ.
The evidence is often intractable, motivating approximate inference.
Point Estimates: the MAP (posterior mode) or the posterior mean.
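For the Beta-Bernoulli model the evidence integral has a closed form in terms of the Beta function, which a quadrature check can confirm; a sketch assuming SciPy, using the coin-flip counts from Section 1:

```python
from scipy.integrate import quad
from scipy.special import beta as beta_fn

k, n, a, b = 3, 5, 1.0, 1.0

# Evidence P(D) = ∫ θ^k (1-θ)^(n-k) · Beta(θ; a, b) dθ, for one specific flip sequence
integrand = lambda t: t**k * (1 - t)**(n - k) * t**(a - 1) * (1 - t)**(b - 1) / beta_fn(a, b)
evidence, _ = quad(integrand, 0, 1)

# Closed form: B(a+k, b+n-k) / B(a, b); MAP of the Beta(a+k, b+n-k) posterior
closed_form = beta_fn(a + k, b + n - k) / beta_fn(a, b)
map_estimate = (a + k - 1) / (a + b + n - 2)  # mode, valid when both shapes > 1
```

Note that the MAP (3/5 = 0.6) differs from the posterior mean (4/7 ≈ 0.57): different point summaries of the same posterior.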
Priors
- Conjugate: posterior stays in the prior’s family (e.g., Beta for Bernoulli).
- Non-informative: Flat or Jeffreys.
ML Insight
- Probabilistic models: the full P(θ|D) supports model averaging and ensembles.
3. Conjugate Priors and Closed-Form Posteriors
A prior is conjugate when its family is closed under updating by the likelihood.
Beta-Bernoulli: prior Beta(α,β), Binomial likelihood with k successes in n trials, posterior Beta(α+k, β+n−k).
Normal-Normal: prior N(μ0, τ0²) on the mean with known likelihood variance σ²; the posterior is Normal with precision 1/τn² = 1/τ0² + n/σ² and mean μn = τn² (μ0/τ0² + n x̄/σ²).
Gamma-Poisson: prior Gamma(α, β), Poisson likelihood, posterior Gamma(α + Σ x_i, β + n).
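The Normal-Normal update can be sketched as a small helper (assuming NumPy; the prior and data values are illustrative):

```python
import numpy as np

def normal_normal_update(mu0, tau0_sq, sigma_sq, data):
    """Posterior of a Normal mean under a N(mu0, tau0_sq) prior, known noise variance sigma_sq."""
    n, xbar = len(data), float(np.mean(data))
    precision = 1 / tau0_sq + n / sigma_sq                 # precisions add
    mu_n = (mu0 / tau0_sq + n * xbar / sigma_sq) / precision
    return mu_n, 1 / precision                             # posterior mean and variance

mu_n, var_n = normal_normal_update(0.0, 10.0, 1.0, [1.2, 0.8, 1.0])
```

With more data the likelihood term n/σ² dominates, so the posterior mean shrinks toward x̄ and away from μ0.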
Properties
- Easy computation.
- Interpret α,β as pseudo-observations.
ML Application
- Bayesian A/B testing with Beta posteriors.
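A hedged sketch of Beta-based A/B testing by posterior simulation (assuming NumPy; the conversion counts are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical counts: A converts 40/200, B converts 55/200; Beta(1,1) priors on each rate
post_a = rng.beta(1 + 40, 1 + 160, size=100_000)
post_b = rng.beta(1 + 55, 1 + 145, size=100_000)

# P(rate_B > rate_A) estimated by comparing paired posterior draws
prob_b_better = float((post_b > post_a).mean())
print(f"P(B > A) ≈ {prob_b_better:.3f}")
```

Because both posteriors are Beta, sampling is exact here; the Monte Carlo step only approximates the comparison probability.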
4. Approximate Inference: MCMC Methods
For non-conjugate models, sample from the posterior via Markov chain Monte Carlo (MCMC).
Metropolis-Hastings: propose θ′ from q(θ′|θ); accept with probability min(1, [P(θ′|D) q(θ|θ′)] / [P(θ|D) q(θ′|θ)]).
Gibbs Sampling: sample each parameter from its full conditional in turn.
Hamiltonian MC (HMC): uses gradients of the log-posterior for efficient exploration.
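A minimal random-walk Metropolis sampler makes the accept/reject rule concrete; with a symmetric Gaussian proposal the q-ratio cancels. This targets the Beta(4,3) coin posterior from Section 1 (NumPy assumed; step size and chain length are illustrative):

```python
import numpy as np

def metropolis(log_post, init, prop_sd, steps, seed=0):
    """Random-walk Metropolis: symmetric Gaussian proposal, so the q-ratio cancels."""
    rng = np.random.default_rng(seed)
    x, lp = init, log_post(init)
    samples = np.empty(steps)
    for i in range(steps):
        x_new = x + rng.normal(0.0, prop_sd)
        lp_new = log_post(x_new)
        if np.log(rng.uniform()) < lp_new - lp:  # accept with prob min(1, ratio)
            x, lp = x_new, lp_new
        samples[i] = x
    return samples

# Target: unnormalized Beta(4,3) posterior for a coin bias (3 heads, 2 tails, flat prior)
def log_post(p):
    return 3 * np.log(p) + 2 * np.log(1 - p) if 0 < p < 1 else -np.inf

samples = metropolis(log_post, 0.5, 0.1, 20_000)
print(samples[5000:].mean())  # should be near 4/7 ≈ 0.571
```

Discarding the first draws as burn-in and working with log-densities (to avoid underflow) are standard practices, both used here.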
ML Connection
- Pyro, Stan for probabilistic programming.
5. Variational Inference (VI)
Approximate the posterior with a tractable family q(θ|φ); minimizing D_KL(q ‖ P(θ|D)) is equivalent to maximizing the ELBO = E_q[log P(D, θ) − log q(θ)].
Mean-field: Assume q factorizes.
In ML: typically faster than MCMC on large datasets, at the cost of approximation bias.
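As one sketch of VI (the reparameterization-gradient formulation; others exist), the following fits a Gaussian q to the conjugate Normal-mean posterior, where the exact answer is available for comparison. All hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(1.0, 1.0, 50)     # x_i ~ N(mu, 1); infer mu
tau2, sigma2 = 100.0, 1.0           # prior mu ~ N(0, tau2); known noise variance

# Fit q(mu) = N(m, s^2) by stochastic gradient ascent on the ELBO,
# using the reparameterization mu = m + s * eps, eps ~ N(0, 1).
m, log_s, lr = 0.0, 0.0, 0.01
for _ in range(2000):
    s = np.exp(log_s)
    eps = rng.normal(size=32)
    mu = m + s * eps
    # d log p(D, mu) / d mu, evaluated at each sample
    dlp = (data.sum() - len(data) * mu) / sigma2 - mu / tau2
    m += lr * dlp.mean()                            # ELBO gradient wrt m
    log_s += lr * ((dlp * s * eps).mean() + 1.0)    # +1.0 from the entropy term

# Exact conjugate posterior for comparison
prec = len(data) / sigma2 + 1 / tau2
exact_mean, exact_sd = data.sum() / sigma2 / prec, prec ** -0.5
```

In this conjugate case the Gaussian family contains the true posterior, so q should converge to it; in non-conjugate models the same machinery yields the closest Gaussian in KL.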
6. Bayesian vs. Frequentist in ML
Frequentist methods give point estimates, confidence intervals, and p-values; Bayesian methods give full posterior distributions and credible intervals.
Bayesian advantages: Incorporate priors, full uncertainty.
Challenges: Computational cost, prior sensitivity.
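The contrast can be made concrete on the coin example: a Wald confidence interval versus an equal-tailed credible interval (SciPy assumed; the Wald interval is one of several frequentist choices):

```python
import numpy as np
from scipy import stats

heads, flips = 3, 5

# Frequentist: MLE and a 95% Wald confidence interval
mle = heads / flips
se = np.sqrt(mle * (1 - mle) / flips)
wald_ci = (mle - 1.96 * se, mle + 1.96 * se)

# Bayesian: Beta(1,1) prior -> Beta(4,3) posterior; 95% equal-tailed credible interval
posterior = stats.beta(1 + heads, 1 + flips - heads)
credible = posterior.interval(0.95)
```

The credible interval is a direct probability statement about θ given the data; the confidence interval is a statement about the procedure over repeated sampling. With n = 5 the Wald interval can also extend outside [0, 1], while the Beta credible interval cannot.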
7. Applications in Machine Learning
- Bayesian Neural Nets: priors on weights, posteriors for predictive uncertainty.
- Gaussian Processes: Bayesian non-parametric regression.
- Variational Autoencoders: VI over latent variables.
- RL: Thompson sampling draws actions from posteriors.
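A small Thompson-sampling sketch for Bernoulli bandits, assuming NumPy; the arm reward rates are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
true_rates = np.array([0.3, 0.5, 0.7])    # hypothetical Bernoulli arm reward probabilities
wins = np.ones(3)                          # Beta(1,1) prior per arm
losses = np.ones(3)
pulls = np.zeros(3, dtype=int)

for _ in range(2000):
    theta = rng.beta(wins, losses)         # one posterior draw per arm
    arm = int(np.argmax(theta))            # pull the arm with the best draw
    reward = rng.random() < true_rates[arm]
    wins[arm] += reward                    # conjugate Beta update
    losses[arm] += 1 - reward
    pulls[arm] += 1

print(pulls)  # the best arm (index 2) should dominate
```

Posterior sampling balances exploration and exploitation automatically: uncertain arms occasionally produce high draws and get tried, while the posterior of the best arm concentrates.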
Challenges
- Scalability: MCMC slow for big models.
8. Numerical Bayesian Inference
Sample from posteriors and compute summary statistics such as means.
::: code-group
```python [Python]
import numpy as np
import pymc as pm
import arviz as az

# Conjugate: Beta-Bernoulli (3 heads, 2 tails in 5 flips)
k, n = 3, 5
with pm.Model() as model:
    p = pm.Beta('p', alpha=1, beta=1)
    obs = pm.Bernoulli('obs', p=p, observed=np.array([1, 1, 1, 0, 0]))
    trace = pm.sample(1000)

print("Posterior mean p:", trace.posterior['p'].mean())

# MCMC for a normal mean
data = np.random.normal(0, 1, 100)
with pm.Model() as norm_model:
    mu = pm.Normal('mu', mu=0, sigma=10)
    sigma = pm.HalfNormal('sigma', sigma=10)
    obs = pm.Normal('obs', mu=mu, sigma=sigma, observed=data)
    trace_norm = pm.sample(1000)

print("Posterior mean μ:", trace_norm.posterior['mu'].mean())

# ML: Bayesian linear regression
X = np.array([[1, 1], [1, 2], [1, 3]])
y = np.array([2, 3, 4])
with pm.Model() as blr:
    beta = pm.Normal('beta', mu=0, sigma=10, shape=2)
    sigma = pm.HalfNormal('sigma', sigma=10)
    mu = pm.math.dot(X, beta)
    obs = pm.Normal('obs', mu=mu, sigma=sigma, observed=y)
    trace_blr = pm.sample(1000)

print("Posterior beta mean:", trace_blr.posterior['beta'].mean(dim=("chain", "draw")))
```

```rust [Rust]
use rand::Rng;

// Metropolis sampler for the Beta-Bernoulli posterior (independent uniform proposal,
// which is symmetric, so the proposal ratio cancels).
fn metropolis_bern(k: f64, n: f64, alpha: f64, beta: f64, steps: usize) -> f64 {
    let mut rng = rand::thread_rng();
    let mut p: f64 = 0.5;
    let mut sum = 0.0;
    for _ in 0..steps {
        let p_prop: f64 = rng.gen_range(0.0..1.0);
        // Log unnormalized posterior: (k+α-1) ln p + (n-k+β-1) ln(1-p)
        let post_curr = (k + alpha - 1.0) * p.ln() + (n - k + beta - 1.0) * (1.0 - p).ln();
        let post_prop = (k + alpha - 1.0) * p_prop.ln() + (n - k + beta - 1.0) * (1.0 - p_prop).ln();
        let accept = (post_prop - post_curr).exp().min(1.0);
        if rng.gen::<f64>() < accept {
            p = p_prop;
        }
        sum += p;
    }
    sum / steps as f64
}

fn main() {
    let (k, n, alpha, beta) = (3.0, 5.0, 1.0, 1.0);
    println!("Posterior mean p (MCMC): {}", metropolis_bern(k, n, alpha, beta, 10_000));

    // Normal-mean MCMC with a N(0, 10) prior follows the same pattern; omitted for brevity.
}
```
:::
The code implements conjugate sampling, MCMC posterior estimation, and Bayesian linear regression.
9. Symbolic Bayesian with SymPy
Conjugate models admit exact symbolic posteriors.
::: code-group
```python [Python]
from sympy import symbols, integrate

p, k, n, alpha, beta_sym = symbols('p k n alpha beta', positive=True)

# Unnormalized Beta posterior kernel
post = p**(k + alpha - 1) * (1 - p)**(n - k + beta_sym - 1)
norm = integrate(post, (p, 0, 1))
mean = integrate(p * post, (p, 0, 1)) / norm
print("Posterior mean:", mean)
```

```rust [Rust]
fn main() {
    // Closed form for the Beta posterior mean
    println!("Posterior mean: (k + alpha)/(n + alpha + beta)");
}
```
:::
10. Challenges in Bayesian ML
- Computation: MCMC and VI are only approximate and can be expensive.
- Prior Sensitivity: results depend on a subjective prior choice.
- Scalability: high-dimensional posteriors are hard to explore.
11. Key ML Takeaways
- Bayesian inference updates beliefs: prior → posterior via Bayes’ theorem.
- Conjugate priors give closed-form posteriors.
- MCMC and VI approximate intractable posteriors in complex models.
- Posteriors quantify uncertainty for downstream decisions.
- The code above implements each inference approach numerically and symbolically.
Bayesian inference empowers ML under uncertainty.
12. Summary
We explored Bayesian inference from intuition through conjugate priors to approximate methods (MCMC, VI), with ML applications. Worked examples and Python/Rust code bridge theory to practice, concluding the series with a solid probabilistic ML foundation.
Further Reading
- Gelman et al., Bayesian Data Analysis.
- Murphy, Probabilistic Machine Learning (Ch. 7–9).
- McElreath, Statistical Rethinking.
- Rust: the ‘rand’ crate for sampling, custom MCMC implementations.