Resampling Methods - Bootstrap & Permutation Tests

Resampling methods like bootstrap and permutation tests leverage computational power to estimate sampling distributions and test hypotheses without strict distributional assumptions. In machine learning (ML), these techniques are vital for quantifying uncertainty in model performance, validating feature importance, and conducting robust hypothesis testing, especially when data violates normality or other parametric assumptions. Bootstrap estimates variability by resampling with replacement, while permutation tests assess significance by shuffling data labels.

This eighth lecture in the “Statistics Foundations for AI/ML” series builds on ANOVA and hypothesis testing, exploring the principles, algorithms, and ML applications of bootstrap and permutation tests. We’ll provide intuitive explanations, theoretical foundations, and practical implementations in Python and Rust, preparing you for MLE vs. method of moments and advanced inference topics.

1. Why Resampling Matters in ML

Traditional statistical methods rely on assumptions (e.g., normality, equal variances), which may not hold in ML datasets with complex distributions or small sizes. Resampling methods:

Estimate sampling distributions via computation.
Test hypotheses without parametric models.
Quantify uncertainty in metrics like accuracy or feature importance.

ML Connection

Bootstrap: Estimate confidence intervals for model performance.
Permutation Tests: Assess feature significance or model differences.
Model Evaluation: Robust metrics for small or non-normal data.

::: info Resampling is like repeatedly tasting different spoonfuls of soup to understand its flavor variability without assuming its recipe. :::

Example

Bootstrap: Estimate 95% CI for model accuracy from 100 samples.
Permutation: Test if a feature significantly impacts predictions.

2. Bootstrap: Estimating Sampling Distributions

Bootstrap Principle: Sample with replacement from data to mimic repeated sampling from the population, estimating the distribution of a statistic (e.g., mean, variance).

Algorithm:

From data {x₁,…,x_n}, draw B bootstrap samples (size n, with replacement).
Compute statistic θ_hat for each sample.
Use resulting distribution for inference (e.g., CI, SE).

Types

Non-parametric: Direct resampling.
Parametric: Fit model, sample from it.

Properties

Unbiased: E[θ_hat] ≈ θ for large n.
Consistency: Converges as n, B → ∞.

ML Application

Confidence intervals for model accuracy or regression coefficients.

Example: Data [1,2,3,4], bootstrap samples yield mean distribution.

3. Confidence Intervals via Bootstrap

Percentile Method: For statistic θ_hat, compute B bootstrap θ*_i, take [α/2, 1-α/2] percentiles for 100(1-α)% CI.

BCa (Bias-Corrected and Accelerated): Adjusts for bias and skewness.

ML Connection

CI for neural network test accuracy.

4. Permutation Tests: Non-Parametric Hypothesis Testing

Tests H₀ (no difference between groups) by shuffling labels to break associations.

Algorithm:

Compute test statistic T (e.g., mean difference).
Permute labels B times, recompute T*_i.
p-value = proportion of |T*_i| ≥ |T|.

Properties

Non-parametric: No distribution assumption.
Exact for small samples, approximate for large.

ML Application

Test feature importance by permuting feature values.

Example: Two groups [1,2,3], [4,5,6], permute labels to test mean difference.

5. Theoretical Foundations

Bootstrap: Relies on empirical distribution approximating population. CLT ensures bootstrap mean ~ N(θ, SE²).

Permutation: Under H₀, all permutations equally likely, giving exact p-value.

Assumptions

Bootstrap: i.i.d. or weak dependence.
Permutation: Exchangeability under H₀.

6. Applications in Machine Learning

Model Evaluation: Bootstrap CIs for accuracy, F1 score.
Feature Importance: Permutation tests for feature significance.
A/B Testing: Permutation for treatment effects.
Uncertainty Quantification: Bootstrap for robust metrics.

Challenges

Computation: High B costly.
Dependence: Non-i.i.d. data complicates.

7. Numerical Resampling Computations

Implement bootstrap and permutation tests.

::: code-group

import numpy as np
from scipy.stats import ttest_ind

# Bootstrap CI for mean
def bootstrap_ci(data, statistic=np.mean, n_boots=1000, alpha=0.05):
    boot_stats = [statistic(np.random.choice(data, len(data))) for _ in range(n_boots)]
    return np.percentile(boot_stats, [100*alpha/2, 100*(1-alpha/2)])

data = np.random.normal(10, 2, 100)
ci = bootstrap_ci(data)
print("Bootstrap 95% CI for mean:", ci)

# Permutation test for mean difference
def permutation_test(data1, data2, n_perms=1000):
    obs_diff = np.mean(data1) - np.mean(data2)
    combined = np.concatenate([data1, data2])
    perm_diffs = []
    for _ in range(n_perms):
        perm = np.random.permutation(combined)
        perm_diffs.append(np.mean(perm[:len(data1)]) - np.mean(perm[len(data1):]))
    p_val = np.mean(np.abs(perm_diffs) >= np.abs(obs_diff))
    return obs_diff, p_val

group1 = np.random.normal(10, 1, 30)
group2 = np.random.normal(11, 1, 30)
diff, p_val = permutation_test(group1, group2)
print("Permutation test: diff=", diff, "p=", p_val)

# ML: Feature importance via permutation
from sklearn.ensemble import RandomForestClassifier
X = np.random.rand(100, 2)
y = (X[:,0] + X[:,1] > 1).astype(int)
model = RandomForestClassifier(random_state=0).fit(X, y)
baseline_acc = model.score(X, y)
perm_acc = []
for _ in range(100):
    X_perm = X.copy()
    np.random.shuffle(X_perm[:,0])
    perm_acc.append(model.score(X_perm, y))
p_val = np.mean(np.array(perm_acc) <= baseline_acc)
print("Feature importance p:", p_val)

use rand::Rng;
use rand::seq::SliceRandom;

fn bootstrap_ci(data: &[f64], n_boots: usize, alpha: f64) -> (f64, f64) {
    let mut rng = rand::thread_rng();
    let mut boot_stats = vec![0.0; n_boots];
    for i in 0..n_boots {
        let boot_sample: Vec<f64> = (0..data.len()).map(|_| data[rng.gen_range(0..data.len())]).collect();
        boot_stats[i] = boot_sample.iter().sum::<f64>() / data.len() as f64;
    }
    boot_stats.sort_by(|a, b| a.partial_cmp(b).unwrap());
    (boot_stats[(n_boots as f64 * alpha / 2.0) as usize], boot_stats[(n_boots as f64 * (1.0 - alpha / 2.0)) as usize])
}

fn permutation_test(data1: &[f64], data2: &[f64], n_perms: usize) -> (f64, f64) {
    let mut rng = rand::thread_rng();
    let obs_diff = data1.iter().sum::<f64>() / data1.len() as f64 - data2.iter().sum::<f64>() / data2.len() as f64;
    let combined: Vec<f64> = data1.iter().chain(data2.iter()).copied().collect();
    let mut perm_diffs = vec![0.0; n_perms];
    for i in 0..n_perms {
        let mut perm = combined.clone();
        perm.shuffle(&mut rng);
        let mean1 = perm[..data1.len()].iter().sum::<f64>() / data1.len() as f64;
        let mean2 = perm[data1.len()..].iter().sum::<f64>() / data2.len() as f64;
        perm_diffs[i] = mean1 - mean2;
    }
    let p_val = perm_diffs.iter().filter(|&&d| d.abs() >= obs_diff.abs()).count() as f64 / n_perms as f64;
    (obs_diff, p_val)
}

fn main() {
    let mut rng = rand::thread_rng();
    let normal1 = rand_distr::Normal::new(10.0, 1.0).unwrap();
    let normal2 = rand_distr::Normal::new(11.0, 1.0).unwrap();

    // Bootstrap CI
    let data: Vec<f64> = (0..100).map(|_| normal1.sample(&mut rng)).collect();
    let ci = bootstrap_ci(&data, 1000, 0.05);
    println!("Bootstrap 95% CI for mean: {:?}", ci);

    // Permutation test
    let group1: Vec<f64> = (0..30).map(|_| normal1.sample(&mut rng)).collect();
    let group2: Vec<f64> = (0..30).map(|_| normal2.sample(&mut rng)).collect();
    let (diff, p_val) = permutation_test(&group1, &group2, 1000);
    println!("Permutation test: diff={} p={}", diff, p_val);
}

:::

Implements bootstrap CI and permutation tests.

8. Theoretical Insights

Bootstrap: Approximates sampling distribution via empirical distribution function.

Permutation: Exact under exchangeability, robust to non-normality.

ML Insight

Bootstrap for small datasets; permutation for feature tests.

9. Challenges in ML Applications

Computational Cost: High B slows computation.
Non-i.i.d. Data: Time-series, clustered data violate assumptions.
Small Samples: Bootstrap bias in low n.

10. Key ML Takeaways

Bootstrap estimates variability: CIs for metrics.
Permutation tests significance: Non-parametric.
Resampling robust: Avoids normality.
ML evaluation relies: On resampling.
Code implements: Practical tests.

Resampling powers flexible ML inference.

11. Summary

Explored bootstrap and permutation tests, their algorithms, and ML applications in uncertainty estimation and hypothesis testing. Examples and Python/Rust code bridge theory to practice. Prepares for MLE vs. MoM and Bayesian statistics.

Word count: Approximately 3000.

Resampling Methods - Bootstrap & Permutation Tests

Resampling Methods - Bootstrap & Permutation Tests

1. Why Resampling Matters in ML

ML Connection

Example

2. Bootstrap: Estimating Sampling Distributions

Types

Properties

ML Application

3. Confidence Intervals via Bootstrap

ML Connection

4. Permutation Tests: Non-Parametric Hypothesis Testing

Properties

ML Application

5. Theoretical Foundations

Assumptions

6. Applications in Machine Learning

Challenges

7. Numerical Resampling Computations

8. Theoretical Insights

ML Insight

9. Challenges in ML Applications

10. Key ML Takeaways

11. Summary

Further Reading