ANOVA & Comparing Multiple Groups
Analysis of Variance (ANOVA) is a statistical method for comparing means across multiple groups to determine if differences are significant, extending t-tests to more than two groups. In machine learning (ML), ANOVA helps compare model performances, analyze feature importance, and assess treatment effects in experiments, ensuring robust insights into group differences.
This seventh lecture in the “Statistics Foundations for AI/ML” series builds on hypothesis testing, exploring one-way and two-way ANOVA, their assumptions, mathematical derivations, post-hoc tests, and ML applications. We’ll provide intuitive explanations, rigorous formulations, and practical implementations in Python and Rust, preparing you for resampling methods and advanced inference.
1. Why ANOVA Matters in ML
When comparing more than two groups (e.g., multiple ML models, feature categories), multiple t-tests inflate Type I errors. ANOVA tests if at least one group mean differs, controlling error rates.
Applications:
- Compare accuracies of multiple algorithms.
- Analyze feature effects across groups (e.g., age groups in prediction).
ML Connection
- Model Selection: Identify best-performing models.
- Feature Analysis: Assess categorical feature impacts.
- A/B Testing: Compare multiple treatments.
::: info
ANOVA is like a referee deciding if teams’ scores differ significantly, avoiding repeated pairwise checks.
:::
Example
- Test accuracies of three ML models: ANOVA checks if differences are significant.
2. One-Way ANOVA: Comparing Multiple Means
One-way ANOVA tests whether k group means are equal.
H₀: μ₁ = μ₂ = … = μ_k.
H₁: At least one μ_i differs.
Model: y_{ij} = μ + α_i + ε_{ij}, with ε_{ij} ~ N(0, σ²).
Here μ is the overall mean, α_i the effect of group i, and ε_{ij} the error term.
Test Statistic
F = (Between-group variance) / (Within-group variance).
\[ F = \frac{\text{SSB}/(k-1)}{\text{SSW}/(N-k)} \]
SSB (sum of squares between): \( \text{SSB} = \sum_{i=1}^{k} n_i (\bar{y}_i - \bar{y})^2 \).
SSW (sum of squares within): \( \text{SSW} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \).
N total samples, k groups, n_i samples in group i.
F ~ F_{k-1,N-k} under H₀.
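To make these formulas concrete, here is a minimal sketch (with made-up accuracy scores for illustration) that computes SSB, SSW, and F directly from the definitions above and checks the result against `scipy.stats.f_oneway`.

```python
import numpy as np
from scipy.stats import f_oneway, f

# Three illustrative groups (hypothetical accuracy scores)
groups = [np.array([0.82, 0.85, 0.88, 0.84]),
          np.array([0.90, 0.91, 0.87, 0.93]),
          np.array([0.86, 0.84, 0.89, 0.85])]

k = len(groups)
N = sum(len(g) for g in groups)
overall_mean = np.mean(np.concatenate(groups))

# SSB = sum_i n_i (ybar_i - ybar)^2, SSW = sum_i sum_j (y_ij - ybar_i)^2
ssb = sum(len(g) * (g.mean() - overall_mean) ** 2 for g in groups)
ssw = sum(((g - g.mean()) ** 2).sum() for g in groups)

F = (ssb / (k - 1)) / (ssw / (N - k))
p = f.sf(F, k - 1, N - k)  # upper-tail probability of the F distribution
print("Manual:", F, p)
print("scipy :", f_oneway(*groups))  # should match the manual computation
```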
ML Application
- Compare model accuracies across k algorithms.
3. Two-Way ANOVA: Multiple Factors
Tests main effects and interactions of two factors (e.g., algorithm type, dataset size).
Model: y_{ijk} = μ + α_i + β_j + (αβ)_{ij} + ε_{ijk}.
α_i and β_j are the main effects, (αβ)_{ij} the interaction, and ε_{ijk} the error term.
Test Statistics
F-tests for each effect:
- Main effect A: F_A = MS_A / MS_E, where MS_A = SS_A / (a - 1).
- Main effect B: F_B = MS_B / MS_E, where MS_B = SS_B / (b - 1).
- Interaction: F_{AB} = MS_{AB} / MS_E, where MS_{AB} = SS_{AB} / ((a - 1)(b - 1)).
Here MS_E is the error (within-cell) mean square, with a and b the numbers of levels of factors A and B.
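A sketch of how these F-tests can be obtained in practice with statsmodels’ formula API; the factor names (`algorithm`, `dataset_size`) and the simulated responses are illustrative assumptions, not data from the lecture.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(0)

# Hypothetical experiment: accuracy by algorithm and dataset size (both categorical)
df = pd.DataFrame({
    "algorithm": np.repeat(["A", "B", "C"], 20),
    "dataset_size": np.tile(np.repeat(["small", "large"], 10), 3),
})
df["accuracy"] = (0.8
                  + rng.normal(0, 0.05, len(df))
                  + np.where(df["algorithm"] == "C", 0.05, 0.0)
                  + np.where(df["dataset_size"] == "large", 0.03, 0.0))

# Two-way ANOVA with interaction: F-tests for both main effects and the interaction
model = ols("accuracy ~ C(algorithm) * C(dataset_size)", data=df).fit()
print(anova_lm(model, typ=2))
```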
ML Connection
- Analyze feature interactions (e.g., age group and income bracket).
4. Assumptions of ANOVA
- Normality: Errors ~ N(0,σ²).
- Homogeneity of Variance: Equal σ² across groups.
- Independence: Observations independent.
Violations: Use non-parametric (Kruskal-Wallis) or robust methods.
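A minimal sketch of checking these assumptions in Python, using the Shapiro-Wilk test for normality of residuals and Levene’s test for homogeneity of variance; the simulated groups are illustrative.

```python
import numpy as np
from scipy.stats import shapiro, levene

rng = np.random.default_rng(1)
groups = [rng.normal(10, 1, 30), rng.normal(11, 1, 30), rng.normal(10.5, 1, 30)]

# Normality: Shapiro-Wilk on the residuals (deviations from each group mean)
residuals = np.concatenate([g - g.mean() for g in groups])
print("Shapiro-Wilk:", shapiro(residuals))

# Homogeneity of variance: Levene's test across groups
print("Levene:", levene(*groups))
```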
5. Post-Hoc Tests
If ANOVA rejects H₀, identify which groups differ:
- Tukey’s HSD: Pairwise comparisons.
- Bonferroni: Adjusts p-values for multiple tests.
In ML: Pinpoint best model or feature group.
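Tukey’s HSD appears in the code of Section 8; below is a sketch of the Bonferroni route, using pairwise t-tests with `multipletests` from statsmodels for the adjustment. The group names and data are illustrative.

```python
import numpy as np
from itertools import combinations
from scipy.stats import ttest_ind
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(2)
groups = {"model_a": rng.normal(0.85, 0.05, 30),
          "model_b": rng.normal(0.87, 0.05, 30),
          "model_c": rng.normal(0.90, 0.05, 30)}

# Raw pairwise p-values, then Bonferroni adjustment
pairs = list(combinations(groups, 2))
raw_p = [ttest_ind(groups[a], groups[b]).pvalue for a, b in pairs]
reject, adj_p, _, _ = multipletests(raw_p, alpha=0.05, method="bonferroni")

for (a, b), p, r in zip(pairs, adj_p, reject):
    print(f"{a} vs {b}: adjusted p = {p:.4f}, reject H0 = {r}")
```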
6. Derivations and F-Distribution
F = MSB/MSW, MSB = SSB/(k-1), MSW = SSW/(N-k).
Under H₀, F follows F-distribution with k-1, N-k df.
Under H₀, F is the ratio of two independent scaled chi-square variables, SSB/σ² ~ χ²_{k-1} and SSW/σ² ~ χ²_{N-k}, each divided by its degrees of freedom.
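A quick simulation illustrates this: under H₀, F-statistics computed from identically distributed groups should follow F_{k-1, N-k}. The sketch below (simulation sizes chosen for illustration) checks the empirical tail probability against the theoretical 95th percentile.

```python
import numpy as np
from scipy.stats import f_oneway, f

rng = np.random.default_rng(3)
k, n_per_group, n_sims = 3, 20, 5000

# Simulate F-statistics under H0: every group drawn from the same normal
f_stats = [f_oneway(*[rng.normal(0, 1, n_per_group) for _ in range(k)])[0]
           for _ in range(n_sims)]

df1, df2 = k - 1, k * n_per_group - k
crit = f.ppf(0.95, df1, df2)                   # theoretical 95th percentile
empirical = np.mean(np.array(f_stats) > crit)  # should be close to 0.05
print(f"P(F > F_0.95) ≈ {empirical:.3f} (expected ~0.05)")
```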
7. Applications in Machine Learning
- Model Comparison: ANOVA for k model accuracies.
- Feature Selection: Test categorical feature effects (see the sketch after this list).
- Hyperparameter Tuning: Compare configurations.
- Experimentation: Multi-treatment A/B tests.
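As a sketch of the feature-selection use case, scikit-learn’s `f_classif` applies a one-way ANOVA F-test to each feature against the class labels; the synthetic dataset below is an illustrative assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, SelectKBest

# Synthetic classification data with a few informative features
X, y = make_classification(n_samples=200, n_features=8, n_informative=3, random_state=0)

# One-way ANOVA F-test per feature against the class labels
f_scores, p_values = f_classif(X, y)
print("F per feature:", np.round(f_scores, 2))
print("p per feature:", np.round(p_values, 4))

# Keep the top 3 features by F-score
X_selected = SelectKBest(f_classif, k=3).fit_transform(X, y)
print("Selected shape:", X_selected.shape)
```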
Challenges
- Multiple Testing: Adjust with Bonferroni.
- Non-Normality: Use transformations or non-parametric tests.
8. Numerical ANOVA Computations
Perform one-way ANOVA and post-hoc Tukey comparisons.
::: code-group
```python
import numpy as np
from scipy.stats import f_oneway
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# One-way ANOVA
group1 = np.random.normal(10, 1, 30)
group2 = np.random.normal(11, 1, 30)
group3 = np.random.normal(10.5, 1, 30)
f_stat, p_val = f_oneway(group1, group2, group3)
print("One-way ANOVA: F=", f_stat, "p=", p_val)

# Post-hoc: Tukey's HSD
data = np.concatenate([group1, group2, group3])
groups = np.array([1]*30 + [2]*30 + [3]*30)
tukey = pairwise_tukeyhsd(data, groups)
print("Tukey HSD:", tukey)

# ML: Model comparison
models = [np.random.normal(0.85, 0.1, 50),
          np.random.normal(0.87, 0.1, 50),
          np.random.normal(0.90, 0.1, 50)]
f_stat, p_val = f_oneway(*models)
print("Model ANOVA: F=", f_stat, "p=", p_val)
```

```rust
use rand_distr::{Distribution, Normal};

fn anova_one_way(groups: &[Vec<f64>]) -> (f64, f64) {
    let k = groups.len() as f64;
    let n = groups.iter().map(|g| g.len() as f64).sum::<f64>();
    let overall_mean = groups.iter().flat_map(|g| g.iter()).sum::<f64>() / n;
    // Between-group sum of squares: SSB = sum_i n_i (ybar_i - ybar)^2
    let ssb = groups.iter().map(|g| {
        let mean = g.iter().sum::<f64>() / g.len() as f64;
        g.len() as f64 * (mean - overall_mean).powi(2)
    }).sum::<f64>();
    // Within-group sum of squares: SSW = sum_i sum_j (y_ij - ybar_i)^2
    let ssw = groups.iter().map(|g| {
        let mean = g.iter().sum::<f64>() / g.len() as f64;
        g.iter().map(|&x| (x - mean).powi(2)).sum::<f64>()
    }).sum::<f64>();
    let msb = ssb / (k - 1.0);
    let msw = ssw / (n - k);
    let f = msb / msw;
    (f, 0.0) // p-value requires the F-distribution CDF (e.g., the statrs crate)
}

fn main() {
    let mut rng = rand::thread_rng();
    let normal1 = Normal::new(10.0, 1.0).unwrap();
    let normal2 = Normal::new(11.0, 1.0).unwrap();
    let normal3 = Normal::new(10.5, 1.0).unwrap();
    let group1: Vec<f64> = (0..30).map(|_| normal1.sample(&mut rng)).collect();
    let group2: Vec<f64> = (0..30).map(|_| normal2.sample(&mut rng)).collect();
    let group3: Vec<f64> = (0..30).map(|_| normal3.sample(&mut rng)).collect();
    let (f, p) = anova_one_way(&[group1, group2, group3]);
    println!("One-way ANOVA: F={} p={}", f, p);

    // ML: Model comparison
    let model_dists = [Normal::new(0.85, 0.1).unwrap(),
                       Normal::new(0.87, 0.1).unwrap(),
                       Normal::new(0.90, 0.1).unwrap()];
    let models: Vec<Vec<f64>> = model_dists.iter()
        .map(|d| (0..50).map(|_| d.sample(&mut rng)).collect())
        .collect();
    let (f, p) = anova_one_way(&models);
    println!("Model ANOVA: F={} p={}", f, p);
}
```
:::
The code performs one-way ANOVA and Tukey’s HSD post-hoc comparisons.
9. Symbolic Derivations with SymPy
Derive the F-statistic symbolically.
::: code-group
```python
from sympy import symbols, Sum, IndexedBase

# Dimension and index symbols: k groups, n total samples, n_i samples in group i
k, n = symbols('k n', integer=True, positive=True)
n_i = symbols('n_i', integer=True, positive=True)
i, j = symbols('i j', integer=True, positive=True)

# Observations y[i, j], group means y_bar[i], overall mean
y, y_bar = IndexedBase('y'), IndexedBase('y_bar')
overall_mean = symbols('y_bar_bar')

# Sums of squares between and within groups
ssb = Sum(n_i * (y_bar[i] - overall_mean)**2, (i, 1, k))
ssw = Sum((y[i, j] - y_bar[i])**2, (j, 1, n_i), (i, 1, k))

# Mean squares and the F-statistic
msb = ssb / (k - 1)
msw = ssw / (n - k)
F = msb / msw
print("F-statistic:", F)
```

```rust
fn main() {
    // No symbolic algebra here; print the derived F-statistic formula.
    println!(
        "F-statistic: [sum n_i (y_bar_i - y_bar)^2 / (k-1)] / [sum (y_ij - y_bar_i)^2 / (n-k)]"
    );
}
```
:::
10. Challenges in ML Applications
- Non-Normality: Use the Kruskal-Wallis test (see the sketch after this list).
- Unequal Variances: Welch’s ANOVA.
- Multiple Testing: Adjust p-values.
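A minimal sketch of the non-parametric fallback, using scipy’s Kruskal-Wallis H-test on skewed (log-normal) groups where the normality assumption is doubtful; the data are illustrative.

```python
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(4)

# Skewed groups where a normal-errors assumption is questionable
group1 = rng.lognormal(mean=0.0, sigma=0.5, size=30)
group2 = rng.lognormal(mean=0.2, sigma=0.5, size=30)
group3 = rng.lognormal(mean=0.1, sigma=0.5, size=30)

h_stat, p_val = kruskal(group1, group2, group3)
print("Kruskal-Wallis: H =", round(h_stat, 3), "p =", round(p_val, 4))
```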
11. Key ML Takeaways
- ANOVA compares groups: Multiple models/features.
- F-statistic tests means: Significant differences.
- Post-hoc pinpoints: Specific group diffs.
- Assumptions critical: Normality, variance.
- Code performs tests: Practical ANOVA.
ANOVA drives multi-group ML analysis.
12. Summary
This lecture explored one-way and two-way ANOVA, their assumptions and derivations, and ML applications such as model comparison. Examples and Python/Rust code bridge theory to practice, preparing you for resampling methods and advanced inference.
Further Reading
- Wasserman, All of Statistics (Ch. 11).
- James, Introduction to Statistical Learning (Ch. 3).
- Montgomery, Design and Analysis of Experiments.
- Rust: ‘statrs’ for stats, ‘nalgebra’ for matrices.