Statistics
Statistics provides tools to analyze data and evaluate machine learning (ML) models. This section introduces descriptive statistics, hypothesis testing, and confidence intervals, with a Rust lab using the statrs crate.
Descriptive Statistics
Descriptive statistics summarize data through measures like:
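- Mean: Average value, $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$.
- Variance: Spread of data, $s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$ (sample form).
- Standard Deviation: Square root of the variance, $s = \sqrt{s^2}$.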
In ML, these describe datasets and model performance.
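The three measures above can be sketched in plain Rust with no external crates. This is a minimal illustration, not the statrs API: the function names and the `errors` values are hypothetical, and the variance uses the sample form (dividing by n - 1), which is an assumption about which variant is intended.

```rust
/// Arithmetic mean; returns None for an empty slice.
fn mean(xs: &[f64]) -> Option<f64> {
    if xs.is_empty() {
        return None;
    }
    Some(xs.iter().sum::<f64>() / xs.len() as f64)
}

/// Sample variance (divides by n - 1); returns None for fewer than two values.
fn sample_variance(xs: &[f64]) -> Option<f64> {
    if xs.len() < 2 {
        return None;
    }
    let m = mean(xs)?;
    let sum_sq: f64 = xs.iter().map(|x| (x - m).powi(2)).sum();
    Some(sum_sq / (xs.len() as f64 - 1.0))
}

/// Sample standard deviation: square root of the sample variance.
fn sample_std_dev(xs: &[f64]) -> Option<f64> {
    sample_variance(xs).map(f64::sqrt)
}

fn main() {
    // Hypothetical per-example errors of a model (illustrative values only).
    let errors = [0.5, 1.5, 1.0, 2.0, 0.0];

    println!("mean     = {:.3}", mean(&errors).unwrap());
    println!("variance = {:.3}", sample_variance(&errors).unwrap());
    println!("std dev  = {:.3}", sample_std_dev(&errors).unwrap());
}
```

Returning `Option` avoids dividing by zero on empty or single-element inputs, where the sample variance is undefined.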
Hypothesis Testing
Hypothesis testing assesses if observed data supports a hypothesis. A t-test compares means of two groups to determine if they differ significantly.
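Example: Test if a model’s predictions have a different mean error than a baseline. The t-statistic is:

$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$

where $\bar{x}_1, \bar{x}_2$ are the sample means, $s_1^2, s_2^2$ the sample variances, and $n_1, n_2$ the sample sizes of the two groups.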
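As a worked illustration with hypothetical numbers (not taken from the lab), suppose a model's errors have mean 0.30 and a baseline's errors have mean 0.25, each with sample variance 0.01 over 8 observations. Dividing the difference in means by the unpooled standard error, $\sqrt{s_1^2/n_1 + s_2^2/n_2}$, gives:

$$ t = \frac{0.30 - 0.25}{\sqrt{\frac{0.01}{8} + \frac{0.01}{8}}} = \frac{0.05}{\sqrt{0.0025}} = \frac{0.05}{0.05} = 1.0 $$

With $8 + 8 - 2 = 14$ degrees of freedom, the two-tailed 5% critical value is about 2.14, so a t-statistic of 1.0 would not indicate a significant difference for these hypothetical data.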
Confidence Intervals
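A confidence interval estimates a parameter’s range (e.g., mean) with a confidence level (e.g., 95%). For a mean:

$$ \bar{x} \pm t^{*} \cdot \frac{s}{\sqrt{n}} $$

where $\bar{x}$ is the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $t^{*}$ the critical value for the chosen confidence level (from Student's t-distribution with $n - 1$ degrees of freedom; close to 1.96 at 95% for large samples).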
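A confidence interval for a mean can be sketched in plain Rust as the sample mean plus or minus a critical value times the standard error. This is a minimal sketch under stated assumptions: the function name `mean_confidence_interval` is hypothetical, the critical value is passed in by the caller rather than computed, and the constant 2.776 is the tabulated 97.5th percentile of Student's t-distribution with 4 degrees of freedom (appropriate only for a 95% interval on 5 observations).

```rust
/// Two-sided confidence interval for a mean: mean ± t_crit * s / sqrt(n).
/// `t_crit` must match the confidence level and the degrees of freedom (n - 1).
/// Returns None for fewer than two observations.
fn mean_confidence_interval(xs: &[f64], t_crit: f64) -> Option<(f64, f64)> {
    if xs.len() < 2 {
        return None;
    }
    let n = xs.len() as f64;
    let mean = xs.iter().sum::<f64>() / n;
    // Sample variance (divides by n - 1).
    let var = xs.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    // Standard error of the mean, scaled by the critical value.
    let margin = t_crit * (var / n).sqrt();
    Some((mean - margin, mean + margin))
}

fn main() {
    // Illustrative sample of five observations.
    let sample = [2.1, 2.3, 2.0, 2.2, 2.4];

    // Assumption: table value of t for a 95% two-sided interval with df = 4.
    const T_CRIT_95_DF4: f64 = 2.776;

    let (lo, hi) = mean_confidence_interval(&sample, T_CRIT_95_DF4).unwrap();
    println!("95% CI for the mean: ({:.3}, {:.3})", lo, hi);
    // prints: 95% CI for the mean: (2.004, 2.396)
}
```

Passing the critical value as a parameter keeps the sketch free of external crates; for other sample sizes or confidence levels, the value would have to be looked up or computed separately.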
Lab: T-Test with statrs
You’ll perform a t-test on two synthetic datasets using statrs to compare their means.
Edit src/main.rs in your rust_ml_tutorial project:

```rust
use statrs::distribution::{ContinuousCDF, StudentsT}; // ContinuousCDF provides cdf()
use statrs::statistics::{Data, Distribution};

fn main() {
    // Synthetic datasets
    let data1 = vec![2.1, 2.3, 2.0, 2.2, 2.4]; // Group 1
    let data2 = vec![2.5, 2.7, 2.4, 2.6, 2.8]; // Group 2

    // Compute means
    let mean1 = Data::new(data1.clone()).mean().unwrap();
    let mean2 = Data::new(data2.clone()).mean().unwrap();
    println!("Mean1: {}, Mean2: {}", mean1, mean2);

    // Perform t-test (unpooled standard error; with equal group sizes
    // this equals the equal-variance statistic)
    let t_stat = t_test(&data1, &data2);
    println!("T-statistic: {}", t_stat);

    // Check p-value (two-tailed, df = n1 + n2 - 2 = 8)
    let t_dist = StudentsT::new(0.0, 1.0, 8.0).unwrap();
    let p_value = 2.0 * (1.0 - t_dist.cdf(t_stat.abs()));
    println!("P-value: {}", p_value);
}

/// Two-sample t-statistic: difference in means over its standard error.
fn t_test(data1: &[f64], data2: &[f64]) -> f64 {
    let n1 = data1.len() as f64;
    let n2 = data2.len() as f64;
    let mean1 = data1.iter().sum::<f64>() / n1;
    let mean2 = data2.iter().sum::<f64>() / n2;
    // Sample variances (divide by n - 1)
    let var1 = data1.iter().map(|x| (x - mean1).powi(2)).sum::<f64>() / (n1 - 1.0);
    let var2 = data2.iter().map(|x| (x - mean2).powi(2)).sum::<f64>() / (n2 - 1.0);
    let se = ((var1 / n1) + (var2 / n2)).sqrt();
    (mean1 - mean2) / se
}
```
Ensure Dependencies:
- Verify Cargo.toml includes:

```toml
[dependencies]
statrs = "0.16.0"
```

- Run cargo build.
Run the Program:

```bash
cargo run
```
Expected Output (approximate):

```
Mean1: 2.2, Mean2: 2.6
T-statistic: -4.0
P-value: 0.004
```
A low p-value (< 0.05) suggests the means differ significantly.
Understanding the Results
- Datasets: Two groups with slightly different means (~2.2 vs. ~2.6).
- T-Test: The t-statistic and p-value indicate a significant difference, relevant for ML model evaluation.
- ML Relevance: Hypothesis testing validates model performance (e.g., comparing error rates).
This lab prepares you for statistical ML methods.
Learning from Official Resources
Deepen Rust skills with:
- The Rust Programming Language (The Book): Free at doc.rust-lang.org/book.
- Programming Rust: By Blandy, Orendorff, and Tindall.
Next Steps
Proceed to Core Machine Learning for regression techniques, or revisit Probability.
Further Reading
- An Introduction to Statistical Learning by James et al. (Chapter 5)
- Andrew Ng’s Machine Learning Specialization (Course 1, Week 3)
- statrs Documentation: docs.rs/statrs