Independence & Correlation
Independence and correlation are pivotal concepts in probability that describe how random variables interact. Two variables are independent when knowing the value of one gives no information about the other, which lets joint probabilities factor into products of marginals. Correlation quantifies the strength and direction of the linear relationship between variables, which is crucial for understanding dependencies in data. In machine learning (ML), these concepts guide feature selection, model assumptions (e.g., Naive Bayes), and dimensionality reduction techniques like PCA, affecting both model performance and interpretability.
This fifth lecture in the “Probability Foundations for AI/ML” series builds on conditional probability and Bayes’ theorem, delving into independence, conditional independence, correlation, covariance, and their practical implications in ML. We’ll provide intuitive explanations, rigorous mathematical formulations, and implementations in Python and Rust, preparing you for advanced topics like the Law of Large Numbers and estimation techniques.
1. Intuition Behind Independence and Correlation
Independence: Two events or random variables are independent if knowing one provides no information about the other. For example, rolling two dice: the outcome of one doesn’t affect the other.
Correlation: Measures how much two variables move together linearly. Positive correlation means they increase together; negative means one increases as the other decreases; zero suggests no linear relationship.
ML Connection
- Independence: Naive Bayes assumes the features are conditionally independent given the class, which simplifies the likelihood.
- Correlation: Used in feature selection to avoid redundancy in models like regression or neural networks.
::: info
Independence is like two dancers moving without coordination; correlation tracks how synchronized their steps are.
:::
Everyday Example
- Independence: Weather in Tokyo vs. coin flip in London.
- Correlation: Height and weight of people (tend to increase together).
2. Independence of Events
Events A and B are independent if:
$$P(A \cap B) = P(A)\,P(B)$$
Equivalently, P(A|B) = P(A) whenever P(B) > 0.
Properties
- Pairwise independence does not imply mutual independence: three events can be independent in every pair yet violate P(A ∩ B ∩ C) = P(A)P(B)P(C).
- If A and B are independent, so are A and B^c, A^c and B, and A^c and B^c.
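A standard counterexample for the first property uses two fair coin tosses with A = "first toss is heads", B = "second toss is heads", and C = "the two tosses agree". The sketch below is a minimal illustration, assuming plain Python with exact arithmetic via `fractions.Fraction`; the helper names `prob` and `both` are hypothetical. It enumerates the sample space, checks pairwise versus mutual independence, and spot-checks the complement property.

```python
from fractions import Fraction
from itertools import product

# Sample space of two fair coin tosses; each of the 4 outcomes has probability 1/4.
omega = list(product("HT", repeat=2))
p_outcome = Fraction(1, len(omega))

def prob(event):
    """Exact probability of an event, given as a predicate on outcomes."""
    return sum((p_outcome for w in omega if event(w)), Fraction(0))

def both(e1, e2):
    """Intersection of two events (hypothetical helper)."""
    return lambda w: e1(w) and e2(w)

A = lambda w: w[0] == "H"   # first toss is heads
B = lambda w: w[1] == "H"   # second toss is heads
C = lambda w: w[0] == w[1]  # the two tosses agree
not_B = lambda w: not B(w)  # complement of B

pairwise = all(prob(both(e1, e2)) == prob(e1) * prob(e2)
               for e1, e2 in [(A, B), (A, C), (B, C)])
p_abc = prob(both(both(A, B), C))
mutual = p_abc == prob(A) * prob(B) * prob(C)

print("Pairwise independent:", pairwise)                                      # True
print("P(A∩B∩C) =", p_abc, " P(A)P(B)P(C) =", prob(A) * prob(B) * prob(C))   # 1/4 vs 1/8
print("Mutually independent:", mutual)                                        # False
print("A, B^c independent:", prob(both(A, not_B)) == prob(A) * prob(not_B))  # True
```

Every pair intersects only in the outcome HH (probability 1/4 = 1/2 · 1/2), but so does the triple, and 1/4 ≠ 1/8.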
ML Insight
- Independence lets joint probabilities factor into products of marginals, which simplifies probabilistic models.
Example: Two coin tosses, P(H1 ∩ H2) = P(H1)P(H2) = 0.5·0.5 = 0.25.
3. Independence of Random Variables
Random variables X and Y are independent if, for all sets A and B:
$$P(X \in A,\ Y \in B) = P(X \in A)\,P(Y \in B)$$
For discrete variables: P(X=x, Y=y) = P(X=x)P(Y=y) for all x, y.
For continuous variables: f(x,y) = f_X(x) f_Y(y), i.e., the joint density factorizes into the marginal densities.
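For discrete variables, the factorization condition can be checked directly on a joint probability table: independence holds exactly when the table equals the outer product of its row and column marginals. This is a minimal sketch, assuming NumPy; `is_independent` is a hypothetical helper name and the tolerance is an arbitrary choice.

```python
import numpy as np

def is_independent(joint, tol=1e-12):
    """True if a 2-D joint pmf table equals the outer product of its marginals."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)  # marginal of X (rows)
    p_y = joint.sum(axis=0)  # marginal of Y (columns)
    return np.allclose(joint, np.outer(p_x, p_y), atol=tol)

# Two independent Bern(0.5) variables: every cell is 0.25.
indep = [[0.25, 0.25],
         [0.25, 0.25]]
# Perfectly dependent: Y always equals X (same marginals, different joint).
dep = [[0.5, 0.0],
       [0.0, 0.5]]

print(is_independent(indep))  # True
print(is_independent(dep))    # False
```

Both tables have identical marginals, which shows that marginals alone cannot settle independence; the joint table is needed.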
Properties
- If X and Y are independent, E[XY] = E[X]E[Y].
- If X and Y are independent, Var(X+Y) = Var(X) + Var(Y) (uncorrelatedness alone is enough for this identity).
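Both properties can be checked by simulation. The sketch below assumes NumPy and picks two arbitrary independent distributions for illustration: X ~ N(1, 2²) and Y ~ Exponential with mean 3, so E[X]E[Y] = 3 and Var(X) + Var(Y) = 4 + 9 = 13.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility
n = 200_000

# Independent draws; the choice of distributions is arbitrary.
x = rng.normal(loc=1.0, scale=2.0, size=n)  # mean 1, variance 4
y = rng.exponential(scale=3.0, size=n)      # mean 3, variance 9

print("E[XY]         ≈", np.mean(x * y))
print("E[X]E[Y]      ≈", np.mean(x) * np.mean(y))   # theory: 1 * 3 = 3
print("Var(X+Y)      ≈", np.var(x + y))
print("Var(X)+Var(Y) ≈", np.var(x) + np.var(y))     # theory: 4 + 9 = 13
```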
ML Application
- Naive Bayes assumes the features are independent given the class.
Example: if X, Y ~ Bern(0.5) are independent, then P(X=1, Y=1) = 0.5 · 0.5 = 0.25.
4. Conditional Independence
X and Y are conditionally independent given Z if:
$$P(X \in A,\ Y \in B \mid Z) = P(X \in A \mid Z)\,P(Y \in B \mid Z)$$
Or, for densities, f(x,y|z) = f(x|z) f(y|z).
ML Connection
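- Bayesian networks: the conditional independencies encoded by the graph let the joint distribution factor into small local conditional distributions.
- Naive Bayes: Features independent given class label.
Example: in medical diagnosis, symptoms are often modeled as independent given the disease, even though they are correlated when the disease status is unknown.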
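The diagnosis example can be simulated to show that conditional independence does not imply marginal independence. This sketch assumes NumPy and uses purely hypothetical numbers: 10% disease prevalence, and each of two symptoms present with probability 0.8 if sick and 0.1 if healthy, drawn independently given the disease.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500_000

# Hypothetical model; all probabilities are illustrative only.
d = rng.random(n) < 0.10                 # disease indicator D
p_sym = np.where(d, 0.8, 0.1)            # P(S_i = 1 | D)
s1 = (rng.random(n) < p_sym).astype(float)
s2 = (rng.random(n) < p_sym).astype(float)  # drawn independently of s1 given D

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

print("corr(S1, S2)         ≈", corr(s1, s2))            # clearly positive
print("corr(S1, S2 | D = 1) ≈", corr(s1[d], s2[d]))      # ≈ 0
print("corr(S1, S2 | D = 0) ≈", corr(s1[~d], s2[~d]))    # ≈ 0
```

Marginally, observing one symptom raises the probability of disease and hence of the other symptom (theoretical correlation about 0.31 under these numbers); once D is fixed, that channel is cut.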
5. Covariance and Correlation: Definitions
Covariance:
$$\text{Cov}(X,Y) = E[(X - E[X])(Y - E[Y])] = E[XY] - E[X]E[Y]$$
Correlation Coefficient:
$$\rho(X,Y) = \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X)\,\text{Var}(Y)}}$$
ρ always lies in [-1, 1] (it is defined when both variances are positive); X and Y are called uncorrelated when ρ = 0.
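On a small dataset, both covariance formulas and the correlation coefficient can be computed by hand and compared with NumPy. This sketch assumes NumPy and uses made-up data; it uses the 1/n (population) convention throughout, which cancels in ρ. Note that `np.cov` defaults to the 1/(n-1) convention, while `np.var` defaults to 1/n.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 5.0, 7.0])
y = np.array([1.0, 3.0, 2.0, 5.0, 6.0])

# Two equivalent forms of the (1/n) covariance.
cov_def = np.mean((x - x.mean()) * (y - y.mean()))  # E[(X-E[X])(Y-E[Y])]
cov_short = np.mean(x * y) - x.mean() * y.mean()    # E[XY] - E[X]E[Y]

# Correlation: Cov / sqrt(Var(X) Var(Y)); np.var also uses 1/n by default.
rho = cov_def / np.sqrt(np.var(x) * np.var(y))

print(cov_def, cov_short)            # equal up to rounding (2.84)
print(rho, np.corrcoef(x, y)[0, 1])  # agrees with NumPy's correlation
```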
Properties
- Independence implies Cov(X,Y) = 0, but the converse does not hold.
- ρ = ±1 exactly when Y = aX + b almost surely for some a ≠ 0, with the sign of ρ matching the sign of a.
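A quick numerical check of the ±1 property, assuming NumPy: exact linear maps give ρ = ±1, while a monotone but nonlinear map such as x³ gives a strong correlation that stays below 1.

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 50)

def rho(a, b):
    return np.corrcoef(a, b)[0, 1]

print(rho(x, 2.0 * x + 1.0))   # 1.0 (up to rounding): increasing linear map
print(rho(x, -0.5 * x + 4.0))  # -1.0 (up to rounding): decreasing linear map
print(rho(x, x ** 3))          # below 1: monotone but not linear
```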
ML Insight
- PCA: highly correlated features carry redundant information, which PCA compresses into fewer components.
- Feature engineering: Remove highly correlated inputs.
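One simple way to remove highly correlated inputs is a greedy filter over the correlation matrix. This is a minimal sketch, assuming NumPy; `drop_correlated` and the 0.95 threshold are hypothetical choices, and this is only one of several possible heuristics.

```python
import numpy as np

def drop_correlated(X, threshold=0.95):
    """Greedily keep columns whose |correlation| with every kept column is <= threshold.
    Returns the indices of kept columns (hypothetical helper, one heuristic among many)."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in range(X.shape[1]):
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(2)
a = rng.normal(size=1000)
b = rng.normal(size=1000)
# Synthetic data: column 1 is almost an exact rescaling of column 0.
X = np.column_stack([a, 2 * a + 0.01 * rng.normal(size=1000), b])

print(drop_correlated(X))  # [0, 2]
```

The greedy order means the first of a correlated pair is kept; in practice one might instead keep whichever feature is more predictive of the target.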
Example: if X, Y ~ N(0,1) with ρ = 0.5, then Cov(X,Y) = ρ · σ_X · σ_Y = 0.5 · 1 · 1 = 0.5.
6. Correlation vs. Independence
- Independence implies Cov = 0, but Cov = 0 does not imply independence. For example, if X ~ N(0,1) and Y = X², then Cov(X,Y) = E[X³] - E[X]E[X²] = 0, although Y is completely determined by X.
- Correlation measures only linear dependence; a strong nonlinear relationship can still have ρ = 0.
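The X, Y = X² example can be simulated, assuming NumPy: the sample correlation is near zero, yet conditioning on X changes the distribution of Y completely.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=200_000)
y = x ** 2  # Y is a deterministic function of X

print("corr(X, X^2)       ≈", np.corrcoef(x, y)[0, 1])          # ≈ 0
# Yet Y clearly depends on X: knowing |X| > 2 forces Y > 4.
print("P(Y > 4)           ≈", np.mean(y > 4))                    # ≈ 0.0455 = P(|X| > 2)
print("P(Y > 4 | |X| > 2) =", np.mean(y[np.abs(x) > 2] > 4))    # 1.0
```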
In ML, mutual information, which is zero if and only if the variables are independent, can detect nonlinear dependencies that correlation misses.
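A discrete version of the same example makes the contrast exact: with X uniform on {-1, 0, 1} and Y = X², the covariance is 0, but the mutual information is ln 3 - (2/3) ln 2 ≈ 0.637 nats. The sketch assumes NumPy and computes mutual information from a joint pmf table; `mutual_information` is a hypothetical helper name.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint pmf given as a 2-D table."""
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1, keepdims=True)  # column vector of P(X = x)
    p_y = joint.sum(axis=0, keepdims=True)  # row vector of P(Y = y)
    nz = joint > 0                          # treat 0 * log 0 as 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (p_x @ p_y)[nz])))

# Rows: X = -1, 0, 1 (uniform). Columns: Y = 0, 1, with Y = X^2.
joint = np.array([[0.0, 1/3],
                  [1/3, 0.0],
                  [0.0, 1/3]])

xs, ys = np.array([-1, 0, 1]), np.array([0, 1])
cov = np.sum(joint * np.outer(xs, ys)) - np.sum(joint.sum(1) * xs) * np.sum(joint.sum(0) * ys)

print("Cov(X, Y) =", cov)                        # 0: uncorrelated
print("I(X; Y)   =", mutual_information(joint))  # ≈ 0.637 nats: dependent
```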
7. Applications in Machine Learning
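- Naive Bayes: assumes the features are conditionally independent given the class, so P(X_1,…,X_n|y) = P(X_1|y)⋯P(X_n|y).
- PCA: projects data onto the eigenvectors of the covariance matrix with the largest eigenvalues, i.e., the directions of greatest variance.
- Regularization: strongly correlated features (multicollinearity) inflate the variance of coefficient estimates; L2 (ridge) regularization stabilizes them.
- Time series: autocorrelation in residuals indicates model misspecification.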
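The PCA step can be sketched directly from the covariance matrix, assuming NumPy and synthetic data with two strongly correlated features plus one independent feature. This is a minimal eigendecomposition-based sketch, not a replacement for a library implementation such as an SVD-based one.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
# Synthetic data: features 0 and 1 share a common factor z; feature 2 is independent.
z = rng.normal(size=n)
X = np.column_stack([z + 0.1 * rng.normal(size=n),
                     z + 0.1 * rng.normal(size=n),
                     rng.normal(size=n)])

Xc = X - X.mean(axis=0)               # center each feature
C = np.cov(Xc, rowvar=False)          # 3x3 sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)  # eigh: for symmetric matrices, ascending order
order = np.argsort(eigvals)[::-1]     # sort components by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()
print("Explained variance ratio:", np.round(explained, 3))

k = 2
X_reduced = Xc @ eigvecs[:, :k]       # project onto the top-k principal components
print("Reduced shape:", X_reduced.shape)  # (2000, 2)
```

The two correlated features collapse into one dominant component, and the smallest eigenvalue (their nearly cancelled difference) carries almost no variance, so dropping it loses little information.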
8. Numerical Computations: Independence and Correlation
Simulate independent Bernoulli variables and correlated normal variables, then compare empirical estimates with theory.
::: code-group
```python [Python]
import numpy as np
from scipy.stats import bernoulli, multivariate_normal

n_trials = 10000

# Independence: two independent Bernoulli(0.5) samples
X = bernoulli.rvs(0.5, size=n_trials)
Y = bernoulli.rvs(0.5, size=n_trials)
joint_prob = np.mean((X == 1) & (Y == 1))
p_X = np.mean(X)
p_Y = np.mean(Y)
print("P(X=1,Y=1):", joint_prob)
print("P(X=1)P(Y=1):", p_X * p_Y)  # approximately equal for independent variables

# Correlation: bivariate normal with unit variances and rho = 0.5
rho = 0.5
cov_matrix = [[1, rho], [rho, 1]]
data = multivariate_normal.rvs(mean=[0, 0], cov=cov_matrix, size=n_trials)
corr = np.corrcoef(data, rowvar=False)[0, 1]
print("Correlation:", corr)

# ML: feature correlation (the second feature is exactly twice the first)
features = np.array([[1, 2], [2, 4], [3, 6], [4, 8]])
corr_matrix = np.corrcoef(features, rowvar=False)
print("Feature corr matrix:", corr_matrix)
```
```rust [Rust]
// Assumes the rand 0.8 and rand_distr 0.4 crates.
use rand_distr::{Bernoulli, Distribution, Normal};

fn main() {
    let n_trials = 10000;
    let n = n_trials as f64;
    let mut rng = rand::thread_rng();
    let bern = Bernoulli::new(0.5).unwrap();

    // Independence: Bernoulli
    let mut count_joint = 0;
    let mut count_x = 0;
    let mut count_y = 0;
    for _ in 0..n_trials {
        let x = bern.sample(&mut rng) as u8;
        let y = bern.sample(&mut rng) as u8;
        if x == 1 && y == 1 { count_joint += 1; }
        if x == 1 { count_x += 1; }
        if y == 1 { count_y += 1; }
    }
    println!("P(X=1,Y=1): {}", count_joint as f64 / n);
    println!("P(X=1)P(Y=1): {}", (count_x as f64 / n) * (count_y as f64 / n));

    // Correlation: Y = 0.5 X + sqrt(0.75) Z gives Var(Y) = 1 and rho = 0.5
    let normal = Normal::new(0.0, 1.0).unwrap();
    let mut sum_xy = 0.0_f64;
    let mut sum_x = 0.0_f64;
    let mut sum_y = 0.0_f64;
    let mut sum_x2 = 0.0_f64;
    let mut sum_y2 = 0.0_f64;
    for _ in 0..n_trials {
        let x: f64 = normal.sample(&mut rng);
        let y = 0.5 * x + 0.75f64.sqrt() * normal.sample(&mut rng);
        sum_xy += x * y;
        sum_x += x;
        sum_y += y;
        sum_x2 += x * x;
        sum_y2 += y * y;
    }
    let mean_x = sum_x / n;
    let mean_y = sum_y / n;
    let cov = sum_xy / n - mean_x * mean_y;
    let var_x = sum_x2 / n - mean_x.powi(2);
    let var_y = sum_y2 / n - mean_y.powi(2);
    println!("Correlation: {}", cov / (var_x * var_y).sqrt());

    // ML: feature correlation via the (population) covariance matrix
    let features = [[1.0_f64, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]];
    let m = features.len() as f64;
    let mut mean = [0.0_f64; 2];
    for row in features.iter() { mean[0] += row[0]; mean[1] += row[1]; }
    mean[0] /= m;
    mean[1] /= m;
    let mut cov_mat = [[0.0_f64; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            let mut sum = 0.0;
            for row in features.iter() { sum += (row[i] - mean[i]) * (row[j] - mean[j]); }
            cov_mat[i][j] = sum / m;
        }
    }
    let corr_coeff = cov_mat[0][1] / (cov_mat[0][0] * cov_mat[1][1]).sqrt();
    println!("Feature corr coeff: {}", corr_coeff);
}
```
:::
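Both versions compare the empirical P(X=1,Y=1) with P(X=1)P(Y=1), estimate the correlation of a bivariate normal with ρ = 0.5, and compute the correlation of two features in an exact linear relation, which is 1.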
9. Symbolic Computations with SymPy
Verify independence and compute covariance exactly. The Python version uses `sympy.stats`; the Rust version enumerates the four outcomes of two fair coins directly.
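::: code-group
```python [Python]
from sympy import Rational
from sympy.stats import Bernoulli, E, P, covariance

# Two fair coins; distinct sympy.stats random symbols are independent.
X = Bernoulli('X', Rational(1, 2))
Y = Bernoulli('Y', Rational(1, 2))

p_xy = E(X * Y)  # for 0/1 variables, E[XY] = P(X=1, Y=1) = 1/4
p_x = P(X > 0)   # P(X=1) = 1/2
p_y = P(Y > 0)   # P(Y=1) = 1/2
print("Independent check:", p_xy == p_x * p_y)

# Covariance: E[XY] - E[X]E[Y]
cov = E(X * Y) - E(X) * E(Y)
print("Cov(X,Y):", cov)
print("sympy.stats covariance:", covariance(X, Y))
```
```rust [Rust]
/// Expectation of f(X, Y) over the four equally likely outcomes of two fair coins.
fn expect(f: impl Fn(f64, f64) -> f64) -> f64 {
    let outcomes = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)];
    outcomes.iter().map(|&(x, y)| 0.25 * f(x, y)).sum()
}

fn main() {
    let (e_x, e_y, e_xy) = (expect(|x, _| x), expect(|_, y| y), expect(|x, y| x * y));
    // For 0/1 variables, E[XY] = P(X=1,Y=1), so this checks P(X=1,Y=1) = P(X=1)P(Y=1).
    println!("Independent check: {}", e_xy == e_x * e_y);
    println!("Cov(X,Y): {}", e_xy - e_x * e_y);
}
```
:::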
10. Challenges in ML Applications
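- False independence: Naive Bayes ignores dependencies among features, which can make its probability estimates poorly calibrated.
- Correlation vs. causation: a feature correlated with the target need not cause it, so correlation-based feature selection can be misleading.
- High dimensionality: a d×d covariance matrix costs O(nd²) to estimate from n samples and is singular when d ≥ n.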
11. Key ML Takeaways
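- Independence simplifies: joint distributions factorize into marginals.
- Conditional independence is key: it makes models like Naive Bayes and Bayesian networks tractable.
- Correlation informs: it reveals linear relationships between features.
- Covariance drives PCA: its eigenvectors define the directions used for dimensionality reduction.
- Code verifies: simulations confirm independence and correlation estimates empirically.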
Independence and correlation shape ML design.
12. Summary
This lecture explored independence, conditional independence, correlation, and covariance, with ML applications. Examples and Python/Rust code connect theory to practice and prepare for the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT).
Further Reading
- Wasserman, All of Statistics (Ch. 3-5).
- Bishop, Pattern Recognition and Machine Learning (Ch. 2).
- 3Blue1Brown: Correlation videos.
- Rust: ‘nalgebra’ for matrices, ‘rand_distr’ for sampling.