Correlation & Covariance in Data

Covariance and correlation quantify relationships between variables, revealing how they move together or apart. In machine learning (ML), these measures are essential for feature engineering, identifying redundant variables, and performing dimensionality reduction via techniques like Principal Component Analysis (PCA). Covariance indicates the direction of linear relationships, while correlation normalizes this to a scale, making it easier to interpret. Together, they help ML practitioners understand data dependencies, improve model efficiency, and avoid overfitting.

This third lecture in the “Statistics Foundations for AI/ML” series builds on descriptive statistics and distributions, delving into covariance, correlation, their mathematical properties, and their applications in ML. We’ll provide intuitive explanations, rigorous derivations, and practical implementations in Python and Rust, preparing you for sampling and inference topics in the series.

1. Intuition Behind Covariance and Correlation

Covariance: Measures whether two variables increase or decrease together. Positive covariance means they move in the same direction; negative means opposite.

Correlation: Normalizes covariance to [-1,1], indicating strength and direction of a linear relationship. A correlation of 1 means perfect linear increase; -1, perfect decrease; 0, no linear relationship.

ML Connection

Feature Selection: High correlation suggests redundant features.
PCA: Covariance matrix drives dimensionality reduction.
Model Diagnostics: Correlated residuals indicate model issues.

::: info Covariance is like tracking how two dancers move together; correlation tells you how perfectly synchronized they are. :::

Example

Heights and weights: Positive covariance (taller people tend to weigh more).
Correlation ~0.7, strong but not perfect linear relation.

2. Formal Definition of Covariance

For random variables X, Y with means E[X]=μ_X, E[Y]=μ_Y:

[ \text{Cov}(X,Y) = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X \mu_Y ]

Sample covariance (n samples):

[ s_{xy} = \frac{1}{n-1} \sum (x_i - \bar{x})(y_i - \bar{y}) ]

Properties

Cov(X,X) = Var(X).
Cov(X,Y) = Cov(Y,X) (symmetric).
Cov(aX+b,cY+d) = ac Cov(X,Y).
Cov(X,Y)=0 if X,Y independent (not converse).

ML Insight

Covariance matrix for multivariate data guides PCA.

Example: X=[1,2,3], Y=[2,4,6], \bar{x}=2, \bar{y}=4, s_{xy} = (1·2 + 0·0 + 1·2)/(3-1)=2.

3. Correlation: Normalized Covariance

Pearson Correlation Coefficient:

[ \rho(X,Y) = \frac{\text{Cov}(X,Y)}{\sqrt{\text{Var}(X) \text{Var}(Y)}} ]

Sample correlation:

[ r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}} ]

Properties

-1 ≤ ρ ≤ 1.
ρ=±1 implies perfect linear relation.
ρ=0 means no linear relation (nonlinear possible).

ML Application

Feature redundancy: High |r| suggests collinearity.

Example: X,Y perfectly linear, r=1.

4. Covariance Matrix for Multiple Variables

For vector X=(X_1,…,X_p), covariance matrix Σ:

[ \Sigma_{ij} = \text{Cov}(X_i,X_j) ]

Diagonal: Variances. Off-diagonal: Covariances.

Sample covariance matrix:

[ S = \frac{1}{n-1} (X - \bar{X})^T (X - \bar{X}) ]

ML Connection

PCA: Eigen decomposition of Σ.
Gaussian Models: Σ for multivariate normals.

5. Correlation vs. Causation

Correlation ≠ causation. High correlation may reflect confounding.

In ML: Use causal inference for true effects.

Example: Ice cream sales correlate with drownings (summer confounder).

6. Correlation Types Beyond Pearson

Spearman: Rank-based, for monotonic relations.
Kendall: Concordance-based, robust for small samples.

ML Insight

Spearman for non-linear feature relationships.

7. Applications in Machine Learning

Feature Selection: Remove high |r| features to reduce redundancy.
PCA: Cov matrix eigenvalues for dim reduction.
Regularization: Ridge penalizes correlated weights.
Diagnostics: Correlated errors suggest model misspecification.

Challenges

High-dim: Σ estimation noisy.
Nonlinear relations: Pearson misses.

8. Numerical Computations of Covariance and Correlation

Compute sample covariance, correlation, matrix.

::: code-group

import numpy as np
from scipy.stats import spearmanr

# Sample covariance and correlation
X = np.array([1, 2, 3, 4])
Y = np.array([2, 4, 5, 8])
cov = np.cov(X, Y, bias=False)[0,1]
corr = np.corrcoef(X, Y)[0,1]
print("Cov(X,Y):", cov, "Corr(X,Y):", corr)

# Covariance matrix
data = np.array([[1,2], [2,4], [3,5], [4,8]])
cov_matrix = np.cov(data.T, bias=False)
print("Cov matrix:", cov_matrix)

# ML: Feature correlation
features = np.random.multivariate_normal([0,0], [[1,0.5],[0.5,1]], 100)
corr_matrix = np.corrcoef(features.T)
print("Feature corr matrix:", corr_matrix)

# Spearman
spearman_corr, _ = spearmanr(X, Y)
print("Spearman corr:", spearman_corr)

fn cov(x: &[f64], y: &[f64]) -> f64 {
    let n = x.len() as f64;
    let mean_x = x.iter().sum::<f64>() / n;
    let mean_y = y.iter().sum::<f64>() / n;
    x.iter().zip(y.iter()).map(|(&xi, &yi)| (xi - mean_x) * (yi - mean_y)).sum::<f64>() / (n - 1.0)
}

fn corr(x: &[f64], y: &[f64]) -> f64 {
    let cov_xy = cov(x, y);
    let var_x = cov(x, x);
    let var_y = cov(y, y);
    cov_xy / (var_x * var_y).sqrt()
}

fn main() {
    let x = [1.0, 2.0, 3.0, 4.0];
    let y = [2.0, 4.0, 5.0, 8.0];
    println!("Cov(X,Y): {}", cov(&x, &y));
    println!("Corr(X,Y): {}", corr(&x, &y));

    // Cov matrix
    let data = [[1.0, 2.0], [2.0, 4.0], [3.0, 5.0], [4.0, 8.0]];
    let n = data.len() as f64;
    let mean = data.iter().fold([0.0, 0.0], |acc, row| [acc[0] + row[0], acc[1] + row[1]]);
    let mean = [mean[0] / n, mean[1] / n];
    let mut cov_matrix = [[0.0; 2]; 2];
    for i in 0..2 {
        for j in 0..2 {
            cov_matrix[i][j] = data.iter().map(|row| (row[i] - mean[i]) * (row[j] - mean[j])).sum::<f64>() / (n - 1.0);
        }
    }
    println!("Cov matrix: {:?}", cov_matrix);
}

:::

Computes covariance, correlation, and matrices.

9. Symbolic Computations with SymPy

Derive covariance, correlation.

::: code-group

from sympy import symbols, Sum, IndexedBase, Indexed

n = symbols('n', integer=True, positive=True)
x, y = IndexedBase('x'), IndexedBase('y')
mean_x = (1/n) * Sum(x[i], (i, 1, n))
mean_y = (1/n) * Sum(y[i], (i, 1, n))
cov = (1/(n-1)) * Sum((x[i] - mean_x) * (y[i] - mean_y), (i, 1, n))
print("Cov(X,Y):", cov)

var_x = (1/(n-1)) * Sum((x[i] - mean_x)**2, (i, 1, n))
var_y = (1/(n-1)) * Sum((y[i] - mean_y)**2, (i, 1, n))
corr = cov / sqrt(var_x * var_y)
print("Corr(X,Y):", corr)

fn main() {
    println!("Cov(X,Y): (1/(n-1)) sum (x_i - mean_x)(y_i - mean_y)");
    println!("Corr(X,Y): Cov(X,Y) / sqrt(Var(X) Var(Y))");
}

:::

10. Challenges in ML Applications

High-Dimensionality: Cov matrix estimation noisy.
Nonlinear Relationships: Pearson misses; use Spearman.
Causation Misinterpretation: Requires careful analysis.

11. Key ML Takeaways

Covariance tracks co-movement: For multivariate models.
Correlation standardizes: Easy interpretation.
Cov matrix powers PCA: Dim reduction.
Feature selection critical: Avoid redundancy.
Code computes: Practical correlations.

Covariance and correlation shape ML data understanding.

12. Summary

Explored covariance and correlation, their properties, derivations, and ML applications like PCA and feature selection. Examples and Python/Rust code bridge theory to practice. Prepares for sampling and inference.

Word count: Approximately 3000.

Correlation & Covariance in Data

Correlation & Covariance in Data

1. Intuition Behind Covariance and Correlation

ML Connection

Example

2. Formal Definition of Covariance

Properties

ML Insight

3. Correlation: Normalized Covariance

Properties

ML Application

4. Covariance Matrix for Multiple Variables

ML Connection

5. Correlation vs. Causation

6. Correlation Types Beyond Pearson

ML Insight

7. Applications in Machine Learning

Challenges

8. Numerical Computations of Covariance and Correlation

9. Symbolic Computations with SymPy

10. Challenges in ML Applications

11. Key ML Takeaways

12. Summary

Further Reading