Convexity and Optimization Landscapes
Convexity is a powerful property in mathematics that ensures every local minimum of an optimization problem is also global, making such problems efficiently solvable. In AI and machine learning, understanding convexity helps explain why some models train reliably (e.g., linear regression) while others, like deep networks, face challenges such as local minima. Optimization landscapes visualize objective functions as terrains, where valleys represent minima and paths depict training trajectories.
This lecture builds on higher-order derivatives, delving into convex sets and functions, their properties, convex optimization, and the complexities of non-convex landscapes in ML. We fuse theory with practice through intuitions, theorems, ML connections, and code in Python and Rust, equipping you to navigate and analyze optimization in AI systems.
1. Intuition for Convexity
A set is convex if the line segment between any two points lies within it—like a ball, not a star.
A function is convex if its graph lies on or below any chord: for λ in [0,1], f(λx + (1-λ)y) ≤ λ f(x) + (1-λ) f(y).
Intuitively, no "dents"—always bowl-shaped.
ML Connection
- Convex losses (e.g., SVM hinge) guarantee global optima.
- Non-convex (neural nets) risk suboptimal solutions but enable complex modeling.
INFO
Convexity promises "no bad local minima": every local minimum is global, simplifying the search.
Example
- Convex: f(x)=x^2, exponential.
- Non-convex: sin(x); x^4 - x^2 (two minima separated by a local maximum at 0; checked numerically below).
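A quick numerical check of the chord inequality above (an illustrative sketch; chord_holds is a helper defined here, not a library function):

import numpy as np

def chord_holds(f, x, y, n=101):
    """Check f(lam*x + (1-lam)*y) <= lam*f(x) + (1-lam)*f(y) on a lambda grid."""
    lams = np.linspace(0, 1, n)
    lhs = f(lams * x + (1 - lams) * y)
    rhs = lams * f(x) + (1 - lams) * f(y)
    return bool(np.all(lhs <= rhs + 1e-12))  # small tolerance for float error

print(chord_holds(lambda t: t**2, -2.0, 3.0))         # True: x^2 stays below its chords
print(chord_holds(lambda t: t**4 - t**2, -0.3, 0.3))  # False: rises above the chord near 0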
2. Convex Sets: Definitions and Properties
A set C is convex if for all x, y in C and λ in [0,1], the point λx + (1-λ)y is in C.
Operations that preserve convexity: intersection, affine transformations, Minkowski sums.
Examples
- Half-spaces, balls, polyhedra.
- In ML: Feasible regions in constrained optimization (e.g., simplex for probabilities).
Properties: extreme points, separating and supporting hyperplane theorems.
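A small check of the simplex example above (in_simplex is a hypothetical helper defined only for this sketch): convex combinations of probability vectors remain probability vectors.

import numpy as np

def in_simplex(p, tol=1e-12):
    """Membership test for the probability simplex {p : p >= 0, sum(p) = 1}."""
    return bool(np.all(p >= -tol)) and abs(p.sum() - 1.0) < tol

rng = np.random.default_rng(0)
p, q = np.array([0.2, 0.3, 0.5]), np.array([0.7, 0.1, 0.2])
for lam in rng.uniform(0, 1, size=5):
    assert in_simplex(lam * p + (1 - lam) * q)  # convexity: combinations stay inside
print("All convex combinations remain in the simplex.")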
3. Convex Functions: Formal Definition and Tests
f: C → R (C a convex set) is convex if f(λx + (1-λ)y) ≤ λ f(x) + (1-λ) f(y) for all x, y in C and λ in [0,1].
Strictly convex: the inequality is strict for x ≠ y and λ in (0,1).
Jensen's inequality: f(E[X]) ≤ E[f(X)] for random X (checked numerically at the end of this section).
Tests:
- Epigraph: f is convex iff its epigraph {(x, t) : t ≥ f(x)} is a convex set.
- First-order (differentiable f): f(y) ≥ f(x) + ∇f(x)^T (y-x) for all x, y.
- Second-order (twice differentiable): f'' ≥ 0; multivariate: Hessian positive semi-definite (PSD).
Examples
- Affine: ax + b.
- Norms, max functions.
- Compositions: if g is convex and nondecreasing and h is convex, then g(h(x)) is convex.
ML Insight
- Regularizers: L1 (convex, sparse), L2 (convex, smooth).
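A Monte Carlo check of Jensen's inequality for two convex functions (a minimal sketch using NumPy; the printed values are sample estimates, not exact expectations):

import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(loc=1.0, scale=2.0, size=100_000)

for name, f in [("x^2 (convex)", lambda t: t**2),
                ("|x| (convex)", np.abs)]:
    lhs = f(X.mean())   # estimate of f(E[X])
    rhs = f(X).mean()   # estimate of E[f(X)]
    print(f"{name}: f(E[X])={lhs:.3f} <= E[f(X)]={rhs:.3f}: {lhs <= rhs}")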
4. Convex Optimization Problems
Min f(x) s.t. g_i(x) ≤ 0, h_j(x) = 0, x in C.
The problem is convex if f and each g_i are convex, each h_j is affine, and C is convex.
Properties: every local minimum is global; duality (Lagrange multipliers); efficient solvers (interior-point methods).
Algorithms
- Gradient descent converges to the global minimum for convex objectives (a sketch appears at the end of this section).
- Newton's method uses the Hessian for faster local convergence.
ML Applications
- Logistic regression and SVM training are convex problems.
- Lasso (L1-penalized regression) is convex.
Non-convex: deep learning relies on heuristics (see Section 8).
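A minimal sketch of gradient descent on a convex quadratic, illustrating that the local minimum it finds is the global one (the matrix A and the step size are illustrative choices):

import numpy as np

# Minimize the convex quadratic f(w) = 0.5 * w^T A w - b^T w (A positive definite),
# whose unique global minimizer solves A w = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
w = np.zeros(2)
lr = 0.1                       # safe: lr < 2 / (max eigenvalue of A) ≈ 0.55
for _ in range(500):
    w -= lr * (A @ w - b)      # gradient of f is A w - b
print("GD solution:   ", w)
print("Exact solution:", np.linalg.solve(A, b))  # they agree: local = global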
5. Subgradients and Generalized Convexity
For non-differentiable convex f, the subdifferential ∂f(x) is the set of subgradients: vectors g satisfying f(y) ≥ f(x) + g^T (y-x) for all y.
Subdifferential calculus: rules for sums, scaling, and pointwise max.
In ML: for non-smooth functions like ReLU and the hinge loss, subgradients enable optimization.
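A minimal subgradient-descent sketch on f(x) = |x|, which is convex but not differentiable at 0 (choosing sign(x), with 0 at the kink, is one valid subgradient selection):

import numpy as np

# Subgradient descent on the non-differentiable convex f(x) = |x|.
x = 5.0
for k in range(1, 201):
    g = np.sign(x)          # a subgradient of |x|; np.sign(0.0) = 0 is valid at the kink
    x -= (1.0 / k) * g      # diminishing steps are needed for non-smooth f
print(f"x after subgradient descent: {x:.4f}")  # approaches the minimizer 0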
6. Strongly Convex and Smooth Functions
Strongly convex: f(y) ≥ f(x) + ∇f(x)^T (y-x) + (μ/2) ||y-x||^2, μ>0.
Implies a unique minimizer and faster convergence.
Smooth: ||∇f(y) - ∇f(x)|| ≤ L ||y-x|| (Lipschitz gradient).
In ML: μ and L control step sizes and convergence rates in gradient descent.
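An illustrative computation of μ and L for a quadratic, where both constants are extreme eigenvalues of the Hessian (the matrix below is an assumed example):

import numpy as np

# For f(w) = 0.5 * w^T A w, the strong-convexity constant mu and the
# gradient-Lipschitz constant L are the smallest and largest Hessian eigenvalues.
A = np.array([[10.0, 0.0], [0.0, 1.0]])
eigs = np.linalg.eigvalsh(A)   # ascending order
mu, L = eigs[0], eigs[-1]
print(f"mu = {mu}, L = {L}, condition number L/mu = {L / mu}")
# GD with step 1/L contracts the error by roughly (1 - mu/L) per iteration.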
7. Optimization Landscapes: Visualizing Convexity
Landscape: Plot f over domain—hills, valleys.
Convex: Single bowl.
Non-convex: Multiple valleys, plateaus, saddles.
In high dimensions: full visualization is impossible (the "curse of dimensionality"), but structure such as random 2D slices of the loss helps.
Visualization (Conceptual)
- 1D: x^2 vs x^4 - 2x^2 (sketched below).
- 2D: Paraboloid vs wavy surface.
- ML: Neural loss surfaces—project to 2D/3D.
Tools: Loss contours, gradient flows.
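A brief 1D sketch contrasting the two example functions from the list above (illustrative plotting code):

import numpy as np
import matplotlib.pyplot as plt

# A single bowl vs a double well.
xs = np.linspace(-2, 2, 400)
fig, axes = plt.subplots(1, 2, figsize=(9, 3))
axes[0].plot(xs, xs**2)
axes[0].set_title("Convex: x^2 (one global min)")
axes[1].plot(xs, xs**4 - 2 * xs**2)
axes[1].set_title("Non-convex: x^4 - 2x^2 (two minima)")
plt.tight_layout()
plt.show()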
8. Non-Convex Challenges in ML
Local minima; saddle points (Hessian with both positive and negative eigenvalues).
Flat regions slow training.
Solutions: stochastic GD's noise helps escape saddles; momentum; adaptive rates (Adam).
Overparametrization: in deep learning, many parameters make landscapes "benign", with local minima close in value to the global one.
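A minimal Hessian-based check that classifies the critical point of f(x, y) = x^2 - y^2 at the origin as a saddle (the function choice is illustrative):

import numpy as np

# Mixed-sign Hessian eigenvalues are the signature of a saddle point.
H = np.array([[2.0, 0.0], [0.0, -2.0]])  # Hessian of x^2 - y^2
eigs = np.linalg.eigvalsh(H)
if np.all(eigs > 0):
    kind = "local minimum"
elif np.all(eigs < 0):
    kind = "local maximum"
else:
    kind = "saddle point"
print(f"eigenvalues {eigs} -> {kind}")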
9. Numerical Checks and Visualizations
Check convexity: Hessian eigenvalues ≥0.
Visualize landscapes.
import numpy as np
import matplotlib.pyplot as plt
from scipy.linalg import eigh

def is_convex(H):
    eigenvalues = eigh(H)[0]
    return np.all(eigenvalues >= 0)

# Example Hessian
H = np.array([[2, 0], [0, 2]])  # Positive definite
print("Is convex:", is_convex(H))
# Visualize landscape
def f(x, y):
    return x**2 + y**2  # Convex

x = np.linspace(-5, 5, 100)
y = np.linspace(-5, 5, 100)
X, Y = np.meshgrid(x, y)
Z = f(X, Y)
plt.contourf(X, Y, Z)
plt.colorbar()
plt.title("Convex Landscape")
plt.show()  # In practice, save or display

# Non-convex example
def g(x, y):
    return np.sin(x) + np.sin(y)

Zg = g(X, Y)
plt.contourf(X, Y, Zg)
plt.colorbar()
plt.title("Non-Convex Landscape")
plt.show()
# ML: Simple loss landscape
def loss(w1, w2, X, y):
    w = np.array([w1, w2])
    return np.mean((X @ w - y)**2)

X_ml = np.array([[1, 1], [1, 2], [1, 3]])
y_ml = np.array([2, 3, 4])  # Exactly linear: y = 1 + x
w1s = np.linspace(-1, 3, 50)
w2s = np.linspace(-1, 3, 50)
W1, W2 = np.meshgrid(w1s, w2s)
Loss = np.vectorize(lambda w1, w2: loss(w1, w2, X_ml, y_ml))(W1, W2)
plt.contourf(W1, W2, Loss)
plt.colorbar()
plt.title("ML Loss Landscape")
plt.show()
use ndarray::{array, Array2};
use plotters::prelude::*;

fn is_convex(h: &Array2<f64>) -> bool {
    // Simplified 2x2 test: for a symmetric 2x2 matrix,
    // trace >= 0 and det >= 0 characterize positive semi-definiteness.
    if h.shape() == &[2, 2] {
        let trace = h[[0, 0]] + h[[1, 1]];
        let det = h[[0, 0]] * h[[1, 1]] - h[[0, 1]].powi(2);
        trace >= 0.0 && det >= 0.0
    } else {
        false // Extend with a full eigenvalue check for general matrices
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let h = array![[2.0, 0.0], [0.0, 2.0]];
    println!("Is convex: {}", is_convex(&h));

    // Visualize the convex surface z = x^2 + y^2
    let root = BitMapBackend::new("convex.png", (800, 600)).into_drawing_area();
    root.fill(&WHITE)?;
    let mut chart = ChartBuilder::on(&root)
        .caption("Convex Landscape", ("sans-serif", 50))
        .build_cartesian_3d(-5.0..5.0, 0.0..25.0, -5.0..5.0)?;
    chart.draw_series(
        SurfaceSeries::xoz(
            (0..100).map(|x| -5.0 + 0.1 * x as f64),
            (0..100).map(|z| -5.0 + 0.1 * z as f64),
            |x, z| x.powi(2) + z.powi(2),
        )
        .style(BLUE.filled()),
    )?;
    root.present()?;
    // Similar for the non-convex and ML loss surfaces, omitted for brevity
    Ok(())
}
The code checks convexity via Hessian eigenvalues and plots landscapes. Note: the Rust version uses plotters; run it to generate a PNG.
10. Symbolic Convexity Checks
Use SymPy to compute Hessians symbolically.
from sympy import symbols, hessian

x, y = symbols('x y')
f = x**2 + y**2
H = hessian(f, (x, y))
print("Hessian:", H)
print("PSD:", H.is_positive_semidefinite)  # True: f is convex

g = (x**2 + y**2 - 1)**2  # Non-convex
Hg = hessian(g, (x, y))
print("Hessian g:", Hg)
# Evaluate at a point: at the origin the Hessian of g is negative definite
print("PSD at origin:", Hg.subs({x: 0, y: 0}).is_positive_semidefinite)  # False
// Simulated: no direct SymPy analog in Rust; report the known symbolic result
fn main() {
    // Hessian of f = x^2 + y^2 is [[2, 0], [0, 2]], which is PSD
    println!("Hessian: [[2,0],[0,2]]");
}
11. Duality and Convex Programming
Strong duality: the primal minimum equals the dual maximum for convex problems under mild conditions (e.g., Slater's condition); a toy verification follows.
In ML: Dual SVM for kernel trick.
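A numerical sketch of strong duality on a toy problem: minimize x^2 subject to x ≥ 1. The Lagrangian is L(x, λ) = x^2 + λ(1 - x); minimizing over x gives the dual function g(λ) = λ - λ^2/4 for λ ≥ 0, maximized at λ = 2 with value 1, matching the primal optimum f(1) = 1:

import numpy as np

# Toy convex program: min x^2  s.t.  x >= 1  (primal optimum: x* = 1, value 1).
# Dual function: g(lam) = min_x [x^2 + lam*(1 - x)] = lam - lam**2 / 4.
lams = np.linspace(0, 4, 1000)
g = lams - lams**2 / 4
print(f"dual max   = {g.max():.4f} at lam = {lams[np.argmax(g)]:.4f}")
print("primal min = 1.0000  (strong duality: the values coincide)")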
12. Landscapes in Deep Learning
In high dimensions, most critical points are saddles rather than local minima.
Mode connectivity: Low-loss paths between minima.
Lottery ticket hypothesis: sparse subnetworks ("winning tickets") can train to match full-network performance.
13. Key ML Takeaways
- Convexity guarantees: Efficient global optimization.
- Jensen, subgrads: Tools for analysis.
- Landscape visualization: builds intuition for training challenges.
- Non-convex strategies: SGD variants.
- Code for exploration: Test, plot functions.
Convexity bridges theory to reliable ML.
14. Summary
We covered convexity from sets to functions, tests, optimization, and landscapes, with an ML focus. Examples and Python/Rust code enable hands-on exploration. This foundation aids in tackling non-convex AI problems.
Next: Stochastic Optimization.
Further Reading
- Boyd, Vandenberghe, Convex Optimization.
- Goodfellow et al., Deep Learning (Ch. 8: Optimization).
- Li et al., "Visualizing the Loss Landscape of Neural Nets".
- Rust crates: plotters for visualization, nalgebra for linear algebra.