Taylor Series & Function Approximations

Taylor series provide a powerful framework for approximating complex functions using polynomials, leveraging derivatives to capture local behavior. In artificial intelligence and machine learning, they underpin numerical methods, optimize loss functions, approximate activation functions, and enable efficient computation in high-dimensional spaces. From linearizing neural networks to bounding errors in gradient-based algorithms, Taylor series bridge calculus with practical AI applications.

This lecture advances from differential equations, exploring Taylor and Maclaurin series, convergence properties, error estimation, multivariable extensions, and their critical roles in ML. We’ll integrate rigorous mathematics with intuitive insights, supported by detailed examples and implementations in Python and Rust, equipping you to apply function approximations in AI modeling and optimization.

1. Intuition Behind Taylor Series

A function’s value near a point can be approximated by its value, slope, curvature, and higher-order derivatives at that point. Think of zooming into a curve: It looks like a line (first-order), then a parabola (second-order), and so on.

Formally, Taylor series expand f(x) around a as an infinite polynomial:

[ f(x) \approx f(a) + f’(a)(x-a) + \frac{f”(a)}{2!}(x-a)^2 + \frac{f^{(3)}(a)}{3!}(x-a)^3 + \cdots ]

Maclaurin series: Special case at a=0.

ML Connection

Linearize non-linear models for optimization (Newton’s method).
Approximate activations (e.g., sigmoid) for faster computation.

::: info Taylor series turn wiggly functions into manageable polynomials, like fitting a curve with straight segments. :::

Everyday Example

sin(x) near 0: sin(x) ≈ x - x^3/6, good for small x.

2. Formal Definition of Taylor Series

For f infinitely differentiable at a, Taylor series:

[ f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(x-a)^n ]

Converges to f(x) within radius of convergence.

Conditions

Analytic: Equals series in interval.
Smoothness: Higher derivatives exist.

Maclaurin Examples

e^x = 1 + x + x^2/2 + x^3/6 + …
sin(x) = x - x^3/3! + x^5/5! - …
cos(x) = 1 - x^2/2! + x^4/4! - …

ML Insight

Approximate loss surfaces for gradient descent analysis.

3. Convergence and Radius of Convergence

Series converges where |x-a| < R, R radius.

Ratio test: R = lim |a_n / a_{n+1}|.

Analytic functions (e.g., e^x, sin(x)): R=∞.

Non-analytic (e.g., 1/(1+x^2)): Limited R.

ML Application

Ensure approximations valid in optimization steps.

Example: ln(1+x), |x|<1, converges to ln(2) at x=1 conditionally.

4. Error Estimation and Remainder

Truncation error: R_n(x) = f(x) - P_n(x), P_n n-th degree Taylor.

Lagrange remainder: R_n(x) = f^{(n+1)}(c) (x-a)^{n+1} / (n+1)!, c between x,a.

Bounds

If |f^{(n+1)}| ≤ M, |R_n| ≤ M |x-a|^{n+1} / (n+1)!.

ML Insight

Bound errors in function approximations for robustness.

Example: sin(x) at x=0.1, n=1, R_1 ≤ |sin(c)| 0.1^2 / 2 ≤ 0.005.

5. Multivariable Taylor Series

For f(x,y) around (a,b):

[ f(x,y) \approx f(a,b) + f_x (x-a) + f_y (y-b) + \frac{1}{2} [f_{xx} (x-a)^2 + 2 f_{xy} (x-a)(y-b) + f_{yy} (y-b)^2] + … ]

General: Uses gradient, Hessian.

ML Connection

Approximate loss surfaces: f(w) ≈ f(w_0) + ∇f^T Δw + 1/2 Δw^T H Δw.

6. Taylor Series in Optimization

Linear approx: f(x) ≈ f(a) + ∇f(a)^T (x-a), used in GD.

Quadratic: Add Hessian term, Newton’s method.

In ML: Trust-region methods limit step where approx holds.

Example: Minimize f(x)=x^4, quadratic approx near x=1.

7. Applications in Machine Learning

Activation Approximations: Sigmoid ≈ linear for small inputs.
Loss Linearization: Simplify non-convex optimization.
Numerical Stability: Taylor for log(1+x) near 0.
Neural ODEs: Discretize via Taylor expansions.

Challenges

High-order terms costly.
Limited convergence radius.

8. Computational Aspects: Numerical Taylor Series

Compute terms iteratively, truncate at n.

Numerical derivatives for non-analytic.

::: code-group

import numpy as np
from scipy.misc import derivative

# Taylor for sin(x) at a=0
def taylor_sin(x, n_terms=5):
    result = 0
    for k in range(n_terms):
        term = (-1)**k * x**(2*k+1) / np.math.factorial(2*k+1)
        result += term
    return result

x = 0.5
print("sin(0.5) ≈", taylor_sin(0.5))
print("True:", np.sin(0.5))

# Numerical derivs for Taylor
def f(x):
    return np.exp(x)

def taylor_exp(x, a=0, n_terms=5):
    result = 0
    for k in range(n_terms):
        deriv = derivative(f, a, n=1, order=k+1, dx=1e-6)
        result += deriv * (x-a)**k / np.math.factorial(k)
    return result

print("e^0.5 ≈", taylor_exp(0.5))
print("True:", np.exp(0.5))

# ML: Approx loss surface
def loss(w):
    return w[0]**2 + 2*w[1]**2  # Quadratic

def taylor_loss(w, w0, n_terms=3):
    grad = np.array([2*w0[0], 4*w0[1]])
    hess = np.array([[2,0],[0,4]])
    delta = w - w0
    t1 = loss(w0)
    t2 = grad @ delta
    t3 = 0.5 * delta.T @ hess @ delta
    return t1 + t2 + (t3 if n_terms >= 3 else 0)

w0 = np.array([1.0, 1.0])
w = np.array([1.1, 1.2])
print("Taylor loss approx:", taylor_loss(w, w0))

use num_traits::Float;

fn factorial(n: u32) -> u64 {
    (1..=n).product()
}

fn taylor_sin(x: f64, n_terms: usize) -> f64 {
    let mut result = 0.0;
    for k in 0..n_terms {
        let sign = if k % 2 == 0 { 1.0 } else { -1.0 };
        result += sign * x.powi(2*k as i32 + 1) / factorial((2*k + 1) as u32) as f64;
    }
    result
}

fn main() {
    let x = 0.5;
    println!("sin({}) ≈ {}", x, taylor_sin(x, 5));
    println!("True: {}", x.sin());

    // Numerical derivs (simplified)
    fn f(x: f64) -> f64 {
        x.exp()
    }

    fn taylor_exp(x: f64, a: f64, n_terms: usize, h: f64) -> f64 {
        let mut result = 0.0;
        for k in 0..n_terms {
            // Central diff approx
            let deriv = if k == 0 {
                f(a)
            } else if k == 1 {
                (f(a + h) - f(a - h)) / (2.0 * h)
            } else {
                (f(a + h) - 2.0 * f(a) + f(a - h)) / h.powi(2)  // Second deriv
            };
            result += deriv * (x - a).powi(k as i32) / factorial(k as u32) as f64;
        }
        result
    }

    println!("e^0.5 ≈ {}", taylor_exp(0.5, 0.0, 5, 1e-6));
    println!("True: {}", 0.5f64.exp());

    // ML: Loss surface
    fn loss(w: &[f64]) -> f64 {
        w[0].powi(2) + 2.0 * w[1].powi(2)
    }

    fn taylor_loss(w: &[f64], w0: &[f64], n_terms: usize) -> f64 {
        let grad = [2.0 * w0[0], 4.0 * w0[1]];
        let delta = [w[0] - w0[0], w[1] - w0[1]];
        let t1 = loss(w0);
        let t2 = grad[0] * delta[0] + grad[1] * delta[1];
        let t3 = if n_terms >= 3 {
            // Hessian [[2,0],[0,4]]
            0.5 * (2.0 * delta[0].powi(2) + 4.0 * delta[1].powi(2))
        } else {
            0.0
        };
        t1 + t2 + t3
    }

    let w0 = [1.0, 1.0];
    let w = [1.1, 1.2];
    println!("Taylor loss approx: {}", taylor_loss(&w, &w0, 3));
}

:::

Computes Taylor for sin, exp, and ML loss approximation.

9. Symbolic Taylor Series with SymPy

Exact derivatives and series.

::: code-group

from sympy import symbols, sin, exp, series

x = symbols('x')
f = sin(x)
taylor = series(f, x, 0, 6)  # Maclaurin, 5 terms
print("sin(x) series:", taylor)

g = exp(x**2)
taylor_g = series(g, x, 0, 5)
print("exp(x^2) series:", taylor_g)

fn main() {
    // Hardcoded
    println!("sin(x) series: x - x^3/6 + x^5/120");
    println!("exp(x^2) series: 1 + x^2 + x^4/2");
}

:::

10. Convergence Issues and Practical Considerations

Non-analytic: Limited radius.

Numerical stability: Large x, cancellation errors.

In ML: Truncate early for efficiency.

11. Taylor Series in Deep Learning

Backprop approx: Linearize activation gradients.
Hessian estimation: Second-order Taylor for optimization.
Neural ODEs: Discretization uses Taylor-like steps.

Challenges: High-order computation costly.

12. Error Bounds in ML Contexts

Lagrange for robustness.

In optimization: Trust regions validate step sizes.

Example: e^x error bound ensures model accuracy.

13. Key ML Takeaways

Taylor approximates: Simplifies complex functions.
Polynomials tractable: For computation.
Error bounds ensure: Robustness.
Multivar critical: For loss landscapes.
Code enables: Practical testing.

Taylor series bridge math to ML efficiency.

14. Summary

Explored Taylor series from single to multivariable, convergence, errors, with ML applications. Examples and Python/Rust code provide hands-on tools. Prepares for advanced numerical methods.

Word count: Approximately 3200.

Taylor Series & Function Approximations

Taylor Series & Function Approximations

1. Intuition Behind Taylor Series

ML Connection

Everyday Example

2. Formal Definition of Taylor Series

Conditions

Maclaurin Examples

ML Insight

3. Convergence and Radius of Convergence

ML Application

4. Error Estimation and Remainder

Bounds

ML Insight

5. Multivariable Taylor Series

ML Connection

6. Taylor Series in Optimization

7. Applications in Machine Learning

Challenges

8. Computational Aspects: Numerical Taylor Series

9. Symbolic Taylor Series with SymPy

10. Convergence Issues and Practical Considerations

11. Taylor Series in Deep Learning

12. Error Bounds in ML Contexts

13. Key ML Takeaways

14. Summary

Further Reading