
Taylor Series & Function Approximations

Taylor series provide a powerful framework for approximating complex functions with polynomials, using derivatives to capture local behavior. In artificial intelligence and machine learning, they underpin numerical methods, support the optimization of loss functions, approximate activation functions, and enable efficient computation in high-dimensional spaces. From linearizing neural networks to bounding errors in gradient-based algorithms, Taylor series bridge calculus with practical AI applications.

This lecture advances from differential equations, exploring Taylor and Maclaurin series, convergence properties, error estimation, multivariable extensions, and their critical roles in ML. We'll integrate rigorous mathematics with intuitive insights, supported by detailed examples and implementations in Python and Rust, equipping you to apply function approximations in AI modeling and optimization.


1. Intuition Behind Taylor Series

A function's value near a point can be approximated using its value, slope, curvature, and higher-order derivatives at that point. Think of zooming in on a curve: up close it looks like a line (first order), then a parabola (second order), and so on.

Formally, Taylor series expand f(x) around a as an infinite polynomial:

[ f(x) \approx f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f^{(3)}(a)}{3!}(x-a)^3 + \cdots ]

Maclaurin series: Special case at a=0.

ML Connection

  • Linearize non-linear models for optimization (Newton's method).
  • Approximate activations (e.g., sigmoid) for faster computation.

INFO

Taylor series turn wiggly functions into manageable polynomials, like fitting a curve with straight segments.

Everyday Example

  • sin(x) near 0: sin(x) ≈ x - x^3/6, good for small x.
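
A quick numerical check of this approximation (a minimal sketch in NumPy, not part of the lecture code) shows how the error grows as x moves away from 0:

python
import numpy as np

# Compare sin(x) with its cubic Maclaurin approximation for small and moderate x
for x in [0.1, 0.5, 1.0]:
    approx = x - x**3 / 6
    print(f"x={x}: approx={approx:.6f}, true={np.sin(x):.6f}, error={abs(approx - np.sin(x)):.2e}")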

2. Formal Definition of Taylor Series

For f infinitely differentiable at a, Taylor series:

[ f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(x-a)^n ]

Converges to f(x) within radius of convergence.

Conditions

  • Analytic: the function equals its Taylor series on some interval around a.
  • Smoothness: derivatives of every order must exist at a.

Maclaurin Examples

  • e^x = 1 + x + x^2/2 + x^3/6 + ...
  • sin(x) = x - x^3/3! + x^5/5! - ...
  • cos(x) = 1 - x^2/2! + x^4/4! - ...
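
As a sanity check on the first expansion, the partial sums of e^x converge factorially fast; a minimal sketch:

python
from math import exp, factorial

# Partial sums of e^x = sum x^n / n! at x = 1; the error shrinks factorially fast
x = 1.0
partial = 0.0
for n in range(8):
    partial += x**n / factorial(n)
    print(f"n={n}: partial sum = {partial:.8f}, error = {abs(partial - exp(x)):.2e}")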

ML Insight

  • Approximate loss surfaces for gradient descent analysis.

3. Convergence and Radius of Convergence

The series converges for |x-a| < R, where R is the radius of convergence, and diverges for |x-a| > R.

Ratio test on the coefficients: R = lim |a_n / a_{n+1}|, when the limit exists.

Entire functions (e.g., e^x, sin(x)): R = ∞.

Others (e.g., 1/(1+x^2), smooth everywhere yet with R = 1 around 0): finite R.

ML Application

  • Ensure approximations valid in optimization steps.

Example: the series for ln(1+x) converges for -1 < x ≤ 1; at x = 1 it converges only conditionally, to ln(2).
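
The radius of convergence is easy to see numerically: partial sums of the ln(1+x) series settle down inside |x| < 1 and blow up outside. A minimal sketch (illustrative values only):

python
from math import log

# Partial sums of ln(1+x) = x - x^2/2 + x^3/3 - ...; x = 0.5 is inside R = 1, x = 2.0 is outside
def ln1p_partial(x, n_terms):
    return sum((-1)**(k + 1) * x**k / k for k in range(1, n_terms + 1))

for x in [0.5, 2.0]:
    print(f"x={x}: 10 terms = {ln1p_partial(x, 10):.4f}, 50 terms = {ln1p_partial(x, 50):.4f}, "
          f"ln(1+x) = {log(1 + x):.4f}")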


4. Error Estimation and Remainder

Truncation error: R_n(x) = f(x) - P_n(x), where P_n is the degree-n Taylor polynomial.

Lagrange remainder: R_n(x) = f^{(n+1)}(c) (x-a)^{n+1} / (n+1)!, for some c between a and x.

Bounds

  • If |f^{(n+1)}| ≤ M, |R_n| ≤ M |x-a|^{n+1} / (n+1)!.

ML Insight

  • Bound errors in function approximations for robustness.

Example: sin(x) at x = 0.1 with n = 1 (so P_1(x) = x): |R_1| ≤ |sin(c)| · 0.1^2 / 2 ≤ 0.005.
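
That bound can be verified directly; the sketch below compares the actual truncation error of P_1(x) = x against the Lagrange bound:

python
from math import sin

# Check the bound |R_1(0.1)| <= 0.1^2 / 2 = 0.005 for sin(x) with P_1(x) = x
x = 0.1
actual_error = abs(sin(x) - x)   # true truncation error
bound = x**2 / 2                 # Lagrange bound using |sin(c)| <= 1
print(f"actual error = {actual_error:.2e}, Lagrange bound = {bound:.2e}")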


5. Multivariable Taylor Series

For f(x,y) around (a,b):

[ f(x,y) \approx f(a,b) + f_x (x-a) + f_y (y-b) + \frac{1}{2} [f_{xx} (x-a)^2 + 2 f_{xy} (x-a)(y-b) + f_{yy} (y-b)^2] + ... ]

In general (for vector inputs), the expansion is written compactly with the gradient and Hessian.

ML Connection

  • Approximate loss surfaces: f(w) ≈ f(w_0) + ∇f^T Δw + 1/2 Δw^T H Δw.
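
The sketch below checks this second-order expansion on a simple non-quadratic test function, f(x, y) = e^x sin(y) (a hypothetical example chosen here, with its gradient and Hessian written out analytically):

python
import numpy as np

# Second-order Taylor approximation of f(x, y) = exp(x) * sin(y) around (a, b) = (0, 1)
def f(p):
    return np.exp(p[0]) * np.sin(p[1])

a = np.array([0.0, 1.0])
grad = np.array([np.exp(a[0]) * np.sin(a[1]),        # f_x = e^x sin(y)
                 np.exp(a[0]) * np.cos(a[1])])       # f_y = e^x cos(y)
hess = np.array([[np.exp(a[0]) * np.sin(a[1]),  np.exp(a[0]) * np.cos(a[1])],
                 [np.exp(a[0]) * np.cos(a[1]), -np.exp(a[0]) * np.sin(a[1])]])

p = np.array([0.1, 1.2])
delta = p - a
approx = f(a) + grad @ delta + 0.5 * delta @ hess @ delta
print(f"quadratic approx = {approx:.6f}, true value = {f(p):.6f}")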

6. Taylor Series in Optimization

Linear approximation: f(x) ≈ f(a) + ∇f(a)^T (x-a), the model behind gradient descent.

Quadratic approximation: add the Hessian term; minimizing this model gives Newton's method, with step Δx = -H^{-1} ∇f(a).

In ML: trust-region methods restrict each step to a region where the local approximation remains valid.

Example: to minimize f(x) = x^4 near x = 1, use f'(1) = 4 and f''(1) = 12; the quadratic model is 1 + 4(x-1) + 6(x-1)^2, and the Newton step gives x ← 1 - 4/12 = 2/3.
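
A minimal sketch of those Newton steps (each iteration minimizes the local quadratic model):

python
# Newton's method on f(x) = x^4: each step minimizes the local quadratic Taylor model
def grad(x):
    return 4 * x**3

def hess(x):
    return 12 * x**2

x = 1.0
for i in range(6):
    x = x - grad(x) / hess(x)   # Newton step: x - f'(x)/f''(x), here = 2x/3
    print(f"step {i + 1}: x = {x:.4f}")
# Converges toward the minimizer 0 (only linearly, since f''(0) = 0)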


7. Applications in Machine Learning

  1. Activation Approximations: sigmoid(x) ≈ 1/2 + x/4 for small inputs.
  2. Loss Linearization: simplify non-convex optimization with local models.
  3. Numerical Stability: use log(1+x) ≈ x - x^2/2 near 0 (see the sketch below).
  4. Neural ODEs: discretize dynamics via Taylor-like (e.g., Euler) steps.
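
For item 3, the first-order view log(1+x) ≈ x explains why naive evaluation loses precision for tiny x; a minimal sketch comparing it with NumPy's log1p:

python
import numpy as np

# For tiny x, forming 1 + x discards most of x's digits; log1p avoids this,
# consistent with the Taylor view log(1 + x) ≈ x - x^2/2
x = 1e-12
print("naive np.log(1 + x):", np.log(1 + x))
print("stable np.log1p(x): ", np.log1p(x))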

Challenges

  • High-order terms costly.
  • Limited convergence radius.

8. Computational Aspects: Numerical Taylor Series

Compute terms iteratively and truncate after n terms.

When derivatives are not available in closed form, approximate them numerically (finite differences).

python
import numpy as np
from math import comb, factorial

# Taylor for sin(x) at a=0
def taylor_sin(x, n_terms=5):
    result = 0
    for k in range(n_terms):
        # k-th nonzero Maclaurin term: (-1)^k x^(2k+1) / (2k+1)!
        term = (-1)**k * x**(2*k+1) / factorial(2*k+1)
        result += term
    return result

x = 0.5
print("sin(0.5) ≈", taylor_sin(0.5))
print("True:", np.sin(0.5))

# Numerical derivatives for the Taylor terms (central finite differences)
def f(x):
    return np.exp(x)

def nth_derivative(g, a, k, h=0.1):
    # k-th derivative of g at a via a central difference with binomial weights
    if k == 0:
        return g(a)
    return sum((-1)**i * comb(k, i) * g(a + (k/2 - i)*h) for i in range(k + 1)) / h**k

def taylor_exp(x, a=0.0, n_terms=5):
    result = 0.0
    for k in range(n_terms):
        result += nth_derivative(f, a, k) * (x - a)**k / factorial(k)
    return result

print("e^0.5 ≈", taylor_exp(0.5))
print("True:", np.exp(0.5))

# ML: Approximate a loss surface with a second-order Taylor expansion
def loss(w):
    return w[0]**2 + 2*w[1]**2  # quadratic loss, so the expansion is exact

def taylor_loss(w, w0, n_terms=3):
    grad = np.array([2*w0[0], 4*w0[1]])   # analytic gradient at w0
    hess = np.array([[2, 0], [0, 4]])     # analytic (constant) Hessian
    delta = w - w0
    t1 = loss(w0)
    t2 = grad @ delta
    t3 = 0.5 * delta @ hess @ delta
    return t1 + t2 + (t3 if n_terms >= 3 else 0)

w0 = np.array([1.0, 1.0])
w = np.array([1.1, 1.2])
print("Taylor loss approx:", taylor_loss(w, w0))
rust
fn factorial(n: u32) -> u64 {
    (1..=n).product()
}

fn taylor_sin(x: f64, n_terms: usize) -> f64 {
    let mut result = 0.0;
    for k in 0..n_terms {
        let sign = if k % 2 == 0 { 1.0 } else { -1.0 };
        result += sign * x.powi(2*k as i32 + 1) / factorial((2*k + 1) as u32) as f64;
    }
    result
}

fn main() {
    let x = 0.5;
    println!("sin({}) ≈ {}", x, taylor_sin(x, 5));
    println!("True: {}", x.sin());

    // Numerical derivs (simplified)
    fn f(x: f64) -> f64 {
        x.exp()
    }

    fn nth_derivative(f: fn(f64) -> f64, a: f64, k: u32, h: f64) -> f64 {
        // k-th derivative via a central difference with binomial weights
        if k == 0 {
            return f(a);
        }
        let mut sum = 0.0;
        for i in 0..=k {
            let coeff = (factorial(k) / (factorial(i) * factorial(k - i))) as f64; // C(k, i)
            let sign = if i % 2 == 0 { 1.0 } else { -1.0 };
            sum += sign * coeff * f(a + (k as f64 / 2.0 - i as f64) * h);
        }
        sum / h.powi(k as i32)
    }

    fn taylor_exp(x: f64, a: f64, n_terms: u32, h: f64) -> f64 {
        let mut result = 0.0;
        for k in 0..n_terms {
            result += nth_derivative(f, a, k, h) * (x - a).powi(k as i32) / factorial(k) as f64;
        }
        result
    }

    println!("e^0.5 ≈ {}", taylor_exp(0.5, 0.0, 5, 1e-6));
    println!("True: {}", 0.5f64.exp());

    // ML: Loss surface
    fn loss(w: &[f64]) -> f64 {
        w[0].powi(2) + 2.0 * w[1].powi(2)
    }

    fn taylor_loss(w: &[f64], w0: &[f64], n_terms: usize) -> f64 {
        let grad = [2.0 * w0[0], 4.0 * w0[1]];
        let delta = [w[0] - w0[0], w[1] - w0[1]];
        let t1 = loss(w0);
        let t2 = grad[0] * delta[0] + grad[1] * delta[1];
        let t3 = if n_terms >= 3 {
            // Hessian [[2,0],[0,4]]
            0.5 * (2.0 * delta[0].powi(2) + 4.0 * delta[1].powi(2))
        } else {
            0.0
        };
        t1 + t2 + t3
    }

    let w0 = [1.0, 1.0];
    let w = [1.1, 1.2];
    println!("Taylor loss approx: {}", taylor_loss(&w, &w0, 3));
}

Both programs compute truncated Taylor series for sin and exp (the latter via numerical derivatives) and a second-order approximation of a quadratic ML loss.


9. Symbolic Taylor Series with SymPy

Exact derivatives and series.

python
from sympy import symbols, sin, exp, series

x = symbols('x')
f = sin(x)
taylor = series(f, x, 0, 6)  # Maclaurin expansion up to (not including) x^6
print("sin(x) series:", taylor)

g = exp(x**2)
taylor_g = series(g, x, 0, 5)
print("exp(x^2) series:", taylor_g)
rust
fn main() {
    // Hardcoded
    println!("sin(x) series: x - x^3/6 + x^5/120");
    println!("exp(x^2) series: 1 + x^2 + x^4/2");
}

10. Convergence Issues and Practical Considerations

Finite radius of convergence: the expansion is unusable outside |x-a| < R.

Numerical stability: for large |x|, alternating terms cause catastrophic cancellation.

In ML: truncate early for efficiency, but check that the truncation error stays acceptable.
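
The cancellation issue is easy to reproduce: summing the alternating Maclaurin series of e^{-x} for large x destroys almost all significant digits even though the series converges mathematically. A minimal sketch:

python
from math import exp, factorial

# Summing e^(-x) = sum (-x)^n / n! term by term for large x loses almost all digits
def exp_series(x, n_terms=100):
    return sum(x**n / factorial(n) for n in range(n_terms))

print("series for e^-20:", exp_series(-20.0))        # ruined by cancellation
print("1 / series(e^20):", 1 / exp_series(20.0))     # stable rearrangement
print("math.exp(-20)   :", exp(-20))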


11. Taylor Series in Deep Learning

  • Backprop approx: Linearize activation gradients.
  • Hessian estimation: Second-order Taylor for optimization.
  • Neural ODEs: Discretization uses Taylor-like steps.
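
As a sketch of the Neural ODE point: a forward Euler update h ← h + Δt·f(h) is exactly a first-order Taylor step of the hidden state. The toy dynamics below (a hand-written tanh vector field, purely illustrative) show the idea:

python
import numpy as np

# Forward Euler = first-order Taylor step of the hidden state: h(t + dt) ≈ h(t) + dt * f(h(t))
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 3)) * 0.5    # toy weight matrix, illustrative only

def dynamics(h):
    return np.tanh(W @ h)                # dh/dt = f(h), a small "neural" vector field

h = np.ones(3)
dt = 0.1
for _ in range(5):
    h = h + dt * dynamics(h)             # one Taylor/Euler step
print("state after 5 Euler steps:", h)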

Challenges: High-order computation costly.


12. Error Bounds in ML Contexts

The Lagrange remainder gives worst-case bounds on approximation error, which is what robustness arguments need.

In optimization: trust-region methods use such bounds to validate step sizes.

Example: for e^x on [0, 0.5], the degree-4 Maclaurin polynomial has |R_4| ≤ e^0.5 · 0.5^5 / 5! ≈ 4.3 × 10^-4, small enough for most accuracy requirements.
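
A quick check of that bound (a minimal sketch):

python
from math import exp, factorial

# Degree-4 Maclaurin polynomial for e^x and its Lagrange bound on [0, 0.5]
x = 0.5
p4 = sum(x**k / factorial(k) for k in range(5))
actual_error = abs(exp(x) - p4)
bound = exp(0.5) * x**5 / factorial(5)   # |f^(5)(c)| <= e^(0.5) for c in [0, 0.5]
print(f"actual error = {actual_error:.2e}, Lagrange bound = {bound:.2e}")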


13. Key ML Takeaways

  • Taylor series simplify: complex functions become local polynomials.
  • Polynomials are tractable: cheap to evaluate, differentiate, and optimize.
  • Error bounds ensure robustness: truncation effects can be quantified.
  • Multivariable expansions are critical: they describe loss landscapes via gradients and Hessians.
  • Code enables practical testing: verify approximations before relying on them.

Taylor series bridge math to ML efficiency.


14. Summary

We explored Taylor series from the single-variable to the multivariable case, including convergence and error analysis, with applications throughout ML. The examples and Python/Rust code provide hands-on tools and prepare the ground for advanced numerical methods.



Further Reading

  • Stewart, Calculus (Ch. 11: Infinite Series).
  • Goodfellow et al., Deep Learning (Ch. 4: Numerical Computation).
  • 3Blue1Brown: Taylor series videos.
  • Rust: 'num' crate for numerical precision.