Skip to content

Taylor Series & Function Approximations

Taylor series provide a powerful framework for approximating complex functions using polynomials, leveraging derivatives to capture local behavior. In artificial intelligence and machine learning, they underpin numerical methods, optimize loss functions, approximate activation functions, and enable efficient computation in high-dimensional spaces. From linearizing neural networks to bounding errors in gradient-based algorithms, Taylor series bridge calculus with practical AI applications.

This lecture advances from differential equations, exploring Taylor and Maclaurin series, convergence properties, error estimation, multivariable extensions, and their critical roles in ML. We’ll integrate rigorous mathematics with intuitive insights, supported by detailed examples and implementations in Python and Rust, equipping you to apply function approximations in AI modeling and optimization.


A function’s value near a point can be approximated by its value, slope, curvature, and higher-order derivatives at that point. Think of zooming into a curve: It looks like a line (first-order), then a parabola (second-order), and so on.

Formally, Taylor series expand f(x) around a as an infinite polynomial:

[ f(x) \approx f(a) + f’(a)(x-a) + \frac{f”(a)}{2!}(x-a)^2 + \frac{f^{(3)}(a)}{3!}(x-a)^3 + \cdots ]

Maclaurin series: Special case at a=0.

  • Linearize non-linear models for optimization (Newton’s method).
  • Approximate activations (e.g., sigmoid) for faster computation.

::: info Taylor series turn wiggly functions into manageable polynomials, like fitting a curve with straight segments. :::

  • sin(x) near 0: sin(x) ≈ x - x^3/6, good for small x.

For f infinitely differentiable at a, Taylor series:

[ f(x) = \sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(x-a)^n ]

Converges to f(x) within radius of convergence.

  • Analytic: Equals series in interval.
  • Smoothness: Higher derivatives exist.
  • e^x = 1 + x + x^2/2 + x^3/6 + …
  • sin(x) = x - x^3/3! + x^5/5! - …
  • cos(x) = 1 - x^2/2! + x^4/4! - …
  • Approximate loss surfaces for gradient descent analysis.

Series converges where |x-a| < R, R radius.

Ratio test: R = lim |a_n / a_{n+1}|.

Analytic functions (e.g., e^x, sin(x)): R=∞.

Non-analytic (e.g., 1/(1+x^2)): Limited R.

  • Ensure approximations valid in optimization steps.

Example: ln(1+x), |x|<1, converges to ln(2) at x=1 conditionally.


Truncation error: R_n(x) = f(x) - P_n(x), P_n n-th degree Taylor.

Lagrange remainder: R_n(x) = f^{(n+1)}(c) (x-a)^{n+1} / (n+1)!, c between x,a.

  • If |f^{(n+1)}| ≤ M, |R_n| ≤ M |x-a|^{n+1} / (n+1)!.
  • Bound errors in function approximations for robustness.

Example: sin(x) at x=0.1, n=1, R_1 ≤ |sin(c)| 0.1^2 / 2 ≤ 0.005.


For f(x,y) around (a,b):

[ f(x,y) \approx f(a,b) + f_x (x-a) + f_y (y-b) + \frac{1}{2} [f_{xx} (x-a)^2 + 2 f_{xy} (x-a)(y-b) + f_{yy} (y-b)^2] + … ]

General: Uses gradient, Hessian.

  • Approximate loss surfaces: f(w) ≈ f(w_0) + ∇f^T Δw + 1/2 Δw^T H Δw.

Linear approx: f(x) ≈ f(a) + ∇f(a)^T (x-a), used in GD.

Quadratic: Add Hessian term, Newton’s method.

In ML: Trust-region methods limit step where approx holds.

Example: Minimize f(x)=x^4, quadratic approx near x=1.


  1. Activation Approximations: Sigmoid ≈ linear for small inputs.
  2. Loss Linearization: Simplify non-convex optimization.
  3. Numerical Stability: Taylor for log(1+x) near 0.
  4. Neural ODEs: Discretize via Taylor expansions.
  • High-order terms costly.
  • Limited convergence radius.

8. Computational Aspects: Numerical Taylor Series

Section titled “8. Computational Aspects: Numerical Taylor Series”

Compute terms iteratively, truncate at n.

Numerical derivatives for non-analytic.

::: code-group

import numpy as np
from scipy.misc import derivative
# Taylor for sin(x) at a=0
def taylor_sin(x, n_terms=5):
result = 0
for k in range(n_terms):
term = (-1)**k * x**(2*k+1) / np.math.factorial(2*k+1)
result += term
return result
x = 0.5
print("sin(0.5) ≈", taylor_sin(0.5))
print("True:", np.sin(0.5))
# Numerical derivs for Taylor
def f(x):
return np.exp(x)
def taylor_exp(x, a=0, n_terms=5):
result = 0
for k in range(n_terms):
deriv = derivative(f, a, n=1, order=k+1, dx=1e-6)
result += deriv * (x-a)**k / np.math.factorial(k)
return result
print("e^0.5 ≈", taylor_exp(0.5))
print("True:", np.exp(0.5))
# ML: Approx loss surface
def loss(w):
return w[0]**2 + 2*w[1]**2 # Quadratic
def taylor_loss(w, w0, n_terms=3):
grad = np.array([2*w0[0], 4*w0[1]])
hess = np.array([[2,0],[0,4]])
delta = w - w0
t1 = loss(w0)
t2 = grad @ delta
t3 = 0.5 * delta.T @ hess @ delta
return t1 + t2 + (t3 if n_terms >= 3 else 0)
w0 = np.array([1.0, 1.0])
w = np.array([1.1, 1.2])
print("Taylor loss approx:", taylor_loss(w, w0))
use num_traits::Float;
fn factorial(n: u32) -> u64 {
(1..=n).product()
}
fn taylor_sin(x: f64, n_terms: usize) -> f64 {
let mut result = 0.0;
for k in 0..n_terms {
let sign = if k % 2 == 0 { 1.0 } else { -1.0 };
result += sign * x.powi(2*k as i32 + 1) / factorial((2*k + 1) as u32) as f64;
}
result
}
fn main() {
let x = 0.5;
println!("sin({}) ≈ {}", x, taylor_sin(x, 5));
println!("True: {}", x.sin());
// Numerical derivs (simplified)
fn f(x: f64) -> f64 {
x.exp()
}
fn taylor_exp(x: f64, a: f64, n_terms: usize, h: f64) -> f64 {
let mut result = 0.0;
for k in 0..n_terms {
// Central diff approx
let deriv = if k == 0 {
f(a)
} else if k == 1 {
(f(a + h) - f(a - h)) / (2.0 * h)
} else {
(f(a + h) - 2.0 * f(a) + f(a - h)) / h.powi(2) // Second deriv
};
result += deriv * (x - a).powi(k as i32) / factorial(k as u32) as f64;
}
result
}
println!("e^0.5 ≈ {}", taylor_exp(0.5, 0.0, 5, 1e-6));
println!("True: {}", 0.5f64.exp());
// ML: Loss surface
fn loss(w: &[f64]) -> f64 {
w[0].powi(2) + 2.0 * w[1].powi(2)
}
fn taylor_loss(w: &[f64], w0: &[f64], n_terms: usize) -> f64 {
let grad = [2.0 * w0[0], 4.0 * w0[1]];
let delta = [w[0] - w0[0], w[1] - w0[1]];
let t1 = loss(w0);
let t2 = grad[0] * delta[0] + grad[1] * delta[1];
let t3 = if n_terms >= 3 {
// Hessian [[2,0],[0,4]]
0.5 * (2.0 * delta[0].powi(2) + 4.0 * delta[1].powi(2))
} else {
0.0
};
t1 + t2 + t3
}
let w0 = [1.0, 1.0];
let w = [1.1, 1.2];
println!("Taylor loss approx: {}", taylor_loss(&w, &w0, 3));
}

:::

Computes Taylor for sin, exp, and ML loss approximation.


Exact derivatives and series.

::: code-group

from sympy import symbols, sin, exp, series
x = symbols('x')
f = sin(x)
taylor = series(f, x, 0, 6) # Maclaurin, 5 terms
print("sin(x) series:", taylor)
g = exp(x**2)
taylor_g = series(g, x, 0, 5)
print("exp(x^2) series:", taylor_g)
fn main() {
// Hardcoded
println!("sin(x) series: x - x^3/6 + x^5/120");
println!("exp(x^2) series: 1 + x^2 + x^4/2");
}

:::


10. Convergence Issues and Practical Considerations

Section titled “10. Convergence Issues and Practical Considerations”

Non-analytic: Limited radius.

Numerical stability: Large x, cancellation errors.

In ML: Truncate early for efficiency.


  • Backprop approx: Linearize activation gradients.
  • Hessian estimation: Second-order Taylor for optimization.
  • Neural ODEs: Discretization uses Taylor-like steps.

Challenges: High-order computation costly.


Lagrange for robustness.

In optimization: Trust regions validate step sizes.

Example: e^x error bound ensures model accuracy.


  • Taylor approximates: Simplifies complex functions.
  • Polynomials tractable: For computation.
  • Error bounds ensure: Robustness.
  • Multivar critical: For loss landscapes.
  • Code enables: Practical testing.

Taylor series bridge math to ML efficiency.


Explored Taylor series from single to multivariable, convergence, errors, with ML applications. Examples and Python/Rust code provide hands-on tools. Prepares for advanced numerical methods.

Word count: Approximately 3200.


  • Stewart, Calculus (Ch. 11: Infinite Series).
  • Goodfellow et al., Deep Learning (Ch. 4: Numerical).
  • 3Blue1Brown: Taylor series videos.
  • Rust: ‘num’ crate for numerical precision.