The problem then, is that that general functions have no (essential) bandlimit [1]. Remember that differentiation acts as a multiplication by a monomial, in the frequency domain [2]. Non-constant polynomials always eventually blow up away from 0, so in differentiating, you're multiplying a function by something that blows up, in the frequency domain. This means that, in the result, higher frequencies are going to dominate over lower frequencies, at a polynomial rate.
Let me be clear, the problem with numerical differentiation is not just that rounding errors accumulate, it's that differentiation is fundamentally unstable, and not something you want to apply to real-world data.
It depends very much on what your application is, however, I think generally a better approach to AD is to redefine your differentiation, by composing it with a low-pass filter. If designed properly, your low-pass filter will 'go to zero' faster (in the frequency-domain) than any polynomial, thus making this new operator bounded, and hence numerically more stable. It's not a panacea, but it begins to address the fundamental problem.
One example of such a filter is Gamma(n+1, n x^2)/Factorial[n], where Gamma is the incomplete gamma function [3].
In Python:
scipy.special.gammaincc(n+1,nx2) or mpmath.gammainc(n+1,nx2, regularized=True)
To see why this is a nice choice, notice item 2 in [4]. This filter is simply the product of exp(- x^2) (the Gaussian) multiplied by the first n-terms of the Taylor series of exp(+ x^2), (1/ the-Gaussian). Since this series converges unconditionally everywhere, as n-> +infinity, this filter converges to 1 for a fixed x (as you increase n), however, since it's still a gaussian times a polynomial, it always converges to 0 as you increase x, but fix n.
This is my area of research, so if anyone's interested I can give more details.
[1] https://en.wikipedia.org/wiki/Band-limit [2] https://en.wikipedia.org/wiki/Fourier_transform#Analysis_of_... [3] https://en.wikipedia.org/wiki/Incomplete_gamma_function [4] https://en.wikipedia.org/wiki/Incomplete_gamma_function#Spec...
This comes in handy in all sorts of things where you might have designed this fancy kernel function to use in some process where you need to be able to calculate the value of the function and also of some derivatives--backpropagation for example [1].
As I was told by someone in the field, at one point, people used to generate machine learning publications simply by finding functions that required fancy mathematical tricks to find closed-form derivatives of chosen functions so that they would be usable in learning algorithms. But in many cases, this work is unnecessary if you use automatic differentiation.
It's a really cool concept, applicable in specific situations. If you need to know the derivative of a function that's not fully specified, you need numerical differentiation [2]. If you need a closed-form expression for your derivative function, that's when you need symbolic differentiation [3].
[1] http://en.wikipedia.org/wiki/Backpropagation [2] http://en.wikipedia.org/wiki/Numerical_differentiation [3] http://en.wikipedia.org/wiki/Symbolic_differentiation
Edit: Reminds me of the list on page 3 here: http://arxiv.org/pdf/math/9404236v1
This more or less depends on what kind of mathematician you're talking to. A mathematician working on Frames and signals almost certainly uses that language. But someone working in PDEs probably doesn't, even if the theory intersects a lot. It's probably a sign that there's nothing wrong with mathematicians and engineers sharing concepts :).
Would you recommend another way?
Try the script at the end of this comment. Here's the output http://imgur.com/5948cW6
Notice that the 0-th one is just a gaussian. If you're interested, these things have traditionally been called HDAFs, I have a library to do this kind of thing https://bitbucket.org/nmaxwell/hdaf it may need more documentation. I don't mind if anyone provides constructive feedback on it.
The functionality you'd be interested in is hdaf.hdaf_periodic_samples, - you can convolve with that to low-pass filter, and use hertz_to_sigma to get the sigma parameter (from cutoff frequency).
import numpy, scipy.special
import matplotlib.pyplot as plt
x=numpy.linspace(-2,5,1000)
for n in [0,1,4,10]:
s=1 if n==0 else n
h=scipy.special.gammaincc(n+1,s*x**2)
plt.plot(x, h)
plt.savefig("filters")
EDIT: formatting.Edit: I should clarify that it's cool to read a mathematician's insight into the problem as seen by engineers, written in a way that engineers can understand.
For any function that is not a combination of polynomials, you need to have its Taylor expansion up to the desired order of derivatives, so you can't just take an "arbitrary" function and use this method to compute its derivative in exact arithmetic.
So for anything other than polynomials, you just reword the problem of finding exact derivatives to finding exact Taylor series, and in order to find Taylor series in most cases, you have to differentiate or express your function in terms of the Taylor series of known functions.
Edit: Indeed, take the only non-polynomial example here, a rational function (division by a polynomial). In order to make this work, you have to know the geometric series expansion of 1/(1-x). For each function that you want to differentiate this way, you have to keep adding more such pre-computed Taylor expansions.
The magic is that you also tell it how to compute Taylor series of function compositions, if the Taylor series of the functions being composed are already known - then any arbitrary function composed out the primitive functions can have its Taylor series computed automatically!
For your example, the function 1/(1-x) is the composition of
x -> -x
x -> 1+x
x -> 1/x
and so its Taylor series is already known as long as you have already defined negation, addition and reciprocation.The power of this technique is that it handles the product rule and chain rule for you, so composition and multiplication of functions work automatically without extra work.
For the case of 1/(1-x), you only need to tell it the derivatives (wrt x) of
f(x, a) = x * a
g(x, a) = x + a
h(x) = 1/x
Then it automatically knows how to compute the derivatives of h(g(f(x, -1), 1)).
I think I get it, though. You can get away with simplifying intermediate steps of a differentiation computation if you only care about the value of the function at a single point.
> you can't just take an arbitrary function and use this method
Actually, you can, AFAIK. The relation f(x + E) = f(x) + f'(x)E still holds.
If we try this method with a rational function:
f(x) = (x - 1)/(x - 2)
f(x + E)
= (x + E - 1) / (x + E - 2)
= (x + E - 1)(x - E - 2) / ((x + E - 2)(x - E - 2))
= (x^2 - x E - 2 x + x E - E^2 - 2 E - x + E + 2) / (x^2 - x E - 2 x + x E - E^2 - 2 E - 2 x + 2 E + 4)
= (x^2 - 3 x + 2 - E) / (x^2 - 4 x + 4)
= (x^2 - 3 x + 2) / (x^2 - 4 x + 4) - (1 / (x^2 - 4 x + 4)) E
= (x - 1) / (x - 2) - (1 / (x - 2)^2) E
so f'(x) = -1 / (x - 2)^2
since f(x + E) = f(x) + f'(x) E
Edit to add: the key idea here being that no knowledge of differentiation is needed, just tricks for manipulating expressions involving x and E until they are in normal form. f(a + be) = f(a) + f'(a)be + 0.5 * f''(a) b^2 e^2 + O(e^3)
= f(a) + f'(a)be
because e^n=0 for all n>1. This isn't an approximation - it's an exact relationship for dual numbers!You will lose some precision by using floating point numbers instead of an arbitrary-precision real number type, but this is a limitation of the machine you're working on. The method is exact.
The code provided memoizes cell values using the key (r - c), so only O(n^2) work is required to read off the top row of the matrix. I'm planning to make that more explicit in a follow-up post.
The number a+bd can be encoded as...
Should be: The number a+b*epsilon can be encoded as...Encoding power series as matrices is sometimes convenient for theoretical analysis (or, as here, educational purposes), but it's not very efficient. The space and time complexities with matrices are O(n^2) and O(n^3), versus O(n) and O(n^2) (or even O(n log n) using FFT) using the straightforward polynomial representation (in which working with hundreds of thousands of derivatives is feasible). In fact some of my current research focuses on doing this efficiently with huge-precision numbers, and with transcendental functions involved.
All values tried so far agree with Wolfram Alpha, so color me surprised and happy for learning something new.
http://www.math.wisc.edu/~keisler/calc.html
The third edition is now in print. I've been studying calculus with it off-and-on for a while and I find the approach very intuitive, though Spivak's Calculus is probably a better book, the "standard analysis" is a little less intuitive (and now, evidently, harder to teach a machine).http://math.stanford.edu/~vakil/0708-216/216class21.pdf
This really does fall in the ream of algebraic geometry since this method only works for rational functions - as he implemented it.
To numerically compute sin(x + ε) you need the Taylor series.
"...but to give you an overview, the idea is that you introduce an algebraic symbol ϵ such that ϵ≠0 but ϵ^2=0"