I love this. The code is simple and documented. However, whenever I try to understand autograd, I get stuck at dual numbers.
As a programmer, I understand building up a computation graph where each node is some sort of elementary function that knows how to take its own gradient. So a constant/scalar node has a derivative/gradient of zero, x^n has a derivative of nx^(n-1), etc. These gradients are passed from the end to the beginning according to the chain rule, and so on.
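To make that mental model concrete, here is roughly what I picture. This is only a minimal sketch of the graph-plus-chain-rule idea, not this project's code: the `Node` class, its `parents` field, and `backward` are names I made up for illustration.

```python
class Node:
    """One value in the graph, plus the local derivatives w.r.t. its inputs."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        # Tuple of (parent node, local derivative d(self)/d(parent)).
        # A constant or input is a leaf: no parents, nothing to pass back.
        self.parents = parents

    def __add__(self, other):
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))

    def __pow__(self, n):
        # x^n has local derivative n * x^(n-1); n is a plain Python number.
        return Node(self.value ** n, ((self, n * self.value ** (n - 1)),))

    def backward(self):
        # Order the graph so every node comes after all of its inputs.
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the end to the beginning, applying the chain rule.
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad


# y = x^2 + 2x + 5 at x = 3, so dy/dx = 2x + 2 = 8
x = Node(3.0)
y = x ** 2 + Node(2.0) * x + Node(5.0)
y.backward()
print(y.value, x.grad)  # 20.0 8.0
```

Note that nothing here manipulates a symbolic expression: each node only stores numbers computed at the current input.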
However, autograd is not supposed to be the symbolic differentiation we learned in high school.
This project doesn’t seem to have anything to do with dual numbers, so I’m confused!