We've got a model, implemented in code. Since it's code, it can be differentiated (I'm not sure how that works with branches; I guess that's the math). :) The model is generated from a set of input parameters.
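On the branches point, one common approach is forward-mode automatic differentiation, which only differentiates the branch that actually runs for the given input. Below is a minimal sketch under that assumption; the `Dual` class, `model`, and `derivative` are hypothetical names made up for illustration and are not taken from any particular library.

```python
from dataclasses import dataclass


@dataclass
class Dual:
    """A value paired with its derivative with respect to one input."""
    val: float
    der: float

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: derivatives add.
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'.
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants (derivative 0)."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def model(p):
    # Hypothetical model with a branch on its parameter.
    if p.val > 0:
        return p * p        # on this branch the derivative is 2p
    return -3 * p + 1       # on this branch the derivative is -3


def derivative(f, x):
    """Evaluate f at x while carrying the derivative along."""
    return f(Dual(x, 1.0)).der


d_pos = derivative(model, 2.0)    # takes the p * p branch
d_neg = derivative(model, -1.0)   # takes the -3 * p + 1 branch
```

The branch is chosen by the ordinary value, and the derivative just follows whichever path was taken. At the switch-over point itself (here `p = 0`) the derivative may not be well defined, and this sketch simply reports the derivative of the branch that ran.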
We've got an error function, representing the difference between the model's output and reality.
If we differentiate the error function, can we tell which parameter mutations are heading in the right direction, and then use them to generate a new model? Do we check each nearby point and pick the one with the maximum benefit?
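To make that question concrete, here is a minimal sketch of plain gradient descent, assuming a hypothetical one-parameter model `y = w * x` and a mean squared error. All names and numbers (`error`, `error_gradient`, `fit`, the learning rate, the data) are made up for illustration. In this scheme the derivative gives the downhill direction directly, so no nearby points are sampled or compared.

```python
def error(w, data):
    """Mean squared difference between the model w * x and the observed y."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def error_gradient(w, data):
    """Derivative of error with respect to w, worked out by hand."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


def fit(data, w=0.0, learning_rate=0.05, steps=200):
    """Repeatedly nudge w against the gradient to reduce the error."""
    for _ in range(steps):
        w -= learning_rate * error_gradient(w, data)
    return w


# Made-up observations generated by y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = fit(data)
```

Each pass through the loop yields a new parameter value, which is the "new model" in the sense above. The step size (`learning_rate`) is a tuning choice: too large and the updates overshoot, too small and progress is slow.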
However, if everything takes the parameters as input, is the derivative of the error function only generated once?
Is it saying that the derivative of the error function is independent of the parameters? In other words, it doesn't matter what the model is: all models share the same error function, and that error function can be found by generating a single model?
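One way to look at this, sketched under the assumption of a hypothetical model `y = w * x` with a mean squared error: the derivative is written down (or generated) once as a function of the parameter, but the number it returns differs for each parameter value. The name `error_gradient` and the data are made up for illustration.

```python
def error_gradient(w, data):
    """Derivative of the mean squared error of the model w * x with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in data) / len(data)


# Made-up observations generated by y = 2 * x.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

# The same function, evaluated at several candidate parameter values.
gradients = {w: error_gradient(w, data) for w in (0.0, 1.0, 2.0, 3.0)}

for w, g in gradients.items():
    print(f"w = {w}: gradient = {g:.3f}")
```

In this sketch the gradient is negative below `w = 2`, zero at `w = 2`, and positive above it, so the formula is shared across all parameter values while its value depends on which one is plugged in.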