undefined | Better HN

0 pointsjpollock5y ago0 comments

So, we've got "reality", as represented by data.

We've got a model, implemented in code. Since it's code can be differentiated - not sure how that works with branches, I guess that's the math. :) This is generated through a set of input parameters.

We've got an error function, representing the difference between the model and reality.

If we differentiate the error function, we can choose which set of parameter mutations are heading in the right direction to then generate a new model? We check each close point and find the max benefit?

However, if everything is taking the parameters as input, is the derivative of the error function only generated once?

Is it saying that the derivative of the error function is independent of the parameters, so it doesn't matter what the model is, they all have the same error function, and that error function can be found by generating a single model?

0 comments

1 comments · 1 top-level

eigenspace5y ago

Dealing with branches is indeed and interesting problem. Many AD systems can't accomodate branches. I think most of the Julia ones do.

The gist of it is that branches don't actually require that much fanciness, but they can introduce discontinuties, and AD systems will often do things like happily give you a finite derivative right at a discontinuity where a calculus student would normally complain.

j / k navigate · click thread line to collapse