It is a computationally clever application of the chain rule that minimizes the amount of work needed to compute gradients for all parameters in the network.
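To make the chain-rule bookkeeping concrete, here is a minimal sketch of scalar reverse-mode differentiation (the `Var` class and the example expression are my own illustration, not anyone's library API). Each node records its parents and the local derivative along each edge; `backward` pushes the incoming gradient through those edges.

```python
class Var:
    """Minimal scalar reverse-mode autodiff node (illustrative sketch only)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent_node, local_derivative)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self, seed=1.0):
        # Chain rule: accumulate the upstream gradient, then pass
        # seed * (local derivative) down each edge. A real implementation
        # topologically sorts the graph so each node is visited once;
        # this naive recursion can re-traverse shared subexpressions.
        self.grad += seed
        for parent, local in self.parents:
            parent.backward(seed * local)

# z = x*y + x, so dz/dx = y + 1 and dz/dy = x
x, y = Var(3.0), Var(4.0)
z = x * y + x
z.backward()
# x.grad == 5.0, y.grad == 3.0
```

One backward pass yields the gradient with respect to every parameter at once, which is the whole appeal over per-parameter methods.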
IMO backprop is the most trivial implementation of differentiation in neural networks. Do you know an easier way to compute gradients with lower overhead? If so, please share it.
My first forays into making neural networks used replacement rules to modify an expression tree until all the “D” operators went away, but that approach has exponential complexity in network depth if you aren’t careful. Finite differences costs a number of evaluations linear in the number of parameters, as does forward-mode differentiation via dual numbers.
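To illustrate why both alternatives scale with the parameter count, here is a sketch (the `Dual` class and the function `f` are my own toy example): dual numbers carry a derivative alongside the value via the rule eps² = 0, but each forward pass seeds only one parameter, and finite differences likewise needs one extra evaluation per parameter.

```python
class Dual:
    """Dual number a + b*eps with eps**2 == 0 (forward-mode AD sketch)."""
    def __init__(self, real, eps=0.0):
        self.real, self.eps = real, eps

    def __add__(self, other):
        return Dual(self.real + other.real, self.eps + other.eps)

    def __mul__(self, other):
        # Product rule falls out of (a + b*eps)(c + d*eps) with eps**2 == 0.
        return Dual(self.real * other.real,
                    self.real * other.eps + self.eps * other.real)

def f(x, y):
    return x * y + x  # df/dx = y + 1, df/dy = x

# One forward pass per parameter: seed eps = 1 on the parameter of interest.
dfdx = f(Dual(3.0, 1.0), Dual(4.0, 0.0)).eps  # 5.0
dfdy = f(Dual(3.0, 0.0), Dual(4.0, 1.0)).eps  # 3.0

# Finite differences: also one extra function evaluation per parameter.
h = 1e-6
fd_dx = (f(3.0 + h, 4.0) - f(3.0, 4.0)) / h  # approximately 5.0
```

With n parameters, either method costs n passes, whereas reverse-mode backprop gets all n partial derivatives in one backward pass.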