
0 points · GaggiX · 3y ago

>There's nothing analogous to a "reward" or "punishment" when neural networks are learning.

Well, there's deep reinforcement learning.


1 comment · 1 top-level

Yeah, but even in that case, "reward" is just the thing the NN is trying to predict. The NN itself isn't receiving the reward (or any punishment). Instead, it follows gradient signals to improve its estimate of the reward, and that estimate is then used as a proxy for an optimal policy decision.
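To make that concrete, here's a rough sketch of what I mean, using a toy multi-armed bandit instead of a real deep RL setup. Everything in it is made up for illustration: the function name `train_bandit`, the reward means, the learning rate, and the epsilon-greedy exploration. The "network" is just one number per action. The point is that the reward only ever shows up as a regression target in a squared-error loss, and the policy is simply the argmax over the learned estimates.

```python
import random

def train_bandit(true_means, steps=5000, lr=0.05, epsilon=0.1, seed=0):
    """Learn per-action reward estimates by gradient descent (illustrative only)."""
    rng = random.Random(seed)
    # Stand-in for a network: one trainable estimate per action.
    estimates = [0.0] * len(true_means)
    for _ in range(steps):
        # Epsilon-greedy: mostly act on the current estimates, sometimes explore.
        if rng.random() < epsilon:
            action = rng.randrange(len(true_means))
        else:
            action = max(range(len(estimates)), key=estimates.__getitem__)
        # The environment produces a noisy reward for the chosen action.
        reward = true_means[action] + rng.gauss(0.0, 0.1)
        # Loss is 0.5 * (estimate - reward)^2, so its gradient w.r.t. the
        # estimate is (estimate - reward). The reward is only a target here.
        grad = estimates[action] - reward
        estimates[action] -= lr * grad
    return estimates

# Hypothetical reward means for three actions.
estimates = train_bandit([0.2, 0.8, 0.5])
# The "policy decision" is just the action with the highest predicted reward.
greedy_action = max(range(len(estimates)), key=estimates.__getitem__)
print(estimates, greedy_action)
```

Nothing in the update "rewards" the parameters. A high reward and a low reward both produce the same kind of error-correcting step toward the observed value.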
