hyperbovine 9y ago

Really naive question: can't they just train the net to react instantaneously to a $d$-delayed screen? Conceptually, I don't see why this approach would succeed with d = 0 but fail for, say, d = 25 ms. (I am too busy/lazy to read the papers and work out what breaks down.)

vladfi1 9y ago

Basically, we tried that and it sort of works, but performance degrades pretty fast with each added frame of delay. The issue is likely that delay makes credit assignment much harder: instead of seeing an immediate change in the state (which your critic can interpret as good or bad), you have to wait a bunch of frames, during which your earlier actions are still taking effect and interfering with the reward signal.
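
To make the credit-assignment point concrete, here is a minimal Python sketch of a toy setup, not our actual training code. The names `DelayedActions`, `toy_step`, and `rewards_after_single_press` are made up for illustration, and the "environment" is just a function that pays reward 1 on the frame where action 1 is actually applied. It only shows how far the reward drifts from the action that caused it as the delay grows.

```python
from collections import deque


class DelayedActions:
    """Toy wrapper (hypothetical): an action submitted at frame t
    is only applied to the environment at frame t + delay."""

    def __init__(self, step_fn, delay, noop=0):
        self.step_fn = step_fn
        # Pre-fill with no-ops so the first `delay` frames apply nothing.
        self.queue = deque([noop] * delay)

    def step(self, action):
        self.queue.append(action)
        applied = self.queue.popleft()  # the action chosen `delay` frames ago
        return self.step_fn(applied)


def toy_step(action):
    # Assumed toy dynamics: reward 1.0 exactly when action 1 is applied.
    return float(action)


def rewards_after_single_press(delay, frames=6):
    """Press action 1 once at frame 0, then no-op; return per-frame rewards."""
    env = DelayedActions(toy_step, delay)
    actions = [1] + [0] * (frames - 1)
    return [env.step(a) for a in actions]


if __name__ == "__main__":
    for d in (0, 1, 3):
        print(d, rewards_after_single_press(d))
```

With no delay, the reward lands on the same frame as the press. With a delay of 3, it lands three frames later, and in a real game the agent has issued three more actions in the meantime, any of which the critic could mistakenly credit.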
