0 points · eli_gottlieb · 8y ago · 0 comments

The basic problem with generalizing AlphaGo Zero is that it depends on the state of a Go game being fully deterministic, fully Markovian, and fully amenable to quick simulation. The player makes a move, and the simulator computes the next game-state in milliseconds from only the current game-state. This is what lets the AlphaGo Zero agent train so quickly on self-play.
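
To make the "next state from only the current state" point concrete, here is a rough sketch using a toy subtraction game instead of Go. Everything in it (the game, `step`, `legal_actions`, `self_play`) is a made-up illustration and not anything from AlphaGo Zero itself. The point is only that a deterministic, Markovian transition function makes self-play rollouts nearly free.

```python
import random
from typing import List, Tuple

# Hypothetical toy game: players alternate taking 1-3 stones from a pile.
# State is (stones_left, player_to_move); whoever takes the last stone wins.
State = Tuple[int, int]

def legal_actions(state: State) -> List[int]:
    stones, _ = state
    return [a for a in (1, 2, 3) if a <= stones]

def step(state: State, action: int) -> State:
    """Deterministic, Markovian transition: depends only on (state, action)."""
    if action not in legal_actions(state):
        raise ValueError("illegal move")
    stones, player = state
    return (stones - action, 1 - player)

def self_play(start: State, rng: random.Random) -> int:
    """Play uniformly random moves to the end and return the winning player."""
    state = start
    while state[0] > 0:
        state = step(state, rng.choice(legal_actions(state)))
    # state[1] is the player who would move next, so the other one moved last.
    return 1 - state[1]

if __name__ == "__main__":
    rng = random.Random(0)
    wins = [self_play((21, 0), rng) for _ in range(10_000)]
    print("player 0 win rate under random play:", wins.count(0) / len(wins))
```

No history and no hidden variables are needed to call `step`, which is why thousands of games cost almost nothing here. Once the transition needs a physics engine or unobserved state, that assumption is gone.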

If you start requiring high-dimensional empirical data where the generating dynamics aren't Markovian (or aren't neatly predictable with a Markovian simulator, even if God considers them fully determined), you start having to do stuff like full-blown physics simulations while also specifying agent goals in terms of those physical states. Then you've got the machine learning part and the simulation part taking up comparable amounts of compute power, and self-supervised training becomes much more difficult.

dsacco · 8y ago

I agree that partial observation and imperfect information present computational difficulties to generalization. Do you know of any interesting research offhand for reading about optimizations for this problem?
