Playing hard exploration games by watching YouTube

(arxiv.org)

127 points · indescions_2018 · 8y ago · 11 comments


jonbaer · 8y ago

Are audio cues also analyzed here? For example: "We observe that use of the audio signal in CMC results in more emphasis being placed on key items and their location in the inventory"

maffydub · 8y ago

Yes. Section 3.2 suggests they use audio cues to help align the frames from different videos. (I guess it's easier to correlate audio than video.)

Cthulhu_ · 8y ago

I can imagine video quality on YouTube varies more than audio quality, or that audio is easier to hash or make signatures of.


sleepychu · 8y ago

Neat, but I don't understand what they mean by having embedded a reward video into the set. Is that a video where copying the behaviour will deliver victory?

algon33 · 8y ago

Yes, they take the state of the video every 16 frames and look at its embedding. These embeddings are made into checkpoints.

The AI is rewarded if, at each checkpoint, the state vector it has produced is sufficiently aligned with the one from the video.

I guess that's the initial training to deal with sparse rewards.
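
A minimal sketch of the checkpoint reward described above, to make the idea concrete. It is not the paper's implementation: the cosine-similarity test, the 0.5 threshold, the reward value of 1.0, and the names `make_checkpoints` and `CheckpointReward` are all assumptions made for illustration. Only the 16-frame stride and the "sufficiently aligned" criterion come from the comment.

```python
import numpy as np


def make_checkpoints(frame_embeddings, stride=16):
    """Keep the embedding of every `stride`-th demonstration frame."""
    return frame_embeddings[::stride]


def cosine(a, b):
    """Cosine similarity between two 1-D vectors (assumed alignment measure)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))


class CheckpointReward:
    """Hypothetical reward: pay out when the agent reaches the next checkpoint."""

    def __init__(self, checkpoints, threshold=0.5):
        self.checkpoints = checkpoints
        self.threshold = threshold  # assumed value for "sufficiently aligned"
        self.next_idx = 0           # index of the checkpoint to reach next

    def __call__(self, agent_embedding):
        if self.next_idx >= len(self.checkpoints):
            return 0.0  # every checkpoint has already been collected
        target = self.checkpoints[self.next_idx]
        if cosine(agent_embedding, target) > self.threshold:
            self.next_idx += 1  # move on to the following checkpoint
            return 1.0
        return 0.0


# Toy demonstration: 48 "frames" whose embeddings are distinct unit vectors.
demo = np.eye(64)[:48]
reward = CheckpointReward(make_checkpoints(demo))
rewards = [reward(demo[i]) for i in (0, 0, 16, 32, 32)]
```

Because checkpoints must be reached in order, repeating an already matched state earns nothing, which is what turns a sparse game reward into a denser signal along the demonstrated path.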

navaati · 8y ago

The title should probably say "ML" or "AI" or whatever. I was slightly disappointed to realize it was not a funny paper about… I don't know, to be fair.

zodPod · 8y ago

I can see where you're coming from. The title initially made me think it was going to be about getting satisfaction from watching other people play a game, or something like that.

eric_h · 8y ago

Here's a video of the agent actually playing (linked in the paper): https://www.youtube.com/watch?v=Msy82sIfprI

jexah · 8y ago

This is really cool. A step in the right direction towards general learning through observation.

erikb · 8y ago

This is actually quite human. I also watch Let's Plays if I struggle with a quest (or a game in general).

It's also an interesting assumption to say "harder = fewer rewards". It probably doesn't always apply, but it's a good generalization.
