The staging of components in this paper (compressor/controller), where neuroevolution is only applied to a low-dimensional controller, reminds me of Ha and Schmidhuber's recent paper on world models (which is briefly cited) [1]. They employ a variational autoencoder with ~4.4M parameters, an RNN with ~1.7M parameters, and a final controller with just 1,088 parameters! Though it's recently been shown that neuroevolution can scale to millions of parameters [2], the technique of applying evolution to as few parameters as possible and supplementing with either autoencoders or vector quantization seems to be gaining traction. I hope to apply some of the ideas in this paper to multiple co-evolving agents...
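The division of labor can be sketched roughly like this: a large frozen compressor maps observations to a small latent vector, and a simple evolution strategy searches only over the tiny controller's parameters. Everything below (the stand-in encoder, the toy fitness function, the population settings) is invented for illustration; Ha and Schmidhuber's actual setup uses a VAE plus an MDN-RNN for the compressor and CMA-ES for the search.

```python
import numpy as np

rng = np.random.default_rng(0)

LATENT = 8     # stand-in for the compressed latent (theirs is 32 + 256)
ACTIONS = 3    # controller outputs
N_PARAMS = (LATENT + 1) * ACTIONS  # tiny linear policy: weights + biases

def encode(obs):
    """Stand-in for the frozen compressor: a fixed projection, not trained here."""
    W = np.linspace(-1.0, 1.0, LATENT * obs.size).reshape(LATENT, obs.size)
    return np.tanh(W @ obs)

def act(params, z):
    """The evolved part: just a linear map plus bias, squashed by tanh."""
    W = params[:LATENT * ACTIONS].reshape(ACTIONS, LATENT)
    b = params[LATENT * ACTIONS:]
    return np.tanh(W @ z + b)

def fitness(params):
    """Toy task (made up): drive the action vector toward a fixed target."""
    z = encode(np.ones(16))
    target = np.array([0.5, -0.5, 0.0])
    return -np.sum((act(params, z) - target) ** 2)

# Simple (mu, lambda) evolution strategy over only N_PARAMS numbers --
# the point being that the search space stays tiny no matter how big
# the frozen compressor in front of it is.
mean, sigma, pop = np.zeros(N_PARAMS), 0.5, 64
for gen in range(200):
    cands = mean + sigma * rng.standard_normal((pop, N_PARAMS))
    scores = np.array([fitness(c) for c in cands])
    mean = cands[np.argsort(scores)[-pop // 4:]].mean(axis=0)  # keep top quarter
    sigma *= 0.98  # anneal the search radius

print(N_PARAMS, fitness(mean))
```

With a 27-parameter search space, even this naive ES converges in a few hundred generations, which is the appeal of pushing as much capacity as possible into the unsupervised stage.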
care to elaborate?
"To the best of our knowledge, the only prior work using unsupervised learning as a pre-processor for neuroevolution is (cite)."
Just amazing how much low-hanging fruit there still is in the space.
I am tempted to blame inconsistency across terminology and implementations for this lack of understanding, but I suspect it has more to do with approaching this field through the lens of a developer rather than a researcher or academic: trying to understand the code without fully grasping the "science" behind the mechanisms.
Either way, if you feel you're in a similar spot, check out this resource: https://reinforce.io and their respective Github repo: https://github.com/reinforceio/tensorforce.
Just reading through their code and documentation has made a lot of the concepts clearer.
And a few more resources I found really helpful: http://karpathy.github.io/2016/05/31/rl/ https://www.analyticsvidhya.com/blog/2017/01/introduction-to... https://www.oreilly.com/ideas/reinforcement-learning-with-te...
Edit: The point I forgot to mention is that I always feel like I am playing catch-up, since the amount of new content being released exceeds what I can absorb.
One fine day my boss came to me and said that he had an ask from Atari Marketing (in the Home Computer arm of the company).
The marketing drone came to my office (yes, we had offices in those days). "My idea is to pre-copyright all possible 8x8 bitmaps so that people can't use them without our permission. Can you print them out for me so we can submit them to the copyright office?" He actually meant all possible 8x8 bitmaps containing five colors, with colors chosen from a 7- or 8-bit space (I forget which).
I told him the story of the guy who supposedly invented chess, and was offered a choice of reward by his king. The fellow simply asked, "Just give me one grain of rice for the first square, two grains of rice for the second, four for the third, and so on." Most of you know how this ends; it's grade-school math.
I explained to the marketing guy that the printout would probably outweigh the planet, maybe the solar system, maybe the galaxy. He went away, a little disgusted with those pesky engineers. (I don't know if he was the same oxygen waster who wanted me to write a 16K cartridge in just a couple of weeks, but he certainly was in the same department).
So I'm still sticking with three brain cells, despite all the downvotes :-)
(weight of a sheet of paper) * 5^64
.4 x Milky Way mass
Almost half the galaxy mass, that's a lot of grains of rice!
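The back-of-the-envelope check works out. The sheet weight and galaxy mass below are my own rough assumptions (Milky Way mass estimates vary by a factor of a few), and 5^64 actually undercounts, since it ignores the palette choices from the 7/8-bit space:

```python
# Rough sanity check of the estimate above. Sheet weight (4.5 g) and
# Milky Way mass (~3e42 kg including dark matter) are ballpark guesses.
bitmaps = 5 ** 64                  # 8x8 pixels, one of 5 colors each
sheet_kg = 0.0045                  # one sheet of printer paper
printout_kg = bitmaps * sheet_kg
milky_way_kg = 3e42
print(printout_kg / milky_way_kg)  # on the order of the galaxy's mass

# The chess/rice story for comparison: 1 + 2 + 4 + ... over 64 squares.
grains = 2 ** 64 - 1
print(grains)                      # about 1.8e19 grains
```

Depending on which galaxy-mass estimate you pick, the printout lands somewhere between a large fraction of the Milky Way and all of it; either way, the marketing guy was not getting his printout.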
The Atari that brought out the Atari ST etc. was one of those, but it pretty much failed, and Tramiel merged it into JTS; the remains were later sold to Hasbro, which then sold them to Infogrames Entertainment. The current Atari Inc. used to be Infogrames and just licensed the name; Infogrames Entertainment itself later renamed itself Atari SA.
The other part of the original Atari, Atari Games Inc., failed in 2003. As far as I know, the intellectual property of that division is now owned by Warner.