https://en.wikipedia.org/wiki/Polyworld
https://www.youtube.com/watch?v=_m97_kL4ox0&t=9m43s
It seems that OpenAI has a great little game simulated for their agents to play in. The next step to make this even cooler would be to use physical, robotic agents learning to overcome challenges in real meatspace!
That's one of the main challenges - how to learn safely and with fewer than millions of trials, so it can be feasible to do in the real world.
I'm doing something like this as a hobby but only single agent. The input is camera images and reward is based on a stopped/moving flag determined by changes between successive images as well as favoring going forward over turning. So far, it can learn to avoid crashing into walls, which is about all I'd expect. Trying to find good automated rewards without building too much special hardware is difficult. It's a vanilla DQN.
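A minimal sketch of that kind of camera-only reward. It assumes grayscale frames arrive as flat lists of pixel intensities. The threshold and reward values are made-up placeholders, not the commenter's actual numbers:

```python
# Hypothetical reward for a camera-driven robot. Frames are assumed to be
# equal-length lists of grayscale pixel intensities (0-255).

MOTION_THRESHOLD = 8.0  # assumed mean absolute pixel change that counts as "moving"
FORWARD_REWARD = 1.0    # assumed reward for moving forward
TURN_REWARD = 0.5       # assumed smaller reward for turning, so forward is favored
STOPPED_PENALTY = -1.0  # assumed penalty when the image stops changing (e.g. stuck on a wall)


def mean_abs_diff(prev_frame, curr_frame):
    """Average absolute per-pixel change between two successive frames."""
    if len(prev_frame) != len(curr_frame):
        raise ValueError("frames must have the same size")
    return sum(abs(a - b) for a, b in zip(prev_frame, curr_frame)) / len(curr_frame)


def is_moving(prev_frame, curr_frame, threshold=MOTION_THRESHOLD):
    """Stopped/moving flag derived purely from the camera, no extra hardware."""
    return mean_abs_diff(prev_frame, curr_frame) > threshold


def reward(prev_frame, curr_frame, action):
    """Reward for the transition that produced curr_frame after taking `action`."""
    if not is_moving(prev_frame, curr_frame):
        return STOPPED_PENALTY
    return FORWARD_REWARD if action == "forward" else TURN_REWARD


still = [100] * 16
changed = [100 + 20 * (i % 2) for i in range(16)]  # mean abs diff of 10 vs. `still`
print(reward(still, still, "forward"), reward(still, changed, "forward"), reward(still, changed, "left"))
```

Turning also changes the image, so a turn still counts as "moving"; the only thing that separates forward from turning here is the reward constant.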
Just as there are simple primitives in civil engineering like the lever and the pulley, I expect we'll find a zoo of tiny primitives that are the things these agents are learning.
That's always a good question! One thing to remember is that in RL you are searching through large solution spaces. You probably aren't going to find a global optimum (if one even exists!). What will happen is that a local optimum is found, just one that _works_. This is why having a feature-rich space is important: it helps you escape the locality. But also remember that we don't even know what the solution space looks like or what an optimal solution is.
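To make the local-optimum point concrete, here is a toy sketch. It isn't RL, just greedy hill climbing on a made-up 1-D function, but the same algorithm lands on different peaks depending on where it starts:

```python
import math
import random


def f(x):
    """Bumpy 1-D objective: several local maxima on [0, 10], the highest near x = 8.9."""
    return math.sin(3 * x) + 0.1 * x


def hill_climb(start, step=0.01, lo=0.0, hi=10.0, max_iters=10_000):
    """Greedy local search: move to a better neighbour until none exists."""
    x = start
    for _ in range(max_iters):
        neighbours = [c for c in (x - step, x + step) if lo <= c <= hi]
        best = max(neighbours, key=f, default=x)
        if f(best) <= f(x):  # no neighbour improves, so x is a local optimum
            break
        x = best
    return x


# Same algorithm, different random starting points ("retraining from scratch").
for seed in range(3):
    start = random.Random(seed).uniform(0.0, 10.0)
    x = hill_climb(start)
    print(f"seed={seed}: start={start:.2f} -> x={x:.2f}, f(x)={f(x):.3f}")
```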
It is also entirely possible that you retrain something from scratch and find a different local optimum. Self-play can help with this, as can multi-agent setups, but we're still not guaranteed to find every solution, nor solutions that appear obvious to us. RL just tries things (often randomly) until they start working (then it biases towards what worked).
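The "try things at random, then bias towards what worked" loop can be sketched with an epsilon-greedy multi-armed bandit. This is a deliberately simplified stand-in for full RL; the arm means, Gaussian noise and epsilon are arbitrary assumptions:

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy action selection with incremental-mean value estimates."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n
    estimates = [0.0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n)  # explore: try something at random
        else:
            arm = max(range(n), key=lambda a: estimates[a])  # exploit what has worked so far
        r = rng.gauss(true_means[arm], 1.0)  # noisy reward (assumed Gaussian, sd 1)
        counts[arm] += 1
        estimates[arm] += (r - estimates[arm]) / counts[arm]  # running average of rewards
    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.1, 0.5, 0.9])
print([round(e, 2) for e in estimates], counts)
```

Most pulls end up on whichever arm looked best, but nothing guarantees that arm is actually the best one if exploration is too stingy.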
You might want to make it clearer that the agents don't actually receive any visual observations, but rather receive the x-y positions of all other agents and objects directly.
This also seems very similar to "Capture the Flag: the emergence of complex cooperative agents" (https://deepmind.com/blog/article/capture-the-flag-science)?
Regarding the conclusion:
> We’ve provided evidence that human-relevant strategies and skills, far more complex than the seed game dynamics and environment, can emerge from multi-agent competition and standard reinforcement learning algorithms at scale. These results inspire confidence that in a more open-ended and diverse environment, multi-agent dynamics could lead to extremely complex and human-relevant behavior.
This has been well established for a while already, e.g. the DeepMind Capture the Flag paper above, AlphaGo discovering the history of Go openings and techniques as it learns from playing itself, AlphaZero doing the same for chess, etc.
Instead of teaching the "AI" intelligent rules, or rules for creating rules, for maximising their goals, they teach it nothing, which means it has zero usable high-level knowledge. The "AI" then relies on pure brute force to find empirically best solutions for this ridiculously simple universe.
How is that advancing research? This is just a showcase of what modern hardware can do, and also a showcase of how far we are from teaching intelligence. My brain understands the semantics of this universe and would have been able to find most strategies without simulating the game more than once in my head. So this is definitely a showcase of how far we (or at least OpenAI) are from making AGI (brute force is like step 0).
- Bitter Lesson essay: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
- A lecture of his on temporal difference learning, which is a "model-free" method of reinforcement learning: https://www.youtube.com/watch?v=LyCpuLikLyQ
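For anyone unfamiliar with TD learning, here is a minimal tabular TD(0) sketch on the classic 5-state random walk from Sutton and Barto's textbook. The step size and episode count are arbitrary choices. Note that the update only ever uses sampled transitions, never a model of the environment, which is what "model-free" means here:

```python
import random


def td0_random_walk(episodes=10_000, alpha=0.01, gamma=1.0, seed=0):
    """Tabular TD(0) prediction on a 5-state random walk.

    States 1..5 are non-terminal, 0 and 6 are terminal. Every episode starts in
    the middle (state 3), steps left or right with equal probability, and gets
    reward 1 only when it exits on the right.
    """
    rng = random.Random(seed)
    V = [0.0] + [0.5] * 5 + [0.0]  # terminal states keep value 0
    for _ in range(episodes):
        s = 3
        while s not in (0, 6):
            s_next = s + rng.choice((-1, 1))  # sampled transition; no model is used
            r = 1.0 if s_next == 6 else 0.0
            # TD(0): nudge V(s) toward the bootstrapped target r + gamma * V(s').
            V[s] += alpha * (r + gamma * V[s_next] - V[s])
            s = s_next
    return V[1:6]


print([round(v, 2) for v in td0_random_walk()])  # true values are 1/6, 2/6, 3/6, 4/6, 5/6
```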
I personally don't agree with his emphasis on model-free learning, but it's not the case that people are building model-free RL agents because they don't understand the trade-off they're making.
This is the type of stuff that amazes me - I really wish I had more of an opportunity to play with AI/ML in my day to day work.
Anyone who feels this way — we're hiring :)! https://openai.com/jobs/.
(Also if I can answer any questions about OpenAI, feel free to ping me at gdb@openai.com.)
After the idea is polished, it looks clever, but it may have been invented through a series of mostly random steps.
I also very much enjoyed this section:
"We propose using a suite of domain-specific intelligence tests that target capabilities we believe agents may eventually acquire. Transfer performance in these settings can act as a quantitative measure of representation quality or skill, and we compare against pretraining with count-based exploration as well as a trained from scratch baseline."
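Count-based exploration usually means adding an intrinsic bonus that shrinks as a state is revisited. Here is a minimal sketch of one common form, beta / sqrt(N(s)) over a discretized (x, y) grid. The grid cell size and beta are assumptions, and this is not necessarily the exact variant used in the paper:

```python
import math
from collections import Counter


class CountBonus:
    """Intrinsic reward beta / sqrt(N(s)), with states bucketed into grid cells."""

    def __init__(self, beta=0.1, cell_size=1.0):
        self.beta = beta
        self.cell_size = cell_size
        self.counts = Counter()

    def key(self, position):
        # Discretize a continuous (x, y) position so visits can be counted.
        return tuple(math.floor(c / self.cell_size) for c in position)

    def __call__(self, position):
        k = self.key(position)
        self.counts[k] += 1
        return self.beta / math.sqrt(self.counts[k])


bonus = CountBonus(beta=1.0)
print(bonus((0.2, 0.3)), bonus((0.7, 0.9)), bonus((3.5, 0.1)))
```

The second call lands in the same cell as the first, so its bonus drops to 1/sqrt(2); the third is a new cell and gets the full bonus again. The agent would be trained on its extrinsic reward plus this bonus.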
Along with the videos, I can't help but get a very 'Portal' vibe from it all. "Thank you for helping us help you help us all." - GLaDOS
https://openai.com/blog/emergent-tool-use/#surprisingbehavio...
https://medium.com/@dshields/working-with-emotional-models-i...
https://austingwalters.com/modeling-and-building-robotic-sea...
I think sometimes we see what we want to see. Not saying it's not interesting work, just that it's less ground-breaking than you may think.
In any case, this is a simulation, so it's basically impossible to take the learned model and use it immediately in a real-world environment with true physics and arbitrary elements, let alone with unrestricted dimensions (the agents in the article are for the most part restricted to a limited play area). So if I understand this correctly, the trained model is only good for the specific simulated environment and would not work as well under even slightly different conditions.
"We’ve shown that agents can learn sophisticated tool use in a high fidelity physics simulator"
I've always suspected that to evolve intelligence, you need an environment rich in complexity. The intelligence we're familiar with (e.g. humans) evolved in a primordial soup packed with possibilities and building blocks (e.g. elaborate rules of physics, amino acids, etc.). It's great to see this concept being explored.
It reminds me of Adrian Thompson's experiments in the '90s running generational genetic algorithms on a real FPGA instead of mere simulations [1].
After 5000 generations he coaxed out a perfect tone recognizer. He was able to prune 70% of the circuit (lingering remnants of earlier mutations?) to find it still worked with only 32 gates - an unimaginable feat! Engineers were baffled when they reverse-engineered what remained: if I recall correctly, transistors were run outside of saturation mode, and EM effects were being exploited between adjacent components. In short, the system took a bunch of components designed for digital logic but optimized them using the full range of analog quirks they exhibited.
More recent attempts to recreate his work have reportedly been hampered by modern FPGAs, which make it harder to exploit those effects because they don't allow reconfiguration at the raw wiring level [2].
In Thompson's own words:
"Evolution has been free to explore the full repertoire of behaviours available from the silicon resources provided, even being able to exploit the subtle interactions between adjacent components that are not directly connected.... A 'primordial soup' of reconfigurable electronic components has been manipulated according to the overall behavior it exhibits"
---
[1] Paper: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=669...
Less technical article: https://www.damninteresting.com/on-the-origin-of-circuits/
[2] https://www.reddit.com/r/MachineLearning/comments/2t5ozk/wha...
Probably belongs in our definition of intelligence.
The "genes" consisted of programs in a simplified assembly language (run on a VM) that described how the tank would control itself: it could sense other tanks within line of sight to a certain degree, sense walls, fire its cannon, move in a particular direction, sense when another tank had a bearing on it (cannon pointed at it), etc.
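A toy version of such a genome interpreter, just to make the idea concrete. The opcode names, sensor names and per-tick step budget are all hypothetical; the original instruction set isn't known:

```python
def run_tick(program, sensors, max_steps=32):
    """Run a tank's genome for one simulation tick and return the actions it emits.

    `program` is a list of (opcode, arg) pairs; `sensors` maps sensor names to
    booleans. Jump targets are assumed to be integers.
    """
    actions = []
    flag = False  # result of the most recent SENSE
    pc = 0
    for _ in range(max_steps):  # step budget so a looping genome can't hang the simulation
        if not program:
            break
        op, arg = program[pc % len(program)]
        pc += 1
        if op == "SENSE":  # arg names a sensor, e.g. "enemy_in_sight"
            flag = bool(sensors.get(arg, False))
        elif op == "JMP_IF":  # conditional jump on the last sensed value
            if flag:
                pc = arg
        elif op == "JMP":
            pc = arg
        elif op == "FIRE":
            actions.append(("fire", None))
        elif op == "MOVE":  # arg is a direction
            actions.append(("move", arg))
        elif op == "END":  # stop executing for this tick
            break
        # Unknown opcodes are no-ops, so random mutations never crash the VM.
    return actions


prog = [
    ("SENSE", "enemy_in_sight"),
    ("JMP_IF", 4),
    ("MOVE", "north"),
    ("END", None),
    ("FIRE", None),
    ("END", None),
]
print(run_tick(prog, {"enemy_in_sight": True}), run_tick(prog, {"enemy_in_sight": False}))
```

Treating unknown opcodes as no-ops and capping the steps per tick are what let random mutation operate freely on the program without breaking the simulation.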
He set up 100 random tanks (with random "genes"/programs) and let the simulation run. Top scorers (who had the most kills) would be used to seed the next "generation", using a form of sexual "mating" and (pseudo-) random mutation. Then that generation would run.
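The generation loop described above (keep the top scorers, mate them, mutate the children) is a standard genetic algorithm. Here is a minimal sketch, assuming bit-string genomes and using a toy fitness function as a stand-in for "number of kills":

```python
import random


def evolve(fitness, genome_len=20, pop_size=100, n_parents=20, generations=50,
           mutation_rate=0.02, seed=0):
    """Truncation selection + one-point crossover + point mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by score and keep the top scorers to seed the next generation.
        parents = sorted(pop, key=fitness, reverse=True)[:n_parents]
        children = []
        while len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)  # one-point crossover ("mating")
            child = mom[:cut] + dad[cut:]
            # Point mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


best = evolve(sum)  # toy fitness: number of 1 bits
print(best, sum(best))
```

In the tank setting, the genome would be the VM program and `fitness` would run a battle and count kills; everything else in the loop stays the same.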
He said he ran the simulation for days at a time. One day he noticed something odd. He started to notice that certain tanks had "evolved" the means to "teleport" from location to location on the map. He didn't design this possibility in - what had happened was (he later determined) that a bug he had left in the VM was being exploited to allow the tanks to instantaneously move within their environment. He thought it was interesting, so he left it as-is and let the simulation continue.
After a long period of running, my friend then noticed something very odd. Some tanks were "wiggling" their turrets - other tanks would "wiggle" in a similar fashion. After a while all he could deduce was that in some manner, they were communicating with each other, similar to "bee dancing", and starting to form factions against each other...
...it was at that point he decided things were getting much too strange, and he stopped the experiment.
Sadly, he no longer has a copy of this software, but I believe his story, because I have seen enough of his other code, and have worked closely enough with him on various projects since (as an adult), to know that such a system was well within his capability to create.
At the time, he was probably only 16 or 17 years old, the computer was a 386, and this was sometime in the early 1990s. I believe the software was likely a combination of QuickBasic 4.5 and 8086 assembler running under DOS, as that was his preferred environment at the time.
I've often considered recreating the experiment, using today's technology, just to see what would happen (at the time he related this to me, as an adult, he asked me how difficult it would be to make a more physical version of this "game"; I'm still not sure if he meant scale model tanks, or full-sized - knowing him, though, he would have loved to play with the latter).
Did you end up using this as a way to estimate how "healthy" the agents are, or was this explored after the system was already working well?