Emergent Tool Use from Multi-Agent Interaction

(openai.com)

333 points · gdb · 6y ago · 61 comments

50 comments · 24 top-level

brianpgordon · 6y ago · 5 in thread

This is incredible. The various emergent behaviors are fascinating. I remember being amazed a decade ago by the primitive graphics in artificial life simulators like Polyworld:

https://en.wikipedia.org/wiki/Polyworld

https://www.youtube.com/watch?v=_m97_kL4ox0&t=9m43s

It seems that OpenAI has a great little game simulated for their agents to play in. The next step to make this even cooler would be to use physical, robotic agents learning to overcome challenges in real meatspace!

visarga · 6y ago

> The next step to make this even cooler would be to use physical, robotic agents learning to overcome challenges in real meatspace!

That's one of the main challenges - how to learn safely and with fewer than millions of trials, so it can be feasible to do in the real world.

lopmotr · 6y ago

> real meatspace!

I'm doing something like this as a hobby but only single agent. The input is camera images and reward is based on a stopped/moving flag determined by changes between successive images as well as favoring going forward over turning. So far, it can learn to avoid crashing into walls, which is about all I'd expect. Trying to find good automated rewards without building too much special hardware is difficult. It's a vanilla DQN.
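
A reward of the kind described above could be sketched roughly as follows. This is only an illustration, not lopmotr's actual code: the function names, the flat grayscale frame representation, the difference threshold, and the reward values are all assumptions made for the example.

```python
def mean_abs_diff(prev_frame, frame):
    """Average absolute per-pixel change between two equally sized frames."""
    return sum(abs(a - b) for a, b in zip(prev_frame, frame)) / len(frame)


def motion_reward(prev_frame, frame, action,
                  diff_threshold=4.0,    # hypothetical: below this, treat as stopped
                  forward_reward=1.0,    # hypothetical: moving forward is preferred
                  turn_reward=0.2,       # hypothetical: turning still beats being stuck
                  stopped_penalty=-1.0): # hypothetical: stuck, e.g. against a wall
    """Reward from a stopped/moving flag, favoring forward motion over turning."""
    moving = mean_abs_diff(prev_frame, frame) > diff_threshold
    if not moving:
        return stopped_penalty
    return forward_reward if action == "forward" else turn_reward


# Frames are flat sequences of grayscale pixel values in this sketch.
still = [0] * 64
changed = [50] * 64
print(motion_reward(still, still, "forward"))    # image did not change: penalty
print(motion_reward(still, changed, "forward"))  # image changed while going forward
```

In practice the threshold would probably need tuning against camera noise, since sensor noise alone can make two frames of a stationary scene differ.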

bryanrasmussen · 6y ago

hmm, yes in the story I'm envisioning the AIs don't wipe out humanity because they have achieved sentience, but just because it turns out killing all humans is an optimizing component of solving some other problem.

CodeGlitch · 6y ago

I think we humans have already solved this problem you describe... we call them laws. We use these laws to prevent people doing bad things, and I see no reason why they can't be described to an AI to drive its behavior to one that isn't going to end humanity. For the most part. Fingers crossed.

ismail · 6y ago

Asimov's three laws as the final policy when making decisions should sort this problem out. This assumes that the rules cannot be changed by the AI.

breck · 6y ago · 4 in thread

What is the size of these “strategies”, measured in weights, bytes, or whatever measurement you look at?

tlb · 6y ago

1.6 million parameters. There are some details in section 5 and appendix B.7 of https://d4mucfpksywv.cloudfront.net/emergent-tool-use/paper/...

breck · 6y ago

Thanks! I like that section on how batch size affects convergence. I wonder how limits on parameter count would similarly affect which stages could be reached. I would not be surprised if you could hit those stages with 1/100th of the params or fewer.

XCSme · 6y ago

You mean what is actually stored in "memory" to play those actions? It's usually the trained model, which is a graph with many layers, each containing many nodes and connected to other layers. Depending on the model size, it can take anywhere from a few MB to tens or hundreds of GBs, but usually the smaller the better (a model that is too large tends to over-fit, meaning it has enough capacity to memorize the strategies needed for the current problem rather than generalize to solving such problems).
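
To put the 1.6 million parameters mentioned upthread into bytes, here is a back-of-the-envelope sketch. The 4 bytes per parameter is an assumption (32-bit floats); the thread does not say what precision the policy weights are stored in, and the function name is made up for the example.

```python
def model_size_bytes(num_params, bytes_per_param=4):
    """Storage needed for the weights alone, assuming a fixed width per parameter."""
    return num_params * bytes_per_param


params = 1_600_000  # figure quoted upthread by tlb
size_mb = model_size_bytes(params) / 1e6  # decimal megabytes
print(f"{params:,} parameters at 4 bytes each is about {size_mb:.1f} MB")
```

Under that assumption the weights come to roughly 6.4 MB, which sits at the "few MB" end of the range above.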

breck · 6y ago

Yup, this is what I’m talking about. I’ve been following Google's WANNs and other methods for reducing the size of these “strategies” drastically. In the WANN paper they have an example of going from about 2500 params to 40. I’ve got my own hunch as to where it is going, but I’m wondering if OpenAI is studying this at all.

Like there are simple primitives in civil engineering like the lever and pulley, I expect we'll find a zoo of tiny primitives which are the things that these agents are learning.

corey_moncure · 6y ago · 3 in thread

One plausible, perhaps optimal strategy in the second arena is for the hiders to build a shelter around the seekers and lock them in place, circumventing the whole cat and mouse over ramps and ramp surfing (which the seekers would never be able to access). I wonder why this strategy is not arrived at.

godelski · 6y ago

> I wonder why this strategy is not arrived at.

That's always a good question! One thing to remember is that in RL you are looking through large solution spaces. You probably aren't going to find a global optimum (if one even exists!). What will happen is that a local optimum is found, just one that _works_. This is why having a feature-rich space is important, because it helps you escape the locality, but also remember that we don't even know what the solution space looks like and what an optimal solution is.

It is also entirely possible that you retrain something from scratch and find a different local optimum. Self-play can help with this, as can multiple agents, but we're still not guaranteed to find every solution, nor solutions that appear obvious to us. RL just tries things (often randomly) till they start working (then they bias towards what worked).
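
The "try things randomly, then bias towards what worked" loop can be illustrated with a toy epsilon-greedy bandit. This is a deliberately tiny stand-in, not the algorithm used in the post: the arm means, step count, and epsilon are arbitrary values picked for the example.

```python
import random


def epsilon_greedy_bandit(true_means, steps=5000, epsilon=0.1, seed=0):
    """Pull arms at random with probability epsilon, else pull the best-looking arm."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how often each arm was tried
    estimates = [0.0] * n_arms   # running average reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: try something at random
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(true_means[arm], 1.0)  # noisy payoff
        counts[arm] += 1
        # Incremental mean: nudge the estimate towards the observed reward.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return counts, estimates


counts, estimates = epsilon_greedy_bandit([0.1, 0.5, 0.9])
print("pulls per arm:", counts)
print("estimated means:", [round(e, 2) for e in estimates])
```

Changing the seed changes which arm the agent locks onto early, which is a small-scale version of retraining from scratch and landing somewhere different.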

minimaxir · 6y ago

There are multiple seekers, and the seekers may not be placed close together.

PhasmaFelis · 6y ago

At one point in the video, it looked like a hider moving an object past a frozen seeker jostled the seeker with it. I wonder if it's possible to use the objects to push seekers together, then "jail" them.

Inufu · 6y ago · 2 in thread

Nice visualizations and explanation!

You might want to make it clearer that the agents don't actually receive any visual observations, but rather directly the xy positions of all other agents and objects.

This also seems very similar to "Capture the Flag: the emergence of complex cooperative agents" (https://deepmind.com/blog/article/capture-the-flag-science)?

Regarding the conclusion:

> We’ve provided evidence that human-relevant strategies and skills, far more complex than the seed game dynamics and environment, can emerge from multi-agent competition and standard reinforcement learning algorithms at scale. These results inspire confidence that in a more open-ended and diverse environment, multi-agent dynamics could lead to extremely complex and human-relevant behavior.

This has been well established for a while already, e.g. the DeepMind Capture the Flag paper above, AlphaGo discovering the history of Go openings and techniques as it learns from playing itself, AlphaZero doing the same for chess, etc.

gdb (OP) · 6y ago

Good catch! Will update the post to be explicit that there are many pre-existing awesome results in this vein.

cscurmudgeon · 6y ago

Any possibility of releasing the simulation environment? Looks quite cool!

tlb · 6y ago · 2 in thread

The animations are nice, compared to a default visualization with dots and lines moving around. Was this done just for the public release, or was it worth it to researchers to have an eye-pleasing visualization while doing the experiments?

visarga · 6y ago

The environment was actually an important part of the project. It does physics simulation. Having such a 'realistic' environment allowed the agents to discover all sorts of cheats (they appear at the end of the article).

boardwaalk · 6y ago

They're talking about the visualization, not the physics. The agents aren't getting visual input. That would make things much, much slower.

The_rationalist · 6y ago · 2 in thread

Am I misunderstanding something?

Instead of teaching the "AI" intelligent rules, or rules for creating rules for maximising their goals, they teach them nothing, which means they have 0 usable high-level knowledge. And the "AI" uses pure brute force to find the empirically best solutions for this ridiculously simple universe.

How is that advancing research? This is just a showcase of what modern hardware can do, and also a showcase of how far we are from teaching intelligence. My brain understands the semantics of this universe and would have been able to find most strategies without simulating the game more than once in my head. So definitely this is a showcase of how far (brute force is like step 0) we (or at least OpenAI) are from making AGI.

The_Amp_Walrus · 6y ago

Some AI researchers believe that using learning methods with no built-in prior knowledge and throwing a bunch of compute at them is the path to building effective AI. I'm thinking of Richard Sutton in particular:

- Bitter Lesson essay: http://www.incompleteideas.net/IncIdeas/BitterLesson.html

- A lecture of his on temporal difference learning, which is a "model-free" method of reinforcement learning: https://www.youtube.com/watch?v=LyCpuLikLyQ

I personally don't agree with his emphasis on model-free learning, but it's not the case that people are building model-free RL agents because they don't understand the trade off that they're making.
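
For readers unfamiliar with the term, this is roughly what a "model-free" temporal difference update looks like. It is a minimal TD(0) sketch on a five-state random walk, chosen only for illustration; the environment, step size, and episode count are assumptions and have nothing to do with the hide-and-seek setup.

```python
import random


def td0_random_walk(episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """Estimate state values from sampled transitions only (no transition model)."""
    rng = random.Random(seed)
    n_states = 5
    values = [0.5] * n_states  # initial guesses for each non-terminal state
    for _ in range(episodes):
        s = n_states // 2  # every episode starts in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))  # environment step, opaque to the learner
            if s_next < 0:                    # fell off the left end: reward 0
                reward, v_next, done = 0.0, 0.0, True
            elif s_next >= n_states:          # fell off the right end: reward 1
                reward, v_next, done = 1.0, 0.0, True
            else:
                reward, v_next, done = 0.0, values[s_next], False
            # TD(0): move V(s) towards the bootstrapped target r + gamma * V(s').
            values[s] += alpha * (reward + gamma * v_next - values[s])
            if done:
                break
            s = s_next
    return values


print([round(v, 2) for v in td0_random_walk()])
```

The learner never sees the 50/50 transition probabilities; it only sees sampled steps and rewards, which is the sense in which the method is model-free. For this walk the true values are 1/6, 2/6, ..., 5/6, and the estimates should hover near those.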

nitrogen · 6y ago

How do you know your own brain isn't running thousands of parallel simulations in your head, even though you perceive it only once? How did your brain learn to reason about physics in the first place if not by repeatedly finding objects in your environment and randomly manipulating them?

homieg33 · 6y ago · 2 in thread

I wonder if it’s possible to incorporate a monkey-see-monkey-do aspect to the learning algorithm that could observe humans playing the game and incorporate that information into its models?

visarga · 6y ago

Yes, it's called imitation learning and is a subfield of reinforcement learning. The problem is that even a small error could gradually accumulate and cause the sequence of actions to diverge. RL agents learn not just how to act in a given situation but also to evaluate possible actions, situations and even to model the environment. That way they can adapt dynamically instead of diverging from the optimal actions.

homieg33 · 6y ago

Interesting, ideally it uses the observed human behaviors to seed/inform its own attempts as a shortcut to advanced behavior without the many millions of generations needed.

jpetrucc · 6y ago · 2 in thread

As always, crazy interesting stuff coming out of OpenAI!

This is the type of stuff that amazes me - I really wish I had more of an opportunity to play with AI/ML in my day to day work.

gdb (OP) · 6y ago

> I really wish I had more of an opportunity to play with AI/ML in my day to day work.

Anyone who feels this way — we're hiring :)! https://openai.com/jobs/.

(Also if I can answer any questions about OpenAI, feel free to ping me at gdb@openai.com.)

shubidubi · 6y ago

Anything remote friendly? Unfortunately all jobs are in SF only.

dooglius · 6y ago · 1 in thread

The state space here looks pretty small, it seems to me that with so much training it's just a case of brute-force search. When I think of "tool use" in regards to the intelligence of early humans, I imagine something more like [0] where the state space is enormous and it takes a good deal of reasoning and planning to get to a desired result.

[0] https://www.youtube.com/watch?v=BN-34JfUrHY

nicklovescode · 6y ago

It's unclear to me that we navigated the state space so discretely. My guess would be that we used a combination of rock throwing + stick hitting before eventually deciding that combining the two might be fruitful.

After the idea is polished it looks clever, but it may have been invented through a series of mostly random steps.

SmooL · 6y ago · 1 in thread

Amazing. Very cool to see this sort of emergent behavior.

I also very much enjoyed this section:

"We propose using a suite of domain-specific intelligence tests that target capabilities we believe agents may eventually acquire. Transfer performance in these settings can act as a quantitative measure of representation quality or skill, and we compare against pretraining with count-based exploration as well as a trained from scratch baseline."

Along with the videos, I can't help but get a very 'Portal' vibe from it all. "Thank you for helping us help you help us all." - GLaDOS

jcims · 6y ago

Did you see the 'Surprising Behaviors' at the bottom? Pretty funny

https://openai.com/blog/emergent-tool-use/#surprisingbehavio...

haylel · 6y ago · 1 in thread

Looks awesome. I tried coding up a multi-agent system for my CS degree and it was incredibly complicated. I was trying to implement an algorithm I found to give each agent emotions of fear, anger, happiness and sadness in order to change their behaviours... it was way more difficult than I expected but you can read more about it here if you're also interested in this stuff. The 3D graphics in this example are way cooler than my 2D shapes.

https://medium.com/@dshields/working-with-emotional-models-i...

teabee89 · 6y ago

This is really interesting, thank you for sharing!

mooneater · 6y ago · 1 in thread

Finally Autocurricula gets some love! Discussed in some detail in https://www.talkrl.com/episodes/natasha-jaques

mooneater · 6y ago

Natasha Jaques explains the idea at about 39:50

lettergram · 6y ago

Even my work with basic circuits for sea slugs led to “cooperative” behavior:

https://austingwalters.com/modeling-and-building-robotic-sea...

I think sometimes we see what we want to see. Not saying it’s not interesting work, just that it’s less groundbreaking than you may think.

sebringj · 6y ago

I'm completely amazed by that. The hint of a simulated world seems so matrix-like as well, imagine some intelligent thing evolving out of that. Wow.

YeGoblynQueenne · 6y ago

This is visually very impressive, of course, but what is the significance of this work? I am not very familiar with intelligent agents research so I don't understand to what extent learning cooperative tool use in an adversarial environment (if I understand correctly what is shown) represents an important advancement of the state of the art in intelligent agents research, or not.

In any case this is a simulation- so it's basically impossible to take the learned model and use it immediately in a real-world environment with true physics and arbitrary elements, let alone with unrestricted dimensions (the agents in the article are for the most part restricted to a limited play area). So if I understand this correctly the trained model is only good for the specific simulated environment and would not work as well under even slightly different conditions.

rkagerer · 6y ago

I love how the 3D visualization and game selection make their research immediately relatable - right down to the cute little avatars!

"We’ve shown that agents can learn sophisticated tool use in a high fidelity physics simulator"

I always suspected to evolve intelligence you need an environment rich in complexity. Intelligence we're familiar with (e.g. humans) evolved in a primordial soup packed with possibilities and building blocks (e.g. elaborate rules of physics, amino acids, etc). It's great to see this concept being explored.

It reminds me of Adrian Thompson's experiments in the 90's running generational genetic algorithms on a real FPGA instead of mere simulations [1].

After 5000 generations he coaxed out a perfect tone recognizer. He was able to prune 70% of the circuit (lingering remnants of earlier mutations?) to find it still worked with only 32 gates - an unimaginable feat! Engineers were baffled when they reverse-engineered what remained: if I recall correctly, transistors were run outside of saturation mode, and EM effects were being exploited between adjacent components. In short, the system took a bunch of components designed for digital logic but optimized them using the full range of analog quirks they exhibited.

More recent attempts to recreate his work have reportedly been hampered by modern FPGAs, which make it harder to exploit those effects as they don't allow reconfiguration at the raw wiring level [2].

In Thompson's own words:

"Evolution has been free to explore the full repertoire of behaviours available from the silicon resources provided, even being able to exploit the subtle interactions between adjacent components that are not directly connected.... A 'primordial soup' of reconfigurable electronic components has been manipulated according to the overall behavior it exhibits"

---

[1] Paper: http://citeseerx.ist.psu.edu/viewdoc/download;jsessionid=669...

Less technical article: https://www.damninteresting.com/on-the-origin-of-circuits/

[2] https://www.reddit.com/r/MachineLearning/comments/2t5ozk/wha...

markkat · 6y ago

Has any intelligence arisen without multi-agent interaction?

Probably belongs in our definition of intelligence.

ReDeiPirati · 6y ago

Great viz, design & structure! But for the first time, I had the impression that you didn't report anything new or different. All the takeaways of this work were pretty obvious given the last couple of years' research. Am I missing anything?

cr0sh · 6y ago

I have a friend who observed similar emergent behavior in an a-life (gene-based from what I understand) simulation he created, in an environment of "tanks in a maze" (or something like that).

The "genes" consisted of a simplified assembler (run on a VM) that could describe a program the tank would use to control itself - it could sense other tanks within line-of-sight to a certain degree, it could sense walls, it could fire its cannon, move in a particular direction, sense when another tank had a bearing (cannon pointed) on itself, etc.

He set up 100 random tanks (with random "genes"/programs) and let the simulation run. Top scorers (who had the most kills) would be used to seed the next "generation", using a form of sexual "mating" and (pseudo-) random mutation. Then that generation would run.

He said he ran the simulation for days at a time. One day he noticed something odd. He started to notice that certain tanks had "evolved" the means to "teleport" from location to location on the map. He didn't design this possibility in - what had happened was (he later determined) that a bug he had left in the VM was being exploited to allow the tanks to instantaneously move within their environment. He thought it was interesting, so he left it as-is and let the simulation continue.

After a long period of running, my friend then noticed something very odd. Some tanks were "wiggling" their turrets - other tanks would "wiggle" in a similar fashion. After a while all he could deduce was that in some manner, they were communicating with each other, similar to "bee dancing", and starting to form factions against each other...

...it was at that point he decided things were getting much too strange, and he stopped the experiment.

Sadly, he no longer has a copy of this software, but I believe his story, simply because I have seen quite a bit of other code and have worked closely with him on various projects since (as an adult) to know that such a system was well within his capability of creating.

At the time, he was probably only 16 or 17 years old, the computer was a 386, and this was sometime in the early 1990s. I believe the software was likely a combination of QuickBasic 4.5 and 8086 assembler running under DOS, as that was his preferred environment at the time.

I've often considered recreating the experiment, using today's technology, just to see what would happen (at the time he related this to me, as an adult, he asked me how difficult it would be to make a more physical version of this "game"; I'm still not sure if he meant scale model tanks, or full-sized - knowing him, though, he would have loved to play with the latter).
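
For anyone tempted to recreate it, the generational loop described above (100 random individuals, top scorers seed the next generation through "mating" plus random mutation) might be sketched like this. It is only a toy under stated assumptions: genomes are bit lists rather than VM programs, and the fitness function (count of 1 bits) is a stand-in for "most kills"; all names and constants are hypothetical.

```python
import random


def evolve(fitness, genome_len=16, pop_size=100, elite=10,
           generations=50, mutation_rate=0.02, seed=0):
    """Truncation selection with one-point crossover and bit-flip mutation."""
    rng = random.Random(seed)
    # Start from random "genes", as in the story.
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]
    for _ in range(generations):
        # Top scorers seed the next generation.
        parents = sorted(pop, key=fitness, reverse=True)[:elite]
        children = []
        while len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)  # one-point crossover ("mating")
            child = mom[:cut] + dad[cut:]
            # Pseudo-random mutation: occasionally flip a bit.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)


best = evolve(fitness=sum)  # stand-in fitness: number of 1 bits
print(best, sum(best))
```

The interesting behavior in the story came from the genome being an executable program in a buggy VM, which this sketch does not attempt to model; swapping in a real simulation as the fitness function would be the bulk of the work.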

eiopa · 6y ago

I dig the fine-tuning tests!

Did you end up using this as a way to estimate how "healthy" the agents are, or was this explored after the system was already working well?

fedebehrens · 6y ago

Does anyone know if there are some accessible GitHub projects that can do something similar to this? Would like to set up a new project with my nephew :)

westurner · 6y ago

I, for one, really appreciate the raytracing in these visualizations. I wish for more box surfing examples.

Leary · 6y ago

Anyone think the hiders will learn to box the seekers in entirely before the rounds start?

adamnemecek · 6y ago

This is just adjoint functors. Pls work out automatic integration. Dual numbers is where the path starts.
