> which allow efficiently training an optimal player.
Training an optimal player is not possible in practice. The mathematics of optimal play has been known for decades, and because we know it, we can calculate how much memory such a solution would take up. This too has been studied. Here are Russell and Norvig in Artificial Intelligence: A Modern Approach (page 173) saying the same thing: "Because calculating optimal decisions in complex games is intractable, all algorithms must make some assumptions and approximations."
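A back-of-envelope sketch of the storage problem. The constants are rough order-of-magnitude estimates I'm using for illustration, not exact counts:

```python
# Rough estimate: why storing an optimal chess policy is intractable.
# All constants below are order-of-magnitude figures, not exact counts.
legal_positions = 10 ** 44        # rough published estimate of legal chess positions
bytes_per_entry = 1               # wildly optimistic: one byte per stored value
world_storage_bytes = 10 ** 21    # on the order of all storage ever manufactured

table_bytes = legal_positions * bytes_per_entry
shortfall = table_bytes // world_storage_bytes
print(f"Need ~{shortfall:.0e}x more storage than exists on Earth")
```

Even with absurdly generous assumptions, the lookup table is ~23 orders of magnitude larger than everything we've ever built.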
> Decision problems are a subset of learning problems.
This framing has some benefits - it makes generalization simpler. It has some downsides too - in complicated environments it will only approximate the solution, and because of that there will be times when it gets things wrong.
In theory you start with an intractable problem at training time. Then, once the game begins and play has progressed, you have a more tractable problem, because the information now available to you eliminates parts of the game tree from consideration. The result is that we actually have two learning problems, not one. One is computed prior to the game. The other is computed during the game.
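The pruning effect can be sketched with a toy tree (the nested-dict representation here is just for illustration):

```python
# Toy game tree as nested dicts: each key is a move, each value a subtree
# (an empty dict is a terminal position).
tree = {
    "a": {"c": {}, "d": {}},
    "b": {"e": {}, "f": {}},
}

def count_nodes(t):
    """Count positions in a subtree, including its root."""
    return 1 + sum(count_nodes(sub) for sub in t.values())

before = count_nodes(tree)   # 7 positions to reason about before play
tree = tree["a"]             # the opponent plays "a": the "b" branch is gone
after = count_nodes(tree)    # only 3 positions remain relevant
```

Every observed move discards whole sibling subtrees, which is why the in-game problem is so much smaller than the pre-game one.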
This theoretical issue has been studied by DeepMind and found to hold in practice. They tried training agents that didn't use tree search and relied only on the learned heuristic. Those agents lost to agents that also used tree search.
Here is a section from a talk by Noam Brown - he briefly covers your intuition and why it breaks down.
1. https://youtu.be/cn8Sld4xQjg?t=2241
Here is another talk in which he goes over the research results of DeepMind and shows the data which stands against your thesis:
2. https://youtu.be/cn8Sld4xQjg?t=1886
This is also something you can see without reference to theory, by looking at the physical progress on optimal solutions. Chess, for example, has exact solutions via the endgame tablebases, but only for the small, specific positions you reach near the end of the game tree. It is widely understood that we don't have enough memory to store the full solution to the game.
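You can see the growth rate in the published tablebase sizes themselves. The figures below are approximate Syzygy sizes from memory, so treat them as illustrative assumptions rather than exact numbers:

```python
# Approximate Syzygy endgame tablebase sizes in GB (illustrative, not exact):
sizes_gb = {5: 1, 6: 150, 7: 18_000}

# Each added piece multiplies storage by roughly two orders of magnitude.
growth = sizes_gb[7] / sizes_gb[6]             # ~120x per extra piece

# Naively extrapolating that rate out to the full 32-piece game:
full_solution_gb = sizes_gb[7] * growth ** (32 - 7)
print(f"~{full_solution_gb:.1e} GB")           # far beyond any physical hardware
```

The extrapolation is crude, but the conclusion is robust: each piece added to the tables costs orders of magnitude more storage, and we are 25 pieces short of a full solution.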
> As soon as someone can simulate your environment there is no negative consequence
This is a non-physical claim. There is obviously a cost to computation: it consumes both energy and time, and our best understanding is that we have a finite amount of each. Your theoretical approach isn't physically realizable.
> As soon as someone can simulate your environment...
It doesn't become easy at this point. It remains intractable.
A very simple example of why it doesn't get easy is the halting problem from computer science.
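For reference, the classic diagonalization argument can be sketched directly. This is the standard textbook construction, nothing specific to games:

```python
def halts(program, data):
    """Hypothetical perfect decider: returns True iff program(data) halts.
    The whole point of the construction is that no such total function
    can exist, so this placeholder just raises."""
    raise NotImplementedError("no total halting decider exists")

def paradox(program):
    # If the decider says we halt, loop forever; if it says we loop, halt.
    # Feeding paradox to itself contradicts either answer halts could give.
    if halts(program, program):
        while True:
            pass
    return "halted"
```

A perfect simulator of `paradox` would be a halting decider, and `paradox(paradox)` refutes any candidate decider. Simulation access does not make the problem easy; some questions stay hard even with the source code in hand.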
A more complicated example, one you will have to really think about in order to understand, is the nature of the equilibrium adversarial strategy. It is defined with respect to an oracle: something that can perfectly simulate its strategy. It is trying not to lose to that oracle; it already assumes you have a very good map.
You've got to remember - your simulation is your map - it isn't the territory. When you play, you aren't playing on your map. You are playing in the territory via your map. The equilibrium strategies already assume you have a map. So they aren't trying to make it easy for your map to give you the right answer. They are trying to make some places un-mappable.
Again - remember the real world. Do I know your password? Why not? And what is my password, if it is so easy to know it?