Longer answer: the concept of self-play isn't new in any sense. All chess players use this technique to some degree; none use only this technique. The advantage of self-play is that there's no risk of accidentally picking up someone else's incorrect assumptions, since you're deriving everything from scratch. Some people take this to extremes: there's a math professor who doesn't read any math papers, so that he's deriving everything from first principles and not "contaminating his mind". It works quite well for him, but unfortunately I'm blanking on his name. However, committing to this technique removes one of the major advantages humans have, which is their ability to communicate knowledge amongst themselves in a compact, abstract way with language. Humans also have a pretty good way to mitigate the faulty-assumption risk: skepticism. We can reevaluate our assumptions and, if we deem it necessary, excise them from our mental model. AlphaZero could in theory do the same thing, but in practice there's not much point: it has no use for the sum total of human knowledge on chess, since it's capable of recreating that, and much more, in a few hours.
If there is something to be learned from AlphaZero's training, it's that you should always be skeptical of your assumptions. That's not anything new, but it's always worth reiterating. It's pretty obviously not feasible to take this to AlphaZero's extremes, though; humans need other humans to learn. Even the math professor who doesn't read papers needed a lot of interfacing with other humans before he reached the point where he could derive things from first principles.
John Nash (supposedly) had this mindset? Is that who you're thinking about?
I think you have two separate points, one with which I agree and one with which I disagree.
First, I agree (and other commenters on AlphaZero seem to as well) that human learning "algorithms" still beat AlphaZero's on per-game ROI.
On the other hand, I disagree that AlphaZero's self-play is no more interesting than a human playing someone better and learning from them. AlphaGo, AlphaZero's predecessor, followed a strategy more like what you described, learning from a large corpus of existing expert games (human Go matches, in its case). AlphaZero, on the other hand, requires no training data beyond an encoding of the basic rules of the game. From there, it bootstraps its understanding of chess without input from experts.
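To make "bootstrapping from the rules alone" concrete, here's a toy sketch of the idea: a tabular agent that learns the solved game of Nim purely by playing against itself, starting from nothing but the legal moves. All names here are mine, and this is a deliberate simplification; AlphaZero's real setup uses a deep network plus Monte Carlo tree search, not a lookup table.

```python
import random

random.seed(0)

ACTIONS = (1, 2, 3)   # legal moves: take 1-3 sticks from the pile
START = 21            # maximum starting pile size
Q = {}                # Q[(sticks, action)] -> value for the player to move

def legal(n):
    return [a for a in ACTIONS if a <= n]

def best(n):
    """Greedy (value, action) pair for the player facing n sticks."""
    return max((Q.get((n, a), 0.0), a) for a in legal(n))

def episode(eps=0.3, alpha=0.5):
    """One game of the agent against itself; both sides share Q."""
    n = random.randint(1, START)
    while n > 0:
        # Epsilon-greedy: mostly exploit, sometimes explore.
        if random.random() < eps:
            a = random.choice(legal(n))
        else:
            a = best(n)[1]
        # Negamax one-step target: taking the last stick wins (+1);
        # otherwise this position is worth minus the opponent's best value.
        target = 1.0 if n - a == 0 else -best(n - a)[0]
        q = Q.get((n, a), 0.0)
        Q[(n, a)] = q + alpha * (target - q)
        n -= a

for _ in range(20000):
    episode()

# The known optimal policy in this Nim variant is "take n % 4 sticks",
# so the greedy moves from 5, 6, and 7 sticks should be 1, 2, and 3.
print([best(n)[1] for n in (5, 6, 7)])
```

Nobody showed this agent an expert game; it derives the winning policy from self-play alone, which is the piece I find interesting about AlphaZero.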
This is the piece I find most interesting, see as potentially useful for the future of human learning, and believe differs from practice with an expert teacher. And so I wonder, can we design learning environments where the learner bootstraps their own understanding from a limited input without continuous feedback from an expert or teacher?
You can't put information into people. You can present it to them, but they need to teach themselves, so to speak.
If you have what I think is a good schooling system, it will recognize and emphasize the self-teaching aspect - students are encouraged to figure things out on their own.
For instance, where I studied CS, most of the time was allocated to semester projects where we'd be a small, self-organized team of 3-7 students working on something with very little external input.
You can find similar ideas for schools, e.g. Sudbury schools. I think the Waldorf school has some aspects of it too. I'm sending my children to such a school.
Why would you remove continuous feedback from an expert or teacher? Would that make human learning "faster" and more "efficient"? That approach works for AI because, unlike humans, an AI remembers every single data point with 100% accuracy and can iterate repeatedly without fatigue. It also doesn't suffer from boredom, and it doesn't require motivation either.
By the way, humans already learn from experience by bootstrapping their own understanding; teachers and experts exist to fast-track the beginning phase so a kid doesn't have to play ten thousand games just to reach beginner skill level.
Yes, and we do it all the time.
We can just do a lot better with continuous feedback. (And AI probably could too, if experts were available who could communicate fast enough not to be a huge drag on the AI's training cycles. But since, with current technology, once we've trained an AI of the type we can make today we can replicate it, that's not really important; if we ever develop AIs that depend on reconfigurable hardware without trivially extractable state, that may change.)
AlphaZero plays millions of games against itself with a low per-game ROI, in much less time than it takes a human playing against an expert with a high ROI. In this way AlphaZero does more work than the human to reach a given skill level; after that it is probably doing a similar amount of work per game to keep improving, but it can play in much larger numbers.
On the other hand, I think I've heard of experts at chess playing games against themselves but I can't seem to find a reference at the moment.
AlphaZero can play millions of games against itself. I can't.
I can imagine that a healthy dose of probability theory (and probably more advanced stuff I don't know about[1]) might improve (1), but (2) is going to keep computer scientists and philosophers and ethicists arguing for quite a long time. :)
[1] get the joke, eh? eh? eh?
I'm not sure why this matters? Everyone plays chess with perfect information. Both players see the entire board and all possibilities unlike, say, Scrabble or poker.
This is why AlphaGo leveled up into AlphaZero playing chess, and didn't learn to play StarCraft (yet).
Sure, it reached peak skill after 4 hours of learning, but how many games did it play during those 4 hours? How many moves did it perfectly memorize and analyze? Are those numbers even achievable by a human in a lifetime?
Even with AlphaZero's efficiency, it still evaluates 80,000 positions per second, which is far more than a human grandmaster evaluates in an entire game. If we cut AlphaZero's "processing power" to that of a human, could it still beat a top-level human player, let alone other AIs?
To me it seems like there is still a long way to go to improve in this space.
Now that I think about it though, one might argue that human learning in a given discipline starts as isolated with feedback only coming from the outside world. This is what we typically call research. But the magic of our education system, when it works, is that we compress the output of this slow process into a faster one and feed it to learners, allowing them to build understanding of knowledge which originally took generations to discover. Riffing off Matt Might's illustrated depiction of a PhD (http://matt.might.net/articles/phd-school-in-pictures/), expanding the circle of knowledge is exponentially slower than getting close to the edge.
For example, there's this interesting discussion: https://www.reddit.com/r/chess/comments/7ibzq4/stockfish_vs_...
Because AlphaZero did not learn from human games, it looks at the pieces without attaching fixed values the way we do. It has no problem sacrificing a higher-"valued" piece for the sake of its strategy.
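For contrast, the hand-crafted convention AlphaZero skips looks something like this: humans and classical engines assign fixed material values to the pieces (the standard textbook numbers below; the function and names are just my illustrative sketch, not any engine's actual evaluation):

```python
# Conventional human-derived piece values, in units of a pawn.
# AlphaZero never sees numbers like these; any notion of a piece's
# worth is implicit in its learned evaluation.
PIECE_VALUE = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material(pieces):
    """Naive material balance: uppercase = white, lowercase = black."""
    score = 0
    for p in pieces:
        value = PIECE_VALUE.get(p.upper(), 0)  # kings count as 0 here
        score += value if p.isupper() else -value
    return score

# A queen against a pawn and a knight: 9 - (1 + 3) = +5 for white.
print(material(["Q", "p", "n"]))  # prints 5
```

A purely material evaluation like this would never trade a rook for a knight; AlphaZero, having no such table, will happily make that trade when its learned evaluation says the resulting position is better.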
Something like "Here's a board position. It looks utterly hopeless, but the problem says 'Black to mate in 7 moves'. How can you get there from here without relying on White making any beginner's mistakes?" is pretty much self-play.