I’m curious to explore such fun programming code. But I’m also curious to explore what knowledgeable humans consider to be both “ground truth” and “absolutely ridiculous” to create within the usual time constraints.
Stockfish is a superhuman chess program. It's routinely used in chess analysis as "ground truth": if Stockfish says you've made a mistake, it's almost certain you did in fact make a mistake[0]. Also, because it's incomparably stronger than even the very best humans, sometimes the moves it suggests are extremely counterintuitive and it would be unrealistic to expect a human to find them in tournament conditions.
Obviously software development in general is way more open-ended, but if we restrict ourselves to puzzles and competitions, which are closed game-like environments, it seems plausible to me that a similar skill level could be achieved with an agent system that's RL'd to death on that task. If you have base models that can get there, even inconsistently so, and an environment where making a lot of attempts is cheap, that's the kind of setup that RL can optimize to the moon and beyond.
I don't predict the future, and I'm very skeptical of anybody who claims to do so; correctly predicting the present is already hard enough. I'm just saying that, given the progress we've already made, I would find it plausible that a system like that could be made in a few years. The details of what it would look like are beyond my pay grade.
---
[0] With caveats in endgames, closed positions and whatnot, I'm using it as an example.
Do you have any links? I haven't seen any such (forget GM, not even Magnus), barring the opponent making mistakes.
That said, chess is such a great human invention. (Go is up there too. And Texas no-limit hold'em poker. Those are my top 3 votes for "best human tabletop games ever invented". They're also, perhaps not coincidentally, the hardest for computers to be good at. Or, were.)
If you look on YouTube, there are many channels where strong players analyze these games. As Demis Hassabis once put it, it's like chess from another dimension.
If you want to see this against someone like Magnus, it is rare, as super GMs do not spend a lot of time playing engines publicly.
But if you want to see it against a normal chess master, somewhere between master and international master, it is everywhere. For example, this guy analyzes every one of his matches afterwards, and you frequently hear "oh I would never see that line":
https://www.youtube.com/playlist?list=PLp7SLTJhX1u6zKT5IfRVm...
(start watching around 1000+ to see those moments frequently)
> it suggests are extremely counterintuitive and it would be unrealistic to expect a human to find them...
> ... in tournament conditions.
I'm suggesting that I'd like to see the ones that humans have found - outside of tournament conditions. Perhaps the gulf between us arises from an unspoken reference to solutions "unrealistic to expect a human to find" without the window-of-time qualifier?