...It is in a format that resembles a published article because it is going to be a published article? The first page says: "This is a preprint of a chapter that will appear in the book Designing an Intelligence, published by MIT Press."
> As for the graph, it’s too generic
A history of RL from DQN to AlphaProof/LLM computer use in Gemini is not 'generic', and could not be.
> it doesn’t provide any real value
It provides value to people who were not around then and are not familiar with how RL attention rises and falls. A similar chart about TD-Gammon and Deep Blue, say, would likewise be useful for the many people who did not actually live through those eras, and would help contextualize material from back then. (I did, and maybe you did, and so it's not useful to us, but there exist other, younger people in the world, who are not us{{citation needed}}.) And the fact that these cycles exist is something worth reflecting on - Karpathy and others have reflected on how there were expectations of DRL leading to AGI in the 2015-2020 period, which wound up being swamped by self-supervised learning, with DRL relegated to a backwater (which contributed very directly to many major events, like how OA and DM became what they are now - and why Sutton is at Keen rather than at DM with Silver), but DRL is now suddenly becoming super-relevant again.