Now we can create new samples and evals for more complex tasks to train up the next gen, oriented more toward planning, decomposition, context, and agentic work.
OpenAI has largely fumbled their early lead; the exciting stuff is happening elsewhere.
From what I understand, nobody has done any real scaling since the GPT-4 era. GPT-4.5 was a bit larger than GPT-4, but nowhere near the orders-of-magnitude jump from GPT-3 to GPT-4, and GPT-5 is smaller than GPT-4.5. Google and Anthropic haven't gone substantially bigger than GPT-4 either. Improvements since GPT-4 are almost entirely from reasoning and RL. In 2026 or 2027, we should see a model that uses the current datacenter buildout and actually scales up.