Why would reasoning have plateaued?
I think reasoning ability is not the largest bottleneck for improvement in usefulness right now. Cost is a bigger one IMO.
Running these models as agents is hella expensive, and agents, or agent-like recurrent reasoning (the kind humans do), are the key to improved performance; look at any type of human intelligence and you'll find it's iterative.
Single-shot performance only gets you so far.
For example: if a model can write code 90% of the way and then debug it in a loop, it will outperform any single-shot approach.
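To make the generate-and-debug loop concrete, here's a minimal sketch. The `model_attempt` function is a hypothetical stand-in for an LLM call (not a real API); it's hard-coded to produce a buggy first draft and a fixed version once it sees test feedback, which is exactly the dynamic the loop exploits.

```python
# Sketch of single-shot vs. debug-in-a-loop.
# `model_attempt` is a pretend LLM: its first draft has an off-by-one
# bug, and it only produces a fix after receiving test feedback.

def model_attempt(task, feedback=None):
    if feedback is None:
        return lambda n: sum(range(n))      # buggy draft: misses n itself
    return lambda n: sum(range(n + 1))      # corrected after feedback

def run_tests(fn):
    """Return an error message, or None if all tests pass."""
    for n, expected in [(3, 6), (5, 15)]:
        if fn(n) != expected:
            return f"fn({n}) returned {fn(n)}, expected {expected}"
    return None

def debug_loop(task, max_iters=3):
    """Regenerate until the tests pass (or we run out of iterations)."""
    feedback = None
    for _ in range(max_iters):
        fn = model_attempt(task, feedback)
        feedback = run_tests(fn)
        if feedback is None:
            return fn                       # tests pass: done
    return fn
```

Single-shot (`model_attempt(task)`) fails the tests here, while `debug_loop(task)` converges on a correct function in one extra round, but each round is another full model call, which is why cost, not raw reasoning, becomes the bottleneck.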
And OpenAI probably has huge models sitting in their basement. But they might not be much more useful than GPT-4 when run single-shot. I mean, what could such a model do that we can't already do with GPT-4?
It’s agents and recurrent reasoning we need for more usefulness.
At least, that's my humble opinion as an amateur neuroscientist who plays around with these models.