Are the models that exist today a "true Scotsman" for you?
Yes, but the claims are not. When the hypemen were shouting that GPT-3 was near-AGI, it still turned out to be absolute shit. When the hypemen were claiming that GPT-3.5 was thousands of times better than GPT-3 and beating all high school students, it turned out to be a massive exaggeration. When the hypemen claimed that GPT-4 was a groundbreaking innovation and going to replace every single programmer, it still wasn't any good.
Sure, AI is improving. Nobody is doubting that. But you can only claim to have a magical unicorn so many times before people stop believing that this time you might have something different than a horse with an ice cream cone glued to its head. I'm not going to waste a significant amount of my time evaluating Unicorn 5.0 when I already know I'll almost certainly end up disappointed.
Perhaps it'll be something impressive in a decade or two, but in the meantime the fact that Big Tech keeps trying to shove it down my throat even when it clearly isn't ready yet is a pretty good indicator to me that it is still primarily just a hype bubble.
I agree it will probably be something in a decade. Right now it has some interesting concepts, but I do notice over successive iterations of chat responses that it's got a ways to go.
It reminds me of Tesla owners buying into the self-driving terminology. Yes, the driver-assistance technology has improved quite a bit since cruise control, but it's a far cry from self-driving.
For example, I dismissed AI three years ago because it couldn’t do anything I needed it to. Today I use it for certain things and it’s not quite capable of other things. Tomorrow it might be capable of a lot more.
Yes, priors have to be updated when the ground truth changes, and the capabilities of AI change rapidly. Chess went the same way: engines on supercomputers became competitive in the 90s, then hybrid systems became the leading edge, and then machines took over for good and never looked back.
So I would say the overall service provided is better than it was, thanks to functions being built based on user queries, but the actual LLMs themselves are not.
It is also true that the tooling and context management have gotten more sophisticated (often using models, by the way). That doesn’t negate that the models themselves have gotten better at reliable tool calling, so that the LLM is driving more of the show rather than relying on purpose-built coordination around it, and that the codegen quality is higher than it used to be.