I am so floored that at least half of this community, usually skeptical to a fault, evangelizes LLMs so ardently. Truly blows my mind.
I’m open to them becoming more than a statistical token predictor, and I think it would be really neat to see that happen.
They’re nowhere close to anything other than a next-token-predictor.
What exactly do you mean by that? I've seen this exact comment stated many times, but I always wonder:
What limitations of current AI chatbots do you see that are specifically due to their use of next token prediction?
It’s kind of like you’re saying “prove god doesn’t exist” when it’s supposed to be “prove god exists.”
If a problem isn’t documented, LLMs simply have nowhere to go. They can’t really handle the knowledge boundary [1] at all; with no reasoning ability, they just hallucinate or run in circles, trying the same nearest-known solution over and over.
It’s awesome that they frequently get things right and can work at a computer’s speed, but it’s very obvious that there really isn’t anything in there that we would call “reasoning.”
I don't want to directly address your claim about a lack of generalization, because there's a more basic issue with the GP's statement. Though I will say that today's models do seem to generalize quite a bit better than you make it sound.
But more importantly, neither you nor the GP offers any evidence that this is due specifically to using next token prediction as a mechanism.
Why would it not be possible for a highly generalizing model to use next token prediction for its output?
That inference doesn't follow for me at all, which is why the GP's statement reads so strangely.
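To make the distinction concrete, here's a minimal sketch of what "next token prediction as a mechanism" amounts to. The `toy_model` here is a hypothetical stand-in I'm inventing for illustration; a real LLM scores tokens with a neural network instead. The point is that the decoding loop only constrains the output interface, not whatever the model does internally:

```python
# Minimal sketch of autoregressive next-token prediction.
# "toy_model" is a hypothetical stand-in: any function mapping a
# token sequence to a score per vocabulary item. The loop below is
# the entire decoding mechanism being debated.

VOCAB = ["<eos>", "the", "cat", "sat"]

def toy_model(tokens: list[str]) -> list[float]:
    # Stand-in for a learned model: scores every vocabulary item.
    # A real LLM would compute these scores with a neural network.
    order = ["the", "cat", "sat", "<eos>"]
    nxt = order[min(len(tokens), len(order) - 1)]
    return [1.0 if tok == nxt else 0.0 for tok in VOCAB]

def generate(prompt: list[str], max_len: int = 10) -> list[str]:
    tokens = list(prompt)
    while len(tokens) < max_len:
        scores = toy_model(tokens)
        next_tok = VOCAB[scores.index(max(scores))]  # greedy pick
        if next_tok == "<eos>":
            break
        tokens.append(next_tok)
    return tokens

print(generate([]))  # ['the', 'cat', 'sat']
```

Swap `toy_model` for an actual transformer and the loop is unchanged, which is why "it's just predicting the next token" says nothing by itself about how much the model generalizes.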
If I have never eaten a hamburger but own a McDonald’s franchise, am I making an authentic American hamburger?
If I have never eaten fries before and I buy some frozen ones from Walmart, heat them up, and throw them in the trash, did I make authentic fries?
Obviously the answer is yes, and these questions are completely irrelevant to my sentience.
I'm more shocked that so many people seem unable to come to grips with the fact that something can be a next token predictor and also demonstrate intelligence. That's what blows my mind: people unable to see that something can be more than the sum of its parts. To them, if something is a token predictor, then clearly it can't be doing anything impressive, even while they watch it do impressive things.
Except LLMs have not shown much intelligence. Wisdom yes, intelligence no. LLMs are language models, not 'world' models. It's the difference between being wise and being smart. LLMs are very wise, in that they have effectively memorized the answer to every question humanity has written down. OTOH, they are pretty dumb: they don't "understand" the output they produce.
> To them, if something is a token predictor, then clearly it can't be doing anything impressive
That's shifting the goalposts. Nobody said that a next token predictor can't do impressive things, but there is a big gap between doing impressive things and claims like "replace every software developer in the world within the next 5 years."