'If you actually know what models are doing under the hood to produce output that...'
Anyone who tells you they know 'what models are doing under the hood' simply has no idea what they're talking about, and it's amazing how common this is.
None of that changes the fact that a model is fundamentally just very good at predicting what the next element in the stream should be, modulo injected randomness in the form of temperature. Why does that end up looking like intelligence? Well, because we see the model's ability to be plausibly correct over a wide range of topics, and we get excited.
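To make the "modulo injected randomness in the form of temperature" part concrete, here's a minimal sketch of temperature sampling. The logits are made up, not from any real model:

    import numpy as np

    def sample_next_token(logits, temperature=1.0, seed=None):
        """Sample a token id from raw model scores (logits).

        As temperature -> 0 this approaches greedy argmax; higher
        values flatten the distribution and inject more randomness.
        """
        rng = np.random.default_rng(seed)
        scaled = np.asarray(logits, dtype=np.float64) / max(temperature, 1e-8)
        scaled -= scaled.max()  # subtract max for numerical stability
        probs = np.exp(scaled) / np.exp(scaled).sum()
        return rng.choice(len(probs), p=probs)

    # Toy 4-token vocabulary; the scores are invented for illustration.
    print(sample_next_token([2.0, 1.0, 0.5, -1.0], temperature=0.7))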
Btw, don’t take this reductionist approach as being synonymous with thinking these models aren’t incredibly useful and transformative for multiple industries. They’re a very big deal. But OpenAI shouldn’t give up just because Opus 4.whatever is doing better on a bunch of benchmarks that are either saturated, present in the training data, or RLHF’d to hell and back. This is not AGI.
Why does predicting the next token mean that they aren't AGI? Please clarify the exact logical steps there, because I make a similar argument that human brains are merely electrical signals propagating, and not real intelligence, but I never really seem to convince people.
You can "predict next token" using a human, an LLM, or a Markov chain.
At the end of the day next token prediction is a sleight of hand. It produces amazingly powerful effects, I agree. You can turn this one magic trick into the illusion of reasoning, but what it's doing is more of a "one thing after another" style of story-telling that is fine for a lot of things, but doesn't get to the heart of what intelligence means. If you want to call them intelligent because they can do this stuff, fine, but it's an alien kind of intelligence that is incredibly limited. A dog or a cat actually demonstrates more ability to learn, to contextualize, and to make meaning.
Model training can be summed up as: 'This is what you have to do (objective); figure it out. Here's a little skeleton that might help you out (architecture).'
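In (hypothetical, heavily simplified) PyTorch terms: the 'objective' is one cross-entropy line, the 'skeleton' is whatever module you define, and everything in between is the model figuring it out:

    import torch
    import torch.nn as nn

    vocab_size, dim = 100, 32  # toy sizes, purely illustrative

    # The "skeleton" (architecture): an embedding plus one linear layer.
    model = nn.Sequential(nn.Embedding(vocab_size, dim),
                          nn.Linear(dim, vocab_size))

    # The "objective": predict the next token, scored by cross-entropy.
    loss_fn = nn.CrossEntropyLoss()
    opt = torch.optim.Adam(model.parameters())

    tokens = torch.randint(0, vocab_size, (1, 16))  # fake token stream
    inputs, targets = tokens[:, :-1], tokens[:, 1:]

    logits = model(inputs)  # shape (1, 15, vocab_size)
    loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
    loss.backward()  # the "figure it out" step
    opt.step()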
We spend millions of dollars and months training these frontier models precisely because the training process figures out numerous things we don't know or understand. Every day, large language models, in service of their reply, in service of 'predicting the next token', perform sophisticated internal procedures far more complex than anything any human has come up with or has knowledge of. So when someone says they 'know how the models work under the hood', well, it's all very silly.
It's sad that you have to add this postscript lest you be accused of being ignorant or anti-AI for acknowledging that LLMs are not AGI.
I would still argue that predicting the next token does not prevent something from being intelligent, which is why this argument is silly.