Simple Explanation of LLMs (opens in new tab)

(blog.oedemis.io)

94 pointsoedemis1y ago31 comments

31 comments

20 comments · 8 top-level

A_D_E_P_T1y ago· 5 in thread

It's all prediction. Wolfram has been saying this from the beginning, I think. It hasn't changed and it won't change.

But it could be argued that the human mind is fundamentally similar. That consciousness is the combination of a spatial-temporal sense with a future-oriented simulating function. Generally, instead of simulating words or tokens, the biological mind simulates physical concepts. (Needless to say, if you imagine and visualize a ball thrown through the air, you have simulated a physical and mathematical concept.) One's ability to internally form a representation of the world and one's place in it, coupled with a subjective and bounded idea of self in objective space and time, results in what is effectively a general predictive function which is capable of broad abstraction.

A large facet of what's called "intelligence" -- perhaps the largest facet -- is the strength and extensibility of the predictive function.

I really need to finish my book on this...

mdp20211y ago

With the critical difference that predicting facts and predicting verisimility are massively different operations.

A_D_E_P_T1y ago

I don't think that anybody predicts "facts" -- there are no oracles, and if you predict a physical concept, it's very easy to get things wrong. Outcomes are, in some cases, almost statistical.

(A physical concept could be something as simple as how to catch a frisbee, or, alternatively, imagine a cat trying to predict how best to swipe at a fleeing mouse. If the mouse zigs when it could have zagged, the cat, for all its well-honed instincts, may miss. It may have predicted wrongly.)

Predicting tokens is really quite similar. I really think that it's the same type of thing.

Getting facts right is a matter of error correction and knowledgebase utilization, which is why "reasoning models" with error correction layers and RAG are so good.

1 more reply

nurettin1y ago

> Wolfram has been saying this from the beginning, I think.

Wolfram has been distinguishing between probabilistic output and deterministic output from a neural network since the beginning? Trying to monopolize on such basic concepts doesn't make much sense. It's like saying he has been thinking of sporks since the beginning.

Delomomonl1y ago

Besides that I don't think that the prediction thing is a bad thing, there should be an argument that depending on the architecture there can be a self discovery of rules though compression.

The compression leads to rules which could feel like understanding.

People say 'ah it's just a parrot repeating statically most common words' like this alone makes it unimpressive, which it doesn't. Not when an LLM responds to you like it does

If that basic thing talks like a human, why would be a human be something different?

Intelligence isn't that also correlated with speed of connections? At least when you do an IQ test, speed is factored in.

mdp20211y ago

> If that basic thing talks like a human, why would be a human be something different?

Because properly intelligent humans actually think instead of being thinking simulators, as is apparent from the quality of the LLM outputs.

> parrot ... like this alone makes it unimpressive

"What could possibly go wrong".

1 more reply

hegx1y ago· 4 in thread

Warning: these "fundamentals" will become obsolete faster than you can wrap your head around them.

raincole1y ago

They really don't tho. Transformer was invented in 2017.

bicepjai1y ago

It’s been only 7 years :)

amelius1y ago

It would be nice if there was a place (e.g. github repo) that tracked the best resources for learning this stuff.

esafak1y ago

https://news.ycombinator.com/item?id=43241826

DebtDeflation1y ago· 2 in thread

Would love to see a similar explanation of how "reasoning" versions of LLMs are trained. I understand that OpenAI was mum about how they specifically trained o1/o3 and that people are having to reverse engineer from the DeepSeek paper which may or may not be a different approach, but would like to see a coherent explanation which is not just an regurgitation of Chain of Thought or handwavy "special reasoning tokens give the model more time to think".

Philpax1y ago

This may be useful: https://www.interconnects.ai/p/deepseek-r1-recipe-for-o1

but the tl;dr of the idea is that we can use reinforcement learning on a strong base model (i.e. one that hasn't been fine tuned) to elicit the generation of tokens that help the model reach a result that can be verified to be correct. That is, if we have a way of verifying that a specific output is correct, the model can be trained to consistently produce tokens that will lead to that result for a given input, and that this facility generalises the more problems you train it on.

There are some more nuances (the Interconnects article goes into that), but that's the fundamental idea of Reinforcement Learning from Verifiable Rewards.

UltraSane1y ago

This paper [1] even claims that "models primed with incorrect solutions containing proper reasoning patterns achieve comparable performance to those trained on correct solutions."

[1] https://arxiv.org/abs/2503.01307

oedemisOP1y ago· 1 in thread

Hello, tried to explain Large Language Models with some visualizations, especially the attention mechanism.

itronitron1y ago

You should probably mention that embeddings are just a renaming of text vectors, aka vector space model, which have probably been used since before neural networks.

antonkar1y ago

Here’s an interpretability idea you may find interesting:

Let's Turn AI Model Into a Place. The project to make AI interpretability research fun and widespread, by converting a multimodal language model into a place or a game like the Sims or GTA.

Imagine that you have a giant trash pile, how to make a language model out of it? First you remove duplicates of every item, you don't need a million banana peels, just one will suffice. Now you have a grid with each item of trash in each square, like a banana peel in one, a broken chair in another. Now you need to put related things close together and draw arrows between related items.

When a person "prompts" this place AI, the player themself runs from one item to another to compute the answer to the prompt.

For example, you stand near the monkey, it’s your short prompt, you see around you a lot of items and arrows towards those items, the closest item is chewing lips, so you step towards them, now your prompt is “monkey chews”, the next closest item is a banana, but there are a lot of other possibilities around, like an apple a bit farther away and an old tire far away on the horizon (monkeys rarely chew tires, so the tire is far away).

You are the time-like chooser and the language model is the space-like library, the game, the place. It’s static and safe, while you’re dynamic and dangerous.

rco87861y ago

I'm not sure if I would call this "simple" but I appreciated the walk through. I understood a lot of it at a high level before reading, and this helped solidify my understanding a bit more. Though it also serves to highlight just how complex LLMs actually are.

noodletheworld1y ago

While I appreciate the pictures, really at the end of the day all you have is a glossary and slightly more detailed arbitrary hand waving.

What specific architecture is used to build a basic model?

Why is that specific combination of basic building blocks used?

Why does it work when other similar ones don’t?

I generally approve of simplifications, but these LLM simplifications are too vague and broad to be useful or meaningful.

Here my challenge: take that article and write an LLM.

No?

How about an article on raytracing?

Anyone can do a raytracer in a weekend.

Why is building an LLM miles of explanation of concepts and nothing concrete you can actually build?

Where’s my “LLM in a weekend” that covers the theory and how to actually implement one?

The distinction between this and something like https://github.com/rasbt/LLMs-from-scratch is stark.

My hot take is, if you haven’t built one, you don’t actually understand how they work, you just have a kind of vague kind-of-heard of it understanding, which is not the same thing.

…maybe that’s harsh, and unfair. I’ll take it, maybe it is; but I’ve seen a lot of LLM explanations that conveniently stop before they get to the hard part of “and how do you actually do it?”, and another one? Eh.

betto1y ago

Why don't you come on my podcast to explain LLMs? I would love it.

https://www.youtube.com/@CouchX-SoftwareTechexplain-k9v

j / k navigate · click thread line to collapse

31 comments

20 comments · 8 top-level

A_D_E_P_T1y ago· 5 in thread

It's all prediction. Wolfram has been saying this from the beginning, I think. It hasn't changed and it won't change.

A large facet of what's called "intelligence" -- perhaps the largest facet -- is the strength and extensibility of the predictive function.

I really need to finish my book on this...

mdp20211y ago

With the critical difference that predicting facts and predicting verisimility are massively different operations.

A_D_E_P_T1y ago

I don't think that anybody predicts "facts" -- there are no oracles, and if you predict a physical concept, it's very easy to get things wrong. Outcomes are, in some cases, almost statistical.

Predicting tokens is really quite similar. I really think that it's the same type of thing.

Getting facts right is a matter of error correction and knowledgebase utilization, which is why "reasoning models" with error correction layers and RAG are so good.

1 more reply

nurettin1y ago

> Wolfram has been saying this from the beginning, I think.

Delomomonl1y ago

Besides that I don't think that the prediction thing is a bad thing, there should be an argument that depending on the architecture there can be a self discovery of rules though compression.

The compression leads to rules which could feel like understanding.

People say 'ah it's just a parrot repeating statically most common words' like this alone makes it unimpressive, which it doesn't. Not when an LLM responds to you like it does

If that basic thing talks like a human, why would be a human be something different?

Intelligence isn't that also correlated with speed of connections? At least when you do an IQ test, speed is factored in.

mdp20211y ago

> If that basic thing talks like a human, why would be a human be something different?

Because properly intelligent humans actually think instead of being thinking simulators, as is apparent from the quality of the LLM outputs.

> parrot ... like this alone makes it unimpressive

"What could possibly go wrong".

1 more reply

hegx1y ago· 4 in thread

Warning: these "fundamentals" will become obsolete faster than you can wrap your head around them.

raincole1y ago

They really don't tho. Transformer was invented in 2017.

bicepjai1y ago

It’s been only 7 years :)

amelius1y ago

It would be nice if there was a place (e.g. github repo) that tracked the best resources for learning this stuff.

esafak1y ago

https://news.ycombinator.com/item?id=43241826

DebtDeflation1y ago· 2 in thread

Philpax1y ago

This may be useful: https://www.interconnects.ai/p/deepseek-r1-recipe-for-o1

There are some more nuances (the Interconnects article goes into that), but that's the fundamental idea of Reinforcement Learning from Verifiable Rewards.

UltraSane1y ago

This paper [1] even claims that "models primed with incorrect solutions containing proper reasoning patterns achieve comparable performance to those trained on correct solutions."

[1] https://arxiv.org/abs/2503.01307

oedemisOP1y ago· 1 in thread

Hello, tried to explain Large Language Models with some visualizations, especially the attention mechanism.

itronitron1y ago

You should probably mention that embeddings are just a renaming of text vectors, aka vector space model, which have probably been used since before neural networks.

antonkar1y ago

Here’s an interpretability idea you may find interesting:

Let's Turn AI Model Into a Place. The project to make AI interpretability research fun and widespread, by converting a multimodal language model into a place or a game like the Sims or GTA.

When a person "prompts" this place AI, the player themself runs from one item to another to compute the answer to the prompt.

You are the time-like chooser and the language model is the space-like library, the game, the place. It’s static and safe, while you’re dynamic and dangerous.

rco87861y ago

noodletheworld1y ago

While I appreciate the pictures, really at the end of the day all you have is a glossary and slightly more detailed arbitrary hand waving.

What specific architecture is used to build a basic model?

Why is that specific combination of basic building blocks used?

Why does it work when other similar ones don’t?

I generally approve of simplifications, but these LLM simplifications are too vague and broad to be useful or meaningful.

Here my challenge: take that article and write an LLM.

No?

How about an article on raytracing?

Anyone can do a raytracer in a weekend.

Why is building an LLM miles of explanation of concepts and nothing concrete you can actually build?

Where’s my “LLM in a weekend” that covers the theory and how to actually implement one?

The distinction between this and something like https://github.com/rasbt/LLMs-from-scratch is stark.

My hot take is, if you haven’t built one, you don’t actually understand how they work, you just have a kind of vague kind-of-heard of it understanding, which is not the same thing.

betto1y ago

Why don't you come on my podcast to explain LLMs? I would love it.

https://www.youtube.com/@CouchX-SoftwareTechexplain-k9v

j / k navigate · click thread line to collapse