undefined | Better HN

0 pointsemp173445mo ago0 comments

Maybe LLMs always fail to generalize outside their data set, and it’s just less noticeable with written language.

0 comments

This is it. They’re language models which predict next tokens probabilistically and a sampler picks one according to the desired ”temperature”. Any generalization outside their data set is an artifact of random sampling: happenstance and circumstance, not genuine substance.

cluckindan5mo ago

However: do humans have that genuine substance? Is human invention and ingenuity more than trial and error, more than adaptation and application of existing knowledge? Can humans generalize outside their data set?

A yes-answer here implies belief in some sort of gnostic method of knowledge acquisition. Certainly that comes with a high burden of proof!

sophrosyne425mo ago

Yes. Humans can perform abduction, extrapolating given information to new information. LLMs cannot, they can only interpolate new data based on existing data.

dawidloubser5mo ago

Yes

1 more reply

phire5mo ago

Most image models are diffusion models, not LLMs, and have a bunch of other idiosyncrasies.

So I suspect it's more that lessons from diffusion image models don't carry over to text LLMs.

And the Image models which are based on multi-mode LLMs (like Nano Banana) seem to do a lot better at novel concepts.

Gormo5mo ago

But the clocks in this demo aren't images.

phire5mo ago

Yes, but they are reasoning within their dataset, which will contain multiple example of html+css clocks.

They are just struggling to produce good results because they are language models and don’t have great spatial reasoning skills, because they are language models.

Their output normally has all the elements, just not in the right place/shape/orientation.

IshKebab5mo ago

They definitely don't completely fail to generalise. You can easily prove that by asking them something completely novel.

Do you mean that LLMs might display a similar tendency to modify popular concepts? If so that definitely might be the case and would be fairly easy to test.

Something like "tell me the lord's prayer but it's our mother instead of our father", or maybe "write a haiku but with 5 syllables on every line"?

Let me try those ... nah ChatGPT nailed them both. Feels like it's particular to image generation.

immibis5mo ago

They used to do poorly with modified riddles, but I assume those have been added to their training data now (https://huggingface.co/datasets/marcodsn/altered-riddles ?)

Like, the response to "... The surgeon (who is male and is the boy's father) says: I can't operate on this boy! He's my son! How is this possible?" used to be "The surgeon is the boy's mother"

The response to "... At each door is a guard, each of which always lies. What question should I ask to decide which door to choose?" would be an explanation of how asking the guard what the other guard would say would tell you the opposite of which door you should go through.

j / k navigate · click thread line to collapse

0 comments

cluckindan5mo ago

A yes-answer here implies belief in some sort of gnostic method of knowledge acquisition. Certainly that comes with a high burden of proof!

sophrosyne425mo ago

Yes. Humans can perform abduction, extrapolating given information to new information. LLMs cannot, they can only interpolate new data based on existing data.

dawidloubser5mo ago

Yes

1 more reply

phire5mo ago

Most image models are diffusion models, not LLMs, and have a bunch of other idiosyncrasies.

So I suspect it's more that lessons from diffusion image models don't carry over to text LLMs.

And the Image models which are based on multi-mode LLMs (like Nano Banana) seem to do a lot better at novel concepts.

Gormo5mo ago

But the clocks in this demo aren't images.

phire5mo ago

Yes, but they are reasoning within their dataset, which will contain multiple example of html+css clocks.

They are just struggling to produce good results because they are language models and don’t have great spatial reasoning skills, because they are language models.

Their output normally has all the elements, just not in the right place/shape/orientation.

IshKebab5mo ago

They definitely don't completely fail to generalise. You can easily prove that by asking them something completely novel.

Do you mean that LLMs might display a similar tendency to modify popular concepts? If so that definitely might be the case and would be fairly easy to test.

Something like "tell me the lord's prayer but it's our mother instead of our father", or maybe "write a haiku but with 5 syllables on every line"?

Let me try those ... nah ChatGPT nailed them both. Feels like it's particular to image generation.

immibis5mo ago

They used to do poorly with modified riddles, but I assume those have been added to their training data now (https://huggingface.co/datasets/marcodsn/altered-riddles ?)

Like, the response to "... The surgeon (who is male and is the boy's father) says: I can't operate on this boy! He's my son! How is this possible?" used to be "The surgeon is the boy's mother"

j / k navigate · click thread line to collapse