That's an interesting analogy, because I've always thought of the hidden states (and weights and biases) of an LLM as a compressed version of its training data.
The Newtonian model makes provably less accurate predictions than the Einsteinian one (yes, I'm using a different example), so while it's still useful in many contexts where accuracy is less important, the number of parameters it requires doesn't much matter when you're looking for the one true GUT.
My understanding, again as a filthy computationalist, is that an accurate model of the real, bona fide underlying architecture of the universe will be the simplest possible way to accurately predict anything, with the word "accurately" doing all the heavy lifting.
As always: https://www.sas.upenn.edu/~dbalmer/eportfolio/Nature%20of%20...
I'm sure there are decreasingly accurate, but still useful, models all the way up the computational complexity hierarchy. Lossy compression is precisely the act of using one of them.
(discussed here: https://news.ycombinator.com/item?id=34724477 )
To reiterate: we can lose a lot of data (i.e., have incomplete data) and still have a perfectly viewable JPEG (or a listenable MP3, same idea).
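As a minimal sketch of that point (assuming Pillow is installed; "photo.png" is a hypothetical input file), re-encoding an image at a low JPEG quality discards most of the high-frequency detail, yet the result is still perfectly viewable:

    # Lossy compression sketch using Pillow.
    # "photo.png" is a stand-in for any image you have on disk.
    import os

    from PIL import Image

    img = Image.open("photo.png").convert("RGB")  # JPEG has no alpha channel
    img.save("photo_q85.jpg", quality=85)  # typical "high quality" setting
    img.save("photo_q10.jpg", quality=10)  # aggressively lossy, still viewable

    # Compare how much data was thrown away at each quality level
    for path in ("photo.png", "photo_q85.jpg", "photo_q10.jpg"):
        print(f"{path}: {os.path.getsize(path):,} bytes")

The q10 file can be an order of magnitude smaller than the original, and you can still tell exactly what the picture is of, which is the same trade the less-accurate-but-cheaper model is making.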