I like the analogy of compression, in that a distilled model of an LLM is like a JPEG of a photo. Pretty good, maybe very good, but still lossy.
The question I hear you raising seems to be along the lines of: can we use a new compression method to get better resolution (fidelity to the original) at a much smaller size?
> in that a distilled model of an LLM is like a JPEG of a photo
That's an interesting analogy, because I've always thought of the hidden states (and weights and biases) of an LLM as a compressed version of the training data.
Finding minimum complexity explanations isn't what finding natural laws is about, I'd say. It's considered good practice (Occam's razor), but it's often not really clear what the minimal model is, especially when a theory is relatively new. That doesn't prevent it from being a natural law, the key criterion is predictability of natural phenomena, imho. To give an example, one could argue that Lagrangian mechanics requires a smaller set of first principles than Newtonian, but Newton's laws are still very much considered natural laws.
And the lossy-format framing also answers the question of why quantization works: quantization just trades accuracy for space but still gives good enough output, just like a lossy JPEG.
Reiterating: we can lose a lot of data (have incomplete data) and still have a perfectly viewable JPEG (or MP3, same idea).
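To make the trade concrete, here's a toy sketch of symmetric int8 quantization (the function names and the sample weights are made up for illustration, not any particular library's API). Floats are mapped to 8-bit integers plus one scale factor, giving roughly a 4x smaller representation at the cost of bounded rounding error:

```python
# Toy symmetric int8 quantization: trade precision for space, like a lossy JPEG.
# Names and example weights are illustrative, not a real library's API.

def quantize_int8(weights):
    """Map float weights to int8 values plus a single scale factor."""
    scale = max(abs(w) for w in weights) / 127  # largest value maps to +/-127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats; some precision is lost for good."""
    return [x * scale for x in q]

weights = [0.013, -0.872, 0.254, 0.999, -0.431]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)

# Reconstruction error is bounded by half a quantization step: lossy,
# but "good enough" in the same sense as a JPEG.
max_err = max(abs(a - b) for a, b in zip(weights, approx))
```

The per-weight error here can never exceed `scale / 2`, which is why the output stays usable even though information is irreversibly thrown away.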
Yeah, but it does seem that they're getting high percentages for the distilled models' accuracy against the larger model. If the smaller model is 90% as accurate as the larger one but uses far fewer than 90% of the parameters, then surely that counts as a win.
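The back-of-envelope version of that win (all numbers below are hypothetical, just to show the arithmetic):

```python
# Hypothetical figures to illustrate the accuracy-per-parameter argument;
# none of these numbers come from a real benchmark.

teacher_params = 70e9     # assumed 70B-parameter teacher
student_params = 7e9      # assumed 7B-parameter distilled student
relative_accuracy = 0.90  # student scores 90% of the teacher's benchmark score

param_fraction = student_params / teacher_params      # 0.10 of the parameters
efficiency_gain = relative_accuracy / param_fraction  # accuracy retained per
                                                      # unit of parameter budget
```

Under those assumed numbers the student keeps nine times as much accuracy per parameter as the teacher, which is the sense in which "90% accuracy at far less than 90% of the size" is a clear win.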