I'm sorry, but this is a fundamentally incorrect view of machine learning (including, but not limited to, transformers).
From an information-theoretic perspective the two are essentially identical, with the exception that standard compression algorithms have no learned "loss" function beyond minimizing reconstruction error together with the size of the compressed output.
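To make the equivalence concrete, here's a minimal sketch in Python (a toy character-level unigram model; the smoothing constant and 256-symbol alphabet are assumptions, and any predictive model, a transformer included, could stand in for it): the model's cross-entropy on a string, measured in bits, is exactly the number of bits an ideal arithmetic coder using that same model would need to compress the string. Better prediction means a smaller compressed size, which is the sense in which training and compression optimize the same thing.

    import math
    from collections import Counter

    # Toy character-level model: unigram frequencies from a "training" text.
    train = "the quick brown fox jumps over the lazy dog"
    counts = Counter(train)
    total = sum(counts.values())

    def prob(ch):
        # Laplace smoothing over an assumed 256-symbol alphabet so unseen
        # characters still get nonzero probability.
        return (counts.get(ch, 0) + 1) / (total + 256)

    # Shannon code length for a symbol with probability p is -log2(p), so the
    # summed cross-entropy in bits equals the size an ideal arithmetic coder
    # driven by this model would need for the text.
    test = "the lazy fox"
    bits = sum(-math.log2(prob(ch)) for ch in test)
    print(f"cross-entropy: {bits / len(test):.2f} bits/char")
    print(f"ideal compressed size: {bits:.1f} bits ({math.ceil(bits / 8)} bytes)")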
Here's a link to the relevant section on Wikipedia if you'd like more information [0]. MacKay's Information Theory, Inference, and Learning Algorithms is the standard full-text treatment of this topic [1]. Ted Chiang's article "ChatGPT Is a Blurry JPEG of the Web" is a pretty good pop-sci exploration if you don't want to get too deep into the mathematics [2].
0. https://en.wikipedia.org/wiki/Data_compression#Machine_learn...
1. https://www.inference.org.uk/itprnn/book.pdf
2. https://www.newyorker.com/tech/annals-of-technology/chatgpt-...