There is an excellent talk by Jack Rae called "Compression for AGI", where he shows (what I believe to be) a little-known connection between transformers and compression: in one view, LLMs are SOTA lossless compression algorithms, where the number of weights doesn't count towards the description length — roughly, a receiver can rerun the same training procedure, so only the training code and the encoded data stream need to be transmitted. Sounds crazy, but it's true.
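The core of the connection is that any next-token predictor can drive a lossless coder: pairing a model with an arithmetic coder lets you encode each token in about -log2 p(token | context) bits, so a better predictor means a shorter file. The sketch below illustrates the idea with a hypothetical toy model standing in for an LLM (the `toy_model` function and its probabilities are invented for illustration; this is the Shannon code-length bookkeeping, not a full arithmetic coder):

```python
import math

def toy_model(context):
    """Hypothetical stand-in for an LLM: returns next-symbol
    probabilities over a 3-symbol alphabet given the context."""
    if context and context[-1] == "a":
        return {"a": 0.1, "b": 0.8, "c": 0.1}
    return {"a": 0.6, "b": 0.2, "c": 0.2}

def code_length_bits(sequence, model):
    """An arithmetic coder paired with `model` can losslessly encode
    `sequence` in roughly sum of -log2 p(sym | context) bits."""
    total = 0.0
    context = []
    for sym in sequence:
        p = model(context)[sym]
        total += -math.log2(p)
        context.append(sym)
    return total

seq = list("abababab")
bits_model = code_length_bits(seq, toy_model)
bits_uniform = len(seq) * math.log2(3)  # baseline: uniform code over 3 symbols
print(f"model: {bits_model:.1f} bits, uniform baseline: {bits_uniform:.1f} bits")
```

Because the toy model predicts the alternating pattern well, it encodes the sequence in far fewer bits than the uniform baseline — and since decoding just reruns the same model, the model itself need not be shipped alongside each message.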
A transformer that doesn't hallucinate (or that knows when it is hallucinating) would be the ultimate compression algorithm. But right now that isn't a solved problem, and it leaves the output of LLMs too untrustworthy to use in place of what are colloquially known as compression algorithms, like gzip or zstd.