How do you read my question and simply interpret it as "let's send all of the images in full, then give their index and call it compression"? What I suggest is that we take a standard encoding technique like Huffman coding, or some modification of it, but rather than creating the code table from the data in an individual image, build the table by analyzing many, many images.
I have read the Wikipedia article on Huffman coding before. However, the details are not really important to my point.
What I am suggesting is that rather than looking at just the bits in an individual image and using them to construct a Huffman table or some other kind of reference, we look at the bits in many, many images and create a larger shared reference table. Then, of course, you may need a local table for things in the image that don't quite correspond to the larger table.
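To make the idea concrete, here is a rough sketch in Python of what I mean, not a worked-out design. It pools byte counts over a corpus of images, builds one Huffman code table from them, and then encodes a new image against that table without shipping any per-image table. The function names (`build_shared_code`, `encode`, `decode`) are made up for illustration, images are treated as plain byte strings, and the add-one smoothing is just my stand-in for the "local table" fallback.

```python
import heapq
from collections import Counter
from itertools import count


def build_shared_code(corpus):
    """Build one Huffman code table from byte counts pooled over many images."""
    freq = Counter()
    for image in corpus:
        freq.update(image)  # iterating bytes yields ints 0..255
    # Assumption: add-one smoothing gives every byte value a code, so symbols
    # never seen in the corpus can still be encoded (a crude local fallback).
    tie = count()  # unique tie-breaker so the heap never compares dicts
    heap = [(freq[b] + 1, next(tie), {b: ""}) for b in range(256)]
    heapq.heapify(heap)
    while len(heap) > 1:
        w0, _, c0 = heapq.heappop(heap)
        w1, _, c1 = heapq.heappop(heap)
        # Merge the two lightest subtrees, prefixing their codes with 0 and 1.
        merged = {s: "0" + c for s, c in c0.items()}
        merged.update({s: "1" + c for s, c in c1.items()})
        heapq.heappush(heap, (w0 + w1, next(tie), merged))
    return heap[0][2]  # maps byte value -> bit string


def encode(image, code):
    """Encode one image with the shared table; no table travels with it."""
    return "".join(code[b] for b in image)


def decode(bits, code):
    """Decode by greedy prefix matching, which works because the code is prefix-free."""
    inverse = {c: s for s, c in code.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return bytes(out)
```

Both sides would run `build_shared_code` once (or receive its result once), and after that each image is just the output of `encode`. How well this does depends entirely on how closely a new image's statistics match the pooled ones.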
Earlier compression techniques were much more constrained in terms of processing power, RAM, network connectivity, etc., so distributing and using a large table for compression was not practical. I am suggesting that someone with knowledge of compression engineer a system where 10MB, 50MB, or 100MB of RAM is used and a large file of common bits is transmitted, rather than starting from the idea that all or almost all of the data has to be contained in one file. I am not suggesting that an existing compression algorithm could be translated directly into this general concept. I am suggesting an engineering effort that starts with different constraints and trade-offs.
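For a small-scale taste of the "common file transmitted once" part, Python's standard `zlib` module accepts a preset dictionary through the `zdict` argument. The sketch below is only an analogy: deflate looks back at most 32 KiB, so this is nowhere near the 10MB to 100MB table I have in mind, and the helper names `compress_with_shared` and `decompress_with_shared` are hypothetical.

```python
import zlib


def compress_with_shared(data: bytes, shared: bytes) -> bytes:
    """Compress data against a dictionary both sides already hold."""
    compressor = zlib.compressobj(level=9, zdict=shared)
    return compressor.compress(data) + compressor.flush()


def decompress_with_shared(blob: bytes, shared: bytes) -> bytes:
    """Decompress; the receiver must supply the same shared dictionary."""
    decompressor = zlib.decompressobj(zdict=shared)
    return decompressor.decompress(blob) + decompressor.flush()
```

The shared dictionary is sent once, and every later file can refer back to it instead of carrying that content itself.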