
ilaksh · 12y ago

I am aware that compression algorithms kind of work on this general idea of referencing common bits. Of course I am aware of that.

How do you read my question and simply interpret it as "let's send all of the images in full and then give their index and call it compression"? What I suggest is that we take a standard encoding technique like Huffman, or some modification of it, but rather than creating a table based on the data in an individual image, build the code table by analyzing many, many images.

I have read the Wikipedia article on Huffman coding before. However, the details are not really important with regard to my point.

What I am suggesting is that rather than looking at just the bits in an individual image and using them to construct a Huffman table or some other kind of reference, look at the bits in many, many images and create a larger reference table. And then of course you may need a local table for things in the image that don't quite correspond to the larger table.
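A minimal sketch of the shared-table idea, assuming byte-level symbols and a Huffman code built from the combined byte counts of a whole corpus instead of a single file. The function names (`shared_table`, `huffman_encode`, `huffman_decode`) are made up for illustration, and every byte value is seeded with a count of 1 so that inputs unlike the corpus still encode (that seeding stands in, very crudely, for the "local table").

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_code(freqs):
    """Return {symbol: bitstring} for a symbol -> count mapping."""
    tiebreak = count()  # unique counter so heap entries never compare dicts
    heap = [(n, next(tiebreak), {sym: ""}) for sym, n in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {sym: "0" for sym in heap[0][2]}
    while len(heap) > 1:
        n1, _, c1 = heapq.heappop(heap)
        n2, _, c2 = heapq.heappop(heap)
        # Prefix the two lightest subtrees with 0 and 1 and merge them.
        merged = {s: "0" + b for s, b in c1.items()}
        merged.update({s: "1" + b for s, b in c2.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


def shared_table(corpus):
    """Build one code table from many byte strings (the shared reference)."""
    freqs = Counter({b: 1 for b in range(256)})  # keep every byte encodable
    for blob in corpus:
        freqs.update(blob)
    return build_huffman_code(freqs)


def huffman_encode(data, table):
    return "".join(table[b] for b in data)


def huffman_decode(bits, table):
    inverse = {code: sym for sym, code in table.items()}
    out, cur = bytearray(), ""
    for bit in bits:
        cur += bit
        if cur in inverse:  # prefix-free, so the first match is the symbol
            out.append(inverse[cur])
            cur = ""
    return bytes(out)
```

Because the table ships once with the decoder, individual files would not need to carry their own table; the cost is that a file whose statistics differ from the corpus gets longer codes than a per-file table would give it.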

Earlier compression techniques were much more constrained in terms of processing power, RAM, network connectivity etc. and so distributing and using a large table for compression was not practical. I am suggesting that someone who has knowledge of compression engineer a system where 10MB, 50MB, or 100MB of RAM is used and a large common bits file is transmitted, rather than starting with the idea that almost all of the data or all of the data has to be contained in one file. I am not suggesting that an existing compression algorithm could be translated directly into this general concept. I am suggesting an engineering effort starting with different constraints and trade-offs.
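One existing, small-scale version of a shared "common bits file" is the preset dictionary in DEFLATE, which Python's `zlib` module exposes through the `zdict` argument. The sketch below assumes both sender and receiver already hold the same `shared` bytes; the wrapper names are hypothetical. Note that DEFLATE's window is 32 KiB, so this is far below the 10MB to 100MB scale proposed above and only illustrates the mechanism.

```python
import zlib


def compress_with_shared(data: bytes, shared: bytes) -> bytes:
    """Compress data against a dictionary both sides already have."""
    comp = zlib.compressobj(level=9, zdict=shared)
    return comp.compress(data) + comp.flush()


def decompress_with_shared(blob: bytes, shared: bytes) -> bytes:
    """Reverse compress_with_shared; needs the identical dictionary."""
    decomp = zlib.decompressobj(zdict=shared)
    return decomp.decompress(blob) + decomp.flush()
```

A short message that overlaps heavily with the dictionary should come out smaller than the same message compressed on its own, since most of it can be expressed as back-references into data the receiver already has.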


3 comments · 2 top-level

pit · 12y ago

I thought about doing something like that for audio, but I think it gets really inefficient with higher sample rates because you end up with so little overlap.

Like this: imagine I've got an array of shorts representing audio data. If I've got two files with similar segments, I've saved one short:

[1, 2, 3, 2, 1, 2, 5] [7, 2, 3, 6, 8, 9, 3]

So I can say that [2, 3] is represented by a new value (z), and can shave a short off of both streams. Then what happens if a new stream comes along with no similarities:

[8, 2, 7, 1, 7, 3, 7]

...you still have to send each value.
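The example above can be sketched as plain phrase substitution. This assumes the shared phrase `[2, 3]` is already known to both sides and that the token `Z` is a value outside the 16-bit sample range; `substitute` and `expand` are illustrative names, not an existing library.

```python
Z = 1 << 16  # stand-in token; assumed never to occur as a 16-bit sample


def substitute(stream, phrase, token):
    """Replace each occurrence of phrase in stream with a single token."""
    out, i, n = [], 0, len(phrase)
    while i < len(stream):
        if stream[i:i + n] == phrase:
            out.append(token)
            i += n
        else:
            out.append(stream[i])
            i += 1
    return out


def expand(stream, phrase, token):
    """Undo substitute by splicing the phrase back in."""
    out = []
    for value in stream:
        out.extend(phrase if value == token else [value])
    return out


a = [1, 2, 3, 2, 1, 2, 5]
b = [7, 2, 3, 6, 8, 9, 3]
c = [8, 2, 7, 1, 7, 3, 7]
for s in (a, b, c):
    packed = substitute(s, [2, 3], Z)
    print(len(s), "->", len(packed))
```

The first two streams shrink from 7 values to 6, and the third stays at 7 because it never contains the shared phrase, which is the inefficiency described above.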

Maybe I've just demonstrated that I don't know anything about compression, but I would be interested in working with you on this.

ilaksh (OP) · 12y ago

I think I would want to find an existing sparse autoencoder implementation, ideally a project already set up for encoding audio, and start from there. http://www.stanford.edu/class/cs294a/sparseAutoencoder.pdf

phillmv · 12y ago

*shrug*

Lots of different people on this forum, no offense was intended, and like all nerds I get excited when I get to share knowledge.

>How do you read my question and simply interpret it as "let's send all of the images in full and then give their index and call it compression"?

Because it struck me as analogous, and yeah I'd call that compression - the message length for one is immensely improved.

Okay, so here's my admittedly piss poor understanding of most compression: you either find more intelligent ways to strip bits from the source in ways the consumer won't mind, or you find more intelligent ways to build reference tables given your problem domain.

I'm sure you know this, but the first one is why JPEG and the various MPEG formats are successful: they have complex quantization models that discard gradients we won't notice or frequencies we can't hear.

The way you achieve better results is through building better models for how information in your problem domain is related. If we're compressing text and we know the language we can start referencing letter frequencies and index along that and so on.
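To make the letter-frequency point concrete, here is a rough sketch that estimates how many bits an ideal coder would spend on a text under a given frequency model, using a cost of -log2(p) per symbol. The model is built from whatever sample text is supplied, with add-one smoothing over an assumed alphabet; `letter_model` and `ideal_bits` are names invented for this example.

```python
import math
from collections import Counter


def letter_model(sample, alphabet):
    """Estimate symbol probabilities from sample text (add-one smoothing)."""
    counts = Counter({ch: 1 for ch in alphabet})
    counts.update(ch for ch in sample if ch in counts)
    total = sum(counts.values())
    return {ch: n / total for ch, n in counts.items()}


def ideal_bits(text, model):
    """Total bits an ideal coder would spend on text under this model."""
    return sum(-math.log2(model[ch]) for ch in text)
```

Text that matches the model's common letters costs fewer bits than text made of its rare ones, so the better the model fits the problem domain, the shorter the output.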

The way this works in video, to my knowledge, is that amongst many other complicated things they take NxN blocks of each image and store only the deltas across some number Y of frames.
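A heavily simplified sketch of that block-delta idea, assuming frames are lists of rows of integer pixel values and that only blocks which changed since the previous frame are stored. Real codecs add motion compensation, transforms, and entropy coding on top; the function names here are illustrative only.

```python
def changed_blocks(prev, cur, n):
    """Return {(row, col): delta_block} for the n x n blocks that differ."""
    height, width = len(cur), len(cur[0])
    out = {}
    for r in range(0, height, n):
        for c in range(0, width, n):
            block = [
                [cur[y][x] - prev[y][x] for x in range(c, min(c + n, width))]
                for y in range(r, min(r + n, height))
            ]
            if any(v for row in block for v in row):
                out[(r, c)] = block  # only non-zero deltas are kept
    return out


def apply_blocks(prev, blocks):
    """Rebuild the current frame from the previous frame plus the deltas."""
    frame = [row[:] for row in prev]
    for (r, c), block in blocks.items():
        for dy, row in enumerate(block):
            for dx, d in enumerate(row):
                frame[r + dy][c + dx] += d
    return frame
```

When most of the picture is static between frames, the dictionary of changed blocks is much smaller than the frame itself.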

So - perhaps your "image reference blob" could build a reference table for all 16x16 px blocks and transmit only the indices for them, and we're back at my original comment. But to my (again, please correct me) understanding those are kind of the only alternatives? Encoding and decoding is an interesting topic.
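A toy sketch of the block-index idea, combined with the "local table" fallback from the original post: blocks found in a shared codebook are sent as an index, and anything else is sent raw. It assumes exact matches and image dimensions divisible by the block size; a table of literally all 16x16 blocks is out of reach (8-bit grayscale alone allows 256^256 of them), so a practical scheme would need a learned codebook and nearest-match lookup. All names are hypothetical.

```python
def blocks_of(image, n):
    """Yield n x n blocks, row-major, as hashable tuples of row tuples."""
    for r in range(0, len(image), n):
        for c in range(0, len(image[0]), n):
            yield tuple(tuple(image[y][c:c + n]) for y in range(r, r + n))


def build_codebook(images, n):
    """Assign an index to every distinct block seen in the training images."""
    codebook = {}
    for img in images:
        for block in blocks_of(img, n):
            codebook.setdefault(block, len(codebook))
    return codebook


def encode_image(image, codebook, n):
    """Emit ("ref", index) for known blocks and ("raw", block) for misses."""
    return [
        ("ref", codebook[b]) if b in codebook else ("raw", b)
        for b in blocks_of(image, n)
    ]


def decode_image(tokens, codebook, n, width):
    """Rebuild the image rows from tokens; width is the image width in px."""
    by_index = {i: b for b, i in codebook.items()}
    blocks = [by_index[v] if kind == "ref" else v for kind, v in tokens]
    per_row = width // n
    image = []
    for start in range(0, len(blocks), per_row):
        band = blocks[start:start + per_row]
        for y in range(n):
            image.append([px for block in band for px in block[y]])
    return image
```

The proportion of "raw" tokens is a direct measure of how well the shared table covers new images.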

I very infrequently have to think in binary and I had perhaps too much fun counting to 10,000 on my fingers; my undergraduate degree was a long time ago and my knowledge of the topic is sparse, so I'm interested in hearing more about it.
