
okdood64 · 2y ago

Can you dumb this down to ELI16 for me? I'm intrigued but am not following.



boolemancer · 2y ago

There is a good video[1] from the Computerphile channel describing the concept in the context of JPEG.

[1] https://youtu.be/Q2aEzeMDHMA

kfarr · 2y ago

Thanks that was the first time I almost actually understood DCT, excellent video!

jiggawatts · 2y ago

Imagine you’re trying to encode the topography of a hilly landscape faithfully.

If you use 8 bits to represent the height you can have 256 distinct levels, which is reasonably smooth — like a Minecraft map.

If you tried to cut this down to just half, 4 bits, then you would have only 16 levels! This is very ugly, with big blocky staircases instead of nice smooth hills.
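To make the level counts concrete, here is a minimal sketch. The `quantize` helper is made up for illustration; it just snaps an 8-bit height to the nearest lower step.

```python
def quantize(height, bits):
    """Snap an 8-bit height (0..255) down to one of 2**bits evenly spaced levels."""
    step = 256 // (2 ** bits)
    return (height // step) * step

heights = list(range(256))

# Collect the distinct values that survive at each bit depth.
levels_8 = {quantize(h, 8) for h in heights}
levels_4 = {quantize(h, 4) for h in heights}

print(len(levels_8), len(levels_4))  # 256 16
```

At 4 bits every height between 96 and 111 collapses to 96, which is where the staircase look comes from.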

So what to do to compress this data better?

One approach is to find the nearest sine waves to the shape of the hills. Use a big sine wave for the big hills and then add on small sine waves to represent the smaller bumps and ridges.

There is a way to do this (a transform such as the discrete cosine transform) so that you end up with roughly the same amount of data as the original 8-bit encoding and can reconstruct the same output.

Now if you throw away half of this sine wave data you still get the original smooth shapes because sine waves are inherently smooth! Instead of turning blocky the map becomes slightly incorrect. Hills stay roughly the same but they shift around and might lose some fine detail.
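Here is a rough sketch of that comparison on a 1-D strip of landscape. Everything in it is an assumption for illustration: the hills are made up from two sine waves, the transform is a plain orthonormal discrete cosine transform written out by hand, and "throwing away half" means zeroing the 32 highest-frequency coefficients.

```python
import math

def dct(signal):
    """Orthonormal DCT-II: one coefficient per cosine wave."""
    n = len(signal)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                               for i, x in enumerate(signal)))
    return out

def idct(coeffs):
    """Inverse of dct(): add the scaled cosine waves back together."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
            total += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(total)
    return out

def rms_error(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

n = 64
# Made-up landscape: one big hill plus smaller bumps, heights within 0..255.
hills = [128 + 80 * math.sin(2 * math.pi * i / n)
             + 20 * math.sin(2 * math.pi * 5 * i / n) for i in range(n)]

# Option 1: keep every sample but only 4 bits each (16 levels).
staircase = [(int(h) // 16) * 16 for h in hills]

# Option 2: keep only the 32 lowest-frequency waves.
coeffs = dct(hills)
kept = coeffs[: n // 2] + [0.0] * (n // 2)
smooth = idct(kept)

# Sanity check: with all coefficients kept, the round trip is exact.
full = idct(coeffs)

print("round trip error:", rms_error(hills, full))
print("4-bit staircase error:", rms_error(hills, staircase))
print("half the waves error:", rms_error(hills, smooth))
```

On this made-up input the truncated version has a much smaller error than the staircase, and what error remains is a gentle wobble rather than hard steps.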

Essentially, humans are sensitive to staircase compression — even small amounts are very noticeable, but insensitive to the sine wave compression. We can exploit this to squeeze more bits out of the data before this compression becomes visible.

This works for audio, images, and motion.


ReactiveJelly · 2y ago

(For JPEG - Newer codecs may differ) The codec has these "Basis functions", 64 of them, which are used to encode and decode each 8x8 block of pixels. https://en.wikipedia.org/wiki/Discrete_cosine_transform#/med...

Sine and cosine waves have a useful property: you can break a signal down by taking its dot product with each of these basis functions to get a list of coefficients, and then multiply each coefficient by its basis function and add the results up to get the original signal back. Not every set of functions works as a basis like this; these do because they are orthogonal to each other.
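A minimal sketch of "dot product to encode, multiply and add to decode" on one 8x8 block. The `basis` and `dot` helpers and the pixel values are made up for illustration. The scaling is the orthonormal one, which may differ by constant factors from what a real JPEG codec uses.

```python
import math

N = 8

def basis(u, v):
    """The (u, v) cosine pattern as an 8x8 grid, scaled to unit length."""
    cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
    cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
    return [[cu * cv
             * math.cos(math.pi * (y + 0.5) * u / N)
             * math.cos(math.pi * (x + 0.5) * v / N)
             for x in range(N)] for y in range(N)]

def dot(a, b):
    """Dot product of two 8x8 grids, treating each as a 64-element vector."""
    return sum(a[y][x] * b[y][x] for y in range(N) for x in range(N))

# Made-up pixel block (a diagonal gradient).
block = [[x * 16 + y * 8 for x in range(N)] for y in range(N)]

# Encode: one dot product per basis function gives 64 coefficients.
coeffs = [[dot(block, basis(u, v)) for v in range(N)] for u in range(N)]

# Decode: scale each basis function by its coefficient and add them all up.
decoded = [[0.0] * N for _ in range(N)]
for u in range(N):
    for v in range(N):
        b = basis(u, v)
        for y in range(N):
            for x in range(N):
                decoded[y][x] += coeffs[u][v] * b[y][x]

max_error = max(abs(decoded[y][x] - block[y][x])
                for y in range(N) for x in range(N))
print("largest round-trip error:", max_error)
```

The round trip only works because any two different basis grids have a dot product of zero and each has a dot product of one with itself, which is the orthogonality mentioned above.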

You can see that the upper-left one is all white, that's the "DC" (Direct Current) basis. As you go right and down, they increase in frequency.

So the encoder gets all the coefficients and then quantizes them, rounding the high-frequency ones most coarsely to save bits. That's why JPEGs often have ringing / rippling artifacts where an edge will be sharp but have waves coming out on either side.

If you quantize the coefficients enough, then some of those bottom-right ones end up quantizing to zero. So JPEG encoders run a lossless compression step on the coefficients to squish all the zeroes and small values together. You can crunch a JPEG smaller by replacing this lossless compression with a newer algorithm.
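A rough sketch of why quantizing leaves lots of zeros to squish. The `step` function, the scan order, and the run-length coder are all simplified stand-ins I made up. Baseline JPEG uses standard quantization tables, a zigzag scan, and Huffman coding instead.

```python
import math

N = 8

def dct_coeff(block, u, v):
    """Dot product of the block with the (u, v) orthonormal cosine pattern."""
    cu = math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N)
    cv = math.sqrt(1 / N) if v == 0 else math.sqrt(2 / N)
    return cu * cv * sum(block[y][x]
                         * math.cos(math.pi * (y + 0.5) * u / N)
                         * math.cos(math.pi * (x + 0.5) * v / N)
                         for y in range(N) for x in range(N))

# Made-up smooth block: a gentle gradient, like a patch of sky.
block = [[100 + 3 * x + 2 * y for x in range(N)] for y in range(N)]

def step(u, v):
    """Hypothetical quantization step: coarser for higher frequencies."""
    return 8 + 6 * (u + v)

quantized = [[round(dct_coeff(block, u, v) / step(u, v)) for v in range(N)]
             for u in range(N)]

# Read the coefficients from low to high frequency (a crude zigzag stand-in).
order = sorted(((u, v) for u in range(N) for v in range(N)),
               key=lambda p: (p[0] + p[1], p[0]))
scan = [quantized[u][v] for u, v in order]

def run_length(values):
    """Collapse runs of repeated values into [value, count] pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return runs

runs = run_length(scan)
zeros = scan.count(0)
print("zero coefficients:", zeros, "of", len(scan))
print("runs after squishing:", len(runs))
```

For a smooth block like this one, most of the 64 quantized coefficients are zero, so the run list is much shorter than the raw scan.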

And the decoder just inflates those coefficients and multiplies them by the same basis functions to get the bitmap back.

There are details I don't understand in the middle, like loop filters and de-blocking filters to hide the 8x8 block artifacts, but the heart of it is just "take a dot product with these functions to encode, multiply those dots with the same functions to decode".
