But that doesn’t mean the process of compression involves significant amounts of unpredictable branching, if for no other reason than that it would be extremely slow and inefficient: heavy branching means you’re either processing the input pixel by pixel, or your SIMD pipeline is full of dead zones that you can’t reschedule without desyncing your processing waves.
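To make this concrete, here’s a minimal sketch (values and threshold are made up for illustration) of how SIMD code avoids per-pixel branches: instead of an `if` per element, you compute a mask over the whole vector and blend, which is exactly what hardware blend/select instructions do.

```python
import numpy as np

# Branchless "select", the way SIMD hardware does it: evaluate the
# condition for every element at once, then blend with the mask.
# No per-pixel branch, so no pipeline divergence.
pixels = np.array([10, 200, 30, 250, 90], dtype=np.uint8)
threshold = np.uint8(128)

mask = pixels > threshold                     # one comparison over the whole vector
clipped = np.where(mask, threshold, pixels)   # blend: no data-dependent branches
```

The same pattern maps directly onto instructions like x86 `pblendvb` or ARM `bsl`, which is why well-written pixel loops contain no conditionals at all.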
Video compression is mostly very clever signal processing built on top of primitives like convolutions. You take large blocks of data and perform uniform mathematical operations over all of it, which amounts to a statistical analysis of that data. That analysis drives a predictor; then you “just” need to take the difference (or, in some schemes, the XOR) between the predictor’s output and the actual data, and record the residual using some kind of variable-length encoding scheme that strips out most of the unneeded bytes.
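A toy sketch of the predict-then-encode idea (this is not any real codec; the left-neighbour predictor and the run-length encoder are simplifications for illustration):

```python
import numpy as np

# Toy predictor: guess each sample equals its left neighbour.
# On smooth image data the residual is mostly zeros, which is
# what makes it cheap to encode.
row = np.array([50, 50, 51, 51, 51, 52, 52, 52, 52, 53], dtype=np.uint8)

predicted = np.empty_like(row)
predicted[0] = 0            # no left neighbour for the first sample
predicted[1:] = row[:-1]    # a uniform shift: one vectorised operation

residual = row - predicted  # uint8 arithmetic wraps mod 256

def rle(vals):
    """Minimal run-length encoder: [value, run_length] pairs."""
    out = []
    for v in vals:
        if out and out[-1][0] == v:
            out[-1][1] += 1
        else:
            out.append([v, 1])
    return out

encoded = rle(residual.tolist())  # 7 pairs instead of 10 raw samples
```

Real codecs use far smarter predictors (motion compensation, intra prediction) and entropy coders (Huffman, arithmetic coding), but the shape of the pipeline is the same: predict, subtract, encode what’s left.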
But just as the median of a large dataset can be computed with no branches, regardless of how random or large the input is, video compression can largely be done the same way, and indeed has to be done that way to be performant. There’s no other way to cram 4K frames, roughly 3840 × 2160 pixels at 3 bytes each (~25 MB raw, or ~12 MB with 4:2:0 chroma subsampling), through a commercial CPU at a reasonable speed. You must build your compressor on top of SIMD primitives, which inherently makes branching far more expensive than it is in scalar (SISD) code.
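The branchless-median claim is easy to demonstrate for the 3-tap case used in median filters: a tiny sorting network built from min/max only, applied element-wise across whole vectors (the sample values here are arbitrary):

```python
import numpy as np

# Median-of-three via min/max only: a 3-element sorting network,
# the same branch-free pattern SIMD median filters are built from.
a = np.array([9, 1, 5], dtype=np.uint8)
b = np.array([3, 7, 5], dtype=np.uint8)
c = np.array([6, 4, 2], dtype=np.uint8)

lo = np.minimum(a, b)
hi = np.maximum(a, b)
med = np.maximum(lo, np.minimum(hi, c))  # element-wise median, zero branches
```

Min and max map to single SIMD instructions (`pminub`/`pmaxub` on x86), so this computes the median of every pixel triple in a register at once with no data-dependent control flow.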