Extreme video compression with prediction using pre-trainded diffusion models (opens in new tab)

(github.com)

144 pointsjohn_g2y ago88 comments

88 comments

57 comments · 13 top-level

Animats2y ago· 23 in thread

Extreme compression will be when you put in a movie and get a SORA prompt back that regenerates something close enough to the movie.

aqme282y ago

Where’s that quote? Something like “AI is just compression, and compression is indistinguishable from AI”

daveguy2y ago

I'm not sure the quote, but you're probably thinking something related to the Hutter Prize:

https://en.m.wikipedia.org/wiki/Hutter_Prize

A lossless compression contest to encourage research in AI. It's lossless, I think just to standardize scoring, but I always thought a lossy version would be better for AI -- our memories are definitely lossy!

1 more reply

visarga2y ago

> AI is just compression, and compression is indistinguishable from AI

Almost. Compression and AI both revolve around information processing, but their core objectives diverge. Compression is focused on efficient representation, while AI is built for flexibility and the ability to navigate the unpredictable aspects of real-world data.

Compression learns a representation from the same data it encodes, like "testing on the training set". AI models have different training and test data. There are no surprises in compression.

3 more replies

pyinstallwoes2y ago

Intelligence is compressing information into irreducible representation.

5kg2y ago

Language Modeling Is Compression: https://arxiv.org/abs/2309.10668

sandkoan2y ago

Ilya says this here: https://www.youtube.com/watch?v=AKMuA_TVz3A

Rygian2y ago

How does that make sense? Compression is deterministic (for same prompt, same output is algorithmically guaranteed). AI is only deterministic in corner cases.

1 more reply

kzrdude2y ago

The compression competitions include the decompression program size in the size of the output. Must be a large series of movies compressed to win, then.

4gotunameagain2y ago

If one model can "compress"/"decompress" all movies and series, its fraction of the size becomes negligible, but yes I agree with you since it still has to be distributed.

squokko2y ago

I can imagine that in under 5 years, the movie's script plus one example still photo for each scene could do the job.

bsenftner2y ago

If we get anywhere close to that, coming up with a new economics model is going to be the prompt we'll be giving the AGI when it's ready. We'll need it.

ben_w2y ago

Ah, so you have long timelines then? :P

_the_inflator2y ago

In what sense? How about reproducibility? Is stored memory, the connection between the prompt and the exact output, really compression or simply a retrieval of a compressed file then stored as factual knowledge ingrained in its Neural Network?

I like your sentiment, it is technically inspiring.

cfn2y ago

Given the same prompt and the same seed (and algorithm) the resulting movie/output will always be the same. This is the case for AI image generation now.

polemic2y ago

How big is the SORA model itself?

orblivion2y ago

I can show you an algorithm that compresses an entire 2 hour movie to a single bit, but it only works on one movie.

2 more replies

jasonjmcghee2y ago

You only need one copy of it - even if it is 100GB. If it's baked into every OS... and storage / RAM keeps getting cheaper... just might work.

1 more reply

est2y ago

it depends which resolution you want to see.

jebarker2y ago

This isn't really any different to what they've done is it?

IgorPartola2y ago

“Alexa show me Star Wars but with Dustin Hoffman as Luke”.

gedy2y ago

I actually would really like this flexibility. "Star Wars, but in Korean with k-pop stars cast", etc.

3 more replies

EGreg2y ago

“I’m sorry Dave. I can’t do that. As an Amazon Large Langauge model, I need you to up your subscription to Amazon Prime first.”

“On the other hand, I can generate endless amounts of Harlan Coben miniseries… :-P”

2 more replies

gcanyon2y ago

Ha, I've commented almost exactly this twice now on HN. We'll see how long before it's a reality -- probably better measured in months rather than years.

ToJans2y ago· 3 in thread

Ahhh, Sloot's digital coding system [1] is finally here ;).

[1] https://en.m.wikipedia.org/wiki/Sloot_Digital_Coding_System

CamperBob22y ago

In the [Sloot Digital Coding System], it is claimed that no movies are stored, only basic building blocks of movies, such as colours and sounds. So, when a number is presented to the SDCS, it uses the number to fetch colours and sounds, and constructs a movie out of them. Any movie. No two different movies can have the same number, otherwise they would be the same movie. Every possible movie gets its own unique number. Therefore, I should be able to generate any possible movie by loading some unique number in the SDCS.

Guy named Borges already patented that, I'm afraid.

numlock862y ago

You just need an index and length within Pi's digits, duh.

didntcheck2y ago

It sounds almost like someone explained content-addressed storage to him and he misunderstood (where you can uniquely identify a movie by number, down to some hopefully a negligible collision likelihood, but you're merely indexing known data)

IshKebab2y ago· 3 in thread

> It can be observed that our model outperforms them at low bitrates

It can? Maybe I'm misunderstanding the graphs but it doesn't look like it to me?

astrange2y ago

Graphs (especially PSNR) aren't a good way to judge video compression. It's better to just watch the video.

Many older/commercial video codecs optimized for PSNR, which results in the output being blurry and textureless because that's the best way to minimize rate for the same PSNR.

userbinator2y ago

Many older/commercial video codecs optimized for PSNR, which results in the output being blurry and textureless because that's the best way to minimize rate for the same PSNR.

Even with that, showing H.265 having lower PSNR than H.264 is odd --- it's the former which has often looked blurrier to me.

1 more reply

ThisIsMyAltAcct2y ago

Someone should train a model to evaluate video compression quality

2 more replies

holoduke2y ago· 3 in thread

Back in 2005 there was a collegue at my first job writing video format converters software. He was considered a genius and the stereo type of an introvert software developer. He claimed that one day an entire movie could be compressesed on a single floppydisk. Everybody laughed and thought he was weird. He might be right after all.

HarHarVeryFunny2y ago

Well, as a reality check, even the soundtrack of a 1hr movie would be 50x floppy size (~50MB vs 1MB) if MP3 compressed.

I guess where this sort of generative video "compression" is headed is that the video would be the prompt, and you'd need a 100GB decoder (model) to render it.

No doubt one could fit a prompt to generate a movie similar to something specific in a floppy size ("dude gets stuck on mars, grows potatoes in his own shit"). However, 1MB is only enough to hold the words of a book, and one could imagine 100's of movie adaptations (i.e. visualizing the "prompt") of any given book that would all be radically different, so it seems a prompt of this size would only be enough to generate one of these "prompt movie adaptations".

fasa992y ago

I used to work with a guy like that in 1997, during the bubble, Higgins was his name. He'd claim you could fit every movie ever onto a CD-ROM, at least one day in the future it would be possible. Higgins was weird. I can still recall old Higgins getting out every morning and nailing a fresh load of tadpoles to that old board of his. Then he'd spin it round and round, like a wheel of fortune, and no matter where it stopped he'd yell out, "Tadpoles! Tadpoles is a winner!" We all thought he was crazy but then we had some growing up to do.

djmips2y ago

A deep thought.

resolutebat2y ago· 3 in thread

Here's the research behind this: https://arxiv.org/html/2402.08934v1

As a casual non-scholar, non-AI person trying to parse this though, it's infuriatingly convoluted. I was expecting a table of "given source file X, we got file size Y with quality loss Z", but while quality (SSIM/LPIPS) is compared to standard codecs like H.264, for the life of me I can't find any measure of how efficient the compression is here.

Applying AI to image compression has been tried before though, with distinctly mediocre results: some may recall the Xerox debacle about 10 years, when it turned out copiers were helpfully "optimizing" images by replacing digits with others in invoices, architectural drawings, etc.

https://www.theverge.com/2013/8/6/4594482/xerox-copiers-rand...

lifthrasiir2y ago

> [S]ome may recall the Xerox debacle about 10 years, when it turned out copiers were helpfully "optimizing" images by replacing digits with others in invoices, architectural drawings, etc.

This is not even AI. JBIG2 allows a reuse of once-decoded image patches because it's quite reasonable for bi-level images like fax documents. It is true that similar glyphs may be incorrectly groupped into the same patch, but such error is not specific to patch-based compression methods (quantization can often lead to the same result). The actual culprit was Xerox's bad implementation of JBIG2 that incorrectly merged too many glyphs into the same patch.

daemonologist2y ago

I believe they're using "bpp" (bits per pixel) to indicate compression efficiency, and in the section about quality they're holding it constant at 0.06 bpp. The charts a bit further down give quality metrics as a function of compression level (however, they seem to indicate that h.264 is outperforming h.265 in their tests which would be surprising to me).

sdenton42y ago

It turns out that compression, especially for media platform, is trading off file size, quality, and compute. (And typically we care more about compute for decoding.) This is hard to represent in a two dimensional chart.

Furthermore, it's pretty common in compression research to focus on the size/quality trade-off, and leave optimization of compute for real-world implementations.

zaptrem2y ago· 3 in thread

Can you share example videos?

yonixw2y ago

Googling gave me the article: https://www.arxiv.org/abs/2402.08934

Which have examples in it.

resolutebat2y ago

Direct link to HTML article: https://arxiv.org/html/2402.08934v1

Unfortunately it only contains still images with teeny thumbnails: https://arxiv.org/html/2402.08934v1/x2.png

az2262y ago

Images are hard to evaluate the quality of a video compression. Because it’s diffusing, will it have a bunch of diffuse-jitter.

userbinator2y ago· 2 in thread

How fast is this and how big is the decoder/encoder? The model weights are not accessible.

From the description, it looks like it's only being tested with 128x128 frames, which implies that the speed is very low.

newaccount7g2y ago

Why would you expect those kind of details in a paid commercial?

resolutebat2y ago

It's a link to a Github repo, not a "paid commercial".

sbalamurugan2y ago· 2 in thread

It’s uncanny how much of the current stuff has been predicted by the sitcom -“Silicon Valley”

iwontberude2y ago

Yeah curious to hear what the Weissman score of this latest algorithm is going to be.

xxs2y ago

that prize goes to zstandard, though.

LeoPanthera2y ago· 2 in thread

It's important to remember that any compression gains must include the size of the decompressor which, I assume, will include an enormous diffusion model.

ec1096852y ago

Can’t that be amortized across all videos (e.g. if YouTube had a decompressor they downloaded once)?

LeoPanthera2y ago

Yes, absolutely, it's just important to keep in mind when thinking of these decompressors as "magic". If every laptop shipped with a copy of Wikipedia, then you could compress Wikipedia, and any text that looks similar to Wikipedia, really well.

smerik2y ago

Does anyone remember the https://en.wikipedia.org/wiki/Sloot_Digital_Coding_System?

hulitu2y ago

> Extreme video compression with prediction using pre-trainded diffusion models

Is this more extreme than youtube ?

mjevans2y ago

I wonder how effective a speed focused variation could be for quality among 264, 265, and AV1.

hoseja2y ago

Middle-out.

j / k navigate · click thread line to collapse

88 comments

57 comments · 13 top-level

Animats2y ago· 23 in thread

Extreme compression will be when you put in a movie and get a SORA prompt back that regenerates something close enough to the movie.

aqme282y ago

Where’s that quote? Something like “AI is just compression, and compression is indistinguishable from AI”

daveguy2y ago

I'm not sure the quote, but you're probably thinking something related to the Hutter Prize:

https://en.m.wikipedia.org/wiki/Hutter_Prize

1 more reply

visarga2y ago

> AI is just compression, and compression is indistinguishable from AI

Compression learns a representation from the same data it encodes, like "testing on the training set". AI models have different training and test data. There are no surprises in compression.

3 more replies

pyinstallwoes2y ago

Intelligence is compressing information into irreducible representation.

5kg2y ago

Language Modeling Is Compression: https://arxiv.org/abs/2309.10668

sandkoan2y ago

Ilya says this here: https://www.youtube.com/watch?v=AKMuA_TVz3A

Rygian2y ago

How does that make sense? Compression is deterministic (for same prompt, same output is algorithmically guaranteed). AI is only deterministic in corner cases.

1 more reply

kzrdude2y ago

The compression competitions include the decompression program size in the size of the output. Must be a large series of movies compressed to win, then.

4gotunameagain2y ago

If one model can "compress"/"decompress" all movies and series, its fraction of the size becomes negligible, but yes I agree with you since it still has to be distributed.

squokko2y ago

I can imagine that in under 5 years, the movie's script plus one example still photo for each scene could do the job.

bsenftner2y ago

If we get anywhere close to that, coming up with a new economics model is going to be the prompt we'll be giving the AGI when it's ready. We'll need it.

ben_w2y ago

Ah, so you have long timelines then? :P

_the_inflator2y ago

I like your sentiment, it is technically inspiring.

cfn2y ago

Given the same prompt and the same seed (and algorithm) the resulting movie/output will always be the same. This is the case for AI image generation now.

polemic2y ago

How big is the SORA model itself?

orblivion2y ago

I can show you an algorithm that compresses an entire 2 hour movie to a single bit, but it only works on one movie.

2 more replies

jasonjmcghee2y ago

You only need one copy of it - even if it is 100GB. If it's baked into every OS... and storage / RAM keeps getting cheaper... just might work.

1 more reply

est2y ago

it depends which resolution you want to see.

jebarker2y ago

This isn't really any different to what they've done is it?

IgorPartola2y ago

“Alexa show me Star Wars but with Dustin Hoffman as Luke”.

gedy2y ago

I actually would really like this flexibility. "Star Wars, but in Korean with k-pop stars cast", etc.

3 more replies

EGreg2y ago

“I’m sorry Dave. I can’t do that. As an Amazon Large Langauge model, I need you to up your subscription to Amazon Prime first.”

“On the other hand, I can generate endless amounts of Harlan Coben miniseries… :-P”

2 more replies

gcanyon2y ago

Ha, I've commented almost exactly this twice now on HN. We'll see how long before it's a reality -- probably better measured in months rather than years.

ToJans2y ago· 3 in thread

Ahhh, Sloot's digital coding system [1] is finally here ;).

[1] https://en.m.wikipedia.org/wiki/Sloot_Digital_Coding_System

CamperBob22y ago

Guy named Borges already patented that, I'm afraid.

numlock862y ago

You just need an index and length within Pi's digits, duh.

didntcheck2y ago

IshKebab2y ago· 3 in thread

> It can be observed that our model outperforms them at low bitrates

It can? Maybe I'm misunderstanding the graphs but it doesn't look like it to me?

astrange2y ago

Graphs (especially PSNR) aren't a good way to judge video compression. It's better to just watch the video.

Many older/commercial video codecs optimized for PSNR, which results in the output being blurry and textureless because that's the best way to minimize rate for the same PSNR.

userbinator2y ago

Many older/commercial video codecs optimized for PSNR, which results in the output being blurry and textureless because that's the best way to minimize rate for the same PSNR.

Even with that, showing H.265 having lower PSNR than H.264 is odd --- it's the former which has often looked blurrier to me.

1 more reply

ThisIsMyAltAcct2y ago

Someone should train a model to evaluate video compression quality

2 more replies

holoduke2y ago· 3 in thread

HarHarVeryFunny2y ago

Well, as a reality check, even the soundtrack of a 1hr movie would be 50x floppy size (~50MB vs 1MB) if MP3 compressed.

I guess where this sort of generative video "compression" is headed is that the video would be the prompt, and you'd need a 100GB decoder (model) to render it.

fasa992y ago

djmips2y ago

A deep thought.

resolutebat2y ago· 3 in thread

Here's the research behind this: https://arxiv.org/html/2402.08934v1

https://www.theverge.com/2013/8/6/4594482/xerox-copiers-rand...

lifthrasiir2y ago

> [S]ome may recall the Xerox debacle about 10 years, when it turned out copiers were helpfully "optimizing" images by replacing digits with others in invoices, architectural drawings, etc.

daemonologist2y ago

sdenton42y ago

Furthermore, it's pretty common in compression research to focus on the size/quality trade-off, and leave optimization of compute for real-world implementations.

zaptrem2y ago· 3 in thread

Can you share example videos?

yonixw2y ago

Googling gave me the article: https://www.arxiv.org/abs/2402.08934

Which have examples in it.

resolutebat2y ago

Direct link to HTML article: https://arxiv.org/html/2402.08934v1

Unfortunately it only contains still images with teeny thumbnails: https://arxiv.org/html/2402.08934v1/x2.png

az2262y ago

Images are hard to evaluate the quality of a video compression. Because it’s diffusing, will it have a bunch of diffuse-jitter.

userbinator2y ago· 2 in thread

How fast is this and how big is the decoder/encoder? The model weights are not accessible.

From the description, it looks like it's only being tested with 128x128 frames, which implies that the speed is very low.

newaccount7g2y ago

Why would you expect those kind of details in a paid commercial?

resolutebat2y ago

It's a link to a Github repo, not a "paid commercial".

sbalamurugan2y ago· 2 in thread

It’s uncanny how much of the current stuff has been predicted by the sitcom -“Silicon Valley”

iwontberude2y ago

Yeah curious to hear what the Weissman score of this latest algorithm is going to be.

xxs2y ago

that prize goes to zstandard, though.

LeoPanthera2y ago· 2 in thread

It's important to remember that any compression gains must include the size of the decompressor which, I assume, will include an enormous diffusion model.

ec1096852y ago

Can’t that be amortized across all videos (e.g. if YouTube had a decompressor they downloaded once)?

LeoPanthera2y ago

smerik2y ago

Does anyone remember the https://en.wikipedia.org/wiki/Sloot_Digital_Coding_System?

hulitu2y ago

> Extreme video compression with prediction using pre-trainded diffusion models

Is this more extreme than youtube ?

mjevans2y ago

I wonder how effective a speed focused variation could be for quality among 264, 265, and AV1.

hoseja2y ago

Middle-out.

j / k navigate · click thread line to collapse