BTW This is the best sci-fi book ever.
Traditional codecs have always focused on trade-offs among encode complexity, decode complexity, and latency, where "complexity" means compute. If every target device ran a 4090 at full power, we could go far below 22 kbps with traditional codec techniques for content like this. 22 kbps isn't particularly impressive given these compute constraints.
This is my field, and trust me, we (MPEG committees, AOM) look at "AI"-based models, including GANs, constantly. They don't yet look promising compared to traditional methods.
Oh, and benchmarking against a video compression standard that's over twenty years old doesn't do much for the plausibility of these methods either.
Learned video codecs definitely do look promising: Microsoft's DCVC-FM (https://github.com/microsoft/DCVC) beats H.267 in BD-rate. Another benefit of the learned approach is the ability to run on soon-to-be-commodity NPUs, without requiring special hardware accommodations.
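For anyone who hasn't run into BD-rate before: it's the Bjøntegaard delta, the standard way codec comparisons report average bitrate change at equal quality. Here's a rough sketch of the classic calculation (cubic fit of log-rate over PSNR, integrated over the overlapping quality range); the function name and any sample curves are made up for illustration, not taken from DCVC:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average % bitrate change of test vs. anchor at equal PSNR
    (classic Bjontegaard method: cubic fit of log10(rate) over PSNR)."""
    p_a = np.polyfit(psnr_anchor, np.log10(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log10(rate_test), 3)
    # Integrate only over the PSNR range both curves cover.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (10 ** avg_log_diff - 1) * 100  # negative = test saves bitrate
```

A codec identical to the anchor gives 0%, and one needing double the bitrate at every quality point gives +100%, which is a handy sanity check.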
In the CLIC challenge, hybrid codecs (traditional + learned components) have been the best so far, which has been a letdown for pure end-to-end learned codecs, agreed. But something like H.267 isn't cheap to run at the moment either.
Agreed, hybrid presents a real opportunity.
Someone was just having fun here, it's not as if they present it as a general codec.
It just means that a person can't readily distinguish between the compressed image and the uncompressed image. Usually because it takes some aspect(s) of the human visual system into account.
[1] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C22&q=per...
As an example, crf=18 in libx264 is considered “perceptually lossless” for most video content.
For the record, I found LivePortrait to be well within the uncanny valley. It looks great for AI-generated avatars, but the difference is very perceptually noticeable on familiar faces. Still, it's great.
"no perceived loss" is a perfectly internally consistent and sensible concept and is actually orthogonal to whether it's actually lossless or lossy.
For instance an actually lossless block of data could be perceptually lossy if displayed the wrong way.
In fact, even "actually lossless" data is in a sense always lossy, and only ever "perceptually lossless." There is no such thing as truly lossless, because anything digital is only a lossy approximation of something analog: there is loss at both the ADC and the DAC stages.
If you want to criticize a term for being nonsensical, misleading, dishonest bullshit, then I guess "lossless" is that term, since it never existed and never can.
In that scenario it certainly would not be `transparent`, i.e. visually free of lossy artifacts. But your perception of it would still be lossless.
The future is going to be weird.
"As a rule, strong feelings about issues do not emerge from deep understanding." -Sloman and Fernbach
No doubt encoders and the codecs themselves have improved vastly since then. It would be interesting to see if I could tell the difference in a double-blind test today.
It's easy enough to specify an average person looking very closely, or a 99th percentile person, or something like that, and show the statistics backing it up.
This is interesting tech, and the considerations in the introduction are particularly noteworthy. I never considered the possibility of animating 2D avatars with no 3D pipeline at all.
> On a spectrum of model architectures, it achieves higher compression efficiency at the cost of model complexity.

Indeed, the full LivePortrait model has 130 million parameters compared to DCVC's 20 million. While that's tiny compared to LLMs, it currently requires an Nvidia RTX 4090 to run in real time (beyond parameter count, a big culprit is the expensive warping operations). That means deploying to edge runtimes such as the Apple Neural Engine is still quite a ways off.
It’s very cool that this is possible, but the compression use case is indeed a bit far-fetched. An insanely large model requiring the most expensive consumer GPU on both ends, while at the same time being so bandwidth-limited (22 kbps), is a _very_ narrow scenario.
However, it does raise an interesting property: if you are on the spectrum or have ADHD, you only need one headshot of yourself staring directly at the camera, and then the capture software can stop you from looking at your taskbar or off into space.
I don't know. I think you'd be surprised.
That's already kind of an issue with vloggers. Often they're looking just left or right of the camera at a monitor or something.
Reminds me of the video chat in Metal Gear Solid 1 https://youtu.be/59ialBNj4lE?t=21
If you could reserve a small portion of the radio bandwidth to broadcast a thumbnail + low bandwidth compressed representation of the face movements, you could technically have something similar without encoding any video (think low res, eye + mouth movements).
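A minimal sketch of what such a side channel could look like. Everything here is invented for illustration (the marker count, the signed 8-bit delta layout, the function names); it's just to show how little framing such a stream would need:

```python
import struct

MARKERS = 52  # hypothetical count, in the ballpark of common facial landmark sets

def pack_frame(deltas):
    """Pack per-marker (dx, dy) screen-space deltas into signed bytes:
    2 bytes per marker, 104 bytes per frame."""
    assert len(deltas) == MARKERS
    return b"".join(struct.pack("bb", dx, dy) for dx, dy in deltas)

def unpack_frame(payload):
    """Inverse of pack_frame: recover the list of (dx, dy) tuples."""
    return [struct.unpack_from("bb", payload, 2 * i) for i in range(MARKERS)]

# 104 bytes/frame * 8 bits * 24 fps ~= 20 kbps, before any entropy coding,
# which would shrink it further since most deltas are near zero.
```

The receiver would warp the cached thumbnail by these deltas, so no video frames ever cross the wire.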
Maybe there is a custom web filter in there somewhere that could block particular people and images of them.
Does anyone else remember the weirder (for lack of a better term) features of MPEG-4 Part 2, like face and body animation? It did something like that, but as far as I know almost no one ever used the feature.
https://en.wikipedia.org/wiki/Face_Animation_Parameter
and in the worst case, trust on the internet will be heavily undermined
...as long as the model doesn't include data to put a shoe on one's head.
Lossiness definitely matters when you’re doing forensics. But not for consumers.
If you just want to bop to Taylor who the fuck cares. The iPod ended that argument. Yes I can be a perfectionist, or I can have one thousand songs in my pocket. That was more than half of your collection for many people at the time.
24 fps * 52 facial 3D markers * 16-bit packed planar-projected delta offsets (x, y; 8 bits per axis) = 19.968 kbps
And this is done in Unreal games on a potato graphics card all the time:
https://apps.apple.com/us/app/live-link-face/id1495370836
I am sure calling modern heuristics "AI" gets people excited, but it doesn't seem "Magical" when trivial implementations are functionally equivalent. =3
- Arthur C. Clarke
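The bit-budget arithmetic above checks out as long as the 16 bits cover the packed (x, y) pair, i.e. 8 bits per axis. A quick sanity check:

```python
fps = 24
markers = 52
bits_per_marker = 16  # one packed (x, y) delta pair, 8 bits per axis

bitrate_bps = fps * markers * bits_per_marker
print(bitrate_bps)         # 19968
print(bitrate_bps / 1000)  # 19.968 kbps
```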