BTW This is the best sci-fi book ever.
Traditional codecs have always focused on trade-offs among encode complexity, decode complexity, and latency, where "complexity" means compute. If every target device ran a 4090 at full power, we could go far below 22 kbps with traditional codec techniques for content like this. 22 kbps isn't particularly impressive given these compute constraints.
This is my field, and trust me, we (MPEG committees, AOM) look at "AI"-based models, including GANs, constantly. They don't yet look promising compared to traditional methods.
Oh, and benchmarking against a video compression standard that's over twenty years old doesn't do much for the plausibility of these methods either.
Learned video codecs definitely do look promising: Microsoft's DCVC-FM (https://github.com/microsoft/DCVC) beats H.267 in BD-rate. Another benefit of the learned approach is the ability to run on soon-to-be-commodity NPUs, without requiring special hardware accommodations.
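For anyone who hasn't run into BD-rate before: it's the Bjøntegaard delta, the standard way codec comparisons report average bitrate change at equal quality. Here's a rough sketch of the classic calculation (cubic fit of log-rate over PSNR, integrated over the overlapping quality range); the function name and any sample curves are made up for illustration, not taken from DCVC:

```python
import numpy as np

def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average % bitrate change of test vs. anchor at equal PSNR
    (classic Bjontegaard method: cubic fit of log10(rate) over PSNR)."""
    p_a = np.polyfit(psnr_anchor, np.log10(rate_anchor), 3)
    p_t = np.polyfit(psnr_test, np.log10(rate_test), 3)
    # Integrate only over the PSNR range both curves cover.
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a = np.polyval(np.polyint(p_a), hi) - np.polyval(np.polyint(p_a), lo)
    int_t = np.polyval(np.polyint(p_t), hi) - np.polyval(np.polyint(p_t), lo)
    avg_log_diff = (int_t - int_a) / (hi - lo)
    return (10 ** avg_log_diff - 1) * 100  # negative = test saves bitrate
```

A codec identical to the anchor gives 0%, and one needing double the bitrate at every quality point gives +100%, which is a handy sanity check.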
In the CLIC challenge, hybrid codecs (traditional + learned components) have been the best so far, which has been a letdown for pure end-to-end learned codecs, agreed. But something like H.267 isn't cheap to run at the moment either.
Agreed, hybrid presents a real opportunity.
Someone was just having fun here, it's not as if they present it as a general codec.
It just means that a person can't readily distinguish between the compressed image and the uncompressed image. Usually because it takes some aspect(s) of the human visual system into account.
[1] https://scholar.google.com/scholar?hl=en&as_sdt=0%2C22&q=per...
As an example, crf=18 in libx264 is considered “perceptually lossless” for most video content.
For the record, I found LivePortrait to be well within the uncanny valley. It looks great for AI-generated avatars, but the difference is very perceptually noticeable on familiar faces. Still, it's great.
"no perceived loss" is a perfectly internally consistent and sensible concept and is actually orthogonal to whether it's actually lossless or lossy.
For instance an actually lossless block of data could be perceptually lossy if displayed the wrong way.
In fact, even "actually lossless" data is in a sense always lossy, and only ever "perceptually lossless." There is no such thing as truly lossless, because anything digital is only a lossy approximation of something analog: there is loss at both the ADC and the DAC stages.
If you want to criticize a term for being nonsensical, misleading, dishonest bullshit, then I guess "lossless" is that term, since it never existed and never can.
In that scenario it certainly would not be `transparent`, i.e. visually free of lossy artifacts. But your perception of it would still be lossless.
The future is going to be weird.
"As a rule, strong feelings about issues do not emerge from deep understanding." -Sloman and Fernbach
No doubt encoders and the codecs themselves have improved vastly since then. It would be interesting to see if I could tell the difference in a double-blind test today.
It's easy enough to specify an average person looking very closely, or a 99th percentile person, or something like that, and show the statistics backing it up.
This is interesting tech, and the considerations in the introduction are particularly noteworthy. I never considered the possibility of animating 2D avatars with no 3D pipeline at all.
> On a spectrum of model architectures, it achieves higher compression efficiency at the cost of model complexity.

Indeed, the full LivePortrait model has 130 million parameters compared to DCVC's 20 million. While that's tiny compared to LLMs, it currently requires an Nvidia RTX 4090 to run in real time (beyond parameter count, a big culprit is the expensive warping operations). That means deploying to edge runtimes such as the Apple Neural Engine is still quite a ways off.
It’s very cool that this is possible, but the compression use case is indeed a bit far-fetched. An insanely large model requiring the most expensive consumer GPU on both ends, while at the same time being so bandwidth-limited (22 kbps), is a _very_ narrow scenario.
However, it does raise an interesting property: if you are on the spectrum or have ADHD, you only need one headshot of yourself staring directly at the camera, and then the capture software can stop you from looking at your taskbar or off into space.
I don't know. I think you'd be surprised.
That's already kind of an issue with vloggers. Often they're looking just left or right of the camera at a monitor or something.
Reminds me of the video chat in Metal Gear Solid 1 https://youtu.be/59ialBNj4lE?t=21
If you could reserve a small portion of the radio bandwidth to broadcast a thumbnail + low bandwidth compressed representation of the face movements, you could technically have something similar without encoding any video (think low res, eye + mouth movements).
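A minimal sketch of what such a side channel could look like. Everything here is invented for illustration (the marker count, the signed 8-bit delta layout, the function names); it's just to show how little framing such a stream would need:

```python
import struct

MARKERS = 52  # hypothetical count, in the ballpark of common facial landmark sets

def pack_frame(deltas):
    """Pack per-marker (dx, dy) screen-space deltas into signed bytes:
    2 bytes per marker, 104 bytes per frame."""
    assert len(deltas) == MARKERS
    return b"".join(struct.pack("bb", dx, dy) for dx, dy in deltas)

def unpack_frame(payload):
    """Inverse of pack_frame: recover the list of (dx, dy) tuples."""
    return [struct.unpack_from("bb", payload, 2 * i) for i in range(MARKERS)]

# 104 bytes/frame * 8 bits * 24 fps ~= 20 kbps, before any entropy coding,
# which would shrink it further since most deltas are near zero.
```

The receiver would warp the cached thumbnail by these deltas, so no video frames ever cross the wire.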
Maybe there is a custom web filter in there somewhere that could block particular people and images of them.
Does anyone else remember the weirder (for lack of a better term) features of MPEG-4 Part 2, like face and body animation? It did something like that, but as far as I know almost no one ever used the feature.
https://en.wikipedia.org/wiki/Face_Animation_Parameter
and in the worst case, trust on the internet will be heavily undermined
...as long as the model doesn't include data to put a shoe on one's head.
Lossiness definitely matters when you’re doing forensics. But not for consumers.
If you just want to bop to Taylor who the fuck cares. The iPod ended that argument. Yes I can be a perfectionist, or I can have one thousand songs in my pocket. That was more than half of your collection for many people at the time.
24 fps * 52 facial 3D markers * 16-bit packed planar-projected delta offsets (x, y; 8 bits per axis) = 19.968 kbps
And this is done in Unreal games on a potato graphics card all the time:
https://apps.apple.com/us/app/live-link-face/id1495370836
I am sure calling modern heuristics "AI" gets people excited, but it doesn't seem "Magical" when trivial implementations are functionally equivalent. =3
- Arthur C. Clarke
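The bit-budget arithmetic above checks out as long as the 16 bits cover the packed (x, y) pair, i.e. 8 bits per axis. A quick sanity check:

```python
fps = 24
markers = 52
bits_per_marker = 16  # one packed (x, y) delta pair, 8 bits per axis

bitrate_bps = fps * markers * bits_per_marker
print(bitrate_bps)         # 19968
print(bitrate_bps / 1000)  # 19.968 kbps
```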