In summary, rather than actually streaming video to the person you're chatting with, you send a keyframe, and from then on only the 'compressed' video goes over the wire, to be 'decompressed' at the receiving end.
I'm putting 'compression' in quotation marks because I'm not sure I'm comfortable calling it that. Basically, you're remotely controlling an avatar of yourself.
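To make the "remote-controlled avatar" idea a bit more concrete, here's a toy Python sketch. It is not the real system: I'm assuming the per-frame data is a handful of (x, y) key points, and `encode_keypoints`, `decode_keypoints` and `Receiver` are names I made up for illustration. The neural network that would actually redraw the face is replaced by a placeholder.

```python
import struct
from typing import List, Optional, Tuple

# Hypothetical format: one (x, y) pair per key point. The real model's
# per-frame payload is not described here, so treat this as an assumption.
Keypoint = Tuple[float, float]


def encode_keypoints(keypoints: List[Keypoint]) -> bytes:
    """Pack key points as little-endian 32-bit floats (8 bytes per point)."""
    flat = [coord for point in keypoints for coord in point]
    return struct.pack(f"<{len(flat)}f", *flat)


def decode_keypoints(payload: bytes) -> List[Keypoint]:
    """Inverse of encode_keypoints."""
    flat = struct.unpack(f"<{len(payload) // 4}f", payload)
    return list(zip(flat[0::2], flat[1::2]))


class Receiver:
    """Stores the keyframe, then 'redraws' it from each key-point packet."""

    def __init__(self) -> None:
        self.keyframe: Optional[bytes] = None

    def on_keyframe(self, image: bytes) -> None:
        # The full image crosses the wire here, and only here.
        self.keyframe = image

    def on_keypoints(self, payload: bytes) -> dict:
        if self.keyframe is None:
            raise RuntimeError("key points arrived before the keyframe")
        # Placeholder for the renderer: a real receiver would synthesise a
        # new frame by animating the keyframe to match these key points.
        return {"keyframe": self.keyframe, "pose": decode_keypoints(payload)}
```

With this made-up format, ten key points come to 80 bytes per frame, and the image itself never has to be sent again after the first packet.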
The obvious use for this is reducing bandwidth (in their example, an H.264 stream at ~100KB/frame can be compressed to 0.1KB/frame, literally a thousandth of the bandwidth), but it also opens up some VERY interesting possibilities for a company like Meta (check from about 1:55 onwards in the video below).
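For a sense of scale, here's the arithmetic on those two per-frame figures as a quick Python sketch. The 30 frames per second is my own assumption (the example only gives per-frame sizes), and it ignores the one-off cost of sending the keyframe.

```python
H264_KB_PER_FRAME = 100.0     # per-frame figure from the example above
KEYPOINT_KB_PER_FRAME = 0.1   # per-frame figure from the example above
FPS = 30                      # assumption: frame rate is not given in the example


def kilobytes_per_second(kb_per_frame: float, fps: int = FPS) -> float:
    """Bandwidth needed to sustain a stream at the given frame size."""
    return kb_per_frame * fps


ratio = H264_KB_PER_FRAME / KEYPOINT_KB_PER_FRAME
h264_rate = kilobytes_per_second(H264_KB_PER_FRAME)
keypoint_rate = kilobytes_per_second(KEYPOINT_KB_PER_FRAME)

print(f"H.264:      {h264_rate:.0f} KB/s")
print(f"Key points: {keypoint_rate:.0f} KB/s")
print(f"Reduction:  {ratio:.0f}x")
```

At the assumed 30 fps that works out to roughly 3,000 KB/s against roughly 3 KB/s, which is where the "thousandth of the bandwidth" comes from.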
You can view someone's face from any angle (as you might in a VR world), not just the angle they're speaking from, or you can even map the key points onto a completely different keyframe, allowing for hyper-realistic avatars or next-level virtual backgrounds (imagine: you send a keyframe of you sitting at your desk and hop on a video conference from the beach, and no-one's any the wiser as long as the sea is quiet enough).