I don't think this will age well.
It's a matter of simple compute power to advance from realistic text/token prediction to realistic synthesis of things like human (or animal) body movement for all kinds of situations, including realistic facial expressions, body language, moods, and so on. And, of course, perfect voice synthesis. Couple that with good enough robotics and you can see where I'm going with this, and that's only because my imagination is limited to sci-fi movie tropes. I think this is going to be wilder than we can imagine, while still just copying training sets.