I think it's possible AI models will generate dynamic UI per client and stream that UI down (maybe eventually client devices will generate their UI on the fly), similar to Google Stadia. Maybe some offshoot of video streaming that lets the remote client control it. Maybe Wasm-based: just stream Wasm bytecode around? The guy behind VLC is building a library for ultra-low-latency streaming: https://www.kyber.video/techology.
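As a rough sketch of the "stream Wasm bytecode around" idea: a client that receives raw Wasm bytes can compile and run them on the fly. Here the bytes are hard-coded (a minimal module exporting `add`) rather than actually fetched from a server, just to show the instantiate-at-runtime step in Node.js:

```javascript
// Pretend these bytes just arrived over the wire from a server.
// They encode a minimal Wasm module exporting add(a, b) -> a + b.
const wasmBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // magic + version
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00,                               // one function, type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export it as "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body: local.get 0; local.get 1; i32.add
]);

// Compile and instantiate the freshly received bytecode, then call into it.
WebAssembly.instantiate(wasmBytes).then(({ instance }) => {
  console.log(instance.exports.add(2, 3)); // prints 5
});
```

In a browser you'd use `WebAssembly.instantiateStreaming(fetch(url))`, which compiles the module while the bytes are still arriving over the network, which is roughly the shape the streamed-UI idea would take.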