I mean vision in the most general sense, not just a GUI.
Imagine OpenAI can not only read the inflection in your voice, but also nuances in your facial expressions and how you're using your hands to understand your state of mind.
And instead of merely responding as an audio stream, a real-time avatar.