Instead of writing code with exacting parameters, future developers will write human-language descriptions for AI to interpret and convert into a machine representation of the intent. Certainly revolutionary, but not true AGI in the sense of the machine having truly independent agency and consciousness.
In ten years, I expect the primary interface of desktop workstations, mobile phones, etc. will be voice prompts to an AI. Keyboards will become a power-user interface, used only for highly technical tasks, similar to the way terminal interfaces are currently used to access lower-level systems.
And I was like, "But that's not a complete replacement, right? What about the times when you don't want to broadcast what you're writing to the entire room?"
And then there was a big reveal that AI has mastered lip-reading, so even then, people would just put their lips up to the camera and mouth out what they wanted to write.
With that said, as the owner of tyrannyofthemouse.com, I agree with the importance of the keyboard as a UI device.
I think the more surprising thing is that people don't use voice to access deeply nested features, like adding items to calendars, which would otherwise take a lot of fiddly app navigation.
I think the main reason we don't have that is that Apple's Siri is so useless it has singlehandedly held back this entire flow, and there's no way for anyone else to get a foothold in the smartphone market.
When you have a nice mic or headset, multiple monitors, and your own private space, it's totally the next step to just begin working with the computer by voice. Voice has not been a staple feature of people's workflow, but I think all that is about to change. (Voice as an interface, that is, not as a communication tool; the latter has been around since 1876.)
But-- that means "not pivotal any more, just hugely important."
Wow, I've always felt the keyboard is the pinnacle of input devices. Everything else feels like a toy in comparison.
That aside, the keyboard is an excellent input device for humans specifically because it is very much designed around the strengths of our biology - those dexterous fingers.
That said, voice is the original social interface for humans. We learn to speak much earlier than we learn to read/write.
Better voice UIs will be built to make new workflows with AI feel natural. I'm thinking along the lines of a conversational companion, like the "Jarvis" AI in the Iron Man movies.
That doesn't exist right now, but it seems inevitable that real-time, voice-directed AI agent interfaces will be perfected in the coming years. Companies like [Eleven Labs](https://elevenlabs.io/) are already working on the building blocks.
I'm sure it helps that it's not straying outside well-established facts, and that it's being asked for facts rather than novel design tasks.
I'm not sure, but it also seems to adopt a more intimate tone of voice as they get deeper into a topic; very cozy. The voice itself is tuned to the conversational context. It probably infers that this is kid stuff, too.
I wonder if we'll have smart-lens glasses where our eyes 'type' much faster than we could possibly talk. Predictive-text keyboards that track eyeballs already exist. I wonder if AI and smart glasses are a natural combo for a future form factor. Meta seems to be leaning that way with their Ray-Ban collaboration and rumors of adding a screen to the lenses.
The problem with voice input, to me, is mainly knowing when to start processing. When humans listen, we stream and process the words constantly and wait either until we detect that the other person expects a response (just enough of a pause, or a questioning tone) or, as an exception, until we feel we have justification to interrupt (e.g., "Oh yeah, Jane already briefed me on the Johnson project").
Even talking to ChatGPT, which embarrasses those old voice bots, I find it is still very bad at guessing when I'm done if I'm speaking casually, and once it has responded with nonsense based on half a sentence, I feel the context is polluted and I probably need to clear it and repeat myself. I'd rather just type.
I think there's not much need to stream the spoken tokens into the model in real time given that it can think so fast. I'd rather it just listen, have a specialized model simply try to determine when I'm done, then clean up and abridge my utterance (for instance, when I correct myself), and THEN have the real LLM process the cleaned-up query.
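To make that concrete, here's a minimal sketch of that flow, assuming the three pieces exist as separate components. The endpoint detector, transcript cleaner, and `answer()` call are heuristic stand-ins made up for illustration, not any real speech or LLM API:

```python
# Sketch: buffer the transcript as it streams in, let a cheap endpoint check
# decide when the speaker is done, tidy up the utterance, and only then call
# the big model. All "models" below are simple placeholder heuristics.

import re

class EndpointDetector:
    """Stand-in for a small model that guesses whether the speaker is done.
    Here: done if the partial transcript ends with a terminal marker."""
    def is_turn_complete(self, utterance: str) -> bool:
        return utterance.rstrip().endswith(("?", ".", "<pause>"))

class TranscriptCleaner:
    """Stand-in for a model that abridges the raw utterance, e.g. applying
    self-corrections like 'Tuesday, no wait, Wednesday' -> 'Wednesday'."""
    def clean(self, utterance: str) -> str:
        utterance = re.sub(r"\S+, no wait, ", "", utterance)
        return utterance.replace("<pause>", "").strip()

def answer(query: str) -> str:
    # Placeholder for the real LLM call.
    return f"(LLM answers: {query!r})"

class VoiceTurnPipeline:
    def __init__(self) -> None:
        self.endpoint = EndpointDetector()
        self.cleaner = TranscriptCleaner()
        self.buffer: list[str] = []

    def on_partial_transcript(self, chunk: str) -> str | None:
        """Called each time the recognizer emits more words; replies only
        once the endpoint detector judges the turn complete."""
        self.buffer.append(chunk)
        utterance = " ".join(self.buffer)
        if not self.endpoint.is_turn_complete(utterance):
            return None  # keep listening; don't hit the big model yet
        self.buffer.clear()
        return answer(self.cleaner.clean(utterance))

# The reply only arrives after the pause, and on the cleaned-up query.
pipe = VoiceTurnPipeline()
print(pipe.on_partial_transcript("move my dentist appointment to Tuesday, no wait,"))  # None
print(pipe.on_partial_transcript("Wednesday at 3 <pause>"))
```

The point of the split is that the cheap "am I done?" check runs on every partial transcript, while the expensive model only ever sees one cleaned-up query per turn.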
I doubt it. The keyboard and mouse are fit predators, and so are programming, query, and markup languages. I wouldn't dismiss them so easily. This guy has a point: https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...
Perhaps a brain interface, or even better, something so predictive it just knows what I want most of the time. Imagine that: grunting and getting what I want.
Oh, I know! Let's call it... "requirements management"!
Dijkstra has more thoughts on this:
https://www.cs.utexas.edu/~EWD/transcriptions/EWD06xx/EWD667...
A BCI able to capture sufficient nuance to equal voice is probably further out than the lifespan of anyone commenting here.