Except for the large majority of people who read, type, and click way faster than they can talk. Especially for visual things it’s way faster to drag a rectangle than to describe what you want.
A lot of us also aren’t linear verbal thinkers. It would take minutes to hours to verbalize concepts we can grasp visually/schematically in seconds.
Great book on the topic: https://www.goodreads.com/book/show/60149558-visual-thinking
I can usually convey the same meaning by typing at 80wpm. Makes it faster to read too.
Maybe I’m just slightly ADHD, but listening to people talk drives me crazy. Get to the point! It's much easier if they type it out.
Also, I doubt DeepMind is designing for existing programmers and savvy computer users. They are thinking about the other billions of people in the world. Speech is the skill people will already have, not typing.
Neither typing speed nor dictation speed is a true bottleneck, but editing speech seems like it'd be harder than editing text.
Though there may be some hybrid approach that can work well.
You don't have to think about the design of your app. You just say what you want and the AI makes it appear. If you don't like something, you tell the AI to change it. You iterate live until you get the final result you want.
This is what writing docs has become for me. I have the agent make a draft, then tell it which sections to rewrite, combine, etc. I tell it the ideas I forgot to include. I manually make certain word choice changes. The question is how you extend this flow to scenarios that aren't pure text. For most people, just talking about what you see is probably the easiest.
I hadn’t realized until just now how accurate that is for me as well. Thank you.