You've laid out some good criteria though. I wouldn't say voice interfaces have really "made it" until they get to the point where you don't have to ask how to ask one to do something (discoverability). You just ask it to do something and it does it. Although that's just one of many criteria.
The food menu problem is interesting, but pretty much everything that prints out on a ticket in a kitchen is structured data, so it should be possible to conversationalize it efficiently (preference notwithstanding, of course). Certainly there are many ways you could talk to someone about a menu: what kinds of dishes are there? Appetizers, grilled entrees, pasta, salads, desserts. What kind of entrees? Vegetarian, pork, beef, seafood. OK, but what styles of cuisine? Jamaican, Italian, Szechuan. There's probably an analog to the 5 Whys for figuring out what someone wants to eat! Asking for yesterday's weather, though, is a specific case that could probably be solved by an intern, provided that the data is easy to find on the Internet (FWIW, I've searched for the very same thing many times and it's much harder to find than forecasts).
I concede that there will always be a need for graphical interfaces. How do you "speak" a map, or a CAD model? I guess I was just thinking of things that can be accomplished with a keyboard. You can speak anything you can type, even if it's as rudimentary as it is today, where you have to say "period newline newline" to end a sentence at the end of a paragraph while dictating.
I agree it might seem tough to multitask. But consider WiFi routers serving multiple computers, or hell, even CPUs serving different processes, "simultaneously." If voice recognition and NLP become sufficiently sophisticated, I could foresee a voice interface being able to isolate multiple overlapping voices in an audio sample. If not, consider that you could ask it to look something up, immediately followed by your wife dictating an email to send (or one of you could even interrupt the other), and it could handle the context switching and queuing at speed.
And I understand there's a lot I don't know, and I do remain skeptical that this could ever be perfected. Would it really be able to take dictation of poetry? Would the forms I create or creatively destroy in free verse just totally confuse the voice interface? Would it be smart enough to sidestep the confusion via some pseudo-meta-cognitive process and ask me what the hell I'm doing?