If this doesn’t seem obvious in 1yr I’d say the hypothesis is likely wrong.
I don't think voice is a good interface. "It chats like a human" is the lowest-hanging fruit in terms of product design, and it bets everything on the smarts of the tech behind it.
We are so used to tooling that is faster than voice. Keyboards and taps are very, very fast. I want digital assistants as smart AND as fast as that, not something smart but incredibly slow to interact with because it needs to dumb itself down to human speech I/O.
To me, this is also not about modality or making it more generic. I just don't want an anthropomorphized smart-ass assistant. I want smart tools that actually assist me directly, no chat.
People hate call centers. I don't want to have a conversation with some human or human-parity AI at my airline, I want to change my flight in 3 clicks. In retail and fast food, self-checkout and online or kiosk ordering are popular for a reason.
I've had those same feelings when forced to have conversations with a chatbot. Chatting with ChatGPT just for fun can be fun, but it can be just as painful as a call center if you need to get something specific done. To cancel a hotel reservation, I had to chat with some bot. It turned it into a whole conversation, with brief pauses. It should have been 3 clicks.
LLMs can plausibly solve both.