The only problem with treating this as the next level of abstraction is that it actually leads to more "coding", because ordinary spoken language is less information-dense than any programming language.
Previously, each switch, from binary to assembly to higher-level languages to frameworks/libraries, generally reduced the amount of "code" being written; with voice programming it seems to be the opposite.