1. How do you check the output of the voice-to-code step? If you still need as much expertise as you do now to review the code, then the voice-to-code step is just a layer that adds confusion.
2. How would debugging work? Again, would you still need to be able to understand the code? Same issue.
3. What if you have to pause and think? This will affect how the voice-to-code interface interprets your speech.
4. How would you make a precise edit to your source audio using a voice interface?
5. How would you make changes which touch multiple components across the project? How would you coordinate this?
6. Precisely defining interfaces between components and referring correctly to specific symbols is very difficult in natural speech, which typically relies on context to resolve ambiguous references. The language you speak would still have to be as strict as a programming language. Meanwhile, you have replaced a reliable, checkable channel (keyboard input, transferred as-is to a text buffer, with feedback from a visual view of the source) with an unreliable one (microphone input, passed through complex signal processing and multiple neural network language models, via multiple intermediate representations). To get feedback about the structure of your program, you would have to check each of those representations: first the speech-to-text output, then the text-to-source output.
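
To make point 6 concrete, here is a minimal sketch of the symbol-reference problem. It assumes speech does not reliably carry capitalization, underscores, or word boundaries; the symbol names and both helper functions are hypothetical and chosen only for illustration.

```python
def spoken_form(identifier: str) -> str:
    """Approximate what survives when an identifier is read aloud.

    Assumption: case and underscores are not audible, so they are dropped.
    """
    return identifier.replace("_", "").lower()


def candidates(utterance: str, symbols: list[str]) -> list[str]:
    """Return every symbol that sounds the same as the utterance."""
    # Word boundaries in speech are also unreliable, so drop the spaces.
    target = "".join(utterance.lower().split())
    return [s for s in symbols if spoken_form(s) == target]


# Hypothetical symbols that could plausibly coexist in one project.
symbols = [
    "get_user_name",
    "getUserName",
    "get_username",
    "GetUserName",
    "set_user_name",
]

matches = candidates("get user name", symbols)
print(matches)
```

Under these assumptions, one spoken phrase matches four distinct symbols. A keyboard resolves this for free, because the typed characters are the reference; a voice interface has to guess from context or ask you to spell it out.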