When I did hands-free coding, I named my variables so that I could say them as words. So you'd say 'copy file-num-one file-num-two' or something, rather than spelling it out letter by letter. I actually ended up giving things more verbose names, because I didn't have to type them all out. So it might be:
enunciating: 'copy snake-geary-street-financial-report snake-divisadero-street-financial-report'
versus typing: 'cp gearyStreetFinancialReport divisaderoStreetFinancialReport'
If you're trying to exactly replicate something designed (and named) for text input, you're absolutely right, but I thought we were talking about hypothetical designed-for-voice systems.
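
To make the spoken form above a bit more concrete, here's a rough sketch of how a designed-for-voice layer might turn an utterance into a command line. Everything in it is my own assumption for illustration, not any real voice-coding tool's grammar: the idea that a leading 'snake' or 'camel' word picks the identifier style, the hyphens standing in for word boundaries within one spoken phrase, and the 'copy' to 'cp' mapping.

```python
# Hypothetical sketch: expand a spoken utterance into a typed command line.
# The formatter words ("snake", "camel") and the command table are assumptions
# made up for this example, not taken from an actual voice-coding system.

FORMATTERS = {
    # "snake-geary-street" -> "geary_street"
    "snake": lambda words: "_".join(words),
    # "camel-geary-street" -> "gearyStreet"
    "camel": lambda words: words[0] + "".join(w.capitalize() for w in words[1:]),
}

# Spoken command words mapped to what you would actually type.
COMMANDS = {"copy": "cp"}


def expand_token(token):
    """Expand one spoken phrase: a formatter prefix plus words, or a command word."""
    head, *rest = token.split("-")
    if head in FORMATTERS and rest:
        return FORMATTERS[head](rest)
    # Not a formatted identifier: translate known commands, pass the rest through.
    return COMMANDS.get(token, token)


def expand_utterance(utterance):
    """Expand a whole space-separated utterance into a command line."""
    return " ".join(expand_token(t) for t in utterance.split())


if __name__ == "__main__":
    print(expand_utterance(
        "copy snake-geary-street-financial-report "
        "snake-divisadero-street-financial-report"
    ))
```

With these made-up rules, the 'snake' prefix gives underscore-separated names; saying 'camel' instead would give the camelCase spelling from the typed example. Either way the long name costs a few spoken words rather than thirty-odd keystrokes.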