True. When I first read the article, the first use case that popped into my head was "I need to go to work, I'll request the audio version of this article and listen during my commute". I was totally wrong about your intended use case, but other people will probably make the same assumption. I'd suggest looking into machine-learning-based TTS; it might give you a faster turnaround with similar sound quality.