Amazing work. Having listened to a bit of the HN podcast, I'm impressed by the natural-sounding pronunciation of technical terms with non-obvious phonetics, like 'postgres'. Have you had to tweak a lot of these manually to get them sounding so good, or is your model mostly getting them right?
Very cool. As you accumulate lots of these tweaks, I feel like you'll also have an opportunity to backdoor your way into a great text-to-speech API product, if you have any interest in going that direction. It seems like the main challenge there is ironing out all the edge cases, and you've created an excellent feedback loop for doing that.