What, you overshoot by 50%? :-) But more seriously, the job that has been done is simply low-quality, even disregarding the choppy edit style. That is probably a mixture of shoddy work (judging by comparison with some of the others) and the creator not being capable of better as regards the speech; most aren’t particularly aware of how to improve. Much better is readily possible.
> Also, it will become almost literally impossible to hear the difference in the near future
I am an excellent reader, highly regarded for my diction and prosody and for conveying the sense of a matter. So far, I haven’t heard a TTS demo, hand-picked or otherwise, that would stand a chance against a skilled reader: the veriest dunce would savvy which was which within a couple of sentences. (And at least a couple of the demos were deliberately trying to address these sorts of shortcomings in inflection and such.) Certainly they’ve reached the stage where you often can’t reliably tell the computer from the mediocre reader (and there are an awful lot of those). But I don’t think we’ll see computers beating skilled readers and speakers any time soon, considering how poor a job AI is still doing on coherent longform writing, and then adding the degrees of nuance and information conveyed in speech. (Supplant them? Sure. But you don’t have to be better than something to supplant it, just cheaper, or some such thing. My favourite example of this is book binding, where hot melt glue is vastly inferior to cold glue, but it’s much faster and thus cheaper to produce with, and it’s good enough that you’ll struggle to find cold glue ever used in production now.)