The best project I've found so far in terms of output quality is https://github.com/neonbjb/tortoise-tts, but A) it requires an enormous amount of GPU horsepower, and B) even with that, a single autoregressive sample whose text input is shorter than a tweet takes 5-10 minutes of compute time to produce.
So my question is: what projects have you all come across that strike that balance between good-enough quality and fast time to result?