Better HN

KeplerBoy · 1y ago

Don't end-to-end trained models already do this to some extent? For example, raising the pitch toward a question mark, like a human would.

TortoiseTTS has a few examples under prompt engineering on their demo site: https://nonint.com/static/tortoise_v2_examples.html



micw · 1y ago

That's rather basic and hit-or-miss. Some models do have the features you describe; with the better ones, text in quotes gets a slightly different voice.

But what sets good audiobooks apart is that they have:

* different voices for the narrator and each character
* different emotions and/or speed in certain situations

I guess you could use an LLM to "understand" an existing book and annotate it in some markup format, then feed the annotated text to TTS to create the audiobook, automating most of the process.
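Roughly what the TTS side of that could look like, as a minimal sketch: it assumes a made-up markup where the LLM emits one segment per line as `[Speaker|emotion] text`. The voice names and the emotion-to-speed table are hypothetical placeholders, and the actual synthesis call is left out because it depends entirely on which TTS engine you use.

```python
import re
from dataclasses import dataclass

# Hypothetical markup: "[Speaker|emotion] text", one segment per line.
LINE_RE = re.compile(r"^\[(?P<speaker>[^|\]]+)\|(?P<emotion>[^\]]+)\]\s*(?P<text>.*)$")

# Hypothetical speaking-rate multipliers per emotion; unknown emotions fall back to 1.0.
EMOTION_RATE = {"neutral": 1.0, "excited": 1.15, "annoyed": 1.1, "sad": 0.85}


@dataclass
class Segment:
    speaker: str
    emotion: str
    text: str


def parse_annotated(text: str) -> list[Segment]:
    """Split annotated text into segments; unannotated lines become neutral narration."""
    segments = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = LINE_RE.match(line)
        if m:
            segments.append(Segment(m["speaker"].strip(), m["emotion"].strip().lower(), m["text"]))
        else:
            segments.append(Segment("Narrator", "neutral", line))
    return segments


def cast_voices(segments: list[Segment], voice_pool: list[str]) -> dict[str, str]:
    """Give each distinct speaker a voice in order of first appearance, cycling the pool."""
    casting: dict[str, str] = {}
    for seg in segments:
        if seg.speaker not in casting:
            casting[seg.speaker] = voice_pool[len(casting) % len(voice_pool)]
    return casting


def build_tts_jobs(segments: list[Segment], voice_pool: list[str]) -> list[dict]:
    """Turn segments into per-line jobs a (hypothetical) TTS call would consume."""
    casting = cast_voices(segments, voice_pool)
    return [
        {"voice": casting[s.speaker], "rate": EMOTION_RATE.get(s.emotion, 1.0), "text": s.text}
        for s in segments
    ]


if __name__ == "__main__":
    sample = """[Narrator|neutral] The wizard stopped at the door.
[Wizard|amused] What do you mean?
[Host|excited] Good morning!"""
    for job in build_tts_jobs(parse_annotated(sample), ["voice_a", "voice_b", "voice_c"]):
        print(job)
```

Casting voices in order of first appearance keeps each character's voice stable across the whole book, which is the main thing that distinguishes a narrated audiobook from plain TTS.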

micw · 1y ago

Edit: I actually tried this. I gave ChatGPT the following prompt:

"Annotate the following text with speakers and emotions so that it can be turned into an audiobook via TTS", followed by a short passage from "The Hobbit" (the "Good morning" scene). The result is very good.
