
wondercraft · 3y ago

You're right that it's a script, but it has some intricacies that take a lot of testing to get right. Examples:

Using LLMs:

- formulate the right prompts for the intro and outro generation

- pass the content of a post in segments while maintaining history, because if you send it all in one go you will exceed the token limit

- figure out how to integrate comments properly

- turn the summary into a spoken format rather than condensed written prose
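To make the segment-plus-history point concrete, here is a minimal sketch, not our actual pipeline. It assumes a character budget as a rough stand-in for a token limit, and `summarize` is a hypothetical callable wrapping whatever LLM you use; it receives the summaries so far plus the next segment.

```python
from typing import Callable, List


def split_into_segments(text: str, max_chars: int) -> List[str]:
    """Group paragraphs into segments of at most max_chars characters.

    max_chars is a crude proxy for a token budget (an assumption).
    A single paragraph longer than max_chars is kept whole, not split.
    """
    segments: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            segments.append(current)
            current = para
    if current:
        segments.append(current)
    return segments


def summarize_in_segments(
    text: str,
    summarize: Callable[[str, str], str],  # hypothetical: (history, segment) -> summary
    max_chars: int = 2000,
) -> List[str]:
    """Summarize each segment, passing earlier summaries along as history."""
    parts: List[str] = []
    for segment in split_into_segments(text, max_chars):
        history = " ".join(parts)
        parts.append(summarize(history, segment))
    return parts
```

Passing the running summaries instead of the raw earlier text keeps each call small, at the cost of losing some detail from earlier segments.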

Using TTS:

- train the right voice, one that fits the content. Not all voices of a TTS engine have the same characteristics.

- understand the bugs of the TTS engine. For example, ElevenLabs, which we're using (and it's beyond amazing overall and the team is fantastic), struggles when given this: "$2.5". It will read it out as "dollar 2 (long pause) 5".

- a few more things
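One way to work around the "$2.5" issue is to rewrite dollar amounts before the text reaches the TTS engine. This is a rough sketch and an assumption on my part, not something the engine documents: the function names are made up, and whether "2 point 5 dollars" actually reads better has to be checked by ear against the voice you use.

```python
import re

# Matches "$2.5", "$1,200", "$2.5 million", etc.
_DOLLARS = re.compile(
    r"\$(\d+(?:,\d{3})*)(?:\.(\d+))?(\s+(?:thousand|million|billion|trillion))?",
    re.IGNORECASE,
)


def _spoken(match: re.Match) -> str:
    whole = match.group(1).replace(",", "")
    fraction = match.group(2)
    magnitude = match.group(3) or ""
    if fraction is None:
        number = whole
    else:
        # Read decimals digit by digit so no "." is left for the engine to pause on.
        number = f"{whole} point {' '.join(fraction)}"
    unit = "dollar" if number == "1" and not magnitude else "dollars"
    return f"{number}{magnitude} {unit}"


def normalize_dollars(text: str) -> str:
    """Rewrite dollar amounts into a form that avoids the '$' and '.' characters."""
    return _DOLLARS.sub(_spoken, text)


print(normalize_dollars("They raised $2.5 million at $2.5 a share."))
```

This does not handle cents nicely ("$2.50" becomes "2 point 5 0 dollars"), so a real normalizer would need more cases.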

Overall:

- figure out how to connect all of the different segments, music intros, outros, etc.
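For the stitching step, the simplest possible version is plain concatenation. The sketch below uses Python's standard `wave` module and assumes every clip (intro music, speech segments, outro) is already a WAV file with the same channel count, sample width, and sample rate. `concat_wavs` is a hypothetical helper; crossfades, volume levelling, and resampling are left out.

```python
import wave
from typing import List, Optional, Sequence, Tuple


def concat_wavs(paths: Sequence[str], out_path: str) -> int:
    """Append WAV files in order into out_path; return the number of frames written."""
    if not paths:
        raise ValueError("need at least one input file")

    fmt: Optional[Tuple[int, int, int]] = None
    chunks: List[bytes] = []
    for path in paths:
        with wave.open(path, "rb") as src:
            this_fmt = (src.getnchannels(), src.getsampwidth(), src.getframerate())
            if fmt is None:
                fmt = this_fmt
            elif this_fmt != fmt:
                raise ValueError(f"{path} has format {this_fmt}, expected {fmt}")
            chunks.append(src.readframes(src.getnframes()))

    channels, sample_width, frame_rate = fmt
    with wave.open(out_path, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(frame_rate)
        for chunk in chunks:
            out.writeframes(chunk)

    return sum(len(chunk) for chunk in chunks) // (channels * sample_width)
```

Reading everything into memory first keeps the sketch short; for long episodes you would stream the frames instead.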


2 comments · 2 top-level

m00dy · 3y ago

It is still child's play, and more importantly, having such a low barrier to entry is scary as hell.

jasonjmcghee · 3y ago

Oh man, these edge cases are frustrating.

I ran into “it’s a 50…………50 chance”; apparently it reads a hyphen as a long pause too.

I’m bullish on being able to give cues that are not read aloud, like Bark is doing. Their audio quality isn’t quite as polished as ElevenLabs, but it’s convincing / uncanny valley in other ways: laughs, throat clears, stutters, pauses.
