Let us know what other content channels you'd like to receive as Podcasts and we'll get on it.
Read more about our learnings here → https://wondercraft.ai/blog/learnings-from-1-month-of-ai-pod...
"Uh. Generally even tech insiders don't say this kind of thing anymore...it makes life even harder for..."
"Male majority though, oh and give them a tech voice too"
"Hold on, we may actually wish to study your brain, to see how it responds to stimuli from this century as compared to the last"
Using LLMS:
- formulate the right prompts for the intro and outro generation
- pass the content of a post in segments while maintaining history, as if you do in one go you will exceed token limit
- figure out how to integrate comments properly
- turn the summary into spoken format, not condensed written
Using TTS: - train the right voice, one that fits the content. Not all voices of a TTS engine have the same characteristics.
- understand the bugs of the TTS engine. For example Elevenlabs that we're using (and its beyond amazing overall and the team fantastic), is struggling when given this "$2.5". It will read it out "dollar 2(long pause) 5".
- a few more things
Overall:
- Figure out how to connect all of the different segments, music intros, outros etc
on the one hand i like it. she sounds like a real podcast host and person with a nice, professional voice.
on the other hand it's weird. like why does it need to do that other than to pretend to be a person.
Actually, some think the radical opposite.
Music is put everywhere "in some territories", but some people refuse radically to try and focus on content while other stimula are present. Some people find it distracting and senseless.
(In fact, some of us consider the use of ML to remove it from content that bewilderingly decided that you should "be helped to feel" during the fruition of intellectual material like documentaries.)
then there are the absolute best dramas that do not need music to have an emotional impact and on the flip side the absolute worst documentaries that do to be of interest
And some people who prefer music OR no music, depending on the moment and mindset, are ... The same person!
I honestly don't recall being this amazed at technology in quite a while. This is the future
I wonder if you could try different voices for different comments to make it seem like a conversation ;-)
I want this on my pocket to help me navigate the HUGE amounts of information we have nowadays. I don't even care that I don't get the exact subset of that information that I would personally highlight if I were to sift through all that hits my inbox and screen on any given day--I am happy to outsource that to the model even if it's only 80% accurate
You said the cost is about $2 an episode, is that mostly for the summarization or audio generation?
Excited to see competition to drive the cost down here.
I’m under the impression their margins are crazy.
I did not use music but did orchestrate multi host show. Based on chat gpt, eleven lab and a bit of ffmpeg scripting.
It was posted a few days ago and got 0 comments. https://news.ycombinator.com/item?id=35751065
I have a similar project idea in mind but for a different audience.
Purpose: learning a new language (kid and beginner friendly)
Idea:
- Take a reliable publisher of news stories in target language. (either text or audio)
- Grab top three headlines daily.
- Translate the headlines into English.
- Create an audio with headlines in both languages alternately.
This will help listeners connect current affairs and the words / grammar used to describe them in sentences of target language .. and learn those concepts better. Likely hood of encountering newer words and ideas that dont feel forced.
I am hoping to find some guidance or sample projects that I can adapt for this use case. Either myself or work along with my kids as a hobby project for the summer.
If any one can point me to interesting open-source projects like these or OPs, I'd appreciate.
Here is the news podcast in target language (Kannada) that gave me this idea -- it is published 2-3 times a day currently: https://www.prajavani.net/podcast
Having tried this kind of text to speech a few times in the past I’m impressed.
I think having specific subreddits would be a great option.
More importantly though I’d like to be able to just specify my own url list and have it generate the recap of those.
More generally this feels like the missing puzzle piece in personalized voice Q&A service
An alternative is the semi-personalisation (e.g. HN, or subreddits), where the content might not be hyper personal, but still close enough, and the cost is absorbed by community.
Some people are even against being posted on Hackernews in the first place.
And I don't know if all commentators agree if their comments are used.
An alternative is the semi-personalisation (e.g. HN, or subreddits), where the content might not be hyper personal, but still close enough, and the cost is absorbed by community.
Looking forward to see you in _Google Podcasts_ soon.
:P
This is fantastic work, keep up the great job.
What was the technical effort to create such a podcast?
The technical work is split in three pieces: 1) LLM prompt and chaining for script generation 2) workarounds for some TTs bugs (not that many, elevenlabs is amazing) 3) Stitching all moving pieces together
Nothing advanced for tech, but requires a lot of experimentation as neither LLMs nor TTS are deterministic which create headaches for prod.
Otherwise: wow.
Going to delay the idea and try out windercraft, this has blown my mind!