Show HN: The HN Recap – AI generated daily HN podcast

(hackernewsrecap.buzzsprout.com)

177 points · wondercraft · 3y ago · 88 comments

We've been running The HN Recap for a month to make it easier to consume Hacker News. While this was a PoC in understanding adoption for AI-generated podcasts, we now plan to keep this going, since lots of people are now listening to this daily.

Let us know what other content channels you'd like to receive as Podcasts and we'll get on it.

Read more about our learnings here → https://wondercraft.ai/blog/learnings-from-1-month-of-ai-pod...

cubefox3y ago· 6 in thread

Impressive. How about using a male voice? I think it would fit the character of HN since the vast majority here seem to be men. (Optimally it would be someone who sounds roughly like a tech person, if that makes sense, though I guess there isn't so much choice in voices.)

mcmcmc3y ago

Read your comment again and ask yourself why the tech sphere is often perceived as sexist

I don't think this is sexist, e.g. even as a man in some predominantly female community I would agree a female voice would be a somewhat better fit.

wondercraftOP3y ago

We have a selection of voices, we just thought that Anna fit this role very well. You can also clone your own voice if you'd like. https://app.wondercraft.ai/

themodelplumber3y ago

"Hi. One hardcore stereotype please."

"Uh. Generally even tech insiders don't say this kind of thing anymore...it makes life even harder for..."

"Male majority though, oh and give them a tech voice too"

"Hold on, we may actually wish to study your brain, to see how it responds to stimuli from this century as compared to the last"

I feel sorry for the mindset which would prompt such a response.

chipgap983y ago

Tech people can’t be women?

maxverse3y ago· 5 in thread

This is really impressive. Is anyone else really freaked out by that? I'm having an uncanny-valley type feeling, because the audio is 99.9% convincing, and the only thing that gives it away is inconsistent, not-always-correct pronunciation. But I'm not picking at the technical details. I'm freaked out by how good it is, and how easily it could pass for some person reading a human-written podcast. I know speech generation has been getting progressively better, but I guess I haven't heard it in a while (compare to the stock TikTok voice, for example.) Coupled with an LLM, this is too close for comfort.

wondercraftOP3y ago

Thanks! Yeah the quality of this audio was what made us automate this process. With LLMs doing the leg work on the content curation as well, it's fairly straightforward. We built a UI around it on https://app.wondercraft.ai/ if you wanna check it out. The TTS engine used is elevenlabs btw.

chaxor3y ago

It's good - but the even more crazy thing is that this can be done by a script kiddy in a few hours - not an expert who spends months or years trying to whittle at some part of the process as it was several years ago.

wondercraftOP3y ago

You're right that it's a script, but it does have some intricacies that require a lot of testing to get it right. Examples:

Using LLMs:

- formulate the right prompts for the intro and outro generation

- pass the content of a post in segments while maintaining history, since passing it all in one go will exceed the token limit

- figure out how to integrate comments properly

- turn the summary into spoken format, not condensed written

Using TTS:

- train the right voice, one that fits the content. Not all voices of a TTS engine have the same characteristics.

- understand the bugs of the TTS engine. For example, ElevenLabs, which we're using (and it's beyond amazing overall, and the team is fantastic), struggles when given "$2.5". It will read it out as "dollar 2 (long pause) 5".

- a few more things

Overall:

- Figure out how to connect all of the different segments, music intros, outros etc
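To make the "segments while maintaining history" point concrete, here is a minimal sketch of one way it could be done. It is not Wondercraft's actual code: the function names, the prompt wording, the character-based limit (a stand-in for a real token count) and the `llm` callable are all assumptions for illustration.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_chars: int) -> List[str]:
    """Split text on blank lines into chunks of at most max_chars characters.

    A paragraph that is longer than max_chars on its own is hard-split.
    """
    chunks: List[str] = []
    current = ""
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        # Hard-split paragraphs that cannot fit in a single chunk.
        while len(para) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(para[:max_chars])
            para = para[max_chars:]
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


def summarize_with_history(
    text: str, llm: Callable[[str], str], max_chars: int = 2000
) -> str:
    """Feed the post to the model one chunk at a time.

    The running summary is passed back in with every chunk, so the model
    keeps the history without ever seeing the whole post at once.
    `llm` is a hypothetical callable: prompt in, completion out.
    """
    summary = ""
    for chunk in split_into_chunks(text, max_chars):
        prompt = (
            "Summary so far:\n" + (summary or "(nothing yet)") + "\n\n"
            "New segment:\n" + chunk + "\n\n"
            "Update the summary so it covers the new segment. "
            "Write it as a script to be read aloud, not as condensed notes."
        )
        summary = llm(prompt)
    return summary
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM client, and makes it easy to test with a stub.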

skaushik923y ago

For me the intonation of words and the pauses between them seem quite off from natural speech as well.

williamstein3y ago

It sounds like unnatural "podcast speech".

goblinux3y ago· 5 in thread

is anyone else weirded out that it simulates breath? like this is a robot, not a real woman, but it pauses to take a breath to mimic human sound.

on the one hand i like it. she sounds like a real podcast host and person with a nice, professional voice.

on the other hand it's weird. like why does it need to do that other than to pretend to be a person.

beenHot4003y ago

Probably helps with engagement, as it's more natural.

themodelplumber3y ago

I agree...I really want to know if it could do the kind of verbal-keepalive feedback you hear in conversational Japanese, between two people. It'd be amazing to hear an AI group podcast like that, just for technical amazement purposes.

anigbrowl3y ago

Most people don't like listening to overtly artificial voices for anything long form. Breath pauses also give the listener time to chunk things.

wondercraftOP3y ago

if you think about how these models were trained it makes sense. When a human reads an audiobook, they would breathe. So the model has learned how to make the breathing sound.

LightBug13y ago

Yes. That's what I found most impressive. Incredible.

JanSt3y ago· 5 in thread

Which text-to-speech platform do you use? Sounds really good

moritonal3y ago

Going to take a guess it's their own, https://wondercraft.ai/.

gregsadetsky3y ago

In another comment [0], it's mentioned that the TTS engine is ElevenLabs' [1]

[0] https://news.ycombinator.com/item?id=35832886

[1] https://elevenlabs.io/

It's ElevenLabs [1], according to OP's earlier comments.

[1] https://beta.elevenlabs.io/

mariorojas3y ago

according to link's description it's Wondercraft https://app.wondercraft.ai/

wondercraftOP3y ago

Elevenlabs! Those guys are amazing!

mdp20213y ago· 3 in thread

> Music enhances the experience: Simply put, music makes everything better

Actually, some think the radical opposite.

Music is put everywhere "in some territories", but some people refuse radically to try and focus on content while other stimuli are present. Some people find it distracting and senseless.

(In fact, some of us are considering using ML to strip the music out of content whose makers bewilderingly decided that you should "be helped to feel" while consuming intellectual material like documentaries.)

if I’m watching something for the emotional impact and experience - i.e. drama - then music is absolutely welcome. if I’m watching it because I want to learn about something and make my own opinions - i.e. documentaries - I can’t stand music as I feel like I’m being emotionally manipulated

then there are the absolute best dramas that do not need music to have an emotional impact and on the flip side the absolute worst documentaries that do to be of interest

themodelplumber3y ago

> Some people find it distracting and senseless.

And some people who prefer music OR no music, depending on the moment and mindset, are ... The same person!

dmbche3y ago

Interesting! I'm working in video and am very interested in this. Would you have somewhere to point me towards to learn more?

airstrike3y ago· 3 in thread

This is so good! I'm amazed it's even possible for this to be done "so easily" (I'm sure it's a lot of work!)

I honestly don't recall being this amazed at technology in quite a while. This is the future

I wonder if you could try different voices for different comments to make it seem like a conversation ;-)

I want this in my pocket to help me navigate the HUGE amounts of information we have nowadays. I don't even care that I don't get the exact subset of that information that I would personally highlight if I were to sift through all that hits my inbox and screen on any given day--I am happy to outsource that to the model even if it's only 80% accurate

anigbrowl3y ago

I have very mixed feelings about this. I produce a podcast, and a large (1hr) episode with multiple interviews, raw audio sources, etc. can involve 6-10 hours of editing work. Seeing it generated at the push of a button is technically impressive (and in line with predictions I've made here about the automatability of such things) but also demotivates me from doing my own editing labor, as I can see my own decades' worth of audio editing skills becoming obsolete and economically unsustainable.

wondercraftOP3y ago

Hey! I understand your point but I think as with every technology there are two perspectives: 1. You get demotivated by thinking that the machine will replace you 2. You use all of your experience as a springboard and leverage this technology to accelerate your day to day. We already have 3 podcast studios as customers, and they love it as they can create a new podcast in no time, when before it took a while. Message us at (team AT wondercraft.ai) if you want to know more.

wondercraftOP3y ago

Thanks for the nice words! The conversation style pods still need a little bit of work... the interaction between the voices is a little bit unnatural.

aschobel3y ago· 2 in thread

I’m blown away by the audio quality, well done. It’s nice that it also recaps the comments.

You said the cost is about $2 an episode, is that mostly for the summarization or audio generation?

jasonjmcghee3y ago

Unless they are doing something unreasonable, i have to believe it’s audio. ElevenLabs is $0.3/1000 characters. The summarization cost shouldn’t come close.

Excited to see competition to drive the cost down here.

I’m under the impression their margins are crazy.

wondercraftOP3y ago

Yes, it's 90% the audio generation. It's about to get a lot cheaper though.
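For a rough sense of scale, the figures quoted in this thread (about $2 an episode, about 90% of it audio, $0.3 per 1000 characters) can be turned into an approximate script length. This is back-of-the-envelope arithmetic only: the function name is made up for illustration, and the real pricing tier may differ from the quoted rate.

```python
def chars_for_budget(
    episode_cost: float, audio_share: float, price_per_1000_chars: float
) -> int:
    """Approximate number of characters the audio share of the budget buys."""
    audio_cost = episode_cost * audio_share
    return round(audio_cost / price_per_1000_chars * 1000)


# Figures as quoted in the thread; all of them are approximate.
EPISODE_COST = 2.00          # USD per episode
AUDIO_SHARE = 0.90           # fraction spent on audio generation
PRICE_PER_1000_CHARS = 0.30  # USD

script_chars = chars_for_budget(EPISODE_COST, AUDIO_SHARE, PRICE_PER_1000_CHARS)
```

With those inputs the audio budget is about $1.80, which works out to roughly 6,000 characters of script per episode.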

olup3y ago· 2 in thread

Wow I literally posted the same thing a month ago - check out https://radio-hn.pages.dev

I did not use music but did orchestrate a multi-host show. Based on ChatGPT, ElevenLabs and a bit of ffmpeg scripting.
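For anyone wondering what "a bit of ffmpeg scripting" might look like, here is a minimal sketch using ffmpeg's concat demuxer. It is a guess at the general approach, not the script used for either podcast: the file names and helper names are hypothetical, and `-c copy` only works when every segment shares the same codec and parameters (otherwise the audio has to be re-encoded).

```python
from typing import List, Sequence


def concat_list_text(segments: Sequence[str]) -> str:
    """Build the list file the concat demuxer reads: one `file '...'` per line."""
    lines = []
    for path in segments:
        # Inside single quotes, a literal quote is written as '\''.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_file: str, output: str) -> List[str]:
    """Argument list for joining the listed segments without re-encoding."""
    return [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # allow arbitrary paths in the list file
        "-i", list_file,
        "-c", "copy",     # stream copy, no re-encode
        output,
    ]


# Hypothetical segment names, one per host turn.
playlist = concat_list_text(["intro.mp3", "host_a_01.mp3", "host_b_01.mp3"])
command = concat_command("segments.txt", "episode.mp3")
```

Writing `playlist` to `segments.txt` and passing `command` to `subprocess.run` would produce the joined episode, assuming ffmpeg is installed.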

There's also this! https://camrobjones.com/hackercast/

It was posted a few days ago and got 0 comments. https://news.ycombinator.com/item?id=35751065

I think people are blown away by the quality of the audio narration of this one, not by the idea or content itself. AWS Polly sounds like the current generation of artificial voice we are used to.

switz3y ago· 2 in thread

will the podcast break when it covers itself tomorrow?

wondercraftOP3y ago

(* laughs in robot *)

lol

dsco3y ago· 2 in thread

I made something similar, https://odysseysplace.buzzsprout.com/ - there’s an episode where an AI interviews an AI too. I thoroughly enjoy the concept of your podcast though - it's found a niche!

stockholm3y ago

That’s one of the standard voices from ElevenLabs right?

dsco3y ago

Yeah I just tuned it a bit. The other voice it interviews is custom made though.

albert_e3y ago· 1 in thread

Wonderful!

I have a similar project idea in mind but for a different audience.

Purpose: learning a new language (kid and beginner friendly)

Idea:

- Take a reliable publisher of news stories in target language. (either text or audio)

- Grab top three headlines daily.

- Translate the headlines into English.

- Create an audio with headlines in both languages alternately.

This will help listeners connect current affairs with the words / grammar used to describe them in sentences of the target language .. and learn those concepts better. There is a good likelihood of encountering newer words and ideas that don't feel forced.
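The alternating-headlines step could be sketched like this. It is only an outline under assumptions: the function name, the language labels and the `translate` callable are placeholders, and fetching the headlines and generating the audio are left out entirely.

```python
from typing import Callable, List, Sequence, Tuple


def build_bilingual_script(
    headlines: Sequence[str],
    translate: Callable[[str], str],
    limit: int = 3,
) -> List[Tuple[str, str]]:
    """Return (language, text) pairs that alternate target language and English.

    `translate` is a hypothetical callable that maps a headline in the
    target language to English; any translation service could sit behind it.
    """
    script: List[Tuple[str, str]] = []
    for headline in headlines[:limit]:
        script.append(("target", headline))
        script.append(("english", translate(headline)))
    return script
```

Each pair could then be sent to a TTS voice for the matching language and the resulting clips joined in order.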

I am hoping to find some guidance or sample projects that I can adapt for this use case. Either myself or work along with my kids as a hobby project for the summer.

If anyone can point me to interesting open-source projects like these or OP's, I'd appreciate it.

Here is the news podcast in target language (Kannada) that gave me this idea -- it is published 2-3 times a day currently: https://www.prajavani.net/podcast

wondercraftOP3y ago

Yeah a few different people doing so currently on the platform. Give it a go, see if it works for you: https://app.wondercraft.ai/ and ping us if you need help

AndrewKemendo3y ago· 1 in thread

I sat in my listening room attempting to find the flaws in the audio and I was left wanting

Having tried this kind of text to speech a few times in the past I’m impressed.

I think having specific subreddits would be a great option.

More importantly though I’d like to be able to just specify my own url list and have it generate the recap of those.

More generally this feels like the missing puzzle piece in personalized voice Q&A service

wondercraftOP3y ago

(same answer to another similar comment) Yeah the hyperpersonalised content is super interesting. For 10' of this audio, compute would cost about $1. So at $1 per daily episode, a realistic price a business would charge for this is $49 a month. Would you pay that? Maybe you could opt in for ads, that would pay that $1 for your attention. After all, the advertiser would know exactly your interests, based on your twitter feed.

An alternative is the semi-personalisation (e.g. HN, or subreddits), where the content might not be hyper personal, but still close enough, and the cost is absorbed by community.

croes3y ago· 1 in thread

Isn't there a risk of copyright violation because you use content from third parties only linked on Hackernews?

Some people are even against being posted on Hackernews in the first place.

And I don't know if all commenters agree to their comments being used.

Maybe in the EU at least? Google News has to license the content it uses from news sources. IANAL but I can't imagine any commenters here would have any licensing rights to their publicly posted comments.

localhost3y ago· 1 in thread

I'd love to have something that could create daily summaries of private Twitter lists that I use to track developments in AI. The audio part would be great to consume during my re-emerging commute to the office. I would imagine this would be a pretty cool thing to have for any information worker as well. The difference is that this would be for an audience of 1.

wondercraftOP3y ago

Yeah the hyperpersonalised content is super interesting. For 10' of this audio, compute would cost about $1. So at $1 per daily episode, a realistic price a business would charge for this is $49 a month. Would you pay that? Maybe you could opt in for ads, that would pay that $1 for your attention. After all, the advertiser would know exactly your interests, based on your twitter feed.

An alternative is the semi-personalisation (e.g. HN, or subreddits), where the content might not be hyper personal, but still close enough, and the cost is absorbed by community.

ceedan3y ago· 1 in thread

I think I would love to have something like this generated from data in github pull requests, closed JIRA tickets, and confluence pages. I would very very willingly spend 15 minutes a day listening and learning about progress on different projects across my organization

outime3y ago

Reminds me of a very recent neat feature from Slack [1] that among other things summarizes the unread channel messages.

[1] https://slack.com/blog/news/introducing-slack-gpt

magdyks3y ago· 1 in thread

A big fan of this project! I would be really interested in seeing if you can select a different voice or select the length you would like

wondercraftOP3y ago

Thanks! Yeah there's a selection of different voices and you can even clone your own. Length is really up to you, no restriction!

vishnuharidas3y ago· 1 in thread

This is really outstanding. Especially the voice - it sounds real human (or is it a real human?)

Looking forward to seeing you in _Google Podcasts_ soon.

wondercraftOP3y ago

Just enabled!

zvolsky3y ago· 1 in thread

Well executed! I had a similar idea in mind, except doing an independent summary of what went on in the parliament. The data is available at https://parliamentlive.tv . Subtitles are available but the AI would need to remember speaker names and voices to know who said what.

wondercraftOP3y ago

Unless there is a transcript that comes with it? If you have one, try creating a summary for it at https://app.wondercraft.ai/

chaxor3y ago· 1 in thread

The AI pronounced `sudo` as "su-dough" instead of "su-doo". Literally unwatchable

:P

This is fantastic work, keep up the great job.

wondercraftOP3y ago

hehe thanks!!

Zetice3y ago· 1 in thread

I know your focus is on the podcast/audio side of things (and you nailed this, very very cool), but it would be interesting to me to see how you generated the summaries themselves, if that was available, or at least a brief summary of the code, to understand what it took to get to that point.

wondercraftOP3y ago

Hey! Get in contact (team AT wondercraft.ai), happy to chat!

wayeq3y ago· 1 in thread

It might be fun to add a few different podcasts for the same news day that present the material in different tones, like "Witty", "Dry", and "Sarcastic". I imagine it would just be adding prompt info for the LLM generating the text.

wondercraftOP3y ago

Might need to play with a few different voices as well, to match the tone of the language. But yeah totally plausible. Different languages as well, easily done!

mariorojas3y ago· 1 in thread

such a great idea! I wonder how a podcast with two voices sounds, it would be interesting to hear the AI interaction

wondercraftOP3y ago

Thanks Mario. It gets a bit more challenging when two voices are interacting to be honest.. You can test it out at https://app.wondercraft.ai/ if you want :)

bravura3y ago· 1 in thread

wondercraft, just FYI, when you oauth in the app name is some random firebase app URL. Keep up the good work though!

wondercraftOP3y ago

Yup, thanks! Need to get to that.

kiru_io3y ago· 1 in thread

This is pretty cool! Thank you for sharing.

What was the technical effort to create such a podcast?

wondercraftOP3y ago

Thanks for the nice words!

The technical work is split into three pieces: 1) LLM prompting and chaining for script generation 2) workarounds for some TTS bugs (not that many, ElevenLabs is amazing) 3) stitching all the moving pieces together

Nothing advanced tech-wise, but it requires a lot of experimentation, as neither LLMs nor TTS are deterministic, which creates headaches for prod.
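As an illustration of the kind of TTS workaround meant in 2), the "$2.5" problem mentioned elsewhere in this thread can be sidestepped by rewriting dollar amounts into words before the text reaches the engine. This is a minimal sketch, not our production code: the function name and the exact spoken wording are assumptions, and it ignores other currencies and suffixes like "million".

```python
import re

# "$" followed by digits (optionally with thousands separators) and an
# optional decimal part, e.g. "$2.5", "$1,200", "$3".
_DOLLAR_AMOUNT = re.compile(r"\$(\d+(?:,\d{3})*)(?:\.(\d+))?")


def spell_out_dollars(text: str) -> str:
    """Rewrite dollar amounts so the TTS engine never sees "$" or a bare "."."""

    def replace(match: "re.Match[str]") -> str:
        whole = match.group(1).replace(",", "")
        fraction = match.group(2)
        if fraction is None:
            unit = "dollar" if whole == "1" else "dollars"
            return f"{whole} {unit}"
        # Read decimals digit by digit: "2.5" becomes "2 point 5".
        return f"{whole} point {' '.join(fraction)} dollars"

    return _DOLLAR_AMOUNT.sub(replace, text)
```

Running the script text through a normalisation pass like this before synthesis keeps the workaround in one place instead of scattering it through the prompts.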

sabujp3y ago· 1 in thread

I love it, thanks for 2x voice speed up

wondercraftOP3y ago

Thank you!

MoSattler3y ago· 1 in thread

really good stuff!

wondercraftOP3y ago

Thanks!!

tpmx3y ago

Listening to the May 4th ep: The (amount of) dead air between segments feels a bit off. Feels related to the timing of the muzak. Maybe get some experienced podcast producer/audio engineer/whatever to design the transitions for you?

Otherwise: wow.

hxugufjfjf3y ago

Does this mean we can now get good-sounding audiobooks of pretty much any book available in digital format in the near future?

kyriakosel3y ago

I'm shocked by how realistic it sounds. Can we embed it in our website/blog ?

r0fl3y ago

Wow this is amazing! I was going to launch a similar podcast for a completely different audience this Friday. I was going to use a multi step approach to create the episodes.

Going to delay the idea and try out Wondercraft, this has blown my mind!

nullsense3y ago

The background music was just so distracting I couldn't focus at all on what was being said and had to turn it off almost immediately. Granted I do have ADHD, so YMMV.

the intro music, it has a circular feel to it. who can tell me about this? why is it used when it's used?

impressive but kind of an onslaught of words
