Weird A.I. Yankovic: a cursed deep dive into the world of voice cloning

(waxy.org)

328 points · waxpancake · 2y ago · 198 comments

104 comments · 14 top-level

minimaxir2y ago· 28 in thread

This article only covers the musical aspects of AI voice cloning, but there's another dynamic to AI voice cloning that's more complicated: replacing general voice actors in movies/video games/anime (example: https://www.axios.com/2023/07/24/ai-voice-actors-victoria-at... )

Unlike musicians who can't be replaced without significant postprocessing, have enough money to not be impacted by competition, and have legal muscle, voice over artists:

- Can be reproduced with good-enough results from out-of-the-box voice cloning settings on ElevenLabs or an open source equivalent (Bark, VALL-E X)

- Are already underpaid for their work as-is

- Have no legal ownership of their voice since they are contractors, and their voicework is owned by their clients who may not be as incentivised in protecting the VO.

I want to write a blog post about it but I suspect most people on Hacker News won't be interested in a treatise on the cultural impacts of the voicework in Persona 5 and Genshin Impact.

sumtechguy2y ago

What I find interesting is that eventually these companies will hire some college kids who need a couple thousand bucks and a free pizza, have them read the right scripts and sign the right 'give everything away' contract, and just use their voices forever. Or do it sneakily: have a voice assistant and put 'we can use a copy of your voice for anything' in your ToS.

The existing voice actors will be just out of work. There will be a small cadre of groups that want real voice. But for some projects that will not be that important.

It's going to get crazy.

Legend24402y ago

They don't need that - they already have enough data to generate plausibly human voices that don't sound like anyone in particular.

Voice cloning is a special case, these models are equally good at making new voices.

HappyDaoDude2y ago

I have said this will initially be sold as a feature on things like Audiobooks.

Pick your book, pick your reader and away it goes. The Diary of Anne Frank read by Gilbert Gottfried.

minimaxir2y ago

Recent voice models by OpenAI, Meta, and ElevenLabs all state upfront they work with paid professional voice actors, so this space will get interesting fast.

hiccuphippo2y ago

Mozilla has a voice data project where people already do it for free(dom) ;)

https://commonvoice.mozilla.org/en

supriyo-biswas2y ago

HN isn't the only community to write for. While most people here seem to be unsympathetic to such job concerns, unconventional articles do hit the front page from time to time.

I'd like to read it, in any case.

pixl972y ago

The get-rich-at-any-cost types like to post on these articles at a higher rate, I think. When you read a larger and broader range of HN posts, you see that a substantial part of the population here has concerns about this.

rcarr2y ago

+1, I would also like to read it

ImprobableTruth2y ago

Voices are uncopyrightable, but impersonation isn't legal (see Midler v. Ford, for a notable case), so I don't think the situation is totally clear.

deepsun2y ago

> voice actors are fearing that the ability for generative AI to replicate their voices may cost them work

I'm not sure how to feel about that. I'm against the idea that some people "deserve" to be paid for being lucky enough to be born with an interesting voice.

On the other hand, the world has always worked like that. A hard-working farmer or doctor, say, was also lucky to be born with the traits needed to make a living, while others weren't.

sofixa2y ago

It's always funny to me when people cite old American case law and try to wrangle their heads around how that can apply to a situation which the case's participants couldn't have possibly imagined. Shouldn't the correct way to do this be new legislation being created after consulting interest groups to answer the modern problems which exist due to modern realities, like what the EU is doing? It seems much more sensible of an approach instead of wondering how a 15th century ruling's ruler would have applied his thinking about something they couldn't even dream of.

lazide2y ago

As long as they don’t claim the voice is the original actor (misspell the name perhaps, or the Hollywood classic ‘based on’), they won’t be impersonating, no?

GuB-422y ago

Interesting note: many Vocaloids (most notably Hatsune Miku) are sampled from voice actors rather than singers.

Singers didn't want software clones, but voice actors are fair game.

sublinear2y ago

I have a different take on this.

AI voice is cheaper, but it's also a more boring and generic performance. There is zero progress made towards any sort of creative AI that produces good unique work.

The market for this then is small businesses who can't afford a professional voice actor. AI is opening up new markets, not killing the jobs of the truly talented.

chefandy2y ago

This is the case for all generative "art." The people at the high end will still get paid well. The people who specialize in more utilitarian or low budget tasks in higher volume will take the biggest hit. Nobody who'd planned on hiring Morgan Freeman to do a voice over will be tempted to use AI Morgan Freeman instead.

aussieguy12342y ago

The MVP might have the free "good enough" AI voiceover, it takes less money to bootstrap a new product that way.

The real product would have a real voice over actor paid for with VC money.

matteoraso2y ago

>There is zero progress made towards any sort of creative AI that produces good unique work.

It's only been a year. Give it some time and I'm sure AI will have much better results. Right now, you can get some of that unique work by finetuning the AI off of a person's existing portfolio.

zerojames2y ago

I am interested! You should write about what you find interesting; never worry if it will interest a particular group.

foobarian2y ago

It saddens me because of how much impact they had on my family as we played through the story line in Genshin and immersed in the world. At some point we met a few of the voice actors at a convention and they were like stars to us, while I'm sure their circumstances are as you describe.

raytopia2y ago

I'd be interested.

Most likely you'd see a lot of people saying that somehow getting rid of voice actors is good for "progress". Whatever that means.

Random aside: someone really needs to make a Hacker News that focuses more on game development and other arts, so blog posts like the one you're talking about would have a proper community to discuss them with.

Legend24402y ago

Replacing voice actors with text-to-speech is good because it lets you do things voice actors can't:

* Create dynamic new voice lines at runtime, for example game characters reacting to new situations.

* Operate at a scale that's infeasible for humans, for example turning every ebook into an audiobook.

dylan6042y ago

> and their voicework is owned by their clients who may not be as incentivised in protecting the VO.

The work product produced by their voice for fulfilling the contract is owned. No corp owns someone else's voice.

Jeff_Brown2y ago

Property is a bundle of rights, and often hard to pin down. In the case of voices, if a company owns enough of your data to train a good simulacrum, and they have the right to do it, then they kind of do own your voice -- or more precisely, a damn good substitute.

minimaxir2y ago

They don't own the voice, but they own the vocal performance, which ends up being a meaningless legal distinction in practice.

It's one reason why VAs rarely take fan requests for a character they voice.

rockemsockem2y ago

No one owns a voice at the moment. There is no mechanism in the US to own a voice, even your own.

aaroninsf2y ago

<raises hand> I am

EGreg2y ago

Please do. Some of us critique capitalism

rcarr2y ago

It's sad if the only way voice actors are going to be able to make a living is by doing stuff like Critical Role on Youtube. I love Critical Role but it likely wouldn't be the same if those guys hadn't spent years honing their craft. Watching people play RPGs online has replaced a lot of my streaming viewing now, but the market is much smaller and I imagine it can only sustain a much smaller pool of creatives than the current voice over market can.

mecredis2y ago· 17 in thread

It's kind of wild that these tools just transfer a copy of these models every time they're spun up (whether it's to a Google Colab notebook or a local machine.)

This must mean Hugging Face's bandwidth bill must be crazy, or am I missing something (maybe they have a peering agreement? heavily caching things?)

satertek2y ago

Their Python module caches the downloads, which is checked before downloading them again...but you're probably not wrong on the crazy bandwidth bill. Looks like they have crazy VC money though, considering the current climate.
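
For anyone wondering what "check the cache before downloading" looks like, here is a minimal sketch of the general pattern. It is not how the Hugging Face module is actually implemented; `fetch_cached`, the URL-hash naming scheme, and the stand-in `fake_download` function are all made up for illustration.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def fetch_cached(url: str, cache_dir: Path, download: Callable[[str], bytes]) -> Path:
    """Return a local path for `url`, calling `download` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Key the cache entry on a hash of the URL so any URL maps to a safe filename.
    target = cache_dir / hashlib.sha256(url.encode("utf-8")).hexdigest()
    if not target.exists():
        target.write_bytes(download(url))
    return target


# Stand-in downloader (hypothetical) that records each call, so no network is needed.
calls = []


def fake_download(url: str) -> bytes:
    calls.append(url)
    return b"model weights"


cache = Path(tempfile.mkdtemp())
first = fetch_cached("https://example.invalid/model.bin", cache, fake_download)
second = fetch_cached("https://example.invalid/model.bin", cache, fake_download)
# The second call is served from disk, so `calls` holds a single entry.
```

A cache like this only helps while the directory survives between runs, so it saves nothing on a machine that starts from an empty disk every time.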

minimaxir2y ago

The Colab notebooks are a fresh and independent session with no caching.

civilitty2y ago

Unmetered 10+ gigabit connections were on the order of $1/Mbit/mo wholesale over a decade ago when I priced out a custom CDN, so for the cost of 100 TB of data transfer out of AWS you could get a 24/7 sustained 10 Gbit/s (>3 PB per month at 100% utilization).

Bandwidth has always been crazy cheap.
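
The arithmetic behind the ">3 PB per month" figure can be checked with a quick sketch. It assumes a 30-day month and decimal units (1 PB = 10**15 bytes); the function names are mine, and the $1/Mbit/mo price is just the number quoted above.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumes a 30-day month


def monthly_transfer_pb(gbit_per_s: float, utilization: float = 1.0) -> float:
    """Petabytes (10**15 bytes) moved in a month at a sustained line rate."""
    bytes_per_s = gbit_per_s * 1e9 / 8
    return bytes_per_s * SECONDS_PER_MONTH * utilization / 1e15


def monthly_cost_usd(gbit_per_s: float, usd_per_mbit_month: float) -> float:
    """Flat-rate cost of a link priced per Mbit/s per month."""
    return gbit_per_s * 1000 * usd_per_mbit_month


full_pipe_pb = monthly_transfer_pb(10)   # about 3.24 PB at 100% utilization
pipe_cost = monthly_cost_usd(10, 1.0)    # 10,000 USD per month at $1/Mbit/mo
usd_per_tb = pipe_cost / (full_pipe_pb * 1000)
```

Under those assumptions a fully used pipe works out to roughly three dollars per terabyte, and less utilization raises the effective price proportionally.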

hotnfresh2y ago

Not all connections are created equal. Even some big providers clearly have iffy peering agreements upstream that’ll manifest as terrible performance if you have a widely-geographically-distributed bandwidth-heavy load.

colechristensen2y ago

Indeed. If you're not using a cloud provider bandwidth is extremely cheap.

In fact locally I can get a 10 gbps home internet unmetered connection for $300/mo.

I'm not sure how they'd react if I transferred 1 PB/mo though :)

morkalork2y ago

If you host copies of your data with a few big providers could you do something smart like detect and redirect requests from AWS to an S3 bucket and not pay for bandwidth leaving the provider?
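
Roughly what I have in mind, as a sketch only: look up the client's address in a per-provider list of networks and hand back the matching mirror. The networks below are documentation-only placeholder ranges, not real cloud ranges (AWS does publish its real ones as ip-ranges.json), and the mirror hostnames are hypothetical.

```python
import ipaddress

# Placeholder networks per provider. These are RFC 5737 documentation ranges,
# NOT real cloud ranges; a real deployment would load each provider's published list.
PROVIDER_NETWORKS = {
    "aws": [ipaddress.ip_network("192.0.2.0/24")],
    "gcp": [ipaddress.ip_network("198.51.100.0/24")],
}

# Hypothetical mirror hosts, one per provider, plus a default origin.
MIRRORS = {
    "aws": "https://models-aws.example.invalid",
    "gcp": "https://models-gcp.example.invalid",
}
DEFAULT_ORIGIN = "https://models.example.invalid"


def pick_origin(client_ip: str) -> str:
    """Return the mirror inside the client's own cloud, or the default origin."""
    addr = ipaddress.ip_address(client_ip)
    for provider, networks in PROVIDER_NETWORKS.items():
        if any(addr in net for net in networks):
            return MIRRORS[provider]
    return DEFAULT_ORIGIN


inside_aws = pick_origin("192.0.2.17")
elsewhere = pick_origin("203.0.113.5")
```

Whether this actually avoids the transfer charges depends on each provider's pricing for traffic that stays inside its network, which I haven't checked.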

anonylizard2y ago

Huggingface has a strategic partnership with AWS.

1. AWS is far behind Azure and GCP in AI, so they gotta partner up to gain credibility.

2. Huggingface probably does face insane bills compared to github. But AWS can probably develop some optimizations to save bandwidth costs. There's 100% some sort of generalized differential storage method being developed for AI models.

fomine32y ago

AWS egress traffic charges are just outrageous, so they can easily offer a huge discount without any improvement.

jandrese2y ago

One doesn't usually opt for AWS when their goal is to reduce transfer costs.

toddmorey2y ago

Is Hugging Face just a model repository like GitHub is a code repository? It seems you can rent compute, both CPU and GPU, but you are right that most models seem to be run elsewhere.

fragmede2y ago

Yes, exactly.

pdntspa2y ago

I really wish I could configure this crap to cache somewhere other than my C: drive

Or better yet, how about asking me where I want to store my models?

thulle2y ago

On Linux there's the XDG_CACHE_HOME env variable for pip, but strangely enough there doesn't seem to be a Windows equivalent.
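
For the model downloads specifically, my understanding of the huggingface_hub docs is that the cache location is taken from the HF_HUB_CACHE and HF_HOME environment variables on any OS (and pip has its own PIP_CACHE_DIR). The sketch below only mimics that lookup order as I understand it; `resolve_hf_hub_cache` is my own helper, not a library function, and the paths are examples.

```python
from pathlib import Path
from typing import Mapping


def resolve_hf_hub_cache(env: Mapping[str, str], home: Path) -> Path:
    """Mimic the lookup order for the Hugging Face Hub cache dir (an assumption)."""
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"])
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]) / "hub"
    if env.get("XDG_CACHE_HOME"):
        return Path(env["XDG_CACHE_HOME"]) / "huggingface" / "hub"
    return home / ".cache" / "huggingface" / "hub"


# With nothing set, models land under the user's home directory.
default_dir = resolve_hf_hub_cache({}, Path("/home/me"))
# Setting HF_HOME (example value) moves the whole cache to another drive.
moved_dir = resolve_hf_hub_cache({"HF_HOME": "D:/hf"}, Path("/home/me"))
```

If that reading is right, setting HF_HOME before launching the tool should be enough to keep models off the system drive.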

callalex2y ago

I haven’t used windows in a while but I thought it supported some form of cross-volume symlink? Or at least mounting an image stored on another volume to an arbitrary path.

jonluca2y ago

You can do a lot of these fully locally with things like RVC web ui or https://tryreplay.io/

echelon2y ago

https://fakeyou.com has unlimited free RVC without an account. The UI needs work, though.

tr33house2y ago

wish they had something for Linux

mckirk2y ago· 10 in thread

My absolute favorite application of this tech so far is The Beach Boys singing 'Hurt'. It's the first time I seriously didn't notice any artifacts, and it just works so well even though it really shouldn't.

Enjoy: https://youtu.be/gmNSFqyg_Z8

dwringer2y ago

I don't know what I was expecting but that isn't Hurt, it's Surfin' USA with Hurt's lyrics that sound extremely jittery and grainy.

I'm curious though if some AI soon could in fact synthesize the Beach Boys' style with the actual chords and melody from the NIN song, possibly with some of the pathos of Johnny Cash as well.

legitster2y ago

I agree. The "x words over y music" can be fun, but isn't really impressive as a true genre parody.

The one that always comes to mind for me is this video of an Eminem interview done from scratch as a Talking Heads song: https://www.youtube.com/watch?v=Kfl3N9nesRg

This is potentially something that generative AI could be good at doing (at least recreating vocals), but this parody of the Talking Heads required a lot of very clever insight into what made a good Talking Heads song and returned a convincing and novel melody. And I think we are still a ways off.

sumtechguy2y ago

The one I found fun was the matrix ice ice baby mashup. That was sort of janky but good enough to be fun.

tracerbulletx2y ago

The name of the channel is "There I Ruined It". That's the point: the person who created it did it specifically to make you feel like that.

darkerside2y ago

Yeah, I hate it to the point of being personally offended. It has nothing to do with Johnny Cash's rendition. I'd probably feel a bit better, but not much, if it were advertised as a NIN mashup.

danjc2y ago

Where is my mind by Sinatra - https://youtu.be/BO4cKQ8DxYU?si=uzF_TdrQoZOAOlEc

qup2y ago

That's... Really nice.

code_runner2y ago

This account is one of the absolute top tier creators for weird music mixes. The recent deep faking stuff has been shockingly good. I think this is a good example of an "acceptable" use of AI, as long as artists/composers etc rights are all settled.

It's always more fun when it's a real group of talented people being silly, but I'd listen to an album of weird mashups like this for sure.

hinkley2y ago

The graininess of the recording covers over a lot of potential problems. But given that this attempt keeps the Beach Boys’ tempo and enunciation, I think this technique, whatever it is, would make a much more compelling version of Michael Jackson covering Eat It.

nsbk2y ago

That hurt

satvikpendem2y ago· 10 in thread

What's the best open source text to speech? Eleven Labs and others are interesting but closed source. I want to use them mainly for audiobooks as I have a lot of ePubs and I'm just using the basic Google text to speech voices on my Android, via Moon+ Reader. It works fine but it's still more robotic than state of the art.

entrepy1232y ago

POST-EDIT, CORRECTED ANSWER

I doubt it's currently actually "the best open source text to speech", but the answer I came up with when throwing a couple of hours at the problem some months ago was "ttsprech" [3].

Following the guide, it was pretty trivial to make the model render my sample text in about 100 English "voices" (many of which were similar to each other, and in varying quality). Sampling those, I got about 10 that were pretty "good". And maybe 6 that were the "best ones" (very natural, not annoying to listen to, actually sounded like a person by and large), and maybe 2 made the top (as in, a tossup for the most listenable, all factors considered).

IIRC, the license was free for noncommercial use only. I'm not sure exactly "how open source" they are, but it was simple to install the dependencies and write the basic Python to try it out; I had to write a for loop to try all the voices like I wanted. I ended up using something else for the project for other reasons, but this could still be a fairly good backup option for some use cases, IMO.

PRE-EDIT, ERRONEOUS ANSWER

Same as above, but I had said "Silero" [0, 1, 2] originally, which I started trying out too, before switching to a third (less open) option.

  [0] https://github.com/snakers4/silero-models#text-to-speech
  [1] https://silero.ai
  [2] https://github.com/snakers4/silero-models#standalone-use
  [3] https://github.com/Grumbel/ttsprech#usage

lhl2y ago

For neutral sounding very fast/efficient voices, I find Coqui TTS VITS models to be very good. For slower, more expressive voice or voice cloning I think the Coqui TTS XTTS is good (or you can look at the mrq/tortoise-tts).

I'm still awaiting a StyleTTS2 implementation. The audio samples sound top notch: https://styletts2.github.io/

modeless2y ago

You're in luck, the code dropped 6 hours ago :) https://github.com/yl4579/StyleTTS2

Looks promising, I'm going to check it out too! MIT license, even! If it's fast enough for real time, it could be the new best option. The paper claims faster inference than VITS...

NoMoreNicksLeft2y ago

We bought the $300/month plan for a few months earlier this year... and you'd only get 40 hours of audio generation for that. It wasn't really sufficient to our needs.

How many audio books is 40 hours?

Also, while its voice cloning was truly amazing, every once in a while the voice would get a little nutty and sound like an insect just flew down their throat, or maybe they had an LSD flashback. Normal normal normal then it's some Bobcat Goldthwait skit. And if you dialed down that parameter (I think it's called stability?) then it goes monotone really quickly.

We're probably several years out from it being something people use personally for audio books.

echelon2y ago

> We bought the $300/month plan for a few months earlier this year... and you'd only get 40 hours of audio generation for that. It wasn't really sufficient to our needs.

All of these AI as a Service (AaaS?) API companies are going to race each other to razor thin margins. Immediately after ElevenLabs raised, five other TTS services raised nearly the same amount of money.

dylan6042y ago

>How many audio books is 40 hours?

Are you reading War & Peace or Cat In The Hat?
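
To put rough numbers on it, here is a back-of-the-envelope sketch. The $300 and 40 hours come from the comment above; the 150 words-per-minute narration speed and the 100,000-word book are assumptions I picked for illustration, so swap in your own.

```python
def narration_hours(word_count: int, words_per_minute: float = 150.0) -> float:
    """Hours of audio for a text at an assumed narration speed."""
    return word_count / words_per_minute / 60


plan_usd, plan_hours = 300, 40               # figures quoted in the thread
usd_per_audio_hour = plan_usd / plan_hours   # 7.5 USD per generated hour

novel_hours = narration_hours(100_000)       # hypothetical 100k-word novel
books_per_plan = plan_hours / novel_hours    # how many such books fit in 40 hours
```

With those assumptions the plan covers between three and four books of that size per month, and a much longer book could use up the whole allowance by itself.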

modeless2y ago

I've tried a few, not an expert, but I think Coqui's new XTTS models are decent performance and quality wise (just in terms of how the speech sounds, can't speak to the voice cloning fidelity as I don't care about that). Open source code but non-commercial license for the model. They also have a bunch of models with more permissive licenses that aren't as good.

I doubt they're better than Google's TTS though.

ticulatedspline2y ago

Bark seems pretty good

https://github.com/suno-ai/bark Demo at https://huggingface.co/spaces/suno/bark

In the couple samples I tried it was substantially better at picking up meaning compared to VALL-E-X

follower2y ago

> What's the best open source text to speech?

I haven't re-evaluated OSS TTS options for a few months but from my own experience earlier in the year I've been pleased with the results I've gotten from Piper:

* https://github.com/rhasspy/piper

I've primarily used it with the LibriTTS-based voices due to their license but if it's for personal local use you can probably use some of the other even higher quality voices.

The official samples are here: https://rhasspy.github.io/piper-samples/

Here's a small number of pre-rendered samples I've used that were generated from a WIP Piper port of my Dialogue Tool[0] project: https://rancidbacon.gitlab.io/piper-tts-demos/

While it's not perfect & output quality varies for a number of reasons, I've been using it because it's MIT licensed & there's multiple diverse voice options with licenses that suit my purposes.

(Piper and its predecessors Larynx & Mimic3 are significantly ahead of where other FLOSS options had been up until their existence in terms of quality.)

[0] https://rancidbacon.itch.io/dialogue-tool-for-larynx-text-to...

----

Edit to add links to some of my notes related to FLOSS TTS, in case they're of interest:

* https://gitlab.com/RancidBacon/notes_public/-/blob/main/note...

artninja19882y ago

Would also like to know this. Can't seem to find an open source tts engine that works on mobile to read muh books

simonw2y ago· 8 in thread

I did not know about this: "The center of the A.I. cover songs community is a massive 500,000+ member Discord called A.I. Hub, where members trade new tips, tools, techniques, and links to their original and cover songs."

codetrotter2y ago

Me neither. That’s what’s so weird about the internet.

Imagine half a million people out in the streets together. You’d definitely notice that. Meanwhile, we can have these massive online communities and you’d never know unless you accidentally stumbled across it or someone told you about it.

evan_2y ago

more accurate to say that, while 500,000 people joined the discord by clicking a link, some much, much smaller number are actually active on any sort of a regular basis

lmm2y ago

> Imagine half a million people out in the streets together. You’d definitely notice that.

In the streets, sure. Meeting up at out of town conference centers a few times a year, probably not. Most real communities have always been "dark matter" to those outside them; Discord working the same way feels more authentic than most of the internet.

joenot4432y ago

Something I think we're slowly coming to terms with is that the current generation of techies (the ones who can afford to spend hours upon hours tweaking models and sharing results) really prefer Discord over our Web 2.0 forum type communities like this one. Even on reddit, which is lagging in popularity amongst Gen-Z when compared to Discord or TikTok, you can immediately tell upon reading /r/LocalLLMs that a really big chunk of this community are underaged. To be clear, I think this is a good thing!

There was a generation that preferred mailing lists. There was a generation that preferred IRC and BBS, and "my" generation which likes forums and lengthy comment threads. One would be naive to think this style (the one we're engaging in here) would last forever.

There are definitely very real criticisms of Discord, searchability and discoverability being the most common, but at this point I think the die has been cast. Young people have made their choice.

BandButcher2y ago

Agree, I'm in my early 30s and jump through most platforms, but very little with TikTok/Discord. But I have to admit a lot of newer content (and tech framework support) has migrated to Discord channels. Even some YouTube sports talk shows have their own Discord for call-ins, etc...

These big teleconference apps are usually hit or miss but discord seems to be the winner currently for actual "social networking", also add in its trend in the gaming community

tavavex2y ago

I kind of disagree? I am gen Z myself, and have used reddit extensively. While I like Discord a lot, I strongly disagree with using it to host content, essentially gating non-members from getting what they want (which is what leads to these communities with ludicrously inflated member counts). And this sentiment definitely isn't just me, a lot of the techie "CS major" people I know lean towards using slightly older services - which is also probably why the aforementioned /r/localllama community still has more than 60 thousand members.

That being said, Discord does have some advantages over older forum-type communities - it's usually way better for cultivating smaller communities, and its no-effort-required chat systems means that you can always hop on and discuss things that are on the cutting edge. This is quite important in a field like AI, where it feels like something revolutionary happens every other week.

(Also, I don't know if that implication was intentional, but gen Z and "underaged" haven't meant the same thing for many years now)

ThrowawayTestr2y ago

Are we so out of touch? No, it's the children who are wrong.

jrm42y ago

I poked around there for a while, and my takeaway was "sub-par" all around, which might be the reason for its relative obscurity? The thing is, I can't tell to what extent it's the tech, and to what extent it's just "very uninteresting source material."

Like, there's a whole lot of "classic song done by presently popular rapper," and I'll be the first to insist that there is nearly nothing vocally interesting at all coming from today's popular hip-hop artists (and I say this as an extreme long-time hip-hop aficionado)

causi2y ago· 6 in thread

AI song covers are incredible, from Goku singing "Don't Stop Me Now" to the cast of Spongebob singing "Ocean Man".

ssalka2y ago

My favorite is the Mr. Krabs cover of "Billie Jean"

https://www.youtube.com/watch?v=CkQ-44PvTs8

civilitty2y ago

Mr Krabs rapping Lose Yourself by Eminem [1] is all the evidence I've ever needed that Clancy Brown should have been a rapper.

[1] https://www.youtube.com/watch?v=d7N6jOziN4E

all22y ago

This is actually good. Hysterically so.

shepherdjerred2y ago

Let me share my favorite: Plankton - Beggin'

https://www.youtube.com/watch?v=tJjhObngcxI

lostlogin2y ago

https://m.youtube.com/watch?v=XzqbhDqAEtw

cm20122y ago

Would have strongly preferred DBZA goku :)

RecycledEle2y ago· 4 in thread

Wow. I just realized any one of us could redo Weird Al's songs with his lyrics, but with the original singer's voice. We could be listening to Michael Jackson singing "Just Eat It" by lunchtime.

I am constantly amazed at how the new AI tech can be used.

Of course this would be illegal under most countries' copyright laws.

unnah2y ago

There's also a Weird Al piece "I think I'm a clone now", for which an AI clone voice performance would definitely be fitting. (The original song was "I think we're alone now" by Tommy James and the Shondells, but it seems Weird Al was parodying the cover by Tiffany in the 1980's.)

While Weird Al himself asks for permission, it's well established that parody is not copyright infringement. There should be room for parody performances by AI voices as well, especially if argued by a good lawyer.

mbg7212y ago

Al is very self-aware (that second character is a lower-case ell), he's less concerned with legal entities than with his relationships with musicians.

greenhearth2y ago

How would this be amazing? It just sounds stupid and a waste of time.

RecycledEle2y ago

And...they already did it.

distantsounds2y ago· 4 in thread

The sampled voices sound neither like Michael Jackson nor Weird Al. A good effort, but a professional impersonator could likely do better on either front.

nemo44x2y ago

It sounds like Weird Al trying to be Michael Jackson trying to be Weird Al.

Reventlov2y ago

As a non native speaker, it does sound a bit like Michael Jackson imo…

hinkley2y ago

The best Michael Jackson interpreter in a town of 50,000 could do better than this. It’s… this is bad.

code_runner2y ago

I know what you mean. It's more noticeable (imo) on the Michael one.... but it's definitely in there. I think the pitch correction is to blame for a bit of the weirdness.

hinkley2y ago· 3 in thread

> Artifacts aside, it sounds like Michael Jackson doing a Weird Al impression?! Every line has a distinctly “white and nerdy” vibe: it loses any seriousness and edge, exaggerating words for comic effect and enunciating lyrics really clearly so the punchlines can be heard.

No, it sounds like someone doing an impression of Weird Al doing an impression of Michael Jackson. Someone whose mom told them they were special and they believed it.

These examples are standing on a ridge line, surveying the uncanny valley and looking for the best way to cross.

blagie2y ago

... they're good enough.

I have an accent. If not for that, I'd be a great presenter.

If I could translate my voice into a poor Neil deGrasse Tyson, a poor Patrick Stewart, a poor Carl Sagan, a poor Morgan Freeman, etc., my presentations would be... better.

hinkley2y ago

If it makes you more comfortable and confident, that is helping you.

This isn't autotune for the spoken word, though. It's not fixing pacing or vocabulary, and in the audio above it isn't even fixing intonation. Yes, a thick German accent will give you away as being of German extraction. But you're also using the word 'since' when Brits and Americans would use 'for', and it's not going to fix that. Any more than it'll fix my french when I make the exact same mistake going the other direction (for=duration vs for=purpose vs for=interval). If I hear 'since one month' you're likely German or Indian. If you ask how long I've been in Marseille you'll know I'm American in about half that time.

totetsu2y ago

Finally, a way to not have to fix society's prejudices: just give everybody the tools to emulate the ideal of perfection, no matter what color their skin or what their accent sounds like.

smath2y ago

Related article from 1 year ago on Darth Vader’s voice being AI generated going forward:

https://arstechnica.com/information-technology/2022/09/james...

mito882y ago

"celebrity voices impersonated"

Watch Light My Fire on YouTube Music https://music.youtube.com/watch?v=lN3v3EfA6_A&si=_hcG3Wjakxd...

ddmf2y ago

The most recent episode of Tacoma FD covered something similar to this mixed with a messed up Christmas Carol.

dreamcompiler2y ago

> ... Tom Waits, LeBron James, Knuckles, and, uh, Adolf Hitler.

I can't figure out if this is an example of Godwin's Law or not.

Calamitous2y ago

Key takeaway:

> No current artificial intelligence is powerful enough to hide the weirdness of Weird Al.

j / k navigate · click thread line to collapse

198 comments

104 comments · 14 top-level

minimaxir2y ago· 28 in thread

Unlike musicians who can't be replaced without significant postprocessing, have enough money to not be impacted by competition, and have legal muscle, voice over artists:

- Can be reproduced with good-enough results from out-of-the-box voice cloning settings on ElevenLabs or an open source equivalent (Bark, VALL-E X)

- Are already underpaid for their work as-is

- Have no legal ownership of their voice since they are contractors, and their voicework is owned by their clients who may not be as incentivised in protecting the VO.

I want to write a blog post about it but I suspect most people on Hacker News won't be interested in a treatise on the cultural impacts of the voicework in Persona 5 and Genshin Impact.

sumtechguy2y ago

The existing voice actors will be just out of work. There will be a small cadre of groups that want real voice. But for some projects that will not be that important.

Its going to get crazy.

Legend24402y ago

They don't need that - they already have enough data to generate plausibly human voices that don't sound like anyone in particular.

Voice cloning is a special case, these models are equally good at making new voices.

1 more reply

HappyDaoDude2y ago

I have said this will initially be sold as a feature on things like Audiobooks.

Pick your book, pick your reader and away it goes. The Diary of Anne Frank read by Gilbert Gottfried.

2 more replies

minimaxir2y ago

Recent voice models by OpenAI, Meta, and ElevenLabs all state upfront they work with paid professional voice actors, so this space will get intetesting fast.

hiccuphippo2y ago

Mozilla has a voice data project where people already do it for free(dom) ;)

https://commonvoice.mozilla.org/en

supriyo-biswas2y ago

HN isn't the only community to write for. While most people here seem to be unsympathetic to such job concerns, unconventional articles do hit the front page from time to time.

I'd like to read it, in any case.

pixl972y ago

rcarr2y ago

+1, I would also like to read it

1 more reply

ImprobableTruth2y ago

Voices are uncopyrightable, but impersonation isn't legal (see Midler v. Ford, for a notable case), so I don't think the situation is totally clear.

deepsun2y ago

> voice actors are fearing that the ability for generative AI to replicate their voices may cost them work

I'm not sure how to feel about that. I'm against the idea that some people "deserve" to be paid for being lucky enough to be born with an interesting voice.

On the other hand, the world has always worked like that. A hard-working farmer or doctor, say, was also lucky to be born with the traits needed to make a living, while others weren't.

sofixa2y ago

lazide2y ago

As long as they don’t claim the voice is the original actor (misspell the name perhaps, or the Hollywood classic ‘based on’), they won’t be impersonating, no?

GuB-422y ago

Interesting note: many Vocaloids (most notably Hatsune Miku) are sampled from voice actors rather than singers.

Singers didn't want software clones, but voice actors are fair game.

sublinear2y ago

I have a different take on this.

AI voice is cheaper, but it's also a more boring and generic performance. There is zero progress made towards any sort of creative AI that produces good unique work.

The market for this then is small businesses who can't afford a professional voice actor. AI is opening up new markets, not killing the jobs of the truly talented.

chefandy2y ago

aussieguy12342y ago

The MVP might have the free "good enough" AI voiceover, it takes less money to bootstrap a new product that way.

The real product would have a real voice over actor paid for with VC money.

matteoraso2y ago

>There is zero progress made towards any sort of creative AI that produces good unique work.

It's only been a year. Give it some time and I'm sure AI will have much better results. Right now, you can get some of that unique work by finetuning the AI off of a person's existing portfolio.

zerojames2y ago

I am interested! You should write about what you find interesting; never worry if it will interest a particular group.

foobarian2y ago

raytopia2y ago

I'd be interested.

Most likely you'd see a lot of people saying that somehow getting rid of voice actors is good for "progress". Whatever that means.

Random aside: someone really needs to make a Hacker News that focuses more on game development and other arts, so blog posts like the one you're talking about would have a proper community to discuss them with.

Legend24402y ago

Replacing voice actors with text-to-speech is good because it lets you do things voice actors can't:

* Create dynamic new voice lines at runtime, for example game characters reacting to new situations.

* Operate at a scale that's infeasible for humans, for example turning every ebook into an audiobook.

dylan6042y ago

> and their voicework is owned by their clients who may not be as incentivised in protecting the VO.

The work product produced by their voice for fulfilling the contract is owned. No corp owns someone else's voice.

Jeff_Brown2y ago

minimaxir2y ago

They don't own the voice, but they own the vocal performance, which ends up being a meaningless legal distinction in practice.

It's one reason why VAs rarely take fan requests for a character they voice.

rockemsockem2y ago

No one owns a voice at the moment. There is no mechanism in the US to own a voice, even your own.

aaroninsf2y ago

<raises hand> I am

EGreg2y ago

Please do. Some of us critique capitalism

rcarr2y ago

mecredis2y ago· 17 in thread

It's kind of wild that these tools just transfer a copy of these models every time they're spun up (whether it's to a Google Colab notebook or a local machine.)

This must mean Hugging Face's bandwidth bill is crazy, or am I missing something (maybe they have a peering agreement, or cache things heavily)?

satertek2y ago

minimaxir2y ago

The Colab notebooks are a fresh and independent session with no caching.

civilitty2y ago

Bandwidth has always been crazy cheap.

hotnfresh2y ago

colechristensen2y ago

Indeed. If you're not using a cloud provider, bandwidth is extremely cheap.

In fact, locally I can get an unmetered 10 Gbps home internet connection for $300/mo.

I'm not sure how they'd react if I transferred 1 PB/mo though :)
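
For scale, here is a back-of-the-envelope sketch of what a fully saturated link could move in a month. It assumes a 30-day month, decimal units (1 PB = 10^15 bytes), and no protocol overhead; the function name is made up for illustration.

```python
def max_monthly_transfer_pb(link_gbps, days=30):
    """Upper bound on data moved over a saturated link, in petabytes.

    Assumptions: decimal units (1 Gbps = 1e9 bits/s, 1 PB = 1e15 bytes),
    the link is busy 24/7, and protocol overhead is ignored.
    """
    bytes_per_second = link_gbps * 1e9 / 8
    seconds = days * 24 * 60 * 60
    return bytes_per_second * seconds / 1e15


if __name__ == "__main__":
    ceiling = max_monthly_transfer_pb(10)
    print(f"10 Gbps ceiling: {ceiling:.2f} PB per 30 days")
    print(f"1 PB/mo is {1 / ceiling:.0%} of that ceiling")
```

Under those assumptions a 10 Gbps line tops out a little above 3 PB a month, so 1 PB/mo would mean keeping the link about 30% full around the clock.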

morkalork2y ago

If you host copies of your data with a few big providers could you do something smart like detect and redirect requests from AWS to an S3 bucket and not pay for bandwidth leaving the provider?
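
A rough sketch of what that routing idea could look like, using only the standard library. Everything named here is hypothetical: the CIDR blocks are RFC 5737 documentation ranges standing in for a provider's published address ranges, and the mirror URLs are placeholders. A real setup would have to load the provider's actual ranges and keep them up to date.

```python
import ipaddress

# Hypothetical stand-ins for each provider's published IP ranges
# (these are RFC 5737 documentation networks, not real cloud ranges).
PROVIDER_NETWORKS = {
    "aws": [ipaddress.ip_network("192.0.2.0/24")],
    "gcp": [ipaddress.ip_network("198.51.100.0/24")],
}

# Hypothetical mirrors hosted inside each provider, plus a default origin.
MIRRORS = {
    "aws": "https://models-mirror.example.com/aws/",
    "gcp": "https://models-mirror.example.com/gcp/",
}
DEFAULT_MIRROR = "https://models.example.com/"


def pick_mirror(client_ip):
    """Return the mirror URL to redirect this client to.

    Clients whose address falls inside a known provider range are sent
    to the copy hosted in that provider; everyone else gets the default.
    """
    address = ipaddress.ip_address(client_ip)
    for provider, networks in PROVIDER_NETWORKS.items():
        if any(address in network for network in networks):
            return MIRRORS[provider]
    return DEFAULT_MIRROR
```

Whether this actually saves money depends on the provider's pricing for traffic that stays inside its own network, which this sketch does not model.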

anonylizard2y ago

Huggingface has a strategic partnership with AWS.

1. AWS is far behind Azure and GCP in AI, so they gotta partner up to gain credibility.

fomine32y ago

AWS's egress traffic charges are outrageous, so they can easily offer a huge discount without any improvement.

jandrese2y ago

One doesn't usually opt for AWS when their goal is to reduce transfer costs.

toddmorey2y ago

Is Hugging Face just a model repository like GitHub is a code repository? It seems you can rent compute, both CPU and GPU, but you are right that most models seem to be run elsewhere.

fragmede2y ago

Yes, exactly.

pdntspa2y ago

I really wish I could configure this crap to cache somewhere other than my C: drive

Or better yet, how about asking me where I want to store my models?

thulle2y ago

On Linux there's the XDG_CACHE_HOME env variable for pip, but strangely enough there doesn't seem to be a Windows equivalent.
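
One possible workaround, sketched below: as far as I know, pip honors `PIP_CACHE_DIR` and the Hugging Face libraries honor `HF_HOME` on every platform, Windows included. The helper name and the subfolder layout are made up for illustration, and the variables have to be set before the libraries that read them are imported or launched.

```python
import os
from pathlib import Path


def redirect_caches(root):
    """Point the pip and Hugging Face caches at `root`.

    Only affects the current process and anything it launches afterwards.
    The "pip" and "huggingface" subfolder names are an arbitrary choice.
    """
    root = Path(root)
    targets = {
        "PIP_CACHE_DIR": root / "pip",
        "HF_HOME": root / "huggingface",
    }
    for name, path in targets.items():
        path.mkdir(parents=True, exist_ok=True)
        os.environ[name] = str(path)
    return {name: str(path) for name, path in targets.items()}


# Example (hypothetical drive and folder):
#   redirect_caches("D:/ml-cache")
#   import transformers  # imported only after the variables are set
```

To make it permanent, the same two variables can be set in the system environment settings instead of in code.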

callalex2y ago

I haven’t used windows in a while but I thought it supported some form of cross-volume symlink? Or at least mounting an image stored on another volume to an arbitrary path.

jonluca2y ago

You can do a lot of these fully locally with things like RVC web ui or https://tryreplay.io/

echelon2y ago

https://fakeyou.com has unlimited free RVC without an account. The UI needs work, though.

tr33house2y ago

wish they had something for Linux

mckirk2y ago· 10 in thread

Enjoy: https://youtu.be/gmNSFqyg_Z8

dwringer2y ago

I don't know what I was expecting but that isn't Hurt, it's Surfin' USA with Hurt's lyrics that sound extremely jittery and grainy.

I'm curious though if some AI soon could in fact synthesize the Beach Boys' style with the actual chords and melody from the NIN song, possibly with some of the pathos of Johnny Cash as well.

legitster2y ago

I agree. The "x words over y music" can be fun, but isn't really impressive as a true genre parody.

The one that always comes to mind for me is this video of an Eminem interview done from scratch as a Talking Heads song: https://www.youtube.com/watch?v=Kfl3N9nesRg

sumtechguy2y ago

The one I found fun was the matrix ice ice baby mashup. That was sort of janky but good enough to be fun.

tracerbulletx2y ago

The name of the channel is "There I Ruined It".. That's the point, the person who created it did it specifically to make you feel like that.

darkerside2y ago

Yeah, I hate it to the point of being personally offended. It has nothing to do with Johnny Cash's rendition. I'd probably feel a bit better, but not much, if it were advertised as a NIN mashup.

danjc2y ago

Where is my mind by Sinatra - https://youtu.be/BO4cKQ8DxYU?si=uzF_TdrQoZOAOlEc

qup2y ago

That's... Really nice.

code_runner2y ago

It's always more fun when it's a real group of talented people being silly, but I'd listen to an album of weird mashups like this for sure.

hinkley2y ago

nsbk2y ago

That hurt

satvikpendem2y ago· 10 in thread

entrepy1232y ago

POST-EDIT, CORRECTED ANSWER

I doubt it's currently actually "the best open source text to speech", but the answer I came up with when throwing a couple of hours at the problem some months ago was "ttsprech" [3].

PRE-EDIT, ERRONEOUS ANSWER

Same as above, but I had said "Silero" [0, 1, 2] originally, which I started trying out too, before switching to a third (less open) option.

  [0] https://github.com/snakers4/silero-models#text-to-speech
  [1] https://silero.ai
  [2] https://github.com/snakers4/silero-models#standalone-use
  [3] https://github.com/Grumbel/ttsprech#usage

lhl2y ago

I'm still awaiting a StyleTTS2 implementation. The audio samples sound top notch: https://styletts2.github.io/

modeless2y ago

You're in luck, the code dropped 6 hours ago :) https://github.com/yl4579/StyleTTS2

Looks promising, I'm going to check it out too! MIT license, even! If it's fast enough for real time, it could be the new best option. The paper claims faster inference than VITS...

NoMoreNicksLeft2y ago

We bought the $300/month plan for a few months earlier this year... and you'd only get 40 hours of audio generation for that. It wasn't really sufficient to our needs.

How many audio books is 40 hours?

We're probably several years out from it being something people use personally for audio books.

echelon2y ago

> We bought the $300/month plan for a few months earlier this year... and you'd only get 40 hours of audio generation for that. It wasn't really sufficient to our needs.

dylan6042y ago

>How many audio books is 40 hours?

Are you reading War & Peace or Cat In The Hat?
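
To put rough numbers on it, a small sketch using the figures quoted above ($300/month for 40 hours). The 10-hour book length in the example is only an assumed placeholder, not a measured figure, and the function names are made up.

```python
def cost_per_audio_hour(monthly_price, plan_hours):
    """Price of one generated hour of audio on a fixed monthly plan."""
    return monthly_price / plan_hours


def books_per_plan(plan_hours, book_hours):
    """How many books of a given narrated length fit in the monthly quota."""
    return plan_hours / book_hours


if __name__ == "__main__":
    # Figures from the thread: $300/month buys 40 hours of generation.
    print(cost_per_audio_hour(300, 40))
    # Assumed placeholder: a book that runs 10 hours when narrated.
    print(books_per_plan(40, 10))
```

So the answer swings entirely on the book length you plug in: a long novel could eat the whole month's quota, while a short children's book barely dents it.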

modeless2y ago

I doubt they're better than Google's TTS though.

ticulatedspline2y ago

Bark seems pretty good

https://github.com/suno-ai/bark Demo at https://huggingface.co/spaces/suno/bark

In the couple samples I tried it was substantially better at picking up meaning compared to VALL-E-X

follower2y ago

> What's the best open source text to speech?

I haven't re-evaluated OSS TTS options for a few months but from my own experience earlier in the year I've been pleased with the results I've gotten from Piper:

* https://github.com/rhasspy/piper

I've primarily used it with the LibriTTS-based voices due to their license but if it's for personal local use you can probably use some of the other even higher quality voices.

The official samples are here: https://rhasspy.github.io/piper-samples/

Here's a small number of pre-rendered samples I've used that were generated from a WIP Piper port of my Dialogue Tool[0] project: https://rancidbacon.gitlab.io/piper-tts-demos/

While it's not perfect & output quality varies for a number of reasons, I've been using it because it's MIT licensed & there's multiple diverse voice options with licenses that suit my purposes.

(Piper and its predecessors Larynx & Mimic3 are significantly ahead of where other FLOSS options had been up until their existence in terms of quality.)

[0] https://rancidbacon.itch.io/dialogue-tool-for-larynx-text-to...

----

Edit to add links to some of my notes related to FLOSS TTS, in case they're of interest:

* https://gitlab.com/RancidBacon/notes_public/-/blob/main/note...

artninja19882y ago

Would also like to know this. Can't seem to find an open source tts engine that works on mobile to read muh books

simonw2y ago· 8 in thread

codetrotter2y ago

Me neither. That’s what’s so weird about the internet.

evan_2y ago

more accurate to say that, while 500,000 people joined the discord by clicking a link, some much, much smaller number are actually active on any sort of a regular basis

lmm2y ago

> Imagine half a million people out in the streets together. You’d definitely notice that.

joenot4432y ago

There are definitely very real criticisms of Discord, searchability and discoverability being the most common, but at this point I think the die has been cast. Young people have made their choice.

BandButcher2y ago

These big teleconference apps are usually hit or miss but discord seems to be the winner currently for actual "social networking", also add in its trend in the gaming community

tavavex2y ago

(Also, I don't know if that implication was intentional, but gen Z and "underaged" haven't meant the same thing for many years now)

ThrowawayTestr2y ago

Are we so out of touch? No, it's the children who are wrong.

jrm42y ago

causi2y ago· 6 in thread

AI song covers are incredible, from Goku singing "Don't Stop Me Now" to the cast of Spongebob singing "Ocean Man".

ssalka2y ago

My favorite is the Mr. Krabs cover of "Billie Jean"

https://www.youtube.com/watch?v=CkQ-44PvTs8

civilitty2y ago

Mr Krabs rapping Lose Yourself by Eminem [1] is all the evidence I've ever needed that Clancy Brown should have been a rapper.

[1] https://www.youtube.com/watch?v=d7N6jOziN4E

all22y ago

This is actually good. Hysterically so.

shepherdjerred2y ago

Let me share my favorite: Plankton - Beggin'

https://www.youtube.com/watch?v=tJjhObngcxI

lostlogin2y ago

https://m.youtube.com/watch?v=XzqbhDqAEtw

cm20122y ago

Would have strongly preferred DBZA goku :)

RecycledEle2y ago· 4 in thread

Wow. I just realized any one of us could redo Weird Al's songs with his lyrics, but with the original singer's voice. We could be listening to Michael Jackson singing "Just Eat It" by lunchtime.

I am constantly amazed at how the new AI tech can be used.

Of course, this would be illegal under most countries' copyright laws.

unnah2y ago

mbg7212y ago

Al is very self-aware (that second character is a lower-case ell), he's less concerned with legal entities than with his relationships with musicians.

greenhearth2y ago

How would this be amazing? It just sounds stupid and a waste of time.

RecycledEle2y ago

And...they already did it.

distantsounds2y ago· 4 in thread

The sampled voices sound neither like Michael Jackson nor Weird Al. A good effort, but a professional impersonator could likely do better on either front.

nemo44x2y ago

It sounds like Weird Al trying to be Michael Jackson trying to be Weird Al.

Reventlov2y ago

As a non native speaker, it does sound a bit like Michael Jackson imo…

hinkley2y ago

The best Michael Jackson interpreter in a town of 50,000 could do better than this. It’s… this is bad.

code_runner2y ago

I know what you mean. It's more noticeable (imo) on the Michael one... but it's definitely in there. I think the pitch correction is to blame for a bit of the weirdness.

hinkley2y ago· 3 in thread

No, it sounds like someone doing an impression of Weird Al doing an impression of Michael Jackson. Someone whose mom told them they were special and they believed it.

These examples are standing on a ridge line, surveying the uncanny valley and looking for the best way to cross.

blagie2y ago

... they're good enough.

I have an accent. If not for that, I'd be a great presenter.

If I could translate my voice into a poor Neil deGrasse Tyson, a poor Patrick Stewart, a poor Carl Sagan, a poor Morgan Freeman, etc., my presentations would be... better.

hinkley2y ago

If it makes you more comfortable and confident, that is helping you.

totetsu2y ago

Finally, a way to not have to fix society's prejudices: just give everybody the tools to emulate the ideal of perfection, no matter what color their skin or what their accent sounds like.

smath2y ago

Related article from 1 year ago on Darth Vader’s voice being AI generated going forward:

https://arstechnica.com/information-technology/2022/09/james...

mito882y ago

"celebrity voices impersonated"

Watch Light My Fire on YouTube Music https://music.youtube.com/watch?v=lN3v3EfA6_A&si=_hcG3Wjakxd...

ddmf2y ago

The most recent episode of Tacoma FD covered something similar to this mixed with a messed up Christmas Carol.

dreamcompiler2y ago

> ... Tom Waits, LeBron James, Knuckles, and, uh, Adolf Hitler.

I can't figure out if this is an example of Godwin's Law or not.

Calamitous2y ago

Key takeaway:

> No current artificial intelligence is powerful enough to hide the weirdness of Weird Al.
