My country already has blasphemy lynching mobs based on the slightest perceived insult, real or imagined. They will mob you, lynch you, burn your corpse, then distribute sweets while your family hides and issues video messages denouncing you and forgiving the mob.
And this was before AI was easy to access. You can say a lot of things about 'oh backward countries' but this will not stay there, this will spread. You can't just give a toddler a knife and then blame them for stabbing someone.
Has nothing to do with fame, with security, with copyright. This will get people killed. And we have no tools to control this.
https://x.com/search?q=blasphemy
I fear the future.
I had my 70-year-old mother ask me last week if she should remove her voicemail message, because can't people steal her voice with it? I was surprised, but I guess she heard it on a Fox segment or something.
I think it might be a rough couple years but hopefully we'll be through it soon.
Worse, there isn't an attitude of default skepticism in many areas/cultures. If a person is suspected of violating the moral code the priority will be punishment and reinforcing that such behavior isn't acceptable. Whether or not the specific person actually did the specific act is a secondary concern.
It's just going to increase the number of people who will be harmed or killed.
Out of curiosity, how much training data is needed currently to mimic a voice at various levels of convincingness?
> My country already has blasphemy lynching mobs
In your case the problem is not AI, it’s your country.
If an AI fake-porn of some ordinary person involving a minor was unleashed, think of the utter shame and horror they would be treated with by people for the rest of their lives, even if it were proven false.
No one would believe them, work with them, hire them, rent to them; they would wish they had been lynched instead of living the life they live.
Or https://www.npr.org/2024/09/19/nx-s1-5114047/springfield-ohi... , where repeating racial libel causes a public safety problem.
While this kind of incitement in no way requires AI, it's certainly something that's easier to do when you can fake evidence. See also https://www.bbc.co.uk/news/articles/c5y87l6rx5wo
As far as I can tell the collective consciousness of every country is swayed by propaganda.
A written headline is enough to incite rage in any country much less a voice or video indistinguishable from the real thing.
Folks in “developed” countries have their lives destroyed or ended all the time based on rumors of something said or done.
Sure, distrust everything digital, but what if the only evidence of someone doing something wrong is digital?
In the UK a government was just elected with a historical absolute majority by only ten million people, and now first time offenders are being sent to prison for making stupid offensive statements online.
It’s like if someone said “I’m scared of someone bringing a semi-automatic weapon to my school and doing a mass shooting. My country has lax laws about guns and their proper use”. And then you said “in your case the problem is not guns, it’s your country”.
I mean, it’s technically true, but also unhelpful. Such ingrained laws are hard to change and you can be placed in danger for even trying.
Before someone decries the gun example as not being comparable, it is possible to live in a country with a monumental number of guns and not have mass murdering every day. It’s called Switzerland.
But let’s please stick to the subject of AI, which is what the thread is about. The gun example is the first analogy which came to mind, and analogies are never perfect, so it’s unproductive to nitpick the example. I don’t mean to shift the conversation from one contentious topic to another.
> what if AI was used to imitate a person saying something blasphemous?
I've been contemplating writing an open letter to Dang to nuke my account. Because at this time you can likely deanonymize any user with a fair amount of comments. As long as you can correlate. You can certainly steal their language, even if not 100% accurate. It may be caution, but it isn't certain that we won't enter a dark forest and there's reason to believe we could be headed that way. But at the same time, is not retreating to the shadows giving up?
Now imagine that account was linked to a SIM. It's trivial for a nefarious actor to get it re-activated; in fact there was a video by Veritasium just today where they didn't even need your SIM.
But even if they are not that hi-tech, it's not that hard to get a SIM issued in your name, or other hacks of a similar nature, we have all heard of stories.
Worse, you lost that SIM a decade back, the number gets back into the queue, and is eventually re-issued to someone new... and they try to create a facebook account, and are presented with yours.
They can then re-activate your old facebook account, and post a video/audio/text of "godelski" saying they like pineapple on pizza, and before you can defend yourself, the pizzerias have lynched you.
(I dare not use a real example even as a jest, I live here)
Are you 100% sure of all your old social media accounts, and all the SIMs you have ever used to log in to accounts?
We leave a long trail.
Of course, this could be misused to post something with plausible deniability, but if you want to say something controversial, why wouldn't you make another account for that anyway?
I know that one could theoretically sign posts with GPG, but it would be much nicer and less noisy if sites would have UI to show something like: Signed by <fingerprint>, key used for N years.
One issue is that most social media sites want your identity to be the account on their service and not some identity (i.e. key) that you control.
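For what it's worth, the display side of that is not much work. Here is a rough Python sketch of how a site might render the line, assuming it already has the poster's public key bytes and the date it first saw that key. `signature_badge` and its parameters are made up for illustration, the SHA-256 digest is only a stand-in for a real OpenPGP fingerprint, and nothing here verifies a signature:

```python
import hashlib
from datetime import date


def signature_badge(public_key: bytes, first_seen: date, today: date) -> str:
    """Render a one-line badge for a signed post (hypothetical helper)."""
    # Stand-in fingerprint: first 16 hex digits of SHA-256 over the key bytes.
    # Real OpenPGP fingerprints are computed differently.
    digest = hashlib.sha256(public_key).hexdigest()[:16].upper()
    fingerprint = " ".join(digest[i:i + 4] for i in range(0, 16, 4))
    # Whole years elapsed since the site first saw this key.
    before_anniversary = (today.month, today.day) < (first_seen.month, first_seen.day)
    years = today.year - first_seen.year - int(before_anniversary)
    return f"Signed by {fingerprint}, key used for {years} years"
```

The hard part, checking the signature against the post body, would still need GPG or a proper crypto library.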
Not a great outlook though, if everybody does this...
If stylometric analysis runs on all comments on the internet then yeah.
Bad things will happen, very very bad.
I honestly think it should at least be illegal to do this kind of analysis, because it'll be a treasure trove for the commercial sector to mine this data correlated to real people, not to mention the destruction it would bring to millions of people with personal anonymous blogs etc.
Actually, thinking about it further, you could also easily group people by political affiliation and all kinds of other thoughts. Dark, dark stuff!
I suppose even a throwaway could be linked to my identity if a comment was long enough, but probably only with some limited certainty.
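To give a feel for how low the bar is, here is a toy version of the idea in Python: build a character-trigram profile per text and compare profiles with cosine similarity. This is a sketch under my own assumptions, not what any real deanonymization tool does; `trigram_profile` and `cosine_similarity` are names I made up, and serious stylometry uses far richer features and much more text per author.

```python
from collections import Counter
from math import sqrt


def trigram_profile(text: str) -> Counter:
    """Count overlapping character trigrams in lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two trigram profiles; 0.0 if either profile is empty."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Scoring a throwaway's comments against every known account's profile and ranking by similarity is the whole trick. A short comment gives a sparse, noisy profile, which is why the certainty would be limited.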
Yes this violates any EU citizen's right to be forgotten under GDPR. Welcome to silicon valley.
The same way it took social media like reddit a few years of "finding the culprit" / "name and shame" till mods figured out that many times the online mob gets it wrong and so now that is usually not allowed.
But many people will suffer this until laws get passed or it enters into common consciousness that a video is more likely to be fake than it is to be real. Might be more than 5 years though. And unfortunately laws usually only get passed after there's proven damage to some people from it.
This kills the medium.
Just as ubiquitous scam calls have moved people away from phones, this moves people away from using media which cannot be trusted. Done enough this destroys reporting and therefore democracy. I wonder when the first nonexistent candidate will be elected.
Check the twitter link, you won't have to scroll much to find a mullah being blasted for blasphemy. No one is safe.
That, and cryptographic materials being used to sign stuff too.
I think that's possibly the best we can hope for from a technical perspective as well as waiting for the legal system to catch up.
I could then trivially generate pictures or even videos of you e.g. by knowing your name. Of course that's just an example but I do think that's where we are headed and so the concept of "trust" will change a lot.
https://www.dallasnews.com/news/watchdog/2021/03/19/its-mind...
Once it's 2 clicks away to generate a believable video of someone at the kkk kitten barbecue getting along with ted bundy and jeff epstein, surely the evidence value of that would dwindle, and the brief period in history when video evidence was both accessible and somewhat believable will come to an end.
Do you know "The boy who cried wolf"? Fabricate some allegations yourself and this will train people to disbelieve them.
You are assuming that people who are part of lynch mobs have the critical thinking skills to differentiate between real vs fake, and use logic.
Reminds me of the post I read on twitter, of some Thai/Chinese New Yorker whose mother told him not to speak Mandarin in public when COVID related Anti-Asian hate was rampant....
And he had to explain to her that she can't expect the sort of person who hits a random Asian to differentiate between Thai and Mandarin.
I recall Photoshop blackmailing stories where women were usually the target. Now literally "everyone" knows pictures can be manipulated/photoshopped. It will take a while, yes, but eventually common folk will learn that these audios/videos can't be trusted.
Even if blasphemy is illegal in your country, people would probably agree that falsely accusing someone of blasphemy is also wrong.
The religion isn't the (whole) issue here, this situation can apply in the secular West just as easily. The punishment won't be death, but it can still ruin people's lives. A fake pedophilia accusation comes to mind, where even if proven innocent you'll still be royally fucked for the rest of your life unless you spend considerable expense and effort.
Sure, not lynch mobs, but AI-generated fake media can certainly ruin people's lives, and unlike Photoshop etc., the barriers of skill and time required are very low, and the quality is very high.
I share my country's experience because I wanted to share my personal perspective and fears, but please don't underestimate how AI can affect you. Just because you won't be put to death doesn't mean they can't turn you into a social pariah with a few clicks.
In a more serious vein, this is definitely about unleashing an extremely powerful technology, at scale, for profit, and with insufficient safeguards (imagine if you could homebrew nuclear weapons -- that's inconceivable!)
There will be collateral damage. How much, and at what point will it trigger some legislation? Time will tell.
To explain for a more developed country context, the fakes that previously required skill in Photoshop, Audacity etc. are now much simpler to make with AI, allowing far more dipshits to create and share fake image/audio/video of someone they are pissed at during their lunch break on their phone.
That's way too quick, allowing people to shoot far too many arrows in a huff, before their reasonable brain has time to make them realise the consequences of their actions.
Blasphemy laws—and the violence that sometimes accompanies them—are a cultural issue, not a technological one. When the risk of mob violence is in play, it's hard to have rational discussions about any kind of perceived offense, especially when it can be manipulated, even technologically, as you pointed out. The hypothetical of voice theft amplifies this: If a stolen voice were used to blaspheme, who would truly be responsible?
This is why we must resist the urge to give into culturally sanctioned violence or fear, regardless of religious justification. The truth doesn’t need to be violently defended; it stands by itself. If a system cannot tolerate dissent without devolving into chaos, then the problem lies within the system, not the dissent.
“An appeaser is one who feeds the crocodile, hoping it will eat him last.” - Winston Churchill
Sure we have mobs and you don't, but we are talking about AI here.
In fact, let's imagine a totally different culture to illustrate my point.
Imagine you are an Israeli, and people in your office have a habit of sending Whatsapp voice notes to confirm various things instead of calls, because that way you can have a record but don't have to type every damn thing out. Totally innocent and routine behaviour, you are just doing what many other people do.
A colleague pissed at you for whatever damn stupid reason creates a fake of your voice saying you support Hamas by using said voice notes, using an online tool that doesn't cost much or require much... are you saying just because you won't be lynched, that there isn't a problem?
You are confused why everyone is pissed at you and why suddenly your boss fired you, and by the time you find out the truth... the lie has spread to enough people in your social circle that there is no clearing your name.
Think of how little voice sample data is required to generate an audio clip that sounds very realistic, and how much better it will get in a year. You don't need a fancy PC or tech knowledge for that; websites already exist that do it for cheap.
Just because you weren't lynched is no solace.
People are the problem, AI is just providing quality tools with minimal skill and cost required, thus broadening the user base.
I'm sorry, but this is a cop-out. The "lynching from apparent cultural deviation" is something that needs to be moved on from. Developed countries do the same to some extent, with "cancel culture" and such.
There are ways to make progress on this, and, well, to feed someone's entrepreneurial spirit, it's one of those really hard problems that a lot of people, let's say "a growing niche market", need solved.
But blasphemy, by whatever means, is one of the tools by which society sets certain boundaries, and it's really hard to move away from a model that worked so 'well' for us since the first civilizations.
It's in my profile :)
In about 5 years AI voices will be bespoke and more pleasant to listen to than any real human: they're not limited by vocal cord stress, can be altered at will, and can easily be calibrated by surveying user engagement.
Subtly tweaking voice output and monitoring engagement is going to be the way forward.
While AI voices will aesthetically be indistinguishable or even preferable they aren't going to carry any reputation or authenticity, which by definition is scarce and therefore valuable. In fact they're likely going to matter more because in a sea of generic commodified slop demand for people who command unique brand value goes up, not down. That's why influencers make the big bucks in advertising these days.
The fact is, Elecrow's a company I've worked with in the past (never signed any contracts, but reviewed a product of theirs 4 years ago that they provided). They're active in the exact same space my YouTube audience is (Pi, microcontrollers, hobby electronics, homelab).
There are a number of potential Elecrow customers who also subscribe to my YouTube channel (one of them alerted me to the tutorial series, in fact), and I would rather not have people be confused thinking I've sold my likeness or voice to be used for corporate product tutorials.
Especially any competitors to Elecrow, who I may have a relationship with, that could be soured if they think I'm suddenly selling my voice/online persona for Elecrow's use.
There is not enough voice space to accommodate everyone. Authors would like to fence off and own their little voice island. For every voice there are thousands of similar ones.
There are already VTubers whose whole visual identity is synthetic. Why wouldn't the same happen in any other space where performance can affect the perception of content, but you can now simply engineer the performance?
Like I said: give it 5 years and you'll have influencers who no one has ever heard the voice of, because they don't make content with their own.
> training <
They offer different voice cloning techniques today, starting from 30 seconds of audio input (sounds somewhat like the cloned voice but definitely not exactly the same) to multiple hours of voice input (sounds like the actual person). In addition, you can adjust the voices with a few parameters or simply create one by defining parameters.
The voice from the video could be an 'instantly cloned' voice based on a few seconds of voice input (judging from the quality). If you want to do a more advanced clone, you have to prove that it is your own voice.
But we know it does matter - i.e. there's research which shows a good sound quality on a voice call improves whether people believe what you say[1].
Now in any individual session, you probably can't make particularly big alterations, but imagine say, Google or Amazon shipping a modified voice assistant voice as "the default" with every new speaker box? Whether people ask for the default voice, or change it, would all become data which tells you what people are responding to. And so right there, your new "voice of Google" or "voice of Amazon" you use in other places now becomes informed by wide-scale testing of whether people listen to it.
And that's presuming no one simply runs studies where they stick people in fMRI machines and play them an AI voice recording which they modulate according to neural feedback till it's "optimal".
[1] https://today.usc.edu/why-we-believe-something-audio-sound-q...
But aside from this nostalgic-ish specific context, I don't see why they wouldn't just create a synthetic voice to begin with.
I believe the point here is to litigate it before it can just freely synthesize 100 voices it stole without compensation.
We've been able to produce "voices" for decades. The issue isn't the tech so much as its training set.
One can't help but wonder what theft even means any more, when it comes to digital information. With the (lack of) legal precedent, it feels like the wild wild west of intellectual property and copyright law.
Like, if even a superstar like Scarlett Johansson can only write a pained letter about OpenAI's hustle to mimic her "Her" persona, what can the comparatively garden-variety niche nerd do?
Like Geerling, feel equally sad / angry / frustrated, but merely say "Please for the love of all that is good, be nice and follow an honour code.".
For this kind of misuse, the person needs to have some fame, or it's not interesting to steal their voice. In such cases, their fame can be used for retribution. E.g. I can't imagine that this will be good for the reputation of Elecrow in the end. Next time I read the name of this company, I'll think oh it's that company that is scamming people, not good for them.
I am more worried about the cases where someone uses this to e.g. get rid of someone they don't like. E.g. imagine some university lecturer who has done nothing wrong; a student who is not happy with their grade uses voice cloning to imply that the lecturer said something that could get them fired. With voice cloning getting really good, how can someone like that defend themselves? (Until this becomes so commonplace that recordings are not trusted anymore.)
This can still be very useful when used against non-famous people e.g. in a bitter custody dispute by one party to besmirch the other.
Theft requires the loss of benefit of the stolen object to the victim. Copy & paste just blows over the house of cards that is the system which threatens people with cages and poverty if they use the claimed meme and not pay. I will jury nullify all copyright infringement cases I end up on, where the defendant is human and not a corporation.
> One can't help but wonder what theft even means any more, when it comes to digital information.
I'm not sure this is _just_ a digital problem. Did Eric Schmidt not recently say that you should steal things and let the lawyers figure it out later if you're successful?[0,1]
[0] https://x.com/alexeheath/status/1823873344133062680
[1] I mean he said you should legally steal things... whatever that means...
Copyright seems to always have one or another wild wild west going on. Maybe you are in the wrong place if the world constantly jumps and kicks from under you trying to throw you off?
They dragged the term through different phases, but that’s just projection of will. Theft is undefined for objects with .copy() interface. It’s still there when you look at it.
People have to adjust expectations, not laws. Computers replaced computers, now voice acting replaces voice actors. Your popularity doesn’t mean anything really and wouldn’t it be unfair if only popular could spare their jobs.
> Computers replaced computers, now voice acting replaces voice actors.
It's incredible what web development does to someone's ability to communicate ideas.
In other words, that's just the normal lifecycle of words in a language with an active speaker community. In any stage of history, the meaning of words is just the speaker community's projection of will.
Best I can do now is acknowledge that what counts as "theft" is a complicated topic and can't be decided by a binary "is said object still there after alleged theft has occurred?". I've benefited from some digital theft, naturally, so I might be biased to uphold my own morality but the kind of theft contemporary AI tech has enabled is something else entirely. Somewhere there is where I draw the line.
Recently, I introduced a few friends to the works of digital artist wlop. The immediate reaction was "Is that AI?". I can't help but feel offended on behalf of wlop. It doesn't help that they have made LoRAs out of his work. It's not so much the "theft" of techniques/concepts/etc. that enrages me but rather, the theft of credibility that a human is capable of this output. I imagine Jeff Geerling (and, to a lesser extent, maybe ScarJo) is enraged along similar lines. In this AI summer, other people are fighting for their livelihoods, other people are fighting for their credibility. And, of course, there's an intersection of people whose credibility is their livelihood.
Note that in reframing it as theft of credibility, the owning party has been definitely injured to an extent. As in, said object (credibility) is no longer what it once was after alleged theft has occurred.
And I'm not trying to state some Universal Truths that I will debate to death. Again the whole point is that what counts as "theft" is a complicated topic. I'm sure if you spend a bit more brainpower, you can find analogies that will make me look like a hypocrite. I'm just seeing this community lately strongly signal towards preserving some "original" meaning of words in the belief that it will solve some problem or another and I'm tired of it; I have similar linguistic thoughts about the whole uproar on the term "hallucination" but that's for another comment thread essay.
> People have to adjust expectations, not laws.
I know this thread is about theft but this attitude is downright dangerous in general. People should expect laws to adjust, lest they become irrelevant. Quick example: it's not fair to tell workers to adjust their expectation in light of the emergence of the gig economy. Should they just expect their labor to be exploited then, moving forward? I say, absolutely not. Legislation should catch-up to uphold/strengthen labor laws. Replace "gig economy" with "AI" and we are sort-of back on topic.
Question is: who is to say how much is needed before it escapes likeness theft? The king of generic nerd voices is going to claim excessive likeness and the accused lifter isn't going to reveal his whole process. Also tuning AI voices by ear is surely possible soon so category kings are not saved by demanding to be left out of training. A ministry of voice authority sounds bleak.
You are crazy.
Call your congressperson, ask them to co-sponsor and/or vote for it.
https://www.cbsnews.com/losangeles/news/california-bills-pro...
https://salazar.house.gov/media/press-releases/salazar-intro...
https://files.constantcontact.com/1849eea4801/695cfd71-1d24-...
Politicians’ careers live and die in the fickle Court of Public Opinion. They’re probably the most susceptible cohort to AI fakes.
One of the rare times, it seems, that politicians’ incentives are aligned with the populace. (Yes, I could have left that last part out.)
The Camry class needs its defenders, I wholeheartedly agree, but it’s also a core principle of contemporary praxis that you gotta let people choose their comfort level/ability to contribute. Encourage, promote, embolden — but try not to shame :)
Anyway, something tells me this blog post is gonna be more than enough. I don’t think basically anyone is on the side of stealing people’s voices, it’s just intuitively icky in a way that scraping the NYT and deviantart archives for training data isn’t. Public shaming isn’t gonna win him a big sack of damages, but it doesn’t seem like that’s what he’s after!
"If a product is being sold that predominantly exploits the commercial value of an individual's identity, that product should be held to violate the right of publicity and not be protected by the First Amendment, even if there is some "expressive" content in it that might qualify as "speech" in other circumstances."
Whether or not the voice is determined to be predominant would be for courts to decide, of course, but there's clearly an argument.
1: https://law.justia.com/cases/missouri/court-of-appeals/2006/...
Describing a reasonable legal principle in terms of physics phenomena does not make it unreasonable.
However, in this situation, the right of publicity is probably more applicable.
If you argue similarly, then the whole juridical system is nonsensical, because everything is just particles and waves, and different configurations thereof - not to mention the many protected things that are acts, which are neither particles nor waves, and are completely made up.
I'd say it's desirable to regulate something like this, however nonsensical-seeming, so that we can at least somewhat protect the individuals, and the general well-being of society.
She has/had two numbers; magic jack and google. When I tried to call her, the magic jack was no longer in service and google said something about "unavailable".
I reached out to my cousin (my aunt's daughter) to inquire. I was told her number (and perhaps other things) had been "hacked", whatever that means. She had recently broken her hip and was in a hospital recovering.
With this on my mind, I received a call (from the google number), strangely, while processing files with GPT. My skepticism was primed and ready, possibly making me paranoid. However, I did my due diligence and asked dozens of questions, mostly boring things that she typically wouldn't have patience for. Sometimes she'd reply with a reasonable answer and sometimes not, which made it difficult to evaluate. Toward the end, I asked where she was. She said, with an awkward tempo "I'm at home, in Cuenca", which I found odd because she'd normally just say she was at home, period. I then pressed her to tell me where she was before she returned home. She said she didn't understand. I rephrased the question, stating that it was a simple inquiry, eg "where were you before going home?" She said "this is getting too strange and confusing " and killed the call.
I notified my cousin, telling her I thought something was suspicious, still cognizant of all the characteristics one would expect from a 90 year old recovering from a serious injury. My cousin might, technology wise, be in AOL territory.
About 5 days later, I received a call from my aunt, on the google line. This time, I was more passive and cautious, but again, asked dozens of boring questions to probe the situation. I was surprised by both her ability to answer certain questions and also her inability to answer some questions. I tried to ask questions on topics we'd never discussed, in case the line had been tapped for a long time and referencing was established by an imposter. I had begun to suspect I had been paranoid. But several aspects were burning me: 1) typing noises in the background 2) Shatneresque pauses for nearly every reply 3) refusal to answer some specific questions.
At the end of our apparent conversation, I asked her to do a very serious favor for me: send me a selfie, with one hand making the thumbs up gesture. She replied "I'll send you a photo of my passport ". I replied "that's stupid, ridiculous and serves no purpose. Don't do that. Understand? Do NOT send me a passport photo. I'm asking you something very important. Do exactly what I asked. Will you do this?" Her reply: "yes. What is your email address?" This was odd. I told her she already knew and it's the same one she'd had for years. She asked that I tell her anyway. Ok, 90 years old, traumatic injury, possible prescription drugs... "It's my full name @ xyzmail com". We killed the call.
I immediately called my cousin and told her of my suspicions, including some of my aunt's babbling about all her finances and accounts being inaccessible. She said that was strange because she had just deposited 8k into her account. Meanwhile, a notification appeared on the phone: an email from my aunt. It was a photo of her passport.
Having no authority in this situation, but plenty well annoyed, I immediately jumped on a real computer and ran the photo through exiftool. The photograph was taken in 2023 and it was August of 2024. I then grabbed the geo coordinates (cryptically presented in exiftool) and with some effort, geolocated the image to right on top of her former residence, in Cuenca.
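On the "cryptically presented" part: exiftool prints GPS values by default in degrees/minutes/seconds, something like `2 deg 30' 0.00" S`, which most map sites won't take directly. Here is a minimal Python sketch of the conversion to decimal degrees, assuming that default output format. `dms_to_decimal` is my own helper name, and the coordinate in the usage note is made up, not the one from the photo:

```python
import re

# Matches strings such as: 2 deg 30' 0.00" S
_DMS = re.compile(r"""(\d+)\s*deg\s*(\d+)'\s*([\d.]+)"\s*([NSEW])""")


def dms_to_decimal(value: str) -> float:
    """Convert a degrees/minutes/seconds string to signed decimal degrees."""
    match = _DMS.search(value)
    if match is None:
        raise ValueError(f"unrecognized coordinate: {value!r}")
    degrees, minutes, seconds, hemisphere = match.groups()
    decimal = int(degrees) + int(minutes) / 60 + float(seconds) / 3600
    # South and West are negative by convention.
    return -decimal if hemisphere in "SW" else decimal
```

For example, `dms_to_decimal('2 deg 30\' 0.00" S')` gives `-2.5`. Paste the latitude and longitude as a decimal pair into any map search box.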
I still don't know WTF is going on and my cousin thinks I'm a dingbat. But what I know for sure, is this is an age where such things are plausible enough and will soon be inevitable. The way I think may be deranged, but I truly don't even know if my aunt still exists. But I can have a pretty compelling conversation, either with her, or something strongly resembling her, minus the Shatneresque pauses, typing noises and selective amnesia.
For example, if you asked for a selfie, she might just think you want a picture of her, and she remembers that she has a picture of herself that she took last year where she looked good (a passport photo that people dress up for), and wants you to have a good photo rather than one where she looks miserable in a hospital.
The way you tell the story makes it sound suspicious, but next time I would just be direct and tell her something seems suspicious to you, that someone could be impersonating her, so that is why you are asking.
If someone is targeting you, chances are they already saw your comment here, so that hypothetical person already knows you're on to them, in which case saying that on the phone won't give any new information away.
I couldn't include our entire dialogue in an HN comment, but yes, upon prodding as deeply as I could and running out of ideas, I explained my suspicions. The response wasn't what I expected, but it wasn't quite direct evidence supporting my concerns either.
If it was my aunt, she understands well. If not, the perpetrator does too.
One of a few other instances which got my attention was a voicemail she left, which I retain a recording of. It starts by saying her name, awkwardly, followed by a 5-8 second pause, then saying "Hi. This is <her name>." I always refer to her by her abbreviated single-syllable name, while the voicemail used her formal, full name.
I haven't heard from her since saying that if anything went wrong, I'd be looking for fingerprints on the passport.
Maybe just ask the cousin not to send any more money?
Or go there for a weekend and check?
> ...these observations hold true of singing, especially singing by a singer of renown. The singer manifests herself in the song. To impersonate her voice is to pirate her identity...
> We need not and do not go so far as to hold that every imitation of a voice to advertise merchandise is actionable. We hold only that when a distinctive voice of a professional singer is widely known and is deliberately imitated in order to sell a product, the sellers have appropriated what is not theirs...
I think this matters more in the court of public opinion than in real court in both cases though.
1. Why clone Jeff's voice?
When I was messing with Stable Diffusion using Automatic1111's interface, I noticed it came with a big list of artists to add to the prompt to stylize the image in some way. There was a big row in the media about AI art reproducing artists' work and many artists came forward feeling it was a personal attack. But... I mean the truth is more general than that. When I pressed a button to insert a random name into a prompt, my goal was not "yes give me this person's art for free", it was "style this somehow".
I wasn't personally interested in any particular artist, I honestly would have preferred a bunch of sliders.
Jeff here is clearly a good speaker. That's a practiced talent and voice actors exist because it's hard. Elecrow wanted a voice over and they wanted it to be as good as they could make it. Jeff is very good. So did they want Jeff?
I think what they really wanted was a good and cogent narration with the tenor of a person. Not a machine making noises that sound like English. If they had an easy way to get that, we wouldn't be talking about it here.
2. What function does copyright serve?
Well. I think a reasonable argument would be that if people were able to reproduce your work for free, you would quickly find yourself without a monetary incentive to make more of it.
So. What happens if you combine answer 1 with answer 2?
I think it leads to: "We should consider making it illegal to automatically reproduce the work of an artisan.", you know, the luddic argument. An argument that has been perceived to be, more or less, settled.
So it seems to me that for individuals, harms matter, and for society, they don't.
If someone cloned both Shaq's voice and Jeff's, and used them to endorse sneakers - I think it's a fair assumption that Shaq would see this as a business risk, and Jeff .. I'm going to go out on a limb, and assume he'd probably find it hilarious. Using Jeff's voice for sneakers would be more akin to your example of finding a midwestern voice with a useful corpus. Using Shaq's would be a much more obviously targeted appropriation.
What we're looking at here appears to be exactly this scenario, except this is Jeff's niche, not Shaq's. Using Shaq's voice for SBCs and related products would feel quite absurd - using Jeff's feels like a much more obviously targeted appropriation.
I think the general assumption is that they wanted to, at the very least, strongly imply his endorsement of the product or video.
Which I would say they did effectively. If I had happened on a clip of one of these videos outside the context of this controversy, I could have easily gotten the impression he was working with the vendor.
yeah and that's the problem. The style of an artist is a developed thing. To think that one could borrow your style not through learning and caring, but through mathematically analyzing the width, and colors, and patterns and applying it to random noise: that's kinda insulting. If nobody cares about my real work, why do they care about using my style, then? Develop your own and teach your AI on that, if there really isn't any difference.
People say that AI learns how a human would. But a human wouldn't (couldn't!) learn like an AI can. He can't look at the pixels, can't mechanically churn through patterns. If someone can learn from art like AI learns art, I would also be opposed to them learning anything from me :D
Jeff has worked with and endorsed some of their products before, so that puts a wrench in that theory of "well they just picked a clean voice" and makes this almost litigable.
>I think it leads to: "We should consider making it illegal to automatically reproduce the work of an artisan.", you know, the luddic argument. An argument that has been perceived to be, more or less, settled.
There's the labor argument: People whose voices are sampled should get a residual on the product they are being used for. Combine that with some sort of lack of liability on the subject when AI is used and we'd have a win-win.
But that requires money and companies don't want to pay other people. So we come to an impasse that leads to the luddite argument. Take the ball and go home if you don't want to pay. The fact that this comes into so few people's minds shows how successful companies are at casting off the idea of residuals.
Except that never happened and the voice belonged to a completely different voice actress and Scarlett Johansson had exactly zero right to prevent this person from making money as a voice actress lending it to AI.
These complaints remind me a little bit of the story that a man complained that his photo was used to illustrate the article about how all hipsters look the same and it eventually turned out it wasn't his photo.
She said no.
So they found a soundalike and Tweeted out references to the movie Her (starring Scarlett Johansson as an AI chat bot) in the days leading up to launch.
Scummy as fuck from OpenAI regardless of the technical legal rights and issues involved.
If there were just two blondes in the world, one famous and the other not, and you wanted a blonde actress for the role and the famous one said no, is it scummy to hire the other one?
Is it scummy to hire "discount" Matt Damon instead of Matt Damon?
I don't have a dog in this fight but just to be clear, OpenAI has stated that they paid a voice actor to create the voice ("Sky") that sounds like Scarlett Johansson. There was no "cloning" or "stealing" (that they say).
https://openai.com/index/how-the-voices-for-chatgpt-were-cho...
They may have been absolutely fine if Altman never approached Scarlett. But context matters.
It's kind of like saying "how do we know that face is similar/identical"? Humans are surprisingly good at knowing when something feels off in other humans. Even if we lack the vocabulary to fully explain the difference.
We’ve had people who are skilled voice mimics forever, and they mostly exercise their skills for comedy/satire, and not for misrepresenting people’s opinions. IANAL either but I guess this is based on solid legal grounds, and misrepresenting people would be relatively easy to deal with legally.
I guess the difference is democratisation - we’ve moved from very few people having this skill, to virtually anyone with a computer being able to do something similar. And so policing it will be much tougher, and likely beyond the means of someone like Jeff Geerling if it would require legal action to remedy.
Computers made graphic design approachable, but early adopters oversaturated the market before it stabilized. We’ll eventually figure out social norms and regulations for AI voice mimicry too, but there will be chaos first. Also, tech always moves faster than law. By the time courts catch up, this will be old news.
Make a video, say what you think, get views, and probably put more pressure on Elecrow to respond.
It was linked in the article.
Does this controversy all become free publicity for Elecrow?
Although it was not too hard to create, I believe making it easier is not something I'd like to achieve...
I hate to say this but ruining a narrator's existence with AI seems to get easier every day.
Regarding how easy it was to clone my favorite narrator's voice with open source tools, I'm a bit afraid of what Amazon could do with a whole cloud and massive manpower.
Is that some sort of a coat of arms?
There is absolutely zero evidence for this. I find it infuriating that this keeps being stated as a fact. So they go and hire a voice actor and clearly use her voice to train, but then they also scrape Scarlett Johansson from youtube and splice it into the training data to make the voice a bit more like hers? Really does that sound realistic?
Motive: Altman had some weird boyish thing for her and they asked her first, she said no.
Means: Lots of available data to use from her movies. They probably trained a model first without releasing it just because it’s ridiculously easy. Especially for OpenAI.
Opportunity: AI is astonishingly good at laundering and remixing without exposing the training set, for previously-unseen levels of plausible deniability.
They just about manage to make a good multimodal transformer that can generate audio and you expect that right away they can also interpolate in latent space? How does that actually work? It's not so simple. What benefit do they have from training on Scarlett Johansson's data, because they sure as hell have a big risk. They clearly hired a voice actress and they clearly told her to sound like Scarlett Johansson in "Her", and the end result perfectly fits with that. The voice doesn't "uncannily sound like SJ", no it just vaguely resembles her voice and mostly just mimics the mannerisms from the movie. For me this is a perfect example of Occam's razor. One explanation is simple and realistic. The other explanation requires significantly more advanced AI control than OpenAI has claimed/demonstrated, and it requires Altman to be so obsessed with the SJ idea that he goes out of his way to secretly train on her voice, risking legal exposure, while still hiring a voice actress.
Since that guy was CEO of Google it’s all good right???
https://www.theverge.com/2024/8/14/24220658/google-eric-schm...
We definitely need to overhaul a lot of these white collar fines. I just watched a video today and learned the federal maximum fine for being caught using child labor is capped at $15k per worker. No wonder child labor has skyrocketed over the decade.
It looks like we're heading in that direction.
IANAL and not sure about regional precedent on these topics, but there are plenty of ads where lookalikes or voice actors are used to use someone's likeness. they are mostly in satire, but there is yet to be a case where there was a litigation over this or prior approval needed.
we have ai-based voice abuse in the political sphere, and while one country passed legislation banning its use in voice calls (https://news.ycombinator.com/item?id=39304736), another country actively used the same underlying tech to aid their own rallies (https://news.ycombinator.com/item?id=40532157).
the tools are here to stay, but what is fair use needs to be defined more than ever.
Satire is one of the few use cases of fair use that hasn't been torn down. So that tracks.
> there is yet to be a case where there was a litigation over this or prior approval needed.
There's quite a few over impersonation. Most broadcast media knows how to skirt the rules though.
Same as it happens with unauthorized use of someone’s images. And platforms and their moderation teams have processes in place to report and remove that. Looks like we need something similar for voice.
You can absolutely positively find a free lawyer if your issue is interesting enough.
This is the most interesting issue of our day.
https://techcrunch.com/2024/09/19/here-is-whats-illegal-unde...
Not sure if those laws apply to Jeff tho, as they concern porn, politics and employer contracts.
[1] https://old.reddit.com/r/redscarepod/comments/1fmiiwt/which_...
Regulating prolongs adoption and takes resources.
Most likely all existing youtubers will have complete voice and video digital clones made out of them. Then you can also tune an LLM on their scripts and it'll respond in the same character as well.
In theory you could also bring back ones who are dead, which would be very interesting in a historical sense. Like if we had hundreds of hours of Napoleon talking in front of a camera, it would be trivial to recreate a digital version of him for anthropological study, maybe even having various figures debate things with each other. That's what historians a century from now, after we all die, will be able to do with impunity.
We already had fake news and organizations willingly spread fake news.
We had clearly fake pictures and people believing that.
Flat-earthers, no-vax and whatever.
This is just another brick in the wall.