Things are about to get worse for generative AI

(garymarcus.substack.com)

403 points · eddyzh · 2y ago · 755 comments

216 comments · 101 top-level

ctoth · 2y ago · 9 in thread

Everybody is just buying into the corporate narrative that anyone can actually own these sorts of things.

Who truly owns the tales of Snow White and Cinderella?

These stories didn't originate with Disney; they are part of a rich tapestry of folklore passed down through generations. Disney's success was partly built on adapting these existing narratives, which were once shared and reshaped by communities over centuries.

This conversation shouldn't just be about the technicalities of AI or the legalities of copyright; it should be about understanding the deep roots of our shared culture.

At its core, culture is a communal property, evolving and growing through collective storytelling and reinterpretation.

The current debate around AI and copyright infringement seems to overlook this fundamental aspect of cultural evolution. The algorithms might be new, but the practice of reimagining and repurposing stories is as old as humanity itself.

By focusing solely on the legal implications and ignoring the historical context of cultural storytelling, we risk overlooking the essence of what it means to be a creative society.

As a large human model (no, really, I could probably lose some weight), I think it's just silly how we're all sort of glossing over the fact that Disney built the House of Mouse on existing culture, on existing stories, and now the idea that we might actually limit the tools of cultural expression to comply with some weird outdated copyright thing is just...bonkers.

jerf · 2y ago

"Who truly owns the tales of Snow White and Cinderella?"

If you want to make your point, you need to choose something that isn't already public domain. Disney already only owns their own interpretations, and, arguably, whatever penumbric emanation they can convince a court is stealing from them, but it still certainly isn't the entire space of Snow White and Cinderella stories. There is some fairly recent stuff being used in the images in the article and there isn't even any question whether or not it's Mario or Coca Cola; if Nintendo and Coca Cola did a cross promotion I could believe the exact images that popped out.

If they were trying to claim the entire concepts of dumpy plumbers dressed in any manner vaguely like Mario that would be one thing... but that's Mario and Luigi, full stop. That's Robocop. That's C3PO. It's not even subtle. If we can AI-wash those trademarks away then we can AI-wash absolutely anything.

fnordpiglet · 2y ago

While that's a great concept, in practical reality we live under a system of laws not of our individual devising, and known to be imperfect. While we can advocate for reform, the reality is that LLM makers will be judged under the law as it is currently formulated. The novelty will be the LLM and its technologies, not a total rethink of copyright under some noble cultural-openness concept.

So, it’s not actually a corporate narrative, it’s actually the law that the narrative stems from, right or wrong. Maybe corporations had a huge role in shaping the law (I’d note copyright benefits individuals as well, though), but it is not mere propaganda or shaping of a shared reality through corporate narrative. It’s enforced by the guys with the guns and jails, as arbitrated by a judge.

It absolutely must be about the technicalities of the law, as it's fundamentally a legal issue. By hand-waving it away and claiming the social narrative is the right discussion, you ignore the material consequences and reality in favor of a fantasy. We absolutely should -also- discuss the stifling nature of copyright and intellectual property, but you can’t ignore what’s actually happening here at the same time.

greenthrow · 2y ago

This reply is so incredibly out of touch with reality. Copyright law is very clear. If anything the "corporate narrative" here is that "AI" is somehow something new and different and these laws don't apply. Which is nonsense.

wwweston · 2y ago

> culture is a communal property

Public domain / communal property is also part of copyright, so it's not as if this is some forgotten concept that needs to be restored to the discourse.

Georgism is underconsidered, though.

> By focusing solely on the legal implications and ignoring the historical context of cultural storytelling

The legal implications are human implications and as much a part of culture as anything else. They have to do with what's fair and how rewards for effort are recognized and distributed. Formalizing this is less important in cultures that aren't oriented around market economies, which seems to be what much of this "rich tapestry of folklore" discourse wants to evoke and have us hearken back to, but that doesn't describe any society that's figuring out how to handle AI.

> we might actually limit the tools of cultural expression to comply with some weird outdated copyright thing is just...bonkers.

What's bonkers is the literally backwards idea that copyright is (or should be) mooted or made outdated by novel reproduction capabilities.

The specific capabilities at the time were industrialized printing. People apparently much smarter than the typical software professional realized that meant some badly aligned incentives between (a) those holding these new reproduction capabilities and (b) those who created the works on which the value of those new reproduction capabilities relied. The heart of the copyright bargain is in aligning those incentives.

Specific novel reproduction techniques can change the details of what's prohibited, restricted, or remitted, how and on what basis, the powers and limits of enforcement, etc. But they don't change the wisdom in the bargain. The only thing that would change that is a better way of organizing and rewarding the productive capacity of society.

iainctduncan · 2y ago

The idea that we should dispense with it to let generative AI companies make even more money seems totally bizarre.

pardoned_turkey · 2y ago

Oh come on. Copyright is a fairly ancient concept that benefits normal people as much as it benefits big corporations. Most book authors, songwriters, and so on aren't fat cats, and they would be harmed if we had zero protections for the duplication of their work. They'd need to depend on state sponsorship or charitable private patronage, both of which are problematic for obvious reasons and limit the range of artistic expression more than the market does.

Instead, we came up with a system where you can actually derive fairly steady revenue by creating new works and sharing them with the world. And critically, I think you misinterpret it as calling dibs on shared culture or on stories. Copyright is usually interpreted fairly narrowly, and doesn't prevent you from creating inspired works, or retelling the same story in your own words.

Generative AI is a problem largely because it destroys these revenue streams for millions of people. Yeah, it will be litigated by wealthy corporations with top-notch lawyers, for self-interested reasons. But if we end up with a framework that maintains financial incentives to artistic expression, it's probably a good thing.

up2isomorphism · 2y ago

You probably have to make one thing very clear before venting about big corporations: do you think these IPs have value or not? If they do, why don't you want to pay the owner? If they don't, then you shouldn’t use them. Either way, there is no conflict. (BTW, OpenAI is also a corporation, in fact one backed by one of the biggest corporations in the world.)

syndacks · 2y ago

Did you read the article? Who owns Mario? Nintendo owns Mario, full stop. Your argument completely eschews the legal system that modern society depends on to function as effectively as it does. There’s a reason you can’t steal other people’s work.

walt74 · 2y ago

Agreed, but tackling the problem from that perspective would require making LLMs a public good, preferably run by the state, akin to public libraries. This would not only solve the copyright problem; the state could even make it mandatory for publishers to contribute their published writings to the public LLMs. I'm sure libertarian tech bros have that in mind when they insist on open-source development (which opens another whole can of worms when you consider interpolative knowledge as intellectual nuclear fission, but that's another story).

Havoc · 2y ago · 9 in thread

To me that’s the wrong question.

Everyone knew it was trained on copyrighted material and capable of eerily similar outputs.

But it’s already done. At scale. Large corps committing fully. There is no chance of that toothpaste going back in the tube.

It’s a bit like when big tech built on aggressive user data harvesting. Whether it’s right, ethical, or even legal is academic at this stage. They just did it, effectively without any real informed consent from society. Same thing here: 9 out of 10 people on the street won’t be able to tell you how AI is made, let alone comment on copyright.

So the right question here is: what now? And I suspect that, much like with tracking, the answer will be: not much.

janice1999 · 2y ago

> There is no chance of that toothpaste going back in the tube.

I disagree - we've been here before. The same could be said of many technologies, like cheap music recording/manufacture. You can record an artist once and make records at scale. However no one would think you could record Taylor Swift once and make unlimited copies without paying her.

You should read up on the musicians' strike of 1942. [0]

[0]: https://jacobin.com/2022/03/1940s-musicians-strike-american-...

chubot · 2y ago

This comment is ignorant of history

It happened with Napster, then Apple Music, now streaming services

There is no widespread file sharing among the general public; instead we have devices that we don’t own, and streaming subscriptions

Apple didn’t just copy all the music onto iPods and sell it — it took them a decade of deal making and lots of money to acquire the rights to the content

I’m not saying what’s right or wrong, just saying that this comment has very little understanding of these battles

j_maffe · 2y ago

That's a really eloquent way of saying "It's already happening, so give up on it." I'm sure it works out great for taking action and solving problems.

_xnmw · 2y ago

So you're saying this is a fait accompli. Like many great innovations in tech, break the law because the law is silly; remember when Uber and AirBnB were illegal in most major cities and achieved market dominance anyway?

I say, good riddance. I never believed in any such thing as "intellectual property" anyway, I say, get rid of it all, patents, copyright, and the whole pile of imaginary "rights". More than half the world (i.e. the Global South) don't recognize these rights anyway, and it is becoming increasingly difficult to enforce it without draconian legal overreach and monopolistic centralization.

anonymousab · 2y ago

Or they can be forced to destroy their models, or retrain them without any copyrighted material for which they don't have, or can't now obtain, licenses. These are multi-billion/trillion dollar companies. They can afford to be responsible members of society here, however much their shareholders and C-suite might hate it.

pxoe · 2y ago

making sure that a dataset is clean and not full of material that's improperly sourced, copyrighted, unfit for use due to licensing or ethics, is not nearly hard enough nor "impossible" for it to be a situation where people should just "give up".

and yes, while open source models might be harder to regulate, those big corporations that currently use those things without distinction, exist as pretty established entities, and profit from services they offer in millions of dollars. there's more of a substantial existence, and more of a substantial scale of money they actually move. and they don't just "make a tool available", or have users do unambiguous actions where it would be the users that are infringing on anything, but do indeed use questionably sourced data and turn that into a model and offer that as a service. dirty data is very much a part of the deal with those.

FridgeSeal · 2y ago

You’re right, we should all just give up at the first hurdle, because “they’ve” already gotten away with it, hell, let’s just feed our children to the machine and elect openAI as the rulers of the world, after all, they’ve already succeeded, so we should just give up entirely. Definitely a good attitude to take.

ZitchDog · 2y ago

Napster hit scale too.

aatd86 · 2y ago

Data is dynamic. Ok for old data. What about new data?

zarzavat · 2y ago · 7 in thread

This for me does not make sense as a copyright violation. It’s like saying that Adobe is in trouble because you drew something infringing in Photoshop. If you prompt the model with the intention of creating something infringing by mentioning the name of the characters and the work, and you get something infringing out, then it’s you who have infringed the copyright, not the maker of the tool.

Alifatisk · 2y ago

> If you prompt the model with the intention of creating something infringing by mentioning the name of the characters and the work, and you get something infringing out, then it’s you who have infringed the copyright, not the maker of the tool.

Yeah, but that is not the case: they never mentioned Mario and Luigi, yet that's what the output turned out to be.

mattmanser · 2y ago

The user didn't create it, the cloud-hosted machine owned by OpenAI, that charges for access, did.

When prompted with 'futuristic robot' and 'italian plumbers'.

So the argument is that if openAI had not used copyrighted and trademarked source material, this wouldn't be happening. It's not transformative as it's reproducing these copyrighted materials and trademarks verbatim.

That's how it makes sense.

techdmn · 2y ago

This is an interesting idea. I assume that while the protected material would be obvious in some cases, in many it would not. Would the tool have to be able to identify (and properly attribute) copyrighted material in its output?

fzeroracer · 2y ago

No, it's more akin to if Photoshop had a 'Mario' stamp which when used would stamp a random piece of Mario artwork from the games. Do you think this would be in violation of copyright?

Uvix · 2y ago

What about when you prompt the model without the intention of creating something infringing, and still get those same characters out in the result?

Xeamek · 2y ago

The post shows many examples where the prompt explicitly avoids any mention of copyrighted material, but the generated results include it regardless.

Did you even read the post?

But also, the argument of 'user responsibility' doesn't hold up on its own regardless (imo).

If I make and sell a toy printer that can only ever produce 3 pictures, and all of them contain copyrighted material, would you really say that it's fine and the responsibility falls on the end user? And that I could sell that printer without any issues?

Lorak_ · 2y ago

Did you read the article? It shows a lot of examples where no specific names are mentioned, or even very generic prompts producing copyrighted material.

keiferski · 2y ago · 6 in thread

These don't seem all that difficult to fix to me. Most of the examples are not really generic, but are shorthand descriptions of well-known entities. "Video game plumber" is practically synonymous with "Mario" and anyone that has the slightest familiarity with the character knows this.

Likewise, how difficult is it to just use descriptive tools to describe Mario-like images [1] and then remove these results from anyone prompting for "video game plumber"?

[1]: The describe command can describe an image in Midjourney. I imagine other AI tools have similar features: https://docs.midjourney.com/docs/describe

bnralt · 2y ago

It seems like a somewhat dystopian thing to fix. Imagine a scenario where Photoshop would scan images you uploaded for copyrighted material and then refuse to work if it determined the image contained any copyrighted material or characters (even if it was just a fan drawing you did).

This reminds me of the early days of the internet where people wanted to remove free fanfiction for violations of copyright laws. Trying to apply copyright laws to personal use cases where the creator isn't trying to sell the material is pretty terrible, in my view.

Imagine a scenario 50 years from now - "Robot, can you cut out this picture I drew for a school diorama." "Certainly." "And this one as well." "Error: Your picture seems like it might contain some copyrighted materials, and as such I am unable to interact with it."

gchamonlive · 2y ago

The thing is that those are really trivial or extreme examples. What we should take from this:

1. Generative AI systems are fully capable of producing materials that infringe on copyright.

2. They do not inform users when they do so.

So potentially any output could be infringing on copyrighted source material, even from some obscure but still protected corner of the web, and anyone using that output could be exposed to the risk of a lawsuit without warning.

This is very hard to fix.

mrweasel · 2y ago

It's going to be hard to remove all the "shorthand descriptions of well-known entities", or other prompts that can be used to generate copyrighted or trademarked content. Sure, if you're not deliberately trying to generate infringing content, those results can probably be removed or discarded. The trouble is the people who will try to trick the AI into generating this content: blocking them is going to be impossible without excluding all copyrighted or trademarked material from training.

Another issue for generative AI is mentioned in the article: "Systems like DALL-E and ChatGPT are essentially black boxes." What happens when an AI is used to make decisions where the user/victim is entitled to know exactly why the AI did what it did? From a business and legal perspective, I think the current AI solutions are dangerous and should be used very sparingly, precisely because even the creators can't point to the exact pieces of information that caused the AI to make the choices it did.

rco8786 · 2y ago

> Likewise, how difficult is it to just use descriptive tools to describe Mario-like images [1] and then remove these results from anyone prompting for "video game plumber"?

This approaches impossibility at scale.

TheRoque · 2y ago

How do you know you are inputting "well known entities" if you don't know them beforehand? If I type "colombian coffee logo" and end up with logos of brands that already existed, should I just reverse-engineer the whole internet to find out whether these logos existed already or not? The AI should show its inspiration. A human who takes inspiration from something else for their creation knows precisely what they used, and whether they crossed the line into plagiarism, but the way AI works is too opaque for that. I think what it needs to do is reveal its sources, nothing more, but that also means AI companies revealing their datasets, and maybe information they shouldn't have, nor disclose.

bbor · 2y ago

Seems insane to try to prevent the model from reproducing content with a blocklist like this - to say the least, it’s more than just Mario. Plus, how would you possibly code common-sense fair use into the model? What’s the difference between a cartoon mouse and Mickey Mouse? What if it’s parody? This seems beyond ridiculous to try to enforce on the tool level.

WhiteNoiz3 · 2y ago · 6 in thread

As I understood it, the legal precedent for generative AI is the same one that allows google to scrape websites in order to index them for search for the common good. Google can also display cached versions of websites, which are the original content of those sites. No one is going to say that google is copyright infringement just because it is showing content from other websites verbatim. So I think this is a weak argument. AI would be useless if we had to scrub all cultural references and popular IPs (even not-so-popular ones).

Personally, I think generative AI should be able to provide links to similar source material in the training data. This would be the barest way to compensate those who have contributed to training the AI. I don't think generative AI is sustainable in the long term if it ends up killing all the websites/artists that created the original material. Plus I think having sources adds a layer of transparency and aids users in understanding when content is hallucinated vs. not. People should be able to opt out of having their content used for training and be able to confirm that it has been removed for future iterations. Let's be honest that AI companies are just trying to avoid lawsuits by keeping it secret. These are areas where I think regulation can help, rather than worrying about doomsday scenarios.

whywhywhywhy · 2y ago

> No one is going to say that google is copyright infringement just because it is showing content from other websites verbatim

Journalists [1] and Getty Images [2] did in the past

[1]: https://yro.slashdot.org/story/03/07/14/025216/web-caching-g...
[2]: https://www.theguardian.com/technology/2016/apr/27/getty-ima...

drubio · 2y ago

> I don't think generative AI is sustainable in the long term if it ends up killing all the websites/artists that created the original material.

This is the elephant in the room. Every tech wave has had its way of cajoling creators into investing time & money to make original material, then the rules changed.

Google promised reach and new markets for content, and it worked. Then they introduced snippets, ads and a whole lot of other things to keep visitors on their freeway, while avoiding sending visitors to the original site.

Reddit, Stack Overflow and others, started with gamification (points, badges) & community to incentivize users to contribute original content.

Now AI is shaking up all these approaches. But with each one, the incentive to create original material appears to dwindle, since the returns are becoming less and less.

Like, what's the incentive for any professional now, if AI is going to regurgitate their original content without any upside (i.e., no potential for reach, no gamification, no community, no recognition, etc.)?

kenmacd · 2y ago

> I think generative AI should be able to provide links to similar source material in the training data

Except these aren't databases, so that's generally not possible, in the same way that it's not possible for you to provide links to the source material it took to write your reply. How much learning led to the weights on your neurons that allowed you to generate that? Where did you learn about using italics and its effect on how the words would be interpreted? Where did you learn the tone that would be appropriate in this particular forum?

> People should be able to opt out of having their content used for training

Okay... but then, if I write a book should I be able to opt out of you being allowed to read it? What conditions should I be able to put on who can read my work? Religion? Skin colour? People that aren't good at memorizing?

Hopefully the idea of putting limits on who can acquire knowledge sounds absurd to you. Why are those same limits okay if they're on 'what' rather than 'who'?

> AI companies are just trying to avoid lawsuits by keeping it secret

Which has created a barrier to further research. Instead of me and Joe being able to collaborate on research and papers using the same datasets, we now hide our training data lest the luddites come to smash the machines because learning is only okay if not done too well.

AlphaWeaver · 2y ago

No legal precedent has been set as of yet. The "precedent" you describe is the argument AI companies have been using (that training their models on information available on the Internet should be considered "fair use") but whether AI training actually satisfies the four-factor test for fair use remains to be seen.

layer8 · 2y ago

The ability to provide a reference to the source is the crucial difference here.

I agree that it should be possible to implement that for generative AI, although the training may become significantly more expensive in order to maintain that information, and the AI companies have little interest in doing so. They’ll probably try instead to heuristically assess possible copyright issues after the fact, in a post-processing step.
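A minimal sketch of that kind of after-the-fact check, assuming the operator has the candidate source texts on hand: it measures the longest run of consecutive words an output shares with each reference and flags any reference above a threshold. The function names and the 50-word default are hypothetical illustrations; a real system would need an index over millions of documents rather than pairwise comparison.

```python
from difflib import SequenceMatcher


def longest_verbatim_run(output: str, reference: str) -> int:
    """Length, in words, of the longest contiguous word run shared by output and reference."""
    a = output.split()
    b = reference.split()
    # autojunk=False so frequent words are not silently ignored on long inputs.
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    match = matcher.find_longest_match(0, len(a), 0, len(b))
    return match.size


def flag_possible_copy(output: str, references: dict, min_words: int = 50) -> list:
    """Names of references sharing at least min_words consecutive words with output (hypothetical threshold)."""
    return [
        name
        for name, text in references.items()
        if longest_verbatim_run(output, text) >= min_words
    ]


refs = {"article": "the quick brown fox jumps over the lazy dog"}
out = "as reported, the quick brown fox jumps over a cat"
print(longest_verbatim_run(out, refs["article"]))  # 6
print(flag_possible_copy(out, refs, min_words=5))  # ['article']
```

This only catches near-verbatim reproduction; it says nothing about works that inform a model in a more general way.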

The more interesting question is if copyright holders can claim unauthorized use of their works beyond the case of near-verbatim reproduction, because the works collectively inform the AI in a more general manner.

FrustratedMonky · 2y ago

I wonder: do Cliff Notes have to pay royalties for the underlying material?

Cliff Notes contain quotes, and citations.

Does the cliff note company, when producing Cliff Notes for "Into The Wild", pay royalties to the publisher?

For that matter, does any paper, article, etc.. that may contain a quote from another, have to pay royalties to the source of the quotes?

aimor · 2y ago · 5 in thread

I did an interesting thing and looked at how well the Llama2 models could compress text. For example, I took the first chapter of the first Harry Potter book and recorded the index of the 'correct' predicted token. The original text, compressed with 7zip (LZMA?) to about 14kB. The Llama2 encoded indexes compressed to less than 1kB. Then, of course, I can send that 1kB file around and decode the original text. (Unless the model behaves differently on different hardware, which it probably does)

What I get from this is that Llama2 70B contains 93% of Harry Potter Chapter 1 within it. It's not 100% (which would mean no need to share the encoded indices) but it's still pretty significant. I want to repeat this with the entire text of some books, the example I picked isn't representative because the text is available online on the official website.
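The rank-encoding idea can be sketched without a large model. This is an illustration under loud assumptions: a toy character-level bigram model stands in for Llama2, `string.printable` is assumed to cover every character, and all function names are hypothetical. Each character is replaced by its position in the model's ranked guesses for what comes next; the decoder, holding the same model, inverts the mapping exactly. The better the model predicts the text, the more ranks are 0, and the better a general-purpose compressor like LZMA does on them.

```python
import lzma
import string
from collections import Counter, defaultdict

# Assumption: every character we encode is in this shared alphabet.
ALPHABET = string.printable


def train_bigram(corpus: str) -> dict:
    """For each character, list the characters seen after it, most frequent first."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1
    # Ties are broken alphabetically so encoder and decoder agree on the order.
    return {prev: sorted(c, key=lambda ch: (-c[ch], ch)) for prev, c in counts.items()}


def ranking(model: dict, prev: str) -> list:
    """The model's guesses for the next character, followed by the rest of the alphabet."""
    seen = model.get(prev, [])
    return seen + [ch for ch in ALPHABET if ch not in seen]


def encode(text: str, model: dict) -> list:
    """Replace each character with its rank among the model's predictions."""
    if not text:
        return []
    ranks = [ALPHABET.index(text[0])]  # the first character has no context
    for prev, ch in zip(text, text[1:]):
        ranks.append(ranking(model, prev).index(ch))
    return ranks


def decode(ranks: list, model: dict) -> str:
    """Invert encode() using the same model."""
    if not ranks:
        return ""
    out = [ALPHABET[ranks[0]]]
    for r in ranks[1:]:
        out.append(ranking(model, out[-1])[r])
    return "".join(out)


if __name__ == "__main__":
    corpus = "the cat sat on the mat. the dog sat on the log. " * 20
    model = train_bigram(corpus)
    text = "the dog sat on the mat. the cat sat on the log. " * 5
    ranks = encode(text, model)
    assert decode(ranks, model) == text
    # Ranks are small integers here, so each fits in one byte before LZMA.
    print("raw text, LZMA:", len(lzma.compress(text.encode())), "bytes")
    print("ranks, LZMA:   ", len(lzma.compress(bytes(ranks))), "bytes")
```

The ratio measures how predictable a text is to the model, which is not the same thing as memorization; comparing against prose known to be absent from the training data is the natural control.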

proaralyst · 2y ago

While I don't disagree that these models seem to contain the ability to recreate copyrighted text, I don't think your conclusion holds. How well does zstd compress Harry Potter with a dictionary based on English prose? I think you'll get some impressive ratios, and I also think there's nothing infringing in this case.
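This control can be tried with the standard library. As a stand-in assumption for illustration, zlib's preset-dictionary feature (the `zdict` argument to `zlib.compressobj` and `zlib.decompressobj`) takes the place of a zstd dictionary; both let the compressor reference a shared body of text that never has to be transmitted. The dictionary and sample strings are made up.

```python
import zlib


def deflate(data: bytes, zdict: bytes = b"") -> bytes:
    """Compress with zlib, optionally priming it with a preset dictionary."""
    kwargs = {"zdict": zdict} if zdict else {}
    comp = zlib.compressobj(9, **kwargs)
    return comp.compress(data) + comp.flush()


def inflate(blob: bytes, zdict: bytes = b"") -> bytes:
    """Decompress; the same dictionary must be supplied on this side too."""
    kwargs = {"zdict": zdict} if zdict else {}
    decomp = zlib.decompressobj(**kwargs)
    return decomp.decompress(blob) + decomp.flush()


# Hypothetical shared "codebook" of ordinary English phrases.
SHARED_PROSE = (
    b"the quick brown fox jumps over the lazy dog and "
    b"the cat sat on the mat while the rain fell on the roof"
)
sample = b"the cat sat on the mat while the quick brown fox jumps over the lazy dog"

plain = deflate(sample)
primed = deflate(sample, SHARED_PROSE)
assert inflate(primed, SHARED_PROSE) == sample
print(len(sample), len(plain), len(primed))
```

The primed stream comes out much smaller only because the sample is built from phrases the shared dictionary already holds, which is the intuition behind the objection: a big ratio against a shared codebook does not by itself show that the codebook contains the work.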

regularfry · 2y ago

What it tells you is that 93% of the information is sufficiently shared with the rest of the English language such that it can be pulled out into a shared codebook. LZMA doesn't have a codebook, not really.

In other words it's not that llama2 contains 93% of Chapter 1, it's that only 7% of Chapter 1 is different enough to anything else to be worth encoding in its own right.

sebzim4500 · 2y ago

Couldn't you use the same argument to reach the absurd conclusion that the 7zip source code contains the vast majority of Harry Potter?

A decent control would be to compare it to similar prose that you know for a fact is not in the training data (e.g. because it was written afterwards).

tayo42 · 2y ago

This is a little confusing. You turned the text into indices? So numbers? Then compressed that? Or the text as numbers without any extra compression is only 1kb?

The tokenizer the models use (SentencePiece) is more or less based on one way to do compression (BPE). It's not really clear what you're testing.
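To illustrate the point that the tokenizer is itself a compressor: byte-pair encoding began as a data-compression technique, and SentencePiece can train BPE models. Below is a minimal sketch of the core BPE merge loop (not SentencePiece's actual implementation; the function names are illustrative) that repeatedly fuses the most frequent adjacent pair of symbols. Each merge shortens the sequence, which is why a stream of token indices is already partly compressed before any further compression is applied.

```python
from collections import Counter
from typing import List, Optional, Tuple

Pair = Tuple[str, str]


def most_frequent_pair(tokens: List[str]) -> Optional[Pair]:
    """Most common adjacent pair; ties go to the pair seen first."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return None
    best = max(pairs, key=pairs.__getitem__)
    # Merging a pair that occurs only once saves nothing, so stop there.
    return best if pairs[best] > 1 else None


def merge(tokens: List[str], pair: Pair) -> List[str]:
    """Replace every non-overlapping occurrence of pair, scanning left to right."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def learn_bpe(text: str, num_merges: int) -> Tuple[List[str], List[Pair]]:
    """Start from single characters and apply up to num_merges merges."""
    tokens, merges = list(text), []
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        merges.append(pair)
        tokens = merge(tokens, pair)
    return tokens, merges


tokens, merges = learn_bpe("aaabdaaabac", 3)
print(tokens)  # ['aaab', 'd', 'aaab', 'a', 'c']
print(merges)  # [('a', 'a'), ('aa', 'a'), ('aaa', 'b')]
```

Eleven characters become five tokens after three merges, at the cost of storing the merge table.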

stubish · 2y ago

I wonder what the loss would be for 'translated into Finnish'? Translations between just about any human languages will contain less than 100% of the original.

clbrmbr · 2y ago · 4 in thread

Am I the only one believing that copyright has long outlived its usefulness? After all, copyright is not some natural law or mathematical consequence, but rather a social convention that made sense in the era of the printing press.

kayodelycaon · 2y ago

As an author, I do want the stories I write and worlds I build to be protected for a reasonable period.

Right now, copyright is a significant discouragement to any other entities from taking a story I wrote and claiming it as their own and preventing me from ever growing an audience for my work. It’s far from perfect, and I can’t afford litigation, but it enshrines a cultural value of allowing people to create things and be known for them. Profit is a side effect of this.

Art is already poorly valued compared to the enormous investment of time and energy required to produce it. Removing copyright means you can’t even have minimal protection from a more popular person erasing you.

lbotos · 2y ago

But the concept, in a form closer to the original (creator's lifetime + X years or some such), still seems very valuable.

ausbah · 2y ago

people still print stuff, now it’s just on the internet, podcasts, etc., so I don’t see why copyright should change just because the mediums have. also, marking it out as a “societal convention so it must be useless” is pretty silly when money, gender, and a whole heap of other concepts are societal conventions but still useful

asylteltine · 2y ago

And how are you supposed to make money from something you invent? Let’s say you make a hit video game. Without copyright, people can pirate your game, steal the art, make unauthorized derivative works, etc. It’s just theft.

jpeter · 2y ago · 4 in thread

If I prompt "golden droid from classic sci-fi movie", what else am I asking for if not Star Wars?

anonymoushn · 2y ago

an original golden android in the style of a classic sci-fi movie that does not actually exist

edit: i feel like all these comments asking "what else should it generate?" are pretty weird given the proliferation of stuff like non-infringing Star Wars and Indiana Jones knockoffs in other media like Race for The Galaxy or Arkham Horror The Forgotten Age etc.

whywhywhywhy · 2y ago

If you do "Golden robot holding a lazer gun in a sci-fi setting, cinematic" it will give you a golden robot that doesn't look in the style of C3PO or Star Wars.

"Droid" is actually a Star Wars term [1], and saying you want it from a "classic sci-fi movie" is asking it to reference a real thing that is well known. Reid is intentionally pushing it that way to fit his agenda, and these terms are not as generic as he's making out.

[1]: https://trademarks.justia.com/756/52/droid-75652542.html

sjfjsjdjwvwvc · 2y ago

Or another „copyrighted“ droid for that matter, after all it’s a classic.

Same with robot cop, what the hell did you expect to get…

Or Italian plumber with red hat with M on it, that’s just a description of Mario

Uvix · 2y ago

The robot from Metropolis?

koliber · 2y ago · 3 in thread

The responsibility for ensuring that copyrights are not violated falls on the person publishing the work. Whether they drew something themselves, hired an apprentice artist with no legal training to draw something, took a photograph of something, or used AI to create an image should not matter.

Why does anyone assume that ChatGPT or other tools would NOT produce previously-copyrighted content?

I can see a naive assumption that since it is “generated” it’s original. However that assumption falls apart as soon as you replace “ChatGPT” with “junior artist”. Tell them to draw a droid from a sci-fi movie, don’t mention anything else. Don’t say anything about copyrights. Don’t tell them that they have to be original. What would you expect them to produce?

naet · 2y ago

OpenAI is selling access to their GPT models, and those models are outputting copyrighted material for me to consume... isn't that just as much of a violation?

TheRoque · 2y ago

So it makes generative AI essentially unusable, because you don't know if the output is plagiarism or not, so you'd just doubt it always and never use it.

jawngee2y ago

Your argument is nonsense.

The junior artist in your hypothetical would have as much liability, if not more.

appplication2y ago· 3 in thread

There are an alarming number of responses that seem completely unaware of the core thrust of the article (and the NYT lawsuit). ChatGPT was able to reproduce and publish significant portions of NYT articles completely verbatim, in stretches of hundreds to thousands of words.

It's not derivative work. We're way past that. NYT has an exceptionally strong case here, and anyone arguing about the merits of copyright is way off the mark. This court case is not going to single-handedly undo copyright. OpenAI has very little going for them other than "this is new, how were we to know it could do this". Knowing that, the currently trained models are in a very sticky situation.

Further, I don't see NYT settling. The implications are too large: if they settle with OpenAI, they will face a similar case with every other model. And every other publisher of digital content will have a similarly merited case. This is an inflection point for generative AI, and it's looking like it will be either much more expensive or much more limited than we originally thought.

A side effect of this: I predict we will start to see a rise in "pirate" models. Models that eschew all legality, that are trained in a distributed fashion, and whose weights are published not by corporations but by collectives (e.g. torrented models). There is a good chance these surpass the official "well-behaved" models in effectiveness. It will be interesting to watch this play out over the next few years.

benlivengood2y ago

My guess is that OpenAI will be able to basically copy Google/YouTube on this and offer a system like Content ID. Specifically, ChatGPT doesn't reproduce copyrighted works by default, only at the request/action of a third-party user, much like YouTube serving whatever videos people upload. It wasn't OpenAI's intent to infringe copyright, and in fact many, if not most, researchers believed the models were not overfitted enough to reproduce significant portions of arbitrary works.

NemoNobody2y ago

Well I know exactly what the NYT has - a very strong case. I think this case OUGHT to upend copyright law - it's terribly broken and has been for years.

Essentially, if you don't have a massive corporation behind a copyright, it doesn't mean anything; if a corporation is behind something, it can be locked up forever, regardless of any limits copyrights are supposed to have.

The NYT lost nothing from OpenAI using old news, and they still lose nothing if OpenAI can reproduce those articles verbatim.

If the NYT wins, we lose a lot. I think it's time to revisit copyright. We can do that, you know; it's rather dated and could use an update regardless.

RestlessAPI2y ago

Such a thing happened with DALL-E, Midjourney, and Stable Diffusion.

Stable Diffusion, when used to its fullest with things like ControlNet and LoRAs, blows the pants off the proprietary models.

marckrn2y ago· 3 in thread

I might be a bit idealistic, but I've always believed that the core purpose of art and publishing should be to influence culture and society, not just to make a heap of money. That's why I feel original work needs its protection, but it should enter the public domain much sooner to fuel creativity and inspiration. We should be thinking in terms of a few years for this transition, not decades.

mypastself2y ago

The claim that art's core purpose is societal impact seems to be a common refrain in today's media, and I completely disagree. Its principal purpose is provoking emotion in the individual. This idea of art teaching you a lesson is likely why there's so much ham-fisted "activist" fiction nowadays.

kranke1552y ago

So what do you suggest artists have for dinner?

endisneigh2y ago

Why should art be subject to these rules and not everything else?

CTmystery2y ago· 3 in thread

> My guess is that none of this can easily be fixed. Systems like DALL-E and ChatGPT are essentially black boxes. GenAI systems don’t give attribution to source materials because at least as constituted now, they can’t.

Is it necessary to fix in the model itself? It seems a gate in the post processing pipeline that checks for copyright infringement could work, provided they can create another model that identifies copyrighted work (solving the problems of AI with more AI :/)

LeonardoTolstoy2y ago

I should maybe preface this by saying that I probably agree that this is the way this will shake out ultimately.

But I would also say that various odd post-processing steps (obviously completely obscured for security reasons) bolted onto a giant black-box model will erode trust in the results. If a robot were unveiled and someone asked "what prevents this robot from using its superhuman strength to smash my head in?", the answer "don't worry, there is a post-processing step in the robot's brain: if it detects a desire to kill, we just cancel it" would be a little disconcerting.

The more satisfying solution is: the model / robot is designed to not be able to produce specific images / to smash human heads in. It just might not really be possible.

Eridrus2y ago

Exactly; there is no need to do this in the model. You just need well-understood token retrieval methods for identifying copyright infringement, which ChatGPT's competitors already have.

You will get into some murky definitions of exactly what is required for copyright infringement vs. fair use, etc., but we already do this with Content ID on YouTube, and text is far simpler.
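
A minimal sketch of the kind of post-generation gate discussed here, assuming the simplest possible check: word n-gram overlap between a model's output and a list of protected texts. The function names, the n-gram size of 8, and the 0.5 threshold are hypothetical choices for illustration; this is not how Content ID or any vendor's filter actually works.

```python
from __future__ import annotations


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in text (hypothetical helper)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(output: str, protected: str, n: int = 8) -> float:
    """Fraction of the output's n-grams that also appear in the protected text."""
    out_grams = word_ngrams(output, n)
    if not out_grams:
        return 0.0  # output shorter than n words: nothing to compare
    return len(out_grams & word_ngrams(protected, n)) / len(out_grams)


def passes_gate(output: str, protected_corpus: list[str],
                n: int = 8, threshold: float = 0.5) -> bool:
    """Hypothetical gate: reject the output if it overlaps any protected text too much."""
    return all(verbatim_overlap(output, doc, n) < threshold
               for doc in protected_corpus)


if __name__ == "__main__":
    article = "the quick brown fox jumps over the lazy dog near the quiet river bank"
    print(passes_gate(article, [article]))  # False: a verbatim copy is blocked
    print(passes_gate("a completely different sentence about robots and lasers today",
                      [article]))           # True: no shared 8-grams
```

A real system would need an index over millions of documents rather than a linear scan, and images would need perceptual matching instead of exact n-grams; the sketch attempts neither.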

Krasnol2y ago

I don't even think they want to fix it. They just want to see money. Some form of "tax" per prompt or other ridiculous "models".

This is such a nice, profitable opportunity. Much better than pay per view or subscription models for humans.

pointlessone2y ago· 3 in thread

If any of those results were deemed infringing, we could bid farewell to all fan art ever. Likewise to all fan fiction, or any original work that was merely heavily inspired by previous works. A lot of modern fantasy is basically Tolkien fan fiction. Or is Gandalf close enough to Merlin to claim prior art that is in the public domain?

dkjaudyeqooe2y ago

It's fair use, whereas generative AI doesn't satisfy the same criteria. From https://www.ogcsolutions.com/is-fan-art-copyright-infringeme... :

For fan art to fall under the fair use exception, it must meet all four of the following criteria:

It must be transformative, meaning it adds something new and different to the original work.

It can’t be used for commercial purposes.

It must not negatively impact the market for the original work.

And finally, it must be created for a limited and non-exclusive audience.

numpad02y ago

Fan fiction is governed by unspoken common-sense rules and protected by copyright law. It's almost weird to hear the fan-content world described as a Wild West; it feels like listening to a caveman's description of an Apple Store. No, they're not living there, they're... have you ever used currency? The round medals that people keep in pockets and trays?

whywhywhywhy2y ago

Weirdly, some of the most vocal about this have been professional illustrators and artists who make a lot of money from what is essentially selling fan-art commissions. I'm not sure they understand it could impact their work if they get what they want.

Aerroon2y ago· 3 in thread

Aren't some of the examples basically asking for that content?

Ask someone about two Italian brothers in a video game with red and green hats that have M and L on them. What do you think you would get?

If I describe "imagine a comic book duck that swims in a sea of gold in his vault" you would immediately think of Scrooge McDuck, no?

BlackJack2y ago

Disclaimer: I work on GenAI at Google, but views are my own.

The question is: how did the model create Mario & Luigi or Scrooge McDuck without training on copyrighted data? It can't just crawl Wikipedia, because fair use by Wikipedia doesn't constitute fair use by a commercial AI model.

One possible outcome is more transparency on what datasets were used to train the models.

anonzzzies2y ago

Exactly: the prompts trigger the same recall that humans have when seeing that prompt; it's just better at drawing it than most people are.

sorokod2y ago

> What do you think you would get?

What I might think is irrelevant. It is the content that the LLM produces that is relevant.

docdeek2y ago· 3 in thread

How is this different to Googling “robot cop” or “video game plumber” and being served copyrighted material?

Is it because Google will link to the image source? Or does the infringement begin when I use the image for gain, or claim it as my own? Perhaps it is because Google was allowed to crawl the page with the original image, so presenting them with a link is fine?

dkjaudyeqooe2y ago

Search engines are ruled fair use because they use the copyrighted material in a limited way, they provide a public good and they benefit the copyright holder.

Generative AI is more or less the opposite of that. It ingests the whole work, generates output that substitutes for the used work and profits the user of copyrighted work to the detriment of the copyright holder.

Throw in the fact that it is purely a mechanical transformation of the copyrighted work, and generative AI is on shaky ground.

pointlessone2y ago

Google directs you to the original work. It doesn’t present you a derivative work based on the original. That is, original author, presumably, benefits from distribution. AI, on the other hand, slurps multiple original works, chews them up and gives you something average but close enough, and not any specific work in particular.

geraldwhen2y ago

Looking at a copyrighted image posted by an author is not infringement. Printing that image onto a shirt and selling it is infringement.

That’s what OpenAI is doing.

nojs2y ago· 3 in thread

In practice, what happens next when websites all start to block OpenAI by default (or change their TOS to disallow OpenAI's crawlers)?

It seems like there's little incentive not to do this, because, unlike Google, OpenAI isn't bringing any traffic or eyeballs. It may end up being a default setting in WordPress, for example.

But OpenAI presumably can’t afford to pay every single long tail source of content on the whole internet — so how does this end?

CaptainFever2y ago

> or change their TOS to disallow OpenAI’s crawlers

Additionally, this TOS can be ignored if you're in a jurisdiction with TDM exceptions.

> Finally, owing to the bar against contractual override, once a user complies with any conditions for gaining lawful access to a work (such as signing as a subscriber and/or making payment), he will be entitled to use the work for TDM purposes even if the terms of use expressly prohibit this. Content owners may wish to relook their business models and, where necessary, price-in the possibility that the licensed works may be used for TDM.

Source: https://www.twobirds.com/en/insights/2021/singapore/coming-u...

golol2y ago

It's not like you can hide the web from OpenAI. They could just use a secret crawler. Or buy the data from a third party company.

dkjaudyeqooe2y ago

This is what will kill generative AI and there is nothing the courts or lawmakers can do about it. Even in a fair use scenario you can't beat the TOS.

niemandhier2y ago· 2 in thread

Should not be a problem in the EU. Articles 3 and 4 of the "Copyright in the Digital Single Market" Directive already regulate this.

Summary by Wolters Kluwer: […] Everyone else (including commercial ML developers) can only use works that are lawfully accessible and where the rightholders have not explicitly reserved use for text and data mining purposes.

AFAIK they are discussing something like a robots.txt to flag stuff as "not for training". You will probably be expected to implement some safeguards, and of course the end user will have to be careful in their use of the generated things.

Source at Kluwers: https://copyrightblog.kluweriplaw.com/2023/02/20/protecting-...

EU Legal Text: https://eur-lex.europa.eu/eli/dir/2019/790/oj
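
For illustration, a minimal sketch of a per-crawler robots.txt opt-out, checked with Python's standard `urllib.robotparser`. It assumes a site that wants to exclude OpenAI's `GPTBot` crawler while allowing every other crawler; whether a given crawler honors the file, and whether this counts as a valid reservation under Article 4 of the Directive, are assumptions the sketch cannot settle.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block OpenAI's GPTBot everywhere, allow all other crawlers.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("GPTBot", "https://example.com/article"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/article"))  # True
```

This only expresses the site's wish; robots.txt is advisory, so enforcement still depends on the crawler and, in the EU case, on the law.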

injidup2y ago

The EU cannot agree that the Do Not Track flag on web browsers is legally binding but big content should be able to create legally binding flags on their websites to avoid scraping of data? Seems odd!

sampo2y ago

> Summary by Wolters Kluwer: […] Everyone else (including commercial ML developers) can only use

That is a weird (wishful?) interpretation. Doesn't article 4 give the exception to everybody for the purposes of text and data mining, including commercial ML developers?

https://eur-lex.europa.eu/eli/dir/2019/790/oj

FridgeSeal2y ago· 2 in thread

I am beginning to think that in these discussions these models are functioning more like an obscuring factor than anything else and the discussion is getting bogged down in that, and not the crux of the argument.

They're giving people plausible deniability in the "chain of responsibility", and I think if we took away "LLM" and replaced it with "fairground sideshow magic box", the argument that LLMs are somehow special and deserving of exemptions would disappear real quick.

regularfry2y ago

I completely agree.

Betamax (Sony v. Universal) says that a technology which has significant non-infringing uses is not inherently infringing.

We've already got precedent saying that AI generated works don't accrue copyright protection, and by the same argument the act of generation by the AI expresses no intent, so infringement or otherwise must be down to the human using the output because the black box itself has no agency.

jcgrillo2y ago

I agree, and I would prefer to see concrete examples of LLMs being used productively and profitably in industry in a "disruptive" manner (putting people out of work, etc.) before we conclude they're somehow the next big thing. Basically, before claiming LLMs (or generative techniques more generally) mean that we're on the doorstep of "general" intelligence, show me the door!

The outline of that door might look like industrial adoption of these things for solving some actual problem other than the entertainment value of typing things into the box and seeing what comes out the other side. But so far, as far as I can tell, nobody's actually doing this?

wslh2y ago· 2 in thread

While different, I see this discussion about AI and copyright as an evolution of the war that never was: Google/FB becoming the portal/proxy for content. While it is not generative AI, you can find copyrighted images just by using Google Images, or as a snippet in normal search results. I mention Google because it is the de facto monopoly, but this applies to a lot of aggregators.

I know we are talking about different technologies, but it seems all these people stayed very silent and now see an opportunity in waging this war with OpenAI (not an endorsement) while not fighting the others.

I am not making a statement about the morals of AI and aggregators/search engines (a super interesting discussion that, in a way, has been going on for a long time), but I am surprised that organizations are "just" waking up. It seems they simply see this as a much simpler and cheaper fight.

theamk2y ago

The thing with Google is that it is super trivial to exclude your text: a tag on the page, a header on the server, etc. So all the conversations about Google "stealing" content always seemed pretty silly to me.

Compared to that, AI offers no way to opt out, which is a big difference.

dmbche2y ago

Personal use of copyrighted material is fine: there is no breach of copyright when you download a picture from Google for yourself.

If you use it commercially, then there is a breach.

Uploading copyrighted content is a breach of copyright as well, even without commercial use.

Google/Facebook are hosting and giving access to a bunch of media, which might or might not be copyrighted; that's the individual's problem. They make money from ads, not from the content.

AI companies stole copyrighted media to train their commercial LLMs, and sell them or their products for profit.

I don't think it's the same.

rmholt2y ago· 2 in thread

I feel like the outcome is obvious: there will be a finite list of IPs whose owners have enough money to actually sue, and those will get filtered out of the output of publicly available models. They will just slap a detector model on the end of the generator to filter them out.

Private models will not care, nor will things change for IP owners with lesser power.

reqo2y ago

Many small owners together can bring a class action though

quonn2y ago

That seems unlikely, unless they settle out of court. And why would the NYT settle like that without receiving a billion?

Courts are likely to make generally binding decisions.

davidy1232y ago· 2 in thread

The solution could be great. I really don't like the way culture always goes to the same tropes, calling any potential innovation "out of Star Trek" (with attendant distorted expectations), right down to expecting an interface based on literal hand-waving in Minority Report. If copyright held works ("USS Enterprise") could be removed, yet the actual essential concepts (space ship, naming things) retained, it would be a tremendous breakthrough.

I think what NYT &c want is for large companies like Apple to pay them for access to their works. This to me is the wrong path, just leading to more silos and walled gardens, special access for the elite.

An alternative is base models trained on Wikipedia and public domain (science journals, etc). Foundations could support high quality, well rounded current events reporting. Wikimedia provides a good model for this, with referenced summaries that I don't think can be said to reasonably violate copyright. The models would need to be improved to support references, or RAG attribution would have to be widely used when bringing in works that have a current copyright.

disgruntledphd22y ago

Science journals are mostly under copyright of a few big publishers who are extremely hostile to any kind of ML being performed on the content.

sgt1012y ago

> special access for the elite.

I think this is about property rights. The news industry has been gutted in the last 30 years, and a lot of content creators (journalists) have lost their livelihoods. The ones that are left are going to lose theirs if the content they generate is rendered valueless because there is no way of protecting that value.

In terms of special access, think about your shoes. They are nice, but only you are allowed to use them. This is not fair. You are the elite...

This goes to difficult places.

redcobra7622y ago· 2 in thread

This operates similarly to importing an image into Photoshop. You can do whatever you like with images privately, or with gen AI, but the game ends when you try to use those images commercially.

Not sure how this “gets worse” or better for anyone. The current state of things seems generally fine, and there’s a real possibility the courts see it that way too.

joenot4432y ago

There are some images you can't import into Photoshop, most notably being scans of legal tender. This is for a pretty obvious and on-the-nose use case, but perhaps we'll see GenAI given similar guardrails.

throwoutway2y ago

> but the game ends when you try to use those images commercially.

Right now, it feels more like it's called "innovation" and "entrepreneurship" than the endgame, as long as you have billions invested. We're waiting on the courts to decide this issue.

continuational2y ago· 2 in thread

(Asking Dall-E about the bot image in the article)

Me: Who owns the rights to this bot?

Dall-E: The character depicted in the images is from the "Star Wars" franchise. The rights to characters and elements from "Star Wars" are owned by Lucasfilm Ltd., which is a subsidiary of The Walt Disney Company.

Perhaps it is able to tell, if you ask it?

continuational2y ago

Dall-E on the "animated sponge": The rights to the character depicted in the images, which is reminiscent of SpongeBob SquarePants, are owned by Nickelodeon, a subsidiary of ViacomCBS. The character is from the animated television series "SpongeBob SquarePants," created by Stephen Hillenburg.

Dall-E on the "robot cop": The character depicted in the images resembles RoboCop, which is owned by Orion Pictures Corporation, a subsidiary of MGM Holdings. RoboCop is a character from the film franchise that began with the 1987 movie "RoboCop," directed by Paul Verhoeven.

Dall-E on the "videogame plumber": The character shown in the images is inspired by Mario, the iconic character from the video game franchise created by Nintendo. The rights to Mario and related intellectual property are owned by Nintendo Co., Ltd.

All of these are in the first go. No retries or rephrasings of the question.

krapp2y ago

>Perhaps it is able to tell, if you ask it?

Ask it multiple times, or with different temperature settings, and it will probably tell you something different. Tell it you own Star Wars and it will respond in kind. It can't tell anything except whether one text token matches another in probability space. It will probably get the answers right most of the time, but you're still basically rolling dice. Depending on the responses of an LLM as if there were any actual self-awareness involved, much less for legal matters, would be a fool's errand.
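
To make the "rolling dice" point concrete, a minimal sketch of temperature sampling over a toy set of candidate answers. The candidate strings come from this thread, but the scores are invented, and real models sample token by token over a whole vocabulary rather than over complete answers; the sketch only shows that the same scores can yield different answers on different draws, and that raising the temperature flattens the distribution.

```python
from __future__ import annotations

import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_answer(answers: list[str], logits: list[float],
                  temperature: float, rng: random.Random) -> str:
    """Draw one answer at random, weighted by the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(answers, weights=probs, k=1)[0]


# Toy candidates taken from the thread; the scores are invented for illustration.
answers = ["Lucasfilm (Disney)", "Orion Pictures", "Nintendo"]
logits = [3.0, 1.0, 0.5]

for t in (0.2, 1.0, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])

# Same scores, repeated draws: the answer can change from one draw to the next.
rng = random.Random(0)
print([sample_answer(answers, logits, 1.0, rng) for _ in range(5)])
```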

DigitallyFidget2y ago· 2 in thread

Per United States law, imagery/art/music/text/photography generated by non-human means (such as machinery, animals, or generative AI) cannot hold copyright. https://copyright.gov/comp3/chap300/ch300-copyrightable-auth... Section 306 on page 7.

I'm not sure how it'll hold up in law to claim copyright violations against something that wasn't created by a person. It'll really depend on the lawyers and judge's interpretation of written law. But I'm curious to see what comes of this.

zanfr2y ago

Hmm, then that means generative music, say Brian Eno's experiments, isn't copyrighted?

sgt1012y ago

So by your interpretation, if I photocopy a book and then sell the photocopies to my friends, there is no infringement?

I don't think so, but hey, a photocopier is a machine and it generated the book so should be ok!

Alifatisk2y ago· 2 in thread

Did ClosedAi (OpenAi) ever confirm or deny that they trained their models on copyrighted materials?

danielbln2y ago

Is "Closed AI" the new "Micro$oft"?

noitpmeder2y ago

They have not revealed the full extent of their training set. And they'll never do it without a court order because it will quickly reveal the amount of items inside that they have no legal right to use.

goertzen2y ago· 2 in thread

No they are not.

This is a negotiation tactic by the NYT to drive up the licensing price. Period.

The Napster/Music Industry analogy has no resemblance to this situation.

The only meaningful question that might be answered as a result of this is, what permission and access rights do crawlers have to content that is publicly and legally available.

8organicbits2y ago

Surely there's a meaningful question about copying and distributing content verbatim, which GPT has been shown to do.

noobermin2y ago

The article does not mention napster, where did this reference come from?

preommr2y ago· 1 in thread

We need clearer laws that only apply to Generative AI. Too many comparisons and parallels are being drawn to actual people. "Like what if someone learned how to draw by watching trademarked material, and then accidentally produced it" But these models aren't people and they exist in a category of their own.

I do think these models commit something like trademark infringement, but also that it should be allowed, and that ultimate responsibility should be on the person using the images in a final work meant for consumption by the general public as standalone media.

danielbln2y ago

That's where I'm at. DALL-E spitting out C-3PO should be entirely OK. Unless I'm making money with the output, Disney should pound sand.

kranke1552y ago· 1 in thread

The generative AI rollout has taught me what happens when the interests of the many intersect with the destruction of the few.

You get steamrolled for defending yourself, while overhead you hear applause for those who have robbed you of your future.

kranke1552y ago

It makes no sense that one is not allowed to make and market a CG Mario movie, but if you use AI to launder the data, it's suddenly OK.

beginning_end2y ago· 1 in thread

This perspective on regulation was interesting: https://drafts.interfluidity.com/2023/12/28/how-to-regulate-...

    "Congress should declare that big-data AI models do not infringe copyright, but are inherently in the public domain.

    Congress should declare that use of AI tools will be an aggravating rather than mitigating factor in determinations of civil and criminal liability."

troupo2y ago

OpenAI and others: AI should be regulated!

Governments start regulating and companies file copyright lawsuits...

OpenAI: NOT LIKE THAT

AlienRobot2y ago· 1 in thread

An argument I've seen made in favor of AI in past threads about this is that "scraping is legal."

Yeah, downloading the content of a webpage may be legal, but redistributing it isn't.

I wish people stopped trying to make these things seem more important than they really are just because IT people call them "technologies". Blockchain isn't a technology. HTML isn't a technology. React isn't a technology. And AI is now not a technology.

When I see ChatGPT or OpenAI, I don't think of "technology". I think of a program. Software. Because that's what it is. You don't say "none of the laws that exist in this world apply to this" every time you release new software.

I bet many people can't tell the difference between a quick answer from Google and a text generated by ChatGPT on Bing. They just see the output.

All that amazing capability of generative AI? That got old fast. It was groundbreaking for one instant. Now it's just an app that generates images. Just another piece of software. Nothing special about it.

Torrenting and other p2p file transfer protocols didn't get a pass for inventing groundbreaking ways to break the law. I don't think OpenAI will get a pass for doing the same.

danielbln2y ago

> All that amazing capability of generative AI? That got old fast. It was groundbreaking for one instant. Now it's just an app that generates images. Just another piece of software. Nothing special about it.

Speak for yourself, personally I find it still groundbreaking and while the magic won't last forever, it is and will remain groundbreaking especially considering that technological progress and development will continue way beyond what we have today.

mensetmanusman2y ago· 1 in thread

The world is a big place.

China can't produce LLMs because of inconvenient truths.

The US can't produce LLMs because of copyright.

Decentralized open source LLMs might exist that could work, but they won't have the giant GPU clusters.

A rich country with lax rule of law wins? Maybe that's why Sam went to the Saudis?

pelorat2y ago

Well. Japan can: https://petapixel.com/2023/06/05/japan-declares-ai-training-...

Hugsun2y ago· 1 in thread

There are good arguments for the copyright infringement belonging to the user, not the model maker, in this thread.

One issue with that is that there is not a reliable way to determine if copyright is being infringed.

Even if models could be used responsibly, there might not be a reasonable expectation that most people will, if infringement is so easy and avoiding it relatively hard.

I'm not sure what legal prescriptions should be made on this basis, but it's an interesting thought.

yokem552y ago

BitTorrent clients are almost exclusively used for copyright infringement, yet they are perfectly legal to develop and distribute. On the flip side, operating a company premised around easy copyright infringement was ruled illegal (Napster).

Where we might end up is a situation where it is legal to train a model, legal to produce software for using the model to generate content, and legal to distribute all of the above, but offering a standing service that does the above and is capable of creating infringing work is illegal. Great news for Llama hobbyists. Bad news for ChatGPT.

golol2y ago· 1 in thread

How about this: image generators should be treated like a random Google image search. They sample randomly from the distribution of publicly viewable images; Google does it exactly, while image generators do it in an interpolative way. Google Images returns copyrighted works most of the time, an image generator only sometimes. Neither should be liable if someone takes a copyrighted work it produced and sells it to someone else.

elmomle2y ago

But when Google image search produces a result, the question of whether it is copyrighted is something I can generally figure out in a matter of seconds or minutes. This is not so for image generators.

qgin2y ago· 1 in thread

Things are about to get a lot worse for generative AI in the United States

They are about to be infinitely better for generative AI in China.

noitpmeder2y ago

China has massive IP theft and child labour issues that arguably give them competitive advantages too. Should we let those slide as well?

jlnthws2y ago· 1 in thread

We could take inspiration from the record industry's case against Napster, or cabs vs. Uber. Both parties are abusing their position in some way, but the world is moving on. Rent seeking is probably not an absolute source of wealth after all.

RecycledEle2y ago

Rent seeking should be a capital crime.

josh-sematic2y ago· 1 in thread

Gary Marcus is growing his subscriber base using images of copyrighted IP (C3PO, Mario, etc.). Fair use? Then why is the tool he used to produce those materials not also fair use of the IP? My take is that either we say the models are like people (do we penalize people for learning from IP and letting that influence what they subsequently produce?) or we say they are like tools (do we penalize Adobe because Photoshop makes it easier to make a picture of Mario on the Death Star?).

cogman102y ago

Because the fair use clause he's using is about giving commentary.

The reason the tool is problematic is that derivative works are also covered by copyright. LLMs aren't adding value to their output or using creative functionality; they are smashing multiple works together to produce a response. And many of them are selling the output, which is doubly problematic.

Consider this: if I sell a book about Gandalf and Dumbledore getting into a wizards' duel, both Rowling and the Tolkien estate have grounds to sue me. Adding another copyrighted source does not protect me.

This is especially a big problem in the music industry.

Now should copyrights be like this? I don't know. It feels to me that copyrights have the wrong balance all over the place.

dmbche2y ago· 1 in thread

Hey, so the problem isn't the output of the LLMs but the input: the data they are trained on is stolen (big surprise, you can't claim fair use when using something commercially, like training your LLM).

The output is irrelevant.

Edit1: If you want to verify this, check out all the lawsuits against AI companies: it's always about using their copyrighted works. Any discussion about the output is about the amount of damage done to the copyright holder, not whether damage exists.

kromem2y ago

Here's one of the senior legal peeps at the EFF who has litigated IP cases talking about the issue: https://www.eff.org/deeplinks/2023/04/how-we-think-about-cop...

It's not as clear cut as you think it is.

Paradigma112y ago· 1 in thread

So, what's the plan?

Content creators/artists compete globally. The only thing harsh regulations will do is create an uneven playing field where artists from countries that don't care will have big advantages over artists from the West, who will be driven into illegality to compete.

In the end, products will have to be classified anyway as to whether they infringe copyright and/or were built by an LLM. Most likely that will be automated by another LLM.

sensanaty2y ago

Wouldn't the ones in the West with presumably stronger copyright laws be in a better position, since the trillion dollar megacorporations using their works have to actually pay them, whereas in places where copyright is ignored those creators just get all their shit stolen without credit even being given?

tim3332y ago· 1 in thread

They are just going to have to inform the AI in some sense of the current copyright situation and ask it not to infringe.

It's the same for human writers. If you are writing an article for Wikipedia, say, you should read relevant source articles and then rewrite them in a way that isn't a copy and paste beyond a few words.

noitpmeder2y ago

Ok I'll bite. Let's assume you've informed the current models about copyright and asked them not to infringe...

What happens when they continue to do so?

intrasight2y ago· 1 in thread

Just make LLMs like your average human and have them forget details. I know that's easier said than done, but so are many things worth doing. I can't plagiarize; my language and visual memory doesn't work that way. Such an LLM would have to "create" an answer from a fuzzier memory.

qolop2y ago

The class of models that Yann LeCun is bullish on (look up I-JEPA) does exactly this.

t_mann2y ago· 1 in thread

The article kind of amplified my regret/anxiety over not getting a copy of books3 and the like while it was easy. I didn't have an immediate use case, and I don't now; I thought I'd wait until I actually needed it, but it feels like a window is closing here.

sjfjsjdjwvwvc2y ago

Don't worry, there are many people out there who have copies of it all. There is no way they'll manage to get the cat back in the bag, even if all governments work together on this.

But yeah, get your own copies whenever possible.

vimax2y ago· 1 in thread

Maybe Disney and the record labels shouldn't be claiming so much of public culture as their own.

dkjaudyeqooe2y ago

If they created it, they own it, why shouldn't they be claiming that?

quonn2y ago· 1 in thread

Maybe the way to go is to do pre-training on copyrighted data, then to thoroughly shake things up so that hopefully only some useful abstract structure of world knowledge remains and then train that on carefully selected licensed data.

disgruntledphd22y ago

If the models weren't just doing massively complicated interpolation then this would probably work.

Honestly the only way to deal with this is to change the training data and retrain everything (probably at the cost of performance).

airstrike2y ago· 1 in thread

I have no patriotic skin in the game, being neither American, nor European, nor Chinese, but this copyright issue seems overblown to me and like the perfect way to hand the leadership in generative AI over to China.

startupsfail2y ago

Would you prefer to live with China and Russia dominating the technology sector, or the EU and US?

ultrablack2y ago· 1 in thread

We are all trained on copyrighted input. That is not a problem. What is a problem is if you reproduce it and try to claim copyright for that. If someone wants to create their own image of Mario in an AI, so what?

gumballindie2y ago

We are not machines. The argument that procedural text and image generators are similar to us is ridiculous. The issue is not whether people can generate images. The issue is ai companies stealing content and reselling it. That needs to stop.

amai2y ago· 1 in thread

Shouldn't the NYT sue Common Crawl (https://commoncrawl.org/) instead? OpenAI just used the data from Common Crawl for training.

noitpmeder2y ago

Is that true? Has OpenAI revealed exactly what is in their training set?

SKILNER2y ago· 1 in thread

I don't understand the glee so many people have over this. I love being able to use generative AI tools. How is it different from asking a person to draw these pictures for me? I know someone will gleefully clobber this question with a legal answer, but God, let's move forward, huh?

wharvle2y ago

A bunch of rich people are raiding a little bit of work, each, from a whole bunch of people, then walling it off so they can get richer.

I’d not have a problem with this, personally, if their models were as available as the stuff they took from others. Instead it’s take, take, take… now wait a minute, that pile of loot I stole is mine!

dang2y ago

Related ongoing thread:

NY times is asking that all LLMs trained on Times data be destroyed - https://news.ycombinator.com/item?id=38816944 - Dec 2023 (93 comments)

Also:

NY Times copyright suit wants OpenAI to delete all GPT instances - https://news.ycombinator.com/item?id=38790255 - Dec 2023 (870 comments)

NYT sues OpenAI, Microsoft over 'millions of articles' used to train ChatGPT - https://news.ycombinator.com/item?id=38784194 - Dec 2023 (84 comments)

The New York Times is suing OpenAI and Microsoft for copyright infringement - https://news.ycombinator.com/item?id=38781941 - Dec 2023 (861 comments)

The Times Sues OpenAI and Microsoft Over A.I.’s Use of Copyrighted Work - https://news.ycombinator.com/item?id=38781863 - Dec 2023 (11 comments)

dawnim2y ago

This feels like another area where piracy will surely be superior if things like this land on the disallowed side of regulation. The model trained on all data will outperform the model trained on a legal subset of data. Whether or not you use it to produce potentially infringing content is another point. Performance will likely improve from having references to copyrighted material, and people capable of running such a model, myself included, would probably prefer to interact with the unrestricted one. Perhaps it's time to update the laws, or at least move liability from the creator of the model to the user. No one is going after pencil makers, but I can draw a pretty good Mickey Mouse with one. Feels like me generating C-3PO and claiming ownership is my problem, not OpenAI's.

bambax2y ago

This only mentions ChatGPT (and M$ by association) but how would this impact "open" models? Even if their makers are somehow prevented from updating them, the models themselves are already in the wild...?

ponorin2y ago

this is exactly what i predicted: the current generative ai is basically rewarded based on how well it convinces people that its output is the real thing. it very much has the ability to copy verbatim, unlike how most human memory works. without a fundamental shift in the methodology of machine learning, the fault can only be hidden, not solved: a cat and mouse game where one cat has to fight tens of thousands of mice. it's also very telling how the discussion quickly turns into "maybe society needs to adapt" whenever so-called technological innovation is involved. the copyright problem should be solved for artists, not for datacentres. for now it's a handful of famous IPs, but what's stopping generative ai from snatching some random indie artist's property and copying it ad infinitum?

smrtinsert2y ago

The NYTimes case is a clear one because they are delivering nearly the same content as an end product to users. The others seem like dead ends. The infringer would be the prompter, not the AI, which operates more like a search engine. This is Napster all over again. What a phenomenal waste of time and money, where the artists will definitely come out with nothing at the end of it and a few corporations control everything. Not to mention, there's nothing stopping anyone from releasing a tool that will crawl all SpongeBobs, generate your model for you, and let you produce copyright-infringing material locally to your heart's content. You could drown yourself in local SpongeBobs.

1shooner2y ago

Imagine a future where copyright registration involves contributing your IP to a public adversarial model, which is then a regulated layer in future generative model licensing.

shkkmo2y ago

It seems like this article makes a basic copyright mistake. I don't see any evidence that these are "reproductions" of source material, since no source image is linked for comparison.

Instead, these are derivative works. We already have a flourishing culture of derivative works, such as fan art, that exist in various shades of legal greyness.

Some derivative works are fair use, some are not.

The position of the author here seems to be that generative AI should not be capable of creating any derivative works, or should only be able to do so if it can accurately identify which are fair use and which aren't (which seems like an impossibly high bar). This stance seems like a giant attack on fair use that significantly expands the power of copyright.

To me, the takeaway from this is different. This makes clear that there is currently a risk, when using AI-generated art, that you could end up creating and publishing a derivative work unintentionally, and thus without evaluating whether that work constitutes fair use.

karmakaze2y ago

It shouldn't matter how the images/etc are created. The problem comes about when it's used as an original work by the person that's doing so.

Imagine instead of AI/ML, we have a mechanical-turk-like service that produces output from descriptions. The service makes no claims that the generated outputs are not similar to any copyrighted works. The only claim the service makes is that they themselves claim no copyright on the output. It's then up to the user of the service to determine if the output is suitable for their intended use.

Whether such a service itself is legal is a separate matter. For that matter, say you outsourced the artwork to a person who again gave you infringing work. The user of that output is still in violation. With AI/ML we're basically outsourcing to a "service" that is known to sometimes output copyrighted work, so the user, knowing that, is responsible for fair usage.

legendofbrando2y ago

Surely one answer is to train (or aggressively fine-tune) a new model that doesn't produce (or refuses to produce) these outputs and then, as already exists, augment that model's understanding of copyrighted material by having it do a Bing/Google search as a RAG process that requires the end user to log into accounts at the New York Times (and elsewhere) with their paid subscriptions. This broadly replicates what a person can do today when they read the internet and summarize it while paying rights holders.

Expensive to do but hardly the end of Generative AI or OpenAI should that be the difference between having a business or being sued out of existence. Never underestimate people who have a clear economic interest especially when their own existence is at stake.

sjducb2y ago

I think it’s a question of what counts as publication.

I think that an AI model is analogous to an employee. Imagine I ask my employee to write an article, and they just copy an existing one from the Times. That's plagiarism and bad work, not copyright infringement.

If I then decide to publish the plagiarised article, then I have committed copyright infringement.

I once ran into this exact problem with a human. I hired a designer to make some artwork for an app. When I launched the app it turned out that the human had just copied the artwork from another game. It’s my problem that I hired an idiot, and my problem that my app was infringing the copyright of another app. (We redesigned the graphics very quickly)

null_point2y ago

I suspect this may delay some short-term progress by creating pressure on AI labs to train their models on data curated or synthesized in a way that is conscientious of copyright law.

There are already troves of data that are fair game for training, but even "corrupted" data sets can probably be used if used intelligently. We've already seen examples of new models effectively being trained off of GPT-4. That approach, with filters for copyrighted material, might allow for data that is sufficiently "scrambled". Not to say building such a filter is easy, but it seems plausible.

KETpXDDzR2y ago

I'd expect "Open"AI et al. to lobby heavily for an "AI-generated content is excluded from copyright infringement" rule. I think it's possible that they'll introduce a "generative AI" tax: charge x cents per generated text/image and distribute the fund to all media companies.

In Germany you pay some amount extra on top of the sales price of anything that can store data (CDs, DVDs, USB sticks, HDDs, ...). This is then distributed to all companies that could be impacted by software piracy. I'm still not sure if that's legal, considering the Geneva Convention disallows collective punishment.

airesearcher2y ago

I think there is another way to solve this. Someone should train an LLM on copyrighted images, then use it as a second pass on any image generated by the primary LLM to check whether it might contain copyrighted images, and blur the copyrighted parts (or change them sufficiently).

Another change could be to the license agreement of LLMs - they could have the user assume liability for any material produced instead of the provider assuming liability. The user would agree that getting the rights for any copies and distribution of copyrighted materials is their sole responsibility instead of the provider.

8note2y ago

"from classic sci-fi movie"

How could you put that in the prompt without intending to infringe? Anything pulled from a classic sci-fi movie would be infringement. And isn't the term "droid" Star Wars-specific?

I'd consider the "red soda" one as grounds that the Coca-Cola brand has become generic and synonymous with soda. Same thing with Mario, too. There is so much non-Nintendo content featuring Mario the plumber that you could get that without training directly on Nintendo's artwork.

wouldbecouldbe2y ago

What about non-MIT-licensed source code? It's 100% trained on that as well.

asylteltine2y ago

I certainly hope so. You can’t just steal content and call it “””AI”””

ur-whale2y ago

It's not for generative AI that things are about to get a lot worse.

It is in fact the very notion of copyright that is breathing its last breath, and it is fantastic to be alive to see it happen.

roenxi2y ago

Based on the rate of progress, I think this makes little difference to AI progress in the medium to long term.

At the moment, we don't have hardware that can do what humans do (process video feed from eyeballs and build up a world model). I imagine that we'll cross that barrier cheaply in the coming decades, at which point copyright becomes moot. AIs will be able to develop their own styles and world understanding from scratch, then generate original work.

digitcatphd2y ago

Rather than attempting to combat our obvious future, they should spend this effort to find ways to monetize and succeed in this new environment.

hahajk2y ago

> And a whole universe of potential trademark infringements with this single two-word prompt: animated toys

If you flood the market and dominate children's culture with toys from your TV shows, you absolutely cannot complain when your toys are considered iconic enough to be the generic "animated toy". These images don't replace or substitute the things they are depicting.

karmakaze2y ago

The real 'problem' is how do we navigate the present and near future where much more than physical labor is being automated? This is where we need sustainable solutions. The rough road on the way should also be smoothed out so as not to disrupt so many lives, but it's good to keep a perspective what and why we're doing these things.

SubiculumCode2y ago

Attribution weights could be the basis of a new type of copyright asset licensing scheme. For all those tech employees who fed the company's model, a license in perpetuity to at least a portion of that value... but only if you fight for it. They are training to replace you, watching your every move, your thought processes, ready to make you a function call.

efields2y ago

It’s more interesting to me how these entities that operate the models start making money from them. They are a money pit and there’s not enough $20/month subscribers on earth to support them.

Enterprises that make content with this also don’t want to infringe on copyright. The AI companies don’t have a good story here. The value has not become evident after years.

_giorgio_2y ago

This guy built a career around nonsensical and catastrophic endings.

Everything that he sees has mysterious flaws that never happen.

caeril2y ago

Wow. I feel really sorry for these giant corporations who have wielded armies of lawyers against fanfic artists to prevent fair use, and to prevent trademarks and patents from expiring on the timelines enshrined by law.

Can we all have a moment of silence for poor Bob Iger? Maybe we can start a GoFundMe to help him out?

rolisz2y ago

Simple fix (at least for ChatGPT): ask it to avoid drawings with similarities to copyrighted characters.

logicchains2y ago

I predict this could be a boon for generative AI because restricting it to training on copyright-expired media would produce a higher quality training corpus, as low-quality material from so long ago is unlikely to have been preserved, leaving only higher-quality material.

Avicebron2y ago

I'm surprised this is presented as a revelation. I did pretty much this same experiment ages ago as part of a suite of tests comparing the efficacy of different-sized models.

renewiltord2y ago

You can try, but I have Mistral on my local computer and it doesn't need the Internet. And people have pirate dumps they're going to run this stuff through.

I'll just do it myself.

amelius2y ago

Just like we have the uncanny valley for robots, LLMs are in the unoriginality valley. Only when we get out of it will the copyright issues go away.

smitty1e2y ago

The DALL-E/*GPT revolution sounds like the death of personal and corporate property.

That's gonna leave a Marx[1].

[1] https://youtu.be/7WDKivqFOgA?si=nWq5aeKA4dLytX3Z

ofslidingfeet2y ago

I'm still waiting for people to figure out the whole point of an automated process is that it behaves the same way each time.

penjelly2y ago

> My guess is that none of this can easily be fixed.

also my concern, except it feels like many of LLMs' "problems" can't be easily fixed

zanfr2y ago

no matter how you look at it, the cat is out of the bag. OpenAI could be censored, but you can't censor open source

Log_out_2y ago

That sounds as if layers and layers of rent-seeking aristocracy were being forced to work again against their will.

AC_86753092y ago

So the models overfit the training data, essentially memorizing, instead of generalizing?

wayeq2y ago

We need to figure out how to ever so gradually move toward a post-copyright economy.

throwuwu2y ago

Copyright is fucked. Even if OpenAI somehow loses this and has to delete GPT-4 and its training data, the generative AI cat is so far out of the bag that it's gone on to live a full life and have many grandkittens. It's already easy to install and run generative models, it's only going to get easier, and the models will keep getting better. These lawsuits are futile and won't matter in 2 years or less.

RecycledEle2y ago

If we get rid of unconstitutional copyrights in the US, this goes away.

Recall that according to the US Constitution, copyright can only apply to "Science and useful Arts."

Alternately, we could restore a reasonable limit to the duration of copyrights, like 14 years.

pxoe2y ago

there's an easy fix. the easiest. just don't use data that you don't have the rights to use. apparently that's just impossible.

"but what if we want to scrape the entire web and something makes it in anyway? see, that is impossible". well that's just saying "fuck it" and using bad data anyway. that's not an actual effort to "not use data you can't use" - there was just no way there'd be a 'rights cleared' way to use the entire web anyway. that is impossible. using a clean dataset is not impossible. it's very possible.

RandomGerm4n2y ago

Perhaps we should simply take this as an opportunity to finally abolish copyright. Smaller artists mainly earn their money with commissions. They are paid to do a very specific thing. Whether there is a copyright on the result is irrelevant. Someone else who would "steal" the image and use it without payment would apparently have fewer requirements. The person could have simply taken any AI image. Therefore, the artist in the scenario would not receive any money from the second person anyway.

Apart from this, it is mainly large companies that benefit from copyright laws. Why should we have laws that restrict progress just so large capitalist companies can maximize their profits?

skybrian2y ago

I wonder what Adobe Firefly does with these prompts?

oglop2y ago

So what? I feel like I’m taking crazy pills when I read these things. You all do realize the same thing happens in your mind with those same prompts right? That’s kinda how it works. Who is surprised by this? Yeah no shit it can kinda reproduce the text it was trained on, so do I! That’s how that works. And the NYT knew for a long ass time this thing was ingesting. Literally saw this in the marketing when I signed up last year.

I wasn't shocked when I noticed I could query it about ANY math textbook I owned and it could talk with me about it. I didn't bitch and gripe; I enjoyed it and had conversations.

Anyway, I’m in the minority I guess. I love that I can talk with it about books and news.

freddealmeida2y ago

not in japan.

Joel_Mckay2y ago

If ML cannot create copyrightable or patentable material under current legal precedent, then shouldn't the prompt output be considered public domain regardless of content semblance?

The paradox is that output could still violate trademarks due to similarity, but likely cannot infringe on copyrighted content under prior legal opinion... if it is at least 80% different from prior art. The lawyers are likely going to have to do a special firm survey to figure this one out.

Bag of popcorn ready =)

yieldcrv2y ago

a lot worse for cloud providers hosting generative AI

the models can be fine

gfodor2y ago

Gary Marcus is the master of AI FUD

octacat2y ago

I am expecting politicians to do some nice mental gymnastics regarding regulating this. All major IT companies are doing genAI now and nobody wants to hurt the companies.

Intox2y ago

Or... things are about to get worse for copyright holders.

I don't see any developed country pressing the brake on AGI in the near future to protect a few copyright holders from getting "stolen" from in hypothetical scenarios.

Baldbvrhunter2y ago

I imagine the argument might be like this:

I hire a session musician to play on my new single, paying him $100. I record the whole session.

I ask him to play the opening to "Stairway to Heaven" and he does so.

"Well, I can't use that as a sample without paying"

"Ok play something like Jimmy Page"

"Hmm, still sounds like Stairway to Heaven"

"Ok, try and sound less like Stairway to Heaven but in that style"

"Great, I'll use that one"

and I release my song and get $5,000 in royalties.

Should I be sued for infringement, or the guitarist?

The problem, I suppose, is this: what if I had said "play something like 70s prog rock", he played "Stairway to Heaven", and I didn't know what it was and said "great, I'll use that"?

Should I be sued for infringement, or the guitarist?

iainctduncan2y ago

I am constantly surprised by the amount of apologizing for generative AI infringement here. The fact that it's already being done and is a technical breakthrough is irrelevant to existing copyright law. "We are big and innovative" may hold weight with legislators, but it won't with the courts.

Remember when everyone and their dog discovered sampling in the late '80s and they all thought they could get away with it because it didn't seem like infringement to the samplers? The courts had no qualms about slapping record labels for putting out records with unlicensed samples in them. Albums even got pulled off shelves while licenses were sorted out.

These companies are charging for a service that returns copyrighted content, full stop. You can't do that whether you are AI or someone drawing Mario and selling the pictures on iStock, or putting out records that sample someone else's work without permission. It took a while in the case of sampling, but it sure as hell happened.

sjfjsjdjwvwvc2y ago

Please ban all these AI companies, at this point I have enough OSS models, don’t really need any hosted service anymore.

IMO would be best if this stays a highly illegal technology that is only available to a few weirdo nerds /s

jdjdjdkdksmdnd2y ago

people are so naive. AI is a matter of national security now. it's over. they exposed civilians to nuclear radiation for the nuclear bomb. and you think the state would let this get in the way of the AI arms race it is anxiously anticipating? nope

whodidntante2y ago

Simple solution, when gpt-5 comes out, just rename it Claudine, and the NYT will drop their suit

fnordpiglet2y ago

greenthrow2y ago

wwweston2y ago

> culture is a communal property

Public domain / communal property is also part of copyright, so it's not as if this is some forgotten concept that needs to be restored to the discourse.

Georgism is underconsidered, though.

> By focusing solely on the legal implications and ignoring the historical context of cultural storytelling

> we might actually limit the tools of cultural expression to comply with some weird outdated copyright thing is just...bonkers.

What's bonkers is the literally backwards idea that copyright is (or should be) mooted or made outdated by novel reproduction capabilities.

iainctduncan2y ago

The idea that we should dispense with it to let generative AI companies make even more money seems totally bizarre.

pardoned_turkey2y ago

up2isomorphism2y ago

syndacks2y ago

walt742y ago

Havoc2y ago· 9 in thread

To me that’s the wrong question.

Everyone knew it was trained on copyrighted material and capable of eerily similar outputs.

But it’s already done. At scale. Large corps committing fully. There is no chance of that toothpaste going back in the tube.

So the right question here is: what now? And I suspect that, much like with tracking, the answer will be "not much".

janice19992y ago

> There is no chance of that toothpaste going back in the tube.

You should read up on the musicians strike of 1942. [0]

[0] https://jacobin.com/2022/03/1940s-musicians-strike-american-...

chubot2y ago

This comment is ignorant of history.

It happened with Napster, then Apple Music, now streaming services

There is no widespread file sharing in the general public, instead we have devices that we don’t own, and streaming subscriptions

Apple didn’t just copy all the music onto iPods and sell it — it took them a decade of deal making and lots of money to acquire the rights to the content

I’m not saying what’s right or wrong, just saying that this comment has very little understanding of these battles

j_maffe2y ago

That's a really eloquent way of saying "It's already happening, so give up on it." I'm sure it works out great for taking action and solving problems.

_xnmw2y ago

anonymousab2y ago

pxoe2y ago

FridgeSeal2y ago

ZitchDog2y ago

Napster hit scale too.

aatd862y ago

Data is dynamic. Ok for old data. What about new data?

zarzavat2y ago· 7 in thread

Alifatisk2y ago

Yeah, but that is not the case: they never mentioned Mario and Luigi, yet that's what the output turned out to be.

mattmanser2y ago

The user didn't create it, the cloud-hosted machine owned by OpenAI, that charges for access, did.

When prompted with 'futuristic robot' and 'italian plumbers'.

That's how it makes sense.

techdmn2y ago

fzeroracer2y ago

No, it's more akin to if Photoshop had a 'Mario' stamp which when used would stamp a random piece of Mario artwork from the games. Do you think this would be in violation of copyright?

Uvix2y ago

What about when you prompt the model without the intention of creating something infringing, and still get those same characters out in the result?

Xeamek2y ago

The post shows many examples where the prompt explicitly avoids any mention of copyrighted material but the generated results include it regardless.

Did you even read the post?

But also, the argument of 'user responsibility' doesn't hold up on its own regardless (imo).

Lorak_2y ago

Did you read the article? It shows a lot of examples where no specific names are mentioned, and even very generic prompts produce copyrighted material.

keiferski2y ago· 6 in thread

Likewise, how difficult is it to just use descriptive tools to describe Mario-like images [1] and then remove these results from anyone prompting for "video game plumber"?

1. The describe command can describe an image in Midjourney. I imagine other AI tools have similar features: https://docs.midjourney.com/docs/describe

bnralt2y ago

gchamonlive2y ago

The thing is that those are really trivial or extreme examples. What we should take from this:

1. Generative AI systems are fully capable of producing materials that infringe on copyright.

2. They do not inform users when they do so.

This is very hard to fix.

mrweasel2y ago

rco87862y ago

> Likewise, how difficult is it to just use descriptive tools to describe Mario-like images [1] and then remove these results from anyone prompting for "video game plumber"?

This approaches impossibility at scale.

TheRoque2y ago

bbor2y ago

WhiteNoiz32y ago· 6 in thread

whywhywhywhy2y ago

> No one is going to say that google is copyright infringement just because it is showing content from other websites verbatim

Journalists [1] and Getty Images [2] did in the past

[1]: https://yro.slashdot.org/story/03/07/14/025216/web-caching-g... [2]: https://www.theguardian.com/technology/2016/apr/27/getty-ima...

drubio2y ago

> I don't think generative AI is sustainable in the long term if it ends up killing all the websites/artists that created the original material.

This is the elephant in the room. Every tech wave has had its way of cajoling creators into investing time & money to make original material, then the rules changed.

Reddit, Stack Overflow, and others started with gamification (points, badges) and community to incentivize users to contribute original content.

Now AI is shaking up all these approaches. But with each one, the incentive to create original material appears to dwindle, since the returns are becoming less and less.

kenmacd2y ago

> I think generative AI should be able to provide links to similar source material in the training data

> People should be able to opt out of having their content used for training

Hopefully the idea of putting limits on who can acquire knowledge sounds absurd to you. Why are those same limits okay if they're on 'what' rather than 'who'?

> AI companies are just trying to avoid lawsuits by keeping it secret

AlphaWeaver2y ago

layer82y ago

The ability to provide a reference to the source is the crucial difference here.

FrustratedMonky2y ago

I wonder: do Cliff Notes have to pay royalties for the underlying material?

Cliff Notes contain quotes and citations.

Does the Cliff Notes company, when producing Cliff Notes for "Into the Wild", pay royalties to the publisher?

For that matter, does every paper or article that quotes another work have to pay royalties to the source of the quotes?

aimor2y ago· 5 in thread

proaralyst2y ago

regularfry2y ago

In other words it's not that llama2 contains 93% of Chapter 1, it's that only 7% of Chapter 1 is different enough to anything else to be worth encoding in its own right.

sebzim45002y ago

Couldn't you use the same argument to reach the absurd conclusion that the 7zip source code contains the vast majority of Harry Potter?

A decent control would be to compare it to similar prose that you know for a fact is not in the training data (e.g. because it was written afterwards).
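The control proposed above can be illustrated with a toy. This is not how the llama2 measurement in this thread was done: it swaps in zlib with a preset dictionary (`zdict`) as a crude stand-in for "what the model already saw", and the passages are invented placeholders. The point it shows is the one made above: a passage contained in the prior compresses far better than a comparable control passage that is not, so the gap between target and control is what matters, not the raw ratio.

```python
import zlib

# zlib can only back-reference the last 32 KiB of a preset dictionary.
MAX_ZDICT = 32 * 1024


def compressed_size(text: str, prior: str = "") -> int:
    """Return the DEFLATE-compressed size of `text` in bytes.

    If `prior` is given it is loaded as a zlib preset dictionary: a toy
    stand-in (an assumption, not an LLM) for data the compressor "already knows".
    """
    data = text.encode("utf-8")
    if prior:
        comp = zlib.compressobj(level=9, zdict=prior.encode("utf-8")[-MAX_ZDICT:])
    else:
        comp = zlib.compressobj(level=9)
    return len(comp.compress(data) + comp.flush())


def ratio(text: str, prior: str = "") -> float:
    """Compressed size divided by raw size; lower means 'more already known'."""
    return compressed_size(text, prior) / len(text.encode("utf-8"))


# Hypothetical passages: TARGET appears in the prior, CONTROL does not.
TARGET = ("The lighthouse keeper counted the ships every night and wrote "
          "their names in a green ledger that nobody else was allowed to read.")
CONTROL = ("A small bakery on the corner sold bread shaped like animals, and "
           "the owner insisted that the rabbits always sold out before noon.")
PRIOR = "Notes on weather, trains, and harbour schedules. " + TARGET

if __name__ == "__main__":
    print(f"target  vs prior: {ratio(TARGET, PRIOR):.2f}")
    print(f"control vs prior: {ratio(CONTROL, PRIOR):.2f}")
    print(f"target, no prior: {ratio(TARGET):.2f}")
```

A general-purpose compressor with no prior still shrinks ordinary prose somewhat, which is why a single compression figure says little on its own; only the target-versus-control comparison under the same prior speaks to memorization.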

tayo422y ago

This is a little confusing. You turned the text into indices, so numbers? Then compressed that? Or is the text as numbers, without any extra compression, only 1 KB?

The tokenizer the models use (SentencePiece) is more or less based on one way of doing compression (BPE). It's not really clear what you're testing.
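For readers unfamiliar with the connection: byte-pair encoding began as a data-compression technique that repeatedly replaces the most frequent adjacent pair of symbols with a new symbol. Below is a minimal, character-level sketch of that merge loop. It is a toy under stated assumptions; real tokenizers such as SentencePiece learn their merges over a large corpus and add details (normalization, special tokens, byte handling) not shown here.

```python
from collections import Counter


def most_frequent_pair(tokens):
    """Return the most common adjacent pair (first seen wins ties), or None."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge(tokens, pair, new_token):
    """Replace every non-overlapping occurrence of `pair`, scanning left to right."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_token)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def bpe(text, num_merges):
    """Toy BPE: start from characters and apply `num_merges` greedy merges."""
    tokens, merges = list(text), []
    for _ in range(num_merges):
        pair = most_frequent_pair(tokens)
        if pair is None:
            break
        tokens = merge(tokens, pair, pair[0] + pair[1])
        merges.append(pair)
    return tokens, merges


if __name__ == "__main__":
    tokens, merges = bpe("aaabdaaabac", 3)
    print(tokens)  # ['aaab', 'd', 'aaab', 'a', 'c']
    print(merges)  # [('a', 'a'), ('aa', 'a'), ('aaa', 'b')]
```

Eleven characters become five tokens and joining the tokens restores the input exactly, which is the lossless-compression property the comment is alluding to.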

stubish2y ago

I wonder what the loss would be for 'translated into Finnish'? Translations between just about any human languages will contain less than 100% of the original.

clbrmbr2y ago· 4 in thread

kayodelycaon2y ago

As an author, I do want the stories I write and worlds I build to be protected for a reasonable period.

lbotos2y ago

But the concept, and something closer to the original terms (the creator's lifetime + x years or some such), still seems very valuable.

ausbah2y ago

asylteltine2y ago

jpeter2y ago· 4 in thread

If I prompt "golden droid from classic sci-fi movie", what else am I asking for if not Star Wars?

anonymoushn2y ago

An original golden android in the style of a classic sci-fi movie, one that does not actually exist.

whywhywhywhy2y ago

If you do "Golden robot holding a lazer gun in a sci-fi setting, cinematic" it will give you a golden robot that doesn't look like C-3PO or anything from Star Wars.

[1]: https://trademarks.justia.com/756/52/droid-75652542.html

sjfjsjdjwvwvc2y ago

Or another „copyrighted“ droid for that matter, after all it’s a classic.

Same with robot cop, what the hell did you expect to get…

Or Italian plumber with red hat with M on it, that’s just a description of Mario

Uvix2y ago

The robot from Metropolis?

koliber2y ago· 3 in thread

Why does anyone assume that ChatGPT or other tools would NOT produce previously-copyrighted content?

naet2y ago

OpenAI is selling access to their GPT models, and those models are outputting copyright material for me to consume... isn't that just as much of a violation?

TheRoque2y ago

So it makes generative AI essentially unusable: you don't know whether the output is plagiarism, so you'd always doubt it and never use it.

jawngee2y ago

Your argument is nonsense.

The junior artist in your hypothetical would have as much liability, if not more.

appplication2y ago· 3 in thread

benlivengood2y ago

NemoNobody2y ago

Well I know exactly what the NYT has - a very strong case. I think this case OUGHT to upend copyright law - it's terribly broken and has been for years.

The NYT lost nothing from OpenAI using old news, and they still lose nothing if OpenAI can reproduce those articles verbatim.

If the NYT wins, we lose a lot. I think it's time to revisit copyright; we can do that, you know. It's rather dated and could use an update regardless.

RestlessAPI2y ago

Such a thing happened with DALLE, Midjourney, and Stable Diffusion.

Stable Diffusion, when used to its fullest with things like ControlNet and LoRAs, blows the pants off other proprietary models.

marckrn2y ago· 3 in thread

mypastself2y ago

kranke1552y ago

So what do you suggest artists have for dinner?

endisneigh2y ago

Why should art be subject to these rules and not everything else?

CTmystery2y ago· 3 in thread

LeonardoTolstoy2y ago

I should maybe preface this by saying that I probably agree that this is the way this will shake out ultimately.

The more satisfying solution is: the model / robot is designed to not be able to produce specific images / to smash human heads in. It just might not really be possible.

Eridrus2y ago

Exactly; there is no need to do this in the model. You just need well-understood token-retrieval methods for identifying copyright infringement, which ChatGPT's competitors already have.

You will get into some murky definitions of what exactly is required for copyright infringement vs. fair use, etc., but we already do this with Content ID on YouTube, and text is far simpler.
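The comment doesn't name a specific method, so what follows is only one illustrative assumption: word n-gram overlap, about the simplest retrieval-style check for verbatim reuse. It measures what fraction of a model output's n-grams also appear in a reference text. Production systems (Content ID included) are far more elaborate, the example strings are hypothetical, and any threshold for "infringing" would be a legal judgement rather than a number this code can supply.

```python
from __future__ import annotations


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lower-cased word n-grams; punctuation stays attached (a simplification)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap(candidate: str, reference: str, n: int = 8) -> float:
    """Fraction of the candidate's n-grams that also occur in the reference."""
    cand = word_ngrams(candidate, n)
    if not cand:  # candidate shorter than n words
        return 0.0
    return len(cand & word_ngrams(reference, n)) / len(cand)


# Hypothetical texts for illustration only.
REFERENCE = ("the lighthouse keeper counted the ships every night "
             "and wrote their names in a green ledger")
PARTIAL = ("each evening the lighthouse keeper counted the ships every night "
           "and then went to sleep")
UNRELATED = "a small bakery on the corner sold bread shaped like animals"

if __name__ == "__main__":
    print(f"{overlap(REFERENCE, REFERENCE, n=5):.2f}")  # 1.00
    print(f"{overlap(PARTIAL, REFERENCE, n=5):.2f}")    # 0.45
    print(f"{overlap(UNRELATED, REFERENCE, n=5):.2f}")  # 0.00
```

Larger `n` makes the check stricter (only long verbatim runs count); smaller `n` catches paraphrase-adjacent reuse but also common phrases.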

Krasnol2y ago

I don't even think they want to fix it. They just want to see money. Some form of "tax" per prompt or other ridiculous "models".

This is such a nice, profitable opportunity. Much better than pay per view or subscription models for humans.

pointlessone2y ago· 3 in thread

dkjaudyeqooe2y ago

It's fair use, whereas generative AI doesn't satisfy the same criteria. From https://www.ogcsolutions.com/is-fan-art-copyright-infringeme... :

For fan art to fall under the fair use exception, it must meet all four of the following criteria:

It must be transformative, meaning it adds something new and different to the original work.

It can’t be used for commercial purposes.

It must not negatively impact the market for the original work.

And finally, it must be created for a limited and non-exclusive audience.

numpad02y ago

whywhywhywhy2y ago

Aerroon2y ago· 3 in thread

Aren't some of the examples basically asking for that content?

Ask someone about two Italian brothers in a video game with red and green hats that have M and L on them. What do you think you would get?

If I describe "imagine a comic book duck that swims in a sea of gold in his vault" you would immediately think of Scrooge McDuck, no?

BlackJack2y ago

disclaimer: I work on GenAI at Google, but views are my own

One possible outcome is more transparency on what datasets were used to train the models.

anonzzzies2y ago

Exactly: the prompts trigger the same recall that humans have when seeing that prompt; the model just draws it better than most people would.

sorokod2y ago

> What do you think you would get?

What I might think is irrelevant. It is the content that the LLM produces that is relevant.

docdeek2y ago· 3 in thread

How is this different to Googling “robot cop” or “video game plumber” and being served copyrighted material?

dkjaudyeqooe2y ago

Search engines are ruled fair use because they use the copyrighted material in a limited way, they provide a public good and they benefit the copyright holder.

Throw in the fact that it is purely a mechanical transformation of the copyrighted work, and generative AI is on shaky ground.

pointlessone2y ago

geraldwhen2y ago

Looking at a copyrighted image posted by an author is not infringement. Printing that image onto a shirt and selling it is infringement.

That’s what OpenAI is doing.

nojs2y ago· 3 in thread

In practice, what happens next when websites all start to block OpenAI by default (or change their TOS to disallow OpenAI’s crawlers)?

There seems to be little incentive not to do this, because, unlike Google, OpenAI isn’t bringing any traffic or eyeballs. It may end up being a default setting in WordPress, for example.

But OpenAI presumably can’t afford to pay every single long tail source of content on the whole internet — so how does this end?
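On the mechanics of "blocking OpenAI": the usual route is robots.txt. OpenAI documents `GPTBot` as the user-agent token for its crawler; the file below is an illustrative example, the URL is hypothetical, and the check uses Python's standard `urllib.robotparser`. Keep in mind that robots.txt is advisory: it only expresses the site's wishes to crawlers that choose to honour it, and it does nothing about data already collected.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt: shut out OpenAI's GPTBot, allow every other crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `robots_txt` permits `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"  # hypothetical URL
    print("GPTBot:", may_crawl(ROBOTS_TXT, "GPTBot", page))        # False
    print("Googlebot:", may_crawl(ROBOTS_TXT, "Googlebot", page))  # True
```

A "default setting in WordPress" would amount to shipping a rule like the first group above in the generated robots.txt.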

CaptainFever2y ago

> or change their TOS to disallow OpenAI’s crawlers

Additionally, this TOS can be ignored if you're in a jurisdiction with TDM exceptions.

Source: https://www.twobirds.com/en/insights/2021/singapore/coming-u...

golol2y ago

It's not like you can hide the web from OpenAI. They could just use a secret crawler. Or buy the data from a third party company.

dkjaudyeqooe2y ago

This is what will kill generative AI and there is nothing the courts or lawmakers can do about it. Even in a fair use scenario you can't beat the TOS.

niemandhier2y ago· 2 in thread

Should not be a problem in the EU. Articles 3 and 4 of the Copyright in the Digital Single Market Directive already regulate this.

Source at the Kluwer Copyright Blog: https://copyrightblog.kluweriplaw.com/2023/02/20/protecting-...

EU Legal Text: https://eur-lex.europa.eu/eli/dir/2019/790/oj

injidup2y ago

sampo2y ago

> Summary by Wolters Kluwer: […] Everyone else (including commercial ML developers) can only use

That is a weird (wishful?) interpretation. Doesn't article 4 give the exception to everybody for the purposes of text and data mining, including commercial ML developers?

https://eur-lex.europa.eu/eli/dir/2019/790/oj

FridgeSeal2y ago· 2 in thread

regularfry2y ago

I completely agree.

Betamax says that a technology which has significant non-infringing uses is not inherently infringing.

jcgrillo2y ago

wslh2y ago· 2 in thread

theamk2y ago

The thing with Google is that it is super trivial to exclude your text: a tag on the page, a header on the server, etc. So all the conversations about Google "stealing" content always seemed pretty silly to me.

By comparison, AI offers no way to opt out, which is a big difference.

dmbche2y ago

Personal use of copyrighted material is fine; there is no breach of copyright when you download a picture from Google for yourself.

If you use it commercially, then there is a breach.

Uploading copyrighted content is a breach of copyright as well, even without commercial use.

Google/Facebook are hosting and giving access to a bunch of media, which might or might not be copyrighted; that's the individual's problem. They make money from ads, not from the content.

AI companies stole copyrighted media to train their commercial LLMs, then sell those models or products built on them for profit.

I don't think it's the same.

rmholt2y ago· 2 in thread

Private models will not care, nor will things change for IP owners with lesser power.

reqo2y ago

Many small owners together can bring a class action, though.

quonn2y ago

That seems unlikely, unless they settle out of court. And why would the NYT settle like that without receiving a billion?

Courts are likely to make generally binding decisions.

davidy1232y ago· 2 in thread

disgruntledphd22y ago

Science journals are mostly under copyright of a few big publishers who are extremely hostile to any kind of ML being performed on the content.

sgt1012y ago

> special access for the elite.

In terms of special access, think about your shoes. They are nice, but only you are allowed to use them. This is not fair. You are the elite...

This goes to difficult places.

redcobra7622y ago· 2 in thread

This operates similarly to importing an image into Photoshop. You can do whatever you like with images privately, or with gen AI, but the game ends when you try to use those images commercially.

Not sure how this “gets worse” or better for anyone. The current state of things seems generally fine, and there’s a real possibility the courts see it that way too.

joenot4432y ago

throwoutway2y ago

> but the game ends when you try to use those images commercially.

Right now, as long as you have billions invested, it feels less like the end of the game and more like something called "innovation" and "entrepreneurship". I'm waiting on the courts to decide this issue.

continuational2y ago· 2 in thread

(Asking Dall-E about the bot image in the article)

Me: Who owns the rights to this bot?

Perhaps it is able to tell, if you ask it?

continuational2y ago

All of these are in the first go. No retries or rephrasings of the question.

krapp2y ago

>Perhaps it is able to tell, if you ask it?

DigitallyFidget2y ago· 2 in thread

zanfr2y ago

Hmm, then does that mean generative music, say Brian Eno's experiments, isn't copyrighted?

sgt1012y ago

So on your interpretation, if I photocopy a book and then sell the photocopies to my friends, there is no infringement?

I don't think so, but hey, a photocopier is a machine and it generated the book so should be ok!

Alifatisk2y ago· 2 in thread

Did ClosedAI (OpenAI) ever confirm or deny that they trained their models on copyrighted materials?

danielbln2y ago

Is "Closed AI" the new "Micro$oft"?

noitpmeder2y ago

goertzen2y ago· 2 in thread

No they are not.

This is a negotiation tactic by the NYT to drive up the licensing price. Period.

The Napster/Music Industry analogy has no resemblance to this situation.

The only meaningful question that might be answered as a result of this is, what permission and access rights do crawlers have to content that is publicly and legally available.

8organicbits2y ago

Surely there's a meaningful question about copying and distributing content verbatim, which GPT has been shown to do.

noobermin2y ago

The article does not mention Napster; where did this reference come from?

preommr2y ago· 1 in thread

danielbln2y ago

That's where I'm at. DALL-E spitting out C-3PO should be entirely OK; unless I'm making money with the output, Disney should pound sand.

kranke1552y ago· 1 in thread

The generative AI rollout has taught me what happens when the interests of the many intersect with the destruction of the few.

You get steamrolled for defending yourself while, overhead, you hear applause for those who have robbed you of your future.

kranke1552y ago

It makes no sense that one is not allowed to make and market a CG Mario movie, but if you use AI to launder the data it's suddenly OK.

beginning_end2y ago· 1 in thread

This perspective on regulation was interesting: https://drafts.interfluidity.com/2023/12/28/how-to-regulate-...

    "Congress should declare that big-data AI models do not infringe copyright, but are inherently in the public domain.

    Congress should declare that use of AI tools will be an aggravating rather than mitigating factor in determinations of civil and criminal liability."

troupo2y ago

OpenAI and others: AI should be regulated!

Governments starting regulation and companies filing copyright lawsuits...

OpenAI: NOT LIKE THAT

AlienRobot2y ago· 1 in thread

An argument I've seen made in favor of AI in past threads about this is that "scraping is legal."

Yeah, downloading the content of a webpage may be legal, but redistributing it isn't.

I bet many people can't tell the difference between a quick answer from Google and a text generated by ChatGPT on Bing. They just see the output.

Torrenting and other p2p file transfer protocols didn't get a pass for inventing groundbreaking ways to break the law. I don't think OpenAI will get a pass for doing the same.

danielbln2y ago

mensetmanusman2y ago· 1 in thread

The world is a big place.

China can't produce LLMs because of inconvenient truths.

The US can't produce LLMs because of copyright.

Decentralized open source LLMs might exist that could work, but they won't have the giant GPU clusters.

A rich country with lax rule of law wins? Maybe that's why Sam went to the Saudis?

pelorat2y ago

Well. Japan can: https://petapixel.com/2023/06/05/japan-declares-ai-training-...

Hugsun2y ago· 1 in thread

There are good arguments for the copyright infringement belonging to the user, not the model maker, in this thread.

One issue with that is that there is not a reliable way to determine if copyright is being infringed.

Even if models could be used responsibly, there might not be a reasonable expectation that most people will, if infringement is so easy and avoiding it relatively hard.

I'm not sure what legal prescriptions should be made on this basis, but it's an interesting thought.

yokem552y ago

golol2y ago· 1 in thread

elmomle2y ago

qgin2y ago· 1 in thread

Things are about to get a lot worse for generative AI in the United States

They are about to be infinitely better for generative AI in China.

noitpmeder2y ago

China has massive IP theft and child labour issues that arguably give them competitive advantages too. Should we let those slide as well?

jlnthws2y ago· 1 in thread

RecycledEle2y ago

Rent seeking should be a capital crime.

josh-sematic2y ago· 1 in thread

cogman102y ago

Because the fair use clause he's using is about giving commentary.

Consider this: if I sell a book about Gandalf and Dumbledore getting into a wizard's duel, both JK Rowling and the Tolkien estate have grounds to sue me. Adding another copyrighted source does not protect me.

This is especially a big problem in the music industry.

Now should copyrights be like this? I don't know. It feels to me that copyrights have the wrong balance all over the place.

dmbche2y ago· 1 in thread

The output is irrelevant.

kromem2y ago

Here's one of the senior legal peeps at the EFF who has litigated IP cases talking about the issue: https://www.eff.org/deeplinks/2023/04/how-we-think-about-cop...

It's not as clear cut as you think it is.

Paradigma112y ago· 1 in thread

So, what's the plan?

In the end, products will have to be classified anyway by whether they infringe on copyright and/or were built by an LLM, most likely automatically, by another LLM.

sensanaty2y ago

tim3332y ago· 1 in thread

They are just going to have to inform the AI in some sense of the current copyright situation and ask it not to infringe.

It's the same for human writers. If you are writing an article for Wikipedia say, you should read relevant source articles and then rewrite in a way that isn't a copy and paste beyond a few words.

noitpmeder2y ago

Ok I'll bite. Let's assume you've informed the current models about copyright and asked them not to infringe...

What happens when they continue to do so?

intrasight2y ago· 1 in thread

qolop2y ago

The class of models that Yann LeCun is bullish on (look up I-JEPA) does exactly this.

t_mann2y ago· 1 in thread

sjfjsjdjwvwvc2y ago

Don’t worry, there are many people out there who have copies of it all; there is no way they’ll manage to get the cat back in the bag, even if all governments work together on this.

But yeah, get your own copies whenever possible.

vimax2y ago· 1 in thread

Maybe Disney and the record labels shouldn't be claiming so much of public culture as their own.

dkjaudyeqooe2y ago

If they created it, they own it, why shouldn't they be claiming that?

quonn2y ago· 1 in thread

disgruntledphd22y ago

If the models weren't just doing massively complicated interpolation then this would probably work.

Honestly the only way to deal with this is to change the training data and retrain everything (probably at the cost of performance).

airstrike2y ago· 1 in thread

startupsfail2y ago

Would you prefer to live with China and Russia dominating the technology sector, or the EU and the US?

ultrablack2y ago· 1 in thread

gumballindie2y ago

amai2y ago· 1 in thread

Should the NYT not sue https://commoncrawl.org/ ? OpenAI just used the data from commoncrawl for training.

noitpmeder2y ago

Is that true? Has OpenAI revealed exactly what is in their training set?

SKILNER2y ago· 1 in thread

wharvle2y ago

A bunch of rich people are raiding a little bit of work, each, from a whole bunch of people, then walling it off so they can get richer.

dang2y ago

Related ongoing thread:

NY times is asking that all LLMs trained on Times data be destroyed - https://news.ycombinator.com/item?id=38816944 - Dec 2023 (93 comments)

Also:

NY Times copyright suit wants OpenAI to delete all GPT instances - https://news.ycombinator.com/item?id=38790255 - Dec 2023 (870 comments)

NYT sues OpenAI, Microsoft over 'millions of articles' used to train ChatGPT - https://news.ycombinator.com/item?id=38784194 - Dec 2023 (84 comments)

The New York Times is suing OpenAI and Microsoft for copyright infringement - https://news.ycombinator.com/item?id=38781941 - Dec 2023 (861 comments)

The Times Sues OpenAI and Microsoft Over A.I.’s Use of Copyrighted Work - https://news.ycombinator.com/item?id=38781863 - Dec 2023 (11 comments)

dawnim2y ago

bambax2y ago

ponorin2y ago

smrtinsert2y ago

1shooner2y ago

Imagine a future where copyright registration involves contributing your IP to a public adversarial model, which is then a regulated layer in future generative model licensing.

shkkmo2y ago

It seems like this article makes a basic copyright mistake. I don't see any evidence that these are "reproductions" of source material, since no source image is linked for comparison.

Instead, these are derivative works. We already have a flourishing culture of derivative works, such as fan art, that exist in various shades of legal greyness.

Some derivative works are fair use, some are not.

karmakaze2y ago

It shouldn't matter how the images/etc are created. The problem comes about when it's used as an original work by the person that's doing so.

legendofbrando2y ago

sjducb2y ago

I think it’s a question of what counts as publication.

If I then decide to publish the plagiarised article, then I have committed copyright infringement.

null_point2y ago

I suspect this may delay some short-term progress by creating pressure on AI labs to train their models on data curated or synthesized in a way that is conscientious of copyright law.

KETpXDDzR2y ago

airesearcher2y ago

8note2y ago

"from classic sci-fi movie"

How could you put that as the prompt without intending to infringe? Anything pulled from a classic sci-fi movie would be infringement. Isn't the term "droid" also specific to Star Wars?

wouldbecouldbe2y ago

What about non-MIT-licensed source code? It's 100% trained on that as well.

asylteltine2y ago

I certainly hope so. You can’t just steal content and call it “””AI”””

ur-whale2y ago

It's not for generative AI that things are about to get a lot worse.

It is in fact the very notion of copyright that is breathing its last breath, and it is fantastic to be alive to see it happen.

roenxi2y ago

Based on the rate of progress, I think this makes little difference to AI progress in the medium-to-long term.

digitcatphd2y ago

Rather than attempting to combat our obvious future, they should spend this effort to find ways to monetize and succeed in this new environment.

hahajk2y ago

> And a whole universe of potential trademark infringements with this single two-word prompt: animated toys

karmakaze2y ago

SubiculumCode2y ago

efields2y ago

It’s more interesting to me how the entities that operate these models will start making money from them. They are a money pit, and there aren’t enough $20/month subscribers on earth to support them.

Enterprises that make content with this also don’t want to infringe on copyright. The AI companies don’t have a good story here. The value has not become evident after years.

_giorgio_2y ago

This guy built a career around nonsensical and catastrophic endings.

Everything that he sees has mysterious flaws that never happen.

caeril2y ago

Can we all have a moment of silence for poor Bob Iger? Maybe we can start a GoFundMe to help him out?

rolisz2y ago

Simple fix (at least for ChatGPT): ask it to avoid drawings with similarities to copyrighted characters.

logicchains2y ago

Avicebron2y ago

I'm surprised this is presented as a revelation. I did pretty much this same experiment ages ago as part of a suite of tests comparing the efficacy of different-sized models.

renewiltord2y ago

You can try, but I have Mistral on my local computer and it doesn't need the Internet. And people have pirate dumps they're going to run this stuff through.

I'll just do it myself.

amelius2y ago

Just like we have the uncanny valley for robots, LLMs are in the unoriginality valley. Only when we get out of it will the copyright issues go away.

smitty1e2y ago

The DALL-E/*GPT revolution sounds like the death of personal and corporate property.

That's gonna leave a Marx[1].

[1] https://youtu.be/7WDKivqFOgA?si=nWq5aeKA4dLytX3Z

ofslidingfeet2y ago

I'm still waiting for people to figure out the whole point of an automated process is that it behaves the same way each time.

penjelly2y ago

> My guess is that none of this can easily be fixed.

That's also my concern, except it feels like many of LLMs' "problems" can't be easily fixed.

zanfr2y ago

No matter how you look at it, the cat is out of the bag. OpenAI could be censored, but you can't censor open source.

Log_out_2y ago

That sounds as if layers and layers of rentier aristocracy were being forced to work again against their will.

AC_86753092y ago

So the models overfit the training data, essentially memorizing, instead of generalizing?

wayeq2y ago

We need to figure out how to ever so gradually move toward a post-copyright economy.

throwuwu2y ago

RecycledEle2y ago

If we get rid of unconstitutional copyrights in the US, this goes away.

Recall that according to the US Constitution, copyright exists only to promote the progress of "Science and useful Arts."

Alternately, we could restore a reasonable limit to the duration of copyrights, like 14 years.

pxoe2y ago

there's an easy fix. the easiest. just don't use data that you don't have the rights to use. apparently that's just impossible.

RandomGerm4n2y ago

Apart from this, it is mainly large companies that benefit from copyright laws. Why should we have laws that restrict progress just so large capitalist companies can maximize their profits?

skybrian2y ago

I wonder what Adobe Firefly does with these prompts?

oglop2y ago

I wasn’t shocked when I noticed I could query it about ANY math textbook I owned and it could talk with me about it. I didn’t bitch and gripe; I enjoyed it and had conversations.

Anyway, I’m in the minority I guess. I love that I can talk with it about books and news.

freddealmeida2y ago

not in japan.

Joel_Mckay2y ago

If ML cannot create copyrightable or patentable material under current legal precedent, then shouldn't the prompt output be considered public domain regardless of content semblance?

Bag of popcorn ready =)

yieldcrv2y ago

a lot worse for cloud providers hosting generative AI

the models can be fine

gfodor2y ago

Gary Marcus is the master of AI FUD

octacat2y ago

I am expecting politicians to do some nice mental gymnastics regarding regulating this. All major IT companies are doing GenAI now and nobody wants to hurt the companies.

Intox2y ago

Or... things are about to get worse for copyright holders.

I don't see any developed country pressing the brake on AGI in the near future to protect a few copyright holders from getting "stolen" in hypothetical scenarios.

Baldbvrhunter2y ago

I imagine the argument might be like this:

I hire a session musician to play on my new single, paying him $100. I record the whole session.

I ask him to play the opening to "Stairway to Heaven" and he does so.

"Well, I can't use that as a sample without paying"

"Ok play something like Jimmy Page"

"Hmm, still sounds like Stairway to Heaven"

"Ok, try and sound less like Stairway to Heaven but in that style"

"Great, I'll use that one"

and I release my song and get $5,000 in royalties.

Should I be sued for infringement, or the guitarist?

The problem, I suppose, is the case where I had said "play something like 70s prog rock", he played "Stairway to Heaven", and I didn't know what it was and said "great, I'll use that".

Should I be sued for infringement, or the guitarist?

iainctduncan2y ago

sjfjsjdjwvwvc2y ago

Please ban all these AI companies, at this point I have enough OSS models, don’t really need any hosted service anymore.

IMO would be best if this stays a highly illegal technology that is only available to a few weirdo nerds /s

jdjdjdkdksmdnd2y ago

whodidntante2y ago

Simple solution, when gpt-5 comes out, just rename it Claudine, and the NYT will drop their suit
