
Zuckerberg 'Personally Authorized and Encouraged' Meta's Copyright Infringement

(variety.com)

496 points · spankibalt · 1mo ago · 453 comments

https://apnews.com/article/meta-mark-zuckerberg-ai-publisher...


210 comments · 37 top-level

modeless1mo ago· 33 in thread

Funny how people are suddenly on Elsevier's side. It's clear to me that AI training is transformative fair use under existing law. Maybe this will be the case to prove it.

eloisius1mo ago

I find it grating that so many AI boosters try to frame pushing back against the AI industry as a sudden about-face for everyone that spent the last 20 years pushing back against the copyright industry. I’m also in favor of decriminalizing or legalizing small amounts of pot for personal use. That doesn’t mean I’m behind industrialized narcotic production on such a huge scale that it starts to distort the economy, and companies looking for new ways to add methamphetamine to every goddamn product.

protocolture1mo ago

>I find it grating that so many AI boosters try to frame pushing back against the AI industry as a sudden about-face for everyone that spent the last 20 years pushing back against the copyright industry.

What do you think the outcome of tightening fair use is going to be? Do you think it's going to be most effectual against these big evil AI companies we are meant to fear? Or is it going to end up putting more individual creators on the end of Disney's pitchforks?

Like if you support creating a gun to kill a monster, that's great. But you need to understand that weapons rarely only target the person you want them to. And it's unlikely that any bill that specifically targets a certain size or profit margin is going to make it all the way into law without being generalised to the approval of large IP holders.

It's much much (much) better to look at this as an opportunity to erode IP laws for everyone, than to make them worse and hope that your particular enemies are the only ones that are affected.

>That doesn’t mean I’m behind industrialized narcotic production on such a huge scale that it starts to distort the economy, and companies looking for new ways to add methamphetamine to every goddamn product.

That's such a non sequitur. This isn't a weed legalisation argument, it's "Do we make IP worse for everyone, because you don't like some people benefiting from fair use".

butlike1mo ago

Tell me more about these methamphetamine products. Inquiring minds would like to know!

dfxm121mo ago

It would be disingenuous framing because the argument against copyright stems from a belief that information should be free. Meta does not do things in this spirit. There's no about face needed...

2ndorderthought1mo ago

Speaking of AI and meth, have you seen videos of the Palantir CEO Alex Karp? Dude looks like he's regularly getting the same meth shots Hitler used to get.

But I hear you. One of my biggest tells that someone can't be reasoned with is when they resort to whataboutism without any consideration for how 2 situations can actually be different even if there is some commonality. It's a powerful bad faith argument technique. When that style of argument comes up I nod my head and walk away. Some people are just doomed.

nadermx1mo ago

I also find it funny, I said this regarding the other thread and article[0]

'"They then copied those stolen fruits"

How are these fruits "stolen" if they still have what was allegedly stolen?

Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that the unauthorized sale of phonorecords of copyrighted musical compositions does not constitute "stolen, converted or taken by fraud" goods under the National Stolen Property Act

And even if, arguendo, sure, it's stolen. The purpose of copyright is "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries"

And you would be hard pressed to prove that LLMs haven't advanced the arts and sciences, so at bare minimum transformative, i.e. fair use.'

[0] https://news.ycombinator.com/item?id=48026207#48029072

Johnny5551mo ago

>How are these fruits "stolen" if they still have what was allegedly stolen?

If you write a book and I take it and embed its knowledge into my product that is so pervasive that no one needs to buy your book any more (and I don't even credit you so no one knows where that knowledge came from), do you really still have what was stolen? And I didn't even buy a copy of your book to copy it.

2ndorderthought1mo ago

Cool cool cool. So all the code and data you send to Anthropic and ChatGPT should be mass distributable to forward other people's arts and science? All your meeting notes with AI summarizers, Slack chats with bots? Might as well put your entire company and all plans for it on GitHub, MIT licensed. I'll take a peek, see if there's anything valuable to me in that. Don't worry, you can keep it all on your GitHub too. It's still yours after all. Copilot will be training on it too though btw

albedoa1mo ago

You were swiftly corrected about your misunderstanding under your original comment. Reposting it here, removing the quote farther from its context, and hoping to not be downvoted again is very weird!

nadermx1mo ago

I don't see how me quoting the actual complaint the news was about, in both threads, was me being swiftly corrected. If you were to base it on upvotes then this one shows I'm right and you got swiftly corrected here. In both cases it was relevant, as both threads were not yet merged and were about the same complaint. They held two positions on the front page and I was adding to the discourse.

protocolture1mo ago

>It's clear to me that AI training is transformative fair use under existing law.

I wouldn't even go that far. It's an entirely new product. It's like the guy who sold you the keyboard demanding royalties for the software you built.

That the person who wrote the book couldn't predict a new use case for the book in training LLMs is irrelevant. The book isn't in the LLM. It's not being sold with the LLM. It's one of billions of tools used to create the LLM.

People try and sell this as the AI companies extracting value from the poor little IP holders like Disney. It's maddening. That content is your cultural heritage. It already belongs to you, just some idiot has been granted a lifetime of exclusive exploitation. An LLM is trained on data you already own. Disney et al wants to exploit the new technology to extract even more money out of stuff created often decades ago.

At absolute worst it's reverse engineering, which was supposed to be fair use protected in the US but apparently that's been somewhat eroded.

xigoi1mo ago

> The book isn't in the LLM.

An LLM is essentially a lossy compression of the training data. The book absolutely is in there, it’s just mangled to the point of unrecognizability.

gizajob1mo ago

If my book isn’t in your LLM, then prove it and don’t use my book to train your LLM.

conception1mo ago

Illegally obtaining copyrighted materials is usually the issue not the transformation part

akerl_1mo ago

Looking at the complaint ( https://publishers.org/wp-content/uploads/2026/05/2026-05-05... ), that seems like the part that's got the most solid foundation, especially given that while torrenting the books, they were also seeding to other peers.

The items they call out around training the models (and attempting to claim that each subsequent model generation should count as an additional instance of infringement) seem far less grounded in the current court interpretations of AI training.

King-Aaron1mo ago

Absorb all "our" IP without consent, in doing so remove "our" own source of revenue, and then repackage it as their own product. Not really fair use IMO.

visarga1mo ago

How does that work? Is it a kind of infringement without substantial similarity?

brendoelfrendo1mo ago

I think this completely misses the point... the point is that Meta pirated the media they used to train their model.

I am not a fan of US copyright law, but if I torrented millions of books, I would be facing a felony charge in criminal court and a multi-billion dollar lawsuit in civil court (with statutory damages as high as $150,000 per title in cases of willful infringement).

In my opinion, this has nothing to do with whether or not AI training is transformative and thus fair use, and everything to do with whether or not the laws apply to everyone equally. If Facebook isn't forced to pay billions and elect a sacrificial executive to serve prison time, then I will remain angry.

rvz1mo ago

> It's clear to me that AI training is transformative fair use under existing law. Maybe this will be the case to prove it.

That is not what this case is about. It is more about the illegal violation and piracy of copyrighted content done by Meta for commercial use and Zuck knew they were doing it.

Why did Anthropic settle [0] with a multi-billion dollar payout to authors after commercializing their LLMs that were trained off of copyrighted content that was illegally obtained and kept without the authors' permission?

There's a reason why they (Anthropic) did not want it to go to trial. (Anthropic knew they would lose and it would completely bankrupt them in the hundreds of billions.)

AI boosters will do anything to justify the mass piracy and illegal obtainment of copyrighted material for commercial use (not research), which is not fair use in the US. There is no debate on this. [0]

[0] https://images.assettype.com/theleaflet/2025-09-27/mnuaifvw/...

visarga1mo ago

I think copyright is far from being the most important aspect related to AI; the geopolitical and economic aspects matter more. And even if it was the most important, there is only a case to be made for 1. the copy used to train models and 2. rare or induced regurgitation by targeted prompting.

The original work is not replicated identically. Why would we replicate a work when it can be more easily seen in the original or replaced with alternative options online? We use AI to produce new outputs for new situations. We already have hard drives and networking for plain copying.

whattheheckheck1mo ago

If I could ask for a summary from an LLM vs. buy a book, I'd go with the summary. That eats into commercial use, and the Supreme Court case sided with Gerald Ford when a newspaper published a small gist of his autobiography because it ate into the sales.

Larrikin1mo ago

Every single Wikipedia article of a book or TV show has this summary. Ford should have lost.

2ndorderthought1mo ago

Yea nope. I like the full book without any loss of information. Even if I don't want to read the entire book. LLMs love to respond even when something is outside of their training set.

__loam1mo ago

It's not settled law so I'm not sure how that's clear to you.

jacquesm1mo ago

I think both Elsevier and the people that appropriate IP for training commercially deployed AIs purpose without the consent of the author(s) should be legal.

stiray1mo ago

It actually depends on the evilness of the company. Elsevier is just less evil than Zuckerberg and Meta, while publishers are even less problematic. I don't think there is anything funny in that.

Or anything to defend on Meta. If they go out of business, humanity profits.

4k0hz1mo ago

Elsevier is shitty to people doing stuff that (imo) should be allowed. Meta is making money doing the same thing and not getting the same shittiness from Elsevier.

Elsevier at least works within the (admittedly broken) system, Meta does not.

blks1mo ago

When you use millions of copyrighted materials to bundle together to produce a commercial product, I wouldn’t call that a fair use. Especially when licensing of such material doesn’t explicitly allow that, the material wasn’t even purchased on consumer markets and your commercial product may be a competitor/analogue to the copyrighted material.

Not even going to all GPL stuff, that in a better world should have screwed all the slop companies

stackghost1mo ago

I'm not on Elsevier's side, but I still think it's bullshit that giant companies are allowed to do things at a scale that I'd go to prison for.

platevoltage1mo ago

That's always going to be true for the Capitalist class.

matheusmoreira1mo ago

The enemy of my enemy, and all that.

happytoexplain1mo ago

"Funny" is how dishonest snipes are framed. It's such a common trope of internet quips, it's wearing me out. Can we please try to just format our disagreements without the snideness?

platevoltage1mo ago

Such a garbage take. This is not a parody or a critique. Mark Zuckerberg is not Weird Al Yankovic.

ben_w1mo ago· 28 in thread

A lot of people would be very pleased if this leads to Zuckerberg getting even the statutory minimum damages ($750?) on each infringement.

The previous infringement case with Anthropic said that while training an AI was transformative and not itself an infringement, pirating works for that purpose still was definitely infringement all by itself. The settlement was $1.5bn, so close to $3k for each of the 500k they pirated, so if Zuckerberg pirated "millions" (plural) it is quite plausible his settlement could be $6bn.
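The arithmetic in the comment above can be checked with a quick sketch. The dollar figures are the ones quoted in the thread; reading "millions" as exactly two million works is an assumption made purely for illustration, and the function names are hypothetical.

```python
def per_work_usd(settlement_usd: int, works: int) -> float:
    """Average payout per work implied by a lump-sum settlement."""
    return settlement_usd / works


def projected_usd(rate_per_work: float, works: int) -> float:
    """Scale a per-work rate to a different number of works."""
    return rate_per_work * works


if __name__ == "__main__":
    # Figures as quoted in the comment: $1.5bn over 500k works.
    rate = per_work_usd(1_500_000_000, 500_000)
    # Assumption: "millions (plural)" taken as the smallest such case, 2 million.
    assumed_works = 2_000_000
    print(f"per work: ${rate:,.0f}")
    print(f"same rate over {assumed_works:,} works: ${projected_usd(rate, assumed_works):,.0f}")
    # For comparison, the $750 statutory minimum mentioned in the comment.
    print(f"statutory minimum over {assumed_works:,} works: ${projected_usd(750, assumed_works):,.0f}")
```

Under those assumptions the per-work rate comes to $3,000 and the scaled figure to $6bn, which matches the estimate in the comment; a larger count of works would scale the number linearly.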

qingcharles1mo ago

What's frustrating is all those kids who got criminal charges for running MP3 sites back in the day [1], and this guy rips off every piece of media in existence and will walk away literally because he's too rich to be charged.

[1] See, e.g. https://en.wikipedia.org/wiki/Oink%27s_Pink_Palace#Legal_pro...

shrubby1mo ago

https://pluralistic.net/2025/04/23/zuckerstreisand/

Cory Doctorow wrote a nice summary of the Zuckerstreisand book by Sarah Wynn-Williams.

"First, Facebook becomes too big to fail.

Then, Facebook becomes too big to jail.

Finally, Facebook becomes too big to care."

falsemyrmidon1mo ago

https://en.wikipedia.org/wiki/Capitol_Records%2C_Inc._v._Tho...

24 songs and was at one point $80k per song, almost 20 years ago. Let's let Zuck off with an even 100k per infringement.

matheusmoreira1mo ago

Definitely what pisses me off the most. All these "pirates"? Arrested. Why isn't the copyright industry raiding the homes of these tech billionaires then? Why isn't SWAT pointing guns at their faces while the squad seizes all of their computers and equipment? Why aren't these CEOs in cuffs?

nadermx1mo ago

I just don't see why everyone seems to not be cheering that perhaps we are not going to go back to the days where all those kids are going to be re charged. It almost feels like everyone wants to go back to labels carpet bombing students with lawsuits[0]

[0] https://w2.eff.org/IP/P2P/riaa-v-thepeople.html

borgai1mo ago

didn't all of this ai stuff happen because they gave away llama? worth it imo

NoMoreNicksLeft1mo ago

What's frustrating is that I don't even consider infringement to be a crime. Why are you all so upset about this, rather than his real crimes?

matheusmoreira1mo ago

I'm a copyright abolitionist. I don't care at all that they're training AIs on copyrighted works. I care a lot that they're not getting relentlessly hunted down by the copyright industry for it like all the "pirates" that came before them. The copyright industry has actually ruined lives by litigating their "infringement" nonsense. It's only fair that they go after this guy as well.

His constant violation of people's privacy is also horrendous and worthy of condemnation, but that's not directly related to the copyright infringement matter. It's a separate issue.

ethbr11mo ago

You get Al Capone on the charge you can make stick.

hsuduebc21mo ago

I'm kinda upset because, on top of his ridiculously amoral and sometimes illegal behavior, there are people whose lives were ruined because they shared a few mp3 files. Now this person once again has absolutely no responsibility for his actions, even for something so idiotic as copyright infringement, when others were severely punished.

stubish1mo ago

Let's define more things society doesn't want to happen as not-crimes so we can do more of them.

verisimi1mo ago

Principles and law (that determines 'crime', a legal word) are not the same thing.

qingcharles1mo ago

Why not both?

bix61mo ago

What are his real crimes?

archagon1mo ago

Because the rich can do it and we can’t.

j-bos1mo ago

It's the increase in emotionality: principles are loosely held, and if tossing them allows a particular goal, they get tossed. To be clear, this extends far beyond the current topic and commenters.

_s_a_m_1mo ago

In a just world he should end up in jail forever for the things he has done

grebc1mo ago

Nothing will happen to him/Meta while DJT is president.

He bought the best protection around for breaking the law.

dehrmann1mo ago

I'm not sure what Trump's levers are with this since it's a civil matter. There's no DOJ--it's publishers and an individual vs. Meta.

kevin_thibedeau1mo ago

He likes sham investigations of attorneys general.

timcobb1mo ago

Okay but... I am very unimpressed by this. How is it that he then gets to still be an AI monopolist/hegemonist? How's that fair? He basically force-acquired all this stuff without asking, now he's haggling for it later. Where are the criminal charges? Where is the deprivation of, if not freedom, then equity assets?

utopiah1mo ago

Here I am, finally cheering for IP lawyers. /$

gloxkiqcza1mo ago

For context, his net worth is ~$220 billion.

azinman21mo ago

And meta's worth is much more than that. He's not personally paying.

goofy_lemur1mo ago

I would be very pleased if Zuckerberg got away with it. I don't copyright or infringe, but honestly, he was the one guy of the big guys who released everything as open source.

If he did the right thing, then we should all support his choice to use it under fair use.

Freedom means that the state shouldn't punish a public benefactor.

It makes me furious to see programmers fighting against an open source hero.

If it was closed source for Meta profit, I understand.

But they gave it away free, so it infuriates me that people support damages for a public benefactor.

Churches and schools get free money from the government. We need a rule that open AI (not the company, I mean the actuality), can torrent whatever they want because it's for the public good.

Otherwise the rich companies win and can pay their sources and the small guys are screwed.

If Meta has to pay for their training data, they will need to profit from it and won't be able to offer it free.

Nobody in their right mind would ever support the publishers here.

bamboozled1mo ago

There will be not a single consequence for any of this.

nielsbot1mo ago

In a just system there would be jail time (if found guilty). Barring that a modest fine. Say, $1T.

LastTrain1mo ago

That’ll keep him from even thinking of doing something like that again! /s

qarl1mo ago· 23 in thread

I know people really hate AI training on their work - but is it really any different than a human reading it?

I know there's a complaint that AI can verbatim repeat that work. But so can human savants. No one is suing human savants for reading their books.

Producing copyrighted material, of course. Training on copyrighted material... I just don't see it.

EDIT: Making a perfectly valid point, but it's unpopular, so down I go.

Quarondeau1mo ago

There's a huge difference in scale. The human mind can only process a limited portion of all works available over a lifetime. Human learning is therefore naturally limited to small-scale reuse, which serves to keep it proportional.

A machine training on all copyrighted materials in the world for commercial purposes at an industrial scale makes it disproportionate.

qarl1mo ago

I see that as a distinction - but does it make a difference?

If a company hired hundreds of savants, then it would be illegal for them to read books?

I don't follow.

jryan491mo ago

I had to buy the copyrighted material before reading it... Meta apparently operates in a different legal system than me. That's my issue with it.

qarl1mo ago

Yes, I have no objection to that part. It's the arguments that training itself is the problem.

Sarah Silverman as the most prominent example.

jryan491mo ago

I mean the act of reproducing the copyrighted material is what is illegal. LLMs I've used for coding have outputted exact copyrights for code verbatim into my code before. When that happens it feels kind of fishy to be honest.

thomasahle1mo ago

The human savant will remember where they read it and give you credit. It might lead more people to read your work, and ultimately you make money.

The AI won't even know where the page of text it's seeing came from, and people will avoid your book as they can just ask the AI. So you make less money. (Talking about specialized technical books here.)

qarl1mo ago

Not necessarily.

nancyminusone1mo ago

No one is asking human savants about what they read 1 million times per day.

Suppose they did, and some guy was filling stadiums regularly to hear him recite an entire audio book. That would probably get the attention of someone's lawyers.

qarl1mo ago

I don't see your point. The problem is producing the copyrighted work, not processing it beforehand.

If it's illegal for AIs it should be illegal for humans, too. Is that really what you're arguing? It should be illegal for savants to read books?

SahAssar1mo ago

I don't think anyone is arguing that the consumption is illegal. It's the reproduction that is illegal.

Read a book, that's fine. Write a book, that's fine. Read a book and then write a book that is 99.9% the same as the book that you read and sell it for profit without a license from the original author, that's infringement.

Barrin921mo ago

>The problem is producing the copyrighted work, not processing it beforehand.

the distinction isn't particularly clear cut with an open source model. If it is able to reproduce copyright protected work with high fidelity such that the works produced would be derivative, that's like trying to get around laws against distribution of protected works by handing them to you in a zip file.

It's a kind of copyright washing to hand you the data as a binary blob and an algorithm to extract them out of it. That wouldn't really fly with any other technology.

And that's really where a lot of the value is mind you, these models are best thought of as lossily compressed versions of their input data. Otherwise Facebook ought to be perfectly fine to train them on public domain data.

grebc1mo ago

It’s different.

qarl1mo ago

Hm. I'm not sure I follow your logic.

grebc1mo ago

You asked, I answered.

If you’re struggling to comprehend that a person reading a book is different then you’re a bad bot.

fantasizr1mo ago

reading it after stealing it: gray area. producing & monetizing competing works devaluing the original is a problem

qarl1mo ago

So is it a problem when humans produce and monetize competing works? My understanding is that there is quite an industry in humans reading books and synthesizing their points. Cliff's Notes, for example.

NoOn31mo ago

Why should an AI have the same rights as a human?

How about then to grant AI all other rights, for example, to allow voting?(sarcasm)

qarl1mo ago

We're not talking about rights, we're talking about illegal acts. If it's illegal for a machine to do it, how can it be ok for a human?

Just from a rational argumentation point of view. Clearly if a law is written saying as much, then sure. But there is no such copyright law like that yet.

NoOn31mo ago

The issue is certainly not so simple. But it seems to me, purely theoretically, that the rules don't necessarily have to be the same for living people and non-living machines.

pkaeding1mo ago

But machines don't do things. People do things, and they use tools/machines to do those things more easily or efficiently.

triceratops1mo ago

> I know people really hate AI training on their work - but is it really any different than a human reading it?

Yes it's very different. Humans need to eat, sleep, and pay taxes. You also have to pay them competitive wages.

qarl1mo ago

I'm not sure your argument is supported by the actual law as written.

triceratops1mo ago

https://news.ycombinator.com/item?id=48029673

There's nothing in the law to support your argument either. The law however does say, very unambiguously, that copying without permission isn't allowed. There aren't exceptions for "training" just because it's superficially similar to a human activity (reading a book). A human isn't allowed to hand-copy Harry Potter. Even if they bought all the Harry Potter books.

ipython1mo ago· 14 in thread

Just gonna say... Aaron Swartz faced years of prison time and ultimately decided to take his own life... for downloading scientific journal articles... to share freely with the world (aka not even profiting from it).

But a multi-billion dollar corporation downloading millions of copyrighted creative works so that they can reshape the entire labor market by training a new type of artificial intelligence model on that data set? Meh, sounds like Silicon Valley disruption, give the man a medal!

defen1mo ago

One man illegally downloading copyrighted material is a crime. Multinational corporations illegally downloading copyrighted material is the only remaining growth area in the US economy and vital to national security.

platevoltage1mo ago

They should make another one of those PSAs. "You wouldn't steal 10,000,000 cars".

lesuorac1mo ago

And Jstor dropped the lawsuit when Aaron deleted his local copy. DOJ didn't drop theirs.

I doubt Meta has deleted their local copy though ...

qingcharles1mo ago

It's absolutely unthinkable that Meta and friends aren't still using a corpus containing the entirety of every book they can obtain. There is no way they're building frontier LLMs without it. You can be sure as hell the Chinese are doing it, so the US corps are absolutely still doing it.

alex11381mo ago

And also I think MIT didn't defend Aaron but maybe I'm wrong about that

spongebobstoes1mo ago

Aaron Swartz was treated unjustly because copyright sucks. we should oppose such laws and treatment, not wield them as retributive tools against our opponents

it is wrong to advocate for everyone to be treated equally unjustly. better to advocate for the removal of the bad laws/structures

ipython1mo ago

It would be easier to advocate for the reform of those laws if they were actually applied evenly.

I’m not calling for its use as a “retributive tool”. Just that it be applied evenly.

jmye1mo ago

> not wield them as retributive tools against our opponents

No, we should apply them equally to Mark Fucking Zuckerberg (which is decidedly not retributive, however much you want to make an emotional appeal) until such time as they are repealed as laws. It’s not really that complicated.

Melatonic1mo ago

Truly ahead of his time

zajio1am1mo ago

Well, Meta also shared their AI models freely with world

bamboozled1mo ago

They released products for free use, they didn't release the code of those models for free. Which IMO would make some of what they did here right.

alex11381mo ago

Had Aaron copied Snapchat 5 times the DOJ would've been fine with it all. His fault for not having the foresight

alex11381mo ago

(I'm being sarcastic. Zuck gets rewarded for continually copying Snapchat features into his products)

TiredOfLife1mo ago

> Aaron Swartz faced years of prison time and ultimately decided to take his own life.

According to comments here that was totally deserved. You should not mess with copyright.

swader9991mo ago· 12 in thread

I take issue with the tense used in this framing. It's not 'infringed', it's 'infringing', and to say that it happened is wrong: it's happening, and happening continuously, in these models that are in use. To say a one time payment settles it is missing the whole scope of this theft.

Royalties are owed and continuously owed as these models are deployed and doing inference. How is it any different to paying a small pittance to someone every time a song is played?

ronsor1mo ago

Royalties for inference are unrealistic in a way that even royalties for training aren't.

The LLaMA models were released openly. Copies exist everywhere in the world. You aren't going to be able to charge someone for running `llama.cpp`; a court order ceases to have practical relevance at that point.

eaglelamp1mo ago

Inference might be unreasonable for a royalty agreement, but, in assessing damages, it is certainly relevant.

"I made enough copies for everyone" isn't a valid defense for copyright infringement.

swader9991mo ago

These models can provide citations so I don't see why they can't tick a royalty owed. I'm sure many here could help build this pipeline.

Aurornis1mo ago

First, LLMs do not reliably cite works. They are not looking things up in a database and repeating them. I think this false idea occurs a lot in people who don't understand what LLMs are or how they work.

Second, royalties are not required to cite a source.

Can you imagine how disastrous it would be to everything from news reporting to scientific publishing if that was the case?

ronsor1mo ago

... LLMs cannot reliably provide citations. If you ask for citations, and the model did not use a web search tool, then whatever "citations" you receive are unreliable. Please do not trust these models to be honest. Just because they can discuss a topic doesn't mean they "know" where the knowledge came from in the same way that you don't need to have studied physics to catch a ball.

platevoltage1mo ago

Perhaps it's not. Let's force Meta to pay royalties in the same way you have to pay royalties if you want to sample someone else's song.

kodt1mo ago

If you steal a book and read it, should you have to pay every time you use the knowledge gained or recall parts of it from memory?

teddyh1mo ago

No. People are not LLMs. And even if some argue that they are mechanically similar, they are legally distinct.

drfloyd511mo ago

If I charged people for the privilege of listening to me recite relevant parts of the book to them for profit? Yes. Depending on the copyright.

kodt1mo ago

So like a teacher?

mitthrowaway21mo ago

What if you steal a CD and then play it on your radio station each morning?

swader9991mo ago

If I perform a song in public then yes, I should pay the creator every time I play it. I fail to see the difference here.

jcalvinowens1mo ago· 7 in thread

I had to block meta's ASN on my personal cgit server a few weeks ago because they were ignoring robots.txt and torching it. Like hundreds of megabytes of access logs just from them, spread around different network blocks to clearly try and defeat IP based limiting. I couldn't believe it.

dawnerd1mo ago

I had to last year too, nonstop crawling, random urls that didn't exist. It looked like they were trying to proxy user queries through to a search endpoint too. The ASN matched so I know it wasn't someone spoofing them.

bflesch1mo ago

IMO ASN-based blocking should be much more common, but unfortunately it is not supported as a first-class configuration option in many common tools.

jcalvinowens1mo ago

Yeah, I don't know how anybody stays sane without it. I have a list of over a thousand ASNs I blackhole at this point...

Mine is a daily bash cronjob that fetches a text-based database and uses grep to build an nftables-apply script with all the IPs for the blocked ASNs. I keep meaning to share it, but it's embarrassingly messy and I haven't had time to clean it up...
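The commenter's script is not shown, so what follows is only a minimal Python sketch of the same idea, not their implementation. It assumes a tab-separated database with one row per range (`start_ip`, `end_ip`, `asn`, then any other columns), and it assumes an existing nftables table `inet filter` containing an interval set named `blocked_asn`; the column layout, table name, and set name are all hypothetical.

```python
import ipaddress


def networks_for_asns(db_lines, blocked_asns):
    """Yield IPv4 CIDR networks for rows whose ASN is in blocked_asns.

    Assumed row format: start_ip<TAB>end_ip<TAB>asn<TAB>...
    """
    for line in db_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or not fields[2].isdigit():
            continue
        if int(fields[2]) not in blocked_asns:
            continue
        try:
            first = ipaddress.IPv4Address(fields[0])
            last = ipaddress.IPv4Address(fields[1])
        except ipaddress.AddressValueError:
            continue  # skip IPv6 or malformed rows in this sketch
        # A start/end range may need several CIDR blocks to cover it exactly.
        yield from ipaddress.summarize_address_range(first, last)


def render_nft_script(networks, table="filter", set_name="blocked_asn"):
    """Render text for `nft -f` that replaces the contents of the set."""
    merged = [str(n) for n in ipaddress.collapse_addresses(networks)]
    lines = [f"flush set inet {table} {set_name}"]
    if merged:
        lines.append(f"add element inet {table} {set_name} {{ {', '.join(merged)} }}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Documentation-only addresses and ASNs, not real operators.
    sample = [
        "192.0.2.0\t192.0.2.255\t64500\tZZ\tEXAMPLE-A",
        "198.51.100.0\t198.51.100.127\t64501\tZZ\tEXAMPLE-B",
    ]
    print(render_nft_script(networks_for_asns(sample, {64500})), end="")
```

Collapsing the networks before rendering keeps the set small when an operator announces many adjacent ranges. The rule that actually drops traffic matching the set is left out here, since it depends on the local ruleset.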

Henchman211mo ago

It would break the internet to make this available to the average person. A large swath would actively choose to block stuff like: all of Meta, Alphabet, Apple, Amazon, etc etc etc.

Anyhoo, now you mention it this is the tack I am going to take in my own network, thanks!

walrus011mo ago

It's a real pain in the ass because in the absence of ASN based blocking, you often have to give something a long list of IP ranges in CIDR notation, and be certain you don't "miss" even one ipv4 /23 or /24 or a crawler will get through.
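
A rough way to check for a "missed" range, sketched in Python with the standard `ipaddress` module. The function name and inputs are hypothetical: `asn_prefixes` is whatever list of prefixes you believe the crawler's network announces, and `blocklist` is the CIDR list you actually configured. IPv4 only, since mixing address families raises a `TypeError`.

```python
import ipaddress


def missed_prefixes(asn_prefixes, blocklist):
    """Return the prefixes in asn_prefixes not fully covered by blocklist."""
    # Merge adjacent blocks first, so two /24s can cover a /23 between them.
    blocked = list(ipaddress.collapse_addresses(
        [ipaddress.ip_network(b) for b in blocklist]))
    missed = []
    for prefix in asn_prefixes:
        net = ipaddress.ip_network(prefix)
        if not any(net.subnet_of(b) for b in blocked):
            missed.append(prefix)
    return missed
```

Anything it returns is a range a crawler could still come through.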

hsuduebc21mo ago

Hey, how do you identify them? Is there a service to recognize which of these companies scraped you?

jcalvinowens1mo ago

Every few weeks I run my nginx access logs through a script that uses the same textual ASN database to tally them up and spit out a summary report. There are many different sources for periodic textual ASN databases you can parse with UNIXy tools.
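
A minimal sketch of such a tally in Python, not the script the comment refers to. It assumes the same kind of tab-separated database (range start, range end, AS number; a hypothetical layout) with non-overlapping ranges, and access log lines that begin with the client address, as nginx's default `combined` format does. IPv6 clients are skipped.

```python
import bisect
import ipaddress
from collections import Counter


def load_ranges(db_lines):
    """Parse rows into a sorted list of (start_int, end_int, asn) tuples."""
    ranges = []
    for line in db_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or not fields[2].isdigit():
            continue  # skip headers and malformed rows
        try:
            start = int(ipaddress.IPv4Address(fields[0]))
            end = int(ipaddress.IPv4Address(fields[1]))
        except ValueError:
            continue  # not an IPv4 row
        ranges.append((start, end, int(fields[2])))
    ranges.sort()
    return ranges


def tally_by_asn(log_lines, ranges):
    """Count requests per AS number; unknown addresses are counted under None."""
    starts = [r[0] for r in ranges]
    counts = Counter()
    for line in log_lines:
        parts = line.split(None, 1)
        if not parts:
            continue
        try:
            ip = int(ipaddress.IPv4Address(parts[0]))
        except ValueError:
            continue  # IPv6 client or unparseable line
        # Find the last range starting at or before this address.
        i = bisect.bisect_right(starts, ip) - 1
        hit = i >= 0 and ranges[i][0] <= ip <= ranges[i][1]
        counts[ranges[i][2] if hit else None] += 1
    return counts
```

`tally_by_asn(...).most_common(20)` then gives the heaviest networks first.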

283042834092341mo ago· 7 in thread

So... "move fast and steal things"?

lm4111mo ago

When the AI scrapers were just getting started, that is basically what I thought - their plan was to scrape / suck up everything they possibly could before people realized what was happening and blocked them.

The rate at which they were spidering and scraping was so far beyond what any other supposedly legit spider was doing, it seemed like the logical explanation.

pseudalopex1mo ago

Move fast and break laws.

mil221mo ago

It started at the top and at the beginning.

vips7L1mo ago

The biggest theft from the working class that has ever happened.

platevoltage1mo ago

In Mark's case, he's still breaking things too.

MengerSponge1mo ago

Always Has Been

eowln1mo ago

Steal things? What is this, the “you wouldn’t pirate a car” argument again? I thought we were well over that.

josefritzishere1mo ago· 6 in thread

I would rather Zuckerberg do 6 months in jail and probation than fine Meta.

Lammy1mo ago

You aren't going to be able to make me anti-piracy just because some corpo benefits from it too.

ginko1mo ago

People who don't believe in copyright shouldn't be punished for "breaking" it.

Corporations believe in copyright so if they "break" it they should get punished for breaking rules they made up themselves.

Generally the law should be more strict for corporations than for real people.

edit: People downvoting can you argue why you disagree? I do think it's fair for the law to be more strict on the powerful rather than on the powerless.

tintor1mo ago

but it is easier to enforce law on the powerless

idle_zealot1mo ago

I think this is an easy distinction to make: copyright is bullshit and knowledge should be free. I have no problem with pirates sharing information freely. I do have a problem with a company taking someone else's work and profiting from it. The only thing worse than copyright as it exists is copyright that can be selectively ignored when the powerful will it. Attempt to use copyright to promote Free software with the GPL? Ha, nope, copyright for me and not for thee; I'll train on your code and sell it back to you. You want to preserve access to a game or film that's unavailable or unplayable? Time to send the C&D and destroy you. Only bad things are possible.

Until we progress as a society to the point that we can put this system behind us we should at least fight to make enforcement uniform. In fact, uniform enforcement is probably a good starting point for arguing for abolition, as the pain of that enforcement is felt by proles and elites alike.

jmclnx1mo ago

I agree, time to start handing out real punishments. I think 6 months is way too small.

If this was you or me, we would be in prison for decades and have a fine in the millions. Time for these people to feel consequences.

As someone said, they will probably settle for around 6 billion, that is the same as say a $100 fine for us.

karanbhangui1mo ago

This comment could get its own DSM classification for how insane it is.

I'm all for strong justice, but you want to imprison an executive for decades for copyright violations?

0x3f1mo ago· 6 in thread

HN really loves the copyright lobby when it's against someone they hate, huh

teddyh1mo ago

The problem is people at large companies creating these AI models, wanting the freedom to copy artists’ works when using it, but these large companies also want to keep copyright protection intact, for their regular business activities. They want to eat the cake and have it too. And they are arguing for essentially eliminating copyright for their specific purpose and convenience, when copyright has virtually never been loosened for the public’s convenience, even when the exceptions the public asks for are often minor and laudable. If these companies were to argue that copyright should be eliminated because of this new technology, I might not object. But now that they come and ask… no, they pretend to already have, a copyright exception for their specific use, I will happily turn around and use their own copyright maximalist arguments against them.

(Copied from a comment of mine written more than three years ago: <https://news.ycombinator.com/item?id=33582047>)

frozenseven1mo ago

>wanting the freedom to copy artists’ works when using it

Learning from copyrighted content is legal - for both humans and AI. If Meta is in hot water for anything, it's piracy and/or storage of copyrighted material.

teddyh1mo ago

By “using it”, I mean using the AI model.

amanaplanacanal1mo ago

I think it's more that the little guy gets the book thrown at them while the rich bitch gets a slap on the wrist. This is widespread, and is BAD regardless of your personal opinion on copyright.

0x3f1mo ago

Perhaps, but that doesn't make a win for the copyright lobby a win for the little guy. If anything, pro-copyright decisions are decidedly _bad_ for the little guy.

frozenseven1mo ago

Yeah, it's very hypocritical.

glaslong1mo ago· 5 in thread

All those lawsuits against students who downloaded but didn't even redistribute mp3s. Less than a fair use transformation. Just the file download itself. ... Lesson learned: those students should have stolen millions instead!

watwut1mo ago

The real distinguishing criterion is whether you are super rich or not.

butlike1mo ago

That may have been an information shaping campaign. If the end-user can get prosecuted, the discourse turns from a positive to a negative connotation inherently, which helps curb the behavior by the powers-that-be.

glaslong1mo ago

So if we want to run this strategy, I suppose we need to fine Meta out of existence, as an example to the rest.

Steeeve1mo ago

Actually, it was the uploads that were a problem. Not downloads.

spwa41mo ago

You mean the courts found a technical legal explanation for why very large scale copyright infringement by multinationals was legal, but infringement by individuals of multinational owned copyrights was to be HEAVILY punished?

No shit.

Of course, the excuse doesn't even apply: the offense of the tech bosses is not training these models (they had that declared legal the second it became clear only the big companies would be training big models), the offense of all these tech companies is running a piracy site. Taking copyrighted works, storing them, reproducing them and then publishing the results to third parties, in many cases for payment, and organizing this practice knowingly, willingly as a company. Paying others to help them do it. This is the worst copyright offense one can possibly commit. It is what one public prosecutor referred to in the Napster case as "organizing a criminal cartel to violate criminal law on a huge scale".

Tech bosses weren't sued for downloading, in other words, they were sued for uploading. For asking payment for publishing copyrighted works, without any money going to the authors.

When Kim Dotcom did that, in the words of the US Attorney general, this is "charges of criminal copyright infringement, racketeering, and money laundering" (you see, getting paid for criminal activity is money laundering, a charge that was also made against teenagers selling warez cds)

ChatGPT tells me, unaware and unwilling to discuss the INCREDIBLE unfairness, that in the US, first-time offenders can face up to 5 years in prison, while repeat offenders can face up to 10 years PER OFFENCE. ChatGPT is unwilling to discuss it.

The courts are also unwilling to discuss this, but no worries! New technicality: only a public prosecutor gets to ask ...

Dario Amodei wilfully committed large scale copyright infringement, as did all the tech bosses from Musk to Bezos ... and "strangely" nobody in any court even mentioned how much 10 years times 500,000 is, despite systematically threatening that punishment repeatedly in the cases against teenagers.

Note that the law is extremely clear that company management IS NOT shielded if ordering criminal actions (violating criminal law, as opposed to violating a contract). In that case, company management carries full criminal culpability, INCREASED from if they did it themselves. Of course, this is only ever applied for refusing to pay tax or court fees.

If the law were applied alike and fairly to individuals and tech bosses, Amodei would have to be VERY lucky the human race still exists by the time his corpse leaves prison.

Telaneo1mo ago· 5 in thread

Looking forward to the personal liability.

I've wondered what the legalese justification is for letting liability evaporate as it does so often with corps. So far the reasons I'm left with are 'shrugs' and 'the relevant provisions (seemingly? apparently?) simply don't apply', neither of which is any good.

I was going to make a joke about how we should attach magnets to Aaron Swartz' corpse, since that'd make for a pretty potent energy source, given how fast he must be spinning. But honestly, I think he would have seen this sort of thing coming, given how his case was handled and how things really haven't gotten any better.

Aurornis1mo ago

The handling of Aaron Swartz’s case was a travesty, but he wasn’t indicted for piracy. The charges were for fraud, unlawfully accessing a protected computer, and damaging a computer.

In the years since the basis of the case has been forgotten and replaced with an assumption about piracy, but it was a case about unlawful access.

Telaneo1mo ago

Given Swartz's intent and actions, I mostly don't care about the difference. Someone taking the clean copy of Star Wars (that probably doesn't exist) off Disney's servers and putting that up on the internet for everyone to download is mostly just doing copyright infringement. The computer equivalent of breaking and entering is a sideshow and a means to get to some files and little more (assuming they didn't do much more). The reason I care about breaking and entering is that what usually follows is stealing and a violation of privacy (or in the case of computers, the latter, as well as the computer being turned against my own interests), neither of which is the case when it comes to what Swartz did. The breaking of the door itself isn't some sinful act that should be punishable in and of itself.

The law doesn't see it that way, but it is not the ground truth.

j-bos1mo ago

Yeah, the CFAA is a loaded gun pointed at anyone who's ever touched a computer.

woah1mo ago

Alternate reality Aaron Swartz escaped canonization and is now running an AI/crypto startup that pays you to upload training data with his YC alum buddies

Telaneo1mo ago

Every now and then, I feel like we live in the worst possible world. Then I realise it could be much worse.

This does not comfort me.

zx80801mo ago· 5 in thread

Can someone explain why we are reading this instead of "Meta was fined for copyright infringement" news?

tbrownaw1mo ago

Well the article says this is the start of a lawsuit, so maybe wait for it to work its way through the courts?

2ndorderthought1mo ago

Because meta will delay any case for several years. Then the lawyers will settle for 1/100th to 1/1000th of what they stole quietly. Meta will rebrand and change its name again just like it did after its last major scandal.

No accountability for rich people has funny patterns like this.

Cider99861mo ago

They might not need to change their name. I don't think that copyright infringement is seen as bad by Americans compared to the privacy stuff that Facebook is known for—not that most Americans care about privacy, I guess I don't really know why Facebook rebranded.

Personally, I would be happy if AI companies are what finally take down intellectual monopoly (intellectual property). I know being anti-intellectual-monopoly isn't a common view, but I don't see average people thinking it is so important, as you can see by the huge increases in piracy recently. Could be wrong about this, I haven't done research on public opinion about copyright.

Honestly, this whole case could be great. Either copyright loses, good for us. Or Zuckerberg loses, also good for us.

I would say that copyright loses is better for society than Zuckerberg loses because, my wish for Zuckerberg to lose is from hatred, while my wish for copyright to be abolished is from my wish to help humanity.

Even Supreme Court justices[1] have said the case for copyright is thin.

[1] (before he became a justice) https://en.wikipedia.org/wiki/The_Uneasy_Case_for_Copyright

gizajob1mo ago

They don’t need to rebrand - “Meta” (after / exceeding) is a catch all for whatever they’re being meta at today: piracy / privacy infringement / theft / slop production etc.

solid_fuel1mo ago

In 2024, voters signaled that they don't care about corruption when they reelected the most corrupt administration in American history. Since then, there has been a widespread understanding that the rich will not face consequences in this country. For example, take a look at the Trump administration's suppression of the Epstein files. Or the Trump family's cryptocurrency schemes. Or the ridiculous ballroom.

Anyway, the point is - there will be no justice until the citizens of the united states demand it.

UltraSane1mo ago· 4 in thread

Remember when nerds loved saying "information wants to be free"?

phyzome1mo ago

That was intended as a warning, not an aspiration. Some people misunderstood.

UltraSane1mo ago

No, it was always meant as a good thing and was usually said in the context of censorship, which copyright is really just a form of.

IAmBroom1mo ago

Remember when the nerds said, "The law should apply equally to all"?

UltraSane1mo ago

That was never a common saying of nerds. But it is true.

spate1411mo ago· 3 in thread

> a Meta spokesperson said, “AI is powering transformative innovations, productivity and creativity for individuals and companies, and courts have rightly found that training AI on copyrighted material can qualify as fair use. We will fight this lawsuit aggressively.”

> Authors have sued AI companies for copyright infringement before - and lost.

So, basically nothing will come out of this

anthk1mo ago

Until Sony, Nintendo, Disney... sue them and Zuck craps his pants. And the NSA themselves, too; because for sure they are half-backed by them. If they keep pirating Japanese and European media, these can just wipe their asses with USA licenses and declare all media from the US un-copyrightable in Europe and Japan.

fantasizr1mo ago

they'll litigate how meta acquired those materials to train. you can do whatever you want with a book after it's in your house. but how did it get there?

gizajob1mo ago

They’re already on record as hoovering up Library Genesis and Anna’s Archive. For their “fair use” copyright bonfire to train their LLM.

So not only are these publishers rightfully pissed, Meta didn’t even give them the $6.99 for each epub to begin with. They’ve stolen the whole thing as part of this “fair use” campaign to destroy human authorship free of even the most basic remuneration.

motbus31mo ago· 2 in thread

I personally know of a case of an engineer who was told to do something despite all the legal problems because the company had lawyers for a reason

Telaneo1mo ago

I'd love for that to come out during discovery when the lawsuit hits, but it probably never will. Blowing the whistle is also not a great option in this economy, although I wish more people did.

nojvek1mo ago

If it wasn’t written down and there’s no recordings of it, hard to prove.

runjake1mo ago· 2 in thread

I don't have strong opinions on Zuck needing to be punished for this, because I have friends and family doing the same thing, although perhaps not at the same scale. I myself do not download copyrighted content. I think "rules for thee, not for me" goes both ways.

FireBeyond1mo ago

How much revenue have your friends and family made from "doing the same thing"?

runjake1mo ago

Some. In some cases they've "stolen" tens of thousands in content. Like I said, not at the same scale, but the same "crime" nonetheless.

I'd much rather prosecution focus on Zuck's more serious crimes against privacy and civilization as a whole. But maybe this is a small start?

_doctor_love1mo ago· 2 in thread

"You can be unethical and still be legal that’s the way i live my life"

- Mark Zuckerberg

_doctor_love1mo ago

Note for the downvoters: this is literally what he said.

senordevnyc1mo ago

I downvoted because this was from an IM to a friend back in 2004, when Zuckerberg was twenty years old. I'm sure if you already hate him, this is so juicy and interesting, but I don't find 99.99% of the dumb shit young adults say to their friends in private conversations to be informative of almost anything.

danielmarkbruce1mo ago· 2 in thread

Except, as the article says.... it's not copyright infringement. Whether it should be or not is another issue.

hoppyhoppy21mo ago

>But the latest lawsuit alleges that Meta and Zuckerberg deliberately circumvented copyright-protection mechanisms — and had considered paying to license the works before abandoning that strategy at “Zuckerberg’s personal instruction.” The suit essentially argues that the conduct described falls outside protections afforded by fair-use provisions of the U.S. copyright code.

danielmarkbruce1mo ago

One can allege all manner of things.

The title is clickbait at its worst. The situation around copyright and AI is stock standard "CEO makes a decision in an area that is clear as mud".

nadermx1mo ago· 2 in thread

"They then copied those stolen fruits"

How are these fruits "stolen" if they still have what was allegedly stolen?

Dowling v. United States, 473 U.S. 207 (1985): The Supreme Court ruled that the unauthorized sale of phonorecords of copyrighted musical compositions does not constitute "stolen, converted or taken by fraud" goods under the National Stolen Property Act

And even if, arguendo, sure its stolen. The purpose of copyright is to "To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries"

And you would be hard pressed to prove that LLMs haven't advanced the arts and sciences, so at bare minimum transformative, i.e. fair use.

RIMR1mo ago

I think you are confusing the idiom "stolen fruits" with an actual accusation of criminal theft. Aside from its use in this phrasing, neither "theft" nor "steal" appears anywhere else in the article.

nadermx1mo ago

The article references the complaint. And even then, why use it at all?

soundworlds1mo ago· 1 in thread

I should hope that if Zuckerberg isn't severely punished for this, it at least sets a legal precedent for every other person to do the same with immunity.

All the Aaron Swartzes of the future could freely share scientific papers with the world.

agnosticmantis1mo ago

Willing to bet they'll lobby for regulatory capture and raise the drawbridge for the little guys.

pessimizer1mo ago· 1 in thread

Shouldn't this stuff trigger RICO? Why do torrent site operators get led off in cuffs for running operations that usually lose money, but Zuck doesn't?

RICO specifically cites "criminal infringement of a copyright" as laid out in 18 U.S. Code § 2319. If the CEO tells his employees to download hundreds of thousands of works illegally in order to carry out his money-making scheme, how is that not organized crime even if (dubiously) LLM training on the material is fair use?

-----

RICO: https://www.law.cornell.edu/uscode/text/18/part-I/chapter-96

Definitions: https://www.law.cornell.edu/uscode/text/18/1961

> As used in this chapter — (1) “racketeering activity” means (A)[...]; (B) any act which is indictable under any of the following provisions of title 18, United States Code: [...], section 2319 (relating to criminal infringement of a copyright),[...]

18 U.S. Code § 2319 - Criminal infringement of a copyright: https://www.law.cornell.edu/uscode/text/18/2319

-----

edit:

> 18 U.S. Code § 1962 - Prohibited activities

> (c) It shall be unlawful for any person employed by or associated with any enterprise engaged in, or the activities of which affect, interstate or foreign commerce, to conduct or participate, directly or indirectly, in the conduct of such enterprise’s affairs through a pattern of racketeering activity[...].

https://www.law.cornell.edu/uscode/text/18/1962

From the lawsuit:

“Meta — at Zuckerberg’s direction — copied millions of books, journal articles, and other written works without authorization, including those owned or controlled by Plaintiffs and the Class, and then made additional copies of those works to train Llama,” the suit says. “Zuckerberg himself personally authorized and actively encouraged the infringement. Meta also stripped [copyright management information] from the copyrighted works it stole. It did this to conceal its training sources and facilitate their unauthorized use.”

alex11381mo ago

> Meta also stripped [copyright management information] from the copyrighted works it stole. It did this to conceal its training sources and facilitate their unauthorized use.

WTF

Steeeve1mo ago· 1 in thread

The title of the article and the content of the article are not the same. Zuck has been accused, according to the article, but then the article itself basically says that Meta was pursuing licensing and then at some point the business unit responsible for pursuing that licensing was told not to, presumably because there was a fair-use strategy of some sort in place so licensing would not be required.

theonemind1mo ago

I saw something about this. It would seem like it would be hard to obtain all of those licenses, probably impossible, and then if you want to go on to pirate more, that you licensed stuff kind of makes it look like you knew or believed you should've done it for all of them, which I think would make infringement willful, and imply some cognizance of guilt?

When you think about the objectives and constraints on the table, and how disproportionately light penalties imposed on large corporations can be, if you can muster any kind of crappy argument, doing absolutely zero licensing is the no-brainer clear win. You get all of the material. You avoid a massive cost. Then the tech friendly Federal courts of the Trump administration will interpret all of the laws as far as possible in your favor and impose the lightest penalties they reasonably can.

It's a no brainer. License none of it, it's more data, it's cheaper, it's easier, the win is blinding. But if you license, you pay so much, if you use anything you didn't license you've tipped your hand on cognizance of guilt, blahblah. The contrast is stark.

pixel_popping1mo ago· 1 in thread

It's nonsense to think every single model provider hasn't done this. Where are they supposed to get their data from? I think it's a bit unfair to assume it's fine for Kimi, Deepseek or the hundreds of other models but not for Meta.

Jensson1mo ago

> I think it's a bit unfair to assume it's fine for Kimi, Deepseek or the hundreds of other models but not for Meta.

The difference is how easy they are to sue. Good luck suing Chinese companies over this, but suing American companies is much easier.

palata1mo ago· 1 in thread

Too rich to care.

alex11381mo ago

Honestly, too rich potentially off fraud

Consider the case of someone who gets banned but Facebook keeps collecting money on their business account. Or consider the case of Facebook's video metrics scandal, or... whatever. It's a little fuzzy translating how much value equates to how much stock price equates to how much real-world is-this-useful-to-me, but it does matter when FB is accused of marketing advertising to more people in a region or country than actually physically exist (Aaron Greenspan, thinkcomp, has brought this up in his 2019 testimony to UK parliament)

So fraud builds on itself, you have more fraud money to pay lawyers to try to defend you in fraud cases

dmitrygr1mo ago

Who will be the first to implement a one-layer three-weight model and add it to BitTorrent? Let it “train” on all downloaded files. That makes it fair use. Am I doing this right?

bawolff1mo ago

Does it matter? The company's liability would (i assume) not change if the ceo authorized it or some other high level figure authorized it.

The question to answer is, did it happen and if so is this copyright infringement (not covered by fair use), not which company official authorized it.

coolThingsFirst1mo ago

I love how there's not even a class action lawsuit against AI companies that stole the work of everyone to train their models on.

Who gave permission to Anthropic/ClosedAI to scan hundreds of thousands of books to feed to their systems which they commercially sell. Why is this the new normal. Even GitHub a month ago was like if you don't opt out we will read even your private repos for AI training.

Tech is turning into next level BS, I don't know if it always was like this but this has pierced even the very bottom.

SrslyJosh1mo ago

Rules for thee but not for me.

tabs_or_spaces1mo ago

Why is there only a fine and not also the seizure/forfeiture of the stolen property and the derivative works/products built on it?

Like the fine means nothing to meta, and they'll still be the beneficiaries of their infringement.

In this current state, you really just need to have enough money to bypass this lawsuit and be on your way.

HumblyTossed1mo ago

Waiting for the perp walk.

Tired of the double standard where CEOs get away with it when bad things happen (because they can’t be everywhere all the time) but get all the benefits when the company makes a great profit (because they’re personally driving results!).

forestingfisher1mo ago

Based. If I read a book from a piracy site, I can still cite that book publicly. This should also apply to AI models. I am also opposed to copyright altogether, but that’s another question

lenerdenator1mo ago

The behavior will continue until a consequence is imposed.

andai1mo ago

And thus sparked the entire sector of open weight LLMs...

_s_a_m_1mo ago

Can't wait for absolutely no consequences. Consequences are for peasants like us.

giannicmptr10001mo ago

this dude got in over his head with the evil empire, it is interesting how he learned judo and tried to surf, that being said I despise social media and what it did to society

Der_Einzige1mo ago

Good.

dbg314151mo ago

https://www.tomshardware.com/tech-industry/artificial-intell...

> "81.7TB"

https://en.wikipedia.org/wiki/United_States_v._Swartz

> "approximately 70 gigabytes"
