Meta torrented & seeded 81.7 TB dataset containing copyrighted data

(arstechnica.com)

1270 points · gameshot911 · 1y ago · 938 comments

275 comments · 96 top-level

gizmo1y ago· 76 in thread

Based on the encyclopedic knowledge LLMs have of written works I assume all parties did the same. But I think there is a broader point to make here. Youtube was initially a ghost town (it started as a dating site) and it only got traction once people started uploading copyrighted TV shows to it. Google itself got big by indexing other people's data without compensation. Spotify's music library was also pirated in the early days. The contracts with the music labels came later. GPL violations by commercial products fit the theme also.

Companies aggressively protect their own intellectual property but have no qualms about violating the IP rights of others. Companies. Individuals have no such privilege. If you plug a laptop into a closet at MIT to download some scientific papers you forfeit your life.

lolinder1y ago

Yes. And the problem here isn't that companies get away with doing things like this, the problem is that individuals don't. Attempting to lock information behind a nightmarish legal system is the problem.

I'm pretty much at the point now where I don't buy the "copyright incentivizes creation" argument any more. Copyright, like advertising, incentivizes creation by enormous corporations, but also like advertising it incentivizes creations that overwhelmingly have little value.

Creative individuals don't need copyright to be incentivized to create—they need a safety net that gives them the freedom to spend time on the creativity that naturally wants to bubble out. If the goal is to encourage creativity, copyright is a lousy and enormously expensive substitute for Universal Basic Income.

post-it1y ago

Also, in Canada, it's basically impossible to protect your IP as an individual due to the astronomical cost and lack of options to recover that cost. So copyright will never incentivize my creations, or those of any small creator.

Lucasoato1y ago

> If you plug a laptop into a closet at MIT to download some scientific papers you forfeit your life.

In case anybody here doesn't know, that's a reference to Aaron Swartz, an activist (and Reddit co-founder) who was facing up to 35 years in prison and a $1 million fine just for downloading a lot of academic papers from JSTOR. He eventually took his own life because of the pressure. May his soul rest in peace.

ysofunny1y ago

the english empire once tried to maintain a monopoly over steam loom machines

the americans cheated their way to competition,

heck, even before that, the english empire got jumpstarted by stealing gold from the spanish (who were themselves exploiting it away from aztec and other mexican natives)

I'm saying it's business as usual, but also, culture doesn't work like tangible physical widgets so we must stop letting a few steal this boon of digital copying by means of silly ideas like DRM, copyright, patents. all means to cause scarcity

choult1y ago

Hollywood became popular for filmmaking because they were literally the opposite side of the country from Thomas Edison and his patents...

miltonlost1y ago

People criming in the past is not an excuse for companies committing crimes today. You’re excusing lawlessness.

Cain killed Abel and got away with it!! I can kill someone today too!!!

nottorp1y ago

Interesting, if we're to trust what NotOpenAI and Facebook say about their IP, the US should pay the UK reparations for IP theft based on textile industry profits starting in the 1850s until today?

portaouflop1y ago

Why do I get sued when I share a few torrents but $bigcorp can do it at 1000x the scale without problems?

The issue here is not copyright/patents/etc. The issue is that the law is applied selectively: Aaron Swartz is dead for sharing knowledge with the public and Zuccborg is a billionaire building his torment nexus

sebzim45001y ago

I don't think I've heard the term "English empire". Is it an attempt by the Scottish to pretend they weren't involved?

pockmarked191y ago

> Spotify's music library was also pirated in the early days.

I want to know more, please enlighten me (anyone who knows). I read the book "The Spotify Play" and it made it seem like the pirated music was an internal-only thing and not something available to customers. Is that true?

marcosdumay1y ago

> If you plug a laptop into a closet at MIT to download some scientific papers you forfeit your life.

Just to point out: the material in question was public domain, so nobody even had a copyright claim over it.

Cthulhu_1y ago

Crunchyroll started off as a straight up piracy site, it now has millions of paying subscribers and was sold to Sony for over a billion a few years ago.

gnfargbl1y ago

I think if Google attempted to download the entirety of JSTOR with the express intent of making the full dataset freely available, then Google would also face legal consequences.

It's true, and relevant, that Google would feel those consequences much less sharply than Swartz did.

vintermann1y ago

Don't buy into the rhetoric and call it "consequences". It's always a choice to sue, a choice to prosecute, and this would be true even if these choices were made consistently and impartially (which they certainly aren't).

josefx1y ago

Google book search was declared fair use and copyright holders ended up having to explicitly request removal of their works.

Apparently he would have gotten away with downloading the JSTOR database if he made it clear that he intended to only publish half of each paper.

coliveira1y ago

Yes, these companies are based on massive IP and copyright theft. And they still want to lecture others about their "property rights".

immibis1y ago

Something to understand about capitalist competition (also in politics) is that it's a war. Not one with guns and bombs, but more like a cold war, with espionage and hacking and just generally doing anything you can to gain an advantage without bringing negative consequences on yourself.

The limit is what you can actually get away with, not what the rules say you can get away with, and the system aggressively selects players who recognize this. It's amoral - there is no "ought", only "is". An actor gets punished or not, with absolutely no regard to whether it "should" get punished. One thing is consistent: following the rules as written means you lose.

You can see it in Y Combinator (and other) startups. The biggest ex-startups are things like AirBNB (hotels but we don't follow the rules but we don't get punished for not following them) and Uber (taxis but we don't follow the rules but we don't get punished for not following them).

One way to not get punished for not following the rules is to invent a variation of the game where the rules haven't been written yet. I again refer you to AirBNB and Uber; Omegle also comes to mind, although they didn't monetize.

Viewed in this light, Aaron Swartz's mistake was not the part where he downloaded journal articles, but the part where he got caught downloading journal articles. Shadow library sites are doing the same thing, minus the getting caught. So are Meta and Google and OpenAI. sci-hub is only involved in a lawsuit because it got caught and is now in the stage where it finds out whether it gets punished or not.

oblio1y ago

> Something to understand about capitalist competition (also in politics) is that it's a war.

Turns out there are 2 simultaneous wars there. One where companies and individuals compete ruthlessly.

And another one where if non profit associations of individuals form, guns come out.

soheil1y ago

Aaron committed suicide and FBI going after him was meant more as a lesson to the other kids at MIT than anything.

MegaUpload did the same; Kim Dotcom got raided in his sleep by the FBI in New Zealand! So no, I don't buy your reductionist argument: there are forces at play that allow companies with founders like Google's to get away with it but not others.

yowzadave1y ago

> Youtube was initially a ghost town (it started as a dating site) and it only got traction once people started uploading copyrighted TV shows to it

To this day, there are a huge number of videos that show copyrighted content on YouTube; they are usually crappy clips, reversed and with different music playing in the background to avoid automated detection.

belter1y ago

"Zuckerberg was at White House for meetings on Thursday" - https://www.reuters.com/world/us/zuckerberg-was-white-house-...

Wowfunhappy1y ago

> Based on the encyclopedic knowledge LLMs have of written works I assume all parties did the same.

I don't understand why you wouldn't just buy copies of the books. Seems like such a relatively inexpensive way to strengthen your legal case.

freeone30001y ago

Buying a copy of the book doesn’t grant you the right to copy it. That is what copyright is for.

londons_explore1y ago

Pretty sure that even if you gave a purchasing team enough money for retail price and a list of all books ever published, they wouldn't be able to buy even a quarter of them.

jokethrowaway1y ago

Buying the books won't automatically give you permission to use the content commercially

gosub1001y ago

thanks to the byzantine copyright system, you can't easily do it. Plus, just speculating, but maybe by paying, it establishes "consideration" for some implied contract? "You implicitly entered a contract with us by purchasing the book, then violated the contract by 'distributing' the material for commercial use" ?

cess111y ago

Too much paperwork, too much effort. These are important people, doing much more important stuff than whatever book authors do.

Or so they think, I think.

plasticbugs1y ago

I briefly worked for Crunchyroll, which began life as an anime pirating service with subtitles. The contracts with the Japanese anime publishers came later. Now they vigorously protect their content from "pirates".

electriclove1y ago

Some can pirate on a large scale and see no repercussions.

Some can steal from stores and see no repercussions.

Some can steal from others and see no repercussions.

Some can violently harm others and see no repercussions.

Some can damage property and see no repercussions.

Some can’t. This world is not right.

1vuio0pswjnm71y ago

"Spotify's music library was also pirated in the early days."

"Ek, who had been the CEO of the piracy platform uTorrent, founded Spotify with his friend, another entrepreneur named Martin Lorentzon. Both-Ek at 23 and Lorentzon 37-were already millionaires from the sales of previous businesses. The name Spotify had no particular meaning, and was not associated with music. According to Spotify Teardown, the company developed a software for improved peer-to-peer network sharing, and the founders spoke of it as a general "media distribution platform." The initial choice to focus on music, the founders said at the time, was because audio files are smaller than video files, not because of a dream of saving music.

In 2007, when Spotify first publicly tested its software, it allowed users to stream songs downloaded from The Pirate Bay, a service for unlicensed downloads. By late 2008, Spotify would convince music labels in Sweden to license music to the site, and unlicensed music was removed. From there, Spotify would take off across Europe and then the world."

https://qz.com/1683609/how-the-music-industry-shifted-from-n...

sylario1y ago

And Hollywood was created on the west coast because for intellectual property it was still the far west and it allowed them to ignore patents on movie technologies.

cess111y ago

It's roughly the Spotify story too. They had an extremely impressive catalog very early, way before they were bought by the entertainment cartel. The founders had background in torrenting and the initial product was quite similar to The Pirate Bay but with clearly capitalist ambitions and branding, in contrast to the anarchist leanings of the Pirate Bureau and rather anarchic attitude of The Pirate Bay.

BrenBarn1y ago

Exactly. We need leaders with the political will to apply a "financial death penalty" to companies that engage in this kind of brazen behavior. That means all assets seized, the company dissolved, personal assets of executives seized, executives jailed. People running companies should live in mortal fear of ever doing the things that they routinely do today.

wcfrobert1y ago

VC and startups are fundamentally about disruption. You can't make an omelette without breaking a few eggs (laws). The incumbent players are not going to sit still and let things be "disrupted". A common response is to make sure the public knows about the broken eggs. I would say youtube, Google, Spotify, Uber, doordash, etc. all have made my life much better.

vkou1y ago

> Google itself got big by indexing other people's data without compensation.

So in other words, it got big by providing free user traffic to people's websites without asking for compensation?

You generally don't charge the phone book money to include you in it. It's actually the other way around.

sandeepkd1y ago

Reminds me of recent discussions about a similar topic: what may clearly look like a crime can be treated differently depending on whether you do it as an individual or as a company. Somewhere down the line it's all about understanding the limits and boundaries of the system; it's a skill in itself.

yurlungur1y ago

I think the difference may be LLMs may not be laundered clean of copyright data anytime soon. Even if chatgpt got big and profitable, it's not so clear that it won't contain copyrighted data as that may simply be necessary to train the best models.

dcchambers1y ago

I guess the solution is to create a shell company for your illegal activities?

georgemcbay1y ago

The modern solution has been to grow so fast that by the time anyone can go after you legally you've already amassed so much money/power that you can have the laws rewritten (or at least enforced) around your existence.

IMO part of the reason the SV tech bros are embracing right wing grift culture so publicly now is that this method, which had been serving them well for decades, doesn't really work without the infinite free money lending spigot being wide open.

Cumpiler691y ago

You must be new to billionaire business practices: break the rules first, ask for forgiveness later.

By the time the cheque comes, your illicit venture either went bust or you built a billion dollar empire capable of buying the best lawyers and lobbying to walk away clean.

sneak1y ago

> If you plug a laptop into a closet at MIT to download some scientific papers you forfeit your life.

I’m opposed to copyright and pro-aaronsw, but the state did not kill him.

modzu1y ago

i know of a company that poisoned an entire town! thats terrorism if done by an individual. the company still exists, just paid a settlement and carried on...

pbh1011y ago

> Google itself got big by indexing other people's data without compensation

Weird framing given how much value was and is still placed on Google driving traffic to you

mrkeen1y ago

For Google's case the order was reversed.

Google used to send customers to your site. Now they try to show you the information on their site so that the customer doesn't need to go to your site.

joshstrange1y ago

Even before the LLM-craze Google was showing their Answers box or whatever it was called at the top of the results that told you the answer (sometimes) so that you didn’t have to visit any website.

newsclues1y ago

Comprehensive intellectual property reform needs to happen for the modern (digital) era.

Basically the entire legal system needs to be retooled and rethought for computers.

actionfromafar1y ago

Looks like the entire legal system is being retooled at the moment.

threeseed1y ago

No we just need to enforce the existing laws.

And the legal system is for humans not computers.

yard20101y ago

RIP Aaron Swartz

soheil1y ago

So be a company? Last I checked it costs a couple of hundred dollars to form an LLC, what am I missing?

cyanydeez1y ago

Mmm, the broader point is: laws are as real as the cash you can pay a lawyer to fight.

smugma1y ago

Spotify was born as a response to piracy. Why do you say their catalog was pirated?

mrtesthah1y ago

Don’t forget the original developers of Skype also created Kazaa first.

djmips1y ago

Doesn't Google have their own internal scanning of books?

ctrlp1y ago

The sooner people learn this lesson, the sooner it might change.

chanux1y ago

Corporations are people. Just a notch above the regular kind.

Izikiel431y ago

So, might makes right, a tale as old as humanity

whatever11y ago

How does that prosecutor sleep at night?

observationist1y ago

This frames Google's indexing of the web in a totally, abjectly wrong fashion. It wasn't "other people's data", it was data people published to the public internet, implicitly and explicitly granting permission to download through the act of serving that data without restriction to whoever navigated to a particular URL.

That's how the internet works. If you want private content, you need to put up a gate mechanism of some sort with authentication or other methods of restricting access. Without that, you are literally having your server "serve" the content to whoever asks for it, without restriction or exception, without ToS or meaningful contract or agreements.

You can't have it both ways. "But they didn't know" or other post-hoc claims of innocent people publishing content to the web being misled or confused or abused is infantilizing nonsense.

The web wouldn't have been as amazing and revolutionary and liberating if the fundamental public and open nature of its systems was private and walled off by default.

Your take on YouTube going viral initially over copyrighted content isn't correct, either - it was ease of use and access. It was fairly popular by the time Google bought it, and once it was reachable and advertised by google itself, it exploded, because by that time, everyone had defaulted to using google for search.

Other people corrected your Spotify take.

The reason they pirated is because it is functionally impossible to gain access to the data in any other way. For consumers, there are lots of old shows, music, and other content that aren't accessible, so they turn to piracy. A vast majority of the time, if content is accessible, people will pay and do the technically legal and "right" thing.

Publishers exploit authors and content creators in the name of "platforming" and "marketing" , effectively doing as little as possible to take 90%+ of the value of a product and providing as little as possible to the producer of content or books or music. They get by on technicalities and have captured the legal arena entirely, with any attempt at reform or revolution meeting a messy death at the hands of lawyers and big money publishers.

Screw those people. They lie, cheat, and steal, and somehow have gotten away with fooling the world into thinking they're the good guys.

Copying bits and bytes is not stealing, and the ones trying to shill that narrative are trying to fool as many people as possible into giving them more money without any return of value in kind. I'd download the hell out of a car. Pirate everything.

larodi1y ago

The most outrageous thing about the whole story is that smart people (here and elsewhere) have known all this since day one. They've been uncovering it the whole time.

And in their face, with all the fierce ignorance, broligarchs deny, evade and totally pretend this never happened. The most non open company of all even went to lengths to accuse others of stealing their IP - not theirs to begin with.

Just think of it - why did all major content platforms close their APIs the day after GPT-2 got the word going…? Cause they knew all this very well - the content is precious and needed. They've been doing it all along. Distilling the essence of the world’s writing and digital imagery they had no right to.

We have a saying where I come from - no mercy for the chicken, no laws for the millions. I thought it was a local thing at first; it turns out it is how the world goes. Nothing new under the sun, indeed.

nostrademons1y ago

A bigger lesson might be "don't get caught until you're big enough to destroy the people suing you."

Napster got shut down for widespread enabling of copyright infringement. So did numerous other filesharing startups, including Travis Kalanick's first startup, Scour. Lots of small startups get put out of business all the time for being sued and not having the money to defend themselves.

Likewise, individuals like Donald Trump or Elon Musk get away with all sorts of illegal shit, because they are big enough to shut down the court systems prosecuting them.

Google's genius was in staying under the radar and aligning their incentives with everyone that might dislike them, until they were big enough that they could simply crush anyone that might dislike them.

illegalmemory1y ago

" If you plug a laptop into a closet at MIT to download some scientific papers you forfeit your life."

This is exactly what I immediately thought while reading the article. It almost feels like the legal system only punishes general public, while most of these guys are above it.

rchaud1y ago

Airbnb and Uber have shown us that laws matter only to the extent that the political will to enforce them exists. Throw enough lawyers and lobbying money at the problem and the laws can simply be re-written to be friendlier to your business model.

veggieroll1y ago

Wilhoit’s law:

> There must be in-groups whom the law protects but does not bind, alongside out-groups whom the law binds but does not protect.

rahton1y ago

The legal system is built to favor large corps and capital owners. See Katharina Pistor's books, for instance.

jamesbfb1y ago

RIP Aaron

censorfree1y ago

>This is exactly what I immediately thought while reading the article. It almost feels like the legal system only punishes general public, while most of these guys are above it.

Welcome to the modern-day aristocracy. Beyond what you mentioned, this world is also divided into a group of insiders who can get capital at 0-2%, while the rest of us pay 17%, 22% or 30%.

isaacremuant1y ago

It doesn't "seem". The entire system in most countries works, by design, that way because the people in power trade in influence at a different plane.

That's why democracy often feels "failed" in that no change can be achieved because "it's just more of the same". A few lobbyists representing the interests of a few people have more power than millions voting differently.

G_o_D1y ago

Money speaks ! Money buys !

threeseed1y ago

> Google itself got big by indexing other people's data without compensation

Wrong.

a) Robots.txt which defines what content you wish to make available to third parties predates every search engine including Google. Web site owners chose to make it available to Google and search engines have respected their wishes despite it not being in their best interest.

b) The difference here is that OpenAI, Meta etc have not even tried to honour the wishes of copyright holders. They just considered everything as theirs.

c) Google grew big because it had no ads, fast interface and PageRank was significantly better. It wasn't because it had the most comprehensive index.

karamanolev1y ago

> Web site owners chose to make it available to Google.

Strong disagree. Since robots.txt is optional and the default is "crawl me as you please", website owners don't "choose to make it available", they just don't choose to make it non-available.

RALaBarge1y ago

To your first point, the op said without compensation, not without permission.

tobyhinloopen1y ago

a) If you don't have a robots.txt, you're indexed by default. It's opt-out, not opt-in. If you do nothing, you're being indexed.
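
To make the opt-out default concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleBot`, the `example.com` URLs, and the helper `allowed` are made up for illustration, and real crawlers may interpret robots.txt differently from this parser.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file's lines; no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical site with an empty robots.txt (no rules at all).
EMPTY = ""
# Hypothetical site that explicitly opts one directory out for all bots.
OPT_OUT = "User-agent: *\nDisallow: /private/\n"

print(allowed(EMPTY, "ExampleBot", "https://example.com/private/page"))    # True
print(allowed(OPT_OUT, "ExampleBot", "https://example.com/private/page"))  # False
print(allowed(OPT_OUT, "ExampleBot", "https://example.com/public/page"))   # True
```

With no rules, everything is permitted; a path is only excluded once the site owner adds a `Disallow` line for it.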

veggieroll1y ago

Robots.txt is irrelevant after hiQ Labs v. LinkedIn (2019)

fredgrott1y ago

point c is wrong...they had ads since the original yahoo contract....

boesboes1y ago

Wrong. Google ignores robots.txt entirely

yoavm1y ago· 15 in thread

We all like hating big corporations, especially Meta, and people seem to use this as an opportunity to advocate for punishing them. I think it's wiser to advocate for changing our IP laws.

_Algernon_1y ago

We're sick of the double standards.

https://en.wikipedia.org/wiki/Aaron_Swartz#United_States_v._...

https://en.wikipedia.org/wiki/Aaron_Swartz#Death

While Aaron Swartz was bullied to suicide, these corporations will walk free and make billions. I say give every tech CEO the Swartz treatment, then change the law.

palata1y ago

You're conflating different problems.

Big corporations are too big, they should just not exist. When you have corporations more powerful than the government of the biggest states, it's a bug, not a feature.

The IP laws may need rethinking. Saying that they should disappear because big corporations are above the law doesn't help, though. First kill the big corporations, then think about fair laws. Changing the law now would not change anything since those corporations are already above the law.

lrvick1y ago

I truly hope Meta has a serious security issue that burns their company to the ground.

That said, I want them to burn for the right reasons.

Downloading data that should be available to the public is not one of them.

lblume1y ago

Exactly. Everyone should have the right to have access to this.

yodsanklai1y ago

Big corporations don't have morals or ethics. They'll break any laws as long as it's profitable. There's no point complaining about Meta or Zuck. Meta does what it's designed to do. If people aren't happy, they should vote for more regulations.

Ekaros1y ago

First punish them. Then change the laws.

DaSHacka1y ago

I bet you and my "first build the product, then worry about security" manager would get along.

blueboo1y ago

We may in retrospect find that the moment may have passed where "big corporations" have become more powerful and impactful on our lives than the IP laws on the books. After all, we can already plainly see they only come into effect when useful by the powerful

aprilthird20211y ago

I think most of the public is probably in favor of stronger IP laws now that big corps are threatening to make them jobless with IP-disrespecting AIs

rchaud1y ago

Something tells me stronger IP laws will be drafted by holders of that IP, with little if any regard to the potential for job losses for regular people from AI.

freeAgent1y ago

The point is about the hypocrisy and double-standards evinced by this behavior.

jillyboel1y ago

First we must prosecute Meta into committing suicide like was done to Aaron Swartz. After justice is served, we should change IP laws.

boesboes1y ago

They broke the law and should be punished for that. Whether the law should change is a separate discussion.

Also, change the law so this is legal for poor meta? smh..

miltonlost1y ago

Big corporations all like hating their consumers and the law. You love committing crimes it seems.

DaSHacka1y ago

I fail to see how you arrived at GP being a hobbyist criminal based on their suggestion that IP laws need to be modernized.

peterbonney1y ago· 7 in thread

The more I learn about how AI companies trained their models, the more obvious it is that the rest of us are just suckers. We're out here assuming that laws matter, that we should never misrepresent or hide what we're doing for our work, that we should honor our own terms of use and the terms of use of other sites/products, that if we register for a website or piece of content we should always use our work email address so that the person or company on the other side of that exchange can make a reasonable decision about whether we can or should have access to it.

What we should have been doing all along is YOLO-ing everything. It's only illegal if you get caught. And if you get big enough before you get caught then the rules never have to apply to you anyway.

Suckers. All of us.

wrs1y ago

And if you were in any doubt before, this lesson is now exemplified by the holder of the highest office in the land and approved by popular vote. The rewards of acting ethically are, unfortunately, sometimes only personal. This must be a hard environment to raise children in, given the examples they see around them.

Barrin921y ago

>What we should have been doing all along is YOLO-ing everything

No it isn't. The actual sucker attitude is copying what they do. You should act morally and with integrity out of respect for yourself. I never had any illusions that large tech companies act with respect towards the law, but it also has nothing to do with me.

afandian1y ago

If you have a spare few hours, the Acquired podcast episode on Meta is enlightening. They just stumbled through growth hack experiment after experiment without seemingly any risk assessment or ethics.

77pt771y ago

> It's only illegal if you get caught

Not quite. It's only illegal if you get caught and you are the wrong kind of person.

For the right kind of person not even a pat on the wrist.

clueless1y ago

yep, pretty much.

callc1y ago

This sort of mindset is devoid of morals and honor. Don’t fall into this mindset trap.

Like when Trump said he is “smart” for evading taxes during the presidential debates (IIRC the first ones, not recent ones).

It’s absolutely despicable. Have a moral compass. Treat people fairly. Be nice. Let’s be better than toddlers who haven’t learned yet that hitting is bad, and you shouldn’t do it even if mommy and daddy aren’t in the room.

hall0ween1y ago

<Tether's ears burning>

mik19981y ago· 7 in thread

Libgen is a civilizational project that should be endorsed, not prosecuted. I hope one day people will look at it and think how stupid we were today to shun the largest collection of literary works in human history.

greeniskool1y ago

Anna's Archive encourages (and monetizes!!) the use of their shadow library for LLM training. They have a page dedicated to it on their site. You pay them, and they give you high download speeds to entire datasets.

adamsb61y ago

I wonder how much more libgen traffic can be attributed to the lawsuit.

When Metallica sued Napster, for many people the reaction was, "wait I can download music for free?"

luqtas1y ago

Libgen turns into a problem when you have a company developing generative AI with it, either giving money to GPU manufacturers or themselves with paid services (see OpenAI)

rafram1y ago

I think you’re overstating its importance. The internet already makes it possible to order almost any book in existence and have it arrive at your doorstep within a week or so, or often on your ebook reader instantly. And your local library probably participates in an interlibrary loan system that lets you request any book held by any library in the country for free.

LibGen gives you access to a much smaller body of works than either of those. It’s a little more convenient. But the big difference is that it doesn’t compensate the author at all.

Just go to a real library.

intotheabyss1y ago

And what about the other billions of people on the planet that don't even have a library, let alone a doorstep to receive a first world delivery service.

Cyph0n1y ago

1. We are not talking about physical books.

2. DRM is built into most purchased ebooks, which means you can’t read the book on whatever device you like. “Illegal” tools exist to circumvent this.

3. Large ebook stores - like other digital stores - essentially lend you a copy of the book. So when they are forced to pull a book, they’ll pull your access too.

Of course, now that the big players have consumed/archived the entire book dump, they can go ahead and kill it to prevent others from doing the same thing.

mik19981y ago

No one sells scans of older books, which are often sparsely available in obscure (often private) libraries.

fimdomeio1y ago· 4 in thread

It really makes you think about those crazy internet folks from back in the day who thought copyright law was too strict and that restricting humanity's access to knowledge in such a way was holding us all back for the benefit of a tiny few.

jeroenhd1y ago

I'm all for chopping up copyright law. But until we do so, companies like Meta need to be treated just like everyone else.

That means lawsuits, prison sentences, and millions in fines. And that's just the piracy part, there's also the lying/fraud part.

Interestingly, a Dutch LLM project was sent a cease and desist after the local copyright lobby caught wind of it being trained on a bunch of pirated eBooks. The case unfortunately wasn't fought out in court, because I would be very interested to see if this could make that copyright lobby take down ChatGPT and the other AI companies for doing the same.

stefan_1y ago

The more concerning thing is that the best thing these overpaid people could come up with was.. download the torrent, like everyone else. Here you are, billions of resources, and no one is willing to spend a part of it to at least digitize some new data? Like even Google did?

fsflover1y ago

> crazy internet folks from back in the day

You mean Electronic Frontier Foundation? https://www.eff.org/issues/innovation

Workaccount21y ago

Probably the single biggest thing I learned growing up is that you can safely live by "Everyone is in it for themselves".

It's incredibly rare to find people who hold ideals that are detrimental to their own life.

Ekaros1y ago· 4 in thread

Considering prices for single work, this must be multi-billion dollar compensation.

Take for example the 675k paid for 31 songs. So roughly 20k a song. If we estimate a book to be, say, 10 MB, that would be about 8 million works. So I think reasonable compensation is something along the lines of 163 billion. Not even 10 years of net income. Which I think is entirely fair punishment.
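
The arithmetic above can be sketched as follows. This is only a back-of-the-envelope check, assuming decimal units (1 TB = 10^12 bytes), the guessed 10 MB average book size, and the rounded 20k-per-work figure; the variable names are made up for illustration.

```python
# Back-of-the-envelope version of the estimate above.
# All inputs are the comment's own guesses, not established facts.
DATASET_BYTES = 81.7e12      # 81.7 TB, decimal units assumed
AVG_BOOK_BYTES = 10e6        # assumed 10 MB per book
AWARD_USD = 675_000          # award cited above for 31 songs
SONGS = 31

per_work_exact = AWARD_USD / SONGS         # about 21,774 USD per song
per_work_rounded = 20_000                  # the rounded figure used above

works = DATASET_BYTES / AVG_BOOK_BYTES     # about 8.17 million works
total_rounded = works * per_work_rounded   # about 163 billion USD
total_exact = works * per_work_exact       # about 178 billion USD

print(f"works: {works:,.0f}")
print(f"total at 20k per work: {total_rounded / 1e9:,.1f} billion USD")
print(f"total at exact per-song rate: {total_exact / 1e9:,.1f} billion USD")
```

So the 163 billion figure follows from rounding the per-song amount down to 20k; without the rounding, the same assumptions give closer to 178 billion.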

ricardobeat1y ago

Beyond the absurdity of those amounts, the funny thing is that the authors wouldn’t ever see a dime of that money. Not in the music case, not in this one either. Fairness?

karel-3d1y ago

Meta argues that it's fair use, and that they just downloaded, and never seeded, all the torrents.

pinoy4201y ago

For creating a backup of library genesis. No. They should be awarded a philanthropic prize.

striking1y ago

There's evidence of them seeding back as little as possible. I'm not sure how that's "creating a backup".

postepowanieadm1y ago· 4 in thread

That's horrible! Magnet anyone?

addandsubtract1y ago

Anna's Archive: https://annas-archive.org

immibis1y ago

specifically https://annas-archive.se/torrents - this is a meta-project which aggregates illegal copyrighted material from other illegal projects. You absolutely should not download any material this page links to, although you can use it for the purpose of researching about shadow libraries.

pinoy4201y ago

Library genesis

ykonstant1y ago

Weird shenanigans are happening in libgen at the moment; better go through Anna's Archive to look for the items you want, it will link you to the corresponding mirrors more reliably.

At least this has been the recent experience of a friend who used libgen and anna's archive to download legal, public domain works!

tremarley1y ago· 4 in thread

Ebooks are 1-2 MB each, max. 81.7 TB is a lot of books, like 42-85 million books.

weberer1y ago

The article says they got datasets from Anna's Archive. It was most likely the scihub/libgen torrent which is 96.0 TB right now and contains 92,872,581 files. That's about 1 megabyte per file.

https://annas-archive.org/datasets
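
Both estimates above are easy to reproduce. A minimal sketch, assuming the 42-85 million range was computed with binary units (that is what makes those numbers come out) and that the torrent figures are in decimal units:

```python
# Rough check of the book-count estimates in this thread.
TIB = 1024**4   # bytes in a binary terabyte
MIB = 1024**2   # bytes in a binary megabyte

dataset_bytes = 81.7 * TIB
books_at_1mb = dataset_bytes / (1 * MIB)   # about 85.7 million
books_at_2mb = dataset_bytes / (2 * MIB)   # about 42.8 million

# Torrent figures quoted above: 96.0 TB across 92,872,581 files.
avg_file_mb = 96.0e12 / 92_872_581 / 1e6   # about 1.03 MB per file

print(f"{books_at_2mb / 1e6:.1f} to {books_at_1mb / 1e6:.1f} million books")
print(f"average file size: {avg_file_mb:.2f} MB")
```

Note that the torrent is a mix of papers and books, so the per-file average says little about the size of a typical book.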

thunkingdeep1y ago

I’ve got 70-80 MB pirated books, I think because of the illustrations. Guess it depends on the book.

mateus11y ago

I don’t think they’re using picture-heavy books for LLM training, no?

squigz1y ago

It could be anywhere from a few million to a hundred million

https://annas-archive.org/datasets

JW_000001y ago· 3 in thread

I don't understand why it's even a question that Meta trained their LLM on copyrighted material. They say so in their paper! Quoting from their LLaMA paper [Touvron et al., 2023]:

> We include two book corpora in our training dataset: the Gutenberg Project, [...], and the Books3 section of ThePile (Gao et al., 2020), a publicly available dataset for training large language models.

Following that reference:

> Books3 is a dataset of books derived from a copy of the contents of the Bibliotik private tracker made available by Shawn Presser (Presser, 2020).

(Presser, 2020) refers to https://twitter.com/theshawwn/status/1320282149329784833. (Which funnily refers to this DMCA policy: https://the-eye.eu/dmca.mp4)

Furthermore, they state they trained on GitHub, web pages, and ArXiv, which all contain copyrighted content.

Surely the question is: is it legal to train and/or use and/or distribute an AI model (or its weights, or its outputs) that is trained using copyrighted material? That it was trained on copyrighted material is certain.

[Touvron et al., 2023] https://arxiv.org/pdf/2302.13971

[Gao et al., 2020] https://arxiv.org/pdf/2101.00027

gameshot911OP1y ago

Critically, by torrenting they also directly distributed the copyrighted material itself. That is a standalone infringement separate from any argument about trained LLMs.

Workaccount21y ago

There are two different things when it comes to discussing training LLMs on "copyright" protected data, and I almost never see people differentiate.

1.) Training on copyright that is publicly available. You write a poem and publish it online for the world to read. That is your IP, no one else can take it and sell it, but they are free to read and be inspired by it. The legality of training on this is in the courts, but so far seems to be going in favor of LLMs.

2.) Training on copyright that is not publicly available. These are pretty much pirated works or works obtained by backdoor to avoid paying for them. Your poem is behind a paywall and you never got paid, yet the poem is known by the LLM. This is just straight illegal, as you legally must pay to view the work. However there might be conditions here too like paying for access to an archive and then training on everything in it.

unraveller1y ago

Trained on doesn't mean significant inclusion in the final state.

Is it truly a violation of copyright when a user hacks out bits and pieces of easily restyled raw data points from a model to look samey? what about if it takes two models? Might be time to accept humans are just cooked in their ability to discern attempts at direct plagiarism - just as it is hard to discern Sky voice from Her voice.

peterclary1y ago· 3 in thread

I strongly urge people to read Thomas Babington Macaulay's speeches on copyright, its aims, terms, and hazards. Very well reasoned and explained.

In particular, people often cited the case of authors who had died leaving a family in destitution, and claimed that copyright extension would be a fair way of preventing this, but in most cases the remaining family had never held the copyright; the author had initially sold the reproduction rights to a publisher who had then sat on the work without publishing it. The author, driven into penury, was then induced to sell the copyright to the publisher outright for a pittance. So in such cases a copyright extension only benefited the publisher, and indeed increased their incentive to extort the copyright.

kshri241y ago

> Thomas Babington Macaulay

The one who got Hindu Sanskrit books translated in a horrible manner and then claimed: "I have no knowledge of either Sanskrit or Arabic. But I have done what I could to form a correct estimate of their value. I have read translations of the most celebrated Arabic and Sanskrit works. I have conversed both here and at home with men distinguished by their proficiency in the Eastern tongues. I am quite ready to take the Oriental learning at the valuation of the Orientalists themselves. I have never found one among them who could deny that a single shelf of a good European library was worth the whole native literature of India and Arabia."

This chap will educate us on copyright?

No thanks!

bbor1y ago

I’m a huge IP hater and am sure that happens, but to be fair, letting copyright extend past death also increases the amount the author can sell it for in the first place.

golergka1y ago

> in most cases the remaining family had never held the copyright; the author had initially sold the reproduction rights to a publisher

He was able to sell it because it is something valuable, exactly because of the copyright protections. Regardless of whether the author sells the rights or not, he and his family would equally be better off with copyright.

nyoomboom1y ago· 3 in thread

Remembering Aaron Swartz in this moment

stingraycharles1y ago

Which was arguably more innocent — scientific papers.

piyuv1y ago

Meta is not “innocent”, and comparing this instance with Swartz is a huge offense to his legacy.

qup1y ago

Would Aaron have preferred us to download the material and train the AI?

RobotToaster1y ago· 3 in thread

Before I decide my opinion on this, I need to know their ratio.

adamsocrat1y ago

Article states: Meta also allegedly modified settings "so that the smallest amount of seeding possible could occur"

malfist1y ago

Big tech taking and not giving back, where have I heard this before?

MaKey1y ago

Damn leechers!

lrvick1y ago· 3 in thread

This should be legal. Copyright law does more harm than good.

The only ethical problem here is that only Meta sized companies can afford to pay the "damages" for such blatant law violations at worst, or the fees of their lawyers at best.

maronato1y ago

Copyright law does more harm than good to individuals who just want to learn and enjoy content without profiting from it.

Companies like Meta and OpenAI, however, should definitely have to pay to use the hard work of humans to train their AI.

pleeb1y ago

If an individual were the one torrenting almost 82 TB of copyrighted books, the damages they would have to pay would be in the trillions (mostly because of how broken the copyright law system is).

moffkalast1y ago

If only these corporations with vested interests in permissive copyright would put their money where their mouth is with lobbying for a change. Or is that only allowed when they're trying to do something scummy? I forget.

Havoc1y ago· 3 in thread

Really curious what the judges are going to do here.

Horse has functionally bolted on this already

I’m guessing slap on wrist despite courts going after individual for a couple of movies torrented pretty hard

aprilthird20211y ago

Is there any other possible outcome than a fine? That too one which will not really affect Meta's overall earnings

Havoc1y ago

Ideally we have a conversation about how we as a society have ended up in a situation where we have a two-tier justice system.

At a minimum, the starting point of discussion here should be that if a life-ruining $80,000 per item is an acceptable fine for individuals, then why is it not the same for corporations? That would probably get you a number in the trillions, at which point we could have a discussion about reforming this entire system.

But yes realistically slap on wrist is what is going to happen here.

empath751y ago

The reality of the situation is that the economic value and utility of AI is going to cause the laws to be restructured around them.

ksynwa1y ago· 3 in thread

A good chance for federal prosecutors to "send a message" as they did with Aaron Swartz, but I don't see things going that way.

acomjean1y ago

If you were wondering why Meta was making a lot of donations to the new government (including settling a lawsuit for 25 million with the new president, 1 million to the inauguration)…. I suspect there will be no federal charges.

The rules have always seemed different for corporations regardless.

https://www.businessinsider.com/trump-settles-lawsuit-meta-m...

Nasrudith1y ago

Well of course, bullies always prefer targets that can't fight back. That itself is unfortunately a basis of the legal system from it being run on flawed monkey brains. Why else is hitting vulnerable children okay but getting into a consensual bar fight illegal?

courseofaction1y ago

Even after JSTOR declined to press charges in that case. Despicable. The US has dug the hole it's going down.

mnsu1y ago· 3 in thread

So according to some AI, the damages awarded per infringed work is ~$750 minimum in the US. 80TB of books, each let's say 10MB on average, would be 8 million works. So Meta should pay 6 billion USD for their copyright infringement?

gorbachev1y ago

Minimum doesn't cover willful copyright infractions, for which maximum penalty is $150K per work. That comes out to quite a different number.
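
To put the two ends of that range side by side, here is a minimal sketch using only the figures quoted in the two comments above (750 USD minimum, 150K USD maximum for willful infringement, and the assumed 8 million works); none of these inputs are verified here.

```python
# Statutory-damages range using the figures quoted in this thread.
# The work count is the commenter's estimate (80 TB at 10 MB per book).
WORKS = 8_000_000
MIN_PER_WORK_USD = 750            # quoted minimum per infringed work
MAX_WILLFUL_PER_WORK_USD = 150_000  # quoted maximum for willful infringement

min_total = WORKS * MIN_PER_WORK_USD          # 6 billion USD
max_total = WORKS * MAX_WILLFUL_PER_WORK_USD  # 1.2 trillion USD

print(f"minimum: {min_total / 1e9:,.0f} billion USD")
print(f"willful maximum: {max_total / 1e12:,.1f} trillion USD")
```

The two totals differ by a factor of 200, which is why the choice between the minimum and the willful maximum matters far more than the exact book count.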

oersted1y ago

Nice calculation, that’s actually quite doable for them, they have already been paying similar fines for a while.

timeon1y ago

Prosecutors charged Swartz with counts carrying up to 50 years of imprisonment and $1 million in fines.

Can you calculate how many years that would be for Mark and his people?

nprateem1y ago· 3 in thread

If you're an author with a book likely to have been hoovered up, I wonder what you'd get from the fb models if you asked "complete this in the style of [author] in [book]: [quite a long excerpt]"

If you get a direct quote then you're good with your claim, surely.

Nemo_bis1y ago

That's the NYT's case. Not necessarily very strong. https://www.techdirt.com/2024/03/05/openais-motion-to-dismis...

unraveller1y ago

The way it works counts if you bring prompting into it. It could easily have learned enough style chops of [author] from other sources to mimic/predict those stanzas from raw data points.

Whatever the ruling one thing is for sure, plagiarism is no longer the sincerest form of flattery. The human authors are out for AI blood on this.

aprilthird20211y ago

I believe that is part of this lawsuit pretty much

passwordoops1y ago· 3 in thread

Eye for an eye. Meta loses rights to 81.7 TB of IP. Transcribed into a text file

cma1y ago

Meta already does that to themselves every year or so, deleting all internal communications.

They've thrown away a huge amount of communication to source code commit reinforcement training data as a result. They do it to avoid emails making it into trials like this.

zaik1y ago

No large company will ever consider training a public LLM on all their internal communications.

yodsanklai1y ago

> Meta already does that to themselves every year or so, deleting all internal communications.

Aren't they obligated by law to keep all internal communication?

palata1y ago· 3 in thread

Good, we know it. Nothing will happen, because nothing happens to billionaires and their companies. Musk is proving it every day now.

jokethrowaway1y ago

This is why we need to abolish the government. If the government doesn't have any power, they can't do preferential treatment to their cronies.

Enough with laws for thee but not for me!

ArnoVW1y ago

I was having difficulty figuring out if this was parody or not. But I guess the username checks out.

palata1y ago

The problem is precisely that those billionaires are too powerful. If anything, we need to abolish the billionaires.

bmsleight_1y ago· 2 in thread

So if I torrented and seeded, I would be doing it for my own entertainment, not commercially. I expect big copyright holders to come after me. If Meta does it - I guess they have better lawyers?

Could make interesting case law.

unification_fan1y ago

> Could make interesting case law.

Yeah, to perpetuate this system where only those who can afford lawyers get to benefit

echoangle1y ago

Since it’s case law, everyone would benefit from the precedent

wnevets1y ago· 2 in thread

My ISP will shut off my internet if it catches me torrenting copyrighted material, but if you're a massive corporation that steals TBs of data it's barely a blip in the news.

freeAgent1y ago

Wouldn't it be amazing if all of Meta's ISPs cut them off for torrenting? One can dream...

gkbrk1y ago

You should look into changing your ISP, or at least get a VPN.

bigmattystyles1y ago· 2 in thread

The question is, if they could and would have paid for each book, would it be ok to train the LLM on them? I'm talking about prior books, I'm sure new books have language forbidding their use to train LLMs at the point of sale. But legally, how does using a book to train an LLM differ from a teacher learning from a book and teaching its contents to their pupils? Obviously, the LLM can do so at scale, but is there a legal difference?

dragonwriter1y ago

> The question is, if they could and would have paid for each book, would it be ok to train the LLM on them?

Whether training an AI model on an array of different works, many of which are copyright protected, is itself a copyright violation, in addition to or distinct from any copyright violation that goes on in gathering the dataset for training (and separate from any copyright violation in the actual or intended use of the LLM), remains to be resolved as a legal question, and may or may not have a simple yes or no answer (or the same answer under every system of copyright laws globally).

My inclination is that it is probably generally not a violation in US law, but that's not something I am very confident in; how the definitions of copy and derivative work apply to determine if it would be without fair use, and how fair use analysis applies, are not clear from the available precedent.

> But legally, how does using a book to train a LLM differ from a teacher learning from a book and teaching its contents to their pupils.

It is very clear, by looking at how US copyright law is written and even more clear in its history of application, that information stored in brains of people are without exception neither copies nor new works that can be derivative works under US law, and so cannot be infringing, no matter how you gain them. It’s also very clear in the statute itself and the case law that data in media used by artificial digital computers, on the other hand, can constitute copies or derivative works that can be infringing. Even if the process is arguably similar in legally relevant manners, copyright law is critically focussed on the result and whether it is a particular kind of thing which can be infringing, not just the process.

CryptoBanker1y ago

An LLM is not a person. That is the legal difference...until we have Citizens United v2

perihelions1y ago· 2 in thread

Best way to "punish" Meta is to slash the Gordian knot and abolish copyright. Level the playing field, incrementally, for everyone else who isn't a trillion-dollar corporation.

The alternative is a futile legalistic attack against a monopoly entity too powerful to be meaningfully punished. That won't accomplish anything useful. It would, rather, help cement this status quo, where copyright infringement is selectively legal or illegal, for different entities at the same time; and companies like Meta thrive arbitraging that difference. You can't defeat Meta—but you can help dig them a moat.

miltonlost1y ago

Ridding copyright would level the playing field for individuals and companies????!!!! Getting rid of laws that protect the individual only will help the larger empowered businesses.

nkrisc1y ago

What's the alternative to copyright then? Anything I create will be instantly reproduced and sold for less than I can afford to by some entity far larger and more efficient than me.

> Level the playing field, incrementally, for everyone else who isn't a trillion-dollar corporation.

There is no level playing field when you have individuals and trillion-dollar companies in the same market.

rvz1y ago· 2 in thread

Maybe you should go after the worst offender (OpenAI) first before going after Meta, since the latter already gave their model and its architecture away for free to everyone.

We will know why OpenAI isn't getting investigated.

hruzgar1y ago

So true. It seems like there is a controlled operation to shut open models down starting with Meta. Obviously they can't go after deepseek atm

unraveller1y ago

Could be why OpenAI paid them so much, to go after their open-source competition hardest of all.

abigail951y ago· 2 in thread

This reminds me of Peter Sunde's "Kopimashin"

https://www.engadget.com/2015-12-21-peter-sunde-kopimashin.h...

It's obviously absurd to enforce copyright as bytes are copied around instead of as it is used. Training an LLM is a different thing than re-hosting and giving away copies to other people.

If you don't want people to transform your works - keep them private. You don't own ideas.

golly_ned1y ago

As the article says, Meta /was/ giving away copies to other people by seeding the libgen torrents. This isn't the usual case of "should companies be allowed to train on books".

henriquemaia1y ago

Thanks for the link. I wondered what that word meant.

From the article: Kopimashin, as in Copy Machine.

iimaginary1y ago· 2 in thread

We need better laws that would create a better way to do this legally whilst compensating rights holders.

miltonlost1y ago

We need a better justice system that enforces the laws we have on the books and that would help compensate rights owners when big companies in emails pirate terabytes of data.

SketchySeaBeast1y ago

I really don't think that Meta did this because the alternative would have been too onerous; they are a huge org, they could work through whatever loopholes required. They did it because it would have cost money and there will be no penalty for not paying.

gameshot911OP1y ago· 1 in thread

Beyond illegal downloading and distribution of copyrighted content, the article also describes how Meta staff seemingly lied about it in depositions (including, potentially, Mark Zuckerberg himself).

malfist1y ago

Huh, a big tech CEO lied to us?

Flippant response I know, but too many people worship at the altar of the job creator and believe these folks are moral, upstanding citizens

zackmorris1y ago· 1 in thread

Is there a concept in the legal system of first-come-first-served that could be used as precedent?

What I mean is: when someone is prosecuted for copyright infringement, but Meta isn't, then could the case be put on hold until Meta is found guilty and pays a fine?

Also maybe the fine on the later case would have to be proportional to the prior case. So if Meta pays $1 per infringement, the penalty might be $1 for torrenting something else (which is immaterial and not worth the justice system's time) so pretty much all copyright infringement cases would get thrown out.

It reminds me of how mainstream drug addicts get convicted and spend years in prison, while celebrities get off with a warning or monetary fine.

hnfong1y ago

Lawyers (and hence, judges) are really good at arguing why the earlier case does not apply in a present case, even if most reasonable people would think the two cases are essentially the same.

It's a fundamental part of lawyer training, and if they want to let BigCorp go and bring the hammer down on the little guy, they can make up a hundred reasons for it.

liendolucas1y ago· 1 in thread

For some mysterious reason I can't see Zuckerberg in front of a judge facing 50 years imprisonment. Can anyone?

I truly hope that whoever takes the case goes after Meta with 1000 times the pressure that was put on Swartz, but honestly I don't expect much, just as the top comment precisely expressed.

And if we are going to be fair, please let's also not forget about the other usual suspects, or does anyone think they are falling behind?

impossiblefork1y ago

There are other countries than the US though and if rightsholders wish to sue, lawsuits can happen there too.

Several EU countries, Switzerland, South Korea, Japan, etc. are viable countries to sue from. Even in Japan which has a law specifically permitting training on copyrighted material you must still obtain it legally-- i.e. you must license it.

woadwarrior011y ago· 1 in thread

I wonder what happened to the related OpenAI training GPT3 on the books3 dataset story[1] from ~2 years ago?

[1]: https://www.wired.com/story/battle-over-books3/

gundmc1y ago

I think this one is different because the legality of training on copyrighted material is an open legal question while distributing/seeding copyrighted material is decidedly illegal.

openplatypus1y ago· 1 in thread

Something tells me uncle Donald will exonerate his new favourite lapdog from any criminal or civil liability.

Terr_1y ago

IANAL but the pardon power (A) only extends to criminal punishments, not civil liabilities and (B) copyright lawsuits can be launched by anybody, not just the Department of Justice.

So, barring further Might Makes Right shit--which I'm not willing to fully rule out--Trump can't fully shield Zuckerberg et al.

sva_1y ago· 1 in thread

> By September 2023, Bashlykov had seemingly dropped the emojis, consulting the legal team directly and emphasizing in an email that "using torrents would entail ‘seeding’ the files—i.e., sharing the content outside, this could be legally not OK."

I'm pretty sure you can theoretically download torrents without seeding, although this is frowned upon. If they really seeded (with full bandwidth?) that's indeed pretty brazen.

It is sort of strange that Meta is being singled out here though, and sort of sad considering they at least release the model weights. What's the signal? Do illegal shit to be competitive, but make sure there is no evidence?

voidUpdate1y ago

You can, in Transmission for example you can just set the seed percentage to 0%. I recognise that this makes me a bad torrenter, but I've been told in the past that my ISP won't be too happy about me seeding, and they already do something screwy to torrents I access through the surface web, so I'm just playing it safe

9999000009991y ago· 1 in thread

"Say they hood robin, ain't that a b*, take from the poor and give to the rich."

- Ice Cube.

Meta will face no consequences. Say you're a small publisher and you'd like a bit of compensation. If you dare sue, Meta can just blacklist your books on its platforms. Even if they don't, you probably don't have the money to sue one of the biggest companies on earth.

I think copyrights should be limited to 25 years after first publication. This would fix plenty of issues and give the AIs of the world plenty to learn from.

Who am I kidding, Meta will take what they will. For that author making 20k a year, be honored to be of use to Meta.

bwfan1231y ago

can people vote with their feet, and leave the platform ?

but the masses are addicted to the slop that meta feeds them.

seydor1y ago· 1 in thread

We have at least 4 types of ill-defined concepts of property in the 21st century, largely due to our laziness, intellectual inertia and lack of motivation to make forward-thinking definitions for the coming age of AI and ubiquitous access to all information and all communication.

1) the concept of copyright is as old as the word suggests (copies are the least of our worries going forward - it should be possible to define processes for exploitation of ideas in a fair way)

2) we allow humans to learn from other people's ideas and transform them to commercial products and the same should happen for AIs in the future

3) we have an ill-defined concept of "personally identifying information" which gives people ownership to information that others have created via their own means - there should be better ways to ensure a level of privacy (but not absolute privacy) without overly-broad, nonsensical definitions of what is personally protected information

4) We allow social media and other telecommunications media to arbitrarily censor people's speech without recourse. This turns people's speech to property of the social media companies and imposes absolute power on it. This makes zero sense and is abusive towards the public at large. We need legal protections of speech in all media, not just state-owned media.

thfuran1y ago

>we have an ill-defined concept of "personally identifying information" which gives people ownership to information that others have created via their own means - there should be better ways to ensure a level of privacy (but not absolute privacy) without overly-broad, nonsensical definitions of what is personally protected information

What information about me could a corporation create via its own means that would be legally protected but shouldn't be? PII is generally information that a corporation collects. Unless you mean that my cellphone provider creates the association between my name and phone number and should therefore be able to do with it as they please?

z71y ago· 1 in thread

How about a consequentialist argument? In some fields, AI has already surpassed physicians in diagnosing illnesses. If breaking copyright laws allows AI to access and learn from a broader range of data, it could lead to earlier and more accurate diagnoses, saving lives. In this case, the ethical imperative to preserve human life outweighs the rigid enforcement of copyright laws.

KolmogorovComp1y ago

There’s nothing particular to AI about your comment, it’s a general downside of IP.

StefanBatory1y ago· 1 in thread

I, as an individual, would be liable to pay ~1000$ in damages if I downloaded a movie in Germany or Poland and the publisher got to me.

I'm going to assume as it's a corporation, then the laws no longer apply.

Anamon1y ago

That's okay, they should just charge The Zuck with it personally; I'd be fine with that.

ezekiel681y ago· 1 in thread

Unless Meta 'fessed up to this (which seems unlikely), the headline here is missing the word "allegedly".

papercrane1y ago

Meta admitted to the torrenting more than a month ago. The reason this is in the news is because some of the emails discussing it have been unsealed.

panki271y ago

They could have at the very least seeded some more, to give something back to the, uh, community.

belter1y ago

"Supposedly, Meta tried to conceal the seeding by not using Facebook servers while downloading the dataset to "avoid" the "risk" of anyone "tracing back the seeder/downloader" from Facebook servers, an internal message from Meta researcher Frank Zhang said, while describing the work as in "stealth mode." Meta also allegedly modified settings "so that the smallest amount of seeding possible could occur," a Meta executive in charge of project management, Michael Clark, said in a deposition..."

They will be getting a lot of Frommer Legal letters...

HPsquared1y ago

If you owe the bank $1,000 it's your problem; if you owe the bank $1,000,000,000 it's the bank's problem.

651y ago

I'm more interested in piracy not being highly prosecuted than I am in Meta getting punished for this. I'm not trying to spend 20 years in jail for pirating a TV show.

fsflover1y ago

Support EFF if you think that the copyright laws should be changed and also applied equally to all: https://www.eff.org/issues/innovation

jokethrowaway1y ago

Great, can we get the full Kim Dotcom treatment for Zuckerberg now?

I'm also ok with abolishing copyright altogether if he's too untouchable

kelseyfrog1y ago

The usual copyright cartel is up in arms, crying theft. But here’s the truth: intellectual property is a state-enforced monopoly, not real property.

Property is based on scarcity - if you take my car, I no longer have a car. But if you copy my book, I still have my book. No loss, no theft, just an outdated legal fiction designed to stifle innovation and enrich rent-seeking middlemen. And no, loss of potential sales doesn't count - it's like being able to claim a lottery ticket has real value.

Copyright was never about protecting creators—it’s about locking down ideas, preventing competition, and extracting endless fees. Shakespeare borrowed, tech companies iterate, and science thrives on free exchange. The idea that knowledge should be locked away indefinitely is absurd.

Meta’s mistake wasn’t using the data - it was pretending copyright still matters. AI is exposing the system for what it is: obsolete. The future belongs to those who create without asking permission.

caterwhal1y ago

Really strange how much torrenting is demonized by all of these companies and ISPs when individuals want to use it but when a company like Meta uses it there is so little scrutiny.

ofou1y ago

Who would have known that BitTorrent, shadow libraries, and seeders would help train the best AI models out there? That adds a whole new meaning to "seed".

gorbachev1y ago

Previous: https://news.ycombinator.com/item?id=42673628

aucisson_masque1y ago

You wouldn't download a car.

nickpsecurity1y ago

That they’d focus on file sharing over transformation or outputs is exactly the risk I warned the companies about in my AI report. Most datasets, like RefinedWeb and The Pile, also require sharing copyrighted works between people who are not licensed to do that. Many works also prohibit commercial use or have patents on them.

They need to make datasets which don’t have this problem or have entities in Singapore train the foundation models within their rules. The latter has a TDM exemption that would let AI’s use much of the Internet, maybe GPL code, licensed/purchased works they digitize, etc. Very flexible.

nullfield1y ago

I think everyone can see that whatever

(imo not in accordance with the Constitution, after absurdities like deciding “limited time” the way mathematicians might define something of some order of infinity)

the alleged social contract was is not functional the way it was intended, and we see who benefits and who loses.

mass dynamic editing for vitriol and profanity occurred while writing this comment in order to remain within site rules

stevage1y ago

Wow, I'm actually a bit shocked that senior levels of management at Meta were fine with torrenting pirated books. WTaF.

Meta does a lot of stuff I disagree with, but they're usually not just straight breaking the law.

scotty791y ago

Seeding it was probably the most societally useful thing Meta ever did.

yalogin1y ago

LLMs are worse than search for figuring out what value a specific asset provides to the LLM. At least with search your work or page is not lost and still gets a click/user interaction, and that may give you a chance to monetize the interaction. However, LLMs just don’t have any such option. Gemini adds links, but the links it adds are completely editorialized by the LLM and need not reflect the original at all. So how does anyone ask for compensation even if they sue?

pjfin1231y ago

Copyright law needs major reform. We need to figure out a way to let authors monetize their work while not making complying with the law so burdensome. We've created a system where people who (understandably) ignore the law benefit at the expense of people trying to do the right thing.

ngneer1y ago

Sounds just like how Facebook got started, harvesting photos without permission. From the Wikipedia article, the Facebook precursor was known as Facemash. On Zuckerberg, "He hacked into the online intranets of Harvard Houses to obtain photos, developing algorithms and codes along the way. He referred to his hacking as "child's play.""

If I were younger, I would be livid.

toss11y ago

>>"vastly smaller acts of data piracy—just .008 percent of the amount of copyrighted works Meta pirated—have resulted in Judges referring the conduct to the US Attorneys’ office for criminal investigation.".....While Meta may be confident in its legal strategy despite the new torrenting wrinkle...

Zuckerberg has paid the vig several times [0,1,2], which is evidently the best legal strategy under this administration. OFC, considering there are already multiple payments, there is no assurance the vig payments won't substantially increase as the Capo sees more opportunity for profit.

[0] https://en.wikipedia.org/wiki/Vigorish

[1] https://www.politico.com/news/2025/01/29/meta-settles-trump-...

[2] https://www.bbc.com/news/articles/c8j9e1x9z2xo

buyucu1y ago

I love this. Large corpos should torrent more. Maybe we'll get better copyright law as a result.

thunder-blue-31y ago

You know the weird thing is - I've never used Meta AI. I've never thought of using it. The only FB product I use is WhatsApp, and I've not seen or heard of any of my friends using Meta AI on FB, IG, or WhatsApp. I really don't understand what their ROI here is...

asjir1y ago

I thought about it for a full day, and I have one idea for how to handle copyrighted data training. It would need to be open / regulated and training till double descent would need to be disallowed, to make sure that the model is not memorizing the data.

kpgraham1y ago

Damn! One of my old books can be found in the Anna's Archive search. The book has been out of print for years. I pity the Meta users who get results based on something that I wrote. (Check Anna's for 'Keith P. Graham', and the first book listed is mine.)

srameshc1y ago

At OpenAI we have seen some employees publicly express concern about the moral grounds on which the company was acting. We never heard anything like that from anyone at Meta, though there were some jokes, of course. I guess everything is fair in AI and corporations.

api1y ago

One of the largest businesses of the Internet to date has been piracy. Individual informal piracy has been the smallest component of this. By far the largest has been corporate mass-scale piracy, and LLMs are probably the largest heist to date. They've literally downloaded the sum total of all human thought and knowledge, compressed it into queryable lossy compression models (which is what LLMs are), and are selling it back to us.

Meta, with its "open weights" models, is one of the least guilty parties, since at least they've made the resulting blobs of mass piracy available to us. Same with Mistral, Deepseek, etc.

ClosedAI, Google, and others have all probably done this and more and refuse to make even the model available.

I think the way to deal with this is very simple:

If you have trained your model on works to which you do not have rights or permission, the resulting model is not copyrightable and cannot be sold. It must either be kept for research purposes only or released free of charge and in the public domain. All these models that have been trained on pirated works should become public domain.

Of course now that we have full capture of the US Federal Government I'm sure any suggestion like that would be neutralized with one bribe to Trump.

flojo1y ago

Did they at least seed back?

lvl1551y ago

I’d think people can get together to put this on a public space strictly for training purposes and have the consortium of some sort get paid per use.

But we live in this stupid society where you have to move mountains to change things an inch.

Der_Einzige1y ago

The only bad thing about this is that small time players who do it are treated poorly (Aaron Swartz). IP de-facto not existing for AI companies is a feature, not a bug.

The fact that most of the world embraced hardcore copyright troll ludditism when the means of their (badly paying creative) jobs economic production was democratized implies that most people do not believe in any "egalitarianism" and especially not the left-wing form many profess to believe in. Certainly not "information wants to be free" or any of the other idealist shit that I or Aaron Swartz believed in. What meta did was software communism - full stop. They literally released their models to the public! I support all of this 10000%. The only issue is that they're not open enough (fully open source the dataset)

So, unironically, good! Thank you, please pirate more! Please destroy the US IP system while you're at it. Copyright abolitionism is good and thank you Zuckerberg!

pilimi_anna1y ago

We're grateful to Meta for helping seed and backup our torrents. The more copies the better. Thank you Meta, for helping preserve humanity's legacy! :)

djyaz12001y ago

“Behind every great fortune lies a great crime” -Honoré de Balzac

antirez1y ago

Copy-right is not learn/train-right. That said, Meta fills its mouth with talk of open source while releasing models that are neither SOTA nor usable for commercial purposes.

black_puppydog1y ago

Wouldn't it be a real shame if the entirety of the US constitution, laws, and legal precedent went out the window these days, and the only thing left unscathed was the rotten mess that is copyright law? Just saying, this might be the moment to burn it to the ground. Not that it makes up for any of the other stuff going on, but why waste a perfectly good crisis?

maxwell1y ago

I'm sure they'll throw the book at them.

cratermoon1y ago

We're starting to find out that Meta ruined LibGen for the rest of us who used it like a library. Just like how Google screwed over libraries by sending interns to the Stanford library to check out books they scanned into Google Books. Not to increase shared knowledge or preserve human artifacts, but to put them all in a museum and, to paraphrase Joni Mitchell, charge the people a dollar and a half just to see 'em.

esarbe1y ago

It's okay - they are a multi-billion-dollar company. Rules don't apply to them.

Rules are just for us peasants.

dansitu1y ago

I'm fine with them using my books to train an open source model, but it would have been nice to be asked.

1 more reply

lewdev1y ago

It's okay when large corporations download cars. But when you do it, you'll be in trouble.

breppp1y ago

Yes it smells bad but facebook did the right thing (at least for facebook)

After OpenAI trained their models on the famed books2 dataset, and seeing the technological implications of ChatGPT, there was a good chance they would let them get away with it.

Would the USA really surrender its AI technological advantage for trivial matters like copyright? They would make some royalty arrangement and get it over with

mrinterweb1y ago

Remember people getting sued for insane amounts of money per song they torrented? If we applied that precedent to Meta, Meta would need to declare bankruptcy. https://www.cbsnews.com/news/file-sharing-mom-fined-19-milli...

ofslidingfeet1y ago

Yeah well, OpenAI compressed the whole internet into proprietary weights and is now providing access via paid subscription while the original internet gets deleted from our culture.

josefritzishere1y ago

Zuckerberg did more copyright infringement? Shocking!

losvedir1y ago

Hooray! Or wait, are we not doing that anymore?

waltercool1y ago

Based. Free knowledge to the people

zelphirkalt1y ago

Come on, publishers! This is your chance! Now you can really show how you treat all copyright infringement equally and don't only go after easy targets. Show us how you spend all that money on a lawsuit against Meta!

bloopbloopscoop1y ago

Death to intellectual property!

Refusing231y ago

their whole business is stealing data..

so its quite funny to see they freely share it too.

ocean_moist1y ago

At least they seeded!

snapcaster1y ago

The powerful do what they can, the weak suffer what they must

jfbaro1y ago

They are getting shittier and shittier

reverendsteveii1y ago

So they're gonna go through every book that was stolen and apply the appropriate penalty, right? Each copyrighted work has a minimum penalty of $750 under the DMCA. That will be applied fairly in order to ensure that the rights holder is made whole by the infringer, right?

It's so funny to see the law blatantly ignored by the overlords. Like, there isn't even a pretext anymore. They just steal what they want and budget for the fines and campaign donations to make the consequences go away.

uncomplexity_1y ago

did they not seed enough, is that the crime? lol

Pxtl1y ago

Laws are for poor people.

TZubiri1y ago

I love it. This plotline feels out of cryptonomicon or silicon valley series.

hackerbeat1y ago

One of the many reasons why Zuck’s been sucking up to Trump. He’s in desperate need of some Get-Out-Of-Jail-Free cards.

Same for all the other sleazy tech bros.

lazycog5121y ago

abolish knowledge rentiers

imgabe1y ago

Boo hoo.

We are trying to advance civilization here. To accumulate and make available all human knowledge to date. And you stand there with your hand out to stop this? You are a villain. There is no sympathy for you.

swozey1y ago

I deleted my facebook account about 10 years ago. Downloaded data, deleted. Not deactivated.

Nothing in my life ever made me want to go back until a few months ago, when I got back into playing hockey and found that all the hockey leagues use Facebook to communicate.

I made a new account, had to literally upload a picture of my face to pass verification.. and then a few days later I was immediately banned and couldn't use my account. I assume because they searched previous data and compared my face to find out I have a "deleted" (lol) account and matched me. I've assumed they'll only let me log in if i use my original 10 years ago deleted account.

Fuck meta. Fuck zuck.

1970-01-011y ago

And they're going to get away with it simply because if you or I openly did this the DMCA fines would be for a million trillion dollars. Since Meta shareholders can't stomach a million trillion dollars in fines, their lawyers will wave their magic wands and poof! No laws were broken!

elzbardico1y ago

Nothing is gonna happen. Just a slap on the wrist. And all of us in the intellectual working class (writers, journalists, programmers) will be proletarianized by LLMs that have been:

a) Financed via inflation/"Cantillon effect" due to ZIRP/stimulus that absolutely flooded the market with funny money in the hands of the sharks.

b) Trained upon copyrighted work without compensation.

c) Trained upon open source without even asking politely for authorization.

The Robber Barons from the last century can't even get close to our modern Feudal Tech Lords.

Unless you're one of us who has amassed multi-generational wealth in an exit in the last 20 years, you're completely fucked.

gizmo1y ago· 76 in thread

lolinder1y ago

post-it1y ago

Lucasoato1y ago

> If you plug a laptop into a closet at MIT to download some scientific papers you forfeit your life.

3 more replies

ysofunny1y ago

the english empire once tried to maintain a monopoly over steam loom machines

the americans cheated their way to competition,

heck, even before that, the english empire got jumpstarted by stealing gold from the spanish (who were themselves exploiting it away from aztec and other mexican natives)

choult1y ago

Hollywood became popular for filmmaking because they were literally the opposite side of the country from Thomas Edison and his patents...

3 more replies

miltonlost1y ago

People criming in the past is not an excuse for companies committing crimes today. You’re excusing lawlessness.

Cain killed Abel and got away with it!! I can kill someone today too!!!

3 more replies

nottorp1y ago

Interesting, if we're to trust what NotOpenAI and Facebook say about their IP, the US should pay the UK reparations for IP theft based on textile industry profits starting in the 1850s until today?

portaouflop1y ago

Why do I get sued when I share some BitTorrents but $bigcorp can just do it at 1000x the scale without problems?

1 more reply

sebzim45001y ago

I don't think I've heard the term "English empire". Is it an attempt by the Scottish to pretend they weren't involved?

3 more replies

pockmarked191y ago

> Spotify's music library was also pirated in the early days.

3 more replies

marcosdumay1y ago

> If you plug a laptop into a closet at MIT to download some scientific papers you forfeit your life.

Just to point out: the material in question was public domain, so nobody even had a copyright claim over it.

1 more reply

Cthulhu_1y ago

Crunchyroll started off as a straight up piracy site, it now has millions of paying subscribers and was sold to Sony for over a billion a few years ago.

gnfargbl1y ago

I think if Google attempted to download the entirety of JSTOR with the express intent of making the full dataset freely available, then Google would also face legal consequences.

It's true, and relevant, that Google would feel those consequences much less sharply than Swartz did.

vintermann1y ago

1 more reply

josefx1y ago

Google book search was declared fair use and copyright holders ended up having to explicitly request removal of their works.

Apparently he would have gotten away with downloading the JSTOR database if he made it clear that he intended to only publish half of each paper.

coliveira1y ago

Yes, these companies are based on massive IP and copyright theft. And they still want to lecture others about their "property rights".

immibis1y ago

oblio1y ago

> Something to understand about capitalist competition (also in politics) is that it's a war.

Turns out there are 2 simultaneous wars there. One where companies and individuals compete ruthlessly.

And another one where if non profit associations of individuals form, guns come out.

soheil1y ago

Aaron committed suicide and FBI going after him was meant more as a lesson to the other kids at MIT than anything.

yowzadave1y ago

> Youtube was initially a ghost town (it started as a dating site) and it only got traction once people started uploading copyrighted TV shows to it

belter1y ago

"Zuckerberg was at White House for meetings on Thursday" - https://www.reuters.com/world/us/zuckerberg-was-white-house-...

Wowfunhappy1y ago

> Based on the encyclopedic knowledge LLMs have of written works I assume all parties did the same.

I don't understand why you wouldn't just buy copies of the books. Seems like such a relatively inexpensive way to strengthen your legal case.

freeone30001y ago

Buying a copy of the book doesn’t grant you the right to copy it. That is what copyright is for.

2 more replies

londons_explore1y ago

Pretty sure that even if you gave a purchasing team enough money for retail price and a list of all books ever published, they wouldn't be able to buy even a quarter of them.

1 more reply

jokethrowaway1y ago

Buying the books won't automatically give you permission to use the content commercially

gosub1001y ago

1 more reply

cess111y ago

Too much paperwork, too much effort. These are important people, doing much more important stuff than whatever book authors do.

Or so they think, I think.

1 more reply

plasticbugs1y ago

electriclove1y ago

Some can pirate on a large scale and see no repercussions.

Some can steal from stores and see no repercussions.

Some can steal from others and see no repercussions.

Some can violently harm others and see no repercussions.

Some can damage property and see no repercussions.

Some can’t. This world is not right.

1 more reply

1vuio0pswjnm71y ago

"Spotify's music library was also pirated in the early days."

https://qz.com/1683609/how-the-music-industry-shifted-from-n...

sylario1y ago

And Hollywood was created on the west coast because for intellectual property it was still the far west and it allowed them to ignore patents on movie technologies.

1 more reply

cess111y ago

BrenBarn1y ago

1 more reply

wcfrobert1y ago

2 more replies

vkou1y ago

> Google itself got big by indexing other people's data without compensation.

So in other words, it got big by providing free user traffic to people's websites without asking for compensation?

You generally don't charge the phone book money to include you in it. It's actually the other way around.

sandeepkd1y ago

yurlungur1y ago

1 more reply

dcchambers1y ago

I guess the solution is to create a shell company for your illegal activities?

georgemcbay1y ago

1 more reply

Cumpiler691y ago

You must be new to billionaire business practices: break the rules first, ask for forgiveness later.

By the time the cheque comes, your illicit venture has either gone bust or you've built a billion dollar empire capable of buying the best lawyers and lobbying to walk away clean.

sneak1y ago

> If you plug a laptop into a closet at MIT to download some scientific papers you forfeit your life.

I’m opposed to copyright and pro-aaronsw, but the state did not kill him.

1 more reply

modzu1y ago

i know of a company that poisoned an entire town! thats terrorism if done by an individual. the company still exists, just paid a settlement and carried on...

3 more replies

pbh1011y ago

> Google itself got big by indexing other people's data without compensation

Weird framing given how much value was and is still placed on Google driving traffic to you

mrkeen1y ago

For Google's case the order was reversed.

Google used to send customers to your site. Now they try to show you the information on their site so that the customer doesn't need to go to your site.

1 more reply

joshstrange1y ago

Even before the LLM-craze Google was showing their Answers box or whatever it was called at the top of the results that told you the answer (sometimes) so that you didn’t have to visit any website.

1 more reply

newsclues1y ago

Comprehensive intellectual property needs to happen for the modern (digital) era.

Basically the entire legal system needs to be retooled and rethought for computers.

actionfromafar1y ago

Looks like the entire legal system is being retooled at the moment.

threeseed1y ago

No we just need to enforce the existing laws.

And the legal system is for humans not computers.

2 more replies

yard20101y ago

RIP Aaron Swartz

soheil1y ago

So be a company? Last I checked it costs a couple of hundred dollars to form an LLC, what am I missing?

cyanydeez1y ago

Mmm, the broader point is: laws are as real as the cash you can pay a lawyer to fight.

smugma1y ago

Spotify was born as a response to piracy. Why do you say their catalog was pirated?

mrtesthah1y ago

Don’t forget the original developers of Skype also created Kazaa first.

djmips1y ago

Doesn't Google have their own internal scanning of books?

ctrlp1y ago

The sooner people learn this lesson, the sooner it might change.

chanux1y ago

Corporations are people. Just a notch above the regular kind.

Izikiel431y ago

So, might makes right, a tale as old as humanity

whatever11y ago

How does that prosecutor sleep at night?

observationist1y ago

You can't have it both ways. "But they didn't know" or other post-hoc claims of innocent people publishing content to the web being misled or confused or abused is infantilizing nonsense.

The web wouldn't have been as amazing and revolutionary and liberating if the fundamental public and open nature of its systems was private and walled off by default.

Other people corrected your Spotify take.

Screw those people. They lie, cheat, and steal, and somehow have gotten away with fooling the world into thinking they're the good guys.

larodi1y ago

The most outrageous thing about the whole story is that smart people (here and elsewhere) have known all this since day one. They've been uncovering it the whole time.

We have a saying where I come from: no mercy for the chicken, no laws for the millions. I thought it was a local thing at first, but it turns out that's how the world goes. Nothing new under the sun, indeed.

1 more reply

nostrademons1y ago

A bigger lesson might be "don't get caught until you're big enough to destroy the people suing you."

Likewise, individuals like Donald Trump or Elon Musk get away with all sorts of illegal shit, because they are big enough to shut down the court systems prosecuting them.

illegalmemory1y ago

" If you plug a laptop into a closet at MIT to download some scientific papers you forfeit your life."

This is exactly what I immediately thought while reading the article. It almost feels like the legal system only punishes general public, while most of these guys are above it.

rchaud1y ago

6 more replies

veggieroll1y ago

Wilhoit’s law:

> There must be in-groups whom the law protects but does not bind, alongside out-groups whom the law binds but does not protect.

1 more reply

rahton1y ago

The legal system is built to favor large corps and capital owners. See Katharina Pistor books for instance.

1 more reply

jamesbfb1y ago

RIP Aaron

censorfree1y ago

>This is exactly what I immediately thought while reading the article. It almost feels like the legal system only punishes general public, while most of these guys are above it.

Welcome to the modern day aristocracy. It's not only what you mentioned: this world is also divided into a group of insiders who can get capital at 0-2%, while the rest of us pay 17%, 22%, or 30%.

isaacremuant1y ago

It doesn't "seem". The entire system in most countries works, by design, that way because the people in power trade in influence at a different plane.

1 more reply

G_o_D1y ago

Money speaks ! Money buys !

threeseed1y ago

> Google itself got big by indexing other people's data without compensation

Wrong.

b) The difference here is that OpenAI, Meta etc have not even tried to honour the wishes of copyright holders. They just considered everything as theirs.

c) Google grew big because it had no ads, fast interface and PageRank was significantly better. It wasn't because it had the most comprehensive index.

karamanolev1y ago

> Web site owners chose to make it available to Google.

Strong disagree. Since robots.txt is optional and the default is "crawl me as you please", website owners don't "choose to make it available", they just don't choose to make it non-available.

1 more reply

RALaBarge1y ago

To your first point, the op said without compensation, not without permission.

tobyhinloopen1y ago

a) If you don't have a robots.txt, you're indexed by default. It's opt-out, not opt-in. If you do nothing, you're being indexed.
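A minimal sketch of the opt-out default described above, using Python's standard `urllib.robotparser`. The crawler name `ExampleBot` and the `example.com` URL are made up for illustration, and an empty rule list stands in for a site that publishes no rules; how any particular real crawler behaves is not something this shows.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_allowed(robots_lines, user_agent, url):
    """Return True if the given robots.txt lines permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Hypothetical page and crawler name, for illustration only.
PAGE = "https://example.com/article"
BOT = "ExampleBot"

# No rules published: nothing is disallowed, so crawling is permitted.
no_rules = is_crawl_allowed([], BOT, PAGE)

# The site owner explicitly opts out for every crawler.
opted_out = is_crawl_allowed(["User-agent: *", "Disallow: /"], BOT, PAGE)

print(no_rules, opted_out)
```

The parser only reports what the file asks for; honoring it is up to the crawler.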

1 more reply

veggieroll1y ago

Robots.txt is irrelevant after hiQ Labs v. LinkedIn (2019)

fredgrott1y ago

point c is wrong...they had ads since the original yahoo contract....

1 more reply

boesboes1y ago

Wrong. Google ignores robots.txt entirely

1 more reply

yoavm1y ago· 15 in thread

We all like hating big corporations, especially Meta, and people seem to use this as an opportunity to advocate for punishing them. I think it's wiser to advocate for changing our IP laws.

_Algernon_1y ago

We're sick of the double standards.

https://en.wikipedia.org/wiki/Aaron_Swartz#United_States_v._...

https://en.wikipedia.org/wiki/Aaron_Swartz#Death

While Aaron Swartz was bullied to suicide, these corporations will walk free and make billions. I say give every tech CEO the Swartz treatment, then change the law.

4 more replies

palata1y ago

You're conflating different problems.

Big corporations are too big, they should just not exist. When you have corporations more powerful than the government of the biggest states, it's a bug, not a feature.

7 more replies

lrvick1y ago

I truly hope Meta has a serious security issue that burns their company to the ground.

That said, I want them to burn for the right reasons.

Downloading data that should be available to the public is not one of them.

lblume1y ago

Exactly. Everyone should have the right to have access to this.

1 more reply

yodsanklai1y ago

1 more reply

Ekaros1y ago

First punish them. Then change the laws.

DaSHacka1y ago

I bet you and my "first build the product, then worry about security" manager would get along.

2 more replies

blueboo1y ago

aprilthird20211y ago

I think most of the public is probably in favor of stronger IP laws now that big corps are threatening to make them jobless with IP-disrespecting AIs

rchaud1y ago

Something tells me stronger IP laws will be drafted by holders of that IP, with little if any regard to the potential for job losses for regular people from AI.

1 more reply

freeAgent1y ago

The point is about the hypocrisy and double-standards evinced by this behavior.

jillyboel1y ago

First we must prosecute Meta into committing suicide like was done to Aaron Swartz. After justice is served, we should change IP laws.

boesboes1y ago

They broke the law and should be punished for that. Whether the law should change is a separate discussion.

Also, change the law so this is legal for poor meta? smh..

miltonlost1y ago

Big corporations all like hating their consumers and the law. You love committing crimes, it seems.

DaSHacka1y ago

I fail to see how you arrived at GP being a hobbyist criminal based on their suggestion that IP laws need to be modernized.

peterbonney1y ago· 7 in thread

What we should have been doing all along is YOLO-ing everything. It's only illegal if you get caught. And if you get big enough before you get caught then the rules never have to apply to you anyway.

Suckers. All of us.

wrs1y ago

4 more replies

Barrin921y ago

>What we should have been doing all along is YOLO-ing everything

afandian1y ago

1 more reply

77pt771y ago

> It's only illegal if you get caught

Not quite. It's only illegal if you get caught and you are the wrong kind of person.

For the right kind of person not even a pat on the wrist.

clueless1y ago

yep, pretty much.

callc1y ago

This sort of mindset is devoid of morals and honor. Don’t fall into this mindset trap.

Like when Trump said he is “smart” for evading taxes during the presidential debates (IIRC the first ones, not recent ones).

2 more replies

hall0ween1y ago

<Tether's ears burning>

mik19981y ago· 7 in thread

greeniskool1y ago

adamsb61y ago

I wonder how much more libgen traffic can be attributed to the lawsuit.

When Metallica sued Napster, for many people the reaction was, "wait I can download music for free?"

luqtas1y ago

Libgen turns into a problem when you have a company developing generative AI with it, either giving money to GPU manufacturers or themselves with paid services (see OpenAI)

2 more replies

rafram1y ago

LibGen gives you access to a much smaller body of works than either of those. It’s a little more convenient. But the big difference is that it doesn’t compensate the author at all.

Just go to a real library.

intotheabyss1y ago

And what about the other billions of people on the planet that don't even have a library, let alone a doorstep to receive a first world delivery service.

Cyph0n1y ago

1. We are not talking about physical books.

2. DRM is built in to most purchased ebooks, which means you can’t consume the book on any device. “Illegal” tools exist to circumvent this.

3. Large ebook stores - like other digital stores - essentially lend you a copy of the book. So when they are forced to pull a book, they’ll pull your access too.

Of course, now that the big players have consumed/archived the entire book dump, they can go ahead and kill it to prevent others from doing the same thing.

mik19981y ago

No one sells scans of older books, which are often sparsely available in obscure (often private) libraries.

fimdomeio1y ago· 4 in thread

jeroenhd1y ago

I'm all for chopping up copyright law. But until we do so, companies like Meta need to be treated just like everyone else.

That means lawsuits, prison sentences, and millions in fines. And that's just the piracy part, there's also the lying/fraud part.

stefan_1y ago

fsflover1y ago

> crazy internet folks from back in the day

You mean Electronic Frontier Foundation? https://www.eff.org/issues/innovation

Workaccount21y ago

Probably the single biggest thing I learned growing up is that you can safely live by "Everyone is in it for themselves".

It's incredibly rare to find people who hold ideals that are detrimental to their own life.

Ekaros1y ago· 4 in thread

Considering prices for single work, this must be multi-billion dollar compensation.

ricardobeat1y ago

Beyond the absurdity of those amounts, the funny thing is that the authors wouldn’t ever see a dime of that money. Not in the music case, not in this one either. Fairness?

karel-3d1y ago

Meta argues that it's fair use, and that they just downloaded, and never seeded, all the torrents.

pinoy4201y ago

For creating a backup of library genesis. No. They should be awarded a philanthropic prize.

striking1y ago

There's evidence of them seeding back as little as possible. I'm not sure how that's "creating a backup".

postepowanieadm1y ago· 4 in thread

That's horrible! Magnet anyone?

addandsubtract1y ago

Anna's Archive: https://annas-archive.org

immibis1y ago

pinoy4201y ago

Library genesis

ykonstant1y ago

Weird shenanigans are happening in libgen at the moment; better go through Anna's Archive to look for the items you want, it will link you to the corresponding mirrors more reliably.

At least this has been the recent experience of a friend who used libgen and anna's archive to download legal, public domain works!

tremarley1y ago· 4 in thread

Ebooks are 1-2 MB each, max. 81.7 TB is a lot of books, roughly 42-85 million.
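
For anyone who wants to check that estimate, here is a minimal sketch of the arithmetic. It assumes the 1-2 MB average from the comment above; the function name and the choice between binary (1024-based) and SI (1000-based) units are illustrative assumptions, not something the article specifies.

```python
# Back-of-the-envelope estimate: how many ebooks fit in 81.7 TB?
# Assumption (taken from the comment above): an average ebook is 1-2 MB.

TOTAL_TB = 81.7


def book_count(total_tb: float, avg_book_mb: float, binary: bool = True) -> float:
    """Return how many books of avg_book_mb fit into total_tb."""
    # binary=True treats TB and MB as TiB and MiB (factor 1024 per step);
    # binary=False uses SI units (factor 1000 per step).
    step = 1024 if binary else 1000
    total_mb = total_tb * step * step  # TB -> GB -> MB
    return total_mb / avg_book_mb


for binary in (True, False):
    low = book_count(TOTAL_TB, 2.0, binary)   # larger books, fewer of them
    high = book_count(TOTAL_TB, 1.0, binary)  # smaller books, more of them
    label = "binary units" if binary else "SI units"
    print(f"{label}: about {low / 1e6:.0f} to {high / 1e6:.0f} million books")
```

The 42-85 million range matches the binary-unit reading; with SI units the same assumption gives a slightly lower range. Either way the result depends entirely on the assumed average file size.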

weberer1y ago

The article says they got datasets from Anna's Archive. It was most likely the scihub/libgen torrent which is 96.0 TB right now and contains 92,872,581 files. That's about 1 megabyte per file.

https://annas-archive.org/datasets
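
The per-file figure can be checked the same way. This is a rough sketch using only the two numbers quoted in the comment above (96.0 TB and 92,872,581 files); it assumes SI units and says nothing about how sizes are actually distributed across files.

```python
# Average file size implied by the torrent totals quoted above.
# Assumption: "TB" and "MB" are SI units (powers of 1000).

TOTAL_TB = 96.0
FILE_COUNT = 92_872_581


def avg_file_size_mb(total_tb: float, file_count: int) -> float:
    """Return the mean file size in MB for a collection of total_tb."""
    total_bytes = total_tb * 1000**4
    return total_bytes / file_count / 1000**2


size = avg_file_size_mb(TOTAL_TB, FILE_COUNT)
print(f"average file size: {size:.2f} MB")
```

This is only a mean, so it is compatible with a mix of many small papers and fewer large, illustrated books.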

thunkingdeep1y ago

I’ve got pirated books that are 70-80 MB, I think because of the illustrations. Guess it depends on the book.

mateus11y ago

I don’t think they’re using picture-heavy books for LLM training, no?

squigz1y ago

It could be anywhere from a few million to a hundred million

https://annas-archive.org/datasets

JW_000001y ago· 3 in thread

I don't understand why it's even a question that Meta trained their LLM on copyrighted material. They say so in their paper! Quoting from their LLaMA paper [Touvron et al., 2023]:

Following that reference:

> Books3 is a dataset of books derived from a copy of the contents of the Bibliotik private tracker made available by Shawn Presser (Presser, 2020).

(Presser, 2020) refers to https://twitter.com/theshawwn/status/1320282149329784833. (Which funnily refers to this DMCA policy: https://the-eye.eu/dmca.mp4)

Furthermore, they state they trained on GitHub, web pages, and ArXiv, which all contain copyrighted content.

[Touvron et al., 2023] https://arxiv.org/pdf/2302.13971

[Gao et al., 2020] https://arxiv.org/pdf/2101.00027

gameshot911OP1y ago

Critically, by torrenting they also directly distributed the copyrighted material itself. That is a standalone infringement separate from any argument about trained LLMs.

Workaccount21y ago

There are two different things when it comes to discussing training LLMs on "copyright" protected data, and I almost never see people differentiate.

unraveller1y ago

Trained on doesn't mean significant inclusion in the final state.

peterclary1y ago· 3 in thread

I strongly urge people to read Thomas Babington Macaulay's speeches on copyright, its aims, terms, and hazards. Very well reasoned and explained.

kshri241y ago

> Thomas Babington Macaulay

This chap will educate us on copyright?

No thanks!

bbor1y ago

I’m a huge IP hater and am sure that happens, but to be fair, letting copyright extend past death also increases the amount the author can sell it for in the first place.

golergka1y ago

> in most cases the remaining family had never held the copyright; the author had initially sold the reproduction rights to a publisher

nyoomboom1y ago· 3 in thread

Remembering Aaron Swartz in this moment

stingraycharles1y ago

Which was arguably more innocent — scientific papers.

piyuv1y ago

Meta is not “innocent”, and comparing this instance with Swartz is a huge offense to his legacy.

qup1y ago

Would Aaron have preferred us to download the material and train the AI?

RobotToaster1y ago· 3 in thread

Before I decide my opinion on this, I need to know their ratio.

adamsocrat1y ago

Article states: Meta also allegedly modified settings "so that the smallest amount of seeding possible could occur"

malfist1y ago

Big tech taking and not giving back, where have I heard this before?

MaKey1y ago

Damn leechers!

lrvick1y ago· 3 in thread

This should be legal. Copyright law does more harm than good.

The only ethical problem here is that only Meta sized companies can afford to pay the "damages" for such blatant law violations at worst, or the fees of their lawyers at best.

maronato1y ago

Copyright law does more harm than good to individuals who just want to learn and enjoy content without profiting from it.

Companies like Meta and OpenAI, however, should definitely have to pay to use the hard work of humans to train their AI.

pleeb1y ago

If an individual was the one torrenting almost 82 TB of copyrighted books, the damages they would have to pay would be in the trillions (mostly because of how broken the copyright law system is).

moffkalast1y ago

Havoc1y ago· 3 in thread

Really curious what the judges are going to do here.

Horse has functionally bolted on this already

I’m guessing a slap on the wrist, despite courts going pretty hard after individuals who torrented a couple of movies.

aprilthird20211y ago

Is there any other possible outcome than a fine? That too one which will not really affect Meta's overall earnings

Havoc1y ago

Ideally we have a conversation about how we as a society have ended up in a situation where we have a two-tier justice system.

But yes realistically slap on wrist is what is going to happen here.

empath751y ago

The reality of the situation is that the economic value and utility of AI is going to cause the laws to be restructured around them.

ksynwa1y ago· 3 in thread

A good chance for federal prosecutors to "send a message" as they did with Aaron Swartz, but I don't see things going that way.

acomjean1y ago

The rules have always seemed different for corporations regardless.

https://www.businessinsider.com/trump-settles-lawsuit-meta-m...

Nasrudith1y ago

courseofaction1y ago

Even after JSTOR declined to press charges in that case. Despicable. The US has dug the hole it's going down.

mnsu1y ago· 3 in thread

gorbachev1y ago

The minimum doesn't cover willful copyright infractions, for which the maximum penalty is $150K per work. That comes out to quite a different number.
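
To see how far apart the two figures land, here is a hedged sketch of the per-work arithmetic. The $150K willful maximum is the number quoted above; the $750 minimum is my understanding of the ordinary statutory floor under 17 U.S.C. § 504(c); the work count is purely hypothetical and is not taken from the article or the lawsuit.

```python
# Statutory damages scale linearly with the number of infringed works.
# All inputs below are illustrative; none is a finding from the case.

WILLFUL_MAX_PER_WORK = 150_000   # USD, figure quoted in the comment above
STATUTORY_MIN_PER_WORK = 750     # USD, assumed ordinary statutory minimum
HYPOTHETICAL_WORKS = 1_000_000   # hypothetical count, for illustration only


def statutory_damages(num_works: int, per_work_usd: int) -> int:
    """Return total damages in USD for num_works at a flat per-work rate."""
    return num_works * per_work_usd


low = statutory_damages(HYPOTHETICAL_WORKS, STATUTORY_MIN_PER_WORK)
high = statutory_damages(HYPOTHETICAL_WORKS, WILLFUL_MAX_PER_WORK)
print(f"at the minimum:     ${low:,}")
print(f"at the willful max: ${high:,}")
```

The ratio between the two is fixed at 200x regardless of the work count, which is why the choice of per-work rate matters more than the exact number of books.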

oersted1y ago

Nice calculation, that’s actually quite doable for them, they have already been paying similar fines for a while.

timeon1y ago

Prosecutors sought 50 years of imprisonment and $1 million in fines for Swartz.

Can you calculate how many years that would be for Mark and his people?

nprateem1y ago· 3 in thread

If you're an author with a book likely to have been hoovered up, I wonder what you'd get from the fb models if you asked "complete this in the style of [author] in [book]: [quite a long excerpt]"

If you get a direct quote then you're good with your claim, surely.

Nemo_bis1y ago

That's the NYT's case. Not necessarily very strong. https://www.techdirt.com/2024/03/05/openais-motion-to-dismis...

unraveller1y ago

The way it works counts if you bring prompting into it. It could easily have learned enough style chops of [author] from other sources to mimic/predict those stanzas from raw data points.

Whatever the ruling one thing is for sure, plagiarism is no longer the sincerest form of flattery. The human authors are out for AI blood on this.

aprilthird20211y ago

I believe that is part of this lawsuit pretty much

passwordoops1y ago· 3 in thread

Eye for an eye. Meta loses rights to 81.7 TB of IP. Transcribed into a text file

cma1y ago

Meta already does that to themselves every year or so, deleting all internal communications.

They've thrown away a huge amount of communication to source code commit reinforcement training data as a result. They do it to avoid emails making it into trials like this.

zaik1y ago

No large company will ever consider training a public LLM on all their internal communications.

yodsanklai1y ago

> Meta already does that to themselves every year or so, deleting all internal communications.

Aren't they obligated by law to keep all internal communication?

palata1y ago· 3 in thread

Good, we know it. Nothing will happen, because nothing happens to billionaires and their companies. Musk is proving it every day now.

jokethrowaway1y ago

This is why we need to abolish the government. If the government doesn't have any power, they can't do preferential treatment to their cronies.

Enough with laws for thee but not for me!

ArnoVW1y ago

I was having difficulty figuring out if this was parody or not. But I guess the username checks out.

palata1y ago

The problem is precisely that those billionaires are too powerful. If anything, we need to abolish the billionaires.

bmsleight_1y ago· 2 in thread

Could make interesting case law.

unification_fan1y ago

> Could make interesting case law.

Yeah, to perpetuate this system where only those who can afford lawyers get to benefit

echoangle1y ago

Since it’s case law, everyone would benefit from the precedent

wnevets1y ago· 2 in thread

My ISP will shut off my internet if it catches me torrenting copyrighted material, but if you're a massive corporation that steals TBs of data it's barely a blip in the news.

freeAgent1y ago

Wouldn't it be amazing if all of Meta's ISPs cut them off for torrenting? One can dream...

gkbrk1y ago

You should look into changing your ISP, or at least get a VPN.

bigmattystyles1y ago· 2 in thread

dragonwriter1y ago

> The question is, if they could and would have paid for each book, would it be ok to train the LLM on them?

> But legally, how does using a book to train a LLM differ from a teacher learning from a book and teaching its contents to their pupils.

CryptoBanker1y ago

An LLM is not a person. That is the legal difference...until we have Citizens United v2

perihelions1y ago· 2 in thread

Best way to "punish" Meta is to slash the Gordian knot and abolish copyright. Level the playing field, incrementally, for everyone else who isn't a trillion-dollar corporation.

miltonlost1y ago

Ridding copyright would level the playing field for individuals and companies????!!!! Getting rid of laws that protect the individual only will help the larger empowered businesses.

nkrisc1y ago

What's the alternative to copyright then? Anything I create will be instantly reproduced and sold for less than I can afford to by some entity far larger and more efficient than me.

> Level the playing field, incrementally, for everyone else who isn't a trillion-dollar corporation.

There is no level playing field when you have individuals and trillion-dollar companies in the same market.

rvz1y ago· 2 in thread

Maybe you should go after the worst offender (OpenAI) first before going after Meta, since the latter already gave their model and the architecture away for free to everyone.

We will know why OpenAI isn't getting investigated.

hruzgar1y ago

So true. It seems like there is a controlled operation to shut open models down starting with Meta. Obviously they can't go after deepseek atm

unraveller1y ago

Could be why OpenAI paid them so much, to go after their open-source competition hardest of all.

abigail951y ago· 2 in thread

This reminds me of Peter Sunde's "Kopimashin"

https://www.engadget.com/2015-12-21-peter-sunde-kopimashin.h...

It's obviously absurd to enforce copyright as bytes are copied around instead of as it is used. Training an LLM is a different thing than re-hosting and giving away copies to other people.

If you don't want people to transform your works - keep them private. You don't own ideas.

golly_ned1y ago

As the article says, Meta /was/ giving away copies to other people by seeding the libgen torrents. This isn't the usual case of "should companies be allowed to train on books".

henriquemaia1y ago

Thanks for the link. I wondered what that word meant.

From the article: Kopimashin, as in Copy Machine.

iimaginary1y ago· 2 in thread

We need better laws that would create a better way to do this legally whilst compensating rights holders.

miltonlost1y ago

We need a better justice system that enforces the laws we have on the books, which would help compensate rights owners when big companies in emails pirate terabytes of data.

SketchySeaBeast1y ago

gameshot911OP1y ago· 1 in thread

Beyond illegal downloading and distribution of copyrighted content, the article also describes how Meta staff seemingly lied about it in depositions (including, potentially, Mark Zuckerberg himself).

malfist1y ago

Huh, a big tech CEO lied to us?

Flippant response I know, but too many people worship at the altar of the job creator and believe these folks are moral, upstanding citizens

zackmorris1y ago· 1 in thread

Is there a concept in the legal system of first-come-first-served that could be used as precedent?

What I mean is: when someone is prosecuted for copyright infringement, but Meta isn't, then could the case be put on hold until Meta is found guilty and pays a fine?

It reminds me of how mainstream drug addicts get convicted and spend years in prison, while celebrities get off with a warning or monetary fine.

hnfong1y ago

Lawyers (and hence, judges) are really good at arguing why the earlier case does not apply in a present case, even if most reasonable people would think the two cases are essentially the same.

It's a fundamental part of lawyer training, and if they want to let BigCorp go and bring the hammer down on the little guy, they can make up a hundred reasons for it.

liendolucas1y ago· 1 in thread

For some mysterious reason I can't see Zuckerberg in front of a judge facing 50 years imprisonment. Can anyone?

I truly hope that whoever takes the case goes after Meta with 1000 times the pressure that was put on Swartz, but honestly I don't expect much, just as the top comment precisely expressed.

And if we are going to be fair, please let's also not forget about the other usual suspects, or does anyone think they are falling behind?

impossiblefork1y ago

There are other countries than the US though and if rightsholders wish to sue, lawsuits can happen there too.

woadwarrior011y ago· 1 in thread

I wonder what happened to the related OpenAI training GPT3 on the books3 dataset story[1] from ~2 years ago?

[1]: https://www.wired.com/story/battle-over-books3/

gundmc1y ago

I think this one is different because the legality of training on copyrighted material is an open legal question while distributing/seeding copyrighted material is decidedly illegal.

openplatypus1y ago· 1 in thread

Something tells me uncle Donald will exonerate his new favourite lapdog from any criminal or civil liability.

Terr_1y ago

IANAL but the pardon power (A) only extends to criminal punishments, not civil liabilities and (B) copyright lawsuits can be launched by anybody, not just the Department of Justice.

So, barring further Might Makes Right shit--which I'm not willing to fully rule out--Trump can't fully shield Zuckerberg et al.

sva_1y ago· 1 in thread

I'm pretty sure you can theoretically download torrents without seeding, although this is frowned upon. If they really seeded (with full bandwidth?) that's indeed pretty brazen.

voidUpdate1y ago

9999000009991y ago· 1 in thread

"Say they hood robin, ain't that a b*, take from the poor and give to the rich."

- Ice Cube.

I think copyrights should be limited to 25 years after first publication. This would fix plenty of issues and give the AIs of the world plenty to learn from.

Who am I kidding, Meta will take what they will. For that author making 20k a year, be honored to be of use to Meta.

bwfan1231y ago

can people vote with their feet, and leave the platform ?

but the masses are addicted to the slop that meta feeds them.

seydor1y ago· 1 in thread

1) the concept of copyright is as old as the word suggests (copies are the least of our worries going forward - it should be possible to define processes for exploitation of ideas in a fair way)

2) we allow humans to learn from other people's ideas and transform them to commercial products and the same should happen for AIs in the future

thfuran1y ago

z71y ago· 1 in thread

KolmogorovComp1y ago

There’s nothing particular to AI about your comment, it’s a general downside of IP.

StefanBatory1y ago· 1 in thread

As an individual, I would be liable to pay ~$1000 in damages if I downloaded a movie in Germany or Poland and the publisher got to me.

I'm going to assume as it's a corporation, then the laws no longer apply.

Anamon1y ago

That's okay, they should just charge The Zuck with it personally; I'd be fine with that.

ezekiel681y ago· 1 in thread

Unless Meta 'fessed up to this (which seems unlikely), the headline here is missing the word "allegedly".

papercrane1y ago

Meta admitted to the torrenting more than a month ago. The reason this is in the news is because some of the emails discussing it have been unsealed.

panki271y ago

They could have at the very least seeded some more, to give something back to the, uh, community.

belter1y ago

They will be getting a lot of Frommer Legal letters...

HPsquared1y ago

If you owe the bank $1,000 it's your problem; if you owe the bank $1,000,000,000 it's the bank's problem.

651y ago

I'm more interested in piracy not being highly prosecuted than I am in Meta getting punished for this. I'm not trying to spend 20 years in jail for pirating a TV show.

fsflover1y ago

Support EFF if you think that the copyright laws should be changed and also applied equally to all: https://www.eff.org/issues/innovation

jokethrowaway1y ago

Great, can we get the full Kim Dotcom treatment for Zuckerberg now?

I'm also ok with abolishing copyright altogether if he's too untouchable

kelseyfrog1y ago

The usual copyright cartel is up in arms, crying theft. But here’s the truth: intellectual property is a state-enforced monopoly, not real property.

caterwhal1y ago

Really strange how much torrenting is demonized by all of these companies and ISPs when individuals want to use it but when a company like Meta uses it there is so little scrutiny.

ofou1y ago

Who would have known that BitTorrent, shadow libraries, and seeders would help train the best AI models out there? That adds a whole new meaning to a "seed".

gorbachev1y ago

Previous: https://news.ycombinator.com/item?id=42673628

aucisson_masque1y ago

You wouldn't download a car.

nickpsecurity1y ago

nullfield1y ago

I think everyone can see that whatever

(imo not in accordance with the Constitution, after absurdities like deciding “limited time” the way mathematicians might define something of some order of infinity)

the alleged social contract was is not functional the way it was intended, and we see who benefits and who loses.

mass dynamic editing for vitriol and profanity occurred while writing this comment in order to remain within site rules

stevage1y ago

Wow, I'm actually a bit shocked that senior levels of management at Meta were fine with torrenting pirated books. WTaF.

Meta does a lot of stuff I disagree with, but they're usually not just straight breaking the law.

scotty791y ago

Seeding it was probably the most societally useful thing Meta ever did.

yalogin1y ago

pjfin1231y ago

ngneer1y ago

If I were younger, I would be livid.

toss11y ago

[0] https://en.wikipedia.org/wiki/Vigorish

[1] https://www.politico.com/news/2025/01/29/meta-settles-trump-...

[2] https://www.bbc.com/news/articles/c8j9e1x9z2xo

buyucu1y ago

I love this. Large corpos should torrent more. Maybe we'll get better copyright law as a result.

thunder-blue-31y ago

asjir1y ago

kpgraham1y ago

srameshc1y ago

api1y ago

Meta, with its "open weights" models, is one of the least guilty parties, since at least they've made the resulting blobs of mass piracy available to us. Same with Mistral, Deepseek, etc.

ClosedAI, Google, and others have all probably done this and more and refuse to make even the model available.

I think the way to deal with this is very simple:

Of course now that we have full capture of the US Federal Government I'm sure any suggestion like that would be neutralized with one bribe to Trump.

flojo1y ago

Did they at least seed back?

lvl1551y ago

I’d think people can get together to put this on a public space strictly for training purposes and have a consortium of some sort get paid per use.

But we live in this stupid society where you have to move mountains to change things an inch.

Der_Einzige1y ago

The only bad thing about this is that small time players who do it are treated poorly (Aaron Swartz). IP de-facto not existing for AI companies is a feature, not a bug.

So, unironically, good! Thank you, please pirate more! Please destroy the US IP system while you're at it. Copyright abolitionism is good and thank you Zuckerberg!

pilimi_anna1y ago

We're grateful to Meta for helping seed and backup our torrents. The more copies the better. Thank you Meta, for helping preserve humanity's legacy! :)

djyaz12001y ago

“Behind every great fortune lies a great crime” -Honoré de Balzac

antirez1y ago

Copy-right is not learn/train-right. That said, Meta fills its mouth with open source while releasing models that are neither SOTA nor usable for commercial purposes.

black_puppydog1y ago

maxwell1y ago

I'm sure they'll throw the book at them.

cratermoon1y ago

esarbe1y ago

It's okay - they are a multi-billion company. Rules don't apply to them.

Rules are just for us peasants.

dansitu1y ago

I'm fine with them using my books to train an open source model, but it would have been nice to be asked.

lewdev1y ago

It's okay when large corporations download cars. But when you do it, you'll be in trouble.

breppp1y ago

Yes it smells bad but facebook did the right thing (at least for facebook)

After OpenAI trained their models on the famed books2 dataset, and seeing the technological implications of ChatGPT, there was a good chance they would let them get away with it.

Would the USA really surrender its AI technological advantage for trivial matters like copyright? They would make some royalty arrangement and get it over with

mrinterweb1y ago

ofslidingfeet1y ago

Yeah well, OpenAI compressed the whole internet into proprietary weights and is now providing access via paid subscription while the original internet gets deleted from our culture.

josefritzishere1y ago

Zuckerberg did more copyright infringement? Shocking!

losvedir1y ago

Hooray! Or wait, are we not doing that anymore?

waltercool1y ago

Based. Free knowledge to the people

zelphirkalt1y ago

bloopbloopscoop1y ago

Death to intellectual property!

Refusing231y ago

their whole business is stealing data..

so its quite funny to see they freely share it too.

ocean_moist1y ago

At least they seeded!

snapcaster1y ago

The powerful do what they can, the weak suffer what they must

jfbaro1y ago

They are getting shittier and shittier

reverendsteveii1y ago

uncomplexity_1y ago

did they not seed enough, is that the crime? lol

Pxtl1y ago

Laws are for poor people.

TZubiri1y ago

I love it. This plotline feels out of cryptonomicon or silicon valley series.

hackerbeat1y ago

One of the many reasons why Zuck’s been sucking up to Trump. He’s in desperate need of some Get-Out-Of-Jail-Free cards.

Same for all the other sleazy tech bros.

lazycog5121y ago

abolish knowledge rentiers

imgabe1y ago

Boo hoo.

swozey1y ago

I deleted my facebook account about 10 years ago. Downloaded data, deleted. Not deactivated.

Nothing in my life ever made me want to go back, except when I got back into playing hockey a few months ago and found that all the hockey leagues use facebook to communicate.

Fuck meta. Fuck zuck.

1970-01-011y ago

elzbardico1y ago

Nothing is gonna happen. Just a slap on the hand. And all of us from the intellectual working class, writers, journalists, programmers, will be proletarized by LLMs that have been:

The Robber Barons from the last century can't even get close to our modern Feudal Tech Lords.

Unless you're one of us that have amassed multi-generation wealth in an exit in the last 20 years, you're completely fucked.
