Companies aggressively protect their own intellectual property but have no qualms about violating the IP rights of others. Companies. Individuals have no such privilege. If you plug a laptop into a closet at MIT to download some scientific papers you forfeit your life.
All the sad poor people who might be hurt were already paid. The caterer on your favorite show is not getting residuals. NBC also isn't going to stop making TV shows because that is all they can do. Content creators also existed on the internet long before that was a job. They just did it because they cared about it not for ad money. If you really want to support the artist directly go to a concert or just mail them a check. If you can't actually identify a person who might be hurt, then do not care.
Oh no, that TV show I'll forget about in a year cost me $15/mo instead of $60 of blurays.
I jump in my car and hit a button and music plays. Almost any music I want. That's amazing.
I'm also not pirating games. I'm not 12 without a job. I have a job. I pay developers for their work. I want more games, like Kingdom Come 3, to come out.
Weird ass comment. You seriously think we're going to put our lives on hold to.. what, fight "digital media"? You think I care about Netflix? Or society's use of it? I haven't used Netflix in years. I don't know anybody under 40 with a Netflix account. Everyone on your end of the pirate spectrum uses debrid nowadays, anyway.
Next you're going to tell people to install the "Black XP Windows" edition to not support Microsoft and they all get malware and their credit cards stolen because they installed some pirated and modified cracked windows. Genius.
MSNBC just cancelled Andrea Mitchell's TV show, today, because she brought in no younger audiences. So yes, shows do get cancelled by not being watched.
This comment was upvoted? HN needs a break. This is some "I'm 14 and edgy" bullshit that sounds like it belongs on an Eastern European piracy forum.
I'm pretty much at the point now where I don't buy the "copyright incentivizes creation" argument any more. Copyright, like advertising, incentivizes creation by enormous corporations, but also like advertising it incentivizes creations that overwhelmingly have little value.
Creative individuals don't need copyright to be incentivized to create—they need a safety net that gives them the freedom to spend time on the creativity that naturally wants to bubble out. If the goal is to encourage creativity, copyright is a lousy and enormously expensive substitute for Universal Basic Income.
This is what Meta tried to do, quietly download and use the data, to do research and advance their LLMs, without trying to establish any legal precedents or pick up fights.
In case anybody here doesn't know, that's a reference to Aaron Swartz, an activist (and Reddit co-founder) who was facing up to 35 years in prison and a $1 million fine just for downloading a lot of academic papers from JSTOR. He eventually took his own life because of the pressure. May his soul rest in peace.
End users, not YouTube employees, right? And they would take things down following DMCA requests and what not, right? So, pretty much following the law?
> Google itself got big by indexing other people's data without compensation
Scraping public websites to build a search index isn't the same as making LLMs that can recreate the source verbatim devoid of even attribution. I do agree there's an argument to be had about the LLM's transformative nature in the end though.
> Spotify's music library was also pirated in the early days
Not any version generally available to the public, and with the copyright holder's permission to do so.
the americans cheated their way to competition,
heck, even before that, the english empire got jumpstarted by stealing gold from the spanish (who were themselves exploiting it away from aztec and other mexican natives)
I'm saying it's business as usual, but also, culture doesn't work like tangible physical widgets so we must stop letting a few steal this boon of digital copying by means of silly ideas like DRM, copyright, patents. all means to cause scarcity
"Brno’s fortunes were changed forever when a young freemason called Franz Hugo Salma set out for England in 1801. He intended to steal the plans for the most modern textile machinery in the world. His crime, the first recorded act of industrial espionage, boosted the competitiveness of Moravian textiles. Soon after smuggling the plans out disguised as a worker, and handing them over to Brno’s fledgling textile industry, Brno became the most important textile centre in the Habsburg empire."
You can even go see some of the original plans in a museum:
"Eleven designs are still preserved in the library of the Rájec chateau. They form a unique set of documents demonstrating both the level of wool processing technology at the turn of the late 18th and early 19th centuries, as well as the aims and means of the relatively rare business of industrial espionage at that time."
https://www.gotobrno.cz/en/brno-phenomenon/this-is-brno-kate... https://www.gotobrno.cz/en/place/salm-reifferscheidt-palace/
Cain killed Abel and got away with it!! I can kill someone today too!!!
The issue here is not copyright/patents/etc - the issue is that the law is applied selectively - the issue is that Aaron Swartz is dead for sharing knowledge with the public and Zuccborg is a billionaire building his torment nexus
The copyright holders then approved their concept, and subsequently Spotify got the rights to offer their service to customers. Everybody won.
I want to know more, please enlighten me (anyone who knows). I read the book "The Spotify Play" and it made it seem like the pirated music was an internal-only thing and not something available to customers. Is that true?
So while it was using pirated media, it was sanctioned by the rights holders for the experiment of building Spotify.
Another interesting note, in the early days of spotify, the app would saturate your upload bandwidth while using it. Given their close ties to utorrent, I always assumed that's how they were affording the bandwidth as well.
Pretty brilliant way to bootstrap I guess; they didn't have to pay for bandwidth or content until they already had contracts in place
Just to point out, the material in question was public domain, so nobody even had a copyright claim over it.
It's true, and relevant, that Google would feel those consequences much less sharply than Swartz did.
I've spoken to several very wealthy/powerful people and tried to get them to negotiate a large-scale content license with the various publishers that would allow researchers and individuals to access more research in lower-friction ways. None of them (NIH, Schmidt, etc) were really interested.
Apparently he would have gotten away with downloading the JSTOR database if he made it clear that he intended to only publish half of each paper.
The limit is what you can actually get away with, not what the rules say you can get away with, and the system aggressively selects players who recognize this. It's amoral - there is no "ought", only "is". An actor gets punished or not, with absolutely no regard to whether it "should" get punished. One thing is consistent: following the rules as written means you lose.
You can see it in Y Combinator (and other) startups. The biggest ex-startups are things like AirBNB (hotels but we don't follow the rules but we don't get punished for not following them) and Uber (taxis but we don't follow the rules but we don't get punished for not following them).
One way to not get punished for not following the rules is to invent a variation of the game where the rules haven't been written yet. I again refer you to AirBNB and Uber; Omegle also comes to mind, although they didn't monetize.
Viewed in this light, Aaron Swartz's mistake was not the part where he downloaded journal articles, but the part where he got caught downloading journal articles. Shadow library sites are doing the same thing, minus the getting caught. So are Meta and Google and OpenAI. sci-hub is only involved in a lawsuit because it got caught and is now in the stage where it finds out whether it gets punished or not.
Turns out there are 2 simultaneous wars there. One where companies and individuals compete ruthlessly.
And another one where if non profit associations of individuals form, guns come out.
MegaUpload did the same, and Kim Dotcom got raided in his sleep by the FBI in New Zealand! So no, I don't buy your reductionist argument; there are forces at play that allow companies with founders like Google's to get away with it but not others.
To this day, there are a huge number of videos that show copyrighted content on YouTube; they are usually crappy clips, reversed and with different music playing in the background to avoid automated detection.
I don't understand why you wouldn't just buy copies of the books. Seems like such a relatively inexpensive way to strengthen your legal case.
Or so they think, I think.
Some can steal from stores and see no repercussions.
Some can steal from others and see no repercussions.
Some can violently harm others and see no repercussions.
Some can damage property and see no repercussions.
Some can’t. This world is not right.
"Ek, who had been the CEO of the piracy platform uTorrent, founded Spotify with his friend, another entrepreneur named Martin Lorentzon. Both - Ek at 23 and Lorentzon 37 - were already millionaires from the sales of previous businesses. The name Spotify had no particular meaning, and was not associated with music. According to Spotify Teardown, the company developed a software for improved peer-to-peer network sharing, and the founders spoke of it as a general "media distribution platform." The initial choice to focus on music, the founders said at the time, was because audio files are smaller than video files, not because of a dream of saving music.
In 2007, when Spotify first publicly tested its software, it allowed users to stream songs downloaded from The Pirate Bay, a service for unlicensed downloads. By late 2008, Spotify would convince music labels in Sweden to license music to the site, and unlicensed music was removed. From there, Spotify would take off across Europe and then the world."
https://qz.com/1683609/how-the-music-industry-shifted-from-n...
I think more people, potentially anyway, would feel similarly about this if it applied even somewhat equally.
Instead, companies can seemingly do whatever they please whereas lawyers will send letters to your home for downloading a single episode of game of thrones.
So in other words, it got big by providing free user traffic to people's websites without asking for compensation?
You generally don't charge the phone book money to include you in it. It's actually the other way around.
IMO part of the reason the SV tech bros are embracing right wing grift culture so publicly now is that this method, which had been serving them well for decades, doesn't really work without the infinite free money lending spigot being wide open.
By the time the cheque comes, your illicit venture either went bust or you built a billion dollar empire capable of buying the best lawyers and lobbying to walk away clean.
I’m opposed to copyright and pro-aaronsw, but the state did not kill him.
https://en.wikipedia.org/wiki/Asbest
https://www.youtube.com/watch?v=cy3piCUPIkc - VICE documentary and visit video. I think it contains an interview with an American woman who suffered from WR Grace and Company's asbestos mining and manufacturing in the USA; she says "they knew, they knew". WRG faced 129,000 personal injury claims and set aside $3 bn for settling asbestos-related lawsuits.
Weird framing given how much value was and is still placed on Google driving traffic to you
Google used to send customers to your site. Now they try to show you the information on their site so that the customer doesn't need to go to your site.
Basically the entire legal system needs to be retooled and rethought for computers.
And the legal system is for humans not computers.
That's how the internet works. If you want private content, you need to put up a gate mechanism of some sort with authentication or other methods of restricting access. Without that, you are literally having your server "serve" the content to whoever asks for it, without restriction or exception, without ToS or meaningful contract or agreements.
You can't have it both ways. "But they didn't know" or other post-hoc claims of innocent people publishing content to the web being misled or confused or abused is infantilizing nonsense.
The web wouldn't have been as amazing and revolutionary and liberating if the fundamental public and open nature of its systems was private and walled off by default.
Your take on YouTube going viral initially over copyrighted content isn't correct, either - it was ease of use and access. It was fairly popular by the time Google bought it, and once it was reachable and advertised by Google itself, it exploded, because by that time, everyone had defaulted to using Google for search.
Other people corrected your Spotify take.
The reason they pirated is because it is functionally impossible to gain access to the data in any other way. For consumers, there are lots of old shows, music, and other content that aren't accessible, so they turn to piracy. A vast majority of the time, if content is accessible, people will pay and do the technically legal and "right" thing.
Publishers exploit authors and content creators in the name of "platforming" and "marketing" , effectively doing as little as possible to take 90%+ of the value of a product and providing as little as possible to the producer of content or books or music. They get by on technicalities and have captured the legal arena entirely, with any attempt at reform or revolution meeting a messy death at the hands of lawyers and big money publishers.
Screw those people. They lie, cheat, and steal, and somehow have gotten away with fooling the world into thinking they're the good guys.
Copying bits and bytes is not stealing, and the ones trying to shill that narrative are trying to fool as many people as possible into giving them more money without any return of value in kind. I'd download the hell out of a car. Pirate everything.
And in their face, with all the fierce ignorance, broligarchs deny, evade and totally pretend this never happened. The most non open company of all even went to lengths to accuse others of stealing their IP - not theirs to begin with.
Just think of it - why did all major content platforms close their APIs the day after GPT-2 got the word going…? Cause they knew all this very well - the content is precious and needed. They've been doing it all along. Distilling the essence of the world's writing and digital imagery they had no right to.
We have a saying where I come from - no mercy for the chicken, no laws for the millions. I thought it was a local thing at first, but it turns out that's how the world goes. Nothing new under the sun, indeed.
Napster got shut down for widespread enabling of copyright infringement. So did numerous other filesharing startups, including Travis Kalanick's first startup, Scour. Lots of small startups get put out of business all the time for being sued and not having the money to defend themselves.
Likewise, individuals like Donald Trump or Elon Musk get away with all sorts of illegal shit, because they are big enough to shut down the court systems prosecuting them.
Google's genius was in staying under the radar and aligning their incentives with everyone that might dislike them, until they were big enough that they could simply crush anyone that might dislike them.
This is exactly what I immediately thought while reading the article. It almost feels like the legal system only punishes the general public, while most of these guys are above it.
> There must be in-groups whom the law protects but does not bind, alongside out-groups whom the law binds but does not protect.
If you do something wrong as "part of your job" then you're typically not held responsible and accountable but the company is (the exceptions being spectacular fraud: Enron, VW diesel).
It's not hard to see how this can go off the rails.
It’s because the legal system is not about justice, it’s about money
Most people can’t afford lawyers or expensive legal battles
On the other hand, individuals and organizations with a lot of money get to weaponize and exploit the legal system to their advantage
“To my friends, anything; to my enemies, the law”
In more general terms, the legal system punishes what can be made a profit or an example when punishing.
Also, I don't think the legal system itself wants to get too much into "big institutions against the work of others", save for the fictional TV representations of smart lawyers and clever arguments, 99.9% of the legal system output is copy/paste.
I think Aaron Swartz went to Harvard, not MIT
Welcome to the modern day aristocracy. Beyond what you mentioned, this world is also divided into a group of insiders who can get capital at 0 - 2%, while the rest of us pay 17%, 22% or 30%.
That's why democracy often feels "failed" in that no change can be achieved because "it's just more of the same". Few Lobbyists representing the interests of a few people have more power than millions voting differently.
"This problem will be solved in the favor of the (party) which has the most money to throw into the problem" (paraphrase mine).
So, yeah.
People often elevate deeply flawed figures to heroic status when those figures seem to challenge authority or "the system." This happens especially with individuals who present themselves as outsiders fighting the establishment, have a compelling personal struggle narrative, or voice grievances that resonate with public frustrations
Trump fits this pattern - his supporters overlook concerning behaviors and statements because they see him as fighting a system they distrust. Like Manning and Swartz, his mental state and fitness are often ignored in favor of the "hero against the system" narrative.
This dynamic creates a feedback loop where legitimate criticism becomes harder to discuss rationally.
For some reason, whenever you're a billionaire or company, things suddenly get so difficult that you can claim that it's impossible to be held accountable for anything. Murder, insider trading, laundering, treason, etc.
OpenAI complained about this, as did Google and everyone else. If your company can't exist without stealing data, then it's not a viable company. Companies don't have a constitutional right to exist.
Wrong.
a) Robots.txt which defines what content you wish to make available to third parties predates every search engine including Google. Web site owners chose to make it available to Google and search engines have respected their wishes despite it not being in their best interest.
b) The difference here is that OpenAI, Meta etc have not even tried to honour the wishes of copyright holders. They just considered everything as theirs.
c) Google grew big because it had no ads, fast interface and PageRank was significantly better. It wasn't because it had the most comprehensive index.
Strong disagree. Since robots.txt is optional and the default is "crawl me as you please", website owners don't "choose to make it available", they just don't choose to make it non-available.
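The "optional, crawl by default" point can be checked with Python's standard-library robots.txt parser. This is only a minimal sketch: the bot name `ExampleBot`, the URL, and the helper `can_crawl` are made up for illustration, and the parser just reports what the file says; nothing forces a crawler to ask it.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Return what a well-behaved crawler would conclude from this robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical inputs, for illustration only.
NO_RULES = ""                               # site owner never wrote any rules
OPT_OUT = "User-agent: *\nDisallow: /\n"    # site owner explicitly opts out
URL = "https://example.com/poem.html"

default_allowed = can_crawl(NO_RULES, "ExampleBot", URL)
opt_out_allowed = can_crawl(OPT_OUT, "ExampleBot", URL)

print("no rules     ->", default_allowed)
print("Disallow: /  ->", opt_out_allowed)
```

With no rules the answer is "allowed", so a site owner who does nothing is treated as having permitted crawling. Whether that counts as a choice is exactly what the two comments above disagree on.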
What we should have been doing all along is YOLO-ing everything. It's only illegal if you get caught. And if you get big enough before you get caught then the rules never have to apply to you anyway.
Suckers. All of us.
No it isn't. The actual sucker attitude is copying what they do. You should act morally and with integrity out of respect for yourself. I never had any illusions that large tech companies act with respect towards the law, but it also has nothing to do with me.
Not quite. It's only illegal if you get caught and you are the wrong kind of person.
For the right kind of person not even a pat on the wrist.
Like when Trump said he is “smart” for evading taxes during the presidential debates (IIRC the first ones, not recent ones).
It’s absolutely despicable. Have a moral compass. Treat people fairly. Be nice. Let’s be better than toddlers who haven’t learned yet that hitting is bad, and you shouldn’t do it even if mommy and daddy aren’t in the room.
> We include two book corpora in our training dataset: the Gutenberg Project, [...], and the Books3 section of ThePile (Gao et al., 2020), a publicly available dataset for training large language models.
Following that reference:
> Books3 is a dataset of books derived from a copy of the contents of the Bibliotik private tracker made available by Shawn Presser (Presser, 2020).
(Presser, 2020) refers to https://twitter.com/theshawwn/status/1320282149329784833. (Which funnily refers to this DMCA policy: https://the-eye.eu/dmca.mp4)
Furthermore, they state they trained on GitHub, web pages, and ArXiv, which all contain copyrighted content.
Surely the question is: is it legal to train and/or use and/or distribute an AI model (or its weights, or its outputs) that is trained using copyrighted material. That it was trained on copyrighted material is certain.
[Touvron et al., 2023] https://arxiv.org/pdf/2302.13971
[Gao et al., 2020] https://arxiv.org/pdf/2101.00027
1.) Training on copyrighted work that is publicly available. You write a poem and publish it online for the world to read. That is your IP; no one else can take it and sell it, but they are free to read and be inspired by it. The legality of training on this is in the courts, but so far seems to be going in favor of LLMs.
2.) Training on copyright that is not publicly available. These are pretty much pirated works or works obtained by backdoor to avoid paying for them. Your poem is behind a paywall and you never got paid, yet the poem is known by the LLM. This is just straight illegal, as you legally must pay to view the work. However there might be conditions here too like paying for access to an archive and then training on everything in it.
Is it truly a violation of copyright when a user hacks out bits and pieces of easily restyled raw data points from a model to look samey? what about if it takes two models? Might be time to accept humans are just cooked in their ability to discern attempts at direct plagiarism - just as it is hard to discern Sky voice from Her voice.
In particular, people often cited the case of authors who had died leaving a family in destitution, and claimed that copyright extension would be a fair way of preventing this, but in most cases the remaining family had never held the copyright; the author had initially sold the reproduction rights to a publisher who had then sat on the work without publishing it. The author, driven into penury, was then induced to sell the copyright to the publisher outright for a pittance. So in such cases a copyright extension only benefited the publisher, and indeed increased their incentive to extort the copyright.
The one who got Hindu Sanskrit books translated in a horrible manner and then claimed: "I have no knowledge of either Sanskrit or Arabic. But I have done what I could to form a correct estimate of their value. I have read translations of the most celebrated Arabic and Sanskrit works. I have conversed both here and at home with men distinguished by their proficiency in the Eastern tongues. I am quite ready to take the Oriental learning at the valuation of the Orientalists themselves. I have never found one among them who could deny that a single shelf of a good European library was worth the whole native literature of India and Arabia."
This chap will educate us on copyright?
No thanks!
If you reject Macaulay on copyright because he was an imperialist, you can use the exact same logic to reject the arguments of essentially every person who ever lived. Very few humans who ever wrote anything important will perfectly align with your morality, and most will be horribly misaligned in at least one way.
Very nice of you to omit the following sentences of that excerpt, where he goes on to develop his argument for instituting an English-language education system in British India. He praised how superior in quantity and quality the Sanskrit and Arabic corpora were, compared to European works, in lyric poetry, but held that none of their technical or didactic literature matched even the most mundane European manuals, like those then used in England's humble schools (which seems completely plausible).
He was a fierce abolitionist. So much for accomplishing the mission of allegedly, judging by comments in this thread, 'deranged imperialist destruction and chaos imposition over the lesser ones'.
I'm not well versed in his speeches/stance on copyright, but I can vouch for the fact that the most honest and well-intended moves (not by him, by other figures) in defence of everyone's intellectual property were made in the same century. From the twentieth century onwards, it has only been twisted in the interest of a select few, and there is no need to ask where we are today in terms of caring about anybody's intellectual property.
[1] Just saw your other comment where you go on with his nauseating words. One just cannot comprehend that framing the past on the actual status quo is as futile as to not being even wrong, I guess?
> The one who got Hindu Sanskrit books translated in a horrible manner and then claimed: "I have no knowledge of either Sanskrit or Arabic. But
... Here's what they mean, from ChatGPT."
He was able to sell it because it is something valuable, exactly because of the copyright protections. Regardless of whether author sells the rights or not, he and his family would equally be better off with copyright.
copyright as written serves the interests of publishers who don't create valuable works more than the creators of the work...
When Metallica sued Napster, for many people the reaction was, "wait I can download music for free?"
Are AI-written books getting published?
If they start out-competing humans, is that bad? According to most naysayers, they can't do anything original.
Are people asking the AI for books? And then hoping it will spit out a human-written book word for word?
Personally, I strongly believe that the aesthetic skills of humanity are one of our most advanced faculties — we are nowhere close to replacing them with fully-automated output, AGI or no.
LibGen gives you access to a much smaller body of works than either of those. It’s a little more convenient. But the big difference is that it doesn’t compensate the author at all.
Just go to a real library.
2. DRM is built in to most purchased ebooks, which means you can’t consume the book on any device. “Illegal” tools exist to circumvent this.
3. Large ebook stores - like other digital stores - essentially lend you a copy of the book. So when they are forced to pull a book, they’ll pull your access too.
Of course, now that the big players have consumed/archived the entire book dump, they can go ahead and kill it to prevent others from doing the same thing.
> Just go to a real library.
The thrill of waiting a week for a book to arrive or navigating the labyrinthine interlibrary loan system is truly a privilege that many can afford. And who needs instant access to knowledge when you can have the pleasure of paying for shipping or commuting to a physical library?
It's also fascinating that you mention compensating authors, as if the current publishing model is a paragon of fairness and equity. I'm sure the authors are just thrilled to receive their meager royalties while the rest of the industry reaps the benefits.
LibGen, on the other hand, is a quaint little website that only offers access to a vast, sprawling library of texts, completely free of charge and accessible to anyone with an internet connection. I'm sure it's totally insignificant compared to the robust and equitable systems you mentioned.
Your suggestion to "just go to a real library" is also a brilliant solution, assuming that everyone has the luxury of living near a well-stocked library, having the time and resources to visit it, and not having any other obligations or responsibilities. I'm sure it's not at all a tone-deaf, out-of-touch recommendation.
https://en.wikipedia.org/wiki/Aaron_Swartz#United_States_v._...
https://en.wikipedia.org/wiki/Aaron_Swartz#Death
While Aaron Swartz was bullied to suicide, these corporations will walk free and make billions. I say give every tech CEO the Swartz treatment, then change the law.
MIT students will get away with breaking bigger rules than community college students will.
If he was acting rationally and came to the conclusion that dying was better than spending X years in jail, he would have committed suicide after sentencing, not before any trial had even happened.
Two wrongs don't make a right. If a law is unjust, then what good is there in continuing to punish people who have broken it, just because other people have been punished in the past?
Either you think the law is just or unjust. If you think it's unjust, I don't possibly see how you think people should be punished for it. Meta wasn't responsible for what happened to Aaron Swartz.
Big corporations are too big, they should just not exist. When you have corporations more powerful than the government of the biggest states, it's a bug, not a feature.
The IP laws may need rethinking. Saying that they should disappear because big corporations are above the law doesn't help, though. First kill the big corporations, then think about fair laws. Changing the law now would not change anything since those corporations are already above the law.
It's not possible to kill big corporations before fair laws, because as you said yourself "corporations are already above the law"
Unfair laws don't apply to big corporations, they only apply to the people opposed to big corporations
It's akin to hamstringing a horse and saying you'll fix it when they win
The only distinction between corporations and governments is one of them are morally bankrupt arbiters of force.
For instance, what if google was still just serving search results w/ ads, and they never expanded that. How would you make them smaller?
I don’t know how you define powerful, but I highly doubt it is at that point.
Nor should big governments.
Nor should big countries, for that matter.
That said, I want them to burn for the right reasons.
Downloading data that should be available to the public is not one of them.
Also, change the law so this is legal for poor meta? smh..
That means lawsuits, prison sentences, and millions in fines. And that's just the piracy part, there's also the lying/fraud part.
Interestingly, a Dutch LLM project was sent a cease and desist after the local copyright lobby caught wind of it being trained on a bunch of pirated eBooks. The case unfortunately wasn't fought out in court, because I would be very interested to see if this could make that copyright lobby take down ChatGPT and the other AI companies for doing the same.
So a copyright warning letter in the mail from their ISP? Maybe someone should tell them about VPNs...
- Seed the torrent and publicly promote piracy pushing lawmakers.
- Contribute with digitisation and open access like Google did in the past.
- Make the part of their dataset that was pirated publicly accessible.
- Fight stupid copyright laws. I can't believe that copyright lasts more than 20 years. No field moves that slowly, and there should be tighter limits on faster moving fields.
You mean Electronic Frontier Foundation? https://www.eff.org/issues/innovation
It's incredibly rare to find people who hold ideals that are detrimental to their own life.
Flippant response, I know, but too many people worship at the altar of the job creator and believe these folks are moral, upstanding citizens
Could make interesting case law.
Yeah, to perpetuate this system where only those who can afford lawyers get to benefit
What I mean is: when someone is prosecuted for copyright infringement, but Meta isn't, then could the case be put on hold until Meta is found guilty and pays a fine?
Also maybe the fine on the later case would have to be proportional to the prior case. So if Meta pays $1 per infringement, the penalty might be $1 for torrenting something else (which is immaterial and not worth the justice system's time) so pretty much all copyright infringement cases would get thrown out.
It reminds me of how mainstream drug addicts get convicted and spend years in prison, while celebrities get off with a warning or monetary fine.
It's a fundamental part of lawyer training, and if they want to let BigCorp go and bring the hammer down on the little guy, they can make up a hundred reasons for it.
Take, for example, the $675k paid for 31 songs. That's about $20k a song. If we estimate a book to be, say, 10MB, that would be about 8 million works. So I think reasonable compensation is something along the lines of $163 billion. Not even 10 years of net income. Which I think is an entirely fair punishment.
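For anyone who wants to check the arithmetic, here is a rough sketch in Python. The 81.7 TB dataset size is an assumption that does not appear in the comment itself; it is chosen because it reproduces the "8 million works" and "$163 billion" figures. The 10 MB per book and the rounding to $20k per work are the comment's own estimates.

```python
# Back-of-the-envelope check of the per-work damages estimate.
SETTLEMENT_USD = 675_000       # from the comment: amount paid for 31 songs
SONGS = 31
PER_WORK_ROUNDED_USD = 20_000  # the comment's rounding of 675k / 31

DATASET_GB = 81_700            # ASSUMPTION: 81.7 TB, not stated in the comment
BOOK_MB = 10                   # the comment's guess for one book

per_work_exact = SETTLEMENT_USD / SONGS          # roughly 21.8k per song
works = DATASET_GB * 1000 // BOOK_MB             # number of 10 MB "books"
total_rounded = works * PER_WORK_ROUNDED_USD     # using $20k per work
total_exact = works * per_work_exact             # using the unrounded rate

print(f"per work (exact): ${per_work_exact:,.0f}")
print(f"works:            {works:,}")
print(f"total at $20k:    ${total_rounded:,}")
print(f"total (exact):    ${total_exact:,.0f}")
```

With the unrounded per-song rate the total comes out somewhat higher than with the $20k rounding, so the comment's figure is, if anything, on the low side of its own premise.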
The only ethical problem here is that only Meta sized companies can afford to pay the "damages" for such blatant law violations at worst, or the fees of their lawyers at best.
Companies like Meta and OpenAI, however, should definitely have to pay to use the hard work of humans to train their AI.
They will be getting a lot of Frommer Legal letters...
Whether training an AI model on an array of different works, many of which are copyright protected, is itself a copyright violation, in addition to or distinct from any copyright violation that goes on in gathering the dataset for training (and separate from any copyright violation in the actual or intended use of the LLM), remains to be resolved as a legal question, and may or may not have a simple yes or no answer (or the same answer under every system of copyright laws globally).
My inclination is that it is probably generally not a violation in US law, but that's not something I am very confident in; how the definitions of copy and derivative work apply to determine if it would be without fair use, and how fair use analysis applies, are not clear from the available precedent.
> But legally, how does using a book to train a LLM differ from a teacher learning from a book and teaching its contents to their pupils.
It is very clear, from how US copyright law is written and even more from its history of application, that information stored in people's brains is, without exception, neither a copy nor a new work that can be a derivative work under US law, and so cannot be infringing, no matter how it was acquired. It’s also very clear in the statute itself and the case law that data in media used by artificial digital computers, on the other hand, can constitute copies or derivative works that can be infringing. Even if the process is arguably similar in legally relevant ways, copyright law is critically focused on the result and whether it is a particular kind of thing which can be infringing, not just the process.
I truly hope that whoever takes the case goes after Meta with 1000 times the pressure that was put on Swartz, but honestly I don't expect much, just as the top comment precisely expressed.
And if we are going to be fair, please let's also not forget about the other usual suspects, or does anyone think they are falling behind?
Several EU countries, Switzerland, South Korea, Japan, etc. are viable countries to sue from. Even in Japan which has a law specifically permitting training on copyrighted material you must still obtain it legally-- i.e. you must license it.
Horse has functionally bolted on this already
I’m guessing a slap on the wrist, despite courts going pretty hard after individuals for a couple of torrented movies.
At a minimum the starting point of discussion here should be that if life ruining $80,000 per item is an acceptable fine for individuals then why is it not the same for corporations. Which would probably get you a number in the trillions at which point we could have a discussion about reforming this entire system.
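As a rough sanity check on the "trillions" figure, the sketch below works out how many items it takes to reach $1 trillion at $80,000 each. The item counts passed to the hypothetical `total_fine` helper are illustrative only and are not claims about any actual dataset.

```python
# How many infringed items reach $1 trillion at $80,000 per item?
FINE_PER_ITEM_USD = 80_000   # per-item fine quoted in the comment
TRILLION = 10**12

# Break-even item count for a $1 trillion total.
items_for_one_trillion = TRILLION // FINE_PER_ITEM_USD

def total_fine(items: int) -> int:
    """Total fine in USD for a given number of items (hypothetical helper)."""
    return items * FINE_PER_ITEM_USD

print(f"items for $1 trillion: {items_for_one_trillion:,}")
print(f"fine for 25 million items: ${total_fine(25_000_000):,}")
```

So the total lands in the trillions once the number of works goes past roughly 12.5 million.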
But yes realistically slap on wrist is what is going to happen here.
Yes, of course.
It's quite possible that judges realize that if they restrict training data to licensed materials, LLMs will become stupid and China will overtake the US to become the leader in AI, and because that can't happen, they'll make up some reason to make training on unlicensed data legal. It's definitely fair use!
I'm not even joking. Last time the US Supreme Court basically said "Android is too important, we have to declare its use of Java API fair use."
The rules have always seemed different for corporations regardless.
https://www.businessinsider.com/trump-settles-lawsuit-meta-m...
So, barring further Might Makes Right shit--which I'm not willing to fully rule out--Trump can't fully shield Zuckerberg et al.
I'm pretty sure you can theoretically download torrents without seeding, although this is frowned upon. If they really seeded (with full bandwidth?) that's indeed pretty brazen.
It is sort of strange that Meta is being singled out here though, and sort of sad considering they at least release the model weights. What's the signal? Do illegal shit to be competitive, but make sure there is no evidence?
I'm also ok with abolishing copyright altogether if he's too untouchable.
The alternative is a futile legalistic attack against a monopoly entity too powerful to be meaningfully punished. That won't accomplish anything useful. It would, rather, help cement this status quo, where copyright infringement is selectively legal or illegal, for different entities at the same time; and companies like Meta thrive arbitraging that difference. You can't defeat Meta—but you can help dig them a moat.
I'm pretty sure I could list ten megacorps that would collapse overnight if copyright was abolished. The music groups, movie studios, streaming platforms...
> Level the playing field, incrementally, for everyone else who isn't a trillion-dollar corporation.
There is no level playing field when you have individuals and trillion-dollar companies in the same market.
- Ice Cube.
Meta will face no consequences. Say you're a small publisher and you'd like a bit of compensation. If you dare sue, Meta can just blacklist your books on its platforms. Even if they don't, you probably don't have the money to sue one of the biggest companies on earth.
I think copyrights should be limited to 25 years after first publication. This would fix plenty of issues and give the AIs of the world plenty to learn from.
Who am I kidding, Meta will take what they will. For that author making 20k a year, be honored to be of use to Meta.
but the masses are addicted to the slop that meta feeds them.
We will know why OpenAI isn't getting investigated.
At least this has been the recent experience of a friend who used libgen and anna's archive to download legal, public domain works!
Property is based on scarcity - if you take my car, I no longer have a car. But if you copy my book, I still have my book. No loss, no theft, just an outdated legal fiction designed to stifle innovation and enrich rent-seeking middlemen. And no, loss of potential sales doesn't count - it's like being able to claim a lottery ticket has real value.
Copyright was never about protecting creators—it’s about locking down ideas, preventing competition, and extracting endless fees. Shakespeare borrowed, tech companies iterate, and science thrives on free exchange. The idea that knowledge should be locked away indefinitely is absurd.
Meta’s mistake wasn’t using the data - it was pretending copyright still matters. AI is exposing the system for what it is: obsolete. The future belongs to those who create without asking permission.
https://www.engadget.com/2015-12-21-peter-sunde-kopimashin.h...
It's obviously absurd to enforce copyright as bytes are copied around instead of as it is used. Training an LLM is a different thing than re-hosting and giving away copies to other people.
If you don't want people to transform your works - keep them private. You don't own ideas.
From the article: Kopimashin, as in Copy Machine.
1) the concept of copyright is as old as the word suggests (copies are the least of our worries going forward - it should be possible to define processes for exploitation of ideas in a fair way)
2) we allow humans to learn from other people's ideas and transform them to commercial products and the same should happen for AIs in the future
3) we have an ill-defined concept of "personally identifying information" which gives people ownership to information that others have created via their own means - there should be better ways to ensure a level of privacy (but not absolute privacy) without overly-broad, nonsensical definitions of what is personally protected information
4) We allow social media and other telecommunications media to arbitrarily censor people's speech without recourse. This turns people's speech to property of the social media companies and imposes absolute power on it. This makes zero sense and is abusive towards the public at large. We need legal protections of speech in all media, not just state-owned media.
What information about me could a corporation create via its own means that would be legally protected but shouldn't be? PII is generally information that a corporation collects. Unless you mean that my cellphone provider creates the association between my name and phone number and should therefore be able to do with it as they please?
If you get a direct quote then you're good with your claim, surely.
Whatever the ruling one thing is for sure, plagiarism is no longer the sincerest form of flattery. The human authors are out for AI blood on this.
They need to make datasets which don’t have this problem, or have entities in Singapore train the foundation models within their rules. Singapore has a TDM exemption that would let AIs use much of the Internet, maybe GPL code, licensed/purchased works they digitize, etc. Very flexible.
(imo not in accordance with the Constitution, after absurdities like deciding “limited time” the way mathematicians might define something of some order of infinity)
the alleged social contract is not functioning the way it was intended, and we see who benefits and who loses.
mass dynamic editing for vitriol and profanity occurred while writing this comment in order to remain within site rules
Meta does a lot of stuff I disagree with, but they're usually not just straight breaking the law.
They've thrown away a huge amount of communication to source code commit reinforcement training data as a result. They do it to avoid emails making it into trials like this.
Aren't they obligated by law to keep all internal communication?
If I were younger, I would be livid.
Zuckerberg has paid the vig several times [0,1,2], which is evidently the best legal strategy under this administration. OFC, considering there are already multiple payments, there is no assurance the vig payments won't substantially increase as the Capo sees more opportunity for profit.
[0] https://en.wikipedia.org/wiki/Vigorish
[1] https://www.politico.com/news/2025/01/29/meta-settles-trump-...
Meta, with its "open weights" models, is one of the least guilty parties, since at least they've made the resulting blobs of mass piracy available to us. Same with Mistral, Deepseek, etc.
ClosedAI, Google, and others have all probably done this and more and refuse to make even the model available.
I think the way to deal with this is very simple:
If you have trained your model on works to which you do not have rights or permission, the resulting model is not copyrightable and cannot be sold. It must either be kept for research purposes only or released free of charge and in the public domain. All these models that have been trained on pirated works should become public domain.
Of course now that we have full capture of the US Federal Government I'm sure any suggestion like that would be neutralized with one bribe to Trump.
But we live in this stupid society where you have to move mountains to change things an inch.
I'm going to assume as it's a corporation, then the laws no longer apply.
The fact that most of the world embraced hardcore copyright-troll Luddism when the means of economic production for their (badly paying creative) jobs were democratized implies that most people do not believe in any "egalitarianism", and especially not the left-wing form many profess to believe in. Certainly not "information wants to be free" or any of the other idealist shit that I or Aaron Swartz believed in. What Meta did was software communism - full stop. They literally released their models to the public! I support all of this 10000%. The only issue is that they're not open enough (fully open source the dataset).
So, unironically, good! Thank you, please pirate more! Please destroy the US IP system while you're at it. Copyright abolitionism is good and thank you Zuckerberg!
Rules are just for us peasants.
I suspect that if the case is reasonable they will just convict, and quickly-- appeal denied and all simply because the laws are so straightforward.
After OpenAI trained their models on the famed books2 dataset, and seeing the technological implications of ChatGPT, there was a good chance they would let them get away with it.
Would the USA really surrender its AI technological advantage for trivial matters like copyright? They would make some royalty arrangement and get it over with
so it's quite funny to see they freely share it too.
It's so funny to see the law blatantly ignored by the overlords. Like, there isn't even a pretext anymore. They just steal what they want and budget for the fines and campaign donations to make the consequences go away.
Same for all the other sleazy tech bros.
We are trying to advance civilization here. To accumulate and make available all human knowledge to date. And you stand there with your hand out to stop this? You are a villain. There is no sympathy for you.
Enough with laws for thee but not for me!
Nothing in my life ever made me want to go back, except when I got back into playing hockey a few months ago, and all the hockey leagues use Facebook to communicate.
I made a new account and had to literally upload a picture of my face to pass verification... and then a few days later I was banned and couldn't use my account. I assume it's because they searched previous data and compared my face to find out I have a "deleted" (lol) account and matched me. I've assumed they'll only let me log in if I use my original account, the one I deleted 10 years ago.
Fuck meta. Fuck zuck.
a) Financed via inflation/"cantillon effect" due to ZRP/Stimulus that absolutely flooded the market with funny money in the hand of the sharks. b) Trained upon copyrighted work without compensation. c) Trained upon open source without even asking politely for authorization.
The Robber Barons from the last century can't even get close to our modern Feudal Tech Lords.
Unless you're one of us who have amassed multi-generational wealth in an exit in the last 20 years, you're completely fucked.