Please HN do your thing and prove me wrong or tell me the benefits to society outweigh the cons, because otherwise it's a depressing take.
Then again, I would happily train a model on Anna's Archive myself without feeling guilty about it, if I had the resources to do so.
I think it ranks very, very low on the list of bad things Facebook has done.
The legal battle initiated by the Authors Guild was vicious, and it damaged and hampered Google Books and other scanning projects like the Internet Archive.
Now Facebook pirates all the world's books, uses them without paying authors or publishers, and seemingly faces no consequences.
I would strongly prefer to live in a world where we could easily search and access all books rather than one where the richest guys get to exploit them without consequences.
Facebook would quite literally sue you until you killed yourself if you did that. It’s not the same rules for them and for us.
Since DeepSeek it's clear that regulatory capture just doesn't work in this case, only pure competition. If the USA doesn't let NVIDIA sell GPUs to the world, Huawei will be happy to in a few years.
As content gets polluted with AI garbage, human-generated content will regain value.
Here is a startup idea: a startup that protects creators' content from violations. For a nominal subscription fee, ensure content is not ripped off on major social platforms and by AI. Track regulations in different jurisdictions, and keep a legal team on the beat. Sue big tech for damages when violations are found. I.e., police for the creators.
HN - take it and run.
What has this to do with masculinity? Is this cheap misandry?
Piracy doesn't care about borders, and if US firms don't do it, Chinese firms will. It is a national competitiveness issue in that sense, which is the easiest argument to get the government to do things to weaken copyright.
Laying the groundwork for the eventual defeat of MAFIAA is a great benefit to society if it happens, and in my opinion outweighs the nonexistent "damage" they did. There wouldn't be Llama if they didn't pirate the books, and the authors wouldn't get paid anything either. Bonus points if we can get rid of DRM anti-circumvention as well.
For what it's worth Facebook doesn't seem to be doing the regulatory capture part, unlike "Open"AI and Anthropic.
I've been quite glad of Ubers and LLMs. I think it would be good if the LLMs could read the stuff Google Books scanned but was unable to make public. (this stuff https://www.reddit.com/r/books/comments/67fkkj/somewhere_at_...)
If only because any other outcome will radically empower corporations (and entire countries!) that don't GAF about copyright.
There. We good?
Also, in France and as an individual, if I were to openly torrent all the content of Libgen or Anna's Archive, I would have to spend the rest of my life in jail whereas Zuck can enjoy having more billions with the same behavior.
Some kind of rant: The world is truly fucked right now if such behaviors are rewarded, but at my grand old age of almost 50, I'm starting to move away from all this and it's relaxing: I live a simpler life, I buy from small creators, and I learn whatever I want. I have also gained a lot of time that I can now spend by helping people around me. Life can be fun if you ignore that you'll never be in control of all the bad behaviors around you.
Or you would be driven to suicide for much smaller violations - https://en.wikipedia.org/wiki/Aaron_Swartz
I mean, one could even argue that what Aaron was doing was not even violating copyright, as he had access....
Yet this guy is dead, and the Facebook owners are instead going to get a nice raise.
They are at least giving the model they trained back to the community. https://www.llama.com/llama-downloads/
I think a solid argument can be made that Facebook downloaded and shared pirated material with literally every ML employee they had at the time…
EDIT: to those who downvoted, why? Do you disagree with what I said?
They stole a billion donuts.
The real question is whether the value of these donuts is diminished by being ingested by the AI, and that's still not clear.
There are thousands of posts on HN defending digital piracy, mocking the government and groups like the RIAA that make poor analogies to theft of physical goods.
And between China and Facebook, it is a coin toss, but I’d go with China :)
If you're just giving the result back to humanity, OK, there's a case for it being a fair trade, though there's still the question of how to handle the disruption of the long tradition of valuing & compensating labor.
If you're using it to win capitalism, a 50% tax seems like starting stakes. Sure, lots of expertise and resources go into the data processing to produce a model that can talk to you, but without the data, the processing doesn't matter (and without the model, we could still do this collectively).
But even if it wasn't -- even if it couldn't be -- you could use it to fund all kinds of public goods which would benefit living authors. Maybe even a basic income if it turns out to be successful enough.
It's a different pro-social bargain than direct compensation for the fruits of labor, but at least it's still a bidirectional one.
Isn't this a twofold misunderstanding of BitTorrent? I haven't used it much, but I've never believed BitTorrent to be popular for anonymity (is it even truly anonymous?), I thought it was popular because it makes downloading go faster by reducing bottlenecks. Also, choosing not to seed a file is extremely simple in every torrent client I've seen, so it seems a bit of a leap to conclude that Meta seeded the pirated books just because the protocol supports it.
I'm far more concerned about the simple fact that they downloaded pirated books at all than I am about the protocol they used to do it.
The key word here is "typically". Mutual sharing is designed into the protocol itself, particularly in the early seeding period, where higher priority is given to peers that re-share their portions of the torrent.
Yes, you can turn off uploads, but that's not the default in any client I've ever used. So I don't see anything wrong with saying the typical user will re-upload files.
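To make the "priority for peers that re-share" point concrete, here is a rough sketch of the tit-for-tat idea behind BitTorrent's choking logic. It is a simplification, not any real client's code: the function name `choose_unchoked`, the `regular_slots` default, and the sample peers are all made up for illustration. The idea is that each round a client uploads to the peers that have recently sent it the most data, plus one randomly chosen "optimistic" peer so newcomers get a chance.

```python
import random

def choose_unchoked(download_rates, regular_slots=3, rng=None):
    """Pick which peers we will upload to in this round.

    download_rates maps a peer id to the bytes/s that peer recently
    sent us. Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()
    # Rank peers by how much they have been giving us (tit-for-tat).
    ranked = sorted(download_rates, key=download_rates.get, reverse=True)
    unchoked = set(ranked[:regular_slots])
    # Optimistic unchoke: one random peer from the rest, so peers that
    # have nothing to offer yet can still get started.
    rest = ranked[regular_slots:]
    if rest:
        unchoked.add(rng.choice(rest))
    return unchoked

# Made-up peers: "c" and "leech" upload nothing to us.
rates = {"a": 900, "b": 400, "c": 0, "d": 250, "leech": 0}
picked = choose_unchoked(rates, regular_slots=3, rng=random.Random(0))
print(sorted(picked))
```

A peer that never uploads only ever gets the occasional optimistic slot, which is why downloading with uploads disabled tends to be slower but still works.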
I'm not objecting to the use of the word "typically", I'm objecting to the explicit suggestion that Meta might have seeded the pirated works. It's unlikely and unnecessary to suggest—there's plenty else wrong with this picture so there was no need to include this idea, it's just a distraction from the real problem.
I understand the outrage against Mark Zuckerberg for nurturing such a culture and making the executive decision, but also understand that at least one engineer was involved in writing and executing the code that does the piracy (along with product managers and other cross-functional employees).
And given the importance and visibility of the work, it's pretty obvious that person wouldn't be a low-level engineer either (I'd assume they earn about 1 million dollars a year, IC7+ at Meta).
Now comes the depressing statement: a solid engineer, who probably doesn't even need to work another day in their life, is being a puppet of Mark Zuckerberg and robbing creators.
This is depressing to me as an engineer.
"Meta's internal culture is seemingly hostile to ethical practice".
* https://news.ycombinator.com/item?id=42971446
Also "Meta claims torrenting pirated books isn't illegal without proof of seeding" (Feb 21):
> In a message found in another legal filing, a director of engineering noted another downside to this approach: “The problem is that people don’t realize that if we license one single book, we won’t be able to lean into fair use strategy”
I hadn't heard that idea before. Any IP law experts able to give useful context on that one?