"I got the loot, Steve!"
I feel like the distillation stuff will end up in court if they try to sue an American company about it. We'll see what a judge says.
Stole? Courts have ruled it's transformative, and it very obviously is.
AI doomerism is exhausting. I don't even use AI that much; it's just annoying to see people who will grab any reason they can to moan.
The courts have ruled that AI outputs are not copyrightable. The courts have also ruled that scraping by itself is not illegal, at most a violation of a Terms of Service. Therefore, Anthropic, OpenAI, Google, etc. have no legal claim to any proprietary protections of their model outputs.
So we have two things that are true:
1) Anthropic (certainly) violated numerous TOS by scraping all of the internet, not just public content.
2) Scraping Anthropic's model outputs is no different than what Anthropic already did. Only a TOS violation.
Regardless of whether LLM training amounts to theft, thieves are still allowed to put locks on their own doors.
"not copyrightable" doesn't imply they can't frustrate attempts to scrape data.
Actually, not anymore, as a result of OpenAI and Anthropic's scraping. For example, Reddit came down hard on access to its APIs in response to ChatGPT's release and the news that LLMs were built atop scraping of the open web. Most of the web today is not as open as it was before, precisely because of scraping for LLM training data. So, no, no one is perfectly free to scrape the web anymore, because open access is dying.
Yes, rich and poor are equally forbidden from sleeping under bridges.
Anthropic paid a lot of money for a moat and want to guard it. It is not wrong, in any sense of the word, for them to do so.
Try this: if you want to train a model, you're free to write your own books and websites to feed into it. You're not free to have others do that work for you when they don't want you to, because it cost them a lot of time, money, and secret sauce, presumably spent filtering it for quality among other things.
Do you hear the words coming out of your mouth?
Is the work of others less valid than the work of a model?
And everyone is free to consume all the free information.
Courts have ruled it's not, and I don't think anyone is arguing it's okay.
>but it is not okay for others to try to do the same to an AI model?
The steelman version is that it's okay to do it once you've acquired the data somehow, but that doesn't mean Anthropic can't set up roadblocks to frustrate you.
Your legal argument is all over the place as well. Which is more relevant here: what the courts ruled, or what you consider obvious? How is distillation less transformative than scraping? How does the courts' ruling that scraping to train models is legal relate to distillation?
Nobody is scoring you on neutrality points for not using AI much, and calling this doomerism is just a thought-terminating cliche that refuses to engage with the comment you're replying to.
In fact, your comment is not engaging with anything at all; you're vaguely gesturing towards potential arguments without making them. If you find discussing this exhausting then don't, but also don't flood the comments with low-effort whining.
It's cool to see Noah Wyle getting his due these days (The Pitt).
[0]: https://www.anthropic.com/news/detecting-and-preventing-dist...
[1]: https://news.ycombinator.com/item?id=46578701
Unfortunately (for the publishers, at least), it didn't work to stop Anthropic, and Anthropic's attempts to prevent others will not work either; plenty of distillation has already happened.
The problem of letting humans read your work but not bots is just impossible to solve perfectly. The more you restrict bots, the more you end up restricting humans, and those humans will go use a competitor when they become pissed off.
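To make that tradeoff concrete, here's a minimal sketch assuming the crudest possible defense, a per-IP sliding-window rate limiter (the thresholds and names are hypothetical, not anyone's actual implementation):

    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 60   # size of the sliding window
    MAX_REQUESTS = 30     # hypothetical threshold; real sites tune this endlessly

    recent = defaultdict(deque)  # client IP -> timestamps of recent requests

    def allow_request(ip: str) -> bool:
        """Naive sliding-window limiter keyed on IP alone."""
        now = time.time()
        q = recent[ip]
        while q and now - q[0] > WINDOW_SECONDS:
            q.popleft()  # drop requests that fell out of the window
        if len(q) >= MAX_REQUESTS:
            # Blocked. But an IP is not a person: this could be one bot,
            # or two hundred humans behind the same office NAT or CGNAT.
            return False
        q.append(now)
        return True

The knob only turns one way: tighten MAX_REQUESTS enough to catch slow, polite scrapers and you start locking out offices and mobile carriers that share a single IP, which is exactly how restricting bots ends up restricting humans.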
Tech people are funny, with these takes that businesses do (or should) adhere to absolute Platonic ideals and follow them blindly regardless of context.
There is a reason we don't do things like this: it makes the world a worse place for everyone. If you are so incredibly out of touch with any semblance of ethics, mayhaps you are just a little bit part of the problem.