Sarcasm, considering the source of their own training data?
We've been developing DataClaw for this: https://github.com/peteromallet/dataclaw
Though only in particular situations, like when it’s done to them and not when they do it. 'Cause they have the power and are morally right and know better than you. And if you question this at all, well, you’re a threat to American values, a supporter of the Chinese, and leading to the breakdown of democracy.
This isn’t a type of argument or manipulation tactic used by the rich throughout history to trick the naive and gullible masses or anything like that. Trust me, I’m rich and I’m morally right. /sarcasm
Yeah, remind me - is it Plato's descendants that people are concerned about here, or is it every single author who had any work in Anna's Archive, any work published online, any work published on GitHub, etc.?
I think that people are probably upset about the harm to living people who had their work stolen by Meta and other LLM companies - regardless of license, terms of use, or any other attempted protection.
There are methods like Habitual Reasoning Distillation or Inverted Reasoning Traces [1] that can help.
While there are reasons to hide the intermediate tokens from an IP-protection standpoint, there is also a need to hide more effective and efficient generation that doesn’t fit the R1 claim of an "aha moment", which has been debunked but is still a consumer expectation.
While hidden intermediate tokens do increase the difficulty, hiding them is not a firm barrier in itself, especially since they are billed, which gives away information about their length.
To be clear, if Anthropic was using totally licensed data, I'd be sympathetic to these claims. But if you're going to pirate the world's creativity you'd better be willing to gimme dat shit for free[0].
[0] As said by Hungry Santa.
But Anthropic at least has openly admitted that they try to detect that and interfere.
>outrageous copyright infringement
>unethically scraped data
Hahahahaha