Also, this is where having a patent portfolio helps these big companies.
That's ~4 trillion dollars of companies betting that the law will say anyone may train an AI model on any public data, and anyone may use the output of that AI without compensating owners of the training data.
When 4 trillion dollars is at stake, not only do you put the best lawyers on the case, but you also pay Congress to change the law if things aren't heading your way.
I'm pretty sure now that the debate over AI ownership is a foregone conclusion: nobody owns AI outputs.
This would be fantastic imo. A new era of the commons.
>Adobe
I disagree here. Adobe has trained only on public domain and their own stock images. So why would Adobe be against training on unlicensed data being ruled an infringement? It would eliminate much of their competition...
Adobe is lying. They are relying on general ignorance about the technology to get away with it.
Adobe has not shown how they train the text encoders in Firefly, or what images were used for the text-based conditioning (i.e. "text to image") part of their image generation model. They are almost certainly using CLIP or T5, which are trained on datasets like LAION-2B (an image dataset with the very problems they are trying to address), C4 (a text dataset similarly encumbered), and similar.
bUt nO oNe eLsE hAs bRoUgHt tHiS uP. It's so arcane for non-practitioners. Talk about this directly with someone like Astropulse, who monetizes a Stable Diffusion model: no confusion, totally agrees with me. By comparison, I've pinged the Ars Technica journalist who just wrote about this issue: crickets. Posted to the Adobe forum: crickets. E-mailed them on their specific address for this: crickets. I have no idea why something so obvious has slipped under everyone's radar!
But at the same time, they put in their ToS that you may not train a new LLM using the output of their LLM...
Right now, I think it's more that they don't want US v TeensyStartup to be the case that sets precedent.
By stepping in with these indemnification clauses, they aren't betting $4T that they're sure to win. They're just reserving a (much smaller) open check to protect against losing because of somebody else's lawyers.
They may win, they may lose, but they want to make sure they're the ones who get to fight for it either way.
Adobe have offered indemnification for Firefly: https://techcrunch.com/2023/06/26/adobe-indemnity-clause-des...
"With Firefly, Adobe will also be offering enterprise customers an IP indemnity, which means that Adobe would protect customers from third party IP claims about Firefly-generated outputs."
Here's Microsoft for their Copilot (which I do not think is the same thing as GitHub Copilot): https://blogs.microsoft.com/on-the-issues/2023/09/07/copilot...
"To address this customer concern, Microsoft is announcing our new Copilot Copyright Commitment. As customers ask whether they can use Microsoft’s Copilot services and the output they generate without worrying about copyright claims, we are providing a straightforward answer: yes, you can, and if you are challenged on copyright grounds, we will assume responsibility for the potential legal risks involved."
And for GitHub Copilot: https://github.com/features/copilot/#faq
"What if I’m accused of copyright infringement based on using a GitHub Copilot suggestion?
GitHub will defend you as provided in the GitHub Copilot Product Specific Terms."
That links to a document which says this:
"If your Agreement provides for the defense of third party claims, that provision will apply to your use of GitHub Copilot, including to the Suggestions you receive. Notwithstanding any other language in your Agreement, any GitHub defense obligations related to your use of GitHub Copilot do not apply if you have not set the Duplicate Detection filtering feature available in GitHub Copilot to its “Block” setting."
I don't understand the "If your Agreement provides for the defense of third party claims" bit, though.
The cynical part of me wants to say it shows that they have high confidence they can manipulate the legal system enough to dictate the outcome of any challenges.
The EU in particular is likely to pay less than zero attention to the interests of large US tech companies.
The liability they're taking on here could be absolutely gigantic.
"Vendor lock your data and workload on OUR platform and we'll shield you forever!"
The fear being that you'll perform experiments outside of their cleverly scoped sandbox where they get to amortize your training data into theirs for free.
Basically "sure, you can bring your toys over to play in our sandbox. You just have to leave them here so we can be sure you don't eclipse our capability or market share without us."
I think this is saying "you have to let our lawyers argue on your behalf. If you fight it with your own shitty lawyers and lose, we won't pay your losses".
Sorry if this is a stupid question.
Companies are way too comfortable boldly lying to their customers. If they really put my interests first, they'd give me their services for free.
In fact, we can see firsthand how Google search and other Google products have gotten worse because they give them away for free and as a result have to make money by selling their customers' eyeballs to advertisers.
If they gave you their services for free, there'd be a limited amount of time until they could no longer give away those services at all (i.e. they'd run out of money). You can still technically put your own financial gains "first" if they're necessary to being in a position to put someone else first longer-term.
It's the same reason you give what you can to charity, rather than giving away 100% of your cash every time you have any.
The bad faith is on Google's part: they use vague slogans that signal generosity and kindness, while their actions are exploitative and borrow from the ethics and TTPs of malware authors. Last time, it was "don't be evil." Now, they're putting users first. How kind of them!
Anyone brave enough to adopt "don't be evil" as a motto deserves scrutiny when they find the need to change it. It's a warrant canary whose absence speaks for itself.
"Ok, prove it" is a challenge not enough of today's conspicuous bullshitters are confronted with.
I am not a lawyer, but it does not seem just to me that the creators of the training data should receive no compensation. What happened to "data is the new oil"?
But instead they have bet their whole company on it, i.e. ~$1.5 trillion.
That means they're really sure.
I also find it interesting that generative AI for images seems to be missing. I wonder if this is intentionally selective. It's also possible I'm misunderstanding where Imagen etc. lives in the listed products.
"Copyright law generally protects the fixation of an idea in a “tangible medium of expression,” not the idea itself, or any processes or principles associated with it." -- https://strebecklaw.com/idea-expression/
By tokenizing the data, an AI bypasses the tangible, particular expression that can be copyrighted under the Copyright Act, and takes away just the concepts. On generation, those concepts are converted back into tangible human expression that's unlikely to be protected by a copyright.
The indemnification means that Google engineers have convinced Google lawyers that this is in fact the case.
We shall see whether courts agree with this.
The second part here (after "similarly") seems like a big asterisk, no? So Google can just duck out if they don't think you added enough citations? Or if you didn't ask the AI where every piece of the output is coming from?
https://gizmodo.com/google-says-itll-scrape-everything-you-p...
Also a good way to build up the Art230 for AI via precedents.