They were referring to the fact that everything ChatGPT is built on is other people's work. Beyond the details of actually building the model, there is nothing that ChatGPT owns. All the content they train on, all of the art they train on: everything is stolen/used without permission. Obviously there is more to it than that, because you published it on the internet. But that's a different topic.
All intellectual property is inherently stolen. Just let it go.
I don't see any world where it matters in the slightest. When it comes to how we deal with currently available training data, nothing will change, first because of politics, but also because people want the LLMs' superpowers more than they want to protect the IP of a few individuals. And I firmly believe that no human training data that has not already been produced and published as of today will play any significant role in future AI development.
We are simply too slow.
I'll bet that if someone outside of our IP jurisdiction figured out a way to reliably and thoroughly reverse engineer the most complex commercial software from binaries, so people could spit out a working, fully-customized copy of a commercial application from a prompt, and the entirety of the software development market were about to collapse, the tenor of this conversation would be very different.
Maybe the people with the very ethically defensible stance that private property is theft would be totally fine with OpenAI knocking down your home to build their new headquarters without compensating you? Imagine the progress! (hint: they probably wouldn't be ok with it)
None of this stuff exists in a vacuum. None of it.
Artists are already starting to completely paywall their content.
How far do we let AI scraping and incorporation go? Just say "fuck it" until there's nothing left to scrape other than content also made by AI?
>Just say 'fuck it' until there's nothing left to scrape other than content also made by AI?
Sounds good to me! There will always be people making free art, and AI will make this much easier.
The thing that I think people are missing is that AI-generated content CAN be used to improve AI models. There is no requirement that the input data is created without AI.
Furthermore, AI-generated content on the internet is not random; it is curated content. Generally speaking, people don't post every image they generate with Stable Diffusion, they only post the best images. If you also consider engagement metrics and user feedback (upvotes, etc.), these images can be a valuable and useful part of a training set.
I fear our views on this issue are wholly incompatible.
Strangely enough, it's only interested in promoting permissionless innovation when it stands to profit. It plunders the commons, and gives nothing unencumbered back.
ChatGPT-4 is built on real people's time.
I am all for training AIs, but at least exhibit some self-consistency in your arguments!
Your argument is a reductio ad absurdum to "everything is made of atoms and no one owns atoms, ergo no one owns anything."
My concerns mostly lie with the fact that it's owned largely by $MSFT rather than by a more "open source" entity that contributes to society. But again, that's a much different topic.
It shouldn't have a place, but so long as people require the ownership of their own concepts to gain food and shelter, it has to.
Now that doesn't mean you can't license your work for exclusive use by humans and explicitly forbid its use as AI training data in the license applied to your work, but you'd have to do that when you publish it, not retroactively.