What's your read on what's going to happen with AI?
Will companies be allowed to train on copyrighted works? Seems like we'll fall behind international competition or supercharge monopolies if we don't allow it.
Japan and China permit training on copyrighted works. China goes a step further and allows AI outputs to be copyrighted.
Really interested in what insiders think or know about this.
Companies are allowed to train on copyrighted works - or, to be more precise, there is no prohibition in copyright law on them doing so. On the other hand, there are no particular protections.
The real question is to what extent an AI generator can return its copyrighted training data as a non-de minimis output. In other words, can I get one of the existing copyrighted works by putting in a prompt? This is something that AI developers are trying to avoid, but it's actually a pretty tricky problem.
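To make the "can I get the work back out with a prompt" question a bit more concrete, here is a rough sketch of how one might measure verbatim overlap between a generated output and a known work. It is only an illustration: the function names and the `min_words` threshold are made up for this example, and a word-count cutoff is not a legal test for what counts as de minimis.

```python
from difflib import SequenceMatcher


def longest_shared_run(output: str, reference: str) -> list[str]:
    """Return the longest run of consecutive words found in both texts."""
    out_words = output.lower().split()
    ref_words = reference.lower().split()
    # autojunk=False so common words are not silently ignored on long inputs.
    matcher = SequenceMatcher(None, out_words, ref_words, autojunk=False)
    match = matcher.find_longest_match(0, len(out_words), 0, len(ref_words))
    return out_words[match.a : match.a + match.size]


def looks_memorized(output: str, reference: str, min_words: int = 8) -> bool:
    """Flag outputs that share a long verbatim run with the reference.

    min_words is a hypothetical threshold chosen for illustration only.
    """
    return len(longest_shared_run(output, reference)) >= min_words


reference = (
    "it was the best of times it was the worst of times "
    "it was the age of wisdom"
)
output = (
    "the model wrote: it was the best of times "
    "it was the worst of times indeed"
)
print(" ".join(longest_shared_run(output, reference)))
print(looks_memorized(output, reference))
```

Real memorization checks are harder than this, since a model can reproduce a work with small changes that break an exact-match test, which is part of why the problem is tricky.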
Allowing AI output to be copyrighted is one of the worst ideas in copyright. Thankfully, the US Constitution as interpreted by the courts only allows for copyright to inhere in works of human creativity.
By the way, while I'm very excited about certain "AI" things, I have a very poor opinion of the merit of generative AI - and I mean in theory as well as how it stands today.
Mine would be different than what we have now: it'd be 20 years or the artist's lifetime, whichever is shorter - then 10-year renewals are possible after that, but the cost of the renewal would ratchet up with each renewal.
I've also considered using a percentage of revenue for the work - basically a tax on the revenue from that work, as a condition for the right of monopoly on it - which would also ratchet upwards with each renewal.
I'd also consider a use-it-or-lose-it strategy for copyright, like trademark, meaning if you are not making the work available for purchase/license within the copyright renewal period, on reasonable terms, you lose the ability to renew it.
Mine is mostly designed to deal with orphaned works, ensuring they enter the public domain in a predictable way. I think the biggest issue with our existing copyright system isn't enriching Disney - they're still putting those works out there, making them available - it's all the works being lost to the sands of time.
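As a rough illustration of how the ratcheting renewals might play out, here is a small sketch. The base fee, the doubling factor, and the function names are all hypothetical; the proposal above only says the cost goes up with each renewal, not by how much or in what shape.

```python
def initial_term_years(years_until_artist_death: int, base_term: int = 20) -> int:
    """Initial term: 20 years or the artist's lifetime, whichever is shorter."""
    return min(base_term, years_until_artist_death)


def renewal_fee(renewal_number: int, base_fee: float = 100.0, ratchet: float = 2.0) -> float:
    """Fee for the n-th renewal (1-based).

    Assumes a geometric ratchet; base_fee and ratchet are made-up numbers.
    """
    return base_fee * ratchet ** (renewal_number - 1)


def schedule(years_until_artist_death: int, renewals: int, renewal_term: int = 10):
    """Return (start_year, end_year, fee) tuples for the initial term and each renewal."""
    end = initial_term_years(years_until_artist_death)
    periods = [(0, end, 0.0)]  # the initial term carries no renewal fee
    for n in range(1, renewals + 1):
        periods.append((end, end + renewal_term, renewal_fee(n)))
        end += renewal_term
    return periods


for start, end, fee in schedule(years_until_artist_death=50, renewals=3):
    print(f"years {start}-{end}: fee {fee:.2f}")
```

A revenue-percentage version would swap the flat fee for a rate that rises the same way, and the use-it-or-lose-it rule would add a check before each renewal that the work is actually available.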
I would have to actually write something blog-length about this.
So as much as I want there to be a fair use case here, the artists have a real point. If someone can break the memorization without losing significant validation/test set performance, that might go a long way.
But even then, artists don't want their style copied either, and that's problematic to me in that if a human does it, that's OK, but if an AI does it, it's not? Yes I get the ease of asking an AI to do it vs a 10K+ hours artist, but, well, more or less the same to me on a geological time scale.
In the next year, I'm hoping to Patreon/Kickstart a project that offers two major funding tiers. Hitting the lower tier means it will use AI to create assets, and hitting the higher tier means it will use humans instead. My response to this brouhaha is to throw the controversy right back at the people creating it in the first place and ask if they're willing to walk their fancy talk on this subject with their wallets.
I’m curious what you mean by this. If I’m not allowed under copyright law to make a personal copy of the latest Pixar movie or watch it without permission or payment, even if I’m not sharing it with anyone else, what under the law allows a company to make a copy and train on it? I thought I understood copyright law to not only prohibit redistribution of copyrighted works without permission, but also to prohibit consumption of copyrighted works without permission? Is that accurate? In that sense, I would have thought copyright law does prohibit companies from training on copyrighted works.
I own hundreds of paperback books. Copyright law does not limit what I can learn from them.
It may be that assembling a corpus for training is illegal, but if so, that would be true even if it was never used for training. The act of training an AI is orthogonal to the collection of the corpus.
You aren't, but in many cases it doesn't matter because, even for USians, you're protected by the fair use exception. And in some other cases, such as for educators and archivists, it extends even further.
When we get new laws for the training of neural networks, I would expect their spirit to be based on the state of this exception.
What does that mean, in more specific language?
If I create a poster in Photoshop, is it under copyright? What about if I use a smart fill plugin? What about if I use a prompt plugin?
Hypothetically, if such a system existed where you passed in a copyrighted image and then got a prompt to generate it, would that be sufficient to show some kind of infringement?