> Copyright holders worry about how to exercise control over the use of "their" creative material for training models; but that begs the question of whether copyright holders ever had, or should have, a right to any such control. If a human can read a book and learn from it, and then write their own books, why shouldn't a computer?
There’s a small amount of irony in an article that discusses copyright and the invisible but critical context of information, and then dismisses the context of copying when it comes to copyright, while also confusing what copyright protects. I’m certain the author knows that copyright does not protect ideas and does not protect “colour”; it deliberately protects only the “bits”. In US copyright law this is called the “fixation” of a work. The Berne Convention uses similar terminology: “works shall not be protected unless they have been fixed in some material form.”
AI’s “learning” has a different colour than human learning. This has been debated at length on HN, elsewhere, and in the courts, but it is wildly misleading to compare ChatGPT being trained on all books ever written and then distributed (for a profit) to everyone with one human reading one book and learning something from it.
IP courts will have some truly novel questions before them this century.
The flip side is that this is why the article’s discussion of randomness and monkeys on typewriters is irrelevant to copyright law. It’s a copyright violation to produce the same “fixation” no matter how you do it. If you generated a random sequence of characters and it happened to match an NYT best-selling book, you would violate the book author’s copyright, and claiming it was random isn’t a viable defense. Intent to copy can make it worse, but lack of intent does not absolve you. There is precedent for people independently coming up with the same song and one of them being successfully sued.
Do note that other laws might cover plagiarism of ideas, trademarks, code, etc. Copyright isn’t the only consideration, but it seems to be often misunderstood. We definitely have some novel questions because of the scale of AI’s copying, the nature of training and the provenance of the training data, and because of AI’s growing ability to skirt copyright law while actually copying.