Is the content that LLMs produce enough to rise to the level of copyright infringement? Is the fact that a company trained its LLM on your data, knowing that data would shape its outputs (and therefore its profits), enough that every one of those outputs should be considered influenced, at least to a minuscule degree, by your work? And how would ChatGPT's "training" differ from, say, another journalist reading the NYT and subconsciously drawing on it to do better work?
None of us can answer these questions definitively. That the courts would eventually hear these sorts of arguments was a foregone conclusion. I think a lot of the large LLM makers (certainly OpenAI's competitors) will breathe a sigh of relief that this is happening sooner rather than later, so they know where the legal lines are going to be drawn.
Call me jaded, but I can’t help doubting that the _actual_ content creators, the writers themselves, will see any of the money should The Times win or settle the case.
After all, The Times already owns those articles; the writers were paid once for producing them. Subsequently selling those works to AI companies (or extracting compensation from them for past use) is an emergent revenue stream.
I suppose the NYT isn’t legally obligated to share that revenue fairly with the authors, but it’d be awfully nice if they did.
Increasingly, the distinction between core model training and fine-tuning may blur: lightweight techniques like LoRA adapters already make it cheap to layer new knowledge onto a frozen base model. If that holds, we might see custom 'add-ons' for AI models become commoditized. Imagine simply downloading a "New York Times" pack to bolt onto your unofficial "pirate" language model (something like the sketch below).
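To make the "pack" idea concrete, here is a minimal sketch of what loading such an add-on could look like, using Hugging Face's transformers and peft libraries. The base model choice is arbitrary, and the `./nyt-pack` adapter directory is purely hypothetical; nothing like it actually exists.

```python
# Hypothetical sketch: bolting a third-party "content pack" (a LoRA adapter)
# onto an open-weights base model with Hugging Face's peft library.
# The adapter path "./nyt-pack" is invented purely for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_MODEL = "mistralai/Mistral-7B-v0.1"  # any open-weights base model
ADAPTER_DIR = "./nyt-pack"                # hypothetical downloaded "pack"

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
base = AutoModelForCausalLM.from_pretrained(BASE_MODEL)

# A LoRA adapter is a small set of low-rank weight deltas; applying it on
# top of the frozen base weights is cheap compared with retraining.
model = PeftModel.from_pretrained(base, ADAPTER_DIR)

prompt = "Summarize today's front-page coverage:"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Notably, an adapter like that would be a small file of weight deltas, not a verbatim copy of any article, which is part of why the legal lines here are so hard to draw.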
Any news or speculation on these cases?