Production open-access LLMs probably do need a front-end filter, with a fine-tuned RAG model that identifies copyrighted material and prevents the LLM from spitting it out. I fully support this.
But we shouldn't be preventing the development of a technology that in 99.99% of use cases isn't doing that, and that can be used for everything from diagnosing medical issues to letting coma patients communicate via EEG to improving self-driving car algorithms, just because some random content producer's works were a drop in the ocean of content used to learn relationships between words and concepts.
The rare edge cases where a model is capable of reproducing training data reflect infringement not in training but in use. If a writer learns to write well from a source, is that infringement? Or does it only become infringement when they then write exactly what was in the source?
Additionally, now that we can use LLMs to read brain scans and have been moving towards biological computing, should we start to consider copying of material to the hippocampus a violation of the DMCA?