I'm not making an argument from grievance about my own code being plagiarized. I actually don't care if my own code is used without even the attribution required by the permissive licenses it's released under; I just want it to be used. I do also write proprietary code, but that's not in the training datasets, as far as I know. But the training datasets do include code under a variety of open-source licenses, both permissive and copyleft, and some of those developers do care how their code is used. We should respect that.
As for our tendency to disrespect the copyrights of art, clearly we've always been in the wrong about this, and we should respect the rights of artists. The fact that we've been in the wrong about this doesn't mean we should redouble the offense by also plagiarizing from other programmers.
And there is evidence that LLMs do plagiarize when generating code. I'll just list the most relevant citations from Baldur Bjarnason's book _The Intelligence Illusion_ (https://illusion.baldurbjarnason.com/), without quoting from that copyrighted work.
https://arxiv.org/abs/2202.07646
https://dl.acm.org/doi/10.1145/3447548.3467198
https://papers.nips.cc/paper/2020/hash/1e14bfe2714193e7af5ab...