That said, we used copyright traps at Malwarebytes, which is how we found out that IObit was stealing our database.
That said, let's say there's a new model that explicitly excluded closed-source and copyleft-licensed code. Well, the MIT, MPL, Apache, and BSD licenses all say you can't strip their licensing off.
Okay, to get to the spirit of your question, let's say GitHub managed to train a model using only their own code, or code that was explicitly placed in the public domain. If GitHub then reproduced code that wasn't in the training set, it can't be accused of copying it. At that point the argument could be made that it independently created it.
At the same time, algorithms can't be copyrighted, but implementations of an algorithm can be. So if GitHub was basically spitting out an algorithm that just happened to be implemented similarly to some code it wasn't trained on, then I would say there was no copyright violation.
If the comment is something like
//check fromIndex is greater than toIndex
then that is no more individualistic than the function itself. (Sadly, many people comment like this.) On the other hand, if it reproduced a comment with typos, or something more specific like
/* this hack is because Firefox's implementation of SVG z-indexing does not match how Chrome or Safari does it - please read this article ...url...*/
then yeah, you would have something.
Consider a junior dev who writes a range-check function while working for a company (so the company owns the copyright), then moves to a different company and writes the same range-check function, because that's just how he writes code.
Has copyright been infringed?
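To make the scenario concrete, here's a hypothetical sketch of the kind of range-check function in question (the names and bounds convention are invented for illustration). Two developers asked to write this would likely produce nearly identical code, simply because there are so few reasonable ways to express it:

```python
def in_range(value, from_index, to_index):
    """Return True if value lies within the half-open range [from_index, to_index)."""
    # check fromIndex is greater than toIndex
    if from_index > to_index:
        raise ValueError("fromIndex must not exceed toIndex")
    return from_index <= value < to_index
```

The near-inevitability of this shape is exactly what makes it a weak basis for an infringement claim on its own.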
The legalities can be argued either way, but an individual is in any case not remotely comparable to a service like Copilot.
Why is this? Copilot is, in some ways, an automated way to search code and Stack Overflow. There is a very annoying website that does nothing more than show relevant code samples for various Google search terms.
If the manual version of something is okay (e.g. googling for code, finding it, and adapting it to a new but similar purpose), why would an automated version of that be any different?
Practically speaking, with an LLM the programmer can focus on the creative parts (handler functions, React components, etc.) while the LLM generates the necessary boilerplate for ever-changing frameworks and infrastructure configurations. The programmer (and QA) would still review and test everything, but would save time writing boilerplate and ship features faster.
GPT-style models are literally trained to reproduce their input, token by token.
The _only_ escape clause is some fuzzy function that scores how arbitrary, or nontrivial, a code block is.
A person or AI can absolutely be violating copyright via your example.
yes
Now, if he had written a specification for what the function should do, then passed it to someone else who had never seen the function and worked only from the spec, he'd be okay.
see: IBM BIOS
It's not nearly that simple. No real copyright case is going to hinge on what a single range check function looks like.
This is human law, it's not a programming situation where you can just apply some simple rule and get a deterministic answer. Context plays a huge part, among other things.
On a more serious note, there is a question of whether algorithms and code blocks can be copyrighted, or whether it is the _software_ that is copyrighted. Let's say I use WebSockets and you crib my usage of WebSockets for your own application. My opinion is that unless you rebuild the same thing I did, "cribbing" is the long-held art of "let me google how to do that". The artistic creation is the end software product, not some measly embedded function that is boilerplate (form and function) for anything to work.
The "form and function" aspect of copyright law almost certainly means a range-check function is not a copyright infringement.