> it would be as if I would copy parts of other propriety code and copy paste it into my own codebase.
It's not copy-pasted; it's compressed in a lossy manner. Even GPT4 has nowhere near enough memory to store the entirety of its training data in a non-lossy compression format. Just likes how humans compress the information we read.
contains several examples of people who were able to look at pages and recite them back. That is actually a much stronger ability than GPT since GPT has presumably looked at them 100 times.
>Just likes how humans compress the information we read.
Humans don’t have the scale machines have and moreover humans aren’t sevices, that argument doesn’t fly.
I really think NYTs data isn’t that important and nor crucial, LLMs could’ve just elided it. However, it’s more about training on copyrighted data in general which is kind of crucial for OpenAi, they trained their LLMs indiscriminately on copyrighted content without any plan to share any profits.
You're kind of proving my comment pretending they are akin to a human brain instead of an evolved form of statistics mixed with code, aka transformer model.
Let alone that it's a centralised model that's being distributed for a fee.