> the overwhelming majority of input it has in-fact seen somewhere in the corpus it was trained on.

But it thinks just great on stuff it wasn't trained on.
I give it code I wrote that is not in its training data, using new concepts I've come up with in an academic paper I'm writing, and ask it to extend the code in a certain way in accordance with those concepts, and it does a great job.
This isn't regurgitation. Even if a lot of LLM usage is, the whole point is that it does fantastically with brand-new material too. It's genuinely creating new, valuable stuff it has never seen before, assembling it in ways that require thinking.