>What if I learned to code based only on your huge repo of GPL code? I'd just be remixing your GPL code at that point, right? Will you brand all of my output as being GPL as well?
This never happens; you would first learn from a book or tutorials.
But your idea is sound: have Microsoft buy books from the authors, train the LLM on those books, and then have the LLM solve new problems. If it is an AI and not a text-interpolating tool, then it should be able to learn like humans do, from a few books.