> Really the conversation should be reframed to be something along the lines of “is it even ethical for these companies to offer their LLM as anything other than open source”? The answer, if you look into what they do, is “probably not”.
Agreed, but I'm personally of the opinion that this is true for all intellectual endeavors. Intellectual property is the great sin of our generation, and I hope we eventually learn better.
And I think you've hit at the heart of the matter: the push for open source training data has never been about the definition of open source, it's always been a cover for complaints about where the data was sourced from. Which is also why we won't see it any time soon—not until the lawsuits wind their way through the courts, and even then only if the results are favorable towards training.