Or rather the quality of the training data?
Calling these models open source is like calling a binary open source because you can download it.
Which in this day and age isn't far from where we're at.
Mistral models are one example: they never released their pretraining data, and there are many fine-tunes.
Is anyone else just assuming at this point that virtually everyone is using the pirated material in The Pile, such as Books3?
Are you sure, or is it the literal opposite and you’re just speculating?