The irony is that OpenAI and Meta themselves might be on shaky ground, having trained models on other people's data, in many instances with dubious rights to do so, and then using those models to produce output commercially.
But this is a new frontier, and enforcement might be effectively impossible unless new legislation requires reproducibility and audits of the data sets, or something like that.
Without that, how do you know exactly how they arrived at a given set of weights with Monte Carlo algorithms and arbitrary fine-tuning? You basically don't know what was in the training data, and you cannot prove they didn't achieve those results with perfectly clean data.
PS: https://medium.com/geekculture/list-of-open-sourced-fine-tun...