This does bring up the issue of actual legal protections, though.
If you are training an LLM on the open web, or on things posted for everyone to view for free, then that is OK, I guess, when it comes to legal ramifications. (Definitely not a lawyer.)
But when you start using data that you really don't have the rights to... and somehow someone finds out that their protected data is included in the dataset... then what?