The law is slow and is always playing catch up in terms of prosecution, it’s not clear today because this kind of copyright has never been an issue before. Usually it’s just outright stealing content that was protected, no one ever imagined “training” to be a protected use case, humans “train” on copyrighted works all the time, ideally copyrighted works they purchased for said purpose… the same will start to apply for AI, you have to have rights to the data for that purpose, hence these deals getting made. In the meantime it’s ask for forgiveness not permission, and companies like Google (less openAI) are ready to go with data governance that lets them remove copyright requested data and keep the rest of the model working fine
Let’s also be clear that making deals with Reddit isn’t stealing from creators, it’s not a platform where you own what you type in, same on here this is all public domain with no assumed rights to the text. If you write a book and openAI trains on it and starts telling it to kids at bed time, you 100% will have a legal claim in the future, but the companies already have protections in place to prevent exactly that. For example if you own your website you can request the data not be crawled, but ultimately if your text is publicly available anyone is allowed to read it, and the question it is anyone allowed to train AI on it is an open question that companies are trying to get ahead on.