> "data implies factual information"
They used the word DATA, not content, DATA...
The argument that is going to be made is that your copyright in the work still stands, but that the model doesn't care about your document. It cares that "the" was used N times, and about its relationships to other words. That information isn't your work, and it is factual. That "data" only has value when it's weighted against all the other "data" put into the system, which again is not your work at all. (We would say that's derived information, but it will be argued that it is transformed.)
> You can not copyright factual information
https://www.techdirt.com/2007/11/27/yet-again-court-tells-ml...
The MLB has been trying to copyright baseball stats forever. The court keeps saying "you can't copyright facts".