But you call it "stealing," others call it "copying."
Stealing takes, from someone, something they own.
As long as discussion of a work that has been published is not impeded, the public is not harmed even by these 50-years-after-life copyrights, other than in that they are accumulated by certain companies that themselves become problems.
When someone decides to use someone's work without compensation, the author is, even though he is not deprived of the work itself, still robbed. But it's not a theft of goods, it's theft of service. The copyright infringer isn't the guy who steals your phone, it's the guy you have done some work for but who refuses to pay.
With this view you can also believe, without hypocrisy, that what the LLM firms are doing is wrong while what Schwartz did was not, since the authors in question weren't deprived of any royalties or payments due to them, because of the publishing model for scientific works.
What service? If somebody washes your windshield without you asking, it isn't a theft of service to not pay them. A theft of service arises from entering into an agreement and then failing to pay as stipulated in that agreement.
Copyright isn't an agreement you can choose whether to participate in. Copyright is a legal enforcement system that imposes legal liability even on those who don't use it. You may not see this legal liability as "harm", but it absolutely is. Arguing that copyright extends to training is arguing for a dramatic increase in the scope and power of this legal enforcement system.
You can think of choosing somebody's particular text as a way of contracting with him. Just as it isn't a restriction of your freedom of speech that going into a restaurant and ordering a meal creates a contract to pay, so it isn't a restriction of your freedom of speech when you choose to seek out and repeat somebody's very particular text.
Why Harry Potter when you have any of hundreds of millions of stories of a similar sort that you could easily write yourself? When you choose that one, you choose it because it's already been prepared by somebody else, just as you choose a restaurant because they've done work and have food ready for you. By choosing the one that's already written you accept that the author has done work for you.
I hadn't made that claim, but I will now that you've brought it up. Art operates as part of a discussion; the reference to and re-use of prior art is a key part of how that happens. There are sooo many cases of copyright being used to limit the freedom of expression that this really isn't disputable. Copyright clearly restricts speech.
> By choosing the one that's already written you accept that the author has done work for you.
No I don't, at least not in a sense that's different from standing on the shoulders of all the people that author learned from, and so on. Cultural works exist and take on roles in our cultural semiology, our memes, our language, without our choice. You can choose to not engage with a work, but you can't choose which works will be culturally relevant or not.
When you publish something, it becomes part of our shared culture and no-one has an inalienable right to own that. The limited rights we granted to encourage commercial creativity have already snowballed out of control, and now people are blithely buying into another dramatic expansion of them.