I think it is an RLHF problem and that you are right - this will blow up in the NYT's face.
Specifically, the NYT's examples all seem to be cases where they asked the AI to repeat their articles verbatim. So they asked it to violate copyright and, because it's a helpful bot with a good memory, it did so.
Solution: teach the model to refuse requests to repeat articles verbatim. It's easily capable of recognizing when it's being asked to do that. And that's exactly what OpenAI have now done.
So the direct problem the NYT is complaining about - a paywall bypass - is already rectified, which makes the case look quite weak to me. They could demand OpenAI pay them damages for the time ChatGPT wasn't refusing, but wouldn't they have to prove damages actually happened? It seems unlikely that many people used ChatGPT as a paywall bypass for the NYT specifically in the past year, since it only knows old articles. OpenAI could be ordered to search their logs for cases where this happened, for example, and then the NYT could be ordered to show their working for the value of displaying a single old article to a non-subscriber, and from that damages could be computed. But it wouldn't be a lot.
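To make that arithmetic concrete, here is a rough Python sketch of the calculation I have in mind: count the logged responses that reproduced an article verbatim, then multiply by whatever per-view value the NYT can justify. Everything here is hypothetical. The `LogEntry` shape, the `estimate_damages` name and the idea that logs would carry a `reproduced_verbatim` flag are my assumptions for illustration, not anything OpenAI or the court has described.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    # Hypothetical record of one logged response; not a real OpenAI log format.
    article_id: str
    reproduced_verbatim: bool


def estimate_damages(entries, value_per_view_cents):
    """Return (number of verbatim reproductions) * (value of one article view).

    The per-view value is in whole cents so the result stays an exact integer.
    """
    verbatim_count = sum(1 for entry in entries if entry.reproduced_verbatim)
    return verbatim_count * value_per_view_cents


if __name__ == "__main__":
    # Placeholder inputs only; these are not real figures from the case.
    sample_log = [
        LogEntry("article-a", True),
        LogEntry("article-b", False),
        LogEntry("article-c", True),
    ]
    placeholder_value_cents = 50
    print(estimate_damages(sample_log, placeholder_value_cents), "cents")
```

The point is just that the total scales linearly with the number of verbatim hits in the logs, so if that count is small the figure stays small no matter how the per-view value is argued.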
That's presumably why the case goes further and argues that OpenAI is in violation even when it isn't repeating text verbatim. That's the only way the NYT can get any significant money out of this situation.
But this broader claim seems much weaker to me. Beyond all the obvious human analogies, there is precedent in the case of search engines, which crawl - and the NYT lets them crawl - specifically to enable the creation of a derived data structure. Search engine indexes are understood to be fair use, and they actually do repeat parts of the page verbatim in their snippets. Google once even showed cached versions of whole pages. And browser makers all allow extensions in their stores that strip ads and bypass paywalls, and the NYT hasn't sued them over that either.