The obvious test for this would be to have the models produce an article from before and after its cutoff date and see if the output is still verbatim.
It would be a remarkable quirk of statistics that, if given all text on the internet except for the NYTimes back catalogue, a model would produce any NYT article.