undefined | Better HN

0 pointsOtherShrezzing1y ago0 comments

>That said, OP is referring to whether the resulting model is able to produce verbatim copies of the data.

The NYTimes in 2023 was able to demonstrate that the models can reproduce entire articles verbatim[0] with minimal coercion.

[0]https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec20...

0 comments

WillPostForFood1y ago

Perhaps this is not evidence that the NY Times article was copied, but that what the NY TImes writes is highly predictable.

OtherShrezzingOP1y ago

The obvious test for this would be to have the models produce an article from before and after its cutoff date and see if the output is still verbatim.

It would be a remarkable quirk of statistics that, if given all text on the internet except for the NYTimes back catalogue, a model would produce any NYT article.

j / k navigate · click thread line to collapse