0 points · dragonwriter · 1y ago · 0 comments

> That said, OP is referring to whether the resulting model is able to produce verbatim copies of the data.

Whether a tool is used to create infringing copies of some other work is relevant to whether the tool vendor is liable for contributory infringement for that use of the tool. That holds whether or not the work is source material used to create the tool, and whether or not the infringing copies are verbatim. But the absence of a capacity for creating such copies isn't usually enough to say that the copying done to make the tool isn't infringing.

(That said, generative AI tools, and LLMs specifically, have been shown to have the capacity to make such copies, to the extent that vendors of hosted models are now putting additional checks on output to try to reduce how often verbatim copies of substantial portions of training-set works are produced. So arguing that LLMs can't do that is silly.)

arh68 · 1y ago

> LLMs specifically, have been shown to have the capacity to make such copies

Exactly. I asked my Gemma how long a quote it could give me from a given book if I were the author and gave express permission, and I was a bit surprised that it readily admitted it could:

> Without Permission (Current Limit): Single sentence.

> With Broad Permission (Full Reproduction Allowed): I could theoretically quote the entire book.

Eye-opening (for me, at least).
