
furyofantares · 2y ago

> Just learn to recognize and punish plagiarism via RLHF.

I'm not sure how your proposal would actually work. To recognize plagiarism during inference it needs to memorize harder.

Kinda funny if it works though. We'd first train them to copy their training data verbatim, then train them not to.

That is how it works, right? They're trained to copy their training data verbatim because that's the loss function. It's just that they're given so much data that we don't expect this to be possible for most of the training data given the parameter count.
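To make the "that's the loss function" point concrete, here is the usual next-token objective as a sketch. I'm assuming standard autoregressive cross-entropy; the symbols ($\theta$ for the parameters, $x_1, \dots, x_T$ for one training sequence) are my own notation, not anything from the thread.

```latex
\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)
```

Each term is non-negative, and the sum reaches its minimum of 0 only when $p_\theta(x_t \mid x_{<t}) = 1$ for every $t$, i.e. when the model reproduces the training sequence exactly. Under this objective verbatim copying is the optimum, and it is limited capacity relative to the amount of data that keeps the model from reaching it on most sequences.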


joe_the_user · 2y ago

I don't think you could use RLHF to stop plagiarism. RLHF can be used to teach what an "angry response" is because you can judge that by looking at qualities of the text itself. A plagiarized text doesn't have any special qualities aside from "existing already", which you can only determine by looking at the world.

One thing you might do is use a full-text search database of the entire training data. If part of a ChatGPT response is directly copied, give the model the assignment "please paraphrase this" and substitute the paraphrase into the response. This might slow ChatGPT down a lot - but it might not; I think an LLM is actually far more computationally expensive than a full-text search.
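Roughly what I have in mind, as a minimal sketch rather than a real implementation. The assumptions are mine: an in-memory set of word n-grams stands in for the full-text search database, whitespace splitting stands in for real tokenization, and `paraphrase` is a hypothetical callable standing in for the "please paraphrase this" request to the model.

```python
from typing import Callable, Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> Iterable[NGram]:
    """Yield every run of n consecutive tokens."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def build_index(corpus: Iterable[str], n: int) -> Set[NGram]:
    """Toy stand-in for a full-text index: the set of all word n-grams."""
    index: Set[NGram] = set()
    for doc in corpus:
        index.update(ngrams(doc.split(), n))
    return index


def copied_spans(response: str, index: Set[NGram], n: int) -> List[Tuple[int, int]]:
    """Return merged [start, end) token ranges whose n-grams appear in the index."""
    tokens = response.split()
    spans: List[Tuple[int, int]] = []
    for i, gram in enumerate(ngrams(tokens, n)):
        if gram in index:
            if spans and i <= spans[-1][1]:
                # Overlaps or touches the previous match: extend it.
                spans[-1] = (spans[-1][0], i + n)
            else:
                spans.append((i, i + n))
    return spans


def rewrite_copied(
    response: str,
    index: Set[NGram],
    n: int,
    paraphrase: Callable[[str], str],  # hypothetical: would call the LLM
) -> str:
    """Replace each copied span with its paraphrase; leave other text alone."""
    tokens = response.split()
    out: List[str] = []
    pos = 0
    for start, end in copied_spans(response, index, n):
        out.extend(tokens[pos:start])
        out.append(paraphrase(" ".join(tokens[start:end])))
        pos = end
    out.extend(tokens[pos:])
    return " ".join(out)


index = build_index(["the quick brown fox jumps over the lazy dog"], n=4)
result = rewrite_copied(
    "I saw that the quick brown fox jumps away",
    index,
    n=4,
    paraphrase=lambda span: "[paraphrased]",
)
```

The lookup is one set membership test per token position, which is where my guess comes from that the search would be cheap next to generation. A real system would need an index that scales to the whole training set and a sensible choice of n; this sketch handles neither.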


matusp · 2y ago

I wouldn't say it is unexpected behavior. I remember reading papers about this memorization behavior a few years ago (e.g., [1] is from 2019, and I believe it is not the first paper about this). OpenAI should be expected to know that LMs can exhibit memorizing behavior even after seeing a sample only once.

[1] https://bair.berkeley.edu/blog/2019/08/13/memorization/

