One thing you might do is build a full-text search index over the entire training data. If part of a ChatGPT response turns out to be copied verbatim, give the model the assignment "please paraphrase this" and substitute the paraphrase into the response. This might slow ChatGPT down a lot, but it might not: I think running an LLM is actually far more computationally expensive than a full-text search, so the lookup could be a small cost next to the generation itself.
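
To make the idea concrete, here is a minimal sketch in Python, not a claim about how ChatGPT actually works. It stands in for the full-text search with an in-memory set of word n-grams, and every name in it (`build_index`, `copied_spans`, `rewrite_copied`, `stub_paraphrase`, the threshold `NGRAM`) is hypothetical. The `paraphrase` callback is where a real system would send the "please paraphrase this" request to the model; here it is only a stub. Matching is exact on whitespace-separated words, so case and punctuation differences are treated as non-copies.

```python
from typing import Callable, Iterable, List, Set, Tuple

NGRAM = 5  # assumed threshold: this many consecutive matching words counts as "copied"

Index = Set[Tuple[str, ...]]


def build_index(corpus: Iterable[str], n: int = NGRAM) -> Index:
    """Collect every n-word window of the corpus (a toy stand-in for a full-text index)."""
    index: Index = set()
    for doc in corpus:
        words = doc.split()
        for i in range(len(words) - n + 1):
            index.add(tuple(words[i:i + n]))
    return index


def copied_spans(words: List[str], index: Index, n: int = NGRAM) -> List[Tuple[int, int]]:
    """Return [start, end) word ranges covered by n-grams found in the index.

    Overlapping or touching matches are merged into a single range.
    """
    spans: List[Tuple[int, int]] = []
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in index:
            if spans and i <= spans[-1][1]:
                spans[-1] = (spans[-1][0], i + n)  # extend the previous range
            else:
                spans.append((i, i + n))
    return spans


def rewrite_copied(
    response: str,
    index: Index,
    paraphrase: Callable[[str], str],
    n: int = NGRAM,
) -> str:
    """Replace each copied range with paraphrase(range); other words pass through.

    Note: whitespace in the response is normalized to single spaces.
    """
    words = response.split()
    out: List[str] = []
    pos = 0
    for start, end in copied_spans(words, index, n):
        out.extend(words[pos:start])
        out.append(paraphrase(" ".join(words[start:end])))
        pos = end
    out.extend(words[pos:])
    return " ".join(out)


def stub_paraphrase(text: str) -> str:
    """Placeholder for the LLM call; it only marks the span that would be reworded."""
    return "[PARAPHRASE: " + text + "]"
```

For example, with the single-document corpus `["the quick brown fox jumps over the lazy dog"]` and `n=4`, the response `"I heard that the quick brown fox jumps high today"` comes back as `"I heard that [PARAPHRASE: the quick brown fox jumps] high today"`. At real training-data scale the set would have to be replaced by an actual search index, and the paraphrase call only happens when something matches, which is why the extra cost may be modest.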