Ok then explain why RL can’t be used to prevent certain behaviors please.
Why can’t a reward function be used to stop a model from saying something you know you don’t want it to say?
Also you share a screenshot of a chat asking to repeat the above and that’s your proof?
Share the raw link please.