Training against certain behaviors does not mean you can make a model never output something. RL just doesn't work that way. In this case, you are rating certain responses higher and training the model to predict responses like those. You can make it more likely to refuse a request, but the idea that you can guarantee it will always refuse is completely wrong. There is nothing OpenAI can do to make GPT-4 never do something. Nothing.
https://chat.openai.com/share/b7faf20c-b295-4d76-85a1-a15e04...
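To make the "more likely, never guaranteed" point concrete, here is a toy sketch. It is not how GPT-4 is actually trained; it only assumes the model ends in a softmax over scores, and the names `softmax` and `penalize`, the scores, and the step size are all made up for illustration. Each round of negative feedback lowers the score of the unwanted output, and its probability shrinks but never reaches zero:

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def penalize(logits, index, step, rounds):
    """Hypothetical stand-in for repeated negative feedback.

    Lowers the score of one output by `step` each round and records
    the probability that output still has afterwards.
    """
    out = list(logits)
    history = []
    for _ in range(rounds):
        out[index] -= step
        history.append(softmax(out)[index])
    return history

# Three candidate outputs; index 0 is the one we want to discourage.
logits = [2.0, 1.0, 0.5]
history = penalize(logits, index=0, step=1.0, rounds=20)

for i, p in enumerate(history, start=1):
    print(f"round {i:2d}: p(unwanted) = {p:.3e}")
```

The probability keeps dropping every round, but for any finite score it stays above zero, which is the whole argument in miniature. (With extreme enough penalties floating point will eventually round it to 0.0, but that is a limit of the arithmetic, not of the math.)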