undefined | Better HN

0 pointsfamouswaffles2y ago0 comments

>Ok then explain why RL can’t be used to prevent certain behaviors please.

Preventing certain behaviors does not mean you can make a model never output something. RL simply just doesn't work that way. In this instance, You are rating certain responses better and asking the model to predict like that. You can make it more likely to refuse a request but the idea that you can guarantee it won't is completely wrong. There is nothing open ai can do to make GPT-4 never do something. Nothing.

https://chat.openai.com/share/b7faf20c-b295-4d76-85a1-a15e04...

0 comments

6 comments · 2 top-level

RC_ITR2y ago· 2 in thread

Again, we are discussing “a common pre-prompt” that you say has probability 1 of showing the system prompt…

You are saying there’s some feature of this model that deterministically returns the system prompt and then you pivot to saying that RL could never prevent something from happening.

I am saying it’s very easy to use RL to get a model to return a convincing but wrong answer about a system prompt.

Then end.

famouswafflesOP2y ago

You were wrong. Just admit it and go on with your day.

This is what you said.

>I am prone to believe that OpenAI, and organization who’s lead is centered on RL more than anything else, is quite good at getting it’s models not to spit out competitively sensitive information

I specifically replied it is not possible to prevent a model from spitting this information out. I didn't pivot to anything.

>I am saying it’s very easy to use RL to get a model to return a convincing but wrong answer about a system prompt.

No it's not.

RC_ITR2y ago

This comment is not in the spirit of Hacker News.

I was trying to co-learn by discussing with you and you turned it into something very ugly.

Please do that literally anywhere else on the Internet.

We clearly disagree, but I know have no idea how to move the conversation forward, which is a shame, because maybe you do have something to teach me, though I have no way of knowing at this point.

geraneum2y ago· 2 in thread

> You can make it more likely to refuse a request but the idea that you can guarantee it won't is completely wrong.

Why is that the case, technically?

Jensson2y ago

Because it is a black box, they don't know enough about it to ensure it never does something. Only way to be sure is to write some script using normal code that filters the questions and outputs, but then you have the standard natural language problem which only works for very simple cases.

geraneum2y ago

> to ensure it never does something

But there are many systems for which you cannot predict/control the behavior with just a few experiments because they are simply, probabilistic. Isn’t it also the case with LLMs? If not, why?

j / k navigate · click thread line to collapse