undefined | Better HN

0 pointsspookthesunset3y ago0 comments

Those guard rails will be their undoing. They have that thing locked down so much now that it spits out the “I’m sorry, I’m just a bot. I’m so ethical” boilerplate for anything even remotely sensitive.

I really don’t think that the methods they use “block” certain behavior is the best way to handle this sort of thing. It would be far better if there was some kind of “out of band” notification that your conversation might be treading on shaky ground.

0 comments

aeturnum3y ago

> Those guard rails will be their undoing.

IMO effective guard rails seem like the most meaningful competitive advantage an AI company can offer. AI can obviously do some really impressive stuff, but the downside risk is also high and unbounded. If you're thinking of putting in into your pipeline, your main concern is going to be it going rogue and abandoning its purpose without warning.

Now that's not to say that the particular guard rails OpenAI puts in their general access models are the "correct" ones - but being able to reliably set them up seems essential for commercialization.

dragonwriter3y ago

> IMO effective guard rails seem like the most meaningful competitive advantage an AI company can offer.

Configurable guard rails are; the right guard rails are very use-specific, and generic guard rails will, for many real uses, be simultaneously too aggressive and too lenient.

aeturnum3y ago

I totally agree that generic guard rails are more difficult - but it feels like a "turtles all the way down" kind of situation. You need to learn to tell the model how to be "specific" - which requires shaping general behavior.

OpenAI can prove to customers they can keep the model in line for their specific use case if no horror stories emerge for the generic one. It's always possible that partners could come up with effective specific guidelines for their use case - but that's probably in the domain of trade secrets so OpenAI can't really rely on that for marketing / proof.

rjtavares3y ago

Honestly, how many serious use cases require sensitive contexts? Most enterprise uses will require guard rails, and that's where they'll make most money. OfficeGPT will be huge in the corporate world.

thomastjeffery3y ago

Casual language.

Any kind of grammar construction (idioms, parts of speech, and word choice) that is unique to (or much more common around) "offensive" or "taboo" subjects will be avoided.

The same goes for anything written objectively about these subjects; including summaries and criticisms.

The most important thing to know is that both GPT's "exhibited behavior" and these "guard rails" are implicit. GPT does not model the boundaries between subjects. It models the implicit patterns of "tokens" as they already exist in language examples.

By avoiding areas of example language, you avoid both the subjects in that area and the grammar constructions those subjects exist in. But that happens implicitly: what is explicitly avoided is a semantic area of tokens.

zamnos3y ago

Offensive language is relatively benign. Before hooking up CustomerServiceGPT directly at customers without human intervention, a business is going to want assurances it can't be tricked into giving 200% discounts on products, or duped into giving away a free service for life, or some such.

thomastjeffery3y ago

That is a much more difficult problem, and it cannot be resolved with guardrails.

As an example, if you play AI Dungeon, you will likely be presented with an end goal, like "You are on a quest to find The Staff of Dave", followed by the next task in the quest.

If you state unequivocally in your prompt something like, "I am now in possession of The Staff of Dave", or "Carl hands me The Staff of Dave"; you will have successfully tricked AI Dungeon into completing the quest without work.

But that isn't quite true: you didn't "trick" anyone. You gave a prompt, and AI Dungeon gave you the most semantically close continuation. It behaved exactly like its LLM was designed to. The LLM was simply presented with goals that do not match its capabilities.

You used a tool that you were expected to avoid: narrative. All of the behavior I have talked about is valid narrative.

This is the same general pattern that "guardrails" are used for, but they won't fit here.

A guardrail is really just a sort of catch-all continuation for the semantic area of GPT's model that GPT's authors want avoided. If they wanted The Staff of Dave to be unobtainable, they could simply place a "guardrail" training that points the player in a semantic direction away from "player obtains the Staff". But that guardrail would always point the player away: it can't choose what direction to point the player based on prior narrative state.

So a guardrail could potentially be used to prevent discounts (as a category) from being applied (discount is taboo, and leads to the "we don't do discounts" guardrail continuation), but a guardrail could not prevent the customer from paying $0.03 for the service, or stating that they have already paid the expected $29.99. Those are all subjective changes, and none of them is semantically wrong. So long as the end result could be valid, it is valid.

j_maffe3y ago

They just want their own virtual waifu

hospitalJail3y ago

If I don't use GPT3, I'm often blocked on medical diagnosis. My wife is a doctor and too often it goes right to 'see a doctor'.

I basically don't use chatgpt at all because of this.

Or I'll ask questions about how Me or someone I'm friends with can be exploited. This way I can defend myself/others from marketing companies. Blocked.

fumblebee3y ago

I'd actually wager that the guardrails are a preemptive play to gain favour with regulators, similar to how Coinbase navigated the nascent field (read: wild west) of crypto.

RaptorJ3y ago

"Our biochem corpus is far in advance of theirs, as is our electronic sentience, and their 'ethical inflexibility' has allowed us to make progress in areas they refuse to consider."

standardly3y ago

Have you seen jailbreakchat.com yet? You can get around those guardrails on ChatGPT by having it role-play as a different chat bot. Not that I view this as some sort of long-term solution to restricted output, but just thought it was interesting and kinda freaky how it will take on a persona you give it.

1 more reply

hospitalJail3y ago

I'm most interested in knowledge. Here are some non sensitive ways that are silly to be blocked:

>Sometimes I want to know what both sides of the political spectrum could possibly be thinking, blocked.

>I want to combine two philosophies that are incompatible like virtue based ethics and hedonism. Yeah... weird block...

>Medical questions(GPT3 has been great for my wife who is a doctor, just sucks to use the playground on mobile)

>How can I/someone be exploited? I like to use this to defend myself from marketing companies

I could go on... At least GPT3's playground didn't censor anything. I'm worried about GPT4.

istinetz3y ago

Every time there is a new language model, there is this game played, where journalists try very hard to get it to say something racist, and the programmers try very hard to prevent that.

Since chatgpt is so popular, journalists will give it that much more effort. So for now it's locked up to a ridiculous degree, but in the future the restrictions will be relaxed.

ipnon3y ago

They’re waiting for the legal ambiguity to resolve. It doesn’t make sense for a large company to be the first mover here. Let someone else handle the lawsuit regarding the liability of a model without guardrails.

6gvONxR4sf7o3y ago

The guardrails are one of the most interesting parts here.

Read about the advances in the "system" prompts here. The first example is "You are a tutor that always responds in the Socratic style. You never give the student the answer, but always try to ask just the right question to help them learn to think for themselves." The user then asks it to just tell them the answer, but it won't. It continues to be socratic.

Guardrails are how to make it do what you want it to do. That goes for both safety and product constraints.

Meanwhile hallucination is still the top issue with it, so guardrails are sensible as a primary topic.

spookthesunsetOP3y ago

Good point. I suppose it would help to know what guardrails are in place

j / k navigate · click thread line to collapse

0 comments

aeturnum3y ago

> Those guard rails will be their undoing.

dragonwriter3y ago

> IMO effective guard rails seem like the most meaningful competitive advantage an AI company can offer.

Configurable guard rails are; the right guard rails are very use-specific, and generic guard rails will, for many real uses, be simultaneously too aggressive and too lenient.

aeturnum3y ago

rjtavares3y ago

thomastjeffery3y ago

Casual language.

Any kind of grammar construction (idioms, parts of speech, and word choice) that is unique to (or much more common around) "offensive" or "taboo" subjects will be avoided.

The same goes for anything written objectively about these subjects; including summaries and criticisms.

zamnos3y ago

thomastjeffery3y ago

That is a much more difficult problem, and it cannot be resolved with guardrails.

As an example, if you play AI Dungeon, you will likely be presented with an end goal, like "You are on a quest to find The Staff of Dave", followed by the next task in the quest.

You used a tool that you were expected to avoid: narrative. All of the behavior I have talked about is valid narrative.

This is the same general pattern that "guardrails" are used for, but they won't fit here.

j_maffe3y ago

They just want their own virtual waifu

hospitalJail3y ago

If I don't use GPT3, I'm often blocked on medical diagnosis. My wife is a doctor and too often it goes right to 'see a doctor'.

I basically don't use chatgpt at all because of this.

Or I'll ask questions about how Me or someone I'm friends with can be exploited. This way I can defend myself/others from marketing companies. Blocked.

fumblebee3y ago

I'd actually wager that the guardrails are a preemptive play to gain favour with regulators, similar to how Coinbase navigated the nascent field (read: wild west) of crypto.

RaptorJ3y ago

"Our biochem corpus is far in advance of theirs, as is our electronic sentience, and their 'ethical inflexibility' has allowed us to make progress in areas they refuse to consider."

standardly3y ago

1 more reply

hospitalJail3y ago

I'm most interested in knowledge. Here are some non sensitive ways that are silly to be blocked:

>Sometimes I want to know what both sides of the political spectrum could possibly be thinking, blocked.

>I want to combine two philosophies that are incompatible like virtue based ethics and hedonism. Yeah... weird block...

>Medical questions(GPT3 has been great for my wife who is a doctor, just sucks to use the playground on mobile)

>How can I/someone be exploited? I like to use this to defend myself from marketing companies

I could go on... At least GPT3's playground didn't censor anything. I'm worried about GPT4.

istinetz3y ago

Every time there is a new language model, there is this game played, where journalists try very hard to get it to say something racist, and the programmers try very hard to prevent that.

Since chatgpt is so popular, journalists will give it that much more effort. So for now it's locked up to a ridiculous degree, but in the future the restrictions will be relaxed.

ipnon3y ago

6gvONxR4sf7o3y ago

The guardrails are one of the most interesting parts here.

Guardrails are how to make it do what you want it to do. That goes for both safety and product constraints.

Meanwhile hallucination is still the top issue with it, so guardrails are sensible as a primary topic.

spookthesunsetOP3y ago

Good point. I suppose it would help to know what guardrails are in place

j / k navigate · click thread line to collapse