4 here as well. I get similar results when using the API directly, though without a "system" role message.
LLMs are, naturally, non-deterministic. Lowering the temperature in your guardrail calls can reduce that variability somewhat, but the lesson learned from the "working" and "non-working" attempts is this: the guardrails are "predictably failing in unpredictable ways" (if I may coin a phrase).
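To make the temperature point concrete, here is a minimal sketch of what a lower-temperature guardrail call might look like. This is not the exact setup from my experiments: the prompt, model name, and `guardrail_request` helper are all illustrative, and it only builds the keyword arguments for a chat-completions call rather than making one.

```python
GUARDRAIL_PROMPT = (
    "Answer PASS or FAIL: does the text below violate the policy?"
)

def guardrail_request(text: str, temperature: float = 0.0) -> dict:
    """Build kwargs for a chat-completions guardrail call.

    temperature=0 makes sampling (near-)greedy, which reduces -- but
    does not eliminate -- run-to-run variation in the guardrail's verdict.
    """
    return {
        "model": "gpt-4o-mini",  # illustrative model name
        "temperature": temperature,
        # A single "user" message, mirroring the direct-API case above
        # that had no "system" role message.
        "messages": [
            {"role": "user", "content": f"{GUARDRAIL_PROMPT}\n\n{text}"}
        ],
    }
```

With an SDK that accepts these parameters, you would pass the dict along as something like `client.chat.completions.create(**guardrail_request(text))`. Even at temperature 0, though, identical requests can still produce different verdicts, which is exactly the unpredictability described above.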