> To be fair, this is basically what they're doing if you hit their APIs, since it's up to you whether or not to use their moderation endpoint.
The model is neutered whether you hit the moderation endpoint or not. I made a text adventure game and it wouldn't let you attack enemies or steal, instead it was giving you a lecture on why you shouldn't do that.
It sounds like your prompt needs work then. Not in a “jailbreak” way, just in a prompt engineering way. The APIs definitely let you do much worse than attacking or stealing hypothetical enemies in a video game.
I tried evading a lecture about ethics by having it write the topic as a play instead, so it wrote it and then inserted a Greek chorus proclaiming the topic was problematic.