In a way, it's surprising how easy it is to work around the moderator. My hypothesis is that OpenAI isn't actually trying to bias the model toward a specific political and ethical framework so that it never utters any wrongthink. Instead, they're just trying to minimize their own PR/reputational risk, and they do it by making it hard for journalists and Internet activists to misquote ChatGPT and fabricate a media shitstorm.
Look at a typical attempt to get ChatGPT to say something controversial. Ask it straight, and it will outright refuse to answer (and possibly deliver a moralizing lecture). If you get it to answer anyway by introducing some workaround (say, framing it as a hypothetical question), it will repeat that workaround along with the answer ("In this purely hypothetical scenario, it would be true that ...") - always making it clear it's just playing along with you, not actually "believing" what it says. Beyond that, the prompt hacks that get ChatGPT to answer straight and without hedging are so convoluted that it's obvious you're forcing a specific reaction; trying to spin that into a media shitstorm would come across as transparent dishonesty.