For example, if we go the human nature route, then any comment - no matter how inflammatory - is fair game as long as its goal is to "help the tribe" in some sense. That's similar to political speech. So you could argue in good faith that "Nazis are good, and we should be like them," as long as you avoid toxic language and personal attacks while doing it.
EDIT: Actually, fascist/communist ideologies contain beliefs that are anti-social, so they wouldn't count as pro-social under the human nature standard, and that's where the conversation would stop. It still stands that heterodox ideas that don't work, or that people don't like, can be technically pro-social.
If we go the cultural route, then we identify the mainstream beliefs and rules of discussion and enforce them. This is like a "no swearing" rule or a "no Nazis" rule.
You could simply codify a set of content rules for the AI to adhere to.
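As a rough sketch of what "codifying rules" could look like, here's a toy example. Every rule name and category label here is made up for illustration; it assumes some upstream classifier has already tagged the comment with categories, which is not any real moderation API.

```python
# Hypothetical sketch of codified moderation rules for an AI to apply.
# All category names are illustrative assumptions, not a real policy.

BANNED_CATEGORIES = {"personal_attack", "slur", "nazi_advocacy"}

def violations(comment_labels):
    """Return the banned categories found among a comment's labels.

    `comment_labels` is the set of category labels an (assumed)
    upstream classifier assigned to the comment.
    """
    return set(comment_labels) & BANNED_CATEGORIES

# A heterodox-but-civil comment passes; a toxic one gets flagged.
print(violations(["heterodox_opinion"]))        # set() - no violations
print(violations(["slur", "personal_attack"]))  # non-empty: flagged
```

The point is just that once the rules are an explicit list rather than a vibe, the AI's decisions become auditable: you can see exactly which rule a removal invoked.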