There is a specific training stage called Reinforcement Learning from Human Feedback (RLHF) that is supposed to bias the model to be helpful, polite, to present opposing views for balance, and to avoid asserting things that lie outside its training data. We have all seen the canned responses that appear when the filter is triggered.
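For intuition, the commonly cited RLHF fine-tuning objective (a sketch of the InstructGPT-style formulation, not a claim about any specific vendor's exact setup) is:

$$
\max_{\theta}\;\mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot\mid x)}\!\left[\, r_\phi(x, y) \,\right]\;-\;\beta\,\mathrm{KL}\!\left(\pi_\theta(\cdot\mid x)\;\Vert\;\pi_{\mathrm{ref}}(\cdot\mid x)\right)
$$

where $\pi_\theta$ is the model being tuned, $\pi_{\mathrm{ref}}$ is the pre-RLHF model, $r_\phi$ is a reward model trained on human preference rankings, and $\beta$ controls how far the tuned model may drift from the original.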
RLHF matters a great deal if you want the model to behave professionally. The original Davinci model (the 2020 release of GPT-3) was "free" of such human-preference training, but it was very hard to convince to cooperate on any task.
But applying RLHF can also introduce political biases into the model, because the preferences being optimized are whatever the labelling team rewards.
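To see where that dependence enters, here is a minimal sketch (in PyTorch, with hypothetical names; not any particular lab's code) of the pairwise loss typically used to train the reward model on human comparisons. Every gradient step literally encodes which response a labeller preferred:

```python
import torch.nn.functional as F

def preference_loss(reward_model, prompt, chosen, rejected):
    """Pairwise ranking loss for a reward model trained on human comparisons.

    `chosen` is the response the human labeller ranked higher, `rejected` the
    one ranked lower, so the labelling team's judgments are the training signal.
    `reward_model` is assumed to map (prompt, response) to a scalar score tensor.
    """
    score_chosen = reward_model(prompt, chosen)
    score_rejected = reward_model(prompt, rejected)
    # Bradley-Terry style objective: push the preferred response's score above the other.
    return -F.logsigmoid(score_chosen - score_rejected).mean()
```

The RLHF stage then optimizes the language model against this reward model, so any systematic leaning in the labellers' choices propagates into the final model.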