Very likely RLHF, based only on how often strongly aligned open models reference a "policy" even when no such policy appears in the system prompt.
I would assume that priming the model with these tokens yields better completions, as mentioned above.