Why is it on the moral axis at all? I'd expect identifying and shaping the influence of unwanted emotion vectors to happen through data selection in pretraining, or through natural feedback loops during the RL phase, the same way we shape unwanted outputs in current models to make them practical and helpful.
And even if we applied these controls at inference time, I don't see the difference between doing that and finding a prompt that would accomplish the same steadiness on task; the latter is just more indirect.
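For concreteness, the inference-time version of that control is usually a steering-vector-style intervention: add (or subtract) a direction in the residual stream at some layer. A minimal numpy sketch, where the toy activations, the "emotion vector", and the `steer` helper are all hypothetical stand-ins rather than any real model's API:

```python
import numpy as np

# Toy stand-in for a transformer's residual stream at one layer.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 8))        # (tokens, d_model)
emotion_vector = rng.normal(size=8)     # hypothetical unwanted direction
emotion_vector /= np.linalg.norm(emotion_vector)

def steer(h, direction, coeff):
    """Shift activations along `direction` by `coeff` at inference time."""
    return h + coeff * direction

# Suppress the direction: push activations against it.
steered = steer(hidden, emotion_vector, coeff=-2.0)

# Each token's component along the unit vector drops by exactly |coeff|.
before = hidden @ emotion_vector
after = steered @ emotion_vector
print(np.allclose(after, before - 2.0))  # True
```

The point of the sketch is that this is a blunt, uniform shift; a prompt that elicits the same behavioral steadiness is reaching for the same region of activation space, just routed through the model's own input processing.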