Sure, but I’m curious if it would serve to provide some self-regulation.
E.g., with this whole “thinking” trend that’s happening: it would be interesting if the model did a first pass, scored its individual outputs, then reviewed those scores and censored/flagged the outputs that scored low.
I know it’s all “made up”, but generally I have a lot of success asking the model to give 0-1 ratings on confidence for its answers, especially for new niche questions that are likely out of the training set.
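To make the idea concrete, here is a rough sketch of that loop in Python. Everything in it is hypothetical: `answer_fn` and `confidence_fn` stand in for whatever model calls you would actually make (the second one being the "rate your confidence 0-1" prompt), and the 0.5 threshold is an arbitrary choice, not a recommended value.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ScoredAnswer:
    question: str
    answer: str
    confidence: float  # self-reported by the model, clamped to 0-1
    flagged: bool      # True when confidence is below the threshold


def self_review(
    questions: List[str],
    answer_fn: Callable[[str], str],             # hypothetical: first-pass model call
    confidence_fn: Callable[[str, str], float],  # hypothetical: "rate this 0-1" model call
    threshold: float = 0.5,                      # assumed cutoff, tune for your use case
) -> List[ScoredAnswer]:
    """First pass, self-score, then flag anything that scored low."""
    results = []
    for question in questions:
        answer = answer_fn(question)
        raw = confidence_fn(question, answer)
        # Self-reported scores are not guaranteed to stay in range, so clamp them.
        confidence = min(1.0, max(0.0, raw))
        results.append(
            ScoredAnswer(question, answer, confidence, confidence < threshold)
        )
    return results


# Stand-ins for real model calls, purely for demonstration.
def fake_answer(question: str) -> str:
    return f"answer to: {question}"


def fake_confidence(question: str, answer: str) -> float:
    return 0.2 if "niche" in question else 0.9


reviewed = self_review(
    ["common question", "new niche question"], fake_answer, fake_confidence
)
for item in reviewed:
    status = "FLAGGED" if item.flagged else "ok"
    print(f"{status}: {item.question} (confidence {item.confidence})")
```

The scores are still self-reported, so this only flags answers the model itself is unsure about; it does not verify anything.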