I think newer models are starting to show this behavior. I've seen GPT-3.5 175B correct itself mid-response with, almost literally:
> <answer with flaw here>
> Wait, that's not right, that <reason for flaw>.
> <correct answer here>.
Later models seem to have much more awareness of, or put more "weight" on, what they have already written while they are still generating the response.