What does this mean? If I have a person who is blind, deaf, and paralyzed, and who can only communicate through text, what would the signs be that they were self-aware?
Is this more of a feedback loop problem? If I let the LLM run in a loop and tell it it's talking to itself, would that be approaching "self-aware"? Something like the sketch below, roughly (the endpoint and model name are just placeholders, not any particular provider's API):
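```python
# Hypothetical self-talk loop: feed the model's last output back in as its next input.
# Endpoint, model name, and seed prompt are placeholders, not a specific vendor's API.
import requests

API_URL = "http://localhost:8000/v1/chat/completions"  # any OpenAI-compatible server
MODEL = "some-local-model"                              # placeholder

message = "You are talking to yourself. Continue the conversation."
for turn in range(5):
    resp = requests.post(API_URL, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": message}],
    })
    # The model's own output becomes its next input.
    message = resp.json()["choices"][0]["message"]["content"]
    print(f"--- turn {turn} ---\n{message}\n")
```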
(And by limitations I don’t mean “sorry, I’m not allowed to help you with this dangerous/contentious topic”.)
I think this behavior is being somewhat demonstrated in newer models. I've seen GPT-3.5 175B correct itself mid-response with, almost literally:
> <answer with flaw here>
> Wait, that's not right, that <reason for flaw>.
> <correct answer here>.
Later models seem to have much more awareness of, or "weight" towards, their own output while they are still generating it.
You won't get an LLM outputting "wait, that's not right" halfway through its original output (unless you prompted it in a way that would trigger such a speech pattern), because no re-evaluation takes place without further input.
No, that's one contiguous response from the LLM. I have screenshots, because I was so surprised the first time, and I've had it happen many times since. This was (as always with how I use LLMs) via direct API calls. The first time it happened was with the largest Llama 3.5. It usually only happens one-shot: no context, base/empty system prompt. Concretely, the calls looked roughly like this (endpoint and model name are placeholders here, not the exact ones I used):
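```python
# One-shot direct API call: no prior context, empty/base system prompt.
# Endpoint, key, and model name are placeholders for whatever OpenAI-compatible provider you use.
import requests

resp = requests.post(
    "https://api.example.com/v1/chat/completions",  # placeholder endpoint
    headers={"Authorization": "Bearer YOUR_KEY"},
    json={
        "model": "some-large-model",                # placeholder model name
        "messages": [
            {"role": "system", "content": ""},      # empty system prompt
            {"role": "user", "content": "<single question here>"},
        ],
    },
)
# One contiguous completion comes back; the mid-response correction appears inside it.
print(resp.json()["choices"][0]["message"]["content"])
```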
> LLMs don't exhibit such an inner feedback loop
That's not true at all. Next-token prediction is conditioned on all previous text, including the token that was just produced. Within a single response, the model uses what it has already said to decide what it says next, much as a Markov chain would. A toy illustration of that feedback is below: each emitted token is appended to the context before the next prediction (the transition table is obviously made up, and a real LLM conditions on the whole context rather than just the last token).
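```python
import random

# Toy "model": next-token probabilities keyed on the previous token only
# (a real LLM conditions on the ENTIRE context, not just the last token).
TRANSITIONS = {
    "<s>": [("the", 0.6), ("a", 0.4)],
    "the": [("cat", 0.5), ("dog", 0.5)],
    "a":   [("cat", 0.5), ("dog", 0.5)],
    "cat": [("sat", 0.7), ("ran", 0.3)],
    "dog": [("ran", 1.0)],
    "sat": [("</s>", 1.0)],
    "ran": [("</s>", 1.0)],
}

def next_token(context):
    # Here only context[-1] matters (Markov-chain style); an LLM would use all of `context`.
    tokens, weights = zip(*TRANSITIONS[context[-1]])
    return random.choices(tokens, weights=weights)[0]

context = ["<s>"]
while context[-1] != "</s>":
    # The token just produced becomes part of the input for the next prediction.
    context.append(next_token(context))
print(" ".join(context[1:-1]))
```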
Then there are those who are simply narcissistic, and cannot and will not admit fault regardless of the evidence presented to them.