Can LLMs accurately evaluate their own confidence?

(github.com)

2 points · anerli · 1y ago · 2 comments

2 comments · 1 top-level

anerli (OP) · 1y ago · 1 in thread

I ran a simple experiment to understand whether an LLM's self-rated answer confidence reflects the actual probability of the LLM generating that answer.

I've always been skeptical of prompting techniques that ask the LLM to output a numerical score or confidence level. The results from this experiment suggest that LLMs tend to understate their own confidence, and that "self-rated" scores may reflect what the LLM thinks is a "safe" answer more than an accurate representation of its certainty.
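For anyone who wants to try the same kind of comparison, here is a minimal sketch. It is not the code from the linked repository: it assumes you have already sampled the same prompt several times and separately asked the model for a self-rated confidence between 0 and 1. The function names and the sample data are hypothetical.

```python
from collections import Counter


def empirical_confidence(samples):
    """Return (most common answer, share of samples that gave that answer)."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / len(samples)


def confidence_gap(samples, self_rated):
    """Self-rated confidence minus the empirical frequency of the modal answer.

    A negative value means the model understated its confidence relative to
    how often it actually produced that answer.
    """
    _, freq = empirical_confidence(samples)
    return self_rated - freq


# Hypothetical data: ten sampled answers to one multiple-choice prompt,
# plus a self-rated confidence of 0.6 for the answer "B".
samples = ["B", "B", "B", "A", "B", "B", "B", "B", "C", "B"]
answer, freq = empirical_confidence(samples)
gap = confidence_gap(samples, self_rated=0.6)
print(answer, freq, gap)
```

The sampled frequency is only an estimate of the true generation probability, so it gets noisy with few samples. If the API exposes token log-probabilities, those could be used instead, but that is outside this sketch.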

I'm curious about this area because the startup I'm building does AI-powered E2E testing, and I'd like a more objective way to tell when a decision made by the agent is low-confidence so that it can be re-assessed.

hassleblad23 · 1y ago

I have noticed that asking an LLM to output a confidence score, along with the reason for assigning that score, works really well. Both are tangential to the actual task, but they still improve the quality.

I wouldn't depend on the numerical value of the confidence score itself though. There is no way for the LLM to calibrate its confidence score with respect to multiple invocations on different data. I have found this metric to be mostly useless.
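One way to check this on your own data is to bucket the stated scores and compare each bucket with how often the answers were actually right. This is a rough sketch, assuming you have ground-truth labels for a set of invocations; the `calibration_table` name and the records are made up for illustration.

```python
from collections import defaultdict


def calibration_table(records, n_bins=5):
    """Group (stated_confidence, was_correct) pairs into equal-width bins.

    stated_confidence is expected in [0, 1]. Returns a dict mapping
    bin index -> (mean stated confidence, observed accuracy, count).
    """
    bins = defaultdict(list)
    for conf, correct in records:
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, correct))

    table = {}
    for idx in sorted(bins):
        rows = bins[idx]
        mean_conf = sum(c for c, _ in rows) / len(rows)
        accuracy = sum(1 for _, ok in rows if ok) / len(rows)
        table[idx] = (mean_conf, accuracy, len(rows))
    return table


# Hypothetical records from five separate invocations on different inputs.
records = [(0.9, True), (0.9, False), (0.9, False), (0.5, True), (0.5, True)]
table = calibration_table(records)
for idx, (mean_conf, accuracy, count) in table.items():
    print(idx, round(mean_conf, 2), round(accuracy, 2), count)
```

If the scores were calibrated across invocations, mean stated confidence and observed accuracy would roughly match in every bin. Large gaps would support treating the number as a prompt device rather than a measurement.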

It works fine as a proxy to induce some thinking though.
