I didn't read the paper, but it seems they're trying to fix an ML model with another ML model. I'm not sure that's a good idea, but I digress. Besides, how do they know what counts as a hallucination and what doesn't (cf. the similar debate about disinformation)?