Detecting hallucinations in large language models using semantic entropy

The authors present semantic entropy as a way to detect confabulations, a class of hallucinations, in large language models without requiring architectural changes or task-specific training. They introduce semantic uncertainty, which measures uncertainty over the meanings of model generations rather than over exact token sequences. Semantic entropy is estimated by sampling several output sequences for a prompt, clustering them by meaning, and computing the entropy over the resulting meaning clusters. The authors show how this uncertainty signal detects confabulations in question-answering tasks and can improve accuracy, for example by declining to answer questions on which the model is likely to confabulate. Experiments cover a range of datasets and models, and results are compared with baselines such as embedding regression and the P(True) method.
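As a rough illustration of the sample-cluster-entropy procedure, the sketch below estimates a discrete semantic entropy from a set of sampled answers. The `entails` callback, the greedy clustering, and the equal-weight counting of samples are simplifying assumptions for illustration, not the paper's exact implementation; a real system would use a natural-language-inference model (with the question as context) to judge entailment in both directions.

```python
import math
from collections import Counter
from typing import Callable, List


def cluster_by_meaning(answers: List[str],
                       entails: Callable[[str, str], bool]) -> List[int]:
    """Greedily assign answers to clusters: two answers share a cluster
    only if each entails the other (bidirectional entailment)."""
    cluster_ids: List[int] = []
    representatives: List[str] = []
    for ans in answers:
        assigned = None
        for cid, rep in enumerate(representatives):
            if entails(ans, rep) and entails(rep, ans):
                assigned = cid
                break
        if assigned is None:
            assigned = len(representatives)
            representatives.append(ans)
        cluster_ids.append(assigned)
    return cluster_ids


def semantic_entropy(answers: List[str],
                     entails: Callable[[str, str], bool]) -> float:
    """Estimate entropy over meaning clusters, treating each sampled
    answer as equally probable (a simple Monte Carlo estimate)."""
    ids = cluster_by_meaning(answers, entails)
    counts = Counter(ids)
    n = len(answers)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical stand-in for an NLI-based entailment check.
    def naive_entails(a: str, b: str) -> bool:
        return a.strip().lower() == b.strip().lower()

    samples = ["Paris", "paris", "Lyon", "Paris", "Marseille"]
    print(semantic_entropy(samples, naive_entails))  # higher value = more semantic uncertainty
```

Low semantic entropy means the sampled generations agree in meaning even if their wording differs; high semantic entropy signals that the model is producing many distinct meanings, which is the signature of a likely confabulation.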

https://www.nature.com/articles/s41586-024-07421-0