LLMs cannot find reasoning errors, but can correct them

In this paper, the authors discuss self-correction in LLMs (Large Language Models) and its impact on the quality and style of their outputs. While self-correction has shown promise for improving LLM outputs, recent attempts to correct logical or reasoning errors have often made overall performance worse. To investigate this, the authors introduce BIG-Bench Mistake, a dataset of logical mistakes in Chain-of-Thought reasoning traces. Evaluating several state-of-the-art LLMs on this dataset, they find that the models generally struggle to find logical mistakes. The authors also propose a backtracking method as a lightweight alternative to reinforcement learning for output correction; it yields significant improvements when the model is given information about the mistake's location.
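A minimal sketch of what such location-guided backtracking might look like, assuming a hypothetical `generate_step` callable standing in for an LLM call (not the paper's actual API): steps after the known mistake are discarded and the offending step is resampled.

```python
from typing import Callable, List


def backtrack_and_resample(
    steps: List[str],
    mistake_index: int,
    generate_step: Callable[[List[str]], str],
    max_retries: int = 3,
) -> List[str]:
    """Truncate the trace at the first known mistake and regenerate from there."""
    # Keep only the steps before the mistake; these are assumed to be correct.
    trace = steps[:mistake_index]
    for _ in range(max_retries):
        candidate = generate_step(trace)
        # Accept the first non-empty alternative that differs from the bad step;
        # a real system would re-verify the new step before continuing the trace.
        if candidate.strip() and candidate != steps[mistake_index]:
            return trace + [candidate]
    # Fall back to the original trace if no alternative step was produced.
    return steps
```

The point of the sketch is only that, once the mistake location is known, correction reduces to resampling from that point, which is far cheaper than retraining or reinforcement-learning-based approaches.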

https://arxiv.org/abs/2311.08516
