In this study, the authors give a theoretical account of why Chain of Thought (CoT) prompting improves large language models on arithmetic and symbolic reasoning tasks. Viewing CoT through the lens of expressive power, they show that intermediate steps let decoder-only transformers carry out inherently serial computations that shallow transformers cannot perform in a single forward pass: with T steps of CoT, constant-depth transformers using constant-bit precision and O(log n) embedding size can solve any problem solvable by boolean circuits of size T, whereas without CoT such transformers are confined to highly parallel complexity classes such as AC0. Empirically, enabling CoT yields large accuracy gains on tasks that are hard to parallelize, such as permutation composition, iterated squaring, and the circuit value problem, especially for low-depth transformers.
https://arxiv.org/abs/2402.12875
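
To make the "inherently serial" intuition concrete, here is a minimal Python sketch (my own illustration, not code from the paper) of one of the tasks the authors study, permutation composition. Computing the full composition in one shot requires work that grows with the number of permutations T, but emitting each partial composition as an intermediate step reduces every step to a single cheap lookup, mirroring how T steps of CoT trade circuit depth for serial decoding length.

```python
from typing import List, Tuple

Perm = Tuple[int, ...]  # a permutation of {0, ..., n-1}, stored as a tuple

def compose(p: Perm, q: Perm) -> Perm:
    """Return p ∘ q, i.e. (p ∘ q)(i) = p(q(i))."""
    return tuple(p[q[i]] for i in range(len(q)))

def direct_answer(perms: List[Perm]) -> Perm:
    """'No CoT': produce only the final composition p_T ∘ ... ∘ p_1."""
    acc = tuple(range(len(perms[0])))  # identity permutation
    for p in perms:
        acc = compose(p, acc)
    return acc

def cot_transcript(perms: List[Perm]) -> List[Perm]:
    """'With CoT': emit every partial composition as an intermediate step.

    Each step depends only on the previously emitted step and one new
    permutation, so a shallow model can handle one step per decoding
    step across T steps of generation.
    """
    steps = []
    acc = tuple(range(len(perms[0])))
    for p in perms:
        acc = compose(p, acc)
        steps.append(acc)
    return steps

if __name__ == "__main__":
    # Three permutations of {0, 1, 2}.
    ps = [(1, 0, 2), (2, 0, 1), (0, 2, 1)]
    print("CoT steps:", cot_transcript(ps))   # one partial result per step
    print("final:   ", direct_answer(ps))     # matches the last CoT step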