The authors introduce Meta Chain-of-Thought (Meta-CoT), a framework that extends traditional Chain-of-Thought (CoT) by explicitly modeling the underlying reasoning needed to arrive at a particular chain. They present empirical evidence that state-of-the-art models exhibit behaviors consistent with in-context search, and they explore process supervision, synthetic data generation, and search algorithms as ways to produce Meta-CoT. They also outline a training pipeline that combines instruction tuning with reinforcement learning post-training. Open research questions include scaling laws, the role of verifiers, and the potential for discovering new reasoning algorithms. The authors position the work as a theoretical and practical roadmap toward enabling Meta-CoT in large language models.
https://arxiv.org/abs/2501.04682