DeepSeek-Prover: Advancing theorem proving in LLMs through large-scale synthetic data

Proof assistants like Lean have transformed mathematical proof verification by machine-checking every step of a proof. However, progress on formal theorem proving with large language models (LLMs) has been held back by a shortage of training data. To address this, a new approach generates a large corpus of Lean 4 proof data from high-school and undergraduate-level math competition problems: natural-language problems are translated into formal statements, and verified proofs are produced for them. Fine-tuning the DeepSeekMath 7B model on this synthetic dataset yields whole-proof generation accuracies of 46.3% (with 64 samples) and 52% (cumulatively) on the Lean 4 miniF2F test, surpassing the GPT-4 baseline. Notably, the model also proves 5 of the 148 problems in the Lean 4 FIMO benchmark, where GPT-4 proves none. The work demonstrates that large-scale synthetic data can substantially improve LLMs’ theorem-proving abilities and points to further advances in the field.
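
To make the pipeline concrete, below is a minimal sketch of what an autoformalized competition-style statement and its whole proof might look like in Lean 4; the problem, theorem name, and proof tactic are illustrative assumptions, not items from the paper’s dataset.

import Mathlib

-- Hypothetical autoformalization of the natural-language problem:
-- "A real number x satisfies x + 3 = 10. Show that x = 7."
theorem x_plus_three_eq_ten (x : ℝ) (h : x + 3 = 10) : x = 7 := by
  -- a single linear-arithmetic tactic closes the goal using hypothesis h
  linarith

Because Lean’s kernel checks every such proof, candidate proofs of this kind can be verified automatically before being used as training data.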

https://arxiv.org/abs/2405.14333