Kimi k1.5: Scaling Reinforcement Learning with LLMs

Kimi k1.5 is a multi-modal model from the Kimi Team, trained with reinforcement learning (RL) to improve reasoning across text and vision. In the short-CoT setting it is reported to outperform GPT-4o and Claude Sonnet 3.5 by up to 550% on benchmarks such as AIME, MATH-500, and LiveCodeBench, while its long-CoT performance is reported to match OpenAI’s o1. Key ingredients include scaling the RL context window to 128k tokens, improved policy optimization for long-CoT, and a simple RL framework that avoids more complex techniques such as Monte Carlo tree search, value functions, and process reward models. Interested users can test the model through the Kimi OpenPlatform.

https://github.com/MoonshotAI/Kimi-k1.5