Scaling up test-time compute with latent reasoning: A recurrent depth approach

The authors propose a language model architecture that scales up test-time compute by reasoning implicitly in latent space. Instead of spending extra compute by producing more chain-of-thought tokens, the model iterates a weight-shared recurrent block, which can be unrolled to arbitrary depth at test time. This approach requires no specialized training data, works with small context windows, and can capture kinds of reasoning that are hard to express in words. They scale a proof-of-concept model to 3.5 billion parameters and 800 billion training tokens, and the added test-time iterations substantially improve its performance on reasoning benchmarks. This unconventional method could reshape how language models allocate compute at inference time.
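To make the idea concrete, here is a minimal PyTorch sketch of a depth-recurrent forward pass: an embedding "prelude", a weight-shared core block iterated on a latent state, and a "coda" that decodes the final state into logits. All module names, dimensions, and the adapter layer are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class RecurrentDepthLM(nn.Module):
    """Toy sketch (not the paper's code): a prelude embeds tokens, a shared
    core block is iterated a variable number of times on a latent state, and
    a coda maps the final state to vocabulary logits."""

    def __init__(self, vocab_size=32000, d_model=512, n_heads=8):
        super().__init__()
        self.prelude = nn.Embedding(vocab_size, d_model)           # token -> embedding
        self.core = nn.TransformerEncoderLayer(d_model, n_heads,
                                               batch_first=True)   # weight-shared recurrent block
        self.adapter = nn.Linear(2 * d_model, d_model)             # merges input embedding with latent state
        self.coda = nn.Linear(d_model, vocab_size)                 # latent state -> logits

    def forward(self, tokens, num_iterations=8):
        e = self.prelude(tokens)                                   # fixed input embedding
        s = torch.randn_like(e)                                    # randomly initialized latent state
        for _ in range(num_iterations):                            # test-time compute knob:
            s = self.core(self.adapter(torch.cat([e, s], dim=-1))) # more loops = more latent "thinking"
        return self.coda(s)

# More iterations spend more compute on the same prompt without emitting extra tokens.
model = RecurrentDepthLM()
tokens = torch.randint(0, 32000, (1, 16))
logits_fast = model(tokens, num_iterations=4)
logits_slow = model(tokens, num_iterations=32)  # same weights, deeper unrolled reasoning
```

The key property this sketch illustrates is that test-time compute is a runtime argument rather than a consequence of generating longer outputs.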

https://arxiv.org/abs/2502.05171
