Zamba2-7B

Zyphra has introduced Zamba2-7B, a language model it claims surpasses Mistral, Google's Gemma, and Meta's Llama3 series in both quality and performance. Its efficiency makes it well suited to on-device deployment, consumer GPUs, and enterprise applications. The team behind Zamba2-7B includes experts from Sequoia Capital, MIT, Meta FAIR, and Cisco. Notably, Zamba2-7B offers improved inference efficiency: a faster time to first token, higher tokens per second, and reduced memory usage. With open-source model weights and a shared-attention architecture, Zamba2-7B positions itself as a frontrunner among small language models.

https://www.zyphra.com/post/zamba2-7b