Paving the way to efficient architectures: StripedHyena-7B

Together Research focuses on developing new architectures for efficient sequence modeling. The team recently introduced the StripedHyena models, including StripedHyena-Hessian-7B (SH 7B) and StripedHyena-Nous-7B (SH-N 7B). These models combine attention with gated convolutions and achieve performance competitive with the best open-source Transformers. Compared with Transformers, StripedHyena is faster and more memory efficient for both training and generation, and it outperforms them on long-context summarization tasks. Its architecture is the result of extensive research on scaling laws and hybridization, and the models also show improvements in scaling and fine-tuning.

StripedHyena-Nous-7B is a chat model designed specifically for fine-tuning on long-context tasks. Going forward, the researchers plan to keep pushing the boundaries of architecture design and to explore larger models with longer context and multi-modal support. They credit collaborations with academic institutions and AI companies for making this work possible.
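To make "gated convolutions" a little more concrete, here is a minimal single-channel sketch in Python. It is a generic, Hyena-inspired illustration under stated assumptions, not StripedHyena's implementation: the function names `causal_conv` and `gated_conv`, the scalar projections `w_value` and `w_gate`, and the short explicit filter `h` are all hypothetical. Real Hyena-style layers use multi-channel projections and long, implicitly parameterized filters, and StripedHyena combines such layers with attention.

```python
from typing import List, Sequence


def causal_conv(x: Sequence[float], h: Sequence[float]) -> List[float]:
    """Causal 1-D convolution: y[t] = sum_k h[k] * x[t - k], using only x[0..t]."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        # Only the current and past inputs contribute to output t.
        for k in range(min(len(h), t + 1)):
            acc += h[k] * x[t - k]
        y.append(acc)
    return y


def gated_conv(x: Sequence[float], h: Sequence[float],
               w_value: float, w_gate: float) -> List[float]:
    """Hypothetical single-channel gated convolution: gate * causal_conv(value)."""
    value = [w_value * xi for xi in x]  # hypothetical "value" projection of the input
    gate = [w_gate * xi for xi in x]    # hypothetical "gate" projection of the input
    conv = causal_conv(value, h)
    # Elementwise (multiplicative) gating of the convolution output.
    return [g * c for g, c in zip(gate, conv)]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    h = [0.5, 0.25]  # hypothetical short filter, chosen only for illustration
    print(causal_conv(x, h))                          # [0.5, 1.25, 2.0, 2.75]
    print(gated_conv(x, h, w_value=1.0, w_gate=2.0))  # [1.0, 5.0, 12.0, 22.0]
```

Because each output depends only on the current and earlier inputs, a block like this can run autoregressively; the multiplicative gate is what separates it from a plain convolution.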

https://www.together.ai/blog/stripedhyena-7b
