JetMoE: Reaching LLaMA2 performance with 0.1M dollars

JetMoE-8B, created by Yikang Shen, Zhen Guo, Tianle Cai, and Zengyi Qin, surpasses Meta AI's LLaMA2-7B despite costing less than $0.1 million to train, suggesting that LLM training can be far more affordable than commonly believed. With only 2.2B parameters active during inference, the model also outperforms Gemma-2B. Its architecture consists of 24 blocks, each containing two sparsely gated Mixture-of-Experts (MoE) layers: a mixture of attention heads and a mixture of MLP experts. Surprisingly, JetMoE-8B even outperforms models with comparable or larger computational budgets, such as LLaMA-13B and DeepSeekMoE-16B. The model is open-sourced, academia-friendly, and can be fine-tuned on consumer-grade GPUs, making it accessible to many labs. Collaboration inquiries can be directed to Zengyi Qin.
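
To illustrate how a sparsely gated MoE layer keeps the active parameter count low, here is a minimal PyTorch sketch of top-2 routing over a pool of MLP experts. This is not JetMoE's actual implementation; the class name, layer sizes, and expert count are illustrative assumptions chosen only to show the routing mechanism (each token runs through just 2 of 8 experts).

```python
# Minimal sketch of a top-2 gated MoE MLP layer (illustrative, not JetMoE's code).
# Only the top-k experts selected by the router run for each token, so a small
# fraction of the layer's parameters is active per forward pass.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoEMLP(nn.Module):
    def __init__(self, d_model=1024, d_hidden=2816, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Router scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        ])

    def forward(self, x):                        # x: (num_tokens, d_model)
        scores = self.router(x)                  # (num_tokens, num_experts)
        top_vals, top_idx = scores.topk(self.top_k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)    # renormalize over selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):           # only the chosen experts execute
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# With 8 experts and 2 active per token, roughly a quarter of the MLP
# parameters participate in any single token's computation.
moe = TopKMoEMLP()
tokens = torch.randn(16, 1024)
print(moe(tokens).shape)  # torch.Size([16, 1024])
```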

https://research.myshell.ai/jetmoe