Llama 3 implemented in pure NumPy

The Llama 3 model, recently released by Meta, is drawing attention for its scale and performance. Its overall architecture is largely unchanged from its predecessor, Llama 2: it normalizes activations with RMSNorm rather than traditional mini-batch based methods such as Batch Normalization, encodes positions with RoPE (rotary position embeddings), and uses Grouped-Query Attention (GQA) to keep attention efficient. Because the structure is compact, implementing it in pure NumPy makes the model's inner workings easier to follow, and the detailed design parameters and calculations give a comprehensive overview of how each component fits together.
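As a rough illustration of the RMSNorm idea (a minimal sketch, not the repository's exact code), the layer can be written in a few lines of NumPy; the function name, epsilon value, and tensor shapes below are illustrative assumptions:

```python
import numpy as np

def rms_norm(x: np.ndarray, weight: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # Scale each activation vector by the reciprocal of its root-mean-square,
    # then apply a learned per-dimension gain. Unlike batch normalization,
    # there is no mean subtraction and no dependence on batch statistics.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return x / rms * weight

# Example: normalize 4 token embeddings of dimension 8 (shapes are arbitrary here).
x = np.random.randn(4, 8).astype(np.float32)
weight = np.ones(8, dtype=np.float32)
print(rms_norm(x, weight).shape)  # (4, 8)
```

Normalizing per token rather than per batch is what lets the same code run identically at training and inference time, which is part of why the NumPy version stays so simple.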

https://docs.likejazz.com/llama3.np/
