MaxText: A simple, performant and scalable JAX LLM

MaxText is an open-source LLM written in pure Python/JAX and designed for Google Cloud TPUs and GPUs. It delivers high performance and scalability while staying simple, relying on the power of JAX and the XLA compiler rather than hand-tuned optimizations. Users are encouraged to start by experimenting with MaxText and then to fork and modify it for their own needs. Notably, MaxText demonstrates strong training results and can scale training to ~51K chips, and it supports models such as Llama 2 and Mistral. It differs from similar implementations like minGPT/nanoGPT and Megatron-LM in that it is pure Python and leans on the XLA compiler for performance instead of hand-written kernels. Key features include stack trace collection for debugging, ahead-of-time compilation for faster startup on the target hardware, and automatic log uploads to Vertex AI TensorBoard, underscoring its emphasis on user-friendliness and adaptability.
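The ahead-of-time compilation mentioned above builds on JAX's standard lower-and-compile workflow. The sketch below is a minimal, generic illustration of that workflow, not MaxText's actual entry point; the train_step function, shapes, and names are placeholders.

```python
import jax
import jax.numpy as jnp

def train_step(params, batch):
    # Toy forward pass standing in for a real model training step.
    return jnp.mean((batch @ params) ** 2)

# Ahead-of-time path: lower and compile against abstract shapes/dtypes
# before any real data exists, so the first real step pays no compile cost.
params_spec = jax.ShapeDtypeStruct((1024, 1024), jnp.float32)
batch_spec = jax.ShapeDtypeStruct((32, 1024), jnp.float32)

lowered = jax.jit(train_step).lower(params_spec, batch_spec)
compiled = lowered.compile()

# Later, call the compiled executable with concrete arrays of matching shapes.
params = jnp.ones((1024, 1024), jnp.float32)
batch = jnp.ones((32, 1024), jnp.float32)
print(compiled(params, batch))
```

In a multi-host training job, compiling ahead of time like this also surfaces shape or sharding errors before expensive accelerators are provisioned.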

https://github.com/google/maxtext
