RecurrentGemma is a family of open-weights language models from Google DeepMind, based on the Griffin architecture, which offers fast inference when generating long sequences. The repository provides the model implementation together with sampling and fine-tuning examples; Flax is the recommended, optimized implementation, and an un-optimized PyTorch version is also available. The technical report gives training and evaluation details, while the Griffin paper describes the model architecture. Dependencies can be installed with Poetry or pip, and model checkpoints are available on Kaggle. Unit tests, example scripts, and Colab tutorials are provided. RecurrentGemma runs on CPU, GPU, or TPU, with optimized TPU support. Contributions and bug reports are welcome. Note: this is not an official Google product.
https://github.com/google-deepmind/recurrentgemma
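The "fast inference for long sequences" claim comes from Griffin's recurrent design: instead of attending over an ever-growing context, a recurrent layer carries a fixed-size state forward one token at a time. The sketch below is purely illustrative (it is not the actual Griffin layer, whose gated recurrence is more elaborate); it shows a minimal linear recurrence h_t = a_t * h_{t-1} + x_t, where per-token cost and memory stay constant regardless of sequence length.

```python
def linear_recurrence(xs, gates):
    """Run h_t = gates[t] * h_{t-1} + xs[t] with h_{-1} = 0.

    The state `h` is a single fixed-size value, so each step costs O(1)
    no matter how long the sequence already is -- the property that makes
    recurrent models cheap for long-sequence generation.
    """
    h = 0.0
    states = []
    for x, a in zip(xs, gates):
        h = a * h + x  # decay the old state by the gate, then add new input
        states.append(h)
    return states


states = linear_recurrence([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
# states -> [1.0, 2.5, 4.25]
```

In the real model this scalar is replaced by a learned, gated vector state per layer, but the constant-cost-per-token structure is the same.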