llama2.c: Inference Llama 2 in one file of pure C

llama2.c lets you run inference on a baby Llama 2 model in pure C. You train the Llama 2 LLM architecture in PyTorch, save the weights to a binary file, and then load that file into a single simple C file (_run.c_) for inference. The code currently supports fp32 inference and is reported to perform well on various systems. The author stresses that this is a weekend project, not a production-grade library. The repository offers a model checkpoint for download and suggests compile flags for better performance, and the author mentions uploading a bigger checkpoint for more powerful inference. The README also includes example output from the model, sample stories, usage instructions, and notes on possible improvements and future work.
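The "save weights to a binary file, then load them in C" step can be illustrated with a minimal sketch. It assumes a hypothetical layout (a small integer header followed by raw fp32 values) and hypothetical names (`SketchHeader`, `sketch_save`, `sketch_load`); it is not the actual checkpoint format or API of llama2.c. The save function stands in for the export that the project performs from PyTorch.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical header, NOT the real llama2.c checkpoint layout. */
typedef struct {
    int dim;      /* model width (assumed field) */
    int n_layers; /* number of layers (assumed field) */
} SketchHeader;

/* Assumption for this sketch: one dim x dim fp32 matrix per layer. */
static size_t sketch_count(const SketchHeader *h) {
    if (h->dim <= 0 || h->n_layers <= 0) return 0;
    return (size_t)h->dim * (size_t)h->dim * (size_t)h->n_layers;
}

/* Write the header followed by the raw fp32 weights. Returns 0 on success. */
int sketch_save(const char *path, const SketchHeader *h, const float *w) {
    size_t n = sketch_count(h);
    if (n == 0) return -1;
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    int ok = fwrite(h, sizeof *h, 1, f) == 1 &&
             fwrite(w, sizeof *w, n, f) == n;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Read the header, then allocate and read the weights it implies.
   Returns a malloc'd array the caller must free, or NULL on failure. */
float *sketch_load(const char *path, SketchHeader *h) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    float *w = NULL;
    if (fread(h, sizeof *h, 1, f) == 1) {
        size_t n = sketch_count(h);
        if (n > 0) w = malloc(n * sizeof *w);
        if (w && fread(w, sizeof *w, n, f) != n) {
            free(w);
            w = NULL;
        }
    }
    fclose(f);
    return w;
}
```

Calling `sketch_save` with a filled-in header and weight array, then `sketch_load` on the same path, returns the same values. A flat file of this kind is readable with nothing beyond the C standard library, which fits the single-file approach described above. The sketch also assumes the writer and reader share the same endianness and `int` size.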

https://github.com/karpathy/llama2.c