Reproducing GPT-2 in llm.c

For about $20, you can reproduce the GPT-2 (124M) model in llm.c in roughly 90 minutes on a single 8x A100 80GB SXM node; a single GPU also works, it just takes proportionally longer. llm.c achieves high model flops utilization, which keeps the cost of the run low, and ongoing optimizations and further tuning of the training setup could improve it significantly. The model, trained on 10 billion tokens of FineWeb, outperforms the OpenAI GPT-2 (124M) checkpoint on the FineWeb validation set, although the difference in training-data distributions makes that comparison imperfect. HellaSwag accuracy is used as a benchmark against GPT-2 and GPT-3 models of the same size, and the run surpasses the GPT-2 (124M) checkpoint despite training on far fewer tokens. The linked discussion details the full reproduction process, along with visualizations of the training curves and sampling from the model for text generation.

https://github.com/karpathy/llm.c/discussions/481
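
For reference, the reproduction steps from that discussion look roughly like the sketch below. This is not a verbatim copy of the commands: the data-prep flag (--version 10B) and the training flags shown are recalled from the llm.c README and may have changed, so treat the exact invocation in the linked post as authoritative.

    # clone the repo and prepare ~10B tokens of FineWeb as .bin shards
    git clone https://github.com/karpathy/llm.c.git
    cd llm.c
    python dev/data/fineweb.py --version 10B   # flag assumed; check dev/data/ in the repo

    # build the CUDA training binary (cuDNN enables the flash-attention path)
    make train_gpt2cu USE_CUDNN=1

    # train on all 8 GPUs of the node via MPI; -e "d12" selects the 12-layer
    # (124M-parameter) GPT-2 config, and the remaining hyperparameter flags
    # (batch size, learning rate, warmup, etc.) are listed verbatim in the post
    mpirun -np 8 ./train_gpt2cu \
        -i "dev/data/fineweb10B/fineweb_train_*.bin" \
        -j "dev/data/fineweb10B/fineweb_val_*.bin" \
        -o log124M \
        -e "d12"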
