Five Years of LLM Progress

In this article, the author traces the evolution of the generative pre-trained transformer (GPT) line of work, focusing on the state-of-the-art (SOTA) models and how they differ from one another. While plenty of articles summarize these papers individually, the author notes that none focuses explicitly on the differences between them. The article walks through each model in turn, including GPT, GPT-2, GPT-3, Jurassic-1, Megatron-Turing NLG, Gopher, and Chinchilla, covering the computational aspects and training techniques used for each. Notably, the author finds it surprising how little detail some of the papers provide about their engineering and training processes.

https://finbarr.ca/five-years-of-gpt-progress/