Building an LLM from Scratch: Automatic Differentiation

The author is building a language model from scratch and has mapped out a tech tree of the steps involved. The first step is learning how to differentiate with a computer, specifically via the backpropagation algorithm. The post introduces a custom Tensor class supporting operations like addition, subtraction, and multiplication, demonstrates their usage, and reviews scalar derivatives. It then shows how to store each Tensor's arguments and local derivatives, how the chain rule gives derivatives of nested functions, and how representing functions as graphs simplifies derivative calculations.
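The ideas summarized above — a Tensor that records its arguments and local derivatives, then applies the chain rule backwards through the resulting graph — can be sketched as follows. This is an illustrative assumption of how such a class might look, not the author's actual code; the names `args`, `local_grads`, and `backward` are invented for this sketch.

```python
class Tensor:
    """A scalar value that remembers how it was computed."""

    def __init__(self, value, args=(), local_grads=()):
        self.value = value
        self.args = args                # parent Tensors this one was built from
        self.local_grads = local_grads  # d(self)/d(arg) for each parent
        self.grad = 0.0                 # accumulated derivative of the output w.r.t. self

    def __add__(self, other):
        # d(a+b)/da = 1, d(a+b)/db = 1
        return Tensor(self.value + other.value, (self, other), (1.0, 1.0))

    def __sub__(self, other):
        # d(a-b)/da = 1, d(a-b)/db = -1
        return Tensor(self.value - other.value, (self, other), (1.0, -1.0))

    def __mul__(self, other):
        # d(a*b)/da = b, d(a*b)/db = a
        return Tensor(self.value * other.value, (self, other), (other.value, self.value))

    def backward(self, upstream=1.0):
        # Chain rule: each parent receives upstream * (local derivative),
        # and contributions from different paths through the graph accumulate.
        self.grad += upstream
        for arg, local in zip(self.args, self.local_grads):
            arg.backward(upstream * local)


x = Tensor(2.0)
y = Tensor(3.0)
z = x * y + x          # z = x*y + x, so dz/dx = y + 1 and dz/dy = x
z.backward()
print(x.grad, y.grad)  # → 4.0 2.0
```

Note that `x` feeds into `z` along two paths (through the product and directly), and the `+=` accumulation in `backward` is what makes both contributions add up correctly.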

https://bclarkson-code.github.io/posts/llm-from-scratch-scalar-autograd/post.html
