In this blog post, the author works through an end-to-end example of the math inside a transformer model, explaining how the model works by simplifying the calculations and shrinking the model's dimensions. The author recommends "The Illustrated Transformer" blog for a more intuitive explanation of the architecture. The post covers embedding the input text, positional encoding, self-attention, and multi-head attention, and also provides code to scale the toy example up to a full-sized model. The encoder's goal is to produce a rich embedding representation of the input text, which is then passed to the decoder to generate the output text.
https://osanseviero.github.io/hackerllama/blog/posts/random_transformer/
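As a rough illustration of the self-attention step the post walks through, here is a minimal NumPy sketch of single-head scaled dot-product attention. The tiny sizes, random weights, and variable names below are illustrative assumptions, not the blog's actual values:

```python
import numpy as np

# Toy sizes (illustrative only, not the blog's actual numbers)
seq_len, d_model, d_k = 3, 4, 4

rng = np.random.default_rng(0)
x = rng.normal(size=(seq_len, d_model))  # embedded + positionally encoded input tokens

# Projection matrices, randomly initialized here for illustration
W_q = rng.normal(size=(d_model, d_k))
W_k = rng.normal(size=(d_model, d_k))
W_v = rng.normal(size=(d_model, d_k))

Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
scores = Q @ K.T / np.sqrt(d_k)
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
output = weights @ V  # one attention head's output

print(output.shape)  # (3, 4): one enriched vector per input token
```

Multi-head attention, as covered in the post, repeats this computation with several independent sets of projections and concatenates the per-head outputs before a final linear projection.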