I made a transformer by hand (no training)

In this post, the author sets out to explain transformers and attention by building a transformer by hand, with no training, to predict a simple sequence. The post outlines the process step by step: picking a task, then designing the model dimensions and the embedding weights. It includes a refresher on matrix math for readers who need one, and it covers the design of the transformer block and the attention head in detail. The author hopes that working through the process will give readers a better understanding of transformers.
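To give a flavor of what choosing attention weights by hand can look like, here is a minimal sketch of a single causal attention head in plain Python. The token sequence, the one-hot embeddings, and the matrices `w_q`, `w_k`, and `w_v` are illustrative assumptions made for this example; they are not the weights or the task used in the linked post.

```python
import math

def matmul(a, b):
    # Plain list-of-lists matrix product: (n x m) @ (m x p) -> (n x p).
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def softmax(row):
    # Numerically stable softmax over one row; -inf entries become 0.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def causal_attention(x, w_q, w_k, w_v):
    # Project the inputs to queries, keys, and values.
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)
    weights = []
    for i, row in enumerate(scores):
        # Causal mask: position i may only look at positions j <= i.
        masked = [s / scale if j <= i else float("-inf") for j, s in enumerate(row)]
        weights.append(softmax(masked))
    return weights, matmul(weights, v)

# Hypothetical toy setup: the sequence "a b a" with one-hot embeddings.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]

# Hand-picked weights (an assumption for illustration): large diagonal
# query/key matrices make each token attend to earlier copies of itself.
w_q = [[10.0, 0.0], [0.0, 10.0]]
w_k = [[10.0, 0.0], [0.0, 10.0]]
w_v = [[1.0, 0.0], [0.0, 1.0]]

weights, out = causal_attention(x, w_q, w_k, w_v)
for row in weights:
    print([round(w, 3) for w in row])
```

With these hand-set matrices, the final "a" splits its attention between the two "a" positions and all but ignores the "b", which shows how behavior can be designed into the weights directly rather than learned.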

https://vgel.me/posts/handmade-transformer/
