In this article, the authors discuss a phenomenon called “grokking,” in which a model abruptly transitions from memorizing its training data to generalizing to unseen inputs. They examine the training dynamics of a small model and reverse-engineer the solution it finds, offering a case study in mechanistic interpretability. Experiments on tasks such as modular addition and predicting from sequences of 1s and 0s illustrate how models move from memorization to generalization. The authors note that grokking depends on specific model constraints and hyperparameters (such as weight decay), and they explore open questions about memorization, generalization, and larger models. They advocate training simpler models to better understand larger ones.
https://pair.withgoogle.com/explorables/grokking/
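The following is a minimal sketch of the kind of experiment the article describes: a small PyTorch MLP trained on modular addition with heavy weight decay, where train accuracy typically saturates long before test accuracy jumps. The modulus, layer sizes, split fraction, and hyperparameters here are illustrative assumptions, not the authors' exact setup.

```python
# Hypothetical reproduction sketch of a grokking-style setup (not the article's
# exact configuration): a one-hidden-layer MLP on (a + b) mod P.
import torch
import torch.nn as nn
import torch.nn.functional as F

P = 67                    # modulus (assumed for illustration)
EMBED, HIDDEN = 24, 128   # illustrative sizes, not tuned

# All (a, b) pairs and their labels (a + b) mod P.
pairs = torch.tensor([(a, b) for a in range(P) for b in range(P)])
labels = (pairs[:, 0] + pairs[:, 1]) % P

# Small random train split; the held-out pairs track generalization.
perm = torch.randperm(len(pairs))
n_train = int(0.3 * len(pairs))
train_idx, test_idx = perm[:n_train], perm[n_train:]

class MLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(P, EMBED)
        self.hidden = nn.Linear(2 * EMBED, HIDDEN)
        self.out = nn.Linear(HIDDEN, P)

    def forward(self, x):
        e = self.embed(x).flatten(1)  # concatenate the two token embeddings
        return self.out(F.relu(self.hidden(e)))

model = MLP()
# Weight decay is one of the hyperparameters the article points to as
# important for whether grokking occurs; the value here is a guess.
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)

def accuracy(idx):
    with torch.no_grad():
        return (model(pairs[idx]).argmax(-1) == labels[idx]).float().mean().item()

for step in range(20_000):
    opt.zero_grad()
    loss = F.cross_entropy(model(pairs[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        print(f"step {step:6d}  train acc {accuracy(train_idx):.2f}  "
              f"test acc {accuracy(test_idx):.2f}")
```

Watching the printed accuracies over many steps is the point of the exercise: under these kinds of settings, the gap between train and test accuracy can persist for a long stretch before test accuracy rises sharply.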