A Mechanistic Interpretability Analysis of Grokking

OpenAI researchers discovered grokking, a phenomenon in which models trained on small algorithmic tasks such as modular addition initially memorize the training data but, after much longer training, suddenly learn to generalize to unseen data. A mechanistic interpretability analysis found that grokking is closely related to phase changes, a general phenomenon that occurs when training models. Grokking can be induced by training a model on a problem that exhibits phase changes while limiting the training data and adding regularization. Understanding grokking therefore comes down to understanding phase changes, and reverse engineering models is key to understanding them.
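
To make the "limiting data" part of that recipe concrete, here is a minimal sketch of how a modular addition dataset might be built and split so that the model sees only a fraction of all possible inputs. The function name `modular_addition_split`, the modulus `p=113`, the 30% training fraction, and the seed are illustrative assumptions, not values taken from the text above.

```python
import random


def modular_addition_split(p=113, train_fraction=0.3, seed=0):
    """Enumerate every (a, b, (a + b) % p) example and split it into train/test.

    p, train_fraction, and seed are hypothetical defaults chosen for illustration.
    """
    # The full task has exactly p * p distinct input pairs.
    examples = [(a, b, (a + b) % p) for a in range(p) for b in range(p)]

    # Shuffle with a fixed seed so the split is reproducible.
    rng = random.Random(seed)
    rng.shuffle(examples)

    # Keep only a fraction for training; the rest is held out as unseen data.
    n_train = int(train_fraction * len(examples))
    return examples[:n_train], examples[n_train:]


train, test = modular_addition_split()
print(len(train), len(test))
```

The held-out examples play the role of the "unseen data" on which generalization would be measured. The regularization step (for instance weight decay in the optimizer, which is an assumption here since the text does not name a specific method) would be applied during training and is not shown.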


