A Mechanistic Interpretability Analysis of Grokking

OpenAI researchers discovered grokking, a phenomenon where models trained on small algorithmic tasks (such as modular addition) initially memorize the training data, then, after much longer training, suddenly learn to generalize to unseen data. The linked analysis argues that grokking is deeply related to phase changes, a general phenomenon in model training where a capability emerges abruptly rather than gradually. Grokking can be induced by training a model on a problem that exhibits a phase change while limiting the training data and adding regularization. On this view, understanding grokking reduces to understanding phase changes, and reverse engineering trained models is key to understanding those.
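
A minimal sketch of that recipe follows, assuming PyTorch. The linked post reverse engineers a one-layer transformer; a small MLP over learned embeddings stands in here for brevity, and every hyperparameter (the modulus, train fraction, width, weight decay, step count) is illustrative rather than taken from the post.

    # Hypothetical grokking setup: modular addition, limited data, weight decay.
    # Not the post's exact model or hyperparameters.
    import torch
    import torch.nn as nn

    P = 113  # modulus for the task (a + b) mod P
    torch.manual_seed(0)

    # Full dataset: every pair (a, b) with label (a + b) % P.
    pairs = torch.cartesian_prod(torch.arange(P), torch.arange(P))
    labels = (pairs[:, 0] + pairs[:, 1]) % P

    # Limit the data: train on ~30% of all pairs, hold out the rest.
    perm = torch.randperm(len(pairs))
    n_train = int(0.3 * len(pairs))
    train_idx, test_idx = perm[:n_train], perm[n_train:]

    class ModAddMLP(nn.Module):
        def __init__(self, p, d=128):
            super().__init__()
            self.embed = nn.Embedding(p, d)
            self.net = nn.Sequential(
                nn.Linear(2 * d, 256), nn.ReLU(), nn.Linear(256, p)
            )

        def forward(self, ab):
            e = self.embed(ab)             # (batch, 2, d)
            return self.net(e.flatten(1))  # (batch, p) logits

    model = ModAddMLP(P)
    # Regularization (here weight decay) is the other ingredient for inducing grokking.
    opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
    loss_fn = nn.CrossEntropyLoss()

    def accuracy(idx):
        with torch.no_grad():
            return (model(pairs[idx]).argmax(-1) == labels[idx]).float().mean().item()

    # Full-batch training for many steps; grokking shows up long after train
    # accuracy saturates, as a sudden jump in test accuracy.
    for step in range(50_000):
        opt.zero_grad()
        loss_fn(model(pairs[train_idx]), labels[train_idx]).backward()
        opt.step()
        if step % 1_000 == 0:
            print(f"step {step:6d}  train {accuracy(train_idx):.3f}  test {accuracy(test_idx):.3f}")

With a setup like this, one would expect train accuracy to reach 100% early while test accuracy lingers near chance for a long stretch before jumping, which is the signature the post analyzes.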

https://www.alignmentforum.org/posts/N6WM6hs7RQMKDhYjB/a-mechanistic-interpretability-analysis-of-grokking
