This paper examines superposition in artificial neural networks: the phenomenon in which a model represents more features than it has dimensions by tolerating interference between them. Using small toy models, the authors show that superposition acts as a form of compression, at the cost that later layers must nonlinearly filter out the resulting interference. They also explore how superposition gives rise to phase changes, organizes features into geometric structures, and may in some cases perform computation. By demonstrating that superposition occurs even in simple trained models, the paper argues that it has direct consequences for interpretability research and offers a distinctive perspective on how linear representations are structured in neural networks.
https://transformer-circuits.pub/2022/toy_model/index.html#motivation
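As a rough illustration of the kind of toy model studied in the paper, here is a minimal sketch of the ReLU-output setup: sparse features x are linearly compressed to a lower-dimensional hidden state h = Wx and reconstructed as ReLU(Wᵀh + b). This is not the authors' code; the specific hyperparameters (feature count, hidden size, sparsity level, importance schedule) and the PyTorch training loop are illustrative assumptions.

```python
# Minimal sketch (not the paper's exact code) of a ReLU-output toy model:
# sparse features x in R^n are compressed to h = W x in R^m (m < n),
# then reconstructed as x_hat = ReLU(W^T h + b). With sufficiently sparse
# inputs, the learned W tends to pack more than m features into the
# m-dimensional hidden space, i.e. superposition.
import torch

n_features, n_hidden = 20, 5          # illustrative sizes, not the paper's
sparsity = 0.95                        # probability that each feature is zero
importance = 0.9 ** torch.arange(n_features)  # per-feature importance weights

W = torch.nn.Parameter(torch.randn(n_hidden, n_features) * 0.1)
b = torch.nn.Parameter(torch.zeros(n_features))
opt = torch.optim.Adam([W, b], lr=1e-3)

for step in range(10_000):
    # Synthetic sparse data: each feature is active (uniform in [0, 1])
    # independently with probability 1 - sparsity.
    x = torch.rand(1024, n_features)
    x = x * (torch.rand(1024, n_features) > sparsity)

    h = x @ W.T                        # compress: R^n -> R^m
    x_hat = torch.relu(h @ W + b)      # nonlinear reconstruction
    loss = (importance * (x - x_hat) ** 2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()

# Columns of W are the learned feature directions; with high sparsity,
# the off-diagonal entries of W^T W show the interference between
# features sharing the same hidden dimensions.
print((W.T @ W).detach())
```

In this sketch, lowering `sparsity` should push the model toward storing only the most important features in dedicated directions, while raising it should encourage packing many features into shared, slightly interfering directions, mirroring the phase changes discussed in the paper.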