In this paper, the authors address limitations in the computational power and generalization ability of transformer models. They argue that transformers are not Turing complete and propose a new architecture, Find+Replace transformers, that is. The authors empirically demonstrate that Find+Replace transformers outperform GPT-4 on challenging tasks, which they present as evidence of stronger generalization. They also highlight that arbitrary programs can be compiled into Find+Replace transformers, which they argue is useful for interpretability research. With this work, the authors aim to establish a theoretical foundation for multi-transformer architectures and encourage further exploration in this area.
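The paper itself does not include the following; as a rough illustration only, here is a minimal Python sketch of the kind of iterative find/replace rewriting loop the architecture's name suggests, where `find_model` and `replace_model` are hypothetical stand-ins for small transformers. The point of the sketch is that unbounded iterative rewriting of a working sequence (as in a string-rewriting system) is the sort of mechanism that can yield Turing completeness, unlike a single fixed-depth forward pass.

```python
# Hypothetical sketch: iterative find/replace rewriting over a token sequence.
# `find_model` and `replace_model` stand in for small transformers; here they
# are plain callables so the control flow is runnable.

from typing import Callable, List, Optional, Tuple

Span = Tuple[int, int]  # (start, end) indices into the token sequence


def find_replace_loop(
    tokens: List[str],
    find_model: Callable[[List[str]], Optional[Span]],
    replace_model: Callable[[List[str]], List[str]],
    max_steps: int = 100,
) -> List[str]:
    """Repeatedly rewrite the working sequence until no span is selected."""
    for _ in range(max_steps):
        span = find_model(tokens)      # locate a span to rewrite
        if span is None:               # fixed point reached: halt
            break
        start, end = span
        replacement = replace_model(tokens[start:end])
        tokens = tokens[:start] + replacement + tokens[end:]
    return tokens


if __name__ == "__main__":
    # Toy usage: rewrite every adjacent pair "a a" into "b" until none remain.
    def toy_find(ts: List[str]) -> Optional[Span]:
        for i in range(len(ts) - 1):
            if ts[i] == "a" and ts[i + 1] == "a":
                return (i, i + 2)
        return None

    print(find_replace_loop(["a", "a", "a", "a", "c"], toy_find, lambda _: ["b"]))
    # -> ['b', 'b', 'c']
```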
https://openreview.net/forum?id=MGWsPGogLH