Turing Complete Transformers: Two Transformers Are More Powerful Than One

In this paper, the authors address limitations in the computational power and generalization of transformer models. They argue that individual transformers are not Turing complete and propose a new architecture, Find+Replace transformers, that is. They empirically demonstrate that Find+Replace transformers outperform GPT-4 on challenging tasks, showing stronger generalization. They also highlight the ability to compile arbitrary programs into Find+Replace transformers as a tool for interpretability research. With this work, the authors aim to lay a theoretical foundation for multi-transformer architectures and to encourage further exploration in this direction.
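To see why an architecture of this shape can be Turing complete, it helps to view it as a string-rewriting system (and string rewriting is known to be Turing complete). Below is a minimal, hypothetical sketch of the control loop only: `find` and `replace` are placeholder callables standing in for the paper's transformer sub-models, and the names, halting convention, and step limit here are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: Find+Replace as iterated string rewriting.
# `find` and `replace` are stand-ins for trained transformers; here
# they are plain Python callables so the loop structure is runnable.

from typing import Callable, Optional, Tuple

def find_replace_loop(
    tape: str,
    find: Callable[[str], Optional[Tuple[int, int]]],  # locate a span to rewrite
    replace: Callable[[str], str],                     # produce its replacement
    max_steps: int = 1000,                             # illustrative safety bound
) -> str:
    """Repeatedly rewrite `tape` until `find` reports no match (halt)."""
    for _ in range(max_steps):
        span = find(tape)
        if span is None:      # no rule applies: the computation halts
            return tape
        start, end = span
        tape = tape[:start] + replace(tape[start:end]) + tape[end:]
    raise RuntimeError("did not halt within max_steps")

# Toy usage: unary addition, rewriting "111+11" -> "11111"
# by repeatedly deleting the "+" separator.
def find_plus(s: str) -> Optional[Tuple[int, int]]:
    i = s.find("+")
    return (i, i + 1) if i >= 0 else None

if __name__ == "__main__":
    print(find_replace_loop("111+11", find_plus, lambda span: ""))  # "11111"
```

The key design point is the unbounded outer loop: a single transformer forward pass performs a fixed amount of computation, whereas iterating find-and-replace steps until a halting condition is what lets the system simulate arbitrary programs.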

https://openreview.net/forum?id=MGWsPGogLH
