This study introduces three new attention mechanisms aimed at improving the efficiency and performance of Transformer models. The first, Optimised Attention, reduces the number of parameters and matrix multiplications in the attention layer while maintaining performance. The second, Efficient Attention, cuts parameters and matrix multiplications further, resulting in faster computation. The third, Super Attention, outperforms standard attention on both vision and natural language tasks while using fewer parameters. The authors provide rigorous mathematical comparisons of the mechanisms and evaluate them on several vision and language datasets, indicating substantial room to improve both the efficiency and the learning capability of Transformer models.
https://arxiv.org/abs/2403.01643
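
The paper's variants are not reproduced here, but a minimal sketch of standard multi-head attention (assuming PyTorch) helps show where the parameters and matrix multiplications that these variants target come from: each of the four learned projections (query, key, value, output) contributes a d_model × d_model weight matrix and one matrix multiplication per forward pass, on top of the two multiplications inside scaled dot-product attention itself.

```python
# Baseline sketch of standard multi-head attention (NOT the paper's
# Optimised/Efficient/Super Attention variants). The four projections
# W_Q, W_K, W_V, W_O each add d_model * d_model parameters and one
# matrix multiplication; these are what the proposed variants reduce.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


class StandardMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        # Four learned projections: the main source of attention parameters.
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        self.w_k = nn.Linear(d_model, d_model, bias=False)
        self.w_v = nn.Linear(d_model, d_model, bias=False)
        self.w_o = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, t, d = x.shape
        # Project and split into heads: (batch, heads, tokens, d_head).
        q = self.w_q(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        k = self.w_k(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        v = self.w_v(x).view(b, t, self.n_heads, self.d_head).transpose(1, 2)
        # Scaled dot-product attention: two more matrix multiplications per head.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        out = F.softmax(scores, dim=-1) @ v
        out = out.transpose(1, 2).reshape(b, t, d)
        return self.w_o(out)


if __name__ == "__main__":
    attn = StandardMultiHeadAttention(d_model=512, n_heads=8)
    params = sum(p.numel() for p in attn.parameters())
    print(f"Attention parameters: {params}")        # 4 * 512 * 512 = 1,048,576
    print(attn(torch.randn(2, 16, 512)).shape)      # torch.Size([2, 16, 512])
```

Against this baseline, removing or merging any of the four projections would shrink both the parameter count and the number of matrix multiplications per layer; see the linked paper for the exact formulations of the three proposed mechanisms.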