Hardware Acceleration of LLMs: A comprehensive survey and comparison

In this paper, the authors survey hardware accelerators for the transformer networks underlying Large Language Models. They provide a thorough review of research efforts in this area, comparing frameworks by technology, processing platform, speedup, energy efficiency, performance, and more. The central challenge is that the surveyed schemes are implemented on different process technologies, which makes direct comparison misleading; a distinctive aspect of this work is the extrapolation of all results to the same technology node for a fair comparison. The authors also implement parts of the models on FPGA chips to obtain a practical assessment of performance. The paper sheds light on the evolving field of LLMs and hardware acceleration.

https://arxiv.org/abs/2409.03384
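The idea of extrapolating results to a common process node can be sketched as follows. This is a minimal illustration, not the paper's actual extrapolation model: the scaling factors, function names, and example numbers below are all assumptions chosen only to show the shape of such a normalization.

```python
# Hypothetical illustration of normalizing accelerator results reported on
# different process technologies to one common node for fair comparison.
# The scaling factors are illustrative assumptions, NOT values from the paper.

# Assumed (delay_factor, energy_factor) per node, relative to a 16 nm baseline;
# smaller nodes are assumed to have lower delay and lower energy per operation.
NODE_SCALING = {
    45: (2.2, 3.5),
    28: (1.5, 2.0),
    16: (1.0, 1.0),
    7:  (0.6, 0.45),
}

def normalize_to_node(throughput_gops, efficiency_gops_per_w, src_nm, dst_nm=16):
    """Extrapolate reported throughput (GOPS) and energy efficiency (GOPS/W)
    from the source process node to a target node using the assumed factors."""
    src_delay, src_energy = NODE_SCALING[src_nm]
    dst_delay, dst_energy = NODE_SCALING[dst_nm]
    # Throughput scales inversely with delay; efficiency inversely with energy.
    norm_throughput = throughput_gops * (src_delay / dst_delay)
    norm_efficiency = efficiency_gops_per_w * (src_energy / dst_energy)
    return norm_throughput, norm_efficiency

# Example: a 28 nm FPGA result extrapolated to the 16 nm baseline.
t, e = normalize_to_node(150.0, 20.0, src_nm=28)
```

Under these assumed factors, the 28 nm result would be credited as 225 GOPS and 40 GOPS/W at 16 nm; the survey's point is that such normalization, whatever its exact form, is required before speedup and efficiency numbers from different chips can be compared.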
