Addition Is All You Need for Energy-Efficient Language Models

This paper shows that floating point multiplication can be approximated by integer addition with high precision, leading to the linear-complexity multiplication (L-Mul) algorithm. L-Mul requires significantly less computation than traditional floating point multiplication while achieving higher precision. Implemented in tensor processing hardware, it can cut the energy cost of element-wise floating point tensor multiplications by up to 95% and of dot products by 80%. Experiments show that L-Mul with a 4-bit mantissa achieves precision comparable to float8_e4m3 multiplication, and with a 3-bit mantissa it outperforms float8_e5m2. Surprisingly, using L-Mul in a transformer model achieves precision equivalent to traditional multiplication in both fine-tuning and inference.

https://arxiv.org/abs/2410.00907
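
For intuition, below is a minimal Python sketch of the idea summarized above: adding the integer bit patterns of two floats adds their exponents and approximately adds their mantissas, and a small constant term stands in for the mantissa product. This is not the paper's reference implementation; the function name l_mul, the float32 realization, the offset schedule l(m), and the omission of special-value handling are illustrative assumptions. The paper targets low-bit formats such as fp8 in tensor processing hardware, not float32 in Python.

```python
import struct

def l_mul(x: float, y: float, mantissa_bits: int = 4) -> float:
    """Approximate x * y with one integer addition on the bit patterns:
    add exponents and mantissas instead of multiplying mantissas, then
    add a small constant correction. Sketch on float32; special values
    (inf, NaN, subnormals) are not handled."""
    if x == 0.0 or y == 0.0:
        return 0.0

    # Reinterpret the floats as 32-bit unsigned integers.
    xi = struct.unpack("<I", struct.pack("<f", x))[0]
    yi = struct.unpack("<I", struct.pack("<f", y))[0]

    # Sign of the product, handled separately.
    sign = (xi ^ yi) & 0x8000_0000

    # Exponent + mantissa fields with the sign bit dropped.
    xm = xi & 0x7FFF_FFFF
    ym = yi & 0x7FFF_FFFF

    # Offset l(m) for the correction term 2^(-l(m)) that replaces the
    # mantissa product (assumed schedule, following the paper's description).
    m = mantissa_bits
    l_m = m if m <= 3 else (3 if m == 4 else 4)

    # Adding the two bit patterns adds the exponents and (approximately)
    # the mantissas in a single integer addition. Subtract the exponent
    # bias (127 << 23) so it is not counted twice, then add 2^(-l(m))
    # expressed in mantissa units (float32 has 23 fraction bits).
    zi = xm + ym - (127 << 23) + (1 << (23 - l_m))

    return struct.unpack("<f", struct.pack("<I", sign | (zi & 0x7FFF_FFFF)))[0]

print(l_mul(3.25, -1.5))  # ~ -5.0   (exact product: -4.875)
print(l_mul(0.3, 0.7))    # ~ 0.2156 (exact product: 0.21)
```

The single integer addition replaces the mantissa multiply entirely, which is where the claimed energy savings come from: on typical hardware an integer add costs a small fraction of the energy of a floating point multiply.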
