Summing ASCII encoded integers on Haswell at almost the speed of memcpy

The author details a high-performance solution for summing 50 million ASCII-encoded integers, built around a novel algorithm. Using SIMD instructions and carefully designed lookup tables, the program runs 320x faster than a naive C++ implementation, although it is overfit to the specific input specification and hardware. The algorithm iterates over the input in 32-byte chunks and tracks a running sum for each decimal place. The post includes detailed explanations of the source code, prefetching techniques, and creative optimizations such as compressing shuffle control masks. The author invites feedback and acknowledges contributions from the HighLoad community.
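The decimal-place idea can be illustrated with a small scalar sketch. This is not the author's SIMD code: it only shows why keeping one running sum per decimal place gives the same total as parsing each number. The function name `sum_ascii_integers`, the `max_digits` parameter, and the input format (non-negative integers, each terminated by a newline) are assumptions made for illustration.

```python
def sum_ascii_integers(data: bytes, max_digits: int = 10) -> int:
    """Sum newline-terminated, non-negative ASCII integers.

    Hypothetical scalar illustration: assumes well-formed input in which
    every byte is either an ASCII digit or a newline, and no number has
    more than max_digits digits.
    """
    # place_sums[i] accumulates every digit found at the 10**i position.
    place_sums = [0] * max_digits
    place = 0

    # Scan backwards so the first digit seen after a newline is the ones digit.
    for byte in reversed(data):
        if byte == 0x0A:  # newline: the next digit starts a new number
            place = 0
        else:
            place_sums[place] += byte - 0x30  # ASCII digit to its value
            place += 1

    # Apply the powers of ten once at the end, not once per number.
    return sum(s * 10**i for i, s in enumerate(place_sums))


if __name__ == "__main__":
    sample = b"12\n345\n6\n"
    print(sum_ascii_integers(sample))
```

Deferring the multiplication by powers of ten to the end means the inner loop only needs additions, which is the kind of work that maps well onto wide vector registers.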

http://blog.mattstuchlik.com/2024/07/12/summing-integers-fast.html
