A performance analysis of Intel x86-SIMD-sort (AVX-512)

The author analyzes the performance of Intel's AVX-512 sort implementation and compares it with other generic sort implementations, such as std::sort and vqsort. The analysis aims to put the widely publicized "10~17x" speedup figure into perspective. It also shows that hardware-specific manual vectorization with wide AVX-512 SIMD is not the only way to write efficient software. The author also examines why benchmarking is difficult, noting that synthetic benchmarks may not be representative. The article explores the factors that affect performance, such as input size, type and pattern, hardware prediction, and cache effects. The author provides detailed observations and graphs for different types, input sizes, and patterns. These include unexpected results, such as vqsort's poor performance on small inputs and surprising differences between Linux and Windows machines.
