Here it is, the fastest general (and simple) binary search C++ implementation. The function, sb_lower_bound, provides the same interface as std::lower_bound but is twice as fast, and its code is shorter. It is called “branchless” because the if statement compiles down to a conditional move instruction rather than a branch (conditional jump). The article explores different compiler options, even faster versions, fully branchless implementations, and caveats. It also highlights a significant update and compares the performance of the different versions of the code. The author suggests that sb_lower_bound is a good enough solution but encourages further exploration.
https://mhdm.dev/posts/sb_lower_bound/
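
To make the "branchless" idea above concrete, here is a minimal sketch of a lower-bound search written in that style. It is an illustration under stated assumptions, not necessarily the article's exact code: the names lower_bound_sketch and lower_index_sketch are hypothetical, and whether the if statement really becomes a conditional move depends on the compiler, its version, and the optimization flags used.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical name: a sketch in the style described above, with the same
// calling convention as std::lower_bound (random-access iterators assumed).
template <class RandomIt, class T, class Compare>
RandomIt lower_bound_sketch(RandomIt first, RandomIt last, const T& value,
                            Compare comp) {
    auto length = last - first;
    while (length > 0) {
        auto half = length / 2;
        // The loop always runs about log2(length) times regardless of the
        // data; the only data-dependent step is whether `first` advances.
        // An optimizer may emit a conditional move for this, but that is
        // not guaranteed.
        if (comp(first[half], value)) {
            first += length - half;  // length - half == half + length % 2
        }
        length = half;
    }
    return first;
}

// Hypothetical helper for illustration: index of the first element >= x.
inline std::size_t lower_index_sketch(const std::vector<int>& v, int x) {
    assert(std::is_sorted(v.begin(), v.end()));  // precondition of the search
    auto it = lower_bound_sketch(v.begin(), v.end(), x, std::less<int>{});
    return static_cast<std::size_t>(it - v.begin());
}
```

Because the loop's trip count depends only on the length of the range, the loop-exit branch is easy to predict; the hard-to-predict comparison result only selects between two values of first. To check what a given compiler actually generates, inspect the assembly output for the loop rather than assuming a conditional move is present.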