Branchless UTF-8 Encoding

The author explores how to encode UTF-8 without branches, starting from a C function and moving to a Rust implementation. The key idea is to count the leading/trailing zeros in the codepoint to determine how many bytes the encoding requires. The first attempt still contained some branches; a revised version removes them, giving a truly branchless solution. The author admits that this approach may not be as well optimized as other techniques, such as DFA-based decoding or SIMD. The code is released under the MIT license, and others are welcome to explore and improve it. Special thanks go to Nathan and Lorenz for their contributions.
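
To make the zero-counting idea concrete, here is a minimal Rust sketch. It is not the article's code: the function name `encode_utf8_sketch` and the tables `CONT`, `SHIFT`, `KEEP`, and `MARK` are hypothetical names chosen for illustration. It assumes the input is a Rust `char`, so surrogates and values above U+10FFFF cannot occur. The source contains no `if` or `match`, but whether the compiled output is free of branches (for example, array bounds checks or the fixed four-step loop) depends on the compiler and target.

```rust
/// Number of continuation bytes, indexed by `leading_zeros()` of the code point.
/// (Hypothetical table for this sketch.)
const CONT: [u8; 33] = [
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0..=10: over 21 bits, unreachable for a `char`
    3, 3, 3, 3, 3,                   // 11..=15: 17 to 21 significant bits
    2, 2, 2, 2, 2,                   // 16..=20: 12 to 16 significant bits
    1, 1, 1, 1,                      // 21..=24: 8 to 11 significant bits
    0, 0, 0, 0, 0, 0, 0, 0,          // 25..=32: 0 to 7 significant bits (ASCII)
];

/// Per-length tables: row = continuation count, column = output byte index.
/// SHIFT moves the wanted bits down, KEEP masks them, MARK adds the UTF-8 tag bits.
const SHIFT: [[u32; 4]; 4] = [
    [0, 0, 0, 0],
    [6, 0, 0, 0],
    [12, 6, 0, 0],
    [18, 12, 6, 0],
];
const KEEP: [[u8; 4]; 4] = [
    [0x7F, 0x00, 0x00, 0x00],
    [0x1F, 0x3F, 0x00, 0x00],
    [0x0F, 0x3F, 0x3F, 0x00],
    [0x07, 0x3F, 0x3F, 0x3F],
];
const MARK: [[u8; 4]; 4] = [
    [0x00, 0x00, 0x00, 0x00],
    [0xC0, 0x80, 0x00, 0x00],
    [0xE0, 0x80, 0x80, 0x00],
    [0xF0, 0x80, 0x80, 0x80],
];

/// Encodes `c` as UTF-8. Returns a 4-byte buffer and the number of bytes used;
/// bytes past that length are zero.
pub fn encode_utf8_sketch(c: char) -> ([u8; 4], usize) {
    let cp = c as u32;
    // The count of leading zeros selects the encoded length via a table lookup.
    let n = CONT[cp.leading_zeros() as usize] as usize;
    let mut out = [0u8; 4];
    // Fixed trip count: every position is computed, unused ones come out as 0.
    for i in 0..4 {
        out[i] = ((cp >> SHIFT[n][i]) as u8 & KEEP[n][i]) | MARK[n][i];
    }
    (out, n + 1)
}
```

Calling `encode_utf8_sketch('€')` yields a buffer whose first three bytes are `E2 82 AC`, matching what the standard library's `char::encode_utf8` produces. The tables trade a few dozen bytes of constant data for the absence of per-length code paths, which is one plausible way to realize the idea; other layouts are possible.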

https://cceckman.com/writing/branchless-utf8-encoding/