The author examines how to make access to thread-local storage (TLS) fast and explains the mechanisms behind it. The motivating case is tracing: a buffer of tracing data shared between threads can slow a program down, while thread-local storage brings management challenges of its own. The author argues that the C++ thread_local keyword should be used judiciously, because the access path for thread_local objects, particularly those with constructors, is more complicated than it looks and can cost performance. The post also shows the surprising assembly code generated for accessing thread-local variables in shared libraries, explains the trade-offs behind it, and critically analyzes the compiler and linker decisions involved.
https://yosefk.com/blog/cxx-thread-local-storage-performance.html
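
To make the summary concrete, here is a minimal sketch of a per-thread trace buffer declared with thread_local. It is not code from the post; the names TraceBuffer, g_trace, trace_event and record_on_new_thread are hypothetical, and the capacity is arbitrary. The buffer is deliberately a plain aggregate with no constructor, so it is zero-initialized statically; a thread_local object that needs a constructor call would generally require the compiler to add a lazy-initialization check around accesses, which is the kind of cost the summary alludes to.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Hypothetical per-thread trace buffer (illustrative only, not from the post).
// No constructor or destructor: the object is zero-initialized statically,
// so no dynamic initialization has to run before the first access.
struct TraceBuffer {
    static constexpr std::size_t kCapacity = 1024;
    std::uint64_t events[kCapacity];
    std::size_t count;
};

// One instance per thread; threads never contend on this buffer.
thread_local TraceBuffer g_trace;

// Append an event id to the calling thread's buffer; drop it if the buffer is full.
inline void trace_event(std::uint64_t id) {
    TraceBuffer& buf = g_trace;  // take the TLS reference once, then reuse it
    if (buf.count < TraceBuffer::kCapacity) {
        buf.events[buf.count++] = id;
    }
}

// Record n events on a fresh thread and return how many that thread's buffer holds.
std::size_t record_on_new_thread(std::size_t n) {
    assert(n <= TraceBuffer::kCapacity);  // precondition of this sketch
    std::size_t seen = 0;
    std::thread worker([&seen, n] {
        for (std::size_t i = 0; i < n; ++i) {
            trace_event(i);
        }
        seen = g_trace.count;  // the worker's own copy of g_trace
    });
    worker.join();
    return seen;
}
```

Calling record_on_new_thread(10) from the main thread returns 10 and leaves the main thread's own g_trace untouched, since each thread gets a separate copy. How cheap the access in trace_event is depends on how the code is built; in particular, the summary notes that code compiled into a shared library can get a noticeably different access sequence.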