Bytecode dispatch is not what makes Python slow, even though its core eval loop is slower than a JIT. The major bottleneck is the Global Interpreter Lock (GIL), which serializes operations that extensions could otherwise run in parallel. A second cost is building PyObjects for blessed types such as strings and dictionaries, which is expensive. One solution is to abandon the blessed types in favor of custom types, but this brings interoperability challenges. The unconventional solution is GIL Balm’ing: adapting a native CPython type so that it can be used outside the GIL context. The technique can improve performance, but it has limitations and does not suit every situation. It is often used in low-latency libraries and services, where minimizing latency is the priority.
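
The interoperability cost of custom types can be illustrated with a minimal pure-Python sketch. `LazyStr` and `try_dump` are hypothetical names invented for this example and do not come from the linked post; a real extension would most likely define such a type in native code. The sketch only shows why code that expects a blessed `str` may reject a look-alike type.

```python
import json


class LazyStr:
    """Hypothetical custom type: holds raw UTF-8 bytes and decodes on demand."""

    __slots__ = ("_raw", "_decoded")

    def __init__(self, raw: bytes) -> None:
        self._raw = raw        # keep the bytes; no str object is built yet
        self._decoded = None   # filled in the first time a str is requested

    def __str__(self) -> str:
        if self._decoded is None:
            self._decoded = self._raw.decode("utf-8")
        return self._decoded


def try_dump(value) -> str:
    """Serialize with the stdlib json module, reporting a TypeError as text."""
    try:
        return json.dumps(value)
    except TypeError as exc:
        return f"TypeError: {exc}"


name = LazyStr(b"gil-balm")
builtin_result = try_dump(str(name))  # converting to a blessed str works
custom_result = try_dump(name)        # json has no encoder for LazyStr

print(builtin_result)
print(custom_result)
```

Converting to `str` restores compatibility, but it also pays the object-construction cost the custom type was meant to avoid. That is the trade-off the summary describes.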
https://blog.vito.nyc/posts/gil-balm/