Scientific computing often involves code that can be represented as wide, tree-like expressions. Vectorizing such code can improve performance by evaluating multiple branches of the tree in parallel. To make this easier, the author created Vexpr, which converts readable expressions into vectorized code at runtime, and ran a series of experiments on Gaussian Process kernels to measure the effect of vectorization. Vectorization produced significant speed-ups in some cases but not in all. The analysis also covers CPU usage and GPU workload, to give a fuller picture of the performance impact. The author concludes that vectorization is generally beneficial, but that it comes with nuances and trade-offs that need to be considered.
https://probablymarcus.com/blocks/2023/10/19/vectorizing-wide-pytorch-expressions.html
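
To make the idea of vectorizing a wide expression concrete, here is a minimal sketch in plain PyTorch. It does not use Vexpr or reproduce the author's experiments. The kernel (a sum of one-dimensional RBF kernels, one per input feature) and the function names `rbf_sum_naive` and `rbf_sum_vectorized` are assumptions chosen for illustration. The first version evaluates each branch of the expression in its own loop iteration; the second evaluates all branches in a single broadcasted operation.

```python
import torch


def rbf_sum_naive(x1, x2, lengthscales):
    # Wide expression, evaluated branch by branch:
    # one 1-D RBF kernel per feature, added together.
    total = torch.zeros(x1.shape[0], x2.shape[0], dtype=x1.dtype)
    for d, ls in enumerate(lengthscales):
        diff = (x1[:, d].unsqueeze(1) - x2[:, d].unsqueeze(0)) / ls
        total = total + torch.exp(-0.5 * diff ** 2)
    return total


def rbf_sum_vectorized(x1, x2, lengthscales):
    # Same expression, all branches at once: build an (n1, n2, D) tensor
    # of scaled differences, apply exp once, then reduce over the feature axis.
    diff = (x1.unsqueeze(1) - x2.unsqueeze(0)) / lengthscales
    return torch.exp(-0.5 * diff ** 2).sum(dim=-1)


# Hypothetical inputs: 5 and 3 points with 4 features each.
torch.manual_seed(0)
x1 = torch.randn(5, 4, dtype=torch.float64)
x2 = torch.randn(3, 4, dtype=torch.float64)
lengthscales = torch.tensor([0.5, 1.0, 2.0, 4.0], dtype=torch.float64)

k_naive = rbf_sum_naive(x1, x2, lengthscales)
k_vec = rbf_sum_vectorized(x1, x2, lengthscales)
print(torch.allclose(k_naive, k_vec))
```

The two functions compute the same matrix. The vectorized form issues a handful of large operations instead of several small ones per feature, which is the kind of rewrite the summary describes. Whether it is actually faster depends on the sizes involved and the hardware, in line with the summary's note that speed-ups appeared in some cases but not all.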