LLaMA Now Goes Faster on CPUs

On March 31st, 2024, Justine Tunney published 84 new matrix multiplication kernels for llamafile on her website. These kernels improve image and prompt evaluation speed by 30% to 500% when using F16 and Q8_0 weights on CPU, with the largest gains on ARMv8.2+, Intel, and AVX512 machines. Llamafile, a Mozilla-backed project, aims to make running local LLMs simple and accessible to a broader audience. Notably, the new kernels are reported to outperform the established Intel MKL library by 2x on favorable matrix sizes, suggesting there is still considerable headroom for CPU inference in local language models.

https://justine.lol/matmul/