Heap-overflowing Llama.cpp to RCE

Patrick Peng walks through exploiting Llama.cpp’s Heap Maze, a memory-management system that behaves differently from classic ptmalloc exploitation targets. The write-up focuses on Llama.cpp’s RPC components, where earlier vulnerabilities were patched by adding rigorous memory checks at various stages of RPC processing. Even with those mitigations in place, Peng finds a heap-overflow vector in the ggml_backend_cpu_buffer_cpy_tensor method, which uses ggml_nbytes to calculate the tensor size, and shows that it remains exploitable. The detailed write-up offers insight into Llama.cpp’s intricate memory management and into memory exploitation in AI projects more broadly.
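
To make the bug class concrete, here is a minimal sketch in C. It is not Llama.cpp's code: `toy_tensor`, `toy_copy_unchecked`, and `toy_copy_checked` are hypothetical names, and the sketch assumes the overflow comes from a copy whose length is derived from the source tensor alone, without comparing it to the destination's capacity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a tensor; NOT the real ggml struct. */
typedef struct {
    unsigned char *data; /* backing buffer */
    size_t nbytes;       /* byte size implied by the tensor's claimed shape */
} toy_tensor;

/* Risky pattern: the copy length comes from the source only, so a source
 * that claims more bytes than the destination holds writes past dst->data. */
void toy_copy_unchecked(const toy_tensor *src, toy_tensor *dst) {
    memcpy(dst->data, src->data, src->nbytes);
}

/* Checked variant: refuse the copy when the source would not fit. */
bool toy_copy_checked(const toy_tensor *src, toy_tensor *dst) {
    if (src->nbytes > dst->nbytes) {
        return false; /* would overflow the destination buffer */
    }
    memcpy(dst->data, src->data, src->nbytes);
    return true;
}
```

With a 16-byte source and an 8-byte destination, `toy_copy_checked` returns `false` and leaves the destination untouched, while the unchecked variant would write 8 bytes past the end of the destination. See the original write-up for how the real vector is reached and exploited.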

https://retr0.blog/blog/llama-rpc-rce