LLM4Decompile: Decompiling Binary Code with Large Language Models

LLM4Decompile is an open-source project for decompiling binary code with Large Language Models. Its training dataset contains 4 billion tokens, generated by compiling C code samples into assembly code. The project evaluates decompilers on two criteria, re-compilability and re-executability, using its own benchmark, Decompile-Eval. Re-compilability asks whether the decompiled code can be compiled again, which checks syntactic integrity. Re-executability asks whether the recompiled code passes the benchmark's test cases, which checks semantic correctness. The results show that both measures matter: code that recompiles may still behave differently from the original. The released models range from 1.3 billion to 33 billion parameters and come in several training variations.
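
The two measures described above can be illustrated with a minimal sketch. It is not the project's evaluation code: the names `Sample`, `score`, `compiles`, and `passes_tests` are hypothetical, and the compile and test steps are passed in as callbacks so the scoring logic can be read on its own. The sketch also assumes that a sample counts as re-executable only if it recompiles first.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Sample:
    """One benchmark item (hypothetical layout): decompiled C source plus its tests."""
    decompiled_source: str
    test_harness: str


def score(
    samples: Iterable[Sample],
    compiles: Callable[[str], bool],
    passes_tests: Callable[[str, str], bool],
) -> Dict[str, float]:
    """Return the fraction of samples that recompile and the fraction that pass tests.

    `compiles` and `passes_tests` are caller-supplied checks (assumptions of this
    sketch); a sample that does not recompile is never run against its tests.
    """
    total = recompiled = reexecuted = 0
    for sample in samples:
        total += 1
        if not compiles(sample.decompiled_source):
            continue  # syntactic failure: cannot be re-executable either
        recompiled += 1
        if passes_tests(sample.decompiled_source, sample.test_harness):
            reexecuted += 1
    if total == 0:
        return {"re_compilability": 0.0, "re_executability": 0.0}
    return {
        "re_compilability": recompiled / total,
        "re_executability": reexecuted / total,
    }
```

In a real setup the two callbacks would typically invoke a C compiler and run the resulting binary against the test cases; here they are left abstract so the sketch makes no claim about how Decompile-Eval does this.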

https://github.com/albertan017/LLM4Decompile