The author discusses the effectiveness of using LLMs for listwise document ranking, particularly for locating N-day vulnerabilities through patch diffing. They argue that the technique simplifies complex security engineering tasks by reframing them as document ranking problems. The surprising aspect is the demonstration that GPT-4o mini can efficiently locate the specific functions that fix a vulnerability within a large patch diff. The controversial aspect is the idea of using general-purpose language models for security-related tasks. The author suggests applying document ranking to other offensive security challenges and proposes improvements such as analyzing and verifying the ranked results. They also hint at a potential talk on LLMs achieving a success similar to that of fuzzing.
https://noperator.dev/posts/document-ranking-for-complex-problems/
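
To make the reframing concrete, here is a minimal Python sketch of listwise ranking over a set of candidates that is too large to rank in one request. It is an illustration under stated assumptions, not the author's implementation. The names listwise_rank and keyword_ranker are hypothetical, the ranker is a simple word-overlap function standing in for an LLM call, and the halving strategy is just one straightforward way to narrow a large candidate set.

```python
import random
import re
from typing import Callable, List, Sequence

# A ranker takes a query and a small batch of documents and returns the batch
# indices ordered from most to least relevant. In practice this would wrap an
# LLM call; that wrapper is deliberately left out of this sketch.
Ranker = Callable[[str, Sequence[str]], List[int]]


def keyword_ranker(query: str, batch: Sequence[str]) -> List[int]:
    """Hypothetical stand-in for an LLM: order a batch by word overlap with the query."""
    query_words = set(re.findall(r"\w+", query.lower()))
    overlap = [len(query_words & set(re.findall(r"\w+", doc.lower()))) for doc in batch]
    # sorted() is stable, so documents with equal overlap keep their batch order.
    return sorted(range(len(batch)), key=lambda i: -overlap[i])


def listwise_rank(
    query: str,
    docs: Sequence[str],
    rank_batch: Ranker,
    batch_size: int = 3,
    seed: int = 0,
) -> List[str]:
    """Rank docs by repeatedly ranking small shuffled batches and keeping each batch's top half."""
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    rng = random.Random(seed)
    candidates = list(docs)
    eliminated: List[str] = []  # documents dropped in later rounds sit nearer the front
    while len(candidates) > batch_size:
        rng.shuffle(candidates)
        survivors: List[str] = []
        dropped: List[str] = []
        for start in range(0, len(candidates), batch_size):
            batch = candidates[start:start + batch_size]
            order = rank_batch(query, batch)
            keep = max(1, len(batch) // 2)
            survivors.extend(batch[i] for i in order[:keep])
            dropped.extend(batch[i] for i in order[keep:])
        eliminated = dropped + eliminated
        candidates = survivors
    # The remaining candidates fit in a single batch, so rank them directly.
    final = [candidates[i] for i in rank_batch(query, candidates)]
    return final + eliminated


if __name__ == "__main__":
    # Invented example data: one-line descriptions of changed functions.
    changed = [
        "render_menu: adjust padding constants",
        "parse_header: add bounds check to prevent heap buffer overflow",
        "log_init: rename verbosity flag",
        "build_config: bump version string",
    ]
    for doc in listwise_rank("heap buffer overflow in parse_header", changed, keyword_ranker):
        print(doc)
```

In a patch-diffing setting, each document would be a changed function and the query would be the text of the security advisory. Because a model's orderings can vary between calls, a practical version would likely repeat the shuffle-and-rank step and aggregate the results before discarding any candidate.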