There are two claims I’d like to make:
- LLMs can be used effectively for listwise document ranking.
- Some complex problems can (surprisingly) be solved by transforming them into document ranking problems.
I’ve primarily explored both of these claims in the context of using patch diffing to locate N-day vulnerabilities: a sufficiently domain-specific problem that can be solved using general-purpose language models as comparators in document ranking algorithms. I demonstrated at RVAsec ’24 that listwise document ranking can locate the specific function in a patch diff that actually fixes a vulnerability described by a security advisory, and later wrote on the Bishop Fox blog in greater defense of listwise ranking, publishing a command-line tool implementation (raink).
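To make the first claim concrete, here is a minimal sketch of what a single listwise-ranking call might look like: the advisory and a numbered batch of candidate functions go into one prompt, and the model returns an ordering. This is illustrative only and does not reproduce raink's actual prompts or API; `call_llm` is a stand-in for whatever chat-completion client you use.

```python
import json

def rank_batch(advisory: str, candidates: list[str], call_llm) -> list[int]:
    """Ask the LLM to order a batch of candidate functions by relevance
    to the advisory. Returns candidate indices, most relevant first."""
    numbered = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    prompt = (
        "Security advisory:\n"
        f"{advisory}\n\n"
        "Candidate functions from the patch diff:\n"
        f"{numbered}\n\n"
        "Rank the candidates from most to least likely to contain the fix. "
        "Respond with only a JSON list of candidate indices, e.g. [2, 0, 1]."
    )
    # Assumes the model replies with something like "[3, 0, 2, 1]".
    return json.loads(call_llm(prompt))
```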
10 Comments
noperator
A concept that I've been thinking about a lot lately: transforming complex problems into document ranking problems to make them easier to solve. LLMs can assist greatly here, as I demonstrated at the inaugural DistrictCon this past weekend.
obblekk
The open source ranking library is really interesting. It's using a type of merge sort where the comparator function is an LLM, comparing batches of more than two items at a time to reduce the number of calls (rough sketch after this comment).
Reducing problems to document ranking is effectively a type of test-time search – also very interesting!
I wonder if this approach could be combined with GRPO to create more efficient chain of thought search…
https://github.com/BishopFox/raink?tab=readme-ov-file#descri…
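For anyone curious what that batched merge could look like, here is a rough sketch (not raink's actual implementation) that reuses the hypothetical `rank_batch` helper above: recurse until a batch fits in one LLM call, then merge sorted runs by ranking a small window drawn from the heads of both sides.

```python
def llm_merge_sort(advisory, docs, call_llm, window=8):
    """Merge-sort-style ranking where each 'comparison' is one listwise
    LLM call over a small batch of documents."""
    # Base case: a batch small enough to rank in a single LLM call.
    if len(docs) <= window:
        order = rank_batch(advisory, docs, call_llm)
        return [docs[i] for i in order]

    mid = len(docs) // 2
    left = llm_merge_sort(advisory, docs[:mid], call_llm, window)
    right = llm_merge_sort(advisory, docs[mid:], call_llm, window)

    # Merge: rank a window drawn from the heads of both sorted runs in one
    # call, emit the winner, repeat. Each call weighs several items at once
    # rather than just two.
    merged = []
    while left and right:
        head = left[: window // 2] + right[: window // 2]
        order = rank_batch(advisory, head, call_llm)
        best = head[order[0]]
        (left if best in left else right).remove(best)
        merged.append(best)
    return merged + left + right
```

Ranking several items per call is what keeps the number of LLM calls well below what a strictly pairwise comparison sort would need.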
westurner
Ranking (information retrieval) https://en.wikipedia.org/wiki/Ranking_(information_retrieval…
awesome-generative-information-retrieval > Re-ranking: https://github.com/gabriben/awesome-generative-information-r…
rfurmani
Very cool! This is also one of my beliefs in building tools for research: if you can solve the problem of predicting and ranking the top references for a given idea, then you've learned a lot about problem solving and decomposing problems into their ingredients. I've been pleasantly surprised by how well LLMs can rank relevance, compared to supervised training of a relevancy score. I'll read the linked paper (shameless plug, here it is on my research tools site: https://sugaku.net/oa/W4401043313/)
Everdred2dx
Very interesting application of LLMs. Thanks for sharing!
m3kw9
That title hurts my head to read
mskar
Great article, I’ve had similar findings! LLM-based “document-chunk” ranking is a core feature of PaperQA2 (https://github.com/Future-House/paper-qa) and part of why it works so well for scientific Q&A compared to RAG systems based on traditional embedding ranking.
hexator
This furthers an idea I've had recently: we (and the media) are focusing too much on creating value by making ever more complex LLMs, while vastly underestimating creative applications of current-generation AI.
moralestapia
Minor nitpick,
Should be "document ranking reduces to these hard problems",
I never knew why the convention was like that, it seems backwards to me as well, but that's how it is.
adamkhakhar
I'm curious – why is LLM ranking preferred over cosine similarity from an embedding model (in the context of this specific problem)?