There are two claims I’d like to make:
- LLMs can be used effectively for listwise document ranking.
- Some complex problems can (surprisingly) be solved by transforming them into document ranking problems.
I’ve primarily explored both of these claims in the context of using patch diffing to locate N-day vulnerabilities: a sufficiently domain-specific problem that can be solved using general-purpose language models as comparators in document ranking algorithms. I demonstrated at RVAsec ’24 that listwise document ranking can locate the specific function in a patch diff that actually fixes a vulnerability described by a security advisory, and later wrote on the Bishop Fox blog in greater defense of listwise ranking, publishing a command-line tool implementation (raink).
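To make the first claim concrete, here is a minimal sketch of what a single listwise-ranking call might look like: the advisory and a numbered batch of candidate functions go into one prompt, and the model returns an ordering. This is illustrative only and does not reproduce raink's actual prompts or API; `call_llm` is a stand-in for whatever chat-completion client you use.

```python
import json

def rank_batch(advisory: str, candidates: list[str], call_llm) -> list[int]:
    """Ask the LLM to order a batch of candidate functions by relevance
    to the advisory. Returns candidate indices, most relevant first."""
    numbered = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    prompt = (
        "Security advisory:\n"
        f"{advisory}\n\n"
        "Candidate functions from the patch diff:\n"
        f"{numbered}\n\n"
        "Rank the candidates from most to least likely to contain the fix. "
        "Respond with only a JSON list of candidate indices, e.g. [2, 0, 1]."
    )
    # Assumes the model replies with something like "[3, 0, 2, 1]".
    return json.loads(call_llm(prompt))
```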
10 Comments
noperator
A concept that I've been thinking about a lot lately: transforming complex problems into document ranking problems to make them easier to solve. LLMs can assist greatly here, as I demonstrated at the inaugural DistrictCon this past weekend.
obblekk
The open source ranking library is really interesting. It's using a type of merge sort where the comparator function is an LLM, comparing batches of more than two items at a time to reduce the number of calls (rough sketch after this comment).
Reducing problems to document ranking is effectively a type of test-time search – also very interesting!
I wonder if this approach could be combined with GRPO to create more efficient chain of thought search…
https://github.com/BishopFox/raink?tab=readme-ov-file#descri…
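For anyone curious what that batched merge could look like, here is a rough sketch (not raink's actual implementation) that reuses the hypothetical `rank_batch` helper above: recurse until a batch fits in one LLM call, then merge sorted runs by ranking a small window drawn from the heads of both sides.

```python
def llm_merge_sort(advisory, docs, call_llm, window=8):
    """Merge-sort-style ranking where each 'comparison' is one listwise
    LLM call over a small batch of documents."""
    # Base case: a batch small enough to rank in a single LLM call.
    if len(docs) <= window:
        order = rank_batch(advisory, docs, call_llm)
        return [docs[i] for i in order]

    mid = len(docs) // 2
    left = llm_merge_sort(advisory, docs[:mid], call_llm, window)
    right = llm_merge_sort(advisory, docs[mid:], call_llm, window)

    # Merge: rank a window drawn from the heads of both sorted runs in one
    # call, emit the winner, repeat. Each call weighs several items at once
    # rather than just two.
    merged = []
    while left and right:
        head = left[: window // 2] + right[: window // 2]
        order = rank_batch(advisory, head, call_llm)
        best = head[order[0]]
        (left if best in left else right).remove(best)
        merged.append(best)
    return merged + left + right
```

Ranking several items per call is what keeps the number of LLM calls well below what a strictly pairwise comparison sort would need.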
westurner
Ranking (information retrieval) https://en.wikipedia.org/wiki/Ranking_(information_retrieval…
awesome-generative-information-retrieval > Re-ranking: https://github.com/gabriben/awesome-generative-information-r…
rfurmani
Very cool! This is also one of my beliefs in building tools for research: if you can solve the problem of predicting and ranking the top references for a given idea, then you've learned a lot about problem solving and decomposing problems into their ingredients. I've been pleasantly surprised by how well LLMs can rank relevance, compared to supervised training of a relevancy score. I'll read the linked paper (shameless plug, here it is on my research tools site: https://sugaku.net/oa/W4401043313/)
Everdred2dx
Very interesting application of LLMs. Thanks for sharing!
m3kw9
That title hurts my head to read
mskar
Great article, I’ve had similar findings! LLM-based “document-chunk” ranking is a core feature of PaperQA2 (https://github.com/Future-House/paper-qa) and part of why it works so well for scientific Q&A compared to RAG systems based on traditional embedding ranking.
hexator
This furthers an idea I've had recently: we (and the media) are focusing too much on creating value by making ever more complex LLMs, while vastly underestimating creative applications of current-generation AI.
moralestapia
Minor nitpick,
Should be "document ranking reduces to these hard problems",
I never knew why the convention was like that, it seems backwards to me as well, but that's how it is.
adamkhakhar
I'm curious – why is LLM ranking preferred over cosine similarity from an embedding model (in the context of this specific problem)?