In a 1985 paper, the computer scientist Andrew Yao, who would go on to win the A.M. Turing Award, asserted that among hash tables with a specific set of properties, the best way to find an individual element or an empty spot is to just go through potential spots randomly—an approach known as uniform probing. He also stated that, in the worst-case scenario, where you’re searching for the last remaining open spot, you can never do better than x, where x describes how close the table is to full (a table that is 99 percent full has x = 100). For 40 years, most computer scientists assumed that Yao’s conjecture was true.
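To make "uniform probing" concrete, here is a minimal Python sketch (the function name and the 99 percent fill level are mine, chosen only for illustration): each insertion keeps trying pseudo-random slots until it hits an empty one, so as the table approaches fullness the expected number of probes grows roughly like x.

    import random

    rng = random.Random(0)

    def uniform_probe_insert(table, key):
        # Probe uniformly random slots (with replacement, for simplicity)
        # until an empty one turns up; return how many probes it took.
        probes = 0
        while True:
            probes += 1
            slot = rng.randrange(len(table))
            if table[slot] is None:
                table[slot] = key
                return probes

    # Fill a table to 99 percent full (x = 100) and look at insertion costs.
    n = 10_000
    table = [None] * n
    costs = [uniform_probe_insert(table, k) for k in range(int(0.99 * n))]
    print("average probes:", sum(costs) / len(costs))
    print("probes for the last insertion:", costs[-1])

The last few insertions are the expensive ones: with only one slot in a hundred still empty, a random probe succeeds about once in a hundred tries, which is the x-shaped cost Yao's conjecture said could not be beaten.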
Krapivin was not held back by the conventional wisdom for the simple reason that he was unaware of it. “I did this without knowing about Yao’s conjecture,” he said. His explorations with tiny pointers led to a new kind of hash table—one that did not rely on uniform probing. And for this new hash table, the time required for worst-case queries and insertions is proportional to (log x)²—far faster than x. This result directly contradicted Yao’s conjecture. Farach-Colton and Kuszmaul helped Krapivin show that (log x)² is the optimal, unbeatable bound for the popular class of hash tables Yao had written about.
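The construction in the paper is more involved than anything that fits here, but to convey the flavor of probing non-uniformly, here is a deliberately simplified Python sketch (all names and parameters are my own; this is not the authors' algorithm and makes no attempt to reproduce the (log x)² guarantee). The table is split into geometrically shrinking sub-tables, and an insertion tries a bounded number of slots in each level before falling through to the next.

    import random

    rng = random.Random(1)

    def make_levels(n, shrink=2, min_size=64):
        # Split roughly n slots into geometrically shrinking sub-tables.
        levels, size = [], n // shrink
        while size >= min_size:
            levels.append([None] * size)
            size //= shrink
        levels.append([None] * min_size)  # small catch-all level at the bottom
        return levels

    def multilevel_insert(levels, key, tries_per_level=32):
        # Try a bounded number of random slots in each level, then fall
        # through to the next (smaller, emptier) level.
        probes = 0
        for level in levels[:-1]:
            for _ in range(tries_per_level):
                probes += 1
                slot = rng.randrange(len(level))
                if level[slot] is None:
                    level[slot] = key
                    return probes
        # Last level: scan it directly so this toy version never loses a key.
        last = levels[-1]
        for slot in range(len(last)):
            probes += 1
            if last[slot] is None:
                last[slot] = key
                return probes
        raise RuntimeError("table is full")

    levels = make_levels(10_000)
    capacity = sum(len(level) for level in levels)
    costs = [multilevel_insert(levels, k) for k in range(int(0.8 * capacity))]
    print("worst insertion cost:", max(costs), "probes")

The point of the toy version is only that no single insertion degenerates into the long random hunt of uniform probing, because a nearly full level is abandoned after a fixed number of tries in favor of an emptier one.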
“This result is beautiful in that it addresses and solves such a classic problem,” said Guy Blelloch of Carnegie Mellon.
taylodl
> Krapivin was not held back by the conventional wisdom for the simple reason that he was unaware of it. “I did this without knowing about Yao’s conjecture,” he said.
This idea resonates with me, especially in the context of modern physics. The extensive learning required to make advancements often means that one's thinking is heavily influenced by established theories and teachings, potentially stifling true innovation.
hullo
original discussion: https://news.ycombinator.com/item?id=43002511
mNovak
I was hoping they'd have a discussion of the algorithm itself. Quanta is usually good about making these sorts of things approachable.
In any case, the full paper is here [1] if you don't want to scroll through the Wired article.
[1] https://arxiv.org/abs/2501.02305
vlovich123
If my reading of the paper is correct, this kind of hash table is incredibly complicated to resize, because resizing would invalidate all previously handed-out pointers unless you only do chaining?
The other challenge is that computing N hash functions is quite expensive, so in practice I think this ends up slower in real-world terms despite the big-O speedup?
Does this actually end up beating open addressed hash tables? It seems more like a neat hypothetical CS result that wouldn’t have any actual real world application? That’s cool in and of itself but I’m curious if there’s any real world reason to do this.
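For what it's worth, the resize concern isn't specific to this scheme: in any open-addressed table, a "pointer" that is really a slot index stops being meaningful after a rehash into a larger array. A toy Python illustration (the names are mine and have nothing to do with the paper):

    def home_slot(key, capacity):
        # Stand-in placement rule; a real table would also resolve collisions.
        return hash(key) % capacity

    old_capacity, new_capacity = 1024, 2048
    keys = [f"key-{i}" for i in range(10_000)]

    # Count how many keys land in a different slot after the table doubles.
    moved = sum(
        home_slot(k, old_capacity) != home_slot(k, new_capacity) for k in keys
    )
    print(f"{moved} of {len(keys)} keys change slots after the resize")

Even with simple doubling, roughly half the keys change their home slot, and once collision resolution is layered on top the actual final positions shift even more, so any handed-out slot indices have to be treated as invalid after a resize.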
diyseguy
I wonder if there is a memory-consumption tradeoff for this new data structure? Based on a few initial implementations I see on GitHub, it looks like it may be significant? Still a nice discovery.
herf
If it was his discovery, it would be nice if they'd given him first authorship in the paper's author list (Farach-Colton, Krapivin, Kuszmaul). Though I understand if the proofs were not done by him.
hinkley
Based on the last two convos I saw on this, I feel like there’s a simpler algorithm here waiting to break out.
Underneath this is a sort of B-tree-scented data structure that avoids the bimodal distribution of insertion times, which is important for a concurrent hash table.
austin-cheney
This article strongly reminded me of Heckel's diff, which was first published almost 47 years ago. https://dl.acm.org/doi/10.1145/359460.359467
The performance consideration Paul Heckel identified was the difference between index access in arrays and hash tables: hash-table slots are visited in a random, or pseudo-random, order until the desired entry is found, whereas array indexes are accessed in index order.
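To restate that contrast in code (a minimal sketch of my own, using linear probing as the stand-in collision strategy): an array is walked in index order, while a hash-table lookup starts at a hash-derived slot and probes from there, so the visit order is effectively pseudo-random with respect to how the items were inserted.

    # Array access: elements are visited in index order, one predictable step each.
    items = ["a", "b", "c", "d"]
    for i, value in enumerate(items):
        print(i, value)

    # Hash-table access: start at a hash-derived slot and probe (here, linearly)
    # until the key or an empty slot is found.
    def probe_insert(table, key, value):
        i = hash(key) % len(table)
        while table[i] is not None and table[i][0] != key:
            i = (i + 1) % len(table)
        table[i] = (key, value)

    def probe_lookup(table, key):
        i = hash(key) % len(table)
        while table[i] is not None:
            if table[i][0] == key:
                return table[i][1]
            i = (i + 1) % len(table)
        return None

    table = [None] * 8
    for k, v in [("a", 1), ("b", 2), ("c", 3)]:
        probe_insert(table, k, v)
    print(probe_lookup(table, "b"))  # 2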