nullprogram.com/blog/2023/09/30/
My last article had tips for for arena allocation. This next
article demonstrates a technique for building bespoke hash maps that
compose nicely with arena allocation. In addition, they’re fast, simple,
and automatically scale to any problem that could reasonably be solved
with an in-memory hash map. To avoid resizing — both to better support
arenas and to simplify implementation — they have slightly above average
memory requirements. The design, which we’re calling a hash-trie, is the
result of fruitful collaboration with NRK, whose sibling article
includes benchmarks. It’s my new favorite data structure, and has proven
incredibly useful. With a couple well-placed acquire/release atomics, we
can even turn it into a lock-free concurrent hash map.
I’ve written before about MSI hash tables, a simple, very fast
map that can be quickly implemented from scratch as needed, tailored to
the problem at hand. The trade off is that one must know the upper bound
a priori in order to size the base array. Scaling up requires resizing
the array — an impedance mismatch with arena allocation. Search trees
scale better, as there’s no underlying array, but tree balancing tends to
be finicky and complex, unsuitable to rapid, on-demand implementation.
We want the ease of an MSI hash table with the scaling of a tree.
I’ll motivate the discussion with example usage. Suppose we have an array
of pointer+length strings, as defined last time:
typedef struct {
uint8_t *data;
ptrdiff_t len;
} str;
And we need a function that removes duplicates in place, but (for the
moment) we’re not worried about preserving order. This could be done
naively in quadratic time. Smarter is to sort, then look for runs.
Instead, I’ve used a hash map to track seen strings. It maps str
to
bool
, and it is represented as type strmap
and one insert+lookup
function, upsert
.
// Insert/get bool value for given str key.
bool *upsert(strmap **, str key, arena *);
ptrdiff_t unique(str *strings, ptrdiff_t len, arena scratch)
{
ptrdiff_t count = 0;
strmap *seen = 0;
while (count < len) {
bool *b = upsert(&seen, strings[count], &scratch);
if (*b) {
// previously seen (discard)
strings[count] = strings[--len];
} else {
// newly-seen (keep)
count++;
*b = 1;
}
}
return count;
}
In particular, note:
-
A null pointer is an empty hash map and initialization is trivial. As
discussed in the last article, one of my arena allocation principles is
default zero-initializion. Put together, that means any data structure
containing a map comes with a ready-to-use, empty map. -
The map is allocated out of the scratch arena so it’s automatically
freed upon any return. It’s as care-free as garbage collection. -
The map directly uses strings in the input array as keys, without making
copies nor worrying about ownership. Arenas own objects, not references.
If I wanted to carve out some fixed keys ahead of time, I could even
insert static strings. -
upsert
returns a pointer to a value. That is, a pointer into the map.
This is not strictly required, but usually makes for a simple interface.
When an entry is new, this value will be false (zero-initialized).
So, what is this wonderful data structure? Here’s the basic shape:
typedef struct {
hashmap *child[4];
keytype key;
valtype value;
} hashmap;
They child
and key
fields are essential to the map. Adding a child
to any data structure turns it into a hash map over whatever field you
choose as the key. In other words, a hash-trie can serve as an intrusive
hash map. In several programs I’ve combined intrusive lists and hash maps
to create an insert-ordered hash map. Going the other direction, omitting
value
turns it into a hash set. (Which is what unique
really needs!)
As you probably guessed, this hash-trie is a 4-ary tree. It can easily be
2-ary (leaner but slower) or 8-ary (bigger and usually no faster), but
4-ary strikes a good balance, if a bit bulky. In the example above,
keytype
would be str
and valtype
would be bool
. The most general
form of upsert
looks like this:
valtype *upsert(hashmap **m, keytype key, arena *perm)
{
for (uint64_t h = hash(key); *m; h <<= 2) {
if (equals(key, (*m)->key)) {
return &(*m)->value;
}
m = &(*m)->child[h>>62];
}
if (!perm) {
return 0;
}
*m = new(perm, hashmap);
(*m)->key = key;
return &(*m)->value;
}
This will take some unpacking. The first argument is a pointer to a
pointer. That’s the destination for any newly-allocated element. As it
travels down the tree, this points into the parent’s child
array. If
it points to null, then it’s an empty tree which, by definition, does not
contain the key.
We need two “methods” for keys: hash
and equals
. The hash function
should return a uniformly distributed integer. As is usually the case,
less uniform fast hashes generally do better than highly-uniform slow
hashes. For hash maps under ~100K elements a 32-bit hash is fine, but
larger maps should use a 64-bit hash state and result. Hash collisions
revert to linear, linked list performance and, per the birthday paradox,
that will happen often with 32-bit hashes on large hash maps.
If you’re worried about pathological inputs, add a seed parameter to
upsert
and hash
. Or maybe even use the address m
as a seed. The
specifics depend on your security model. It’s not an issue for most hash
maps, so I don’t demonstrate it here.
The top two bits of the hash are used to select a branch. These tend to be
higher quality for multiplicative hash functions. At each level
two bits are shifted out. This is what gives it its name: a trie of the
hash bits. Though it’s un-trie-like in the way it deposits elements at
the first empty spot. To make it 2-ary or 8-ary, use 1 or 3 bits at a
time.
I initially tried a Multiplicative Congruential Generator (MCG) to
select the next branch at each trie level, instead of bit shifting, but
NRK noticed it was consistently slower than shifting.
While “delete” could be handled using gravestones, many deletes would not
work well. After all, the underlying allocator is an arena. A combination
of uniformly distributed branching and no deletion means that rebalancing
is unnecessary. This is what grants it its simplicity!
If no arena is provided, it reverts to a lookup and returns null when the
key is not found. It allows one function to flexibly serve both modes. In
unique
, pure lookups are unneeded, so this condition could be skipped in
its strmap
.
Sometimes it’s useful to return the entire hashmap
object itself rather
than an internal pointer, particularly when it’s intrusive. Use whichever
works best for the situation. Regardless, exploit zero-initialization to
detect newly-allocated elements when possible.
In some cases we may deep copy the key in its arena before inserting it
into the map. The provided key may be a temporary (e.g. sprintf
) which
the map outlives, and the caller doesn’t want to allocate a longer-lived
key unless it’s needed. It’s all part of tailoring the map to the problem,
which we can do because it’s so short and simple!
Fleshing it out
Putting it all together, unique
could look like the following, with
strmap
/upsert
renamed to strset
/ismember
:
uint64_t hash(str s)
{
uint64_t h = 0x100;
for