This December I once again did the Advent of Code,
in Rust. If you are interested, my solutions
are on GitHub. I wanted to highlight one particular solution to the day 2
problem, as it is optimized completely
beyond the point of reason yet contains a useful technique. For simplicity we’re
only going to do part 1 of the day 2 problem here, but the exact same techniques
apply to part 2.
We’re going to start off slow, but stick around because at the end you should
have an idea what on earth this function is doing, how it works, how to make one
and why it’s the world’s smallest hash table:
pub fn phf_shift(x: u32) -> u8 {
let shift = x.wrapping_mul(0xa463293e) >> 27;
((0x824a1847u32 >> shift) & 0b11111) as u8
}
The problem
We receive a file where each line contains `A`, `B`, or `C`, followed by
a space, followed by `X`, `Y`, or `Z`. These are to be understood as choices in
a game of rock-paper-scissors as such:
A = X = Rock
B = Y = Paper
C = Z = Scissors
The first letter (`A`/`B`/`C`) indicates the choice of our opponent, the second
letter (`X`/`Y`/`Z`) indicates our choice. We then compute a score, which has two
components:
- If we picked Rock we get 1 point, if we picked Paper we get 2 points, and 3 points if we picked Scissors.
- If we lose we gain 0 points, if we draw we gain 3 points, if we win we get 6 points.
As an example, if our input file looks as such:
A Y
B X
C Z
Our total score would be (2 + 6) + (1 + 0) + (3 + 3) = 15.
An elegant solution
A sane solution would verify that our input lines indeed have the format `[ABC] [XYZ]`
before extracting those two letters. After converting these letters to
integers `0`, `1`, `2` by subtracting the ASCII code of either `'A'` or `'X'`
respectively, we can immediately calculate the first component of our score as `1 + ours`.
The second component is more involved, but can be elegantly solved using modular arithmetic.
Note that if Rock = 0, Paper = 1, Scissors = 2, then we always have that choice $(k + 1) \bmod 3$
beats $k$. Alternatively, $k$ beats $(k - 1) \bmod 3$.
If we divide the number of points that Advent of Code expects for a loss/draw/win by
three we find that a loss is $0$, a draw is $1$ and a win is $2$ points. From
these observations we can derive the following modular equivalence
$$1 + \mathrm{ours} - \mathrm{theirs} \equiv \mathrm{points} \pmod 3.$$
To see that it is correct, note that if we drew, `ours - theirs` is zero and we
correctly get one point. If we add one to `ours` we change from a draw to a
win, and `points` becomes congruent with $2$ as desired. Symmetrically, if
we add one to `theirs` we change from a draw to a loss, and `points` once
again becomes congruent with $0$ as desired.
Translated into code we find that our total score is
1 + ours + 3 * ((1 + ours + (3 - theirs)) % 3)
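As a quick sanity check of the closed form (a Python sketch; the helper name `score` is my own, not from the original solution):

```python
# Closed-form score for one round; `theirs` is A/B/C, `ours` is X/Y/Z.
def score(theirs: str, ours: str) -> int:
    t = ord(theirs) - ord('A')  # A/B/C -> 0/1/2
    o = ord(ours) - ord('X')    # X/Y/Z -> 0/1/2
    # 1 + ours for our choice, plus 3 points per win/draw unit.
    return 1 + o + 3 * ((1 + o + (3 - t)) % 3)

# The example input from above sums to (2 + 6) + (1 + 0) + (3 + 3) = 15.
total = sum(score(*line.split()) for line in ["A Y", "B X", "C Z"])
```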
A general solution
We found a neat closed form, but if we were even slightly less fortunate it might
not have existed. A more general method for solving similar problems would be nice.
In this particular instance that is possible. There are only $3 \times 3 = 9$
input pairs, so we can simply hardcode the answer for each situation:
let answers = HashMap::from([
("A X", 4),
("A Y", 8),
("A Z", 3),
("B X", 1),
("B Y", 5),
("B Z", 9),
("C X", 7),
("C Y", 2),
("C Z", 6),
]);
Now we can simply get our answer using `answers[input]`. This might feel like a
bit of a non-answer, but it is a legitimate technique. We have a mapping
of inputs to outputs, and sometimes the simplest or fastest (in either
programmer time or execution time) solution is to write it out explicitly and
completely rather than compute the answer at runtime with an algorithm.
Perfect Hash Functions
The above solution works fine, but it pays a cost for its genericity. It uses
a full-fledged string hash algorithm, and lookups involve the full codepath for
hash table lookups (most notably hash collision resolution).
We can drop the genericity for a significant boost in speed if we were to use a
perfect hash function. A
perfect hash function is a specially constructed hash function on some set $S$
of values such that each value in the set maps to a different hash output,
without collisions. It is important to note that we only care about its behavior
for inputs in the set $S$, with a complete disregard for other inputs.
A minimal perfect hash function is one that also maps the inputs to a dense
range of integers $[0, 1, \dots, |S|-1]$. This can be very useful because you
can then directly use the hash function output to index a lookup table. This
effectively creates a hash table that maps set $S$ to anything you want.
However, strict minimality is not necessary for this as long as you are okay
with wasting some of the space in your lookup table.
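As a toy illustration (my own example, not tied to the problem): for the set $S = \{10, 20, 30\}$ the function $h(x) = x/10 - 1$ is a minimal perfect hash, so its output can index a lookup table directly with no wasted slots:

```python
# Toy minimal perfect hash on S = {10, 20, 30}: h maps S exactly onto 0, 1, 2.
S = [10, 20, 30]
h = lambda x: x // 10 - 1
lut = ["ten", "twenty", "thirty"]  # one slot per element of S, no waste
values = [lut[h(x)] for x in S]
```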
There are fully generic methods for constructing (minimal) perfect hash
functions, such as the “Hash, displace and compress” algorithm by Belazzougui
et al., which is implemented in the phf crate.
However, they tend to use lookup tables to construct the hash itself. For small
inputs where speed and size is absolutely critical I’ve had good success just
trying stuff. This might sound vague—because it is—so let me walk you
through some examples.
Reading the input
As a bit of a hack we can note that each line of our input from the Advent of
Code consists of exactly four bytes: one letter for our opponent’s choice, a
space, our choice, and a newline byte. So instead of dealing with strings we
can simply read our input as a `u32`, which simplifies the hash construction
immensely.
For example, consulting the ASCII table
we find that `A` has ASCII code `0x41`, the space has code `0x20`, `X` has code
`0x58` and the newline symbol has code `0x0a`, so the input `"A X\n"` can also
simply be viewed as the integer `0x0a582041` if you are on a
little-endian machine. If you are
confused why `0x41` is in the last position, remember that we humans write numbers with the
least significant digit on the right as a convention.
Note that on a big-endian machine the order of bytes in a `u32` is flipped, so
reading those four bytes into an integer would result in the value `0x4120580a`.
Calling `u32::from_le_bytes` converts four bytes assumed to be little-endian to
the native integer representation by swapping the bytes on a big-endian machine
and doing nothing on a little-endian machine. Almost all modern CPUs are
little-endian however, so it’s generally a good idea to write your code such
that the little-endian path is fast and the big-endian path involves a
conversion step, if a conversion step cannot be avoided.
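The byte-order arithmetic can be double-checked in Python, using the `struct` module as a stand-in for Rust’s `u32::from_le_bytes` (my illustration):

```python
import struct

line = b"A X\n"  # the four bytes 0x41, 0x20, 0x58, 0x0a
(le,) = struct.unpack("<I", line)  # interpret as a little-endian u32
(be,) = struct.unpack(">I", line)  # big-endian interpretation, bytes flipped
# le == 0x0a582041 and be == 0x4120580a
```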
Doing this for all inputs gives us the following desired integer →
integer mapping:
Input LE u32 Answer
-------------------------------
A X 0xa582041 4
A Y 0xa592041 8
A Z 0xa5a2041 3
B X 0xa582042 1
B Y 0xa592042 5
B Z 0xa5a2042 9
C X 0xa582043 7
C Y 0xa592043 2
C Z 0xa5a2043 6
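The table above doesn’t have to be typed out by hand; a small Python sketch can generate the little-endian values directly from the input strings:

```python
lines = ["A X", "A Y", "A Z", "B X", "B Y", "B Z", "C X", "C Y", "C Z"]
# Append the newline byte and interpret the four bytes as a little-endian u32.
inputs = [int.from_bytes((line + "\n").encode(), "little") for line in lines]
# inputs[0] == 0x0a582041, matching the first row of the table
```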
Example constructions
When I said I just try stuff, I mean it. Let’s load our mapping into Python
and write a test:
inputs = [0xa582041, 0xa592041, 0xa5a2041, 0xa582042,
0xa592042, 0xa5a2042, 0xa582043, 0xa592043, 0xa5a2043]
answers = [4, 8, 3, 1, 5, 9, 7, 2, 6]
def is_phf(h, inputs):
return len({h(x) for x in inputs}) == len(inputs)
There are nine inputs, so perhaps we get lucky and get a minimal perfect
hash function right away:
>>> [x % 9 for x in inputs]
[0, 7, 5, 1, 8, 6, 2, 0, 7]
Alas, there are collisions. What if we don’t have to be absolutely minimal?
>>> next(m for m in range(9, 2**32)
... if is_phf(lambda x: x % m, inputs))
12
>>> [x % 12 for x in inputs]
[9, 1, 5, 10, 2, 6, 11, 3, 7]
That’s not too bad! Only three elements of wasted space. We can make our first
perfect hash table by placing the answers in the correct spots:
def make_lut(h, inputs, answers):
assert is_phf(h, inputs)
lut = [0] * (1 + max(h(x) for x in inputs))
for (x, ans) in zip(inputs, answers):
lut[h(x)] = ans
return lut
>>> make_lut(lambda x: x % 12, inputs, answers)
[0, 8, 5, 2, 0, 3, 9, 6, 0, 4, 1, 7]
Giving the simple mapping:
const LUT: [u8; 12] = [0, 8, 5, 2, 0, 3, 9, 6, 0, 4, 1, 7];
pub fn answer(x: u32) -> u8 {
LUT[(x % 12) as usize]
}
Compressing the table
We stopped here on the first modulus that works, which is honestly fine in this
case because only three bytes of wasted space is pretty good. But what if we
didn’t get so lucky? We have to keep looking. Even though modulo $m$ has
$[0, m)$ as its codomain, when applied to our set of inputs its image might
span a smaller subset. Let’s inspect some:
>>> [(m, max(x % m for x in inputs))
... for m in range(1, 30)
... if is_phf(lambda x: x % m, inputs)]
[(12, 11), (13, 11), (19, 18), (20, 19), (21, 17), (23, 22),
(24, 19), (25, 23), (26, 21), (27, 25), (28, 19), (29, 16)]
Unfortunately but also logically, there is an upwards trend of the maximum
index as you increase the modulus. But $13$ also seems promising, let’s take a look:
>>> [x % 13 for x in inputs]
[3, 6, 9, 4, 7, 10, 5, 8, 11]
>>> make_lut(lambda x: x % 13, inputs, answers)
[0, 0, 0, 4, 1, 7, 8, 5, 2, 3, 9, 6]
Well, well, well, aren’t we lucky? The first three indices are unused, so we can
shift all the others back and get a minimal perfect hash function!
const LUT: [u8; 9] = [4, 1, 7, 8, 5, 2, 3, 9, 6];
pub fn answer(x: u32) -> u8 {
LUT[(x % 13 - 3) as usize]
}
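A quick check in Python that this minimal table reproduces every answer, with the inputs, answers, and table taken from the listings above:

```python
inputs = [0xa582041, 0xa592041, 0xa5a2041, 0xa582042,
          0xa592042, 0xa5a2042, 0xa582043, 0xa592043, 0xa5a2043]
answers = [4, 8, 3, 1, 5, 9, 7, 2, 6]
LUT = [4, 1, 7, 8, 5, 2, 3, 9, 6]
# x % 13 lands in [3, 11] on our inputs, so subtracting 3 gives a dense index.
ok = all(LUT[x % 13 - 3] == a for x, a in zip(inputs, answers))
```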
In my experience with creating a bunch of similar mappings in the past, you’d
be surprised how often you get lucky, as long as your mapping isn’t too
large. As you add more ‘things to try’ to your toolbox, you also have more
opportunities to get lucky.
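One more ‘thing to try’ (my own sketch, in the spirit of the `phf_shift` function from the introduction) is the multiply-shift family: multiply by a constant and keep only the top bits as the index. Random search tends to find a working constant surprisingly quickly:

```python
import random

inputs = [0xa582041, 0xa592041, 0xa5a2041, 0xa582042,
          0xa592042, 0xa5a2042, 0xa582043, 0xa592043, 0xa5a2043]

def find_mul_shift(inputs, shift=27, tries=100_000):
    # Look for a 32-bit multiplier m such that (x * m) >> shift is
    # collision-free on `inputs` (shift=27 keeps a 5-bit index: 32 slots).
    for _ in range(tries):
        m = random.getrandbits(32)
        if len({((x * m) & 0xFFFFFFFF) >> shift for x in inputs}) == len(inputs):
            return m
    return None

m = find_mul_shift(inputs)
```

With 9 inputs spread over 32 slots a decent fraction of random multipliers works, so the search usually succeeds within a few dozen tries.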
Fixing near-misses
Another thing to try is fixing near-misses. For example, let’s take another look
at our original naive attempt:
>>> [x % 9 for x in inputs]
[0, 7, 5, 1, 8, 6, 2, 0, 7]
Only the l