Written by Dale Weiler
Last updated Saturday, January 1, 2022
The need for signed integer arithmetic is often misplaced as most integers
never represent negative values within a program. The indexing of arrays and
the iteration count of a loop reflect this as well. There should be a
propensity to use unsigned integers more often than signed, yet despite this,
most code incorrectly chooses to use signed integers almost exclusively.
Most of the motivation for this article applies to C and C++, but examples for
other languages such as Go, Rust and Odin will also be presented to establish
that the concept is intrinsic to the arithmetic itself and applies to all
languages, regardless of their design choices (for instance, C and C++ leave
signed integer wrap undefined).
The arguments against unsigned
There are a lot of arguments against the use of unsigned integers. Let me
explain why I think they’re mostly incorrect.
The safety argument
The most typical argument against the use of unsigned integers is that it’s
more error prone since it’s far easier for an expression to underflow than it
is to overflow. This advice is so common that the official Google C++ Style
Guide outright discourages the use of unsigned types.
We’ll see in the following arguments where these safety issues come from and how
to easily avoid them with trivial idioms that are easier to understand than
using signed everywhere. We’ll also see that these arguments are incorrect most
of the time as they encourage continuing to write and use unsafe code.
The loop in reverse argument
When the counter of a for loop needs to count in reverse and the body of
the loop needs to execute when the counter is also zero, most programmers will
find unsigned difficult to use because i >= 0 will always evaluate true.
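To make the problem concrete, here is a minimal sketch of the naive loop. The helper name naive_reverse_steps and the max_steps cutoff are assumptions added purely so the demonstration terminates; without the cutoff the loop would spin forever, and most compilers will warn that the comparison is always true.

```cpp
#include <cstddef>

// Hypothetical helper: runs the naive `i >= 0` reverse loop, but gives up
// after `max_steps` iterations (max_steps must be at least 1).
std::size_t naive_reverse_steps(std::size_t size, std::size_t max_steps) {
    std::size_t steps = 0;
    // An unsigned value is never negative, so `i >= 0` is always true.
    for (std::size_t i = size - 1; i >= 0; i--) {
        if (++steps == max_steps) {
            break; // only this cutoff ends the loop
        }
    }
    return steps;
}
```

For a size of 3 and a cutoff of 10, the function reports 10 steps rather than 3: after visiting index 0 the counter wraps to the maximum value and keeps going.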
The temptation is to cast the unsigned value to a signed one, e.g.:
for (int64_t i = int64_t(size) - 1; i >= 0; i--) {
// ...
}
Of course this is dangerous, as it’s a narrowing conversion hidden behind a
cast which silences a legitimate warning. In C and C++ it invokes undefined
behavior when given specific large values, and it is most certainly
exploitable. Most applications would just crash on inputs greater than
0x7fffffffffffffff. The typical argument is that such a value would be
“pathological”. Not only is this argument incorrect, it’s even more dangerous,
as we will see later.
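As a rough illustration of what the cast does to large sizes, the sketch below checks whether the signed loop would run at all. The helper name signed_cast_loop_runs is hypothetical, and the code assumes the conversion wraps modulo 2^64, which C++20 guarantees and which earlier standards leave implementation-defined.

```cpp
#include <cstdint>

// Hypothetical helper: reports whether the signed-cast reverse loop would
// execute its body at least once for the given unsigned size.
bool signed_cast_loop_runs(std::uint64_t size) {
    // Assumption: out-of-range values wrap modulo 2^64 (guaranteed since
    // C++20, implementation-defined before that).
    const std::int64_t as_signed = static_cast<std::int64_t>(size);
    // Comparing against 1 instead of computing `as_signed - 1` avoids the
    // signed overflow that the original expression hits when the cast
    // yields the most negative int64_t value.
    return as_signed >= 1;
}
```

Under that assumption, a size of 3 runs the loop as expected, while any size above 0x7fffffffffffffff turns negative after the cast and the loop body is silently skipped.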
This danger is one of the supporting arguments behind always using signed
integer arithmetic. The argument is incorrect, though, because int64_t can
never hold a value greater than 0x7fffffffffffffff. It only avoids the issue in
that the problematic numeric range above that limit is no longer representable.
Tough luck if you needed a value that large. And if you followed the sage
advice of Google to always use signed and still ended up with that large value,
you now have a significantly worse problem, as you have unconditionally invoked
signed overflow. In languages like C and C++ that is undefined behavior, while
languages like Go and Odin will wrap and produce the wrong numeric ranges in
the loop as a result.
The correct approach relies on the fact that unsigned underflow is well-defined
in C and C++. We should be teaching the behavior of wrapping arithmetic, as
it’s useful in general, and it also makes reverse iteration as easy as forward
iteration.
for (size_t i = size - 1; i < size; i--) {
// ...
}
The approach here is to begin from size - 1 and count down on each iteration.
When the counter reaches zero, the decrement causes the counter to underflow and
wrap around to the maximum value of the unsigned type. That value can never be
less than size, so the condition i < size evaluates false and the loop stops.
The same wrap handles an empty range: when size is zero, size - 1 is already
the maximum value and the body never runs.
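The wrapping loop can be checked with a small sketch that records every index it visits. The helper name reverse_indices is an assumption made for illustration and is not part of the original text.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: collects the indices visited by the wrapping
// reverse loop, in the order they are visited.
std::vector<std::size_t> reverse_indices(std::size_t size) {
    std::vector<std::size_t> visited;
    // Starts at size - 1. Decrementing past zero wraps to the maximum
    // size_t value, which fails `i < size` and ends the loop.
    for (std::size_t i = size - 1; i < size; i--) {
        visited.push_back(i);
    }
    return visited;
}
```

A size of 3 visits 2, 1, 0 in that order, and a size of 0 visits nothing, since the starting value is already the wrapped maximum.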
Languages like Rust chose to make even unsigned underflow panic in debug
builds, but specific features like Range will let you safely achieve the same
efficient reverse iteration with much cleaner syntax.
for i in (0..size).rev() {
// ...
}
With this approach, no casts are needed, no silent bugs are introduced, and
the “pathological” input still works correctly. In fact, this form permits eve