Comparing two numbers should be easy right? Maybe it should, yet it’s not the case in C++ even if we constrain the comparison to the domain of integral numbers.
If you try to compare a signed with an unsigned integer there are several possible outcomes. It might actually work and you will never know what you risked. Maybe it will not work as you expected and you’ll spend quite some time scratching your head about what just happened. It’s also possible that it will not work according to your expectation but it will go unnoticed.
Another option is that you get a compiler warning either by specifically turning on -Wsign-compare
or -Wextra
. If it’s combined with -Werror
, the compilation will even break and you must fix it. In this and the coming two articles, we are going to talk about integer comparisons.
Let’s start by checking a bit deeper what can go wrong and then we walk through in detail the C++20 solution.
The problem of comparing a signed with an unsigned
What are integral types in C++?
There are quite a few of them: bool
, char
, char8_t
, char16_t
, char32_t
, short
, int
, long
, long long
, and the unsigned versions of them. Let’s put aside that we can also cv qualify them and that there are a big bunch of implementation-defined types such as uint32_t
et al.
Even though these are all integrals, bool
and char
types are not meant to store numbers. We are limiting our focus to short
, int
, long
, long long
and their unsigned versions.
If you take the signed and unsigned versions of the same type, they are going to occupy the same size in memory. For example, both a short
and an unsigned short
will need 2 bytes. 2 bytes give us 2**16 possibilities, 2**16 different values to store. Therefore the range of a short is from -2**8 to 2**8-1, while the range of an unsigned short
is from 0 to 2**16-1.
The bitwise representations of -1 and 4294967295 are the same, but depending on how those two bytes are flagged, their interpretation will be different. Unsigned integers can only carry non-negative numbers, therefore there is no need to reserve a bit for the sign. The least significant bit represents 2^0, the next one 2^1, then 2^2 and so on.
On the other hand, a signed integer can hold both negative and positive numbers. To be able to represent all of them, it uses the two’s complement form.
- The most significant bit represents the sign (0 for positive, 1 for negative numbers)
- To convert a positive value to its negative counterpart, you invert all the bits and then add 1
But what does this mean in practice?
If you try to interpret -1 as an unsigned int
– assuming a 4-byte size – the result will be something big, in this case 4294967295. -1 is represented as 111111111’111111111’111111111’111111111 in memory. The first byte shows that we deal with a negative value, then we have to negate everything bitwise and subtract -1 to get the value it stores. By negating bitwise we get 0 and if we subtract -1 we are at -1.
But if 111111111'111111111'111111111'111111111
is treated as an unsigned number we simply get the biggest possible positive number that can be represented on 32 bits.
This also means that big enough unsigned numbers cannot be represented in 2’s complement form given that the size of the variable stays the same. It’s straightforward given that there is one bit reserved to store the sign of the stored number.
This difference in representation makes it potentially unsafe to compare signed and unsigned numbers to each other.
To put in code the above, unless you use -Werror
the below will compile. If you don’t use -Wextra
or -Wsign-compare
you won’t even get a warning.
1 2 3 4 5 6 7 8 9 |
int main() { static_assert(static_cast<unsigned int>(-1) > 42); constexpr int n = -1; constexpr size_t m = 42; static_assert(n > m); return 0; } |
The modern solution
C++20 introduced the so-called “intcmp” functions in the
header to provide a safe way to compare signed and unsigned integers and at the same time get mathematically reasonable results. In other words, they will treat -1
smaller than any non-negative number.
First, let’s see what are the available functions and what are their meaning:
Function | Meaning |
---|---|
std::cmp_equal(n, m) | n == m |
std::cmp_not_equal(n, m) | n != m |
std::cmp_less(n, m) | n < m |
std::cmp_greater(n, m) | n > m |
std::cmp_less_equal(n, m) | n <= m |
std::cmp_greater_equal(n, m) | n >= m |
This means that when reading out the function name the relation such as “less” is always compared to the first first parameter. The first parameter is less than or greater than the second.
While these functions are only available since C++20, they are easy to backport as they require no new language features (C++17 suffices) and the reference implementation on C++ Reference is good enough.
Let me first put here the code and then let’s go through it.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 |
/* Template parameters T and U should be numbers that we can ensure both with concepts or with static assertions */ template<class T, class U> constex |
=td>
Read More