If I give a programmer a string such as "9223372036854775808" and I ask them to convert it to an integer, they might do the following in C++:
```cpp
#include <charconv>
#include <cstdint>
#include <string>

std::string s = ....;
uint64_t val;
auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), val);
if (ec != std::errc()) {} // I have an error !
// val holds the value
```
It is very fast: you can parse a sequence of random 32-bit integers at about 40 cycles per integer, using about 128 instructions.
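The snippet above only checks `ec`. Note that `std::from_chars` stops at the first non-digit character and still reports success, so input such as "12a" is accepted as 12. Here is a minimal sketch of a stricter wrapper that also requires the whole string to be consumed. The name `parse_u64` is hypothetical and not part of any library.

```cpp
#include <charconv>
#include <cstdint>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse all of `s` as an unsigned 64-bit integer.
// Returns std::errc() on success; leftover characters count as an error.
std::errc parse_u64(std::string_view s, uint64_t &out) {
  auto [ptr, ec] = std::from_chars(s.data(), s.data() + s.size(), out);
  if (ec != std::errc()) {
    // invalid_argument: no digits at the start of the input;
    // result_out_of_range: the digits do not fit in uint64_t.
    return ec;
  }
  if (ptr != s.data() + s.size()) {
    return std::errc::invalid_argument; // trailing non-digit characters
  }
  return std::errc();
}
```

With this wrapper, "9223372036854775808" (that is, 2^63) parses successfully, while "18446744073709551616" (one more than the largest 64-bit unsigned value) yields `std::errc::result_out_of_range`.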
Can you do better?
Recent Intel processors support AVX-512, a set of instructions that can process many bytes at once and use masking, so that you can operate on just a selected range of the data.
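To make "masking" concrete, here is a plain scalar model of a zero-masked byte load into a 256-bit register. It is patterned on the documented behavior of the `_mm256_maskz_loadu_epi8` intrinsic (AVX-512BW with AVX-512VL), but it is only an illustration that runs on any CPU, not the intrinsic itself. `masked_load_32` is a hypothetical name. Each of the 32 lanes receives the corresponding source byte when its mask bit is set and zero otherwise.

```cpp
#include <array>
#include <cstdint>

// Scalar model (hypothetical, not the real intrinsic) of a zero-masked load of
// 32 bytes: lane i gets src[i] if bit i of `mask` is set, else 0.
// Lanes whose bit is clear are never read from `src`.
std::array<uint8_t, 32> masked_load_32(uint32_t mask, const uint8_t *src) {
  std::array<uint8_t, 32> lanes{}; // value-initialized: all lanes start at 0
  for (int i = 0; i < 32; i++) {
    if ((mask >> i) & 1u) {
      lanes[i] = src[i];
    }
  }
  return lanes;
}
```

In hardware, memory for masked-out lanes is not accessed, so those lanes cannot trigger a fault. This property lets you load a short digit sequence without reading past the end of a buffer.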
I assume that you know where the sequence of digits begins and ends. The following code, written with AVX-512 intrinsics, proceeds as follows:
- We compute the span in bytes (digit_count).
- If we have more than 20 bytes, we know that the integer is too large to fit in a 64-bit integer.
- We compute a “mask”: a 32-bit value with only its most significant digit_count bits set to 1.
- We load an ASCII or UTF-8 string into a 256-bit register.
- We subtract the character value ‘0’ to get values between 0 and 9 (digit values).
- We check whether some value exceeds 9, in which case we had a non-digit character.
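The steps above can be modeled in plain scalar C++. This is a hypothetical sketch, not the author's AVX-512 code. `validate_digits` is an illustrative name, the 32-element array stands in for the 256-bit register, and the per-lane loop does the work that the intrinsics do in single instructions. A mask with only its most significant bits set implies that the digits occupy the top lanes, so the last digit lands in lane 31. Two choices here are assumptions rather than part of the original list. First, empty input is rejected. Second, only active lanes are checked against 9, because a zeroed lane would wrap around to a large value after subtracting '0'.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical scalar model of the listed steps. Returns the per-lane digit
// values (digits in the top digit_count lanes), or std::nullopt on error.
std::optional<std::array<uint8_t, 32>> validate_digits(const char *start,
                                                       const char *end) {
  size_t digit_count = size_t(end - start);
  // More than 20 digits cannot fit in a 64-bit unsigned integer.
  // Empty input is rejected too (it would also make the shift below undefined).
  if (digit_count == 0 || digit_count > 20) {
    return std::nullopt;
  }
  // Only the most significant digit_count bits are set.
  uint32_t mask = uint32_t(0xFFFFFFFF) << (32 - digit_count);
  size_t first_lane = 32 - digit_count;
  std::array<uint8_t, 32> lanes{}; // masked-out lanes stay zero
  for (size_t i = 0; i < 32; i++) {
    if ((mask >> i) & 1u) {
      // Subtract '0' with 8-bit wraparound: any byte below '0' becomes > 9.
      lanes[i] = uint8_t(uint8_t(start[i - first_lane]) - uint8_t('0'));
      if (lanes[i] > 9) {
        return std::nullopt; // non-digit character
      }
    }
  }
  return lanes;
}
```

For example, the 19-digit string "9223372036854775808" produces lanes 13 through 31 holding 9, 2, 2, ..., 8, with lanes 0 through 12 left at zero. The 20-digit maximum, "18446744073709551615", passes this check. Overflow among 20-digit inputs would still need to be detected during the conversion itself.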
size_t digit_count = size_t(end - start);
// if (digit_count > 20) { error ....}
const simd8x32 ASCII_ZERO = _mm256_set1_epi8('0');
const simd8x32 NINE = _mm256_set1_epi8(9);
uint32_t mask = uint32_t(0xFFFFFFFF) << (start - e