nullprogram.com/blog/2023/10/08/
This has been a ground-breaking year for my C skills, and paradigm shifts
in my technique have provoked me to reconsider my habits and coding style.
It’s been my largest personal style change in years, so I’ve decided to
take a snapshot of its current state and my reasoning. These changes have
produced significant productive and organizational benefits, so while most
is certainly subjective, it likely includes a few objective improvements.
I’m not saying everyone should write C this way, and when I contribute
code to a project I follow their local style. This is about what works
well for me.
Primitive types
Starting with the fundamentals, I’ve been using short names for primitive
types. The resulting clarity was more than I had expected, and it’s made
my code more enjoyable to review. These names appear frequently throughout
a program, so conciseness pays. Also, now that I’ve gone without, _t
suffixes are more visually distracting than I had realized.
typedef uint8_t u8;
typedef char16_t c16;
typedef int32_t b32;
typedef int32_t i32;
typedef uint32_t u32;
typedef uint64_t u64;
typedef float f32;
typedef double f64;
typedef uintptr_t uptr;
typedef char byte;
typedef ptrdiff_t size;
typedef size_t usize;
Some people prefer an s prefix for signed types. I prefer i, plus as
you’ll see, I have other designs for s. For sizes, isize would be more
consistent, and wouldn’t hog the identifier, but signed sizes are the
way and so I want them in a place of privilege. usize is niche,
mainly for interacting with external interfaces where it might matter.
b32 is a “32-bit boolean” and communicates intent. I could use _Bool,
but I’d rather stick to a natural word size and stay away from its weird
semantics. To beginners it might seem like “wasting memory” by using a
32-bit boolean, but in practice that’s never the case. It’s either in a
register (return value, local variable) or would be padded anyway (struct
field). When it actually matters, I pack booleans into a flags variable,
and a 1-byte boolean is rarely important.
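As a rough sketch of the flag packing mentioned above: the flag names and the two helpers here are hypothetical, chosen only for illustration, and assume the b32 and u32 typedefs from earlier.

```c
#include <stdint.h>

typedef int32_t  b32;
typedef uint32_t u32;

// Hypothetical flag bits, one per boolean. The names are illustrative only.
enum {
    FLAG_VISIBLE = 1 << 0,
    FLAG_DIRTY   = 1 << 1,
    FLAG_LOCKED  = 1 << 2,
};

// Returns a 32-bit boolean: 1 if every bit in mask is set in flags, else 0.
static b32 hasflags(u32 flags, u32 mask)
{
    return (flags & mask) == mask;
}

// Returns flags with the bits in mask set (on != 0) or cleared (on == 0).
static u32 setflags(u32 flags, u32 mask, b32 on)
{
    return on ? flags | mask : flags & ~mask;
}
```

Three booleans occupy one u32 here, while each individual query still returns a word-sized b32.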
While UTF-16 might seem niche, it’s a necessary evil when dealing with
Win32, so c16 (“16-bit character”) has made a frequent appearance. I
could have based it on uint16_t, but putting the name char16_t in its
“type hierarchy” communicates to debuggers, particularly GDB, that for
display purposes these variables hold character data. Officially Win32
uses a type named wchar_t, but I like being explicit about UTF-16.
u8 is for octets, usually UTF-8 data. It’s distinct from byte, which
represents raw memory and is a special aliasing type. In theory these
can be distinct types with differing semantics, though I’m not aware of
any implementation that does so (yet?). For now it’s about intent.
What about systems that don’t support fixed width types? That’s academic,
and far too much time has been wasted worrying about it. That includes
time wasted on typing out int_fast32_t and similar nonsense. Virtually
no existing software would actually work correctly on such systems — I’m
certain nobody’s testing it after all — so it seems nobody else cares
either.
I don’t intend to use these names in isolation, such as in code snippets
(outside of this article). If I did, examples would require the typedefs
to give readers the complete context. That’s not worth extra explanation.
Even in the most recent articles I’ve used ptrdiff_t instead of size.
Macros
Next, my “standard” set of macros:
#define sizeof(x) (size)sizeof(x)
#define alignof(x) (size)_Alignof(x)
#define countof(a) (sizeof(a) / sizeof(*(a)))
#define lengthof(s) (countof(s) - 1)
While I still prefer ALL_CAPS for constants, I’ve adopted lowercase for
function-like macros because it’s nicer to read. They don’t have the same
namespace problems as other macro definitions: I can have a macro named
new() and also variables and fields named new because they don’t look
like function calls.
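A small sketch of these macros in use may help. The sum helper and the primes table are hypothetical, and alignof is left out for brevity. The headers come first so that the sizeof macro is only defined after they have been processed.

```c
#include <stddef.h>

typedef ptrdiff_t size;

#define sizeof(x)    (size)sizeof(x)
#define countof(a)   (sizeof(a) / sizeof(*(a)))
#define lengthof(s)  (countof(s) - 1)

// Hypothetical helper: sums count elements, with count as a signed size.
static int sum(int *vals, size count)
{
    int total = 0;
    for (size i = 0; i < count; i++) {
        total += vals[i];
    }
    return total;
}

// Hypothetical table, used only to demonstrate countof on an array.
static int primes[] = {2, 3, 5, 7, 11};
```

With these definitions, sum(primes, countof(primes)) visits all five elements, and lengthof("hello") excludes the null terminator. Because the results are signed, an expression such as countof(primes) - 6 is negative instead of wrapping around to a huge unsigned value.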
For GCC and Clang, my favorite assert macro now looks like this:
#define assert(c) while (!(c)) __builtin_unreachable()
It has useful properties beyond the usual benefits:
- It does not require separate definitions for debug and release builds.
  Instead it’s controlled by the presence of Undefined Behavior Sanitizer
  (UBSan), which is already present/absent in these circumstances. That
  includes fuzz testing.
- libubsan provides a diagnostic printout with a file and line number.
- In release builds it turns into a practical optimization hint.
To enable assertions in release builds, put UBSan in trap mode with
-fsanitize-trap and then enable at least -fsanitize=unreachable. In
theory this can also be done with -funreachable-traps, but as of this
writing it’s been broken for the past few GCC releases.
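To make the macro concrete, here is a minimal sketch of it guarding a precondition. The divide function is a hypothetical example, and the macro relies on the GCC/Clang builtin, so it is not portable to other compilers.

```c
#include <stdint.h>

typedef int32_t i32;

#define assert(c)  while (!(c)) __builtin_unreachable()

// Hypothetical example: integer division with a stated precondition.
static i32 divide(i32 num, i32 den)
{
    // With -fsanitize=unreachable a zero divisor is caught here.
    // Without it, the compiler may assume den is never zero.
    assert(den != 0);
    return num / den;
}
```

Calls that satisfy the precondition behave identically in both kinds of build. Only a violated assertion distinguishes them.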
Parameters and functions
No const. It serves no practical role in optimization, and I cannot
recall an instance where it caught, or would have caught, a mistake. I
held out for a while as prototype documentation, but on reflection I found
that good parameter names were sufficient. Dropping const has made me
noticeably more productive by reducing cognitive load and eliminating
visual clutter. I now believe its inclusion in C was a costly mistake.
(One small exception: I still like it as a hint to place static tables in
read-only memory closer to the code. I’ll cast away the const if needed.
This is only of minor importance.)
Literal 0 for null pointers. Short and sweet. This is not new, but a
style I’ve used for about 7 years now, and it has appeared all over my
writing since. There are some theoretical edge cases where it may cause
defects, and lots of ink has been spilled on the subject, but
after a couple hundred thousand lines of code I’ve yet to see it happen.
restrict when necessary, but better to organize code so that it’s not,
e.g. don’t write to “out” parameters in loops, or don’t use out parameters
at all (more on that momentarily). I don’t bother with inline because I
compile everything as one translation unit anyway.
typedef all structures. I used to shy away from it, but eliminating the
struct keyword makes code easier to read. If it’s a recursive structure,
use a forward declaration immediately above so that such fields can use
the short name:
typedef struct map map;
struct map {
    map *child[4];
    // ...
};
Declare all functions static except for entry points. Again, with
everything compiled as a single translation unit there’s no reason to do
otherwise. It was probably a mistake for C not to default to static,
though I don’t have a strong opinion on the matter. With the clutter
eliminated through short types, no const, no struct, etc. functions
fit comfortably on the same line as their return type. I used to break
them apart so that the function name began on its own line, but that’s no
longer necessary.
In my writing I sometimes omit static to simplify, and because outside
the context of a complete program it’s mostly irrelevant. However, I will
use it below to emphasize this style.
For a while I capitalized type names as that effectively put them in a kind
of namespace apart from variables and functions, but I eventually stopped.
I may try this idea in a different way in the future.
Strings
One of my most productive changes this year has been the total rejection
of null terminated strings — another of those terrible mistakes — and the
embrace of this basic string type:
#define s8(s) (s8){(u8 *)s, lengthof(s)}
typedef struct {
    u8  *data;
    size len;
} s8;
I’ve used a few names for it, but this is my favorite. The s is for
string, and the 8 is for UTF-8 or u8. The s8 macro (sometimes just
spelled S) wraps a C string literal, making an s8 string out of it. An
s8 is handled like a fat pointer, passed and returned by copy.
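To illustrate the pass-and-return-by-copy style, here is a self-contained sketch. It repeats the typedefs and macros from earlier so that it compiles on its own. The two helper functions are hypothetical illustrations written for this example.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t   u8;
typedef int32_t   b32;
typedef ptrdiff_t size;

#define sizeof(x)    (size)sizeof(x)
#define countof(a)   (sizeof(a) / sizeof(*(a)))
#define lengthof(s)  (countof(s) - 1)

#define s8(s) (s8){(u8 *)s, lengthof(s)}

typedef struct {
    u8  *data;
    size len;
} s8;

// Hypothetical helper: returns 1 if both strings hold the same bytes.
static b32 s8equals(s8 a, s8 b)
{
    if (a.len != b.len) {
        return 0;
    }
    for (size i = 0; i < a.len; i++) {
        if (a.data[i] != b.data[i]) {
            return 0;
        }
    }
    return 1;
}

// Hypothetical helper: drops up to n leading bytes and returns the rest.
// The argument is a copy, so the caller's string is left unchanged.
static s8 s8skip(s8 s, size n)
{
    if (n > s.len) {
        n = s.len;
    }
    s.data += n;
    s.len  -= n;
    return s;
}
```

For example, s8skip(s8("hello world"), 6) yields a view of the final five bytes without copying or modifying the underlying literal, and the result can be compared with s8equals against s8("world").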
s8 makes for a great function prefix, unlike str, all of which are
reserved. Some exa