Aria Beingessner
March 16th, 2022
Phantomderp and I have both recently been very aligned on a particular subject: being extremely angry about C ABIs and trying to fix them. Where we’re not aligned is why we’re mad about them.
He’s trying to materially improve the conditions of using C itself as a programming language.
I’m trying to materially improve the conditions of using literally any language other than C.
Now you might reasonably ask: “what the fuck does your problem have to do with C?”
It wouldn’t if C was actually a programming language. Unfortunately, it’s not, and it hasn’t been for a long time. This isn’t about the fact that C is actually horribly ill-defined due to a billion implementations or its completely failed integer hierarchy.
That stuff sucks, but on its own that wouldn’t be my problem.
My problem is that C was elevated to a role of prestige and power, its reign so absolute and eternal that it has completely distorted the way we speak to each other. Rust and Swift cannot simply speak their native and comfortable tongues – they must instead wrap themselves in a grotesque simulacrum of C’s skin and make their flesh undulate in the same ways it does.
C is the lingua franca of programming. We must all speak C, and therefore C is not just a programming language anymore – it’s a protocol that every general-purpose programming language needs to speak.
So actually this kinda is about the whole “C is an inscrutable implementation-defined mess” thing. But only insofar as it makes this protocol we all have to use an even bigger nightmare!
Ok let’s get technical. You’ve finished designing your new language, Bappyscript, with first class support for Bappy Paws/Hooves/Fins. An amazing language that’s going to completely revolutionize the way that cats, sheep, and sharks program!
But now you need to actually make it do something useful. You know like, take user input, or write output, or literally anything observable? If you want programs written in your language to be good little citizens that work well with the major operating systems, you need to interact with the operating system’s interface. I hear that everything on Linux is “just a file”, so let’s open a file on Linux!
googles
OPEN(2)
NAME
open, openat, creat - open and possibly create a file
SYNOPSIS
#include <fcntl.h>
int open(const char *pathname, int flags);
int open(const char *pathname, int flags, mode_t mode);
int creat(const char *pathname, mode_t mode);
int openat(int dirfd, const char *pathname, int flags);
int openat(int dirfd, const char *pathname, int flags, mode_t mode);
/* Documented separately, in openat2(2): */
int openat2(int dirfd, const char *pathname,
const struct open_how *how, size_t size);
Feature Test Macro Requirements for glibc (see
feature_test_macros(7)):
openat():
Since glibc 2.10:
_POSIX_C_SOURCE >= 200809L
Before glibc 2.10:
_ATFILE_SOURCE
Um sorry what? This is Bappyscript, not C. Where’s the Bappyscript interface for Linux?
googles
What do you mean there’s no Bappyscript interface for Linux!? Ok well sure it’s a brand-new language, but you’re going to add one, right? Right…?
Fuck ok, I guess we have to use what they’ve given us.
We’re going to need some kind of Interface that lets our language call Functions that are Foreign to it. A Foreign Function Interface… FFI… yeah! I like the sound of that!
Oh hey there Rust, you have C FFI too? And you too Swift? Even Python?!
Everyone had to learn to speak C to talk to the major operating systems, and then when it came time to talk to each other we suddenly all already spoke C so… why not talk to each other in terms of C too?
Oops! Now C is the lingua franca of programming.
Oops! Now C isn’t just a programming language, it’s a protocol.
Ok so apparently basically every language has to learn to talk C. A language that is definitely very well-defined and not a mass hallucination.
What does “talking” C mean? It means getting descriptions of an interface’s types and functions in the form of a C header and somehow:
- matching the layouts of those types
- doing some stuff with linkers to resolve the function’s symbols as pointers
- calling those functions with the appropriate ABI (like putting args in the right registers)
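To make those three steps a bit more concrete, here is a minimal sketch using Python's standard `ctypes` module to call `open` from the C library. It assumes a Unix-like system where `ctypes.CDLL(None)` can see libc's symbols in the current process. The signatures are hand-copied from the man page rather than read from a header, and `c_open`, `c_close`, and `open_read_only` are names invented for this illustration. The real `open` is variadic, so declaring it with two fixed arguments is a simplification.

```python
import ctypes
import os

# Resolve symbols: ask the dynamic loader for whatever is already visible in
# this process (libc on a typical Linux system). use_errno=True makes ctypes
# keep a copy of C's errno that we can read after a call.
libc = ctypes.CDLL(None, use_errno=True)

# Match the types: hand-translate the C declarations.
# int open(const char *pathname, int flags);
c_open = libc.open
c_open.argtypes = [ctypes.c_char_p, ctypes.c_int]
c_open.restype = ctypes.c_int

# int close(int fd);
c_close = libc.close
c_close.argtypes = [ctypes.c_int]
c_close.restype = ctypes.c_int


def open_read_only(path):
    """Call C's open() and return the raw file descriptor."""
    # Call with the right ABI: ctypes (via libffi) places the arguments
    # according to the platform's C calling convention.
    fd = c_open(os.fsencode(path), os.O_RDONLY)
    if fd < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), path)
    return fd


fd = open_read_only(os.devnull)
c_close(fd)
```

Note that nothing here checked the hand-written `argtypes` against a real header. If they were wrong, the call would still happen, just with garbage.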
Well we’ve got a few problems here:
- You can’t actually write a C parser.
- C doesn’t actually have an ABI. Or even defined type layouts.
Yes, I am genuinely asserting that parsing C is basically impossible.
“But wait! There are lots of tools that read in C headers! Like rust-bindgen!”
Nope:
bindgen uses libclang to parse C and C++ header files. To modify how bindgen searches for libclang, see the clang-sys documentation. For more details on how bindgen uses libclang, see the bindgen users guide.
Anyone who spends much time trying to parse C(++) headers very quickly says “ah, actually, fuck that” and asks a C(++) compiler to do it. Keep in mind that meaningfully parsing a C header is more than just parsing: you need to resolve #includes, typedefs, and macros too! So now you need to implement all of the platform’s header resolution logic and somehow figure out what things are DEFINED in the environment you care about! Yikes!
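As a tiny illustration of why "what's DEFINED" matters, here is a toy sketch in Python that handles only `#ifdef`/`#else`/`#endif`. It is nowhere near a real preprocessor (no `#include`, no `#if` expressions, no macro expansion), and the header contents, the `BAPPY_64BIT` macro, and the `visible_lines` helper are all made up for this example. The point is just that the same header text declares different types depending on the environment.

```python
def visible_lines(header, defined):
    """Toy model of #ifdef/#else/#endif: return the lines a compiler would keep."""
    kept = []
    stack = []  # one bool per open #ifdef: is that branch currently active?
    for raw in header.splitlines():
        line = raw.strip()
        if line.startswith("#ifdef"):
            stack.append(line.split()[1] in defined)
        elif line.startswith("#else"):
            stack[-1] = not stack[-1]
        elif line.startswith("#endif"):
            stack.pop()
        elif line and all(stack):
            # Keep the line only if every enclosing branch is active.
            kept.append(line)
    return kept


# Hypothetical header: the typedef depends on a macro set by the build environment.
HEADER = """
#ifdef BAPPY_64BIT
typedef long bappy_size;
#else
typedef int bappy_size;
#endif
bappy_size bappy_len(const char *s);
"""

print(visible_lines(HEADER, {"BAPPY_64BIT"}))
# ['typedef long bappy_size;', 'bappy_size bappy_len(const char *s);']
print(visible_lines(HEADER, set()))
# ['typedef int bappy_size;', 'bappy_size bappy_len(const char *s);']
```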
Like let’s take the really extreme example of Swift. It has basically everything going for it in terms of C interop and resources:
It’s a language developed by Apple to effectively replace Objective-C as the primary language for defining and using system APIs on its own platforms. In doing so, it has (imo) taken the notion of ABI-stability and design further than anyone else.
It’s also one of the most hardcore FFI-supporting languages I’ve ever seen! It can natively import (Objective-)C(++) headers and will produce a nice and native Swift interface with types getting automagically “bridged” to their Swift equivalents at the boundary (often transparently due to the types having identical ABIs)!
Swift was also developed by many of the same people at Apple who built and maintain Clang and LLVM. Straight-up world-leading experts in C and its spawn. One of those people is Doug Gregor, let’s see what his opinion on C FFI is:
Ah, well fuck. Not even Swift has the stomach for this stuff.
(See also Jordan Rose’s and John McCall’s LLVM presentation on why Swift went with this approach)
So what do you do if you Absolutely Positively do not want to have a C compiler parsing and resolving headers at compile time?
You hand-translate those bad boys! int64_t? Write i64. long? Write… uhhhh… oh no.
What’s a long?
Ok well no big surprise here: the integer types in C that were designed to be wobbly-sized for “portability” are in fact wobbly-sized! Like ok we can punt on CHAR_BIT being weird, but that still doesn’t help us know the size and align of long!
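You can poke at the wobbliness yourself. This is a minimal sketch using Python's standard `ctypes` module, which reports whatever the C compiler that built your Python interpreter decided. The `c_int_layouts` helper is a name made up for this example, and the output is deliberately not shown because it depends on where you run it.

```python
import ctypes


def c_int_layouts():
    """(size, alignment) in bytes of a few C integer types on this platform."""
    types = {
        "int": ctypes.c_int,
        "long": ctypes.c_long,
        "long long": ctypes.c_longlong,
        "int64_t": ctypes.c_int64,
    }
    return {
        name: (ctypes.sizeof(t), ctypes.alignment(t))
        for name, t in types.items()
    }


layouts = c_int_layouts()
for name, (size, align) in layouts.items():
    print(f"{name}: size={size} align={align}")
```

On 64-bit Linux and macOS, `long` comes out as 8 bytes. On 64-bit Windows it comes out as 4. `int64_t` is 8 bytes everywhere, which is why that one was easy to translate.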
“But wait! There are standardized calling conventions and ABIs for each platform!”
There are! And they usually define the layouts of key primitives in C! (And some of them don’t just define the calling conventions in terms of C types! side-eyes AMD64 SysV)
Ok