A few months ago, I set myself the challenge of writing a C compiler in 500 lines of Python, after writing my SDF donut post.
How hard could it be?
The answer was, pretty hard, even when dropping quite a few features.
But it was also pretty interesting, and the result is surprisingly functional and not too hard to understand!
There’s too much code for me to comprehensively cover in a single blog post, so I’ll just give an overview of the decisions I made, things I had to cut, and the general architecture of the compiler, touching on a representative piece of each part.
Hopefully after reading this post, the code is more approachable!

Decisions, decisions
The first, and most critical decision, was that this would be a single-pass compiler.
500 lines is too spare to be defining and transforming an abstract syntax tree!
What does that mean?

Most compilers: faffing around with syntax trees
Well, most compilers’ internals look something like this:

The tokens get lexed, then a parser runs over them and builds pretty little syntax trees:
```python
# hypothetical code, not from anywhere
def parse_statement(lexer) -> PrettyLittleSyntaxTree:
    ...
    if type := lexer.try_next(TYPE_NAME):
        variable_name = lexer.next(IDENTIFIER)

        if lexer.try_next("="):
            initializer = parse_initializer(lexer)
        else:
            initializer = None

        lexer.next(SEMICOLON)
        return VariableDeclarationNode(
            type=type,
            name=variable_name,
            initializer=initializer,
        )
    ...

# much later...
def emit_code_for(node: PrettyLittleSyntaxTree) -> DisgustingMachineCode:
    ...
    if isinstance(node, VariableDeclarationNode):
        slot = reserve_stack_space(node.type.sizeof())
        add_to_environment(node.name, slot)

        if node.initializer is not None:
            register = emit_code_for(node.initializer)
            emit(f"mov {register}, [{slot}]")
    ...
```
The important thing here is that there are two passes: first the parsing builds up a syntax tree, then a second pass chews that tree up and turns it into machine code.
That’s really useful for most compilers!
It keeps the parsing and codegen separate, so each can evolve independently.
It also means that you can transform the syntax tree before using it to generate code—for example, by applying optimizations to it.
In fact, most compilers have multiple levels of “intermediate representations” between the syntax tree and codegen!
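To make that concrete, here’s what a tiny optimization pass over the tree might look like. This is purely hypothetical, in the same imagined style as the code above (`ConstantNode` and `AddNode` are made up for illustration):

```python
from dataclasses import dataclass

@dataclass
class ConstantNode:
    value: int

@dataclass
class AddNode:
    lhs: object
    rhs: object

def fold_constants(node):
    # collapse constant additions: AddNode(1, 2) becomes ConstantNode(3)
    if (isinstance(node, AddNode)
            and isinstance(node.lhs, ConstantNode)
            and isinstance(node.rhs, ConstantNode)):
        return ConstantNode(node.lhs.value + node.rhs.value)
    return node
```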
This is really great, good engineering, best practices, recommended by experts, etc.
But… it takes too much code, so we can’t do it.
Instead, we’ll be single-pass: code generation happens during parsing.
We parse a bit, emit some code, parse a bit more, emit a bit more code.
So for example, here’s some real code from the `c500` compiler for parsing the prefix `~` op:
```python
# lexer.try_next() checks if the next token is ~, and if so, consumes
# and returns it (truthy)
elif lexer.try_next("~"):
    # prefix() parses and generates code for the expression after the ~,
    # and load_result emits code to load it, if needed
    meta = load_result(prefix())
    # immediately start yeeting out the negation code!
    emit("i32.const 0xffffffff")
    emit("i32.xor")
    # webassembly only supports 32bit types, so if this is a smaller type,
    # mask it down
    mask_to_sizeof(meta.type)
    # return type information
    return meta
```
Notice there are no syntax trees, no `PrefixNegateOp` nodes.
We see some tokens and immediately spit out the corresponding instructions.
You may have noticed those instructions are WebAssembly, which leads us into the next section…

Using WebAssembly, for some reason?
So I decided to make the compiler target WebAssembly.
I honestly don’t know why I did this; it really didn’t make things easier—I guess I was just curious?
WebAssembly is a really weird target, especially for C.
Besides the somewhat-external issues, like spending a lot of time confused before I realized WebAssembly v2 is pretty different from WebAssembly v1, the instruction set itself is weird.
For one, there’s no goto.
Instead, you have blocks—structured assembly, imagine that!—and “break” instructions that jump to either the beginning or end of a specific nesting-level of block.
This was basically inconsequential for `if` and `while`, but made implementing `for` extremely cursed, which we’ll go over later.
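To give a flavor of what that structured branching looks like, here’s a hand-written sketch (not c500 output) of how a `while` loop can lower to blocks and branches:

```wasm
block      ;; branching to this block = exiting the loop
  loop     ;; branching to a loop = jumping back to its start
    ;; ...code for the condition, leaving an i32 on the stack...
    i32.eqz
    br_if 1  ;; condition false: branch out of the enclosing block
    ;; ...code for the loop body...
    br 0     ;; jump back to the top of the loop
  end
end
```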
Additionally, WebAssembly doesn’t have registers; it has a stack, and is a stack machine.
At first you might think that’s awesome, right?
C needs a stack!
We can just use the WebAssembly stack as our C stack!
Nope, because you can’t take references to the WebAssembly stack.
So instead, we need to maintain our own in-memory stack anyways, and then shuffle it on and off of the WASM parameter stack.
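Concretely, even writing to a C local means computing an address into that in-memory stack. A sketch of the kind of code this produces (the offset and value here are made up):

```wasm
;; store 42 into a C local living at offset 8 in the current frame
global.get $__stack_pointer  ;; base of the in-memory C stack frame
i32.const 8
i32.add                      ;; address of the local
i32.const 42                 ;; the value to store
i32.store                    ;; pops value and address, writes to memory
```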
So in the end, I think I ended up with slightly more code than I would have needed to target a more normal ISA like x86 or ARM.
But it was interesting!
And theoretically, you could run code compiled with `c500` in a browser, although I haven’t tried (I just use the `wasmer` CLI).

Error handling
It basically doesn’t.
There’s a function `die`, which is called when anything weird happens and dumps a compiler stack trace—if you’re lucky, you get a line number and a somewhat-vague error message.
```
------------------------------
  File "...compiler.py", line 835, in <module>
    compile("".join(fi)) # todo: make this line-at-a-time?
  File "...compiler.py", line 823, in compile
    global_declaration(global_frame, lexer)
  File "...compiler.py", line 417, in value
    var, offset = frame.get_var_and_offset(varname)
  File "...compiler.py", line 334, in get_var_and_offset
    return self.parent.get_var_and_offset(name)
  File "...compiler.py", line 336, in get_var_and_offset
    die(f"unknown variable {n}", None if isinstance(name, str) else name.line)
  File "...compiler.py", line 14, in die
    traceback.print_stack()
------------------------------
error on line 9: unknown variable c
```
The Rust compiler, this is not :-)
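For completeness, `die` itself is only a few lines. The exact body is an assumption, but judging by the trace and output above, it’s something like:

```python
import sys
import traceback

def die(message: str, line: int | None = None):
    traceback.print_stack()  # the compiler stack trace shown above
    where = f" on line {line}" if line is not None else ""
    print(f"error{where}: {message}", file=sys.stderr)
    sys.exit(1)
```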

What to drop
Finally, I had to decide what not to support, since it just wasn’t feasible to get all of C into 500 lines. (sorry!)
I decided I wanted a really decent sampling of features that tested what the general implementation approach was capable of—for example, if I had skipped pointers, I could have just gotten away with the WASM parameter stack and shed a lot of complexity, but that would have felt like cheating.
I ended up implementing the following features:
- arithmetic operations and binary operators, with proper precedence
- `int`, `short`, and `char` types
- string constants (with escapes)
- pointers (of however many levels), including correct pointer arithmetic (incrementing an `int*` adds 4)
- arrays (only single-level, not `int[][]`)
- functions
- typedefs (and the lexer hack!)
Notably, it doesn’t support:
- structs :-( would be possible with more code, the fundamentals were there, I just couldn’t squeeze it in
- enums / unions
- preprocessor directives (this would probably be 500 lines by itself…)
- floating point. would also be possible, the `wasm_type` stuff is in, again just couldn’t squeeze it in
- 8 byte types (`long` / `long long` or `double`)
- some other small things like pre/post increments, in-place initialization, etc., which just didn’t quite fit
- any sort of standard library or I/O that isn’t returning an integer from `main()`
- casting expressions
The compiler passes 34/220 test cases in the c-testsuite.
More importantly to me, it can compile and run the following program successfully:
```c
int swap(int* a, int* b) {
    int t;
    t = *a; *a = *b; *b = t;
    return t;
}

int fib(int n) {
    int a, b;
    for (a = b = 1; n > 2; n = n - 1) {
        swap(&a, &b);
        b = b + a;
    }
    return b;
}

int main() {
    return fib(10); // 55
}
```
OK, enough about deciding things, let’s get into the code!

Helper types
There’s a small collection of helper types and classes that the compiler uses.
None of them are particularly strange, so I’ll pass over them fairly quickly.

Emitter (compiler.py:21)
This is a singleton helper to emit nicely-formatted WebAssembly code.
WebAssembly, at least the textual format, is formatted as s-expressions, but individual instructions don’t need to be parenthesized:
```wasm
(module
  ;; ...
  (func $swap
    (param $a i32)
    (param $b i32)
    (result i32)
    global.get $__stack_pointer ;; prelude -- adjust stack pointer
    i32.const 12
    i32.sub
    ;; ...
  )
)
```
`Emitter` just helps with emitting code with nice indentation so it’s easier to read. It also has a `no_emit` method, which will be used for an ugly hack later—stay tuned!
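As a rough sketch of the idea (the field and method names here are assumed, not lifted from compiler.py):

```python
# a minimal indentation-tracking emitter, not the real one
class Emitter:
    def __init__(self):
        self.lines: list[str] = []
        self.indent = 0
        self.muted = False  # the kind of switch a no_emit method could flip

    def emit(self, line: str):
        if not self.muted:
            self.lines.append("  " * self.indent + line)
```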

StringPool (compiler.py:53)
`StringPool` holds all the string constants so they can be arranged in a contiguous region of memory, and hands out addresses into that region for the codegen to use. When you write `char *s = "abc"` in `c500`, what really happens is:
- `StringPool` appends a null terminator
- `StringPool` checks if it’s already stored `"abc"`, and if so, just hands that address back
- otherwise, `StringPool` adds it to a dictionary along with the base address + the total byte length stored so far—the address of this new string in the pool
- `StringPool` hands that address back
- when all the code is finished compiling, we create an `rodata` section with the giant concatenated string produced by `StringPool`, stored at the string pool base address (retroactively making all the addresses `StringPool` handed out valid)
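In code, the core of that bookkeeping might look something like this sketch (the names and base address are assumptions, not the real compiler.py):

```python
class StringPool:
    BASE = 0x1000  # assumed base address for the pooled strings

    def __init__(self):
        self.addrs: dict[bytes, int] = {}  # string -> address in the pool
        self.size = 0  # total bytes stored so far

    def add(self, s: str) -> int:
        data = s.encode() + b"\0"   # append the null terminator
        if data in self.addrs:      # already stored? hand the address back
            return self.addrs[data]
        addr = StringPool.BASE + self.size
        self.addrs[data] = addr
        self.size += len(data)
        return addr

    def rodata(self) -> bytes:
        # the giant concatenated blob, placed at BASE once compilation is done
        return b"".join(self.addrs)  # dicts preserve insertion order
```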

Lexer (compiler.py:98)
The `Lexer` class is complex, because lexing C is complex (`(\\([\\abfnrtv'"?]|[0-7]{1,3}|x[A-Fa-f0-9]{1,2}))` is a real regex in that code, for character escapes), but conceptually simple: the lexer marches along identifying what the token at the current position is. The caller can peek that token, or it can use `next` to tell the lexer to advance, “consuming” that token. It can also use `try_next` to conditionally advance only if the next token is a certain kind—basically, `try_next` is a shortcut for `if self.peek().kind == token: return self.next()`.
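Spelled out as a method, that shortcut is roughly (a sketch based on the description above, not the exact source):

```python
def try_next(self, kind):
    # consume and return the next token only if it matches the expected kind
    if self.peek().kind == kind:
        return self.next()
    return None
```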
There’s some additional complexity because of something called the “lexer hack”.
Essentially, when parsing C you want to know if something is a type name or a variable name (because that context matters for compiling certain expressions), but there’s no syntactic distinction between them: `int int_t = 0;` is perfectly valid C, as is `typedef int int_t; int_t x = 0;`.
To know whether an arbitrary token `int_t` is a type name or a variable name, we need to feed type information from the parsing/codegen stage back into the lexer.
This is a giant pain for regular compilers that want to keep their lexer, parser, and codegen modules pure and platonically separate, but it’s actually not very hard for us!
I’ll explain it more when we get to the `typedef` section, but basically we just keep `types: set[str]` in `Lexer`, and when lexing, check if a token is in that set before giving it a token kind:

```python
if m := re.match(r"^[a-zA-Z_][a-zA-Z0-9_]*", self.src[self.loc :]):
    tok = m.group(0)
    ...
    # lexer hack
    return Token(TOK_TYPE if tok in self.types else TOK_NAME, tok, self.line)
```

CType (compiler.py:201)
This is just a dataclass for representing information about a C type, like you’d write in `int **t` or `short t[5]` or `char **t[17]`, minus the `t`.
It contains:
- the type’s name (with any typedefs resolved), such as `int` or `short`
- what level of pointer it is (`0` = not a pointer, `1` = `int *t`, `2` = `int **t`, and so on)
- what the array size is (`None` = not an array, `0` = `int t[0]`, `1` = `int t[1]`, and so on)
Notably, as mentioned before, this type only supports single-level arrays, and not nested arrays like `int t[5][6]`.
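Put together, a sketch of that shape (the field names are assumptions for illustration):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CType:
    typename: str              # e.g. "int", with typedefs already resolved
    pointer_level: int = 0     # 0 = not a pointer, 1 = int *t, 2 = int **t
    array_size: Optional[int] = None  # None = not an array, else the length

# `char **t[17]`, minus the `t`:
c = CType("char", pointer_level=2, array_size=17)
```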

FrameVar and StackFrame (compiler.py:314)
These classes handle our C stack frames.
As I mentioned before, because you can’t take references to the WASM stack, we have to manually handle the C stack; we can’t use the WASM one.
To set up the C stack, the prelude emitted in `__main__` sets up a global `__stack_pointer` variable, and then every function call decrements that by however much space the function needs for its parameters and local variables—calculated by that function’s `StackFrame` instance.
I’ll go over how that calculation works in more detail when we get to parsing functions, but essentially, each parameter and local variable gets a slot in that stack space, and increases `StackFrame.frame_size` (and thus the offset of the next variable) depending on its size. The offset, type information, and other data for each parameter and local variable are stored in a `FrameVar` instance, in `StackFrame.variables`, in order of declaration.
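The slot-assignment bookkeeping boils down to something like this sketch (everything beyond `frame_size` and `variables` is an assumed name, and `sizeof()` is assumed to live on the type, as in the hypothetical code earlier):

```python
from dataclasses import dataclass

@dataclass
class FrameVar:
    type: object  # the variable's CType
    offset: int   # where this variable's slot starts in the frame

class StackFrame:
    def __init__(self, parent=None):
        self.parent = parent
        self.variables: dict[str, FrameVar] = {}  # in declaration order
        self.frame_size = 0

    def add_var(self, name: str, type) -> None:
        # each declaration gets the next slot, and grows the frame
        self.variables[name] = FrameVar(type, offset=self.frame_size)
        self.frame_size += type.sizeof()
```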

ExprMeta (compiler.py:344)
This final dataclass is used to track whether the result of an expression is a value or a place.
We need to keep track of this distinction in order to handle certain expressions differently based on how they’re used.
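Based on how it’s used below, the dataclass itself is tiny; roughly:

```python
from dataclasses import dataclass

@dataclass
class ExprMeta:
    is_place: bool  # True: an address was left on the stack; False: a value
    type: object    # the expression's CType
```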
For example, if you have a variable `x` of type `int`, it can be used in two ways:
- `x + 1` wants the value of `x`, say `1`, to operate on
- `&x` wants the address of `x`, say `0xcafedead`
When we parse the `x` expression, we can easily fetch the address from the stack frame:
```python
# look the variable up in the `StackFrame`
var, offset = frame.get_var_and_offset(varname)
# put the base address of the C stack on top of the WASM stack
emit(f"global.get $__stack_pointer")
# add the offset (in the C stack)
emit(f"i32.const {offset}")
emit("i32.add")
# the address of the variable is now on top of the WASM stack
```
But now what? If we `i32.load` this address to get the value, then `&x` will have no way to get the address. But if we don’t load it, then `x + 1` will try to add one to the address, resulting in `0xcafedeae` instead of `2`!
That’s where `ExprMeta` comes in: we leave the address on the stack, and return an `ExprMeta` indicating this is a place:

```python
return ExprMeta(True, var.type)
```
Then, for operations like `+` that always want to operate on values instead of places, there’s a function `load_result` that turns any places into values:

```python
def load_result(em: ExprMeta) -> ExprMeta:
    """Load a place `ExprMeta`, turning it into a value
    `ExprMeta` of the same type"""
    if em.is_place:
        # emit i32.load, i32.load16_s, etc., based on the type
        emit(em.type.load_ins())
    return ExprMeta(False, em.type)

...
# in the code for parsing `+`
lhs_meta = load_result(parse_lhs())
...
```
Meanwhile, an operation like `&` just doesn’t load the result, and instead leaves the address on the stack: in an important sense, `&` is a no-op in our compiler, since it doesn’t emit any code!

```python
if lexer.try_next("&"):
    meta = prefix()
    if not meta.is_place:
        die("cannot take reference to value", lexer.line)
    # type of &x is int* when x is int, hence more_ptr
    return ExprMeta(False, meta.type.more_ptr())
```
Note also that, despite being an address, the result of `&` isn’t a place! (The code returns an `ExprMeta` with `is_place=False`.) The result of `&` should be treated like a value, since `&x + 1` should add `1` (or rather, `sizeof(x)`) to the address.
That’s why we need the place/value distinction, since just “being an address” isn’t enough to know whether the result of an expression should be loaded.
OK, enough about helper classes.
Let’s move on to the meat of codegen!

Parsing and code generation
The general control flow of the compiler goes like this:
The blue rectangles represent the main functions of the compiler—`__main__`, `compile()`, `global_declaration()`, `statement()`, and `expression()`.