01. Complex systems
are built from simple ideas
Complex software like databases, compilers, and browsers are treated
like black boxes. You use them every day as a user, but you
probably don’t understand them as a programmer, even though
they are nothing but code. Why?
- They have little in common with programmers’ daily task.
- Their code bases are so large, so discouraging.
But that doesn’t mean you can’t learn these things. People have built
small, toy versions of these things for learning purposes, such as C in 4 Functions, Crafting Interpreters, Web Browser Engineering. It
turns out that if you minimize uninteresting details and focus on the
essentials, you can recreate seemingly complex software and learn its
core ideas.
For compilers, some of the core ideas include:
- Parsing stuff with recursion.
- Representing a program as a tree and simulate it.
- Representing a program as a list of instructions with gotos.
- Stack machine or register machine.
None of these ideas are complicated. So a compiler is a viable and
useful exercise for programmers.
02. What are the core
ideas of databases?
I’ve built a small database in 3000 lines
from scratch in Go to learn the core ideas of databases. It’s not that
complicated if you approach it in a certain way.
2.1 Power-loss atomicity
A database stores data on disk, but why not just use files? Why is
the filesystem not a database? Because databases eliminate a common
source of data corruption: partially written files caused by a crash or
power loss.
Video games warn you not to turn off the power while saving, because
a partially written save file can destroy all your progress. Building a
database will teach you how to eliminate this concern. The solution is
an append-only log with a checksum for each item.
- Append-only means that writes will not destroy existing data.
- Checksum means that partial writes will be detected and discarded,
making appends power-loss atomic.
╔═══╤════╦═══╤════╦══╌╌╌══╦═══╤════╗
║sum│data║sum│data║more...║sum│data║
╚═══╧════╩═══╧════╩══╌╌╌══╩═══╧════╝
╰───┬────╯
possible partial write
These are the first 2 ideas. But they are not enough to build a
database.