We don’t always think of it this way, but on modern machines, memory and pointers are an abstraction. Today’s machines have virtual memory, divided into blocks called “pages”, such that the addresses represented by pointers don’t necessarily map to the same address in physical RAM. In fact, mmap even makes it possible to map files into memory, so some of these addresses are backed by a file rather than by ordinary RAM allocations.
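As a small illustration of file-backed addresses, here is a minimal sketch using Python’s standard `mmap` module. The helper names `read_via_mmap` and `demo` are made up for this example, and the temporary file simply stands in for any file on disk.

```python
import mmap
import os
import tempfile


def read_via_mmap(path: str) -> bytes:
    """Map a file into memory and return its contents by slicing the mapping."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapping:
            # Slicing the mapping reads file bytes through ordinary memory accesses,
            # without an explicit read() call on the file object.
            return mapping[:]


def demo() -> bytes:
    """Write a small temporary file, then read it back through a memory mapping."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"hello from a file")
        return read_via_mmap(path)
    finally:
        os.remove(path)
```

The point of the sketch is only that the bytes come back through what looks like plain memory indexing, even though they live in a file.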
Two weeks ago, I wrote about UVM, the small virtual machine I’ve been building in my spare time. This VM has a relatively low-level design, with untyped instructions and pointers, for instance. Generally speaking, I’ve done my best to design the VM to be fairly “conventional”, in the sense that most design elements are aligned with common practice and unsurprising. In my opinion, this is important because it keeps the design approachable to newcomers. Having a design that’s unsurprising means that new users don’t need to read a 100-page manual and familiarize themselves with new terminology to get something done.
Even though I’ve done what I could to keep the design of UVM unsurprising, there is one aspect that’s unconventional. At the moment, UVM uses what’s known as a Harvard Architecture, where code and data live in two separate linear address spaces. Essentially, code and data live in two separate arrays of bytes. There’s actually a third address space too: the stack is distinct from the address space used for the heap. That means you can’t directly get a pointer to something that’s stored on the stack.
It’s maybe not that unconventional when you think about it, because WASM works the same way. You can’t directly get a pointer to a stack variable in WASM either, and you also can’t directly read/write to code memory. Same goes for the JVM. It just seems unconventional because UVM presents itself as a fairly low-level virtual machine that gives you pointers, and yet there are restrictions on what you can do with those pointers.
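To make the three separate address spaces concrete, here is a minimal sketch of a toy stack machine. Everything in it is an assumption made for illustration: the opcodes (`PUSH`, `LOAD`, `STORE`, `ADD`, `HALT`), the `TinyVM` class, and the one-byte encoding are hypothetical and are not UVM’s actual instruction set or implementation.

```python
# Hypothetical opcodes for illustration only; not UVM's real instruction set.
PUSH, LOAD, STORE, ADD, HALT = range(5)


class TinyVM:
    def __init__(self, code: bytes, heap_size: int = 16):
        self.code = bytes(code)           # code space: immutable, only read by the fetch loop
        self.heap = bytearray(heap_size)  # data space: the only memory LOAD/STORE can address
        self.stack = []                   # stack space: reachable only through push/pop
        self.pc = 0                       # index into the code space, not the heap

    def run(self) -> None:
        while True:
            op = self.code[self.pc]
            self.pc += 1
            if op == PUSH:
                # The immediate operand is fetched from the code space.
                self.stack.append(self.code[self.pc])
                self.pc += 1
            elif op == LOAD:
                addr = self.stack.pop()
                self.stack.append(self.heap[addr])   # addr always indexes the heap
            elif op == STORE:
                addr = self.stack.pop()
                value = self.stack.pop()
                self.heap[addr] = value & 0xFF       # can never touch code or stack
            elif op == ADD:
                b = self.stack.pop()
                a = self.stack.pop()
                self.stack.append(a + b)
            elif op == HALT:
                return
            else:
                raise ValueError(f"unknown opcode {op}")


# Compute 40 + 2, store the result at heap address 0, then load it back.
program = bytes([PUSH, 40, PUSH, 2, ADD, PUSH, 0, STORE, PUSH, 0, LOAD, HALT])
vm = TinyVM(program)
vm.run()
```

In this sketch a heap address is just an index into `heap`, so no value a program computes can name a stack slot or a byte of code. That is the kind of restriction on pointers described above.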
There are a few reasons why the stack, the heap and executable memory are separate in UVM. The main reason is performance. By creating distinct address spaces, we make accesses to these different address spaces explicit. At the moment, UVM is interpreted, but my goal is to eventually build a simple JIT compiler for it. That brings in performance considerations. If everything lived in a single address space, then potentially, every write to memory could write anywhere. Every single time that you wr