In the last article, we covered bare metal programming on RISC-V. Please familiarize yourself with that material before proceeding with the rest of this article, as this article is a direct continuation of the aforementioned one.
This time we are talking about the RISC-V SBI (Supervisor Binary Interface), with OpenSBI as the example. We’ll look at how SBI can assist us with implementing operating system kernel primitives, and we’ll end the article with a practical example using the riscv64 `virt` machine.
RISC-V and “BIOS”
In the article mentioned above, we talked extensively about the very first stages of the RISC-V boot process. We mentioned that the ZSBL (Zero Stage Bootloader) runs first, initializes a few registers, and jumps directly to an address hardcoded into the ZSBL. In the case of QEMU’s riscv64 `virt`, that hardcoded address is `0x80000000`. This is where the first user-provided code runs, and if left to defaults, QEMU will load OpenSBI there.
Machine modes
So far we have avoided talking about different machine modes, and now is the perfect time to introduce them. The idea behind machine modes is that not every piece of software should be able to access just any memory address on the machine, or even execute just any instruction the CPU provides. Traditionally, in a textbook example, two main divisions are made here:
- Privileged mode
- Unprivileged mode
The privileged mode is where the machine starts at boot time. Any instruction is permitted and no address access is considered an access violation. Once the operating system takes control of the system and starts launching user code (aka userspace code), the modes start switching. When user code is running on a CPU core, it runs in the unprivileged mode, where not everything is accessible. Going back into the kernel means switching back to the privileged mode.
This is a very simplistic, textbook view of operation privileges, and the question arises: why only two modes?
In real systems, more than two modes typically exist, forming a protection ring with multiple access levels. The RISC-V specification does not prescribe exactly which modes must be implemented by a core, except for the M (Machine) mode, which is the most privileged one.
Typically, processors with only the M mode are simple embedded systems; from there we move through more secure systems (M and S modes) all the way to full systems that can run Unix-like operating systems (M, S, and U modes).
SBI
The official docs provide a formal definition, and I will try to water it down here with the goals of making it more intuitive.
RISC-V’s SBI spec defines the layer of software that sits at the bottom of the RISC-V software stack. This is very similar to the BIOS, which is traditionally the first bit of software that runs on a machine. You might have seen guides for developing a simple kernel from scratch. They typically involve something similar to what we did in the initial guide for bare metal programming on RISC-V, with a small twist: they very often depend on pre-existing software to do some I/O. The similarity to our previous guide is that they also carefully align the first instructions to the correct address, so that the processor’s execution flow goes as intended and the simple kernel takes over at the right time. However, the goal in those short guides is typically to print something like ‘Hello world’ to the VGA screen. That last bit sounds like a fairly complex operation, and it really is.
How is printing to the VGA screen done so easily, then? The answer is that the BIOS is there to assist with the most basic I/O operations, such as printing characters to the screen, hence its name: Basic Input Output System! Recall the opening section of the bare metal programming guide: we interacted with the user without depending on any existing software on the machine (well, almost true; we still went through the Zero Stage Bootloader, but we didn’t depend on any outcome from it, nor did we really have any control over it, since it’s simply hardcoded into the system). If we were to print something on a VGA screen, instead of sending characters out through the UART, we would have to do a lot more than send an ASCII code to a single address. VGA involves putting the display device into the right mode by sending multiple values over, setting up different parameters, and so on. It’s a fairly elaborate operation.
So how does the BIOS traditionally help with tasks like these? The main idea is that whatever operating system ends up installed on the machine, it will need some basic functionality anyway, such as printing information to the VGA screen. Thus, the machine can have these standard operations simply baked into it, ready to be consumed by whatever operating system ends up on the machine. Conceptually, we can think of these procedures as an everyday library we write our applications against.
Additionally, if an operating system is written against such a “library”, it automatically becomes more portable. The “library” holds all the low-level details, such as “outputting to UART means writing to `0x10000000`” (as is the case with QEMU’s riscv64 `virt` VM) vs. “outputting to UART means writing to `0x12345678`”, and the operating system simply needs to invoke the “outputting to UART” procedure, while this “library” knows exactly how to interact with the hardware.
Fancy abstractions
This is all just a lot of talk for a very simple concept we have been using in programming since day 1: we apply layers of abstraction in our coding. Think of something like a Python function that does something like “sending a local file to an email address”. From a high level perspective, we simply call a function `send_file_to_email(file, email)` and the underlying library opens up the network connection and starts pumping the bytes. This could be just another Python library. At some point, the work moves down the software stack: the Python library depends on the Python runtime, written in something like C, to make a system call to the operating system (for example, to perform a core operation such as opening a network socket). The operating system has a network driver somewhere deep down, which knows to which address in the address space it needs to send the individual bytes so that they go out over the wire, and so on. The main concept here is that we have an established way of hiding the complexity of operations by delegating them to the lower layers of the software stack. We build the larger system not from atomic parts, but out of “molecules”.
If we’re delegating the complexity to the underlying library, it probably just means a function call. However, once it’s time to delegate the complexity to the operating system and lower, this happens through a binary interface.
Binary interface
Since basically forever, `x86` has been the dominant architecture for the computers we use, be it desktops or laptops. Things have been changing a lot lately, and other architectures are entering the picture, but let’s focus on just `x86`. What, then, makes an application built for Linux incompatible with an application for Windows? If both are written for `x86`, and both Linux and Windows run on `x86`, what could possibly be the differentiator here? The CPU instructions are no different from one platform to the other, so what could it be? The answer is the interface between the application and the operating system. This particular link between the user software and the operating system is called the application binary interface (ABI). The ABI is simply a definition of how the services of the operating system are invoked from the user application.
Therefore, when we say something like “this software is written for platform X”, it’s not enough to just say that X is `x86` or `RISC-V`; we must say `x86/Linux` or `x86/Windows` or `RISC-V/Linux`, etc. The platform definition may be even more complex than that if things like dynamic linking are involved, but let us not go there for now.
Let’s take a quick look at a program written in assembly for `x86/Linux` that just prints a ‘Hello’ string to the standard output.
.global _start

.section .text
_start:
    mov $4, %eax        # 4 is the code for the 'write' system call
    mov $1, %ebx        # we are writing to file 1, i.e. the standard output
    mov $message, %ecx  # the data we want to print is at the address of the symbol 'message'
    mov $5, %edx        # the length of the data we want to print is 5
    int $0x80           # invoke the system call, i.e. ask the kernel to print the data
    mov $1, %eax        # 1 is the code for the 'exit' system call
    mov $0, %ebx        # 0 is the process return code
    int $0x80           # invoke the system call, i.e. ask the kernel to close this process

.section .data
message: .ascii "Hello"
Assemble this program with:
as -o syscall.o syscall.s
Link it with:
ld -o syscall syscall.o
Run with:
./syscall
You should see the output “Hello”. If you’re on Bash and you also want to double check the process return code, simply run:
echo $?
And you should see `0`.
Tip: If you want to try out this example from above, but you do not have access to an x86/Linux machine, you can do this through a JavaScript VM that emulates an x86 system in-browser here; that’s a really cool website!
And there we have it: a program which prints a message to the standard output when run on an `x86` machine with a Linux kernel. The C standard library was not used. The final `ELF` binary should run on Linux with no dependencies, other than being run on the correct platform.
Now back to the question: what makes this binary (potentially) incompatible with Windows? Another operating system encodes the system calls differently (e.g. writing isn’t code 4 but code 123, or the parameters are passed through different CPU registers). And now you have a good idea of how to interface with the kernel directly, without the assistance of the standard library (although you probably almost never want to do that). This means you have uncovered the layer at which software does things like opening files, allocating memory, sending signals, etc. The C standard library can be thought of as a wrapper which hides the complexity of invoking software interrupts through the `int` instruction to communicate with the kernel, and instead makes it look like a normal call to a C function, even though under the hood that is exactly what happens. To be fair, the library does a lot more than that, but for the purposes of this article, it can be thought of simply as a wrapper.
And now, in the RISC-V world, we have the same thing: the user application interfaces with the kernel through software interrupt CPU instructions, passing the parameters through the CPU registers. And the kernel does basically the same thing with the SBI in order to invoke its services! It’s just that this final layer of invocation is called the SBI, not the ABI, because here it is not an application calling into the lower layer, but the supervisor of the applications. The difference, however, is in the name only, and the concept remains absolutely the same.
Practical example with OpenSBI
At this point we have established that the SBI, much like the ABI, is just a way of invoking functionality in the lower layers of the software stack. Furthermore, we have also established that the SBI sits at the bottom of the software stack on a RISC-V machine and runs in the most privileged M mode. Let’s add some more details to this picture.
It should also make sense at this point why the QEMU developers chose the `-bios` flag to accept the SBI software image (because the functionality is basically the same as the BIOS). As a reminder, the `-bios` flag should point to an `ELF` file whose contents will be laid out in memory starting from address `0x80000000`.
Let’s start QEMU’s VM with just OpenSBI loaded and see what happens. We shouldn’t really have to pass anything to QEMU, since it defaults to loading OpenSBI at `0x80000000`.
qemu-system-riscv64 -machine virt
This is the output (on the serial port, not VGA):
OpenSBI v0.8
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|
Platform Name : riscv-virtio,qemu
Platform Features : timer,mfdeleg
Platform HART Count : 1
Boot HART ID : 0
Boot HART ISA : rv64imafdcsu
BOOT HART Features : pmp,scounteren,mcounteren,time
BOOT HART PMP Count : 16
Firmware Base : 0x80000000
Firmware Size : 96 KB
Runtime SBI Version : 0.2
MIDELEG : 0x0000000000000222
MEDELEG : 0x000000000000b109
PMP0 : 0x0000000080000000-0x000000008001ffff (A)
PMP1 : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)
The machine keeps spinning in place, presumably because it is set up to do so by default, since there is no other piece of software passed to QEMU to take over control after OpenSBI. At this point, things look good: it seems like OpenSBI has been set up properly (and its output confirms that it sits right at `0x80000000`).
How do we keep going up the software stack; how do we add a new layer? The new layer could be something like an operating system kernel. Similarly to how we previously built an `ELF` file containing instructions to be placed at `0x80000000`, we will build another `ELF` file for QEMU to load into its memory, but this time the instructions will go to a different address, since the region starting at `0x80000000` has already been taken over by OpenSBI.
Which address should we load our fake “kernel” at, then?
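As a preview of the answer: with OpenSBI’s default `fw_jump` firmware on `virt`, the conventional jump target for the next stage is `0x80200000`. Treat that address as an assumption to verify for your setup; taking it as given, a minimal linker-script sketch for such a kernel could look like this:

```ld
/* Sketch of a minimal linker script for a kernel loaded above OpenSBI.
 * 0x80200000 is the conventional fw_jump target on the virt machine;
 * verify it for your firmware build before relying on it. */
OUTPUT_ARCH(riscv)
ENTRY(_start)
SECTIONS
{
    . = 0x80200000;          /* clear of OpenSBI, which owns 0x80000000+ */
    .text   : { *(.text*)   }
    .rodata : { *(.rodata*) }
    .data   : { *(.data*)   }
    .bss    : { *(.bss*)    }
}
```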
Booting the OS kernel after SBI and calling into OpenSBI
When we loaded the BIOS/SBI/whatever you want to call it, the address was basically burnt into the machine’s logic. The first few instructions were the Zero Stage Bootloader (ZSBL), and the final instruction there jumped to the hardcoded address `0x80000000`. As we previously mentioned, this is an immutable fact of the platform we’re wor