This post will show you how to use D to write a bare-metal “Hello world”
program that targets the RISC-V QEMU simulator. In a future blog post (now
available) we’ll build on this to target actual
hardware: the VisionFive
2 SBC. See
blog-code for the final code from
this post. For a more complex example, see
Multiplix, an operating system I am
developing that runs on the VisionFive 2 (and Raspberry Pis).
Recently I’ve been writing bare-metal code in C, and I’ve become a bit
frustrated with the lack of features that C provides. I started searching for a
good replacement, and revisited D (a language I used for a
project a few years ago). It turns out D has introduced a mode called
betterC[1] (sounds exactly like what I
want), which essentially disables all language features that require the D
runtime. This makes it roughly as easy to use D for bare-metal programming as
C. You don’t get all the features of D, but you get enough that it covers all
the things I want (in fact, for systems programming I prefer the betterC subset
of D over full D). D in betterC mode is exactly what it sounds like, and
retains the feel of C – going forward I think I will be using it instead of C
in all situations where I would have otherwise used C (even in non-bare-metal
situations).
Here are the positives about D I value most:
- A decent import system (no more header files and #include).
- Automatic bounds checking, and bounded strings and arrays.
- Methods in structs.
- Compile-time code evaluation (run D code at compile time!).
- Powerful templating and generics.
- Iterators.
- Default support for thread-local storage.
- Scope guards and RAII.
- Some memory safety protections with @safe.
- A fairly comprehensive and readable online specification.
- An active Discord channel with people who answer my questions in minutes.
- Both an LLVM-based compiler (LDC) and a GNU compiler (GDC), which is officially
  part of the GCC project.
  - And these compilers both export roughly the same flags and intrinsics as
    Clang and GCC respectively.
These features, combined with the lack of a runtime and the C-like feel of the
language (making it easy to port previous code), make it a no-brainer for me to
have D as the base choice for any project where I would otherwise use C.
Now that I’ve told you about my reasons for choosing D, let’s try using it to
write a bare-metal application that targets RISC-V. If you want to follow
along, the first step is to download the toolchain (the following tools should
work on Linux or macOS). You’ll need three different components:
- LDC 1.30 (the LLVM-based D compiler). Can be downloaded from GitHub. Make
  sure to use version 1.30[2].
- A riscv64-unknown-elf GNU toolchain. Can be downloaded from SiFive’s Freedom
  Tools repository.
- The QEMU RISC-V simulator: qemu-system-riscv64. Can be downloaded from
  SiFive’s Freedom Tools repository, and is also usually available as part of
  your system’s QEMU package.
We’ll be using LDC since it ships with the ability to target riscv64. I have
used GDC for bare-metal development as well, but it requires building a
toolchain from source since nobody ships pre-built riscv64-unknown-elf-gdc
binaries[3]. We’ll use the GNU toolchain for assembling, linking, and for other
tools like objcopy and objdump, and QEMU for simulating the hardware.
With these installed you should be able to run:
$ ldc2 --version
LDC - the LLVM D compiler (1.30.0):
...
$ riscv64-unknown-elf-ld
riscv64-unknown-elf-ld: no input files
$ qemu-system-riscv64 -h
...
We’re writing bare-metal code, so there’s no operating system, no console, no
files – nothing. The CPU just starts executing instructions at a pre-specified
address[4] after performing some initial setup. We’ll figure out what that
address is later when we set up the linkerscript. For now we can just define
the _start symbol as our entrypoint, and assume the linker will place the code
at this label at the CPU entrypoint.
A D function requires a valid stack pointer, so before we can execute any D code
we need to load the stack pointer register sp with a valid address.
Let’s make a file called start.s and put the following in it:
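.section ".text.boot"

.globl _start
_start:
    la sp, _stack_start    # point sp at the top of the stack
    call dstart            # jump to the D entry point
_hlt:
    j _hlt                 # if dstart ever returns, spin forever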
For now let’s assume _stack_start is a symbol with the address of a valid
stack; we’ll set this up properly in the linkerscript. After loading sp, we
call a D function called dstart, defined in the next part.
Now we can define our dstart function in dstart.d. For now we’ll just cause
an infinite loop.
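module dstart;

// extern (C) disables D name mangling, so start.s can call this as "dstart".
extern (C) void dstart() {
    // Nothing to do yet: spin forever.
    while (1) {}
}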
Before we can compile this program we need a bit of linkerscript to tell the
linker how our code should be laid out. We’ll need to specify the address where
the text section should start (the entry address), and reserve space for all
the data sections (.rodata, .data, .bss) and the stack.
Entry address
Today we’ll be targeting the QEMU virt RISC-V machine, so we have to figure out
what its entrypoint is.
We can ask QEMU for a list of all devices in the virt machine by telling it to
dump its device tree:
$ qemu-system-riscv64 -machine virt,dumpdtb=virt.dtb
$ dtc virt.dtb > virt.dts
In virt.dts you’ll find the following entry:
memory@80000000 {
    device_type = "memory";
    reg = <0x00 0x80000000 0x00 0x8000000>;
};
This means that RAM starts at address 0x80000000 (everything below is special
memory or inaccessible). The CPU entrypoint for the virt machine is the first
instruction in RAM, stored at 0x80000000.
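To make the reg line above concrete, here is a small host-side D sketch
(regular D, not betterC) that decodes it. It assumes the root node of the dumped
tree declares #address-cells = <2> and #size-cells = <2>, so the base address
and the size are each split across two 32-bit cells, high cell first. The helper
name fromCells is hypothetical, chosen for illustration.

```d
import std.stdio : writefln;

/// Hypothetical helper: combine a (high, low) pair of 32-bit device tree
/// cells into one 64-bit value.
ulong fromCells(uint hi, uint lo)
{
    return (cast(ulong) hi << 32) | lo;
}

void main()
{
    // reg = <0x00 0x80000000 0x00 0x8000000>;  (address cells, then size cells)
    immutable uint[4] reg = [0x00, 0x8000_0000, 0x00, 0x0800_0000];

    immutable ulong base = fromCells(reg[0], reg[1]);
    immutable ulong size = fromCells(reg[2], reg[3]);

    // The same decoding also runs at compile time (CTFE).
    static assert(fromCells(0x00, 0x8000_0000) == 0x8000_0000);
    static assert(fromCells(0x00, 0x0800_0000) == 128UL * 1024 * 1024);

    writefln("RAM base: 0x%x", base);
    writefln("RAM size: %s MiB", size / (1024 * 1024));
    writefln("RAM end:  0x%x", base + size);
}
```

Read this way, RAM is 0x8000000 bytes (128 MiB) long and spans 0x80000000 up to
0x88000000, so everything we link, including the stack, has to fit in that
window.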
In the linkerscript, we need to tell the linker that it should place the
_start function at 0x80000000. We do this by telling it to put the .text.boot
section first in the .text section, located at 0x80000000.
Then we include the rest of the .text sections, followed by read-only data,
writable data, and the BSS.
In link.ld:
ENTRY(_start)

SECTIONS
{
    .text 0x80000000 : {
        KEEP(*(.text.boot))
        *(.text*)
    }
    .rodata : {
        . = ALIGN(8);
        *(.rodata*)
        *(.srodata*)
        . = ALIGN(8);
    }
    .data : {
        . = ALIGN(8);
        *(.sdata*)
        *(.data*)
        . = ALIGN(8);
    }
    .bss : {
        . = ALIGN(8);
        _bss_start = .;
        *(.sbss*)
        *(.bss*)
        *(COMMON)
        . = ALIGN(8);
        _bss_end = .;
    }
    .kstack : {
        . = ALIGN(16);
        . += 4K;
        _stack_start = .;
    }
    /DISCARD/ : { *(.comment .note .eh_frame) }
}
What is the BSS?
The BSS is a region of memory that the compiler assumes is initialized to all
zeroes. Usually the static data for a program is directly copied into the ELF
executable – if you have a string "hello world" in your program, those exact
bytes will live somewhere in the binary (in the read-only data section).
However, a lot of static data is initialized to zero, so instead of putting
those zero bytes directly into the ELF file, the linker lets us save space by
making a special section (the BSS) that must be initialized to all zeroes at
runtime, but won’t actually contain that data in the ELF file itself. So even
if you have a giant 1 MB array of zeroes, your ELF binary will stay small,
because that section is only allocated and zeroed in RAM when the application
starts.
Usually the OS sets up the BSS before it launches a program, but we’re running
bare-metal, so we have to do that manually in the dstart function (in the next
section). To make this initialization possible, we define the _bss_start and
_bss_end symbols in the linkerscript. These are symbols whose addresses will be
the start and end of the BSS section respectively.
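As a rough illustration of what zeroing the BSS means, here is a host-side D
sketch that clears every byte between two addresses, the way dstart will
eventually clear the range from _bss_start to _bss_end. The helper zeroRange and
the byte buffer standing in for RAM are hypothetical; in the bare-metal program
the two pointers would be the addresses of the linker-script symbols rather than
elements of an array.

```d
import std.stdio : writeln;

/// Hypothetical helper: set every byte in [start, end) to zero.
/// In the bare-metal build, start/end would be &_bss_start and &_bss_end.
void zeroRange(ubyte* start, ubyte* end)
{
    // Turn the pointer pair into a slice, then assign 0 to every element.
    start[0 .. cast(size_t)(end - start)] = 0;
}

void main()
{
    // Stand-in for RAM: pretend bytes 4 through 11 are the BSS.
    ubyte[16] ram = 0xAA;
    zeroRange(&ram[4], &ram[12]);
    writeln(ram);
}
```

Only the bytes between the two addresses are cleared; the bytes on either side
keep their 0xAA filler, just as the rest of RAM is left alone at boot.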
Reserving space for the stack
We also reserve one 4 KiB page for the .kstack section and place the
_stack_start symbol at its end (remember the stack grows down). The RISC-V
calling convention requires the stack pointer to be 16-byte aligned, hence the
ALIGN(16).
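The address arithmetic the linker performs for the stack can be checked by
hand. The host-side D sketch below mirrors ALIGN(n) and ". += 4K"; it assumes,
for illustration, that .text is the 22-byte image built later in this post (so
it ends at 0x80000016) and that .rodata, .data, and .bss are empty. The helper
alignUp is a hypothetical name, not a linker API.

```d
import std.stdio : writefln;

/// Hypothetical helper: round addr up to the next multiple of alignment
/// (a power of two), which is what ALIGN(n) does to the location counter.
ulong alignUp(ulong addr, ulong alignment)
in (alignment != 0 && (alignment & (alignment - 1)) == 0)
{
    return (addr + alignment - 1) & ~(alignment - 1);
}

void main()
{
    // Assumption: .text ends 22 bytes after 0x80000000.
    enum ulong textEnd = 0x8000_0000 + 22;

    // The (assumed empty) .rodata/.data/.bss sections each apply ALIGN(8).
    immutable ulong afterData = alignUp(textEnd, 8);

    // .kstack: ALIGN(16), then reserve 4 KiB; _stack_start is the top.
    immutable ulong stackBottom = alignUp(afterData, 16);
    immutable ulong stackStart = stackBottom + 4 * 1024;

    writefln("stack bottom: 0x%x", stackBottom);
    writefln("_stack_start: 0x%x", stackStart);
}
```

Under those assumptions the stack occupies 0x80000020 to 0x80001020, and
0x80001020 is the _stack_start address that appears in the disassembly later in
this post. Because the top is 16-byte aligned, sp starts out correctly aligned.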
Now we have everything we need to compile a basic bare-metal program.
$ ldc2 -Oz -betterC -mtriple=riscv64-unknown-elf -mattr=+m,+a,+c --code-model=medium -c dstart.d
$ riscv64-unknown-elf-as -mno-relax -march=rv64imac start.s -c -o start.o
$ riscv64-unknown-elf-ld -Tlink.ld start.o dstart.o -o prog.elf
Let’s look at some of these flags:
-Oz: optimize aggressively for size.
-betterC: enable betterC mode (disable the built-in D runtime).
-mtriple=riscv64-unknown-elf: build for the riscv64 bare-metal ELF target.
-mattr=+m,+a,+c: enable the following RISC-V extensions: m (multiply/divide),
a (atomics), and c (compressed instructions).
--code-model=medium: code models in RISC-V control how pointers to far-away
locations are constructed. The medium code model (also called medany) allows us
to address any symbol located within 2 GiB of the current address, and is
recommended for 64-bit programs. See the SiFive post for more information.
-mno-relax: disables linker relaxation in the assembler (it is already disabled
by default in LDC). Linker relaxation is a RISC-V-specific optimization that
allows the linker to make use of the gp (global pointer) register. I explain it
in more detail in the linker relaxation section.
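To see what the medium code model means in practice: a PC-relative address is
built from an auipc (which adds a 20-bit upper immediate, shifted left by 12, to
the PC) followed by an instruction with a signed 12-bit immediate such as addi
or jalr. The host-side D sketch below computes that split; splitPcRel and
reachable are hypothetical helper names. Because the upper immediate is a signed
20-bit field, the reach is roughly ±2 GiB around the PC.

```d
import std.stdio : writefln;

/// Immediates for an auipc + addi/jalr pair.
struct PcRel
{
    long hi20; // upper immediate for auipc (added to pc after << 12)
    long lo12; // signed 12-bit immediate for addi/jalr/loads/stores
}

/// Hypothetical helper: split (target - pc) into auipc/addi immediates.
/// Adding 0x800 before the shift rounds hi20 so lo12 lands in [-2048, 2047].
PcRel splitPcRel(ulong pc, ulong target)
{
    immutable long offset = cast(long)(target - pc);
    immutable long hi = (offset + 0x800) >> 12; // arithmetic shift (signed)
    return PcRel(hi, offset - hi * 4096);
}

/// auipc's immediate is a signed 20-bit field: roughly +/-2 GiB of reach.
bool reachable(PcRel r)
{
    return r.hi20 >= -(1L << 19) && r.hi20 < (1L << 19);
}

void main()
{
    // "la sp, _stack_start" at 0x80000000, with _stack_start = 0x80001020
    auto la = splitPcRel(0x8000_0000, 0x8000_1020);
    writefln("auipc sp,0x%x / addi sp,sp,%d", la.hi20, la.lo12);

    // "call dstart" at 0x80000008, with dstart = 0x80000014
    auto callPair = splitPcRel(0x8000_0008, 0x8000_0014);
    writefln("auipc ra,0x%x / jalr %d(ra)", callPair.hi20, callPair.lo12);
}
```

The two pairs this computes (0x1 and 32, then 0x0 and 12) are the same
immediates that appear in the auipc/addi and auipc/jalr instructions in the
disassembly later in this post.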
It’s going to get tedious to type out these commands repeatedly, so let’s
create a Makefile[5] (or a Knitfile if you’re cool):
SRC=$(wildcard *.d)
OBJ=$(SRC:.d=.o)

all: prog.bin

%.o: %.d
	ldc2 -Oz -betterC -mtriple=riscv64-unknown-elf -mattr=+m,+a,+c,+relax --code-model=medium --makedeps=$*.dep $< -c -of $@

%.o: %.s
	riscv64-unknown-elf-as -march=rv64imac $< -c -o $@

prog.elf: start.o $(OBJ)
	riscv64-unknown-elf-ld -Tlink.ld $^ -o $@

%.bin: %.elf
	riscv64-unknown-elf-objcopy $< -O binary $@

%.list: %.elf
	riscv64-unknown-elf-objdump -D $< > $@

run: prog.bin
	qemu-system-riscv64 -nographic -bios none -machine virt -kernel prog.bin

clean:
	rm -f *.bin *.list *.o *.elf *.dep

-include *.dep
and compile with
$ make
This produces prog.bin, a raw dump of our program. At this point it clocks in
at a whopping 22 bytes.
To see the disassembled program, run
$ make prog.list
...
$ cat prog.list

prog.elf:     file format elf64-littleriscv

Disassembly of section .text:

0000000080000000 <_start>:
    80000000: 00001117  auipc sp,0x1
    80000004: 02010113  addi  sp,sp,32 # 80001020 <_stack_start>
    80000008: 00000097  auipc ra,0x0
    8000000c: 00c080e7  jalr  12(ra) # 80000014 <dstart>

0000000080000010 <_hlt>:
    80000010: a001      j     80000010 <_hlt>
    ...

0000000080000014 <dstart>:
    80000014: a001      j     80000014 <dstart>
Looks like our _start function is being linked properly at 0x80000000 and has
the expected assembly!
If you try to run with
$ make run
qemu-system-riscv64 -nographic -bios none -machine virt -kernel prog.bin
it will just enter an infinite loop (press Ctrl-A, then X, to quit QEMU). We
still have a bit more work to do before we get output.
Now let’s modify dstart to initialize the BSS. We need to de