August 1, 2022
If you write any code that deals with manual memory management, you are likely
familiar with the concept of a “use after free” bug. These bugs can be the
source of program crashes at best, and serious vulnerabilities at worst. A
lesser-discussed counterpart to use after free is “use after return”. In some
cases, the latter can be even more troublesome, due to the operations that are
performed when one procedure calls another.
In this post, we’ll take a look at what happens under the hood when a RISC-V
program includes a use after return bug, as well as how “higher-level”
programming languages can guard against this behavior, at varying levels of cost
to the programmer.
If you aren’t familiar with the RISC-V Bytes
series, it may be worth
giving our first post, Cross-Platform Debugging with QEMU and
GDB, a quick read to
get the necessary tools installed to follow along.
A Small C Program
Let’s start off by taking a look at a small C program:
main.c
#include <stdio.h>

int *g;

void first()
{
    int a = 1;
    g = &a;
}

void second()
{
    int b = 2;
}

int main(int argc, char **argv)
{
    first();
    second();
    printf("%d\n", *g);
    return 0;
}
This program is meant to run in userspace (U mode in RISC-V vernacular),
meaning that it depends on some initialization that runs before main() (via
crt0), as well as system libraries and functionality offered by the operating
system, such as the ability to print to stdout. We have one global variable, g,
which is a pointer to a 32-bit integer (int is 32 bits on our 64-bit machine),
and we call two procedures, first() then second(), before printing the contents
of the address pointed to by g, then exiting.
You may be able to guess what the output of this program will be, but let’s
compile it, then run it and see. We’ll use an unoptimized build to get a
simplified, albeit somewhat unrealistic, picture of what is going on.
$ riscv64-unknown-linux-gnu-gcc -static main.c
GCC Version: 11.1.0
Note: gcc compiles dynamically linked executables by default, but we opt to
pass the -static flag here to compile a statically linked executable. This
makes it a bit easier to run on a different host architecture via binfmt_misc
because we don’t have to invoke our RISC-V dynamic linker at runtime.
Let’s run our program and check the output:
Interesting! Despite only ever assigning the address of a, which contains the
32-bit integer 1, to g, the contents of that address contain the value 2 when we
print. We do assign 2 to the variable b in second(), but how does that end up as
the value at the address in g? The crux of the issue is that we are mixing
variables with different lifetimes. While g is defined at the global scope, and
thus exists for the entire lifetime of the program, a is a local variable of
first(), meaning that its lifetime ends, and the program no longer has any
concept of a, once the function returns.
But we still haven’t explained why the value stored in b, which has a lifetime
scoped to second(), has ended up at the address stored in g. Let’s take a look
at what is happening in the generated machine code.
$ riscv64-unknown-linux-gnu-objdump -D a.out | less
Note: there are many ways to explore the disassembled content of an executable.
A pattern that I have found useful is piping objdump -D into less, then
searching for the symbol I’m looking for, in this case using /main. You can step
forward through the matches using n, and step backward using Shift+N.
00000000000105d2 <first>:
105d2: 1101 addi sp,sp,-32
105d4: ec22 sd s0,24(sp)
105d6: 1000 addi s0,sp,32
105d8: 4785 li a5,1
105da: fef42623 sw a5,-20(s0)
105de: fec40713 addi a4,s0,-20
105e2: 9ae1b823 sd a4,-1616(gp) # 70a78
105e6: 0001 nop
105e8: 6462 ld s0,24(sp)
105ea: 6105 addi sp,sp,32
105ec: 8082 ret
00000000000105ee <second>:
105ee: 1101 addi sp,sp,-32
105f0: ec22 sd s0,24(sp)
105f2: 1000 addi s0,sp,32
105f4: 4789 li a5,2
105f6: fef42623 sw a5,-20(s0)
105fa: 0001 nop
105fc: 6462 ld s0,24(sp)
105fe: 6105 addi sp,sp,32
10600: 8082 ret
0000000000010602 <main>:
10602: 1101 addi sp,sp,-32
10604: ec06 sd ra,24(sp)
10606: e822 sd s0,16(sp)
10608: 1000 addi s0,sp,32
1060a: 87aa mv a5,a0
1060c: feb43023 sd a1,-32(s0)
10610: fef42623 sw a5,-20(s0)
10614: fbfff0ef jal ra,105d2 <first>
10618: fd7ff0ef jal ra,105ee <second>
1061c: 9b01b783 ld a5,-1616(gp) # 70a78
10620: 439c lw a5,0(a5)
10622: 85be mv a1,a5
10624: 0004c7b7 lui a5,0x4c
10628: 26078513 addi a0,a5,608 # 4c260
1062c: 5cc040ef jal ra,14bf8 <_io_printf>
10630: 4781 li a5,0
10632: 853e mv a0,a5
10634: 60e2 ld ra,24(sp)
10636: 6442 ld s0,16(sp)
10638: 6105 addi sp,sp,32
1063a: 8082 ret
Check yourself: why is the difference in address from one instruction to the
next sometimes 2 (2 bytes == 16 bits) (example: 10604 - 10602 = 2) and
sometimes 4 (4 bytes == 32 bits) (example: 10614 - 10610 = 4)? Our 64-bit
RISC-V GCC install is using -march=rv64imafdc. The final extension, c,
corresponds to “Compressed”, which allows the compiler to compress some
instructions to reduce code size. Standard RISC-V instructions are 32 bits, but
when the compressed extension is enabled, some instructions can be represented
in just 16 bits.
Starting with main, we see our typical function prologue. The stack is grown by
32 bytes (10602), the return address (ra) is stored at the top of the stack
frame (10604), and the previous frame pointer (s0) is stored just below it
(10606). Finally, the frame pointer is updated to point to the top of main’s
32-byte stack frame (10608). The next three instructions are not consequential
for our investigation today, and are only present because we are compiling with
no optimization. For completeness, they take the arguments passed to main() and
store them in its stack frame. a0, which gets moved to a5 (1060a), contains
argc, a 32-bit integer specifying the number of arguments passed. In RISC-V, a
word is always 32 bits, even on our 64-bit machine where registers are a
doubleword (64 bits) wide. We can therefore use sw (“store word”) to store argc
in main’s stack frame close to the bottom (10610). Similarly, a1 contains argv,
a pointer (or more specifically, a pointer to a pointer) to the arguments passed
to the program. It gets stored with sd (“store doubleword”) at the very bottom
of the stack frame (1060c).
Description: visualization of the function prologue for main(). Note that the
mv a5,a0 instruction is omitted from the numbered operations on the stack.
Now we are ready to jump to first(). We see a similar function prologue (minus
saving ra, since first() does not call any other procedure). It then loads the
value 1 into the a5 register (105d8), stores that value on the stack (105da),
and finally computes the address of that stack slot into a4 (105de). Lastly, we
update g with the stack address from a4 (105e2), so that it now points to a
memory location where the value 1 is stored. Though we haven’t yet reached the
point where the value at g’s address becomes 2, we have already made the mistake
that can lead to a “use after return” vulnerability. If we step through our
program with GDB, we can identify the exact address in g after first().
In one terminal, start QEMU and have it wait for GDB to attach on port 1234:
$ qemu-riscv64 -g 1234 a.out
And in another, start GDB by attaching to QEMU and setting a breakpoint at
first():
$ riscv64-unknown-linux-gnu-gdb a.out -ex "target remote :1234" -ex "break first"
(gdb) c
Continuing.
Breakpoint 1, 0x00000000000105d8 in first ()
(gdb) x/8i $pc
=> 0x105d8 <first+6>: li a5,1
   0x105da <first+8>: sw a5,-20(s0)
   0x105de <first+12>: addi a4,s0,-20
   0x105e2 <first+16>: sd a4,-1616(gp)
   0x105e6 <first+20>: nop
   0x105e8 <first+22>: ld s0,24(sp)
   0x105ea <first+24>: addi sp,sp,32
   0x105ec <first+26>: ret
After continuing to first() (GDB will frequently skip the function prologue by
default), we see the same function body that we dumped above. We are interested
in the address that is stored in a4 by the addi instruction at address 0x105de.
It is a location in first’s stack frame calculated using an offset of -20 from
the frame pointer s0.
(gdb) i r s0 a4 sp
s0 0x40007ffd90 0x40007ffd90
a4 0x40007ffd7c 274886294908
sp 0x40007ffd70 0x40007ffd70
Printing s0, a4, and sp shows us that the address stored in a4 is in fact 20
bytes below the frame pointer (0x40007ffd90 - 0x40007ffd7c = 0x14 = 20) and 12
bytes above the stack pointer (0x40007ffd7c - 0x40007ffd70 = 0xc = 12). This is
fine as long as we are within first’s body, but as soon as we return, first’s
stack frame is released. We can see this happening in the function epilogue:
(gdb) x/4i $pc
=> 0x105e6 <first+20>: nop
   0x105e8 <first+22>: ld s0,24(sp)
   0x105ea <first+24>: addi sp,sp,32
   0x105ec <first+26>: ret
We restore the previous frame pointer at 0x105e8, then release the frame by
adding 32 back to the stack pointer at 0x105ea. By the time we return to main,
the address in a4 lies outside of any live stack frame (i.e. below the stack
pointer):
(gdb) si
0x00000000000105e8 in first ()
(gdb) si
0x00000000000105ea in first ()
(gdb) si
0x00000000000105ec in first ()
(gdb) si
0x0000000000010618 in main ()
(gdb) i r s0 a4 sp
s0 0x40007ffdb0 0x40007ffdb0
a4 0x40007ffd7c 274886294908
sp 0x40007ffd90 0x40007ffd90
More importantly, g still holds that same address, so it is now a dangling
pointer.
Description: visualization of steps in first(), which include: growing the
stack, storing 1 on the stack, updating g to point to the location of 1 on the
stack, then shrinking the stack.
Our next instruction is a jump (jal, “jump and link”) to second(), where we are
again setting up a 32-byte stack frame, then storing a value in it:
(gdb) si
0x00000000000105ee in second ()
(gdb) x/8i $pc
=> 0x105ee <second>: addi sp,sp,-32
   0x105f0 <second+2>: sd s0,24(sp)
   0x105f2 <second+4>: addi s0,sp,32
   0x105f4 <second+6>: li a5,2
   0x105f6 <second+8>: sw a5,-20(s0)
   0x105fa <second+12>: nop
   0x105fc <second+14>: ld s0,24(sp)
   0x105fe <second+16>: addi sp,sp,32
You’ll notice that the instruction at address 0x105f6 is the same instruction we
saw in first() at 0x105da. Let’s step to that instruction, and look at g and the
value it points to before and after.
(gdb) si
0x00000000000105f0 in second ()
(gdb) si
0x00000000000105f2 in second ()
(gdb) si
0x00000000000105f4 in second ()
(gdb) si
0x00000000000105f6 in second ()
(gdb) p /s (int*)g
$4 = (int *) 0x40007ffd7c
(gdb) x/d 0x40007ffd7c
0x40007ffd7c: 1
(gdb) si
0x00000000000105fa in second ()
(gdb) p /s (int*)g
$5 = (int *) 0x40007ffd7c
(gdb) x/d 0x40007ffd7c
0x40007ffd7c: 2
While g continues to point at the same address (0x40007ffd7c), that address is
now part of second’s stack frame, meaning that second() is free to place its
local variables there. Without ever writing through g, we have implicitly
changed the value it points to. This is a fairly trivial example, but in more
extreme cases this type of implicit assignment could allow a user to manipulate
the output, or worse, the control flow, of the program just by supplying certain
inputs.
Description: visualization of steps in second(), which include: growing the
stack to include the address in g (which currently contains 1), overwriting the
value with 2, then shrinking the stack again.
How Does Go Handle This?
One of the reasons why folks reach for “memory safe” languages these days is to
avoid accidentally introducing vulnerabilities with mistakes like the one
described above. One way “memory safe” languages handle this is via a built-in
runtime and garbage collector. Go is a popular language that takes this
approach. Let’s look at a sibling program to the one we have already been
exploring.
main.go
package main

import "fmt"

var g *int

func first() {
	a := 1
	g = &a
}

func second() {
	b := 2
	// Assign to blank identifier to satisfy compiler.
	_ = b
}

func main() {
	first()
	second()
	fmt.Println(*g)
}
This is almost identical to the C program we saw before. However, when we
compile, we’ll see the generated machine code is quite different. As before,
we’ll disable optimization using the following Go compiler flags:
-N: disable optimizations
-l: disable inlining
-m: print optimization decisions
$ GOOS=linux GOARCH=riscv64 go build -gcflags '-N -l -m' main.go
# command-line-arguments
./main.go:8:2: moved to heap: a
./main.go:21:13: ... argument does not escape
./main.go:21:14: *g escapes to heap
Go Version: 1.18.4
If we run the program, we’ll see that, unlike our C program, we get the
expected value of 1.
Even without examining the machine code, the output of the -m flag tells us how
the compiler is handling our assignment of the address of a local variable (a),
which has a temporary lifetime, to a global variable (g) that outlives it: a is
moved to the heap. This is explained in the Go FAQ:
How do I know whether a variable is allocated on the heap or the stack?
From a correctness standpoint, you don’t need to know. Each variable in Go
exists as long as there are references to it. The storage location chosen by
the implementation is irrelevant to the semantics of the language.
The storage location does have an effect on writing efficient programs. When
possible, the Go compilers will allo