Stack Use After Return by azhenley

August 1, 2022

If you write any code that deals with manual memory management, you are likely
familiar with the concept of a “use after free” bug. These bugs can be the
source of, at best, program crashes, and at worst serious vulnerabilities. A
less discussed counterpart to use after free is “use after return”. In some
cases, the latter can be even more troublesome, due to the operations that are
performed when one procedure calls another.
In this post, we’ll take a look at what happens under the hood when a RISC-V
program includes a use after return bug, as well as how “higher-level”
programming languages can guard against this behavior, at varying levels of cost
to the programmer.

If you aren’t familiar with the RISC-V Bytes series, it may be worth giving our
first post, Cross-Platform Debugging with QEMU and GDB, a quick read to get the
necessary tools installed to follow along.

A Small C Program

Let’s start off by taking a look at a small C program:

main.c


#include <stdio.h>

int *g;

void first()
{
  int a = 1;
  g = &a;
}

void second()
{
  int b = 2;
}

int main(int argc, char **argv)
{
  first();
  second();
  printf("%d\n", *g);
  return 0;
}

This program is meant to run in userspace (U mode in RISC-V vernacular),
meaning that it depends on some initialization that runs before main() (via
crt0), as well as system libraries and functionality offered by the operating
system, such as the ability to print to stdout. We have one global variable, g,
which is a pointer to a 32-bit integer (int is 32 bits wide on our 64-bit
target), and we call two procedures, first() then second(), before printing the
value at the address stored in g, then exiting.

You may be able to guess what the output of this program will be, but let’s
compile it, then run it and see. We’ll use an unoptimized build to get a
simplified, albeit somewhat unrealistic, picture of what is going on.


$ riscv64-unknown-linux-gnu-gcc -static  main.c

GCC Version: 11.1.0

Note: gcc compiles dynamically linked executables by default, but we opt to
pass the -static flag here to compile a statically linked executable. This
makes it a bit easier to run on a different host architecture via binfmt_misc
because we don’t have to invoke our RISC-V dynamic linker at runtime.

Let’s run our program and check the output:

Interesting! Despite only ever assigning the address of a, which contains the
32-bit integer 1, to g, the contents of that address contain the value 2
when we print. We do assign 2 to the variable b in second(), but how does
that end up as the value at the address in g? The crux of the issue is that we
are using variables together that have different lifetimes. While g is
defined at the global scope, and thus exists for the entire lifetime of the
program, a is defined only in the scope of first(), meaning that the program
has no concept of a once the function returns.

But we still haven’t explained why the value stored in b, which has a lifetime
scoped to second(), has ended up at the address stored in g. Let’s take a
look at what is happening in the generated machine code.

$ riscv64-unknown-linux-gnu-objdump -D a.out | less

Note: there are many ways to explore the disassembled content of an
executable. A pattern that I have found useful is piping objdump -D into
less, then searching for the symbol I’m looking for, in this case using
/main. You can step forward through the matches using n, and step backward
using Shift+N.


00000000000105d2 <first>:
   105d2:       1101                    addi    sp,sp,-32
   105d4:       ec22                    sd      s0,24(sp)
   105d6:       1000                    addi    s0,sp,32
   105d8:       4785                    li      a5,1
   105da:       fef42623                sw      a5,-20(s0)
   105de:       fec40713                addi    a4,s0,-20
   105e2:       9ae1b823                sd      a4,-1616(gp) # 70a78 <g>
   105e6:       0001                    nop
   105e8:       6462                    ld      s0,24(sp)
   105ea:       6105                    addi    sp,sp,32
   105ec:       8082                    ret

00000000000105ee <second>:
   105ee:       1101                    addi    sp,sp,-32
   105f0:       ec22                    sd      s0,24(sp)
   105f2:       1000                    addi    s0,sp,32
   105f4:       4789                    li      a5,2
   105f6:       fef42623                sw      a5,-20(s0)
   105fa:       0001                    nop
   105fc:       6462                    ld      s0,24(sp)
   105fe:       6105                    addi    sp,sp,32
   10600:       8082                    ret

0000000000010602 <main>:
   10602:       1101                    addi    sp,sp,-32
   10604:       ec06                    sd      ra,24(sp)
   10606:       e822                    sd      s0,16(sp)
   10608:       1000                    addi    s0,sp,32
   1060a:       87aa                    mv      a5,a0
   1060c:       feb43023                sd      a1,-32(s0)
   10610:       fef42623                sw      a5,-20(s0)
   10614:       fbfff0ef                jal     ra,105d2 <first>
   10618:       fd7ff0ef                jal     ra,105ee <second>
   1061c:       9b01b783                ld      a5,-1616(gp) # 70a78 <g>
   10620:       439c                    lw      a5,0(a5)
   10622:       85be                    mv      a1,a5
   10624:       0004c7b7                lui     a5,0x4c
   10628:       26078513                addi    a0,a5,608 # 4c260 
   1062c:       5cc040ef                jal     ra,14bf8 <_IO_printf>
   10630:       4781                    li      a5,0
   10632:       853e                    mv      a0,a5
   10634:       60e2                    ld      ra,24(sp)
   10636:       6442                    ld      s0,16(sp)
   10638:       6105                    addi    sp,sp,32
   1063a:       8082                    ret

Check yourself: why is the difference in address from one instruction to
another sometimes 2 (2 bytes == 16 bits) (example: 10604 - 10602 = 2)
and sometimes 4 (4 bytes == 32 bits) (example: 10614 - 10610 = 4)? Our
64-bit RISC-V GCC install is using -march=rv64imafdc. The final extension,
c, corresponds to “Compressed”, which allows the compiler to compress some
instructions to reduce code size. In the base ISA every instruction is 32
bits, but when the compressed extension is enabled, some instructions can be
represented in just 16 bits.

Starting with main, we see our typical function prologue, where the stack is
grown by 32 bytes (10602), the return address (ra) is stored at the top of the
stack frame (10604), and the previous frame pointer (s0) is stored just below
it (10606), before the frame pointer is updated to point to the top of main’s
32-byte stack frame (10608). The next three instructions are not consequential
for our investigation today, and are only present because we are compiling
with no optimization, but for completeness, they take the arguments passed to
main() and store them in its stack frame. a0, which gets moved to a5
(1060a), contains argc, a 32-bit integer specifying the number of arguments
passed. In RISC-V a word is 32 bits, even on a 64-bit machine, so we can use
sw (“store word”) to store argc in main’s stack frame close to the bottom
(10610). Similarly, a1, which contains argv, a pointer (or more specifically,
a pointer to a pointer) to the arguments passed to the program, gets put at
the very bottom of the stack frame (1060c) using sd (“store doubleword”).

Description: visualization of function prologue for main(). Note that the
mv a5,a0 instruction is omitted from numbered operations on the stack.

Now we are ready to jump to first(). We see a similar function prologue,
before storing 1 into the a5 register (105d8), subsequently storing the
value on the stack (105da), then finally storing the stack address into a4
(105de). Lastly, we update g with the stack address from a4 (105e2), such that
it now points to a memory location where the value 1 is stored. Though we
haven’t yet reached the point where g ends up pointing to 2, we have already
made the mistake that can lead to a “use after return” vulnerability. If we
step through our program with GDB, we can identify the exact address in g
after first().

In one terminal start QEMU, but wait for GDB to attach on port 1234:

$ qemu-riscv64 -g 1234 a.out

And in another, start GDB by attaching to QEMU and setting a breakpoint at
first():


$ riscv64-unknown-linux-gnu-gdb a.out -ex "target remote :1234" -ex "break first"
(gdb) c
Continuing.

Breakpoint 1, 0x00000000000105d8 in first ()
(gdb) x/8i $pc
=> 0x105d8 <first+6>:	li	a5,1
   0x105da <first+8>:	sw	a5,-20(s0)
   0x105de <first+12>:	addi	a4,s0,-20
   0x105e2 <first+16>:	sd	a4,-1616(gp)
   0x105e6 <first+20>:	nop
   0x105e8 <first+22>:	ld	s0,24(sp)
   0x105ea <first+24>:	addi	sp,sp,32
   0x105ec <first+26>:	ret

After continuing to first() (GDB will frequently skip the function prologue by
default), we see the same function body that we dumped above. We are interested
in the address that is stored in a4 by the addi instruction at address
0x105de (<first+12>). It is a location in first’s stack frame calculated
using an offset of -20 from the frame pointer s0.


(gdb) i r s0 a4 sp
s0             0x40007ffd90        0x40007ffd90
a4             0x40007ffd7c	       274886294908
sp             0x40007ffd70	       0x40007ffd70

Printing s0, a4, and sp shows us that the address stored in a4 is in
fact 20 bytes below the frame pointer (0x40007ffd90 - 0x40007ffd7c = 0x14 = 20)
and 12 bytes above the stack pointer (0x40007ffd7c - 0x40007ffd70 = 0xc = 12).
This is fine as long as we are within first’s body, but as soon as we
return, our stack frame changes. We can see this happening in the function
epilogue:


(gdb) x/4i $pc
=> 0x105e6 <first+20>:	nop
   0x105e8 <first+22>:	ld	s0,24(sp)
   0x105ea <first+24>:	addi	sp,sp,32
   0x105ec <first+26>:	ret

We restore the previous frame pointer at 0x105e8 (<first+22>), then restore
the stack pointer at 0x105ea (<first+24>). By the time we return to main,
the address in a4 is outside of the frame (i.e. below the stack pointer):


(gdb) si
0x00000000000105e8 in first ()
(gdb) si
0x00000000000105ea in first ()
(gdb) si
0x00000000000105ec in first ()
(gdb) si
0x0000000000010618 in main ()
(gdb) i r s0 a4 sp
s0             0x40007ffdb0        0x40007ffdb0
a4             0x40007ffd7c	       274886294908
sp             0x40007ffd90	       0x40007ffd90

More importantly, g is now a dangling pointer.

Description: visualization of steps in first(), which includes: growing the
stack, storing 1 on the stack, updating g to point to the location of 1
on the stack, then shrinking the stack.

Our next instruction is a jump (jal – “jump and link”) to second(), where we
are again setting up a 32 byte stack frame, then storing a value in it:


(gdb) si
0x00000000000105ee in second ()
(gdb) x/8i $pc
=> 0x105ee <second>:	addi	sp,sp,-32
   0x105f0 <second+2>:	sd	s0,24(sp)
   0x105f2 <second+4>:	addi	s0,sp,32
   0x105f4 <second+6>:	li	a5,2
   0x105f6 <second+8>:	sw	a5,-20(s0)
   0x105fa <second+12>:	nop
   0x105fc <second+14>:	ld	s0,24(sp)
   0x105fe <second+16>:	addi	sp,sp,32

You’ll notice that the instruction at address 0x105f6 (<second+8>) is the
same instruction we saw in first() at 0x105da (<first+8>). Let’s step to
that instruction, and look at the value g points to before and after.

(gdb) si
0x00000000000105f0 in second ()
(gdb) si
0x00000000000105f2 in second ()
(gdb) si
0x00000000000105f4 in second ()
(gdb) si
0x00000000000105f6 in second ()
(gdb) p /s (int*)g
$4 = (int *) 0x40007ffd7c
(gdb) x/d 0x40007ffd7c
0x40007ffd7c:	1
(gdb) si
0x00000000000105fa in second ()
(gdb) p /s (int*)g
$5 = (int *) 0x40007ffd7c
(gdb) x/d 0x40007ffd7c
0x40007ffd7c:	2

While g continues to point at the same address (0x40007ffd7c), that address
is now part of second’s stack frame, meaning that second() is free to place
its local variables there. Without ever assigning through g, we have
implicitly changed the value it points to. This is a fairly trivial example,
but in more extreme cases this type of implicit assignment could allow a user
to manipulate the output, or worse, the control flow, of the program just by
supplying certain inputs.

Description: visualization of steps in second(), which includes: growing the
stack to include the address in g (which currently contains 1),
overwriting the value to 2, then shrinking the stack again.

How Does Go Handle This?

One of the reasons why folks reach for “memory safe” languages these days is to
avoid accidentally introducing vulnerabilities with mistakes like the one
described above. One way “memory safe” languages handle this is via a built-in
runtime and garbage collector. Go is a popular language that takes this
approach. Let’s look at a sibling program to the one we have already been
exploring.

main.go


package main

import "fmt"

var g *int

func first() {
	a := 1
	g = &a
}

func second() {
	b := 2
	// Assign to blank identifier to satisfy compiler.
	_ = b
}

func main() {
	first()
	second()
	fmt.Println(*g)
}

This is almost identical to the C program we saw before. However, when we
compile we’ll see the generated machine code is quite different. As before,
we’ll disable optimization, and also ask the compiler to report its decisions,
using the following Go compiler flags:

-N: disable optimizations
-l: disable inlining
-m: print optimization decisions


$ GOOS=linux GOARCH=riscv64 go build -gcflags '-N -l -m' main.go
# command-line-arguments
./main.go:8:2: moved to heap: a
./main.go:21:13: ... argument does not escape
./main.go:21:14: *g escapes to heap

Go Version: 1.18.4

If we run the program, we’ll see that, unlike our C program, we get the
expected value of 1.

Without even examining the machine code, the -m flag is causing the compiler
to emit information about how it is handling our assignment of the address of a
local variable (a) with a temporary lifetime to a global variable (g) that
outlives it. This is explained in the Go FAQ:

How do I know whether a variable is allocated on the heap or the stack?

From a correctness standpoint, you don’t need to know. Each variable in Go
exists as long as there are references to it. The storage location chosen by
the implementation is irrelevant to the semantics of the language.

The storage location does have an effect on writing efficient programs. When
possible, the Go compilers will allo
