Async/await syntax in Rust was initially released to much fanfare and excitement. To quote Hacker
News at the time:
This is going to open the flood gates. I am sure lot of people were just waiting for this moment
for Rust adoption. I for one was definitely in this boat.

Also, this has all the goodness: open-source, high quality engineering, design in open, large
contributors to a complex piece of software. Truly inspiring!
Recently, the reception has been a bit more mixed. To quote a comment on Hacker News again,
discussing a recent blog post on the subject:
I genuinely can’t understand how anybody could look at the mess that’s Rust’s async and think that
it was a good design for a language that already had the reputation of being very complicated to
write.

I tried to get it, I really did, but my god what a massive mess that is. And it contaminates
everything it touches, too. I really love Rust and I do most of my coding in it these days, but
every time I encounter async-heavy Rust code my jaw clenches and my vision blurs.
Of course, neither of these comments is completely representative: even four years ago, some people
had pointed concerns. And in the same thread as this comment about jaws clenching and vision
blurring, there were many people defending async Rust with equal fervor. But I don’t think I would
be out of pocket to say that the nay-sayers have grown more numerous and their tone more strident as
time has gone on. To some extent this is just the natural progression of the hype cycle, but I also
think as we have become more distant from the original design process, some of the context has been
lost.
Between 2017 and 2019, I drove the design of async/await syntax, in collaboration with others and
building on the work of those who came before me. Forgive me if I am a bit put off when someone says
that they don’t know how anyone could look at that “mess” and “think that it was a good design,” and
please indulge me in this imperfectly organized and overly long explanation of how async Rust came
to exist, what its purpose was, and why, in my opinion, for Rust there was no viable alternative. I
hope that along the way I might shed more light on the design of Rust in a broader and deeper sense,
at least slightly, and not merely regurgitate the justifications of the past.
The basic issue at stake in this debate is Rust’s decision to use a “stackless coroutine” approach
to implementing user-space concurrency. A lot of terms are thrown around in this discussion and it's
reasonable not to be familiar with all of them.
The first concept we need to get straight is the very purpose of the feature: “user-space
concurrency.” The major operating systems present a set of fairly similar interfaces to achieve
concurrency: you can spawn threads, and perform IO on those threads using syscalls, which block that
thread until they complete. The problem with these interfaces is that they involve certain overheads
that can become a limiting factor when you want to achieve certain performance targets. These are
two-fold:
- Context-switching between the kernel and userspace is expensive in terms of CPU cycles.
- OS threads have a large pre-allocated stack, which increases per-thread memory overhead.
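The second point is visible in std's own thread API. As a small sketch (the helper name and the 64 KiB figure are invented for illustration, not a recommendation), a thread's stack reservation is fixed at spawn time:

```rust
use std::thread;

// Rust's std reserves 2 MiB of stack per thread by default (the
// RUST_MIN_STACK environment variable can override it), so 10,000 threads
// reserve gigabytes of address space whether or not they use it.
fn sum_on_small_stack() -> u32 {
    thread::Builder::new()
        .stack_size(64 * 1024) // illustrative: a much smaller reservation
        .spawn(|| (0..100u32).sum::<u32>())
        .expect("spawn failed")
        .join()
        .expect("thread panicked")
}

fn main() {
    println!("{}", sum_on_small_stack());
}
```

Shrinking the reservation by hand like this is fragile, though: pick a size that is too small and the thread overflows its stack, which is exactly the problem user-space concurrency designs try to solve more gracefully.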
These limitations are fine up to a certain point, but for massively concurrent programs they do not
work. The solution is to use a non-blocking IO interface and schedule many concurrent operations on
a single OS thread. This can be done by the programmer “by hand,” but modern languages frequently
provide facilities to make this easier. Abstractly, languages have some way of dividing work into
tasks and scheduling those tasks onto threads. Rust’s system for this is async/await.
The first axis of choice in this design space is between cooperative and preemptive scheduling.
Must tasks “cooperatively” yield control back to the scheduling subsystem, or can they be
“preemptively” stopped at some point while they’re running, without the task being aware of it?
A term that gets thrown around a lot in these discussions is coroutine, and it is used in somewhat
contradictory ways. A coroutine is a function which can be paused and then later resumed. The big
ambiguity is that some people use the term “coroutine” to mean a function which has explicit syntax
for pausing and resuming it (this would correspond to a cooperatively scheduled task) and some
people use it to mean any function that can pause, even if the pause is performed implicitly by a
language runtime (this would also include a preemptively scheduled task). I prefer the first
definition, because it introduces some manner of meaningful distinction.
Goroutines, on the other hand, are a Go language feature which enables concurrent, preemptively
scheduled tasks. They have an API that is the same as a thread, but it is implemented as part of the
language instead of as an operating system primitive, and in other languages they are often called
virtual threads or else green threads. So by my definition, goroutines are not coroutines, but
other people use the broader definition and say goroutines are a kind of coroutine. I’ll refer to
this approach as green threads, because that’s been the terminology used in Rust.
The second axis of choice is between a stackful and a stackless coroutine. A stackful coroutine
has a program stack in the same way that an OS thread has a program stack: as functions are called
as part of the coroutine, their frames are pushed on the stack; when the coroutine yields, the state
of the stack is saved so that it can be resumed from the same position. A stackless coroutine on
the other hand stores the state it needs to resume in a different way, such as in a continuation
or in a state machine. When it yields, the stack it was using is used by the operation that took
over from it, and when it resumes it takes back control of the stack and that continuation or state
machine is used to resume the coroutine where it left off.
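To make the state-machine idea concrete, here is a hand-written stackless coroutine in Rust, implemented directly against the `Future` trait. The `Countdown` type and the `drive` and `noop_waker` helpers are invented for illustration (the compiler generates something similar in spirit, not in these exact details); the point is that the only state surviving across a yield is the enum's fields, not a saved program stack:

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// A hand-written stackless coroutine: everything it needs across a yield
// point lives in this enum, not on a saved program stack.
enum Countdown {
    Running { remaining: u32 },
    Done,
}

impl Future for Countdown {
    type Output = &'static str;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<&'static str> {
        // `Countdown` holds no self-references, so it is `Unpin` and we
        // can reach the state directly.
        let this = self.get_mut();
        match std::mem::replace(this, Countdown::Done) {
            Countdown::Running { remaining: 0 } => Poll::Ready("done"),
            Countdown::Running { remaining } => {
                // Yield: record the state in the enum, hand the stack back.
                *this = Countdown::Running { remaining: remaining - 1 };
                Poll::Pending
            }
            Countdown::Done => panic!("polled after completion"),
        }
    }
}

// A waker that does nothing, just so we can call `poll` by hand.
fn noop_waker() -> Waker {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn no_op(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
}

// Busy-poll a future to completion, counting how many polls it took.
fn drive<F: Future>(fut: F) -> (F::Output, u32) {
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    let mut polls = 0;
    loop {
        polls += 1;
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return (out, polls);
        }
    }
}

fn main() {
    let (out, polls) = drive(Countdown::Running { remaining: 2 });
    println!("{out} after {polls} polls");
}
```

Each `Poll::Pending` is a yield: the coroutine gives the OS stack back to its caller, and the enum alone carries it to the next `poll`.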
One issue that is often brought up with async/await (in Rust and other languages) is the “function
coloring problem” – a complaint that in order to get the result of an async function, you need to
use a different operation (such as awaiting it) rather than call it normally. Both green threads and
stackful coroutine mechanisms can avoid this outcome, because the special syntax exists to indicate
that something special is happening to manage the stackless state of the coroutine (what
specifically depends on the language); a coroutine with its own stack has no such state to manage,
and so needs no special syntax.
Rust’s async/await syntax is an example of a stackless coroutine mechanism: an async function is
compiled to a function which returns a Future, and that future is what is used to store the state
of the coroutine when it yields control. The basic question at hand in this debate is whether Rust
was correct to adopt this approach, or if it should have adopted a more Go-like “stackful” or “green
thread” approach, ideally without explicit syntax that “colors” functions.
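For instance, an `async fn` is roughly sugar for a plain function returning an anonymous future type, and a synchronous caller cannot get at the output with an ordinary call. The function names below are invented for illustration, and the busy-polling `block_on` is a teaching sketch, not a real executor:

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// An async fn...
async fn add_one(x: u32) -> u32 {
    x + 1
}

// ...is roughly equivalent to a plain fn returning an anonymous future:
fn add_one_desugared(x: u32) -> impl Future<Output = u32> {
    async move { x + 1 }
}

// A minimal busy-polling executor, for illustration only.
fn block_on<F: Future>(fut: F) -> F::Output {
    fn clone(_: *const ()) -> RawWaker {
        RawWaker::new(std::ptr::null(), &VTABLE)
    }
    fn no_op(_: *const ()) {}
    static VTABLE: RawWakerVTable = RawWakerVTable::new(clone, no_op, no_op, no_op);
    let waker = unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) };
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

fn main() {
    // Calling the async fn runs none of its body; it only constructs the
    // coroutine's initial state.
    let fut = add_one(41);
    // The "color": a sync caller needs an executor (or `.await` inside
    // another async context) to get the u32 out.
    println!("{}", block_on(fut));
    println!("{}", block_on(add_one_desugared(41)));
}
```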
Green threads
A third Hacker News comment represents well the kind of remark that I often see in this
debate:
The alternative concurrency model people want is structured concurrency via stackful coroutines
and channels on top of a work stealing executor.

Until someone does the work to demo that and compare it to async/await with futures I don’t think
there’s any productive discussion to be had.
Setting aside the references to structured concurrency, channels and a work stealing executor
(completely orthogonal concerns), the bewildering thing about comments like this is that originally
Rust did have a stackful coroutine mechanism, in the form of green threads. It was removed in late
2014, shortly before the 1.0 release. Understanding the reason why will help us get to the bottom of
why Rust shipped async/await syntax.
A big issue for any green threading system – Rust’s or Go’s or any other language’s – is what to do
about the program stack for these threads. Remember that one of the goals of a user-space
concurrency mechanism is to reduce the memory overhead of the large, pre-allocated stack used by OS
threads. Therefore, green thread libraries tend to try to adopt a mechanism to spawn threads with
smaller stacks, and grow them only as needed.
One way to achieve this is so-called “segmented stacks,” in which the stack is a linked list of
small stack segments; when the stack grows beyond the bound of its segment, a new segment is added
to the list, and when it shrinks, that segment is removed. The problem with this technique is that
it introduces high variability in the cost of pushing a stack frame onto the stack. If the frame
fits
in the current segment, this is basically free. If it doesn’t, it requires allocating a new segment.
A particularly pernicious version of this is when a function call in a hot loop requires allocating
a new segment. This adds an allocation and deallocation to every iteration of that loop, with a
significant impact on performance. And it is entirely opaque to users, because users don’t know how
deep the stack will be when a function is called. Both Rust and Go started with segmented stacks,
and then abandoned this approach for these reasons.
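A toy model makes the hot-loop failure mode visible. This is entirely hypothetical (no real runtime was implemented this way in Rust; real segmented stacks lived below the language, in the compiled function prologues), but it captures the cost structure:

```rust
// A toy segmented "stack": a list of fixed-capacity segments, where pushing
// past a segment boundary allocates a new segment and popping back across
// it frees the segment again.
struct SegmentedStack {
    segments: Vec<Vec<u64>>, // each inner Vec is one fixed-size segment
    segment_capacity: usize,
    allocations: usize, // count segment allocations to expose the cost
}

impl SegmentedStack {
    fn new(segment_capacity: usize) -> Self {
        SegmentedStack { segments: Vec::new(), segment_capacity, allocations: 0 }
    }

    fn push(&mut self, frame: u64) {
        let needs_segment = match self.segments.last() {
            Some(seg) => seg.len() == self.segment_capacity,
            None => true,
        };
        if needs_segment {
            // The expensive case: a "function call" that lands exactly on a
            // segment boundary pays for an allocation.
            self.segments.push(Vec::with_capacity(self.segment_capacity));
            self.allocations += 1;
        }
        self.segments.last_mut().unwrap().push(frame);
    }

    fn pop(&mut self) -> Option<u64> {
        let frame = self.segments.last_mut()?.pop();
        if self.segments.last().map_or(false, |s| s.is_empty()) {
            // Returning across the boundary frees the segment, so a hot
            // loop straddling it allocates and frees on every iteration.
            self.segments.pop();
        }
        frame
    }
}

fn main() {
    let mut stack = SegmentedStack::new(4);
    // Fill exactly one segment, so the next call straddles the boundary.
    for _ in 0..4 {
        stack.push(0);
    }
    let before = stack.allocations;
    // A hot loop whose call depth crosses the boundary each iteration:
    for _ in 0..1000 {
        stack.push(1); // allocate a segment
        stack.pop();   // free it again
    }
    println!("extra allocations in the loop: {}", stack.allocations - before);
}
```

The same 1000 iterations at a depth one frame shallower would cost zero allocations, which is exactly the opacity problem: the programmer has no way to know which side of a boundary a loop will land on.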
Another approach is called “stack copying.” In this case, a stack is more like a Vec than a linked
list: when the stack hits its limit, it is reallocated larger so that the limit isn’t hit. This
allows stacks to start small and grow as needed, without the downsides of segmented stacks. The
problem with this is that reallocating the stack means copying it, which means the stack will now be
at a new location in memory. Any pointers into the stack are now invalid, and there needs to be some
mechanism for updating them.
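The Vec analogy can be made literal (as a hypothetical illustration only; program stacks are of course not actually Vecs): growing past capacity reallocates the buffer, and a pointer taken before the growth no longer points at the live data afterwards. The helper name is invented:

```rust
// Push past a Vec's capacity and report whether the buffer moved.
fn grow_and_report() -> (usize, usize, bool) {
    let mut stack: Vec<u64> = Vec::with_capacity(2);
    stack.push(1);
    let addr_before = stack.as_ptr();
    let cap_before = stack.capacity();

    // Push past capacity: the buffer is reallocated, usually at a new
    // address (the allocator may occasionally extend it in place).
    for i in 0..64 {
        stack.push(i);
    }

    // Comparing the old raw pointer's value is fine; dereferencing it
    // after the reallocation would not be.
    (cap_before, stack.capacity(), addr_before != stack.as_ptr())
}

fn main() {
    let (before, after, moved) = grow_and_report();
    println!("capacity {before} -> {after}, buffer moved: {moved}");
}
```

A runtime that copies stacks has to find and rewrite every `addr_before`-style pointer at the moment of the copy, which is the tracking problem the next paragraph describes.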
Go uses stack copying, and benefits from the fact that in Go pointers into a stack can only
exist in the same stack, so it just needs to scan that stack to rewrite pointers. Even this requires
runtime type information which Rust doesn’t keep, but Rust also allows pointers into a stack that
aren’t stored inside that stack – they could be somewhere in the heap, or in the stack of another
thread. The problem of tracking these pointers is ultimately the same as the problem of garbage
collection, except that instead of freeing memory it is moving it. Because Rust does not have a
garbage collector, it could not adopt stack copying. Instead, Rust solved the problem of segmented
stacks by making its green threads large, just like OS threads. But this eliminated one of the key
advantages of green threads.
Even in a situation like Go, which can have resizing stacks, green threads carry certain unavoidable
costs when trying to integrate with libraries written in other languages. The C ABI, with its OS
stack, is the shared minimum of every language. Switching code from executing on a green thread to
running on the OS thread stack can b