Good day, comrades!
Today I’d like to share the good news that WebAssembly is finally coming
for the rest of us weirdos.
WebAssembly for the rest of us
17 Mar 2023 – BOB 2023
Andy Wingo
Igalia, S.L.
This is a transcript-alike of a talk that I gave last week at BOB
2023, a gathering in Berlin of
people that are using “technologies beyond the mainstream” to get things
done: Haskell, Clojure, Elixir, and so on. PDF slides here, and I’ll link the video too when it becomes available.
WebAssembly, the story
WebAssembly is an exciting new universal compute platform
WebAssembly: what even is it? Not a programming language that
you would write software in, but rather a compilation target: a sort of
assembly language, if you will.
WebAssembly, the pitch
Predictable portable performance
- Low-level
- Within 10% of native
Reliable composition via isolation
- Modules share nothing by default
- No nasal demons
- Memory sandboxing
Compile your code to WebAssembly for easier distribution and composition
If you look at what the characteristics of WebAssembly are as an
abstract machine, to me there are two main areas in which it is an
advance over the alternatives.
Firstly it’s “close to the metal” — if you compile for example an
image-processing library to WebAssembly and run it, you’ll get similar
performance when compared to compiling it to x86-64 or ARMv8 or what
have you. (For image processing in particular, native still generally
wins because the SIMD primitives in WebAssembly are more narrow and
because getting the image into and out of WebAssembly may imply a copy,
but the general point remains.) WebAssembly’s instruction set covers a
broad range of low-level operations that allows compilers to produce
efficient code.
The novelty here is that WebAssembly is both portable while also being
successful. We language weirdos know that it’s not enough to do
something technically better: you have to also succeed in getting
traction for your alternative.
The second interesting characteristic is that WebAssembly is (generally
speaking) a principle-of-least-authority architecture: a WebAssembly
module starts with access to nothing but itself. Any capabilities that
an instance of a module has must be explicitly shared with it by the
host at instantiation-time. This is unlike DLLs which have access to
all of main memory, or JavaScript libraries which can mutate global
objects. This characteristic allows WebAssembly modules to be reliably
composed into larger systems.
WebAssembly, the hype
It’s in all browsers! Serve your code to anyone in the world!
It’s on the edge! Run code from your web site close to your users!
Compose a library (eg: Expat) into your program (eg: Firefox), without risk!
It’s the new lightweight virtualization: Wasm is what containers were to VMs! Give me that Kubernetes cash!!!
Again, the remarkable thing about WebAssembly is that it is succeeding!
It’s on all of your phones, all your desktop web browsers, all of the
content distribution networks, and in some cases it seems set to replace
containers in the cloud. Launch the rocket emojis!
WebAssembly, the reality
WebAssembly is a weird backend for a C compiler
Only some source languages are having success on WebAssembly
What about Haskell, Ocaml, Scheme, F#, and so on – what about us?
Are we just lazy? (Well…)
So why aren’t we there? Where is Clojure-on-WebAssembly? Where are the
F#, the Elixir, the Haskell compilers? Some early efforts exist, but
they aren’t really succeeding. Why is that? Are we just not putting in
the effort? Why is it that Rust gets to ride on the rocket ship but Scheme does not?
WebAssembly, the reality (2)
WebAssembly (1.0, 2.0) is not well-suited to garbage-collected languages
Let’s look into why
As it turns out, there is a reason that there is no good Scheme
implementation on WebAssembly: the initial version of WebAssembly is a
terrible target if your language relies on the presence of a garbage
collector. There have been some advances but this observation still
applies to the current standardized and deployed versions of
WebAssembly. To better understand this issue, let’s dig into the guts
of the system to see what the limitations are.
GC and WebAssembly 1.0
Where do garbage-collected values live?
For WebAssembly 1.0, only possible answer: linear memory
(module (global $hp (mut i32) (i32.const 0)) (memory $mem 10)) ;; 640 kB
The primitive that WebAssembly 1.0 gives you to represent your data is
what is called linear memory: just a buffer of bytes to which you can
read and write. It’s pretty much like what you get when compiling
natively, except that the memory layout is more simple. You can obtain
this memory in units of 64-kilobyte pages. In the example above we’re
going to request 10 pages, for 640 kB. Should be enough, right? We’ll
just use it all for the garbage collector, with a bump-pointer
allocator. The heap pointer / allocation pointer is kept in the mutable
global variable $hp.
(func $alloc (param $size i32) (result i32) (local $ret i32) (loop $retry (local.set $ret (global.get $hp)) (global.set $hp (i32.add (local.get $size) (local.get $ret))) (br_if 1 (i32.lt_u (i32.shr_u (global.get $hp) 16) (memory.size)) (local.get $ret)) (call $gc) (br $retry)))
Here’s what an allocation function might look like. The allocation
function $alloc is like malloc: it takes a number of bytes and returns
a pointer. In WebAssembly, a pointer to memory is just an offset, which
is a 32-bit integer (i32). (Having the option of a 64-bit address
space is planned but not yet standard.)
If this is your first time seeing the text representation of a
WebAssembly function, you’re in for a treat, but that’s not the point of
the presentation :) What I’d like to focus on is the (call $gc) —
what happens when the allocation pointer reaches the end of the region?
GC and WebAssembly 1.0 (2)
What hides behind (call $gc) ?
Ship a GC over linear memory
Stop-the-world, not parallel, not concurrent
But… roots.
The first thing to note is that you have to provide the $gc yourself.
Of course, this is doable — this is what we do when compiling to a
native target.
Unfortunately though the multithreading support in WebAssembly is
somewhat underpowered; it lets you share memory and use atomic
operations but you have to create the threads outside WebAssembly. In
practice probably the GC that you ship will not take advantage of
threads and so it will be rather primitive, deferring all collection
work to a stop-the-world phase.
GC and WebAssembly 1.0 (3)
Live objects are
- the roots
- any object referenced by a live object
Roots are globals and locals in active stack frames
No way to visit active stack frames
What’s worse though is that you have no access to roots on the stack. A
GC has to keep live objects, as defined circularly as any object
referenced by a root, or any object referenced by a live object. It
starts with the roots: global variables and any GC-managed object
referenced by an active stack frame.
But there we run into problems, because in WebAssembly (any version, not
just 1.0) you can’t iterate over the stack, so you can’t find active
stack frames, so you can’t find the stack roots. (Sometimes people want
to support this as a low-level
capability
but generally speaking the consensus would appear to be that overall
performance will be better if the engine is the one that is responsible
for implementing the GC; but that is foreshadowing!)
GC and WebAssembly 1.0 (3)
Workarounds
- handle stack for precise roots
- spill all possibly-pointer values to linear memory and collect conservatively
Handle book-keeping a drag for compiled code
Given the noniterability of the stack, there are basically two
work-arounds. One is to have the compiler and run-time maintain an
explicit stack of object roots, which the garbage collector can know for
sure are pointers. This is nice because it lets you move objects. But,
maintaining the stack is overhead; the state of the art solution is
rather to create a side table (a “stack map”) associating each potential
point at which GC can be called with instructions on how to find the
roots.
The other workaround is to spill the whole stack to memory. Or,
possibly just pointer-like values; anyway, you conservatively scan all
words for things that might be roots. But instead of having access to
the memory to which the WebAssembly implementation would spill your
stack, you have to do it yourself. This can be OK but it’s sub-optimal;
see my recent post on the Whippet garbage
collector
for a deeper discussion of the implications of conservative
root-finding.
GC and WebAssembly 1.0 (4)
Cycles with external objects (e.g. JavaScript) uncollectable
A pointer to a GC-managed object is an offset to linear memory, need capability over linear memory to read/write object from outside world
No way to give back memory to the OS
Gut check: gut says no
If that were all, it would already be not so great, but it gets worse!
Another problem with linear-memory GC is that it limits the potential
for composing a number of modules and the host together, because the
garbage collector that manages JavaScript objects in a web browser knows
nothing about your garbage collector over your linear memory. You can
easi