This is one of those times where I got so fascinated by the idea of a thing
that I forgot to ask myself whether it’s a good idea to build the thing. The
idea being, transpiling JavaScript to C++ so I can compile that to whatever
I need.
I have arrived at the conclusion that this specific approach is not worth
exploring further. This made it hard to write a blog post, because I took so
many shortcuts and introduced so many flaws that I won’t be able to tell
a consistent or coherent story. At the same time, I think it’s better to have
a half-assed blog post where I can explain my thought process and share my
lessons learned than to have no blog post at all. So here we go.
The proof-of-concept implementation, and by extension this blog post, follow
the principles of evolutionary design. I took many, many shortcuts and left
many parts of this system incomplete as I prioritized making it over the finish
line. I hope that, despite all of that, there are still some interesting bits
in here for you.
The Spark
While the end result of my exploration is not specific to WebAssembly (in fact,
it arguably works better without getting WebAssembly involved), the original
motivation was very much this: Running JavaScript in WebAssembly.
At work, I have been wrapping my head around Shopify Functions. I don’t
want to get too much into the business pitch, but Shopify Functions boil
down to Shopify running your code on their servers, tightly integrated
with the rest of their business logic. This allows developers to deeply
customize Shopify, even in performance-critical sections of the pipeline.
In ecommerce, both security and performance are paramount, so WebAssembly –
bringing predictable performance and a strong sandbox – makes sense as the
fundamental piece of technology. A third-party developer can inject
arbitrary code written in theoretically any language, while Shopify can remain
in control over how these code fragments are allowed to affect the rest of the
system. Shopify accepts any WASI-compatible Wasm module with a maximum module
size of 250KB.
At the time of writing, all WebAssembly extension points Shopify offers
have a “JSON in, JSON out” architecture. Being a web developer, I was craving
to write my Shopify Functions in JavaScript — but alas, JavaScript does not
compile to WebAssembly. Or does it?
JS in Wasm the easy way
To run JavaScript in Wasm, one solution is to compile a JS engine to Wasm,
and have it parse and execute your JS code. Engines like V8 or SpiderMonkey are
massive and won’t easily compile to Wasm, not to mention the fact that JIT’ing
as a concept is not possible in Wasm right now. That didn’t stop the
Bytecode Alliance from compiling SpiderMonkey to WebAssembly, but
I was pretty sure it wasn’t going to be a small output module.
JIT’ing: WebAssembly is designed to store the instructions immutably and
separately from the memory that the instructions work on. That means that, at
least as of now, a Wasm module cannot generate instructions and subsequently
execute them.
Instead I was looking at JS interpreters and VMs. The Shopify Functions team
created javy, a toolchain that compiles a JS VM to Wasm and embeds your
JS in the Wasm module. The engine that javy relies on is QuickJS, a small
JavaScript VM that is fully ES2015 compliant. It was written by Fabrice
Bellard, who also created qemu, ffmpeg and tcc. The problem is that the
resulting Wasm module is slightly over 250KB. It was close enough that I tried
removing the JS parser and only compiling QuickJS’s byte code VM. Alas, no
cigar. Even removing unused globals (like ArrayBuffer or Symbol) did not
get me under the limit.
The Shopify Functions team is looking into blessing a way to write functions in
JavaScript. In the meantime, I’ll be spending the rest of the blog post looking
into a less serious solution.
C++
One language that compiles really well to Wasm is C++. Most of the early days
of Wasm toolchains were focused on making C++ code run on the web, as C++
is often at the foundation of many big software projects. LLVM’s clang++
now supports Wasm out of the box, and WASI-SDK provides a sysroot (libc,
libc++ etc) that works against WASI rather than, say, POSIX. This allows
you to compile C/C++ code to WebAssembly, and run it in any WASI-compatible
environment (like wasmtime).
Now here finally comes my rather amateurish observation that led to this
blog post: I think JavaScript looks a lot like C++. In fact, most of the
features that JavaScript has to offer, C++20 has to offer as well. Often with
extremely similar syntax. What if I could write a transpiler of sorts that
translates JavaScript to C++ and aims to maintain the semantics and behavior
of JavaScript? Can I write a really dumb transpiler that defers all the
difficult stuff like type checking and scoping to the C++ compiler (mostly
because of my lack of experience in building compilers)? Would that yield
smaller binaries? Maybe even faster ones? Well, only one way to find out.
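To make the resemblance concrete, here is a small hand-written sketch (not output of any transpiler) of a closure, a map and a join in C++20. The function name closure_demo is made up for illustration. The sketch sidesteps dynamic typing entirely by using plain int.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hand-written C++ counterpart of this JS:
//   let i = 0;
//   const f = () => i++;
//   [f(), f(), f()].map(x => x + 1).join(",")
std::string closure_demo() {
    int i = 0;
    auto f = [&i]() { return i++; };  // closure capturing i by reference

    // Elements of a braced initializer list are evaluated left to right,
    // so this yields 0, 1, 2 just like the JS array literal.
    std::vector<int> nums{f(), f(), f()};

    // In-place equivalent of .map(x => x + 1)
    std::transform(nums.begin(), nums.end(), nums.begin(),
                   [](int x) { return x + 1; });

    // Equivalent of .join(",")
    std::string out;
    for (auto x : nums) {
        if (!out.empty()) out += ",";
        out += std::to_string(x);
    }
    return out;  // "1,2,3"
}
```

The lambda syntax and the range-based for loop map almost one-to-one onto their JS counterparts. The static int type is the part that does not.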
The North Star
As a north star for how capable I wanted my toy transpiler to be, I wrote an
admittedly convoluted JS program similar to the one below.
function* numbers() {
  let i = 0;
  const f = () => i++;
  yield* [f(), f(), f()].map(i => i + 1);
}

const arr = [];
for (let x of numbers()) {
  arr.push(x);
}
IO.write_to_stdout(arr.join(","));
The program is nonsense, of course, but it covers a good range of features that
I want to support: Variables, Functions, Output, Loops, Iterators, Generators,
Closures, Methods, … and the output is deterministic and well-defined as
well: 1,2,3.
The Proof-of-Concept
Let me get the PoC out of the way. I have called this exploration jsxx and
you can find all the source code on my GitHub. Be warned, though: This
is the first time I’m using C++20. I used to write C++ many years ago, at a time
when C++11 was considered bleeding edge. I did a lot of C when I was working
on microprocessors, and it still shows. Nowadays I mostly write JavaScript
and Rust. I used this as an opportunity to catch up on C++ and get a bit more
familiar with all the new stuff that C++20 has to offer. When I asked them, Sy
Brand recommended Josh Lospinoso’s book “C++ Crash Course”, which I have
read, enjoyed and can now only recommend myself. And supporting No Starch Press
is an added benefit.

That all being said, I’m sure my C++ is horrible, so please don’t look at it
too closely.
Using JSXX
The UI of the transpiler is also very basic. For example, using the north star
program from above, you can run jsxx to compile JS to C++ and immediately
invoke clang++ to turn it into a native binary.
$ cat testprog.js | cargo run
$ ./output
1.000000,2.000000,3.000000
To compile to WebAssembly, use the --wasm flag and provide the path to
WASI-SDK’s clang++ (and additional compiler flags, if desired):
$ cat testprog.js | \
    cargo run -- \
      --wasm \
      --clang-path $HOME/Downloads/wasi-sdk-16.0/bin/clang++ \
      -- -Oz -flto -Wl,--lto-O3
$ wasmtime output.wasm
1.000000,2.000000,3.000000
$ ls -alh output.wasm
-rwxr-xr-x 1 surma staff 86K Sep 29 19:05 output.wasm
$ cat output.wasm | brotli -q 11 -c | wc -c
29972
So I managed to run some fairly complex JavaScript in Wasm without writing a
whole engine, and ended up with a mere 86KiB (~30KiB brotli’d). That’s pretty
cool.
ES20-ohmygodwhathaveyoudone: Please don’t get too excited. This
transpiler supports a minuscule subset of JavaScript and is in no way compliant
with any ECMAScript standard. It could be, but as of now it’s not.
If you want to inspect the generated C++ code, pass the --emit-cpp flag.
Anyhow, I don’t think this particular approach is worth pursuing any further.
To explain why I think that, I suppose I should explain how this approach
works.
JSXX
Let’s start with the most normal part of this setup. The parser.
Parsing
I didn’t want to write my own parser. That’s not where the interesting parts of
this project were going to happen. Since I wanted to write this transpiler in
Rust, I decided to just rip out the parser and the AST from swc, which would
allow me to parse even the most recent ES2022 syntax. My goal was to exploit
the similarities between JS and C++ to keep the transpiler extremely simple.
All it would do is traverse the JS AST and emit corresponding C++ code in a
single pass, without tracking variable scopes, types or any of that complicated
compiler-y stuff. Most of the meat would be in the runtime I was going to
write. The guiding principle here is: It doesn’t need to be pretty, it just
needs to compile.
Variables
A variable in JS can contain a value of any of these types: bool, number,
string, Function, Object or Array (I suppose an Array is just a special
Object, but I think modeling them separately actually makes the implementation
easier). There are technically more types, like the primitives Symbol and
BigInt, but I wasn’t gonna implement those.
At a syntactic level, this is easy to translate to C++, especially since C++
introduced the auto keyword for variable declarations. However, a C++ variable
has to have a single type, whereas a variable in JS can change its type as
often as it likes. Assigning a number and then a string to the same variable
is common in JS, but problematic for C++’s type system. I needed to introduce
a type that can hold any JS value. This type turned into the class JSValue.
Before we look at the inside of the class, having the name is enough to
transpile a variable declaration.
let x = 4;
x = "hello";
… can be transpiled to C++ as …
auto x = JSValue{4};
x = JSValue{"hello"};
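To illustrate how little a single-pass emitter needs to know for this case, here is a toy sketch in C++. The actual transpiler is written in Rust on top of swc’s AST. Every name here (Literal, VarDecl, Assign, Emitter, emit_program) is hypothetical and only meant to show the shape of the idea: visit each statement once and print the matching C++ text, with no scope or type tracking.

```cpp
#include <string>
#include <variant>
#include <vector>

// Hypothetical toy AST. Literals keep their original source text.
struct Literal { std::string source; };
struct VarDecl { std::string name; Literal init; };   // let x = <init>;
struct Assign  { std::string name; Literal value; };  // x = <value>;
using Stmt = std::variant<VarDecl, Assign>;

// One overload per node type; each returns the C++ text for that node.
struct Emitter {
    std::string operator()(const VarDecl& d) const {
        return "auto " + d.name + " = JSValue{" + d.init.source + "};";
    }
    std::string operator()(const Assign& a) const {
        return a.name + " = JSValue{" + a.value.source + "};";
    }
};

// Single pass over the program, one output line per statement.
std::string emit_program(const std::vector<Stmt>& program) {
    std::string out;
    for (const auto& stmt : program) {
        out += std::visit(Emitter{}, stmt);
        out += "\n";
    }
    return out;
}
```

Feeding it a VarDecl for x with the literal 4, followed by an Assign of "hello" to x, produces the two C++ lines shown above.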
If I was writing C and wanted a variable to contain one of many types, I’d
use a union, which is notoriously not type safe. C++ has a type-safe
counterpart to C’s union, which is called std::variant:
#include <variant>

class JSValue {
  using Box = std::variant<JSBool,
                           JSNumber,
                           JSString,
                           JSFunction,
                           JSArray,
                           JSObject>;
  Box box;
};
Using jsValue.box.index() we can query what the type of the underlying value
is. With std::get<T>(jsValue.box) we can get access to the underlying
value. If we call std::get<T> with the wrong type T, the call will
throw an exception (std::bad_variant_access).
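Here is a trimmed-down, self-contained sketch of that idea. It is not the jsxx source: the three wrapper structs are bare stand-ins, the constructors and the helpers type_name and variant_demo are assumptions made for illustration, and box is public only to keep the example short.

```cpp
#include <string>
#include <variant>

// Bare stand-ins for the wrapper classes (assumption: only three types).
struct JSBool   { bool value; };
struct JSNumber { double value; };
struct JSString { std::string value; };

class JSValue {
public:
    using Box = std::variant<JSBool, JSNumber, JSString>;
    Box box;

    // One constructor per literal kind so JSValue{4} and JSValue{"hello"} work.
    JSValue(bool b) : box{JSBool{b}} {}
    JSValue(int n) : box{JSNumber{static_cast<double>(n)}} {}
    JSValue(double d) : box{JSNumber{d}} {}
    JSValue(const char* s) : box{JSString{s}} {}
};

// index() reports which alternative is active, in declaration order.
const char* type_name(const JSValue& v) {
    switch (v.box.index()) {
        case 0:  return "boolean";
        case 1:  return "number";
        case 2:  return "string";
        default: return "unknown";
    }
}

std::string variant_demo() {
    auto x = JSValue{4};
    std::string log = type_name(x);  // "number"

    x = JSValue{"hello"};            // same variable, different JS type
    log += ",";
    log += type_name(x);             // "number,string"

    try {
        (void)std::get<JSNumber>(x.box);  // wrong alternative: throws
    } catch (const std::bad_variant_access&) {
        log += ",bad_variant_access";
    }
    return log;
}
```

Calling variant_demo() returns "number,string,bad_variant_access": the variable changes its JS type at runtime while its C++ type stays JSValue.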
Primitive types
Most of the primitive types in JS have a direct counterpart in C++. A number
in JS maps to a C++ double, a JS string to a C++ std::string (let’s
ignore details of WTF-16 vs whatever string encoding is in C++), etc. However,
I decided to wrap each C++ primitive in a custom class because I knew I’d have
to add methods like .toString() to them sooner or later, and that requires
a class.
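A minimal sketch of what such wrappers might look like follows. The member names and the use of std::to_string are my assumptions for illustration, not the actual jsxx code.

```cpp
#include <string>
#include <utility>

// Wraps std::string so string-specific methods have a place to live.
class JSString {
public:
    explicit JSString(std::string s) : value{std::move(s)} {}
    const std::string& str() const { return value; }
    JSString toString() const { return *this; }
private:
    std::string value;
};

// Wraps bool; toString() mirrors JS's "true" / "false".
class JSBool {
public:
    explicit JSBool(bool b) : value{b} {}
    JSString toString() const { return JSString{value ? "true" : "false"}; }
private:
    bool value;
};

// Wraps double. Assumption: formatting is delegated to std::to_string,
// which under C++20 rules prints six decimal places.
class JSNumber {
public:
    explicit JSNumber(double v) : value{v} {}
    JSString toString() const { return JSString{std::to_string(value)}; }
private:
    double value;
};
```

With this sketch, JSNumber{1.0}.toString().str() gives "1.000000" rather than JS’s "1". That resembles the output shown earlier, though whether jsxx formats numbers this way is not something this sketch can confirm.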
IEEE-754: The ECMAScript spec demands that all numbers be IEEE-754
double-precision floating-point numbers (i.e. a C++ double). However, many
engines have an optimization to use integers under the hood if the code path
does not use fraction