There are two types of programming languages: garbage-collected programming languages and programming languages that require manual memory management. Examples of the former, among many more, are Kotlin, PHP, or Java. Examples of the latter are C, C++, or Rust. As a general rule, higher-level programming languages are more likely to have garbage collection as a standard feature. In this blog post, the focus is on such garbage-collected programming languages and how they can be compiled to WebAssembly (Wasm). But what is garbage collection (often referred to as GC) to begin with?
Browser support
- Chrome: supported from version 119
- Firefox: not supported
- Edge: supported from version 119
- Safari: not supported
Garbage collection
In simplified terms, garbage collection is the attempt to reclaim memory that the program allocated but no longer references. Such memory is called garbage. There are many strategies for implementing garbage collection. One of them is reference counting, where the objective is to count the number of references to objects in memory. When there are no more references to an object, it can be marked as no longer used and thus ready for garbage collection. PHP’s garbage collector uses reference counting, and the Xdebug extension’s xdebug_debug_zval() function allows you to peek under its hood. Consider the following PHP program.
<?php
$a = (string) rand();
$c = $b = $a;
$b = 42;
unset($c);
$a = null;
?>
The program assigns a random number cast to a string to a new variable called a. It then creates two new variables, b and c, and assigns them the value of a. After that, it reassigns b to the number 42, and then unsets c. Finally, it sets the value of a to null. Annotating each step of the program with xdebug_debug_zval(), you can see the garbage collector’s reference counter at work.
<?php
$a = (string) rand();
$c = $b = $a;
xdebug_debug_zval('a');
$b = 42;
xdebug_debug_zval('a');
unset($c);
xdebug_debug_zval('a');
$a = null;
xdebug_debug_zval('a');
?>
The above example outputs the following logs, where you can see how the number of references to the value of the variable a decreases after each step, which makes sense given the code sequence. (Your random number will be different, of course.)
a:
(refcount=3, is_ref=0)string '419796578' (length=9)
a:
(refcount=2, is_ref=0)string '419796578' (length=9)
a:
(refcount=1, is_ref=0)string '419796578' (length=9)
a:
(refcount=0, is_ref=0)null
PHP uses reference counting, but most modern browsers no longer rely on reference counting for garbage collection.
There are other challenges with garbage collection, like detecting cycles, but for this article, having a basic level of understanding of reference counting is enough.
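To make the cycle problem concrete, here is a minimal sketch in which two objects refer to each other. The Node class and its peer property are hypothetical names invented for this illustration. gc_enable() and gc_collect_cycles() are standard PHP functions. The exact count reported may vary between PHP versions.

```php
<?php
// Hypothetical class used only for this illustration.
class Node
{
    public $peer = null;
}

// Make sure PHP's cycle collector is switched on.
gc_enable();

$first = new Node();
$second = new Node();

// Each object now holds a reference to the other, forming a cycle.
$first->peer = $second;
$second->peer = $first;

// Removing both variables drops the only outside references, but neither
// reference count reaches zero, because each object is still referenced
// by its peer.
unset($first, $second);

// Explicitly run the cycle collector, which can find and free such
// unreachable cycles. The return value reports how much it collected.
$collected = gc_collect_cycles();
echo "Collected by the cycle collector: {$collected}\n";
```

With pure reference counting, the two objects would never be freed. This is why reference-counting implementations such as PHP's pair the counter with a separate cycle collector.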
Programming languages are implemented in other programming languages
It may feel like inception, but programming languages are implemented in other programming languages. For example, the PHP runtime is primarily implemented in C. You can check out the PHP source code on GitHub. PHP’s garbage collection code is mainly located in the file zend_gc.c. Most developers will install PHP via the package manager of their operating system. But developers can also build PHP from the source code. For example, in a Linux environment, the steps ./buildconf && ./configure && make would build PHP for the Linux runtime. But this also means that the PHP runtime can be compiled for other runtimes, like, you guessed it, Wasm.
Traditional methods of porting languages to the Wasm runtime
Independently from the platform PHP is running on, PHP scripts are compiled into the same bytecode and run by the Zend Engine. The Zend Engine is a compiler and runtime environment for the PHP scripting language. It consists of the Zend Virtual Machine (VM), which is composed of the Zend Compiler and the Zend Executor. Languages like PHP that are implemented in other high-level languages like C commonly have optimizations that target specific architectures, such as Intel or ARM, and require a different backend for each architecture. In this context, Wasm represents a new architecture. If the VM has architecture-specific code, like just-in-time (JIT) or ahead-of-time (AOT) compilation, then the developer also implements a backend for JIT/AOT for the new architecture. This approach makes a lot of sense because often the main part of the codebase can just be recompiled for each new architecture.
Given how low-level Wasm is