Coming from OCaml, the Rust programming language has many appealing
features. Rust’s system for tracking lifetime and ownership allows
users to safely express patterns that are awkward in OCaml, such as:
- Stack-allocated values and custom allocation schemes.
- Managed resources that can’t be (easily) garbage collected, e.g. file descriptors or GPU memory.
- Mutable data structures in the presence of concurrency.
On the other hand, Rust’s approach comes with some trade-offs.
Eschewing garbage collection requires careful consideration of lifetime
and ownership throughout a codebase. Emphasizing lifetime-polymorphism
can also make type inference untenable, a design choice that wouldn’t
fit OCaml.
At Jane Street, we’ve been working on extending OCaml to better support
these use cases, without giving up the principles that make OCaml a
convenient and flexible language.
To do so, we’re introducing a system of modes, which track
properties like the locality and uniqueness of OCaml values. Modes
allow the compiler to emit better, lower-allocation code, empower
users to write safer APIs, and with the advent of multicore,
statically guarantee data race freedom—all in a lightweight way
that only affects those in need.
The OCaml compiler does not statically track lifetimes. Instead, it
relies on a garbage collector to figure out a suitable lifespan for each
value at runtime. Values are collected only after they become
unreferenced, so OCaml programs are memory-safe.
To a first approximation, this model requires allocating all values on
the heap. Fortunately, OCaml’s generational GC can efficiently handle
short-lived values: minor-heap allocation simply advances a ring buffer.
However, placing everything on the heap is still a pessimistic
approach. Where possible, using a specialized allocator could improve
performance. For example, the minor heap is typically larger than cache,
so future allocations are likely to evict live values. Stack allocation
would immediately re-use freed space, eliminating this concern.
Providing an alternative to heap allocation would also have other
benefits:
- Every minor-heap allocation brings us closer to the next minor
collection cycle. A minor collection incurs some fixed overhead,
but more importantly, frequent collection causes more values to be
moved to the major heap. Promoted values become much costlier to
collect later on.
- At Jane Street, we often write “zero-allocation” code, which must
never trigger a GC cycle. A stack allocator would make it much
easier to write programs that do not touch the heap.
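Minor-heap allocation of this kind can be observed from ordinary OCaml. The sketch below relies only on the standard Gc.minor_words, which reports the total number of words allocated on the minor heap so far; words_allocated, sum_fresh_lists and sum_no_lists are hypothetical names chosen for illustration. A helper of this shape is one way to sanity-check that supposedly zero-allocation code really leaves the minor heap alone. Exact counts depend on the backend (bytecode vs. native, flambda or not), so the sketch only prints them.

```ocaml
(* Hypothetical helper: run [f] and report how many minor-heap words
   were allocated while it ran, using the standard Gc.minor_words. *)
let words_allocated (f : unit -> 'a) : 'a * float =
  let before = Gc.minor_words () in
  let result = f () in
  let after = Gc.minor_words () in
  (result, after -. before)

(* Each iteration builds a fresh three-element list that becomes garbage
   immediately: the kind of short-lived value the minor heap is for. *)
let sum_fresh_lists n =
  let total = ref 0 in
  for i = 1 to n do
    let l = [ i; i + 1; i + 2 ] in
    total := !total + List.fold_left ( + ) 0 l
  done;
  !total

(* The same arithmetic without building lists. In native code this loop
   only manipulates immediate integers, so it typically allocates nothing. *)
let sum_no_lists n =
  let total = ref 0 in
  for i = 1 to n do
    total := !total + (3 * i) + 3
  done;
  !total

let () =
  let r1, w1 = words_allocated (fun () -> sum_fresh_lists 1_000) in
  let r2, w2 = words_allocated (fun () -> sum_no_lists 1_000) in
  Printf.printf "with lists:    result %d, %.0f minor words\n" r1 w1;
  Printf.printf "without lists: result %d, %.0f minor words\n" r2 w2
```

Both functions compute the same sum; only the first pays for heap-allocated intermediates.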
When such performance concerns are relevant, one should arguably be
using a language based on explicit memory management, like Rust.
However, garbage collection is genuinely useful; explicit management is
a burden on users. Ideally, a language could provide a spectrum of
allocation strategies freely interoperable within a single
application. With modes, users can write OCaml with all the usual GC
guarantees—but when performance is paramount, opt into the
consideration of lifetimes, ownership, and concurrency.
Local Variables
In OCaml, it turns out that many short-lived values can be
stack-allocated. To safely refer to such values, we introduce
local variables.
Determining whether a variable is local involves checking a certain
condition on its lifetime. Consider the following function:
let is_int str =
  let opt = Int.of_string_opt str in
  match opt with
  | Some _ -> true
  | None -> false
;;
val is_int : string -> bool
Naively, this function incurs a heap allocation. The compiler does
not know the lifetime of opt: our function could return it, or
even store it in a global variable. Because opt could escape this
function, the value referenced by opt may need to live forever.
Therefore, it must be heap-allocated.
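To make the escape concrete, this standard-OCaml sketch contrasts is_int with two hypothetical variants, is_int_leaky and parse_and_return (illustrative names, not from the original), in which opt really does escape: once through a global reference and once through the return value. In both variants the option must outlive the call, so heap allocation is the only option.

```ocaml
(* is_int as in the text: opt is matched on and then dropped. *)
let is_int str =
  let opt = Int.of_string_opt str in
  match opt with
  | Some _ -> true
  | None -> false

(* Hypothetical variant 1: opt escapes through a global reference,
   so it must outlive the call and has to be heap-allocated. *)
let last_parsed : int option ref = ref None

let is_int_leaky str =
  let opt = Int.of_string_opt str in
  last_parsed := opt;
  match opt with
  | Some _ -> true
  | None -> false

(* Hypothetical variant 2: opt escapes by being returned to the caller. *)
let parse_and_return str =
  let opt = Int.of_string_opt str in
  opt

let () =
  ignore (is_int_leaky "42");
  (* The option is still reachable after is_int_leaky has returned. *)
  match !last_parsed with
  | Some n -> Printf.printf "still alive: %d\n" n
  | None -> print_endline "nothing stored"
```

From the outside, all three functions have ordinary types; only the bodies reveal whether opt outlives the call, which is exactly the information the compiler must reason about.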
As programmers, however, we can deduce that a shorter lifetime
suffices. In fact, opt only needs to live until we match on it.
When is_int returns, opt is no longer accessible, so it could have
safely been allocated in stack memory local to is_int.
Specifically, opt is local because its lifetime does not exceed its
enclosing stack frame, which we call its region. At runtime,
entering is_int begins a region by saving the current stack pointer;
exiting ends the region and reclaims stack-allocated memory. Since opt
is only accessible within this region, it may safely be allocated in the
corresponding stack frame.
Note that a stack-allocated value is not necessarily stored on the
control flow stack, as it would be in languages that support alloca().
In this example, we request space from a stack-based allocator backed
by entirely unrelated memory.
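The region discipline can be modeled in a few lines of ordinary OCaml. This is a toy sketch under explicit assumptions, not the runtime's actual allocator: the hypothetical stack type stands in for the stack-based allocator's memory (a plain int array, separate from the control-flow stack), alloc bumps a stack pointer, and with_region saves and restores that pointer, mirroring how entering and exiting is_int begins and ends a region.

```ocaml
(* Toy model of a stack-based allocator backed by its own memory.
   All names here are hypothetical; this is not the OCaml runtime. *)
type stack = { mem : int array; mutable sp : int }

let create size = { mem = Array.make size 0; sp = 0 }

(* Allocate [n] words in the current region by bumping the stack
   pointer; returns the offset of the new block within [mem]. *)
let alloc st n =
  if st.sp + n > Array.length st.mem then failwith "toy stack exhausted";
  let start = st.sp in
  st.sp <- st.sp + n;
  start

(* Entering a region saves the stack pointer; leaving it (normally or
   via an exception) restores the saved value, reclaiming every block
   allocated inside the region at once. *)
let with_region st f =
  let saved = st.sp in
  Fun.protect ~finally:(fun () -> st.sp <- saved) f

let () =
  let st = create 1024 in
  let _long_lived = alloc st 4 in
  with_region st (fun () ->
      (* e.g. the two words (header + field) of a Some cell that
         never leaves is_int *)
      let _opt = alloc st 2 in
      assert (st.sp = 6));
  (* The region's memory is free again; the next allocation reuses it. *)
  assert (st.sp = 4);
  assert (alloc st 2 = 4)
```

Reclaiming a region is a single pointer assignment, and freed space is reused immediately by the next allocation, which is the cache-friendliness argument made earlier for stack allocation.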
The Locality Mode
So, local variables are those that do not escape their region. To
formalize this constraint in a manner the compiler can check, we
introduce modes.
- By default, variables have the global mode. A global variable
has the capability to escape any region, so it always references the
heap.
- Variables with the new local mode cannot escape their enclosing
region, so they may refer to the stack.
A mode is attached to a variable upon declaration, either in a let
binding or in a function parameter. In both cases, the compiler will
check that the value does not escape its region.
let foo (local x) =
  let local y = 0 in
  x, y
;;
3 |   x, y
      ^
Error: this value escapes its region.
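The check behind this error can be mimicked with a toy mode checker over a tiny expression language. Everything here (the expr type, submode, join, mode_of, check_function) is a hypothetical illustration, not the compiler's implementation, and it rests on simplifying assumptions: a global value may be used where a local one is expected but not vice versa, a pair is local if either component is local, a let binding must be given a mode that accepts its value, and a function's result must be global because it leaves the function's region.

```ocaml
(* Toy mode checker; all names are hypothetical, not compiler APIs. *)
type locality = Global | Local

(* [submode actual expected]: a Global value may be used where Local is
   expected, but a Local value may not be used where Global is required. *)
let submode actual expected =
  match actual, expected with
  | Global, _ | Local, Local -> true
  | Local, Global -> false

(* A pair built from a Local component is itself Local: a heap pair must
   not point into a region that is about to be freed. *)
let join a b =
  match a, b with
  | Global, Global -> Global
  | _ -> Local

type expr =
  | Int of int
  | Var of string
  | Pair of expr * expr
  | Let of string * locality * expr * expr  (* let <mode> x = e1 in e2 *)

exception Escapes

let rec mode_of env = function
  | Int _ -> Global
  | Var x -> List.assoc x env
  | Pair (a, b) -> join (mode_of env a) (mode_of env b)
  | Let (x, declared, rhs, rest) ->
      (* the bound value must be usable at the declared mode *)
      if not (submode (mode_of env rhs) declared) then raise Escapes;
      mode_of ((x, declared) :: env) rest

(* A function's result leaves its region, so it must be Global
   (exclave, which relaxes this rule, is not modeled). *)
let check_function ~params body =
  match mode_of params body with
  | Global -> Ok ()
  | Local -> Error "this value escapes its region"
  | exception Escapes -> Error "this value escapes its region"

(* foo from the text: x is a local parameter, y a local let binding. *)
let () =
  let foo = Let ("y", Local, Int 0, Pair (Var "x", Var "y")) in
  match check_function ~params:[ ("x", Local) ] foo with
  | Ok () -> print_endline "accepted"
  | Error msg -> print_endline ("Error: " ^ msg)
(* prints "Error: this value escapes its region" *)
```

Changing the parameter's mode to Global in this model makes the same body check, since a pair of global components is itself global.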
A local parameter represents a promise by the callee: the function
will not store a reference to the value anywhere that could be accessed
after the function returns. Intuitively, it’s safe to pass a
stack-allocated value to a function if we know the value’s lifetime will
not be extended.
let is_empty (local str) =
  String.length str = 0
;;
val is_empty : string @ local -> bool
Here, the syntax string @ local denotes that is_empty takes its
parameter “at” the local mode.
Even without explicit mode annotations, the compiler can statically
determine which variables may escape their enclosing region. Such
variables are assigned the global mode; all others are automatically
inferred to be local. At this point, the compiler may construct values
bound to local variables using stack allocation.
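The inference described above amounts to an escape analysis. The toy version below is not the compiler's algorithm, only an illustration under stated assumptions: a tiny hypothetical expression language (the Inspect and Store constructors are invented for this sketch), unique variable names, and a backward pass that marks a variable global if its value can reach the function's result or a global store, and local otherwise.

```ocaml
type locality = Global | Local

type expr =
  | Int of int
  | Var of string
  | Pair of expr * expr
  | Let of string * expr * expr  (* let x = e1 in e2 *)
  | Inspect of expr * expr       (* examine e1 (e.g. match on it), return e2 *)
  | Store of expr                (* write e into a global reference *)

(* Toy escape analysis; assumes every let-bound name is unique.
   [escapes] says whether the value of the current expression can
   outlive the enclosing region. *)
let infer (body : expr) : (string * locality) list =
  let escaping : (string, unit) Hashtbl.t = Hashtbl.create 16 in
  let modes = ref [] in
  let rec go ~escapes = function
    | Int _ -> ()
    | Var x -> if escapes then Hashtbl.replace escaping x ()
    | Pair (a, b) ->
        (* the components of an escaping pair escape with it *)
        go ~escapes a;
        go ~escapes b
    | Let (x, rhs, rest) ->
        (* Analyse the scope first to learn whether x escapes,
           then propagate that fact into the bound expression. *)
        go ~escapes rest;
        let x_escapes = Hashtbl.mem escaping x in
        modes := (x, if x_escapes then Global else Local) :: !modes;
        go ~escapes:x_escapes rhs
    | Inspect (scrutinee, result) ->
        go ~escapes:false scrutinee;
        go ~escapes result
    | Store e -> go ~escapes:true e
  in
  (* A function's result leaves the function's region. *)
  go ~escapes:true body;
  !modes
```

For instance, is_int corresponds roughly to Let ("opt", ..., Inspect (Var "opt", ...)), for which infer reports opt as Local; storing or returning opt instead makes it Global, and the property propagates transitively through lets.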
Local Returns
Returning a local value from a function may seem contradictory,
since a function’s result has clearly escaped its region. On the other
hand, if functions can only return globals, constructing fully
stack-allocated values becomes difficult: they can only be built up
from literals. The solution:
let local_list () =
  exclave [1; 2; 3]
;;
val local_list : unit -> int list @ local
The exclave keyword ends the current region and executes the given
expression in the enclosing region. The caller receives a local
variable prohibited from escaping the caller’s region. Therefore,
it’s safe to allocate that value on the caller