I’ve seen a lot of misconceptions around what the unsafe keyword means for the utility
and validity of Rust and its marketing as a “safe systems language”. The truth is a lot
more complicated than a single pithy tweet can possibly sum up, unfortunately; here it is
as I see it.
Basically, the unsafe keyword does not turn off the advanced type system
that keeps Rust code honest. It only allows a few select “superpowers”, like dereferencing
raw pointers. It is used to implement safe abstractions over a fundamentally unsafe world
so that the majority of Rust code can use those abstractions and avoid memory unsafety.
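As a minimal sketch of one such superpower (the function name read_via_raw_pointer is made up for illustration), the snippet below creates a raw pointer in safe code and dereferences it inside an unsafe block. Everything else in the function is still type-checked and borrow-checked as usual.

```rust
/// Hypothetical helper: reads a u64 through a raw pointer.
fn read_via_raw_pointer(value: &u64) -> u64 {
    // Creating a raw pointer is allowed in safe code.
    let ptr: *const u64 = value;

    // Dereferencing it is one of the few things that requires `unsafe`.
    // The type system is still active in here: `*ptr` is a u64, and
    // assigning it to, say, a `String` would still be a compile error.
    unsafe { *ptr }
}
```

Calling read_via_raw_pointer(&42) should simply hand back 42; the pointer is valid here only because it was derived from a live reference.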
The Promise of Safety
Rust promises safety as one of its core tenets; it is, in some ways, the raison d’être
of the language. It does not, however, go about providing that safety in the traditional
way, using a runtime and a garbage collector; rather, Rust uses a very advanced type
system to keep track of which values are safe to access when, and the compiler then
statically analyzes each Rust program to ensure that certain invariants are upheld.
Safety in Python
Let’s take, as an example, the Python programming language. Pure Python code cannot
corrupt its memory. List accesses have bounds checking, references returned by functions
are reference counted to prevent dangling pointers, and there’s no way to perform
arbitrary pointer arithmetic.
This has two consequences: first, a lot of types have to be “special”. For example, it’s
not possible to implement an efficient Python list or dict in pure Python; instead, the
CPython interpreter implements lists and dicts internally. Second, access to external
(non-Python-managed) functions, called the “foreign function interface”, requires the use
of the special ctypes module and breaks the language’s safety guarantees.
In a certain sense, this means that everything written in Python is memory unsafe.
Safety in Rust
Rust also provides safety, but instead of implementing unsafe structures in C, it provides
a so-called “escape hatch”: the unsafe keyword.
This means that the foundational data structures in Rust like Vec, VecDeque, BTreeMap,
and String are all implemented using Rust.
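To give a feel for what “a safe abstraction over unsafe code” looks like, here is a small sketch in the same spirit. The pair_mut function is hypothetical and not part of the standard library; it hands out two mutable references into one slice, which the borrow checker cannot verify on its own, and uses explicit checks to make the unsafe part sound.

```rust
/// Hypothetical helper: mutable references to two *distinct* elements.
/// Callers never need `unsafe`; the checks below uphold the invariants.
fn pair_mut<T>(items: &mut [T], i: usize, j: usize) -> Option<(&mut T, &mut T)> {
    // Reject aliasing (i == j) and out-of-bounds indices up front.
    if i == j || i >= items.len() || j >= items.len() {
        return None;
    }
    let base = items.as_mut_ptr();
    // SAFETY: both indices are in bounds and different, so the two
    // references point to separate elements and never alias.
    unsafe { Some((&mut *base.add(i), &mut *base.add(j))) }
}
```

The unsafe block is tiny and its justification sits right next to it; code that calls pair_mut stays entirely in safe Rust.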
“But Nora,” I hear you asking, “if Rust provides an escape hatch from its guarantees, and
the standard library is implemented using that escape hatch, isn’t everything written in
Rust unsafe?”
In a word, dear reader, yes – in exactly the same way that everything in Python is.
Let’s dig into that.
What Is Prohibited in Safe Rust?
Safety, in Rust, is very well-defined; we think about it a lot. In essence, safe Rust
programs cannot:
- Dereference a pointer that does not point to the type the compiler thinks it points to.
This means no null pointers (because they point to nothing), no memory-out-of-bounds
and/or segmentation faults, and no buffer overflows, but it also means no use-after-free
or double-free (because freeing memory counts as dereferencing a pointer), and no type
punning.
- Cause there to be either multiple mutable references or both mutable and immutable
references to the same data at the same time. That is, if you have a mutable reference
to some data, only you have that reference, and if you have an immutable reference to
that data, it will not change while you hold that reference. This means it is impossible
to cause a data race in safe Rust, which is a guarantee most other safe languages do not
provide.
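A short sketch of the second rule in action (append_sum_of_ends is an invented name): several shared borrows may coexist, and an exclusive borrow is only accepted once they are no longer in use.

```rust
/// Hypothetical helper: pushes first + last onto the vector.
fn append_sum_of_ends(scores: &mut Vec<u32>) -> Option<u32> {
    // Two shared (immutable) borrows of the same data at once: fine.
    let first = scores.first()?;
    let last = scores.last()?;
    let sum = *first + *last;

    // `first` and `last` are not used past this point, so the compiler
    // allows an exclusive (mutable) borrow for the push.
    scores.push(sum);

    // Reading `first` down here instead would be rejected at compile
    // time, because push may reallocate and leave it dangling.
    Some(sum)
}
```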
Rust encodes this information in the type system, either through the use of algebraic
data types like Option to encode presence/absence and Result to encode failure/success,
or references and lifetimes like &T vs &mut T to encode the difference between shared
(immutable) and exclusive (mutable) references and &'a T versus &'b T to denote
references that are valid in different contexts/frames. (These are usually elided; that
is, the compiler is generally smart enough to figure them out.)
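The following sketch shows each of these encodings in a signature. All three function names are invented for illustration; only the standard library calls inside them are real.

```rust
/// Option encodes presence/absence: there may be no even number at all.
fn first_even(values: &[u64]) -> Option<&u64> {
    values.iter().find(|v| **v % 2 == 0)
}

/// Result encodes success/failure: the text may not be a valid u16.
fn parse_port(text: &str) -> Result<u16, std::num::ParseIntError> {
    text.trim().parse::<u16>()
}

/// Explicit lifetimes: the returned slice borrows from `haystack` ('a)
/// only, so `needle` ('b) may go away while the result is still in use.
fn find_prefix<'a, 'b>(haystack: &'a str, needle: &'b str) -> Option<&'a str> {
    if haystack.starts_with(needle) {
        Some(&haystack[..needle.len()])
    } else {
        None
    }
}
```

In first_even the lifetime is elided; written out, it would read fn first_even<'a>(values: &'a [u64]) -> Option<&'a u64>.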
Examples
For example, the following code does not compile because it would cause a dangling
reference; specifically, my_struct does not live long enough. In other words, the
function would return a reference to something that no longer exists, and therefore the
compiler won’t (and really, doesn’t know how to) compile it.
fn dangling_reference(v: &u64) -> &MyStruct {
    // Create a new value of type MyStruct with the value field set to v,
    // the function's one parameter.
    let my_struct = MyStruct { value: v };
    // Return a reference to the local variable my_struct.
    return &my_struct;
    // my_struct is deallocated (popped off the stack, really).
}
This code does the same thing, but tries to get around the problem by placing the value
on the heap. (Box is Rust’s name for a basic smart pointer with no wacky behavior.)
fn dangling_heap_reference(v: &u64) -> &Box<MyStruct> {
    let my_struct = MyStruct { value: v };
    // Put the struct in a Box, allocating space for it on the heap and moving it there.
    let my_box = Box::new(my_struct);
    // Return a reference to the local variable my_box.