Some mistakes Rust doesn’t catch by jacobwg

I still get excited about programming languages. But these days, it’s not so
much because of what they let me do, but rather what they don’t let me do.

Ultimately, what you can do with a programming language is seldom limited by the
language itself: there’s nothing you can do in C++ that you can’t do in C, given
infinite time.

As long as a language is Turing-complete and compiles down to assembly, no
matter the interface, it’s the same machine you’re talking to. You’re limited
by… what your hardware can do, how much memory it has (and how fast it is),
what kind of peripherals are plugged into it, and so on.

There are, of course, differences in expressiveness: some tasks might require
more or less code in different languages. The Java language is, or at least was,
infamous for being verbose, but other upsides made it an attractive choice for
many companies, even today.

And then there’s performance, debuggability (which, if it isn’t a word,
definitely should be one), and a dozen other factors you might want to take
under advisement when “picking a language”.

The size of the forest

But consider this: of the complete set of combinations of all possible
instructions, only a tiny fraction are actually useful programs. A much tinier
fraction still, actually achieve the task you’ve set out to do.

So one could view “programming” as searching for the right program within that
set. And one could view the virtue of “stricter” languages in reducing the size
of the set you’re searching in, because there’s fewer “legal” combinations.

With that in mind, one might be tempted to rank languages by “how many programs
are legal”. I don’t expect everyone to achieve consensus on a single ranking,
but some divisions are well-accepted.

Consider the following JavaScript program:

JavaScript code
function foo(i) {
  console.log("foo", i);
}

function bar() {
  console.log("bar!");
}

function main() {
  for (i = 0; i < 3; i++) {
    foo(i);
  }
  return;
  bar();
}

main();

In this code, bar() is never actually invoked – main returns before it would
be.

Running it under node.js yields no warnings whatsoever:

Shell session

$ node sample.js
foo 0
foo 1
foo 2

The same sample, as Go, also doesn’t yield any warnings:

Go code
package main

import "log"

func foo(i int) {
  log.Printf("foo %d", i)
}

func bar() {
  log.Printf("bar!")
}

func main() {
  for i := 0; i < 3; i++ {
    foo(i)
  }
  return
  bar()
}

Shell session

$ go build ./sample.go
$ ./sample
2022/02/06 17:35:55 foo 0
2022/02/06 17:35:55 foo 1
2022/02/06 17:35:55 foo 2

However, the go vet tool (which ships with the default Go distribution),
bats an eyelash:

Shell session

$ go vet ./sample.go
# command-line-arguments
./sample.go:18:2: unreachable code

Because even though our code is not technically incorrect, it’s… suspicious.
It looks a lot like incorrect code. So the linter gently asks “hey, did you
really mean that? if you did, all good, just maybe comment it out. if you
didn’t, now’s your chance to fix it”.

The same code, but in Rust, makes for a much noisier experience still:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

fn bar() {
    println!("bar!");
}

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    return;
    bar()
}

Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
warning: unreachable expression
  --> src/main.rs:14:5
   |
13 |     return;
   |     ------ any code following this expression is unreachable
14 |     bar()
   |     ^^^^^ unreachable expression
   |
   = note: `#[warn(unreachable_code)]` on by default

warning: `lox` (bin "lox") generated 1 warning
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/lox`
foo 0
foo 1
foo 2

I love that it doesn’t just show what code is unreachable, but why that code
is unreachable.

Note that this is still a warning – just something we should look at when we get
a chance, but not a showstopper. (Unless we slap #![deny(unreachable_code)] at
the start of our main.rs, the equivalent of passing -Werror=something to
gcc/clang).
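
The lint level can also be set per item rather than for the whole crate. Here is a minimal sketch of both directions; the function names `checked` and `early_exit` are made up for illustration:

```rust
// `deny` turns the lint into a hard error for this item only. There is no
// unreachable code inside, so the function compiles.
#[deny(unreachable_code)]
fn checked(i: usize) -> usize {
    i + 1
}

// `allow` silences the lint where the dead code is deliberate.
#[allow(unreachable_code)]
fn early_exit() -> &'static str {
    return "returned early";
    "never reached"
}

fn main() {
    println!("{}", checked(1));
    println!("{}", early_exit());
}
```

Moving the `return` pattern into `checked` would make the build fail instead of warn.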

Fuck around now, find out… when?

Let’s change our sample a little bit. Say we remove the definition of bar
entirely.

After all, it’s never called – what harm could it do?

JavaScript code
function foo(i) {
  console.log("foo", i);
}

function main() {
  for (i = 0; i < 3; i++) {
    foo(i);
  }
  return;
  bar();
}

main();

Shell session

$ node sample.js
foo 0
foo 1
foo 2

10/10 node.js implementations agree: nobody cares about bar, because it’s
never actually called.

Go, however, is really cross about bar’s departure:

Go code
package main

import "log"

func foo(i int) {
  log.Printf("foo %d", i)
}

func main() {
  for i := 0; i < 3; i++ {
    foo(i)
  }
  return
  bar()
}

Shell session

$ go run ./sample.go
# command-line-arguments
./sample.go:14:2: undefined: bar

…and terse as ever.

The Rust compiler is also heartbroken:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    return;
    bar()
}

Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
error[E0425]: cannot find function `bar` in this scope
  --> src/main.rs:10:5
   |
10 |     bar()
   |     ^^^ not found in this scope

warning: unreachable expression
  --> src/main.rs:10:5
   |
9  |     return;
   |     ------ any code following this expression is unreachable
10 |     bar()
   |     ^^^^^ unreachable expression
   |
   = note: `#[warn(unreachable_code)]` on by default

For more information about this error, try `rustc --explain E0425`.
warning: `lox` (bin "lox") generated 1 warning
error: could not compile `lox` due to previous error; 1 warning emitted

…and still insistent that, were bar to exist (which it currently doesn’t),
it would still never get called, and we still ought to… rethink our
position.

So, both Go and Rust reject these programs as illegal (they issue an error and
refuse to emit a compiled form of the program), even though, if I’m to be
entirely fair, it’s a perfectly fine program.

But there’s a perfectly reasonable, practical explanation for this.

node.js is, in essence, an interpreter. It does ship with a just-in-time
compiler (several, in fact), but that is an implementation detail. We can
imagine that execution is performed “on the fly”, as new expressions and
statements are encountered, and be reasonably close to the truth.

So, node.js needn’t concern itself with the existence of a bar symbol until
the very moment it’s called (or accessed, or assigned to, etc.)

At which point, it will error out. At runtime, during the execution of our
program.

JavaScript code
function foo(i) {
  console.log("foo", i);
}

function main() {
  for (i = 0; i < 3; i++) {
    foo(i);
  }
  // 👇 (there used to be a 'return' here)
  bar();
}

main();

Shell session
$ node sample.js
foo 0
foo 1
foo 2
/home/amos/bearcove/lox/sample.js:10
  bar();
  ^

ReferenceError: bar is not defined
    at main (/home/amos/bearcove/lox/sample.js:10:3)
    at Object.<anonymous> (/home/amos/bearcove/lox/sample.js:13:1)
    at Module._compile (node:internal/modules/cjs/loader:1101:14)
    at Object.Module._extensions..js (node:internal/modules/cjs/loader:1153:10)
    at Module.load (node:internal/modules/cjs/loader:981:32)
    at Function.Module._load (node:internal/modules/cjs/loader:822:12)
    at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
    at node:internal/main/run_main_module:17:47

However, both the Go and Rust compilers, through different machinery, eventually
generate some native executable that is full of machine code, and relatively
self-contained.

And thus, they must know what code to emit for the whole main function.
Including the address of bar, which, although it is in an unreachable portion
of the code, we still wrote a “call” instruction for in our source code.

If we wanted to reproduce roughly what’s happening in node.js, we’d need to use
a function pointer instead, which could be null, or point to a valid function:
and we’d only find out when we actually call it.

This Go code compiles:

Go code
package main

import "log"

func foo(i int) {
  log.Printf("foo %d", i)
}

type Bar func()

var bar Bar

func main() {
  for i := 0; i < 3; i++ {
    foo(i)
  }
  bar()
}

Shell session

$ go build ./sample.go

But panics during execution:

Shell session
$ ./sample
2022/02/06 18:08:06 foo 0
2022/02/06 18:08:06 foo 1
2022/02/06 18:08:06 foo 2
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x48756e]

goroutine 1 [running]:
main.main()
        /home/amos/bearcove/lox/sample.go:17 +0x6e

However, it stops panicking if we actually initialize bar to some valid
implementation:

Go code
package main

import "log"

func foo(i int) {
  log.Printf("foo %d", i)
}

type Bar func()

var bar Bar

// 👇 we initialize bar in an `init` function, called implicitly at startup
func init() {
  bar = func() {
    log.Printf("bar!")
  }
}

func main() {
  for i := 0; i < 3; i++ {
    foo(i)
  }

  bar()
}

We can do the same little charade in Rust:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

fn bar_impl() {
    println!("bar!");
}

static BAR: fn() = bar_impl;

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    BAR()
}

Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
    Finished dev [unoptimized + debuginfo] target(s) in 0.14s
     Running `target/debug/lox`
foo 0
foo 1
foo 2
bar!

Although, reproducing the crash is harder. Because we can’t just declare a
function pointer that points to nothing.

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

// 👇
static BAR: fn();

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    BAR()
}

Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
error: free static item without body
 --> src/main.rs:5:1
  |
5 | static BAR: fn();
  | ^^^^^^^^^^^^^^^^-
  |                 |
  |                 help: provide a definition for the static: `= <expr>;`

error: could not compile `lox` due to previous error

If we want to account for the possibility of bar being there or not there, we
must change its type to Option<fn()> instead:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

//            👇
static BAR: Option<fn()>;

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    BAR()
}

And we still must assign it something.

Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
error: free static item without body
 --> src/main.rs:5:1
  |
5 | static BAR: Option<fn()>;
  | ^^^^^^^^^^^^^^^^^^^^^^^^-
  |                         |
  |                         help: provide a definition for the static: `= <expr>;`

(other errors omitted)

In this case, we’ll assign None because I’m trying to showcase what would
happen if bar did not exist:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

static BAR: Option<fn()> = None;

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    BAR()
}

But now we have an error at the call site:

Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
error[E0618]: expected function, found enum variant `BAR`
  --> src/main.rs:11:5
   |
5  | static BAR: Option<fn()> = None;
   | -------------------------------- `BAR` defined here
...
11 |     BAR()
   |     ^^^--
   |     |
   |     call expression requires function
   |
help: `BAR` is a unit variant, you need to write it without the parentheses
   |
11 -     BAR()
11 +     BAR
   | 

For more information about this error, try `rustc --explain E0618`.
error: could not compile `lox` due to previous error

Because now, BAR is not a function that can be called: it’s an Option<fn()>,
which could be either Some(f) (where f is a function we can call),
or None (indicating the absence of a function we can call).

So, Rust forces us to account for both cases, which we can do with a match for
example:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

static BAR: Option<fn()> = None;

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    match BAR {
        Some(f) => f(),
        None => println!("(no bar implementation found)"),
    }
}

Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
    Finished dev [unoptimized + debuginfo] target(s) in 0.24s
     Running `target/debug/lox`
foo 0
foo 1
foo 2
(no bar implementation found)

And, with BAR set to the Some variant:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

static BAR: Option<fn()> = Some({
    // we could define this outside the option, but we don't have to!
    // this is just showing off, but I couldn't resist, because it's fun.
    fn bar_impl() {
        println!("bar!");
    }
    // the last expression of a block (`{}`) is what the block evaluates to
    bar_impl
});

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    match BAR {
        Some(f) => f(),
        None => println!("(no bar implementation found)"),
    }
}

Shell session
$ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.00s
     Running `target/debug/lox`
foo 0
foo 1
foo 2
bar!
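
A match is not the only way to cover both cases. As a rough sketch, here are two other spellings of the same check. The names `Bar`, `PRESENT`, `ABSENT`, `call_with_if_let` and `call_with_map_or` are invented for this example, and the functions return a string instead of printing so the result is easy to inspect:

```rust
fn bar_impl() -> &'static str {
    "bar!"
}

// Hypothetical alias: a function pointer that returns its message.
type Bar = fn() -> &'static str;

static PRESENT: Option<Bar> = Some(bar_impl);
static ABSENT: Option<Bar> = None;

// `if let` handles the `Some` case and falls through to `else` for `None`.
fn call_with_if_let(bar: Option<Bar>) -> &'static str {
    if let Some(f) = bar {
        f()
    } else {
        "(no bar implementation found)"
    }
}

// `map_or` takes the fallback value first, then the closure for `Some`.
fn call_with_map_or(bar: Option<Bar>) -> &'static str {
    bar.map_or("(no bar implementation found)", |f| f())
}

fn main() {
    println!("{}", call_with_if_let(PRESENT));
    println!("{}", call_with_map_or(ABSENT));
}
```

Either way, there is no path where the function gets called without the `None` case having been spelled out.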

So, if we compare the stance of all three languages here:

JavaScript (via node.js here) plain doesn’t care if bar() exists until
you actually call it.

Go cares if it’s a regular function call, but it will let you build a
function pointer that points to nowhere, and panic at runtime.

Rust will not let you build a function pointer that
points to nowhere at all.

JavaScript’s looseness is not an oversight here: the mechanism which it uses to
look up symbols is completely different from Go and Rust. Even though there’s
no mention of bar anywhere in our code, it might still exist, as evidenced
by this ~crime~ sample code:

JavaScript code
function foo(i) {
  console.log("foo", i);
}

eval(
  `mruhgr4hgx&C&./&CD&iutyurk4rum.(hgx'(/A`
    .split("")
    .map((c) => String.fromCharCode(c.charCodeAt(0) - 6))
    .join(""),
);

function main() {
  for (i = 0; i < 3; i++) {
    foo(i);
  }
  bar();
}

main();

Shell session

$ node sample.js
foo 0
foo 1
foo 2
bar!

As for Rust, I should specify: safe Rust doesn’t let you do that.

If we let ourselves write unsafe code, an essential part of Rust, without
which one could not build safe abstractions atop the standard C library, or
system calls, for example, we can achieve crashdom:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

// initialize BAR with some garbage
static BAR: fn() = unsafe { std::mem::transmute(&()) };

fn main() {
    for i in 0..=2 {
        foo(i)
    }
    BAR();
}

Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
error[E0080]: it is undefined behavior to use this value
 --> src/main.rs:5:1
  |
5 | static BAR: fn() = unsafe { std::mem::transmute(&()) };
  | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ type validation failed: encountered pointer to alloc4, but expected a function pointer
  |
  = note: The rules on what exactly is undefined behavior aren't clear, so this check might be overzealous. Please open an issue on the rustc repository if you believe it should not be considered undefined behavior.
  = note: the raw bytes of the constant (size: 8, align: 8) {
              ╾───────alloc4────────╼                         │ ╾──────╼
          }

For more information about this error, try `rustc --explain E0080`.
error: could not compile `lox` due to previous error

Mh, nope, it caught that one. Huh.

Fine, let’s do this then:

Rust code
fn foo(i: usize) {
    println!("foo {}", i);
}

const BAR: *const () = std::ptr::null();

fn main() {
    for i in 0..=2 {
        foo(i)
    }

    let bar: fn() = unsafe { std::mem::transmute(BAR) };
    bar();
}

Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
    Finished dev [unoptimized + debuginfo] target(s) in 0.13s
     Running `target/debug/lox`
foo 0
foo 1
foo 2
zsh: segmentation fault (core dumped)  cargo run

There. We had to go out of our way to ask for it, but we got it.
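
As a side note, the safe spelling of "maybe a function" should not cost anything extra: the standard library documents that Option<fn()> has the same size as fn(), with None taking the place of the null pointer. A small sketch to check that on your own machine (the helper name `nullable_fn_pointer_sizes` is made up):

```rust
use std::mem::size_of;

// Returns the size of a plain function pointer and of its optional form.
fn nullable_fn_pointer_sizes() -> (usize, usize) {
    (size_of::<fn()>(), size_of::<Option<fn()>>())
}

fn main() {
    let (plain, optional) = nullable_fn_pointer_sizes();
    println!("fn(): {} bytes, Option<fn()>: {} bytes", plain, optional);
}
```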

So, of those three languages, it wouldn’t be unreasonable to say that JavaScript
is the loosest one (letting us add things to the global scope at runtime,
evaluate arbitrary strings, etc.), Rust is the strictest one (not letting us
create a dangling function pointer in safe Rust), and Go is somewhere in the
middle.

Also, types

Similarly, we can see a clear distinction in how those three languages treat
types.

It is extremely easy (too easy perhaps) to make a JavaScript function that can
“add” two arbitrary things. Because function parameters don’t have types.

So, an add function will just as happily add together two numbers, as it will
concatenate two strings:

JavaScript code
function add(a, b) {
  return a + b;
}

function main() {
  console.log(add(1, 2));
  console.log(add("foo", "bar"));
}

main();

Shell session

$ node sample.js
3
foobar

In Go, it’s not that easy, because we have to pick a type.

We can do numbers:

Go code
package main

import "log"

func add(a int, b int) int {
  return a + b
}

func main() {
  log.Printf("%v", add(1, 2))
}

Shell session

$ go run ./sample.go
2022/02/06 19:01:55 3

And we can do strings:

Go code
package main

import "log"

func add(a string, b string) string {
  return a + b
}

func main() {
  log.Printf("%v", add("foo", "bar"))
}

Shell session

$ go run ./sample.go
2022/02/06 19:02:25 foobar

But we can’t do both.

Or can we?

Go code
package main

import "log"

func add(a interface{}, b interface{}) interface{} {
  if a, ok := a.(int); ok {
    if b, ok := b.(int); ok {
      return a + b
    }
  }

  if a, ok := a.(string); ok {
    if b, ok := b.(string); ok {
      return a + b
    }
  }

  panic("incompatible types")
}

func main() {
  log.Printf("%v", add(1, 2))
  log.Printf("%v", add("foo", "bar"))
}

Shell session

$ go run ./sample.go
2022/02/06 19:05:11 3
2022/02/06 19:05:11 foobar

It’s… not very good, though. add(1, "foo") will compile, but panic at
runtime, for example.

Luckily, Go 1.18 beta added generics, so maybe?

Go code
package main

import "log"

func add[T int64 | string](a T, b T) T {
  return a + b
}

func main() {
  log.Printf("%v", add(1, 2))
  log.Printf("%v", add("foo", "bar"))
}

Shell session

$ go run ./main.go
./main.go:10:22: int does not implement int64|string

Ah. Let’s see what the type parameters proposal suggests… oh. Okay.

Go code
package main

import "log"

type Addable interface {
  ~int | ~int8 | ~int16 | ~int32 | ~int64 |
    ~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr |
    ~float32 | ~float64 | ~complex64 | ~complex128 |
    ~string
}

func add[T Addable](a T, b T) T {
  return a + b
}

func main() {
  log.Printf("%v", add(1, 2))
  log.Printf("%v", add("foo", "bar"))
}

Shell session

$ go run ./main.go
2022/02/06 19:12:11 3
2022/02/06 19:12:11 foobar

Sure, that… that works. But I mean, we’re not expressing a property of a type,
so much as listing all the types we can think of. I guess nobody will ever want
to implement the + operator for a user type. Or add int128 / uint128 to
the language.

Ah well.

As for contestant number 3, well… surely it’s going to do great right? After
all, these articles are just thinly-veiled Rust propaganda, so surely it’ll…

Rust code
use std::ops::Add;

fn add<T>(a: T, b: T) -> T::Output
where
    T: Add<T>,
{
    a + b
}

fn main() {
    dbg!(add(1, 2));
    dbg!(add("foo", "bar"));
}

Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
error[E0277]: cannot add `&str` to `&str`
  --> src/main.rs:12:10
   |
12 |     dbg!(add("foo", "bar"));
   |          ^^^ no implementation for `&str + &str`
   |
   = help: the trait `Add` is not implemented for `&str`
note: required by a bound in `add`
  --> src/main.rs:5:8
   |
3  | fn add<T>(a: T, b: T) -> T::Output
   |    --- required by a bound in this
4  | where
5  |     T: Add<T>,
   |        ^^^^^^ required by this bound in `add`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `lox` due to previous error

Huh.

I mean, that’s good if you’re into that sort of thing.

I am, so I like it: first of all, we’ve asked for “any type that we can add”,
not just listed a bunch of concrete types. We’ve also allowed for T + T to
return a type that isn’t T! The function’s return type is <T as Add<T>>::Output, which could be anything.
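
To make both points concrete, here is a small sketch. `Meters` is a hypothetical user type, not something from the standard library. The second call relies on the standard library’s Add impl for &i32, whose Output is a plain i32:

```rust
use std::ops::Add;

fn add<T>(a: T, b: T) -> T::Output
where
    T: Add<T>,
{
    a + b
}

// A hypothetical user type: implementing `Add` is all it takes to qualify.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(u32);

impl Add for Meters {
    type Output = Meters;

    fn add(self, rhs: Meters) -> Meters {
        Meters(self.0 + rhs.0)
    }
}

fn main() {
    let total = add(Meters(2), Meters(3));
    println!("{:?}", total);

    // Here `T` is `&i32`, yet the result is an owned `i32`.
    let (a, b) = (1_i32, 2_i32);
    let sum: i32 = add(&a, &b);
    println!("{}", sum);
}
```

No list of accepted types had to be amended for `Meters` to work.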

Second of all, the diagnostic here is fantastic: it tells us what we asked,
why it couldn’t be satisfied… I like it.

It doesn’t really give the rationale behind Add not being implemented for
&str and &str, so I still serve a purpose. &str is just a string slice:
it refers to some data that exists elsewhere, and doesn’t carry ownership of
the data itself.

In our example, the data is in the executable itself:

Rust code
fn add<T>(_: T, _: T) -> T {
    todo!();
}

fn main() {
    dbg!(add(1, 2));
    dbg!(add("foo", "bar"));
}

Shell session

$ cargo build --quiet
$ objdump -s -j .rodata ./target/debug/lox | grep -B 3 -A 3 -E 'foo|bar'
 3c0d0 03000000 00000000 02000000 00000000  ................
 3c0e0 00000000 00000000 02000000 00000000  ................
 3c0f0 00000000 00000000 20000000 04000000  ........ .......
                                                    👇
 3c100 03000000 00000000 62617266 6f6f6164  ........barfooad
 3c110 64282266 6f6f222c 20226261 7222296e  d("foo", "bar")n
 3c120 6f742079 65742069 6d706c65 6d656e74  ot yet implement
 3c130 65647372 632f6d61 696e2e72 73000000  edsrc/main.rs...
 3c140 01000000 00000000 00000000 00000000  ................

…so it’s valid for the whole time the program is executed: the expression
"foo" is a &'static str.

But to join “foo” and “bar” together, we’d have to allocate some memory. One
fairly natural way to do this would be to create a String, which would
allocate memory on the heap. And String implements Deref<Target = str>, so
anything we can do with a &str, we can also do with a String.
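
A quick sketch of what that looks like in practice. The `shout` helper is made up for the example and only knows about &str:

```rust
// Takes a string slice; knows nothing about `String`.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let owned: String = String::from("foo");

    // `&String` is coerced to `&str` through `Deref`.
    println!("{}", shout(&owned));

    // `str` methods are reachable directly on a `String` as well.
    println!("{}", owned.starts_with("fo"));
}
```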

So, long story short, you can’t do &str + &str. You can, however, do String + &str. If we look at the docs,
we find the rationale:

impl<'_> Add<&'_ str> for String

Implements the + operator for concatenating two strings.

This consumes the String on the left-hand side and re-uses its buffer (growing
it if necessary). This is done to avoid allocating a new String and copying
the entire contents on every operation, which would lead to O(n^2) running
time when building an n-byte string by repeated concatenation.

The string on the right-hand side is only borrowed; its contents are copied into
the returned String.

So, if we convert our parameters to String with .to_string():

Rust code
use std::ops::Add;

fn add<T>(a: T, b: T) -> T::Output
where
    T: Add<T>,
{
    a + b
}

fn main() {
    dbg!(add(1, 2));
    dbg!(add("foo".to_string(), "bar".to_string()));
}

Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
error[E0277]: cannot add `String` to `String`
  --> src/main.rs:12:10
   |
12 |     dbg!(add("foo".to_string(), "bar".to_string()));
   |          ^^^ no implementation for `String + String`
   |
   = help: the trait `Add` is not implemented for `String`
note: required by a bound in `add`
  --> src/main.rs:5:8
   |
3  | fn add<T>(a: T, b: T) -> T::Output
   |    --- required by a bound in this
4  | where
5  |     T: Add<T>,
   |        ^^^^^^ required by this bound in `add`

error[E0277]: cannot add `String` to `String`
  --> src/main.rs:12:10
   |
12 |     dbg!(add("foo".to_string(), "bar".to_string()));
   |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no implementation for `String + String`
   |
   = help: the trait `Add` is not implemented for `String`

For more information about this error, try `rustc --explain E0277`.
error: could not compile `lox` due to 2 previous errors

…it doesn’t work either.

Because there’s no impl Add<String> for String.

Only impl Add<&str, Output = String> for String. We don’t need ownership of
the right-hand-side operand for +: we’re merely reading it and copying it
immediately following the contents of the left-hand-side operand.
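
To get a feel for that shape, here is a rough sketch of the same kind of impl on a hypothetical wrapper type, `Text`. This is not the standard library’s actual source, just the same idea: take the left side by value, borrow the right side, reuse the buffer.

```rust
use std::ops::Add;

// A hypothetical wrapper standing in for `String`.
#[derive(Debug, PartialEq)]
struct Text(String);

impl<'a> Add<&'a str> for Text {
    type Output = Text;

    // `self` is taken by value: its buffer is reused rather than copied.
    fn add(mut self, rhs: &'a str) -> Text {
        // Appends in place; the buffer only grows if it has to.
        self.0.push_str(rhs);
        self
    }
}

fn main() {
    let joined = Text(String::from("foo")) + "bar";
    println!("{:?}", joined);
}
```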

So, we can make our code work, if we accept arguments of two different types:

Rust code
use std::ops::Add;

fn add<A, B>(a: A, b: B) -> A::Output
where
    A: Add<B>,
{
    a + b
}

fn main() {
    dbg!(add(1, 2));
    dbg!(add("foo".to_string(), "bar"));
}

Shell session
$ cargo run
   Compiling lox v0.1.0 (/home/amos/bearcove/lox)
    Finished dev [unoptimized + debuginfo] target(s) in 0.21s
     Running `target/debug/lox`
[src/main.rs:11] add(1, 2) = 3
[src/main.rs:12] add("foo".to_string(), "bar") = "foobar"

So, how did Rust fare here? Depends how you feel about things.

It’s a pretty radical design choice, to force you to be aware that, yeah, since
you’re creating a new value (out of the two parameters), you will have to
allocate. And so it forces you to allocate outside of the Add operation itself.

Rust code
fn main() {
    // I know `to_string` allocates, it's not hidden behind `+`.
    // the `+` may reallocate (to grow the `String`).
    let foobar = "foo".to_string() + "bar";
    dbg!(&foobar);
}

Rust code
fn main() {
    let foo: String = "foo".into();
    let bar: String = "bar".into();

    // 🛑 this doesn't build:
    // the right-hand-side cannot be a `String`, it has to be a string slice,
    // e.g. `&str`
    let foobar = foo + bar;
    dbg!(&foobar);
}

Rust code
fn main() {
    let foo: String = "foo".into();
    let bar: String = "bar".into();

    // this builds fine!
    let foobar = foo + &bar;
    dbg!(&foobar);
}

Rust code
fn main() {
    let foo: String = "foo".into();
    let bar: String = "bar".into();

    let foobar = foo + &bar;
    dbg!(&foobar);

    // 🛑 this doesn't build!
    // `foo` was moved during the first addition (it was reused to store the
    // result of concatenating the two strings)
    let foobar = foo + &bar;
    dbg!(&foobar);
}

Rust code
fn main() {
    let foo: String = "foo".into();
    let bar: String = "bar".into();

    let foobar = foo.clone() + &bar;
    dbg!(&foobar);

    // this builds fine! we've cloned foo in the previous addition, which
    // allocates. again, nothing is hidden in the implementation of `+`.
    let foobar = foo + &bar;
    dbg!(&foobar);
}
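
If consuming the left-hand side is not what you want, there are other ways to join strings that leave both inputs usable. A small sketch with three of them; the helper names are invented, while `format!`, `concat` and `push_str` are the standard ones:

```rust
// Builds a brand new `String`; both inputs are only borrowed.
fn join_with_format(a: &str, b: &str) -> String {
    format!("{}{}", a, b)
}

// Same result, using slice concatenation.
fn join_with_concat(a: &str, b: &str) -> String {
    [a, b].concat()
}

// Appends to a buffer the caller already owns.
fn append_in_place(buf: &mut String, b: &str) {
    buf.push_str(b);
}

fn main() {
    let foo = String::from("foo");
    let bar = String::from("bar");

    println!("{}", join_with_format(&foo, &bar));
    println!("{}", join_with_concat(&foo, &bar));

    let mut buf = foo.clone();
    append_in_place(&mut buf, &bar);
    println!("{}", buf);

    // Neither `foo` nor `bar` was moved.
    println!("{} {}", foo, bar);
}
```

The allocations are still all visible at the call site: `format!`, `concat` and `clone` each create a new String.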

That aspect of Rust is a turn-off to some folks. Even to folks who otherwise
love Rust for many other reasons. It’s often been said that there’s a
higher-level language (where you don’t worry about allocating so much) inside of
Rust, waiting to be discovered.

Well, we’re still waiting.

However, that very aspect is also what makes Rust so appealing for systems
programming. And its razor-sharp focus on ownership, lifetimes etc. is also the
underpinning of many of its safety guarantees.

Losing the thread

So, now that we’ve looked at unreachable code / undefined symbols, and types,
let’s talk about concurrency!

Code is said to be “concurrent” when several tasks can make progress at the same
time. And there’s multiple mechanisms through which this can be achieved.

JavaScript definitely lets us write concurrent code:

JavaScript code
function sleep(ms) {
  return new Promise((resolve, _reject) => setTimeout(resolve, ms));
}

async function doWork(name) {
  for (let i = 0; i < 30; i++) {
    await sleep(Math.random() * 40);
    process.stdout.write(name);
  }
}

Promise.all(["a", "b"].map(doWork)).then(() => {
  process.stdout.write("\n");
});

And it runs fine in node.js:

Shell session

$ node sample.js
abbaabbababaababbababbaabaabaababaabbabababaaababbbaababbabb

We can see here that both task “a” and task “b” are making progress,
concurrently. Not in parallel: they never actually make progress at the same
time, they just do little bits of progress one after the other, and, to the
outside observer, there’s hardly a difference.

That means, for example, that you can definitely use node.js to write server
applications, that are serving a large number of concurrent requests.

Because you don’t strictly need request handlers to run in parallel, you just
need them to process input as it comes in: oh, a client is trying to connect,
let’s accept their connection! They sent a client hello, let’s send a server
hello so we can complete a TLS handshake.

Now the request’s coming in, there’s one header, two, three, etc. – this can all
be done piecemeal. And then we can stream a body back to them, one spoonful at a
time, where spoons are actually buffers.

node.js actually does offer threads, but you wouldn’t use them to handle HTTP
requests concurrently – rather, you’d use them to do cpu-intensive tasks in the
background, not i/o-bound stuff.

If we turn our attention to Go, we can make a similar program fairly easily:

Go code
package main

import (
  "fmt"
  "math/rand"
  "sync"
  "time"
)

func doWork(name string) {
  for i := 0; i < 30; i++ {
    time.Sleep(time.Duration(rand.Intn(40)) * time.Millisecond)
    fmt.Printf("%v", name)
  }
}

func main() {
  var wg sync.WaitGroup

  for name := range []string{"a", "b"} {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(name)
    }()
  }

  wg.Wait()
  fmt.Printf("\n")
}

Shell session

$ go run ./sample.go
# command-line-arguments
./sample.go:24:10: cannot use name (type int) as type string in argument to doWork

Haha whoops, the “for range” syntax actually yields two values, and the first
one is the index, so we have to ignore it by binding it to _.

Let’s try this again:

Go code
// omitted: package, imports, func doWork

func main() {
  var wg sync.WaitGroup

  for _, name := range []string{"a", "b"} {
    wg.Add(1)
    go func() {
      defer wg.Done()
      doWork(name)
    }()
  }

  wg.Wait()
  fmt.Printf("\n")
}

Shell session

$ go run ./sample.go bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb

Oh dear, there was another mix up. Yet there were no compiler warnings,
strange…

Let’s try go vet?

Shell session
$ go vet ./sample.go
# command-line-arguments
./sample.go:24:11: loop variable name captured by func literal

Haha right! Closures do work like that in Go.

Go code
func main() {
  var wg sync.WaitGroup

  for _, name := range []string{"a", "b"} {
    wg.Add(1)
    name := name
    go func() {
      defer wg.Done()
      doWork(name)
    }()
  }

  wg.Wait()
  fmt.Printf("\n")
}

There!
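
For comparison, here is a rough Rust sketch of the same loop using plain OS threads from the standard library. The `run_workers` helper is made up, and it returns strings instead of printing so the result is easy to check. Without the `move` keyword, rustc refuses to compile the closure, because it could outlive the borrowed loop variable:

```rust
use std::thread;

fn run_workers(names: &[&'static str]) -> Vec<String> {
    let mut handles = Vec::new();

    for &name in names {
        // `name` is a fresh binding on every iteration, and `move` copies it
        // into the closure, so each thread sees its own value.
        handles.push(thread::spawn(move || format!("worker {}", name)));
    }

    // Join in spawn order, collecting what each thread returned.
    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}

fn main() {
    println!("{:?}", run_workers(&["a", "b"]));
}
```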

I guess that’s why the Go compiler is so fast – it barely checks for anything!

Now now, we’re not here to disparage any specific language.

Yeah no okay but I mean… how often have you made those mistakes?

Ahh but maybe it’s me who’s stupid. Maybe I’m a stupid stupid boy who just
will not learn – after all, only bad craftspeople blame their tools!

That is… that is so incredibly dense. Do you have any idea of the damage that
stupid, evil idiom has caused?

I don’t know, sounds to me like maybe you’re a bad craftsbear.

AS IT TURNS OUT, good craftspeople don’t shy away from questioning, criticizing,
and trying to improve their tools (or even switching to other tools entirely!)

It’s… it’s what they work with. What did you expect.

Anyone trying to tell you otherwise is probably romanticizing martyrdom and
fooling themselves into thinking that “hard work” must equate suffering, and
you deserve better companionship.

Anyway – the program finally does the thing:

Shell session

$ go run ./sample.go
babbababaabbbabbbababbaabbbaabbabababbababbabaababbaaaaaaaaa

There’s really no observable difference, running both programs like that in a
shell. We just see a stream of “a” and “b” randomly coming in. Two instances of
“doWork” are also executing concurrently in Go.

But there is an actual difference: see, Go has threads.

If we run our node.js example again, but under strace, to look out for the
write syscall, and reduce the number of iterations to 5 for a more manageable
output, we can see that…

Shell session
❯ strace -f -e write node ./sample.js > /dev/null
write(5, "*", 1)                        = 1
strace: Process 1396685 attached
strace: Process 1396686 attached
strace: Process 1396687 attached
strace: Process 1396688 attached
strace: Process 1396689 attached
[pid 1396684] write(16, "\1\0\0\0\0\0\0\0", 8) = 8
strace: Process 1396690 attached
[pid 1396684] write(1, "b", 1)          = 1
[pid 1396684] write(1, "b", 1)          = 1
[pid 1396684] write(1, "a", 1)          = 1
[pid 1396684] write(1, "a", 1)          = 1
[pid 1396684] write(1, "b", 1)          = 1
[pid 1396684] write(1, "a", 1)          = 1
[pid 1396684] write(1, "b", 1)          = 1
[pid 1396684] write(1, "a", 1)          = 1
[pid 1396684] write(1, "a", 1)          = 1
[pid 1396684] write(1, "b", 1)          = 1
[pid 1396684] write(1, "\n", 1)         = 1
[pid 1396684] write(12, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 1396689] +++ exited with 0 +++
[pid 1396688] +++ exited with 0 +++
[pid 1396687] +++ exited with 0 +++
[pid 1396686] +++ exited with 0 +++
[pid 1396685] +++ exited with 0 +++
[pid 1396690] +++ exited with 0 +++
+++ exited with 0 +++

strace intercepts and prints information about system calls made by a process.

The -f option stands for “follow forks”, and it’s especially useful because it
prefixes every line of output with a “pid”, which stands for “process
identifier”, but really, on Linux (where this experiment was done), processes
and threads are very much alike, and so we can actually pretend those pids are
tids (thread identifiers) instead.
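If you'd rather see those identifiers from inside a program than through strace, here's a rough, Linux-only sketch in Go. The lockedTids helper is hypothetical (nothing in the sample programs does this): it pins a few goroutines to their OS threads with runtime.LockOSThread, all at the same time, and asks the kernel for each thread's id with syscall.Gettid. Because a locked thread runs only its own goroutine, goroutines that are pinned simultaneously have to report different ids.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"syscall"
)

// lockedTids is a hypothetical helper (Linux only): it pins n goroutines
// to their OS threads at the same time and returns each thread's id.
func lockedTids(n int) []int {
	tids := make([]int, n)
	var ready, done sync.WaitGroup
	ready.Add(n)
	done.Add(n)
	for i := 0; i < n; i++ {
		go func(i int) {
			defer done.Done()
			runtime.LockOSThread() // wire this goroutine to its current thread
			defer runtime.UnlockOSThread()
			tids[i] = syscall.Gettid()
			ready.Done()
			ready.Wait() // stay pinned until every goroutine has reported
		}(i)
	}
	done.Wait()
	return tids
}

func main() {
	// Each tid is the kind of number strace -f prints in its [pid N] prefix.
	fmt.Println("pid:", syscall.Getpid(), "tids:", lockedTids(2))
}
```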

…we can see that both “a” and “b” are written by the same thread (PID 1396684).

But if we run the Go program:

Shell session
$ go build ./sample.go && strace -f -e write ./sample > /dev/null
strace: Process 1398810 attached
strace: Process 1398811 attached
strace: Process 1398812 attached
strace: Process 1398813 attached
[pid 1398813] write(1, "b", 1)          = 1
[pid 1398809] write(1, "a", 1)          = 1
[pid 1398813] write(1, "b", 1)          = 1
[pid 1398813] write(5, "\0", 1)         = 1
[pid 1398809] write(1, "b", 1)          = 1
[pid 1398813] write(1, "a", 1)          = 1
[pid 1398809] write(1, "b", 1)          = 1
[pid 1398813] write(1, "a", 1)          = 1
[pid 1398813] write(5, "\0", 1)         = 1
[pid 1398809] write(1, "a", 1)          = 1
[pid 1398813] write(1, "b", 1)          = 1
[pid 1398809] write(1, "a", 1)          = 1
[pid 1398809] write(1, "\n", 1)         = 1
[pid 1398813] +++ exited with 0 +++
[pid 1398812] +++ exited with 0 +++
[pid 1398811] +++ exited with 0 +++
[pid 1398810] +++ exited with 0 +++
+++ exited with 0 +++

We can see that:

“a” is written by PID 1398809
“b” is written by PID 1398813

…and we can see that occasionally, something writes the null byte (\0),
which I bet has everything to do with the scheduler.

We can actually tell Go to only use one thread.

Shell session
$ go build ./sample.go && GOMAXPROCS=1 strace -f -e write ./sample > /dev/null
strace: Process 1401117 attached
strace: Process 1401118 attached
strace: Process 1401119 attached
[pid 1401116] write(1, "b", 1)          = 1
[pid 1401116] write(1, "a", 1)          = 1
[pid 1401116] write(1, "b", 1)          = 1
[pid 1401116] write(1, "b", 1)          = 1
[pid 1401116] write(1, "a", 1)          = 1
[pid 1401119] write(1, "b", 1)          = 1
[pid 1401119] write(1, "a", 1)          = 1
[pid 1401119] write(1, "b", 1)          = 1
[pid 1401116] write(1, "a", 1)          = 1
[pid 1401116] write(1, "a", 1)          = 1
[pid 1401116] write(1, "\n", 1)         = 1
[pid 1401119] +++ exited with 0 +++
[pid 1401118] +++ exited with 0 +++
[pid 1401117] +++ exited with 0 +++
+++ exited with 0 +++

And now all the writes are issued from the same thread!

Wait! No they’re not! What!

Let’s check the docs:

The GOMAXPROCS variable limits the number of operating system threads that can
execute user-level Go code simultaneously. There is no limit to the number of
threads that can be blocked in system calls on behalf of Go code; those do not
count against the GOMAXPROCS limit. This package’s GOMAXPROCS function queries
and changes the limit.

Ohohhh. I guess nanosleep is a blocking system call huh.
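The same limit can be poked at from inside a program. A minimal sketch, assuming nothing beyond the standard library: runtime.GOMAXPROCS sets the limit and returns the previous one, and an argument below 1 only queries it. The limitProcs helper is invented for this example.

```go
package main

import (
	"fmt"
	"runtime"
)

// limitProcs is a hypothetical helper: it sets the GOMAXPROCS limit to n
// and reports both the previous and the resulting value.
func limitProcs(n int) (previous, current int) {
	previous = runtime.GOMAXPROCS(n) // sets the limit, returns the old one
	current = runtime.GOMAXPROCS(0)  // an argument below 1 only queries
	return previous, current
}

func main() {
	prev, cur := limitProcs(1)
	fmt.Printf("GOMAXPROCS was %d, is now %d\n", prev, cur)
}
```

As the docs quoted above say, this caps threads running Go code simultaneously, not threads in total, so strace can still show writes coming from more than one pid.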

As for Rust, well, we can have threads, for sure:

Rust code
use std::{
    io::{stdout, Write},
    time::Duration,
};

use rand::Rng;

fn do_work(name: String) {
    let mut rng = rand::thread_rng();
    for _ in 0..40 {
        std::thread::sleep(Duration::from_millis(rng.gen_range(0..=30)));
        print!("{}", name);
        stdout().flush().ok();
    }
}

fn main() {
    let a = std::thread::spawn(|| do_work("a".into()));
    let b = std::thread::spawn(|| do_work("b".into()));
    a.join().unwrap();
    b.join().unwrap();
    println!();
}

Shell session

$ cargo run --quiet
babbabbabaabababbaaaabbabbabbababaaababbabababbbabbababbababababababaa

The output of strace for that program is exactly what we’d expect (again,
reduced to only 5 iterations for readability):

Shell session
$ cargo build --quiet && strace -e write -f ./target/debug/lox > /dev/null
strace: Process 1408066 attached
strace: Process 1408067 attached
[pid 1408066] write(1, "a", 1)          = 1
[pid 1408067] write(1, "b", 1)          = 1
[pid 1408066] write(1, "a", 1)          = 1
[pid 1408067] write(1, "b", 1)          = 1
[pid 1408067] write(1, "b", 1)          = 1
[pid 1408067] write(1, "b", 1)          = 1
[pid 1408066] write(1, "a", 1)          = 1
[pid 1408067] write(1, "b", 1)          = 1
[pid 1408067] +++ exited with 0 +++
[pid 1408066] write(1, "a", 1)          = 1
[pid 1408066] write(1, "a", 1)          = 1
[pid 1408066] +++ exited with 0 +++
write(1, "\n", 1)                       = 1
+++ exited with 0 +++

“a” is written by PID 1408066, and “b” is written by PID 1408067.

We can also do that with async, say, with the tokio crate:

Rust code
use rand::Rng;
use std::{
    io::{stdout, Write},
    time::Duration,
};
use tokio::{spawn, time::sleep};

async fn do_work(name: String) {
    for _ in 0..30 {
        let ms = rand::thread_rng().gen_range(0..=40);
        sleep(Duration::from_millis(ms)).await;
        print!("{}", name);
        stdout().flush().ok();
    }
}

#[tokio::main]
async fn main() {
    let a = spawn(do_work("a".into()));
    let b = spawn(do_work("b".into()));
    a.await.unwrap();
    b.await.unwrap();
    println!();
}

Shell session

$ cargo run --quiet
abababbabababbabbabaabababbbaabaabaabbabaabbabbabababaababaa

The output of strace is actually interesting here:

Shell session
$ cargo build --quiet && strace -e write -f ./target/debug/lox > /dev/null
strace: Process 1413863 attached
strace: Process 1413864 attached
strace: Process 1413865 attached
strace: Process 1413866 attached
strace: Process 1413867 attached
strace: Process 1413868 attached
strace: Process 1413869 attached
strace: Process 1413870 attached
strace: Process 1413871 attached
strace: Process 1413872 attached
strace: Process 1413873 attached
strace: Process 1413874 attached
strace: Process 1413875 attached
strace: Process 1413876 attached
strace: Process 1413877 attached
strace: Process 1413878 attached
strace: Process 1413879 attached
strace: Process 1413880 attached
strace: Process 1413881 attached
strace: Process 1413882 attached
strace: Process 1413883 attached
strace: Process 1413884 attached
strace: Process 1413885 attached
strace: Process 1413886 attached
strace: Process 1413887 attached
strace: Process 1413888 attached
strace: Process 1413889 attached
strace: Process 1413890 attached
strace: Process 1413891 attached
strace: Process 1413892 attached
strace: Process 1413893 attached
strace: Process 1413894 attached
[pid 1413893] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 1413863] write(1, "a", 1)          = 1
[pid 1413863] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 1413863] write(1, "a", 1)          = 1
[pid 1413863] write(1, "a", 1)          = 1
[pid 1413893] write(1, "b", 1 <unfinished ...>
[pid 1413863] write(4, "\1\0\0\0\0\0\0\0", 8 <unfinished ...>
[pid 1413893] <... write resumed>)      = 1
[pid 1413863] <... write resumed>)      = 8
[pid 1413893] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 1413894] write(1, "b", 1)          = 1
[pid 1413894] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 1413894] write(1, "b", 1)          = 1
[pid 1413894] write(1, "a", 1)          = 1
[pid 1413894] write(1, "b", 1)          = 1
[pid 1413894] write(1, "a", 1)          = 1
[pid 1413894] write(1, "b", 1)          = 1
[pid 1413862] write(1, "\n", 1)         = 1
[pid 1413862] write(4, "\1\0\0\0\0\0\0\0", 8) = 8
[pid 1413867] +++ exited with 0 +++
[pid 1413863] +++ exited with 0 +++
[pid 1413864] +++ exited with 0 +++
[pid 1413868] +++ exited with 0 +++
[pid 1413865] +++ exited with 0 +++
[pid 1413866] +++ exited with 0 +++
[pid 1413869] +++ exited with 0 +++
[pid 1413870] +++ exited with 0 +++
[pid 1413873] +++ exited with 0 +++
[pid 1413871] +++ exited with 0 +++
[pid 1413872] +++ exited with 0 +++
[pid 1413874] +++ exited with 0 +++
[pid 1413875] +++ exited with 0 +++
[pid 1413876] +++ exited with 0 +++
[pid 1413878] +++ exited with 0 +++
[pid 1413877] +++ exited with 0 +++
[pid 1413879] +++ exited with 0 +++
[pid 1413880] +++ exited with 0 +++
[pid 1413881] +++ exited with 0 +++
[pid 1413882] +++ exited with 0 +++
[pid 1413883] +++ exited with 0 +++
[pid 1413884] +++ exited with 0
