I still get excited about programming languages. But these days, it’s not so
much because of what they let me do, but rather what they don’t let me do.
Ultimately, what you can do with a programming language is seldom limited by the
language itself: there’s nothing you can do in C++ that you can’t do in C, given
infinite time.
As long as a language is Turing-complete and compiles down to assembly, no
matter the interface, it’s the same machine you’re talking to. You’re limited
by… what your hardware can do, how much memory it has (and how fast it is),
what kind of peripherals are plugged into it, and so on.
There are of course differences in expressiveness: some tasks might require more
or less code in different languages. The Java language is, or at least was,
infamous for being verbose: but other upsides made it an attractive choice for
many companies, today still.
And then there’s performance, debuggability (which, if it isn’t a word,
definitely should be one), and a dozen other factors you might want to take
under advisement when “picking a language”.
The size of the forest
But consider this: of the complete set of combinations of all possible
instructions, only a tiny fraction are actually useful programs. A much tinier
fraction still, actually achieve the task you’ve set out to do.
So one could view “programming” as searching for the right program within that
set. And one could view the virtue of “stricter” languages in reducing the size
of the set you’re searching in, because there’s fewer “legal” combinations.
With that in mind, one might be tempted to rank languages by “how many programs
are legal”. I don’t expect everyone to achieve consensus on a single ranking,
but some divisions are well-accepted.
Consider the following JavaScript program:
JavaScript code
    function foo(i) {
      console.log("foo", i);
    }

    function bar() {
      console.log("bar!");
    }

    function main() {
      for (i = 0; i < 3; i++) {
        foo(i);
      }
      return;
      bar();
    }

    main();
In this code, bar() is never actually invoked – main returns before it would
be.
Running it under node.js yields no warnings whatsoever:
Shell session
    $ node sample.js
    foo 0
    foo 1
    foo 2
The same sample, as Go, also doesn’t yield any warnings:
Go code
    package main

    import "log"

    func foo(i int) {
        log.Printf("foo %d", i)
    }

    func bar() {
        log.Printf("bar!")
    }

    func main() {
        for i := 0; i < 3; i++ {
            foo(i)
        }
        return
        bar()
    }
Shell session
    $ go build ./sample.go
    $ ./sample
    2022/02/06 17:35:55 foo 0
    2022/02/06 17:35:55 foo 1
    2022/02/06 17:35:55 foo 2
However, the go vet tool (which ships with the default Go distribution),
bats an eyelash:
Shell session
    $ go vet ./sample.go
    # command-line-arguments
    ./sample.go:18:2: unreachable code
Because even though our code is not technically incorrect, it’s… suspicious.
It looks a lot like incorrect code. So the linter gently asks “hey, did you
really mean that? if you did, all good, just maybe comment it out. if you
didn’t, now’s your chance to fix it”.
The same code, but in Rust, makes for a much noisier experience still:
Rust code
    fn foo(i: usize) {
        println!("foo {}", i);
    }

    fn bar() {
        println!("bar!");
    }

    fn main() {
        for i in 0..=2 {
            foo(i)
        }
        return;
        bar()
    }
Shell session
    $ cargo run
       Compiling lox v0.1.0 (/home/amos/bearcove/lox)
    warning: unreachable expression
      --> src/main.rs:14:5
       |
    13 |     return;
       |     ------ any code following this expression is unreachable
    14 |     bar()
       |     ^^^^^ unreachable expression
       |
       = note: `#[warn(unreachable_code)]` on by default

    warning: `lox` (bin "lox") generated 1 warning
        Finished dev [unoptimized + debuginfo] target(s) in 0.15s
         Running `target/debug/lox`
    foo 0
    foo 1
    foo 2
I love that it doesn’t just show what code is unreachable, but why that code
is unreachable.
Note that this is still a warning – just something we should look at when we get
a chance, but not a showstopper. (Unless we slap #![deny(unreachable_code)] at
the start of our main.rs, the equivalent of passing -Werror=something to
gcc/clang).
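Lint levels can also be set on a single item with the outer-attribute form, rather than for the whole crate. The sketch below is only an illustration: the `sign` function is a made-up example, not part of the samples above. It compiles because nothing in it is unreachable; placing a statement right after an unconditional `return;` inside it would become a hard error instead of a warning.

```rust
// Outer-attribute form: the lint level applies to this function only.
#[deny(unreachable_code)]
fn sign(n: i32) -> &'static str {
    if n < 0 {
        // This `return` is conditional, so the code below stays reachable.
        return "negative";
    }
    // A statement placed directly after an unconditional `return;` in this
    // function would be rejected at compile time, not merely warned about.
    "non-negative"
}

fn main() {
    println!("{}", sign(-1));
    println!("{}", sign(3));
}
```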
Fuck around now, find out… when?
Let’s change our sample a little bit. Say we remove the definition of bar
entirely.
After all, it’s never called – what harm could it do?
JavaScript code
    function foo(i) {
      console.log("foo", i);
    }

    function main() {
      for (i = 0; i < 3; i++) {
        foo(i);
      }
      return;
      bar();
    }

    main();
Shell session
    $ node sample.js
    foo 0
    foo 1
    foo 2
10/10 node.js implementations agree: nobody cares about bar, because it’s
never actually called.
Go, however, is really cross about bar’s departure:
Go code
    package main

    import "log"

    func foo(i int) {
        log.Printf("foo %d", i)
    }

    func main() {
        for i := 0; i < 3; i++ {
            foo(i)
        }
        return
        bar()
    }
Shell session
    $ go run ./sample.go
    # command-line-arguments
    ./sample.go:14:2: undefined: bar
…and terse as ever.
The Rust compiler is also heartbroken:
Rust code
    fn foo(i: usize) {
        println!("foo {}", i);
    }

    fn main() {
        for i in 0..=2 {
            foo(i)
        }
        return;
        bar()
    }
Shell session
    $ cargo run
       Compiling lox v0.1.0 (/home/amos/bearcove/lox)
    error[E0425]: cannot find function `bar` in this scope
      --> src/main.rs:10:5
       |
    10 |     bar()
       |     ^^^ not found in this scope

    warning: unreachable expression
      --> src/main.rs:10:5
       |
    9  |     return;
       |     ------ any code following this expression is unreachable
    10 |     bar()
       |     ^^^^^ unreachable expression
       |
       = note: `#[warn(unreachable_code)]` on by default

    For more information about this error, try `rustc --explain E0425`.
    warning: `lox` (bin "lox") generated 1 warning
    error: could not compile `lox` due to previous error; 1 warning emitted
…and still insistent that, were bar to exist (which it currently doesn’t),
it would still never get called, and we still ought to… rethink our
position.
So, both Go and Rust reject these programs as illegal (they issue an error and
refuse to emit a compiled form of the program), even though, if I’m to be
entirely fair, it’s a perfectly fine program.
But there’s a perfectly reasonable, practical explanation for this.
node.js is, in essence, an interpreter. It does ship with a just-in-time
compiler (several, in fact), but that is an implementation detail. We can
imagine that execution is performed “on the fly”, as new expressions and
statements are encountered, and be reasonably close to the truth.
So, node.js needn’t concern itself with the existence of a bar symbol until
the very moment it’s called (or accessed, or assigned to, etc.).
At which point, it will error out. At runtime, during the execution of our
program.
JavaScript code
    function foo(i) {
      console.log("foo", i);
    }

    function main() {
      for (i = 0; i < 3; i++) {
        foo(i);
      }
      // 👇 (there used to be a 'return' here)
      bar();
    }

    main();
Shell session
    $ node sample.js
    foo 0
    foo 1
    foo 2
    /home/amos/bearcove/lox/sample.js:10
      bar();
      ^

    ReferenceError: bar is not defined
        at main (/home/amos/bearcove/lox/sample.js:10:3)
        at Object.<anonymous> (/home/amos/bearcove/lox/sample.js:13:1)
        at Module._compile (node:internal/modules/cjs/loader:1101:14)
        at Object.Module._extensions..js (node:internal/modules/cjs/loader:1153:10)
        at Module.load (node:internal/modules/cjs/loader:981:32)
        at Function.Module._load (node:internal/modules/cjs/loader:822:12)
        at Function.executeUserEntryPoint [as runMain] (node:internal/modules/run_main:81:12)
        at node:internal/main/run_main_module:17:47
However, both the Go and Rust compilers, through different machinery, eventually
generate some native executable that is full of machine code, and relatively
self-contained.
And thus, they must know what code to emit for the whole main function.
Including the address of bar, which, although it is in an unreachable portion
of the code, we still wrote a “call” instruction for in our source code.
If we wanted to reproduce roughly what’s happening in node.js, we’d need to use
a function pointer instead, which could be null, or point to a valid function:
and we’d only find out when we actually call it.
This Go code compiles:
Go code
    package main

    import "log"

    func foo(i int) {
        log.Printf("foo %d", i)
    }

    type Bar func()

    var bar Bar

    func main() {
        for i := 0; i < 3; i++ {
            foo(i)
        }
        bar()
    }
Shell session
$ go build ./sample.go
But panics during execution:
Shell session
    $ ./sample
    2022/02/06 18:08:06 foo 0
    2022/02/06 18:08:06 foo 1
    2022/02/06 18:08:06 foo 2
    panic: runtime error: invalid memory address or nil pointer dereference
    [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x48756e]

    goroutine 1 [running]:
    main.main()
            /home/amos/bearcove/lox/sample.go:17 +0x6e
However, it stops panicking if we actually initialize bar to some valid
implementation:
Go code
    package main

    import "log"

    func foo(i int) {
        log.Printf("foo %d", i)
    }

    type Bar func()

    var bar Bar

    // 👇 we initialize bar in an `init` function, called implicitly at startup
    func init() {
        bar = func() {
            log.Printf("bar!")
        }
    }

    func main() {
        for i := 0; i < 3; i++ {
            foo(i)
        }
        bar()
    }
We can do the same little charade in Rust:
Rust code
    fn foo(i: usize) {
        println!("foo {}", i);
    }

    fn bar_impl() {
        println!("bar!");
    }

    static BAR: fn() = bar_impl;

    fn main() {
        for i in 0..=2 {
            foo(i)
        }
        BAR()
    }
Shell session
    $ cargo run
       Compiling lox v0.1.0 (/home/amos/bearcove/lox)
        Finished dev [unoptimized + debuginfo] target(s) in 0.14s
         Running `target/debug/lox`
    foo 0
    foo 1
    foo 2
    bar!
Although, reproducing the crash is harder. Because we can’t just declare a
function pointer that points to nothing.
Rust code
    fn foo(i: usize) {
        println!("foo {}", i);
    }

    // 👇
    static BAR: fn();

    fn main() {
        for i in 0..=2 {
            foo(i)
        }
        BAR()
    }
Shell session
    $ cargo run
       Compiling lox v0.1.0 (/home/amos/bearcove/lox)
    error: free static item without body
     --> src/main.rs:5:1
      |
    5 | static BAR: fn();
      | ^^^^^^^^^^^^^^^^-
      |                 |
      |                 help: provide a definition for the static: `= <expr>;`

    error: could not compile `lox` due to previous error
If we want to account for the possibility of bar being there or not there, we
must change its type to Option<fn()> instead:
Rust code
    fn foo(i: usize) {
        println!("foo {}", i);
    }

    // 👇
    static BAR: Option<fn()>;

    fn main() {
        for i in 0..=2 {
            foo(i)
        }
        BAR()
    }
And we still must assign it something.
Shell session
    $ cargo run
       Compiling lox v0.1.0 (/home/amos/bearcove/lox)
    error: free static item without body
     --> src/main.rs:5:1
      |
    5 | static BAR: Option<fn()>;
      | ^^^^^^^^^^^^^^^^^^^^^^^^-
      |                         |
      |                         help: provide a definition for the static: `= <expr>;`

    (other errors omitted)
In this case, we’ll assign None because I’m trying to showcase what would
happen if bar did not exist:
Rust code
    fn foo(i: usize) {
        println!("foo {}", i);
    }

    static BAR: Option<fn()> = None;

    fn main() {
        for i in 0..=2 {
            foo(i)
        }
        BAR()
    }
But now we have an error at the call site:
Shell session
    $ cargo run
       Compiling lox v0.1.0 (/home/amos/bearcove/lox)
    error[E0618]: expected function, found enum variant `BAR`
      --> src/main.rs:11:5
       |
    5  | static BAR: Option<fn()> = None;
       | -------------------------------- `BAR` defined here
    ...
    11 |     BAR()
       |     ^^^--
       |     |
       |     call expression requires function
       |
    help: `BAR` is a unit variant, you need to write it without the parentheses
       |
    11 -     BAR()
    11 +     BAR
       |

    For more information about this error, try `rustc --explain E0618`.
    error: could not compile `lox` due to previous error
Because now, BAR is not a function that can be called, it’s an Option<fn()>,
which could be one of either Some(f) (where f is a function we can call),
or None (indicating the absence of a function we can call).
So, Rust forces us to account for both cases, which we can do with a match for
example:
Rust code
    fn foo(i: usize) {
        println!("foo {}", i);
    }

    static BAR: Option<fn()> = None;

    fn main() {
        for i in 0..=2 {
            foo(i)
        }

        match BAR {
            Some(f) => f(),
            None => println!("(no bar implementation found)"),
        }
    }
Shell session
    $ cargo run
       Compiling lox v0.1.0 (/home/amos/bearcove/lox)
        Finished dev [unoptimized + debuginfo] target(s) in 0.24s
         Running `target/debug/lox`
    foo 0
    foo 1
    foo 2
    (no bar implementation found)
And, with BAR set to the Some variant:
Rust code
    fn foo(i: usize) {
        println!("foo {}", i);
    }

    static BAR: Option<fn()> = Some({
        // we could define this outside the option, but we don't have to!
        // this is just showing off, but I couldn't resist, because it's fun.
        fn bar_impl() {
            println!("bar!");
        }
        // the last expression of a block (`{}`) is what the block evaluates to
        bar_impl
    });

    fn main() {
        for i in 0..=2 {
            foo(i)
        }

        match BAR {
            Some(f) => f(),
            None => println!("(no bar implementation found)"),
        }
    }
Shell session
    $ cargo run
        Finished dev [unoptimized + debuginfo] target(s) in 0.00s
         Running `target/debug/lox`
    foo 0
    foo 1
    foo 2
    bar!
So, if we compare the stance of all three languages here:
- JavaScript (via node.js here) plain doesn’t care if bar() exists until
  you actually call it.
- Go cares if it’s a regular function call, but it will let you build a
  function pointer that points to nowhere, and panic at runtime.
- Rust will not let you build a function pointer that points to nowhere at all.
JavaScript’s looseness is not an oversight here: the mechanism which it uses to
look up symbols is completely different from Go and Rust. Even though there’s
no mention of bar anywhere in our code, it might still exist, as evidenced
by this ~crime~ sample code:
JavaScript code
    function foo(i) {
      console.log("foo", i);
    }

    eval(
      `mruhgr4hgx&C&./&CD&iutyurk4rum.(hgx'(/A`
        .split("")
        .map((c) => String.fromCharCode(c.charCodeAt(0) - 6))
        .join(""),
    );

    function main() {
      for (i = 0; i < 3; i++) {
        foo(i);
      }
      bar();
    }

    main();
Shell session
    $ node sample.js
    foo 0
    foo 1
    foo 2
    bar!
As for Rust, I should specify: safe Rust doesn’t let you do that.
If we let ourselves write unsafe code, an essential part of Rust, without
which one could not build safe abstractions atop the standard C library, or
system calls, for example, we can achieve crashdom:
Rust code
    fn foo(i: usize) {
        println!("foo {}", i);
    }

    // initialize BAR with some garbage
    static BAR: fn() = unsafe { std::mem::transmute(&()) };

    fn main() {
        for i in 0..=2 {
            foo(i)
        }
        BAR();
    }
Shell session
    $ cargo run
       Compiling lox v0.1.0 (/home/amos/bearcove/lox)
    error[E0080]: it is undefined behavior to use this value
     --> src/main.rs:5:1
      |
    5 | static BAR: fn() = unsafe { std::mem::transmute(&()) };
      | ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ type validation failed: encountered pointer to alloc4, but expected a function pointer
      |
      = note: The rules on what exactly is undefined behavior aren't clear, so this check might be overzealous. Please open an issue on the rustc repository if you believe it should not be considered undefined behavior.
      = note: the raw bytes of the constant (size: 8, align: 8) {
                  ╾───────alloc4────────╼                         │ ╾──────╼
              }

    For more information about this error, try `rustc --explain E0080`.
    error: could not compile `lox` due to previous error
Mh, nope, it caught that one. Huh.
Fine, let’s do this then:
Rust code
    fn foo(i: usize) {
        println!("foo {}", i);
    }

    const BAR: *const () = std::ptr::null();

    fn main() {
        for i in 0..=2 {
            foo(i)
        }

        let bar: fn() = unsafe { std::mem::transmute(BAR) };
        bar();
    }
Shell session
    $ cargo run
       Compiling lox v0.1.0 (/home/amos/bearcove/lox)
        Finished dev [unoptimized + debuginfo] target(s) in 0.13s
         Running `target/debug/lox`
    foo 0
    foo 1
    foo 2
    zsh: segmentation fault (core dumped)  cargo run
There. We had to go out of our way to ask for it, but we got it.
So, of those three languages, it wouldn’t be unreasonable to say that JavaScript
is the loosest one (letting us add things to the global scope at runtime,
evaluate arbitrary strings, etc.), Rust is the strictest one (not letting us
create a dangling function pointer in safe Rust), and Go is somewhere in the
middle.
Also, types
Similarly, we can see a clear distinction in how those three languages treat
types.
It is extremely easy (too easy perhaps) to make a JavaScript function that can
“add” two arbitrary things. Because function parameters don’t have types.
So, an add function will just as happily add together two numbers, as it will
concatenate two strings:
JavaScript code
    function add(a, b) {
      return a + b;
    }

    function main() {
      console.log(add(1, 2));
      console.log(add("foo", "bar"));
    }

    main();
Shell session
    $ node sample.js
    3
    foobar
In Go, it’s not that easy, because we have to pick a type.
We can do numbers:
Go code
    package main

    import "log"

    func add(a int, b int) int {
        return a + b
    }

    func main() {
        log.Printf("%v", add(1, 2))
    }
Shell session
    $ go run ./sample.go
    2022/02/06 19:01:55 3
And we can do strings:
Go code
    package main

    import "log"

    func add(a string, b string) string {
        return a + b
    }

    func main() {
        log.Printf("%v", add("foo", "bar"))
    }
Shell session
    $ go run ./sample.go
    2022/02/06 19:02:25 foobar
But we can’t do both.
Or can we?
Go code
    package main

    import "log"

    func add(a interface{}, b interface{}) interface{} {
        if a, ok := a.(int); ok {
            if b, ok := b.(int); ok {
                return a + b
            }
        }

        if a, ok := a.(string); ok {
            if b, ok := b.(string); ok {
                return a + b
            }
        }

        panic("incompatible types")
    }

    func main() {
        log.Printf("%v", add(1, 2))
        log.Printf("%v", add("foo", "bar"))
    }
Shell session
    $ go run ./sample.go
    2022/02/06 19:05:11 3
    2022/02/06 19:05:11 foobar
It’s… not very good, though. add(1, "foo") will compile, but panic at
runtime, for example.
Luckily, Go 1.18 beta added generics, so maybe?
Go code
    package main

    import "log"

    func add[T int64 | string](a T, b T) T {
        return a + b
    }

    func main() {
        log.Printf("%v", add(1, 2))
        log.Printf("%v", add("foo", "bar"))
    }
Shell session
    $ go run ./main.go
    ./main.go:10:22: int does not implement int64|string
Ah. Let’s see what the type parameters proposal suggests… oh. Okay.
Go code
    package main

    import "log"

    type Addable interface {
        ~int | ~int8 | ~int16 | ~int32 | ~int64 |
            ~uint | ~uint8 | ~uint16 | ~uint32 | ~uint64 | ~uintptr |
            ~float32 | ~float64 | ~complex64 | ~complex128 |
            ~string
    }

    func add[T Addable](a T, b T) T {
        return a + b
    }

    func main() {
        log.Printf("%v", add(1, 2))
        log.Printf("%v", add("foo", "bar"))
    }
Shell session
    $ go run ./main.go
    2022/02/06 19:12:11 3
    2022/02/06 19:12:11 foobar
Sure, that… that works. But I mean, we’re not expressing a property of a type,
so much as listing all the types we can think of. I guess nobody will ever want
to implement the + operator for a user type. Or add int128 / uint128 to
the language.
Ah well.
As for contestant number 3, well… surely it’s going to do great right? After
all, these articles are just thinly-veiled Rust propaganda, so surely it’ll…
Rust code
    use std::ops::Add;

    fn add<T>(a: T, b: T) -> T::Output
    where
        T: Add<T>,
    {
        a + b
    }

    fn main() {
        dbg!(add(1, 2));
        dbg!(add("foo", "bar"));
    }
Shell session
    $ cargo run
       Compiling lox v0.1.0 (/home/amos/bearcove/lox)
    error[E0277]: cannot add `&str` to `&str`
      --> src/main.rs:12:10
       |
    12 |     dbg!(add("foo", "bar"));
       |          ^^^ no implementation for `&str + &str`
       |
       = help: the trait `Add` is not implemented for `&str`
    note: required by a bound in `add`
      --> src/main.rs:5:8
       |
    3  | fn add<T>(a: T, b: T) -> T::Output
       |    --- required by a bound in this
    4  | where
    5  |     T: Add<T>,
       |        ^^^^^^ required by this bound in `add`

    For more information about this error, try `rustc --explain E0277`.
    error: could not compile `lox` due to previous error
Huh.
I mean, that’s good if you’re into that sort of thing.
I am, so I like it: first of all, we’ve asked for “any type that we can add”,
not just listed a bunch of concrete types. We’ve also allowed for T + T to
return a type that isn’t T! The function’s return type is
<T as Add<T>>::Output, which could be anything.
Second of all, the diagnostic here is fantastic: it tells us what we asked,
why it couldn’t be satisfied… I like it.
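To see a case where adding two values of type T really does produce something other than T, here is a small sketch reusing the same `add` signature. It relies on the standard library implementing `Add` between two `&i32` references with an owned `i32` as the output; the variable names are only for illustration.

```rust
use std::ops::Add;

fn add<T>(a: T, b: T) -> T::Output
where
    T: Add<T>,
{
    a + b
}

fn main() {
    // T = i32: `i32 + i32` yields an `i32`, so Output happens to equal T.
    let x: i32 = add(1, 2);

    // T = &i32: `&i32 + &i32` yields an owned `i32`, which is not `&i32`.
    let a: i32 = 40;
    let b: i32 = 2;
    let y: i32 = add(&a, &b);

    println!("{} {}", x, y);
}
```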
It doesn’t really give the rationale behind Add not being implemented for
&str and &str, so I still serve a purpose. &str is just a string slice:
it refers to some data that exists elsewhere, and doesn’t carry ownership of
the data itself.
In our example, the data is in the executable itself:
Rust code
    fn add<T>(_: T, _: T) -> T {
        todo!();
    }

    fn main() {
        dbg!(add(1, 2));
        dbg!(add("foo", "bar"));
    }
Shell session
    $ cargo build --quiet
    $ objdump -s -j .rodata ./target/debug/lox | grep -B 3 -A 3 -E 'foo|bar'
     3c0d0 03000000 00000000 02000000 00000000  ................
     3c0e0 00000000 00000000 02000000 00000000  ................
     3c0f0 00000000 00000000 20000000 04000000  ........ .......
                                                        👇
     3c100 03000000 00000000 62617266 6f6f6164  ........barfooad
     3c110 64282266 6f6f222c 20226261 7222296e  d("foo", "bar")n
     3c120 6f742079 65742069 6d706c65 6d656e74  ot yet implement
     3c130 65647372 632f6d61 696e2e72 73000000  edsrc/main.rs...
     3c140 01000000 00000000 00000000 00000000  ................
…so it’s valid for the whole time the program is executed: the expression
"foo" is a &'static str.
But to join “foo” and “bar” together, we’d have to allocate some memory. One
fairly natural way to do this would be to create a String, which would
allocate memory on the heap. And String implements Deref<Target = str>, so
anything we can do with a &str, we can also do with a String.
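As a quick illustration of that last point, here is a minimal sketch: a function written against &str accepts a borrowed String, and methods defined on str can be called directly on a String. The `shout` helper is a hypothetical name made up for this example.

```rust
// Takes a string slice; knows nothing about `String`.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    let owned: String = String::from("foo");

    // `&owned` is a `&String`, which deref-coerces to `&str` at the call site.
    let loud = shout(&owned);

    // `starts_with` is defined on `str`, yet callable on a `String`.
    let starts = owned.starts_with("fo");

    println!("{} {}", loud, starts);
}
```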
So, long story short, you can’t do &str + &str. You can, however, do
String + &str. If we look at the docs, we find the rationale:
impl<'_> Add<&'_ str> for String
Implements the + operator for concatenating two strings.
This consumes the String on the left-hand side and re-uses its buffer (growing
it if necessary). This is done to avoid allocating a new String and copying
the entire contents on every operation, which would lead to O(n^2) running
time when building an n-byte string by repeated concatenation.
The string on the right-hand side is only borrowed; its contents are copied into
the returned String.
So, if we convert our parameters to String with .to_string():
Rust code
    use std::ops::Add;

    fn add<T>(a: T, b: T) -> T::Output
    where
        T: Add<T>,
    {
        a + b
    }

    fn main() {
        dbg!(add(1, 2));
        dbg!(add("foo".to_string(), "bar".to_string()));
    }
Shell session
    $ cargo run
       Compiling lox v0.1.0 (/home/amos/bearcove/lox)
    error[E0277]: cannot add `String` to `String`
      --> src/main.rs:12:10
       |
    12 |     dbg!(add("foo".to_string(), "bar".to_string()));
       |          ^^^ no implementation for `String + String`
       |
       = help: the trait `Add` is not implemented for `String`
    note: required by a bound in `add`
      --> src/main.rs:5:8
       |
    3  | fn add<T>(a: T, b: T) -> T::Output
       |    --- required by a bound in this
    4  | where
    5  |     T: Add<T>,
       |        ^^^^^^ required by this bound in `add`

    error[E0277]: cannot add `String` to `String`
      --> src/main.rs:12:10
       |
    12 |     dbg!(add("foo".to_string(), "bar".to_string()));
       |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ no implementation for `String + String`
       |
       = help: the trait `Add` is not implemented for `String`

    For more information about this error, try `rustc --explain E0277`.
    error: could not compile `lox` due to 2 previous errors
…it doesn’t work either.
Because there’s no impl Add<String, Output = String> for String.
Only impl Add<&str, Output = String> for String. We don’t need ownership of
the right-hand-side operand for +: we’re merely reading it and copying it
immediately following the contents of the left-hand-side operand.
So, we can make our code work, if we accept arguments of two different types:
Rust code
    use std::ops::Add;

    fn add<A, B>(a: A, b: B) -> A::Output
    where
        A: Add<B>,
    {
        a + b
    }

    fn main() {
        dbg!(add(1, 2));
        dbg!(add("foo".to_string(), "bar"));
    }
Shell session
    $ cargo run
       Compiling lox v0.1.0 (/home/amos/bearcove/lox)
        Finished dev [unoptimized + debuginfo] target(s) in 0.21s
         Running `target/debug/lox`
    [src/main.rs:11] add(1, 2) = 3
    [src/main.rs:12] add("foo".to_string(), "bar") = "foobar"
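Because the bound is “anything that implements Add” rather than a list of concrete types, the same generic function should also accept a user-defined type. Here is a minimal sketch; the `Meters` type is invented purely for illustration and appears nowhere else in this article.

```rust
use std::ops::Add;

// A hypothetical user type, made up for this example.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Meters(u32);

// Teach `+` how to combine two `Meters` values.
impl Add for Meters {
    type Output = Meters;

    fn add(self, rhs: Meters) -> Meters {
        Meters(self.0 + rhs.0)
    }
}

// Same generic function as above: no changes needed for the new type.
fn add<A, B>(a: A, b: B) -> A::Output
where
    A: Add<B>,
{
    a + b
}

fn main() {
    dbg!(add(Meters(1), Meters(2)));
    dbg!(add("foo".to_string(), "bar"));
}
```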
So, how did Rust fare here? Depends how you feel about things.
It’s a pretty radical design choice, to force you to be aware that, yeah, since
you’re creating a new value (out of the two parameters), you will have to
allocate. And so it forces you to allocate outside of the Add operation itself.
Rust code
    fn main() {
        // I know `to_string` allocates, it's not hidden behind `+`.
        // the `+` may reallocate (to grow the `String`).
        let foobar = "foo".to_string() + "bar";
        dbg!(&foobar);
    }
Rust code
    fn main() {
        let foo: String = "foo".into();
        let bar: String = "bar".into();
        // 🛑 this doesn't build:
        // the right-hand-side cannot be a `String`, it has to be a string slice,
        // e.g. `&str`
        let foobar = foo + bar;
        dbg!(&foobar);
    }
Rust code
    fn main() {
        let foo: String = "foo".into();
        let bar: String = "bar".into();
        // this builds fine!
        let foobar = foo + &bar;
        dbg!(&foobar);
    }
Rust code
    fn main() {
        let foo: String = "foo".into();
        let bar: String = "bar".into();
        let foobar = foo + &bar;
        dbg!(&foobar);
        // 🛑 this doesn't build!
        // `foo` was moved during the first addition (it was reused to store the
        // result of concatenating the two strings)
        let foobar = foo + &bar;
        dbg!(&foobar);
    }
Rust code
    fn main() {
        let foo: String = "foo".into();
        let bar: String = "bar".into();
        let foobar = foo.clone() + &bar;
        dbg!(&foobar);
        // this builds fine! we've cloned foo in the previous addition, which
        // allocates. again, nothing is hidden in the implementation of `+`.
        let foobar = foo + &bar;
        dbg!(&foobar);
    }
That aspect of Rust is a turn-off to some folks. Even to folks who otherwise
love Rust for many other reasons. It’s often been said that there’s a
higher-level language (where you don’t worry about allocating so much) inside of
Rust, waiting to be discovered.
Well, we’re still waiting.
However, that very aspect is also what makes Rust so appealing for systems
programming. And its razor-sharp focus on ownership, lifetimes etc. is also the
underpinning of many of its safety guarantees.
Losing the thread
So, now that we’ve looked at unreachable code / undefined symbols, and types,
let’s talk about concurrency!
Code is said to be “concurrent” when several tasks can make progress at the same
time. And there’s multiple mechanisms through which this can be achieved.
JavaScript definitely lets us write concurrent code:
JavaScript code
    function sleep(ms) {
      return new Promise((resolve, _reject) => setTimeout(resolve, ms));
    }

    async function doWork(name) {
      for (let i = 0; i < 30; i++) {
        await sleep(Math.random() * 40);
        process.stdout.write(name);
      }
    }

    Promise.all(["a", "b"].map(doWork)).then(() => {
      process.stdout.write("\n");
    });
And it runs fine in node.js:
Shell session
    $ node sample.js
    abbaabbababaababbababbaabaabaababaabbabababaaababbbaababbabb
We can see here that both task “a” and task “b” are making progress,
concurrently. Not in parallel: they never actually make progress at the same
time, they just do little bits of progress one after the other, and, to the
outside observer, there’s hardly a difference.
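To make “concurrent but not parallel” concrete, here is a toy sketch in Rust: two tasks take turns making one step of progress each, all on a single thread. The `Task` type and the `run_interleaved` function are hypothetical names invented for this illustration, and this simple round-robin loop is only an analogy, not how the node.js event loop is actually implemented.

```rust
// A task that wants to emit its name a fixed number of times.
struct Task {
    name: char,
    remaining: u32,
}

impl Task {
    // Performs one small unit of work; returns false once the task is done.
    fn step(&mut self, out: &mut String) -> bool {
        if self.remaining == 0 {
            return false;
        }
        out.push(self.name);
        self.remaining -= 1;
        true
    }
}

// Runs every task to completion on the current thread, one step at a time.
fn run_interleaved(mut tasks: Vec<Task>) -> String {
    let mut out = String::new();
    loop {
        let mut progressed = false;
        for task in tasks.iter_mut() {
            // `|=` does not short-circuit, so every task gets its turn.
            progressed |= task.step(&mut out);
        }
        if !progressed {
            break;
        }
    }
    out
}

fn main() {
    let tasks = vec![
        Task { name: 'a', remaining: 3 },
        Task { name: 'b', remaining: 2 },
    ];
    println!("{}", run_interleaved(tasks));
}
```

Both tasks advance “at the same time” from the outside, yet no two steps ever overlap.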
That means, for example, that you can definitely use node.js to write server
applications, that are serving a large number of concurrent requests.
Because you don’t strictly need request handlers to run in parallel, you just
need them to process input as it comes in: oh, a client is trying to connect,
let’s accept their connection! They sent a client hello, let’s send a server
hello so we can complete a TLS handshake.
Now the request’s coming in, there’s one header, two, three, etc. – this can all
be done piecemeal. And then we can stream a body back to them, one spoonful at a
time, where spoons are actually buffers.
node.js actually does offer threads, but you
wouldn’t use them to handle HTTP requests concurrently – rather, you’d use them
to do cpu-intensive tasks in the background, not i/o-bound stuff.
If we turn our attention to Go, we can make a similar program fairly easily:
Go code
    package main

    import (
        "fmt"
        "math/rand"
        "sync"
        "time"
    )

    func doWork(name string) {
        for i := 0; i < 30; i++ {
            time.Sleep(time.Duration(rand.Intn(40)) * time.Millisecond)
            fmt.Printf("%v", name)
        }
    }

    func main() {
        var wg sync.WaitGroup

        for name := range []string{"a", "b"} {
            wg.Add(1)
            go func() {
                defer wg.Done()
                doWork(name)
            }()
        }

        wg.Wait()
        fmt.Printf("\n")
    }
Shell session
    $ go run ./sample.go
    # command-line-arguments
    ./sample.go:24:10: cannot use name (type int) as type string in argument to doWork
Haha whoops, the “for range” syntax actually yields two values, and the first
one is the index, so we have to ignore it by binding it to _.
Let’s try this again:
Go code
    // omitted: package, imports, func doWork

    func main() {
        var wg sync.WaitGroup

        for _, name := range []string{"a", "b"} {
            wg.Add(1)
            go func() {
                defer wg.Done()
                doWork(name)
            }()
        }

        wg.Wait()
        fmt.Printf("\n")
    }
Shell session
$ go run ./sample.go bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb
Oh dear, there was another mix up. Yet there were no compiler warnings,
strange…
Let’s try go vet?
Shell session
    $ go vet ./sample.go
    # command-line-arguments
    ./sample.go:24:11: loop variable name captured by func literal
Haha right! Closures do work like that in Go.
Go code
    func main() {
        var wg sync.WaitGroup

        for _, name := range []string{"a", "b"} {
            wg.Add(1)
            name := name
            go func() {
                defer wg.Done()
                doWork(name)
            }()
        }

        wg.Wait()
        fmt.Printf("\n")
    }
There!
I guess that’s why the Go compiler is so fast – it barely checks for anything!
Now now, we’re not here to disparage any specific language.
Yeah no okay but I mean… how often have you made those mistakes?
Ahh but maybe it’s me who’s stupid. Maybe I’m a stupid stupid boy who just
will not learn – after all, only bad craftspeople blame their tools!
That is… that is so incredibly dense. Do you have any idea of the damage that
stupid, evil idiom has caused?
I don’t know, sounds to me like maybe you’re a bad craftsbear.
AS IT TURNS OUT, good craftspeople don’t shy away from questioning, criticizing,
and trying to improve their tools (or even switching to other tools entirely!)
It’s… it’s what they work with. What did you expect.
Anyone trying to tell you otherwise is probably romanticizing martyrdom and
fooling themselves into thinking that “hard work” must equate to suffering, and
you deserve better companionship.
Anyway – the program finally does the thing:
Shell session
    $ go run ./sample.go
    babbababaabbbabbbababbaabbbaabbabababbababbabaababbaaaaaaaaa
There’s really no observable difference, running both programs like that in a
shell. We just see a stream of “a” and “b” randomly coming in. Two instances of
“doWork” are also executing concurrently in Go.
But there is an actual difference: see, Go has threads.
If we run our node.js example again, but under strace, to look out for the
write syscall, and reduce the number of iterations to 5 for a more manageable
output, we can see that…
Shell session
    ❯ strace -f -e write node ./sample.js > /dev/null
    write(5, "*", 1)                        = 1
    strace: Process 1396685 attached
    strace: Process 1396686 attached
    strace: Process 1396687 attached
    strace: Process 1396688 attached
    strace: Process 1396689 attached
    [pid 1396684] write(16, "\1\0\0\0\0\0\0\0", 8) = 8
    strace: Process 1396690 attached
    [pid 1396684] write(1, "b", 1)          = 1
    [pid 1396684] write(1, "b", 1)          = 1
    [pid 1396684] write(1, "a", 1)          = 1
    [pid 1396684] write(1, "a", 1)          = 1
    [pid 1396684] write(1, "b", 1)          = 1
    [pid 1396684] write(1, "a", 1)          = 1
    [pid 1396684] write(1, "b", 1)          = 1
    [pid 1396684] write(1, "a", 1)          = 1
    [pid 1396684] write(1, "a", 1)          = 1
    [pid 1396684] write(1, "b", 1)          = 1
    [pid 1396684] write(1, "\n", 1)         = 1
    [pid 1396684] write(12, "\1\0\0\0\0\0\0\0", 8) = 8
    [pid 1396689] +++ exited with 0 +++
    [pid 1396688] +++ exited with 0 +++
    [pid 1396687] +++ exited with 0 +++
    [pid 1396686] +++ exited with 0 +++
    [pid 1396685] +++ exited with 0 +++
    [pid 1396690] +++ exited with 0 +++
    +++ exited with 0 +++
strace intercepts and prints information about system calls made by a process.
The -f option stands for “follow forks”, and it’s especially useful because it
prefixes every line of output with a “pid”, which stands for “process
identifier”, but really, on Linux (where this experiment was done), processes
and threads are very much alike, and so we can actually pretend those pids are
tids (thread identifiers) instead.
…we can see that both “a” and “b” are written by the same thread (PID 1396684).
But if we run the Go program:
Shell session
    $ go build ./sample.go && strace -f -e write ./sample > /dev/null
    strace: Process 1398810 attached
    strace: Process 1398811 attached
    strace: Process 1398812 attached
    strace: Process 1398813 attached
    [pid 1398813] write(1, "b", 1)          = 1
    [pid 1398809] write(1, "a", 1)          = 1
    [pid 1398813] write(1, "b", 1)          = 1
    [pid 1398813] write(5, "\0", 1)         = 1
    [pid 1398809] write(1, "b", 1)          = 1
    [pid 1398813] write(1, "a", 1)          = 1
    [pid 1398809] write(1, "b", 1)          = 1
    [pid 1398813] write(1, "a", 1)          = 1
    [pid 1398813] write(5, "\0", 1)         = 1
    [pid 1398809] write(1, "a", 1)          = 1
    [pid 1398813] write(1, "b", 1)          = 1
    [pid 1398809] write(1, "a", 1)          = 1
    [pid 1398809] write(1, "\n", 1)         = 1
    [pid 1398813] +++ exited with 0 +++
    [pid 1398812] +++ exited with 0 +++
    [pid 1398811] +++ exited with 0 +++
    [pid 1398810] +++ exited with 0 +++
    +++ exited with 0 +++
We can see that:
- “a” is written by PID 1398809
- “b” is written by PID 1398813
…and we can see that occasionally, something writes the null byte (