In programming pedagogy, monads have a place as a mystical object from the functional programming world that’s hard to understand and even harder to explain. The stereotype about monad explanations is that they fall into two buckets: either comparisons to some kind of food item, or throwing complex mathematical jargon at you, what’s the problem?
But monads aren’t esoteric or magical at all, nor do they only occur in functional programming. In essence, a monad is a design pattern that allows you to chain together operations within a framework. Noticing monadic design can be quite helpful for programmers in any environment, particularly because it’s often undesirable! In many situations, monads have observable tradeoffs, and sometimes (as here) we can even collect concrete data to back this up.
I’m going to try and explain monads in a way that is:
- Geared towards Rust developers, with code samples in Rust, though I hope any sufficiently curious programmer can follow along
- Free of jargon: no mathematical formalism whatsoever
- Without analogies of any kind, and grounded in real programming problems
- Non-trivial: focusing on a production-worthy example with objectively measurable implications
- Practical: with advice all of us coders can use
In other words, I’m going to try and write the monad tutorial that I personally would have appreciated when I was younger. And I’m going to start at a place that’s unconventional: through property-based testing, where monads have profound performance characteristics.
Note: While this article’s primary goal is to explain monads, it also serves as a practical introduction to property-based testing and fault injection techniques. If you’re new to these, you’ll find an introduction to both alongside the monad explanation.
This post consists of five sections:
- Property-based testing goes over the basics
- Drawing the rest of the owl talks about a complex scenario: using property-based testing to inject faults
- Integrated shrinking shows how to reduce inputs of challenging complexity to smaller sizes
- Monads, finally is where we introduce monads in this context, and provide data for how costly they can be
- Rediscovering structure discusses some ways to mitigate the tradeoffs of monads in property-based testing
1. Property-based testing#
Testing is fundamentally about building models for how your code should behave, at just the right level of complexity: they should match the scope of what you’re testing, without going overboard and reinventing the whole system a second time.
The best explication of this general idea I’ve seen is in this piece by the great Argentinian writer Jorge Luis Borges:
…In that Empire, the Art of Cartography attained such Perfection that the map of a single Province occupied the entirety of a City, and the map of the Empire, the entirety of a Province. In time, those Unconscionable Maps no longer satisfied, and the Cartographers Guilds struck a Map of the Empire whose size was that of the Empire, and which coincided point for point with it…”
—On Exactitude in Science, Jorge Luis Borges
Nothing quite exemplifies testing-as-modeling like property-based testing—an approach where instead of specifying exact examples, you define models in terms of properties, or invariants, that your code should satisfy. Then, you test your models against randomly generated inputs.
Let’s take a simple example of a sort function, say my_sort
, defined over a slice of integers:
fn my_sort(slice: &mut [u64]) {
// ...
}
How should we go about testing it?
The most common way to do this is to list out a bunch of inputs and ensure they are correctly sorted, through example-based tests.
#[test]
fn test_my_sort() {
let mut input = [1, 2, 0, 2, 0, 5, 6, 9, 0, 3, 1];
my_sort(&mut input);
assert_eq!(input, [0, 0, 0, 1, 1, 2, 2, 3, 5, 6, 9]);
// More examples listed out.
}
Example-based tests are quite valuable, because they are easy to write and quite direct about what happens. But even in a simple example like sorting, it’s easy to imagine cases where your examples don’t quite cover every edge case.
How can more edge cases be covered? Well, one way to do so is to step back and ask, what is the sort trying to do? The goal of a sort function is to ensure that all the elements are in ascending order. Can we test that directly?
The first thing we’d need is to get some inputs to test with. All we care about is a list of numbers here, which seems like it should be easy to generate using a random number generator.
So maybe we write something like:
#[test]
fn test_my_sort_2() {
// Run the test 1024 times.
for _ in 0..1024 {
// Generate a random list of, say, 0 to 512 elements, with values
// between 0 and 10000.
let input = /* ... */;
let mut output = input.clone();
// Call the sort function on it.
my_sort(&mut output);
// Check that all values are in ascending order.
for i in 1..output.len() {
assert!(
output[i - 1] <= output[i],
"input {input:?} failed at index {i}, output {output:?}",
);
}
}
}
We now have a model of sorting that we’ve written down in code form: any pair of values must be in ascending order. (In this view, example-based tests are also simple models: for input X, the output should be Y.)
Now, we run the test, and…
thread 'test_my_sort_2' panicked at tests/tests.rs:33:13:
input [7496, 2087, 6900, 7927, 3840, 3065, 6472, 1186, 6464, 4512, 251, 5591, 3410, 2033, 5367, 2202, 5544, 2434, 6491, 8999, 9818, 2885, 8683, 1201, 6115, 2584, 2473, 6817, 5765, 5196, 9389, 5799, 9012, 293, 38, 1024, 9569, 4654, 7449, 7389, 8088, 5074, 3110, 938, 4944, 3859, 7368, 8978, 7524, 9503, 7406, 7591, 8213, 6445, 7000, 7354, 8967, 5549, 7935, 1866, 4048, 4043, 8905, 3154, 4771, 2364, 3982, 5088, 7317, 233, 3396, 1810, 3022, 9065, 454, 6181, 8257, 9598, 3982, 920, 5880, 4165, 4164, 930, 560, 9062, 5587, 6271, 5878, 2495, 9055, 3877, 4352, 1228, 8287, 8901, 3442, 373, 3635, 5316, 4423, 7688, 7919, 4465, 8991, 7043, 7696, 6875, 1478, 2428, 5127, 6809, 6175, 1415, 7263, 5145, 4153, 876, 1528, 6781, 5627, 6750, 3665, 2567, 6855, 141, 2144, 4491, 9121, 7982, 4131, 6337, 1926, 8797, 9382, 1702, 9559, 3910, 1715, 6661, 269, 4366, 6185, 5616, 365, 808, 4864, 3657, 9574, 3057, 7760, 6375, 2326, 7273, 6303, 7018, 8988, 6271, 988, 7796, 2390, 1689, 4279, 9586, 151, 9738, 3659, 7064, 1529, 8237, 4211, 2272, 8909, 7638] failed at index 173, output [38, 141, 151, 233, 251, 269, 293, 365, 373, 454, 560, 808, 876, 920, 930, 938, 988, 1024, 1186, 1201, 1228, 1415, 1478, 1528, 1529, 1689, 1702, 1715, 1810, 1866, 1926, 2033, 2087, 2144, 2202, 2272, 2326, 2364, 2390, 2428, 2434, 2473, 2495, 2567, 2584, 2885, 3022, 3057, 3065, 3110, 3154, 3396, 3410, 3442, 3635, 3657, 3659, 3665, 3840, 3859, 3877, 3910, 3982, 3982, 4043, 4048, 4131, 4153, 4164, 4165, 4211, 4279, 4352, 4366, 4423, 4465, 4491, 4512, 4654, 4771, 4864, 4944, 5074, 5088, 5127, 5145, 5196, 5316, 5367, 5544, 5549, 5587, 5591, 5616, 5627, 5765, 5799, 5878, 5880, 6115, 6175, 6181, 6185, 6271, 6271, 6303, 6337, 6375, 6445, 6464, 6472, 6491, 6661, 6750, 6781, 6809, 6817, 6855, 6875, 6900, 7000, 7018, 7043, 7064, 7263, 7273, 7317, 7354, 7368, 7389, 7406, 7449, 7496, 7524, 7591, 7638, 7688, 7696, 7760, 7796, 7919, 7927, 7935, 7982, 8088, 8213, 8237, 8257, 8287, 8683, 8797, 8901, 8905, 8909, 8967, 8978, 8988, 8991, 8999, 9012, 9055, 9062, 9065, 9121, 9382, 9389, 9503, 9559, 9569, 9574, 9586, 9598, 9818, 9738]
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
Whoops, looks like the function has a bug. (Scroll the above example to the right!)
This example is quite unhelpful and hard to understand! It is possible to use this as an input to debug with, but it is quite painful. If we could use automation to turn this test case into a much smaller one that can still reproduce the bug, debugging becomes significantly easier. The process of doing so is called test case shrinking or reduction.
To recap—property-based testing consists of two components:
- Test case generation using a source of randomness.
- On failing a test, shrinking it down to a smaller, more understandable size.
Implementing a manual shrinker#
What counts as “smaller”? For a list of numbers, ideally we’d be able to minimize both the number of items in the list and the integers themselves. This suggests an algorithm for how to write a shrinker by hand:
-
First, try and minimize the size of the list using a binary search algorithm. For example:
- Try the first half of the list against the function.
- If that exhibits an error, attempt to recursively shrink this half.
- If that doesn’t work, try it with the last half of the list.
- If neither work or if the list has 1 or fewer elements, move on to the next step.
-
Once the list has been shrunk, start shrinking the elements within the list, applying binary search to each element.
After you’re done writing this algorithm, you’re well on your way towards creating the original property-based testing library, QuickCheck. This approach has you write two functions: a generator and a shrinker.
With this, you can get much more reasonable-looking output:
input [58, 33] failed at index 1, output [58, 33]
And for relatively simple cases like lists of integers, this kind of shrinking works quite well!
But we’re not here to test simple cases. We’re here for the difficult ones.
2. Drawing the rest of the owl#

Most real-world sort implementations don’t just work over a list of integers. They’re written to be polymorphic over anything which can be sorted. In Rust parlance, this means anything that implements the Ord
trait; and even if not, a custom comparator function can be provided.
But Ord
can be written by hand, and custom comparators are virtually always written by hand.
One immediate consequence is that it’s possible that the comparator function says two elements are equal, but they are actually different. In that case, should the order of elements be preserved?
- A sort implementation which preserves the order is called a stable sort.
- An implementation which does not is called an unstable sort.
Unstable sorts tend to be faster than stable ones, and there are valid reasons for preferring each at different times. (The Rust standard library has separate implementations for stable and unstable sorts.)
Additionally, hand-written implementations mean users can make mistakes! A production-grade sort algorithm must behave reasonably in the face of arbitrary user input, not just in the actual elements being sorted but also in the comparator function (full credit to Lukas Bergdoll for his extensive research here):
-
Ord safety: Users can write a comparator that’s simply incorrect. An easy way is to introduce a difference between ordering and equality, for example by returning
Ordering::Less
for two elements that are actually equal. Users could also return different answers for the same comparison when called at different times1. -
Panic safety: The comparator can panic in the middle of execution. Since panics can be caught, the input should be in some kind of valid state afterwards.
-
Observation safety: If any of the inputs are mutated by the comparator, those mutations must be carried through to the final output. (With Rust, mutation through shared references is possible via interior mutability, as seen in
RefCell
orMutex
).
In these cases, completing the sort successfully becomes impossible. But it’s important that we leave the input in a reasonable state.
How do we go about testing this? Trying to think of all the different failure modes seems really hard! But property-based testing can address this need through randomized fault injection.
Let’s focus on Ord safety for now, with a comparator that flips around the result 20% of the time:
#[derive(Clone, Copy, Debug)]
enum OrdBehavior {
Regular,
Flipped,
}
struct BadType {
value: u64,
ord_behavior: RefCell<Vec<OrdBehavior>>,
}
impl Ord for BadType {
fn cmp(&self, other: &Self) -> Ordering {
// Get the next behavior from the list.
match self.ord_behavior.borrow_mut().pop() {
Some(OrdBehavior::Regular) | None => {
self.value.cmp(&other.value)
}
OrdBehavior::Flipped => {
// Flip the behavior.
other.value.cmp(&self.value)
}
}
}
}
To generate a BadType
:
fn generate_bad_type() -> BadType {
// Generate a value between 0 and 10000;
let value = /* ... */;
// Generate a list of behaviors of length 0..128, where the elements are
// Regular 4/5 times and Flipped 1/5 times.
let ord_behavior: Vec<OrdBehavior> = /* ... */;
BadType {
value,
ord_behavior: RefCell::new(ord_behavior),
}
}
And to test this:
#[test]
fn test_my_sort_3() {
// Run the test 1024 times.
for _ in 0..1024 {
// Generate a list of BadTypes using generate_bad_type.
let input: Vec<BadType> = /* ... */;
// Call sort as before.
let mut output = input.clone();
my_sort(&mut output);
// Sorting isn't really well-defined in this case, but we can
// ensure two properties:
//
// 1. my_sort doesn't panic (implicitly checked by getting here)
// 2. all the input values are still present in the output
let mut input_values: Vec<u64> =
input.iter().map(|v| v.value).collect();
let mut output_values: Vec<u64> =
output.iter().map(|v| v.value).collect();
// Sort the input and output values, and assert that they match.
my_sort(&mut input_values);
my_sort(&mut output_values);
assert_eq!(input_values, output_values);
}
}
Our original approach continues to work well—that is, right until the test finds a bug and we need to shrink a failing input.
3. Integrated shrinking#
How does one go about writing a shrinker for Vec
? Doing so seemed relatively straightforward for a list of integers. But this is a list where the elements are pairs of an integer and another list. Also:
-
We’ve just tested Ord safety so far—once we’ve added fault injection for other kinds of safety, the complexity seems endless.
-
And even more importantly, there isn’t a great way to compose smaller shrinkers together to form larger ones. Writing shrinkers is a lot of work already, and what’s the point if you have to keep doing all of it over and over?
The practical result is that most of the time, writing a shrinker for types like Vec
is quite difficult. And writing one is also technically optional, since:
- If the test passes, shrinkers are never invoked. Simply write correct code, and shrinking just isn’t an issue!
- If the test fails, developers can debug using the original input. It’s painful but possible.
All in all, given the choice of writing a shrinker by hand or just moving on with their lives, most developers tend to choose the latter2. Because of this, most modern property-based testing frameworks, like proptest in Rust, try and take care of shrinking for you through some notion of integrated shrinking.
The idea behind integrated shrinking is: When you generate a random input, you don’t just generate the value itself. You also generate some context that is helpful for reducing the size of the input.
- In proptest, this combined value and context is called a value tree.
- Any implementation that accepts a random number generator and turns it into a value tree is called a strategy.
The proptest library comes with many different kinds of strategies that can be composed together to create more complex ones. To generate an instance of OrdBehavior
, we’re going to use two strategies:
- The
Just
strategy, which “just” returns a single value. - The
prop_oneof
strategy, which generates values from one of a possible list of strategies, where each choice has a given probabi