In this blog post I’ll talk about the theoretical background and the technical
implementation of cola, a text CRDT for real-time collaborative editing
written in Rust. Reading this is not at all necessary to be able to use cola,
so if that’s all you’re interested in you can safely skip this post and go
straight to the documentation.
The post is divided into three parts:

- in the first part we’ll go over how to represent the state of the document and the edits that transform it in a way that guarantees convergence across all replicas. The ideas presented in this part are widely covered in the literature, so if you’re already familiar with text CRDTs you can probably skip it;

- in the second part we’ll see how to efficiently implement the framework discussed in the first part in code;

- finally, in the third part we’ll look at some benchmarks comparing cola to other Rust-based CRDT libraries.
First part: Intro to text CRDTs
A Conflict-free Replicated Data Type, or CRDT, is an umbrella term for any data
structure that can be replicated and modified concurrently across multiple
sites, and is guaranteed to converge without the need for a central authority
to coordinate the changes.
There are many different kinds of CRDTs: counters, sets, maps, lists, trees,
etc. Here we’ll focus on a sequence CRDT that can be used to represent plain
text documents.
In the setting of our problem we have a number of peers on a distributed
network all editing the same document. Each peer can insert or delete text and
immediately see the changes applied to their local replica. These edits are
then sent to the other peers which somehow integrate them into their own
replicas.
The only assumption we make on the network layer is that every edit is received
by all the peers in the network. Edits can be sent multiple times and in any
order, but they must eventually reach all the peers.
Our goal is to design a data structure that can, at the very minimum, converge
to the same state at every peer once all the edits have been received.
The final document we end up with should also “make sense” from the user’s
point of view. For example, we could have a CRDT that sorts the inserted
characters in alphabetical order. This would technically solve all concurrency
problems, but I doubt anyone would use it.
Anchors in a sea of text
Let’s start by looking at the simplest approach: doing nothing. When someone inserts or deletes some text we just broadcast the edit “as is”: `insert "abc" at offset 8`, `delete between offset 10 and offset 13`, etc.

This is clearly wrong because all it takes for replicas to diverge is for another peer to concurrently modify the document in the `0..offset` region.
The problem is that offsets depend on the state of the document at the time an
edit was made. We can’t just exchange offsets without also exchanging the
context they depend on.
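To make the divergence concrete, here’s a small sketch (not cola code) of two peers that start from the same document and exchange raw offset-based edits. Peer B’s concurrent insertion at the front of the document shifts every offset, so replaying Peer A’s edit verbatim lands it in the wrong place:

```rust
fn insert_at(doc: &mut String, offset: usize, text: &str) {
    doc.insert_str(offset, text);
}

fn main() {
    let mut doc_a = String::from("hello world");
    let mut doc_b = doc_a.clone();

    // Peer A inserts "big " at offset 6, producing "hello big world".
    insert_at(&mut doc_a, 6, "big ");

    // Concurrently, Peer B inserts "oh, " at offset 0, producing "oh, hello world".
    insert_at(&mut doc_b, 0, "oh, ");

    // Each peer now integrates the other's edit "as is".
    insert_at(&mut doc_a, 0, "oh, "); // A applies B's edit.
    insert_at(&mut doc_b, 6, "big "); // B applies A's edit: the offset is stale!

    assert_eq!(doc_a, "oh, hello big world");
    assert_eq!(doc_b, "oh, hebig llo world"); // the replicas have diverged.
    assert_ne!(doc_a, doc_b);
}
```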
What if we used characters to refer to positions in the document instead of offsets? We could then send `insert "abc" after the 'd'` or `delete between the 'f' and the 'o'`.

When a peer receives that edit it doesn’t matter where the `'d'` ended up: they’ll know they have to insert `"abc"` after it.
But what if there are multiple `'d'`s in the document? What if the `'d'` was deleted? We still have to think some things through, but the idea of using content to refer to positions is a step in the right direction.

Let’s solve the first problem: how do we tell our `'d'`s apart?
In our universe time only flows in one direction, so we could
assign an increasing number to each character as it’s inserted into the
document. The first character a peer inserts gets the number 0, the second 1,
and so on. Timestamping in this way works because for every n, a peer can only
insert its n-th character once, so we can use this number as a stable
identifier for a position in the document.
Or can we? After all, there could be many n-th characters in the same document, one for every peer that ever contributed to it. To distinguish them we also assign each peer a unique identifier, called a `ReplicaId` in cola, and use the pair `ReplicaId.n` to uniquely identify a character.
Guaranteeing uniqueness of the n half is both possible and easy: we just increment a local counter. Doing the same for the `ReplicaId` half without a central server is not. The best we can do is to use random integers big enough to make the probability of a collision negligible, like a UUID.
This n number is also called a temporal offset in cola, and the `ReplicaId.n` pair is called an `Anchor`. `Anchor`s are incredibly important because they allow us to specify both insertions and deletions in a concurrency-enabled way. Insertions are identified by a single `Anchor` which tells us where to insert the text, while deletions are identified by two `Anchor`s, one for the start and one for the end of the deleted region.
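As a rough sketch, the identifiers described above could look like the following (the names mirror the post’s terminology, not necessarily cola’s exact public API):

```rust
/// In practice a random integer big enough to make collisions negligible.
type ReplicaId = u64;

/// Uniquely identifies a character: the n-th character ever inserted
/// by the peer with the given `ReplicaId`.
#[derive(Copy, Clone, PartialEq, Eq, Debug)]
struct Anchor {
    replica_id: ReplicaId,
    temporal_offset: u64, // the `n` in `ReplicaId.n`
}

/// An insertion is positioned by a single `Anchor`...
struct Insertion {
    anchor: Anchor,
    text: String,
}

/// ...while a deletion is delimited by two.
struct Deletion {
    start: Anchor,
    end: Anchor,
}

fn main() {
    // The same temporal offset from two different peers yields two
    // distinct, unambiguous anchors.
    let a = Anchor { replica_id: 1, temporal_offset: 0 };
    let b = Anchor { replica_id: 2, temporal_offset: 0 };
    assert_ne!(a, b);

    let _edit = Insertion { anchor: a, text: "abc".to_owned() };
    let _del = Deletion { start: a, end: b };
}
```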
Breaking ties
When a peer integrates an insertion it may find other insertions in its replica that have the same `Anchor`. How do we decide which one comes first? Intuitively we’d want the last insertion to come first.
But the concept of a “last” operation is not well defined in a distributed
system. Using wall clock timestamps is problematic because wall clocks lack
monotonicity and also suffer from clock drift. What we really want is a way to
determine if an insertion was made in an environment in which another insertion
was already present.
This is exactly what Lamport timestamps are used for. A Lamport clock is a logical clock that is updated by a very simple algorithm: every time a peer inserts some text it increases its clock by one, and when it receives a remote insertion it sets the clock to `max(current, remote_timestamp) + 1`.

This guarantees that if insertion `A` is made by a peer that has already integrated insertion `B`, then `A`’s Lamport timestamp will be greater than `B`’s.
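A minimal Lamport clock following the two update rules above fits in a few lines (a sketch, not cola’s actual implementation):

```rust
/// A Lamport clock: ticks on every local insertion, and jumps past any
/// remote timestamp it observes.
#[derive(Default)]
struct LamportClock {
    time: u64,
}

impl LamportClock {
    /// Called on every local insertion: tick the clock and return the
    /// timestamp to attach to the edit.
    fn send(&mut self) -> u64 {
        self.time += 1;
        self.time
    }

    /// Called when integrating a remote insertion.
    fn receive(&mut self, remote_timestamp: u64) {
        self.time = self.time.max(remote_timestamp) + 1;
    }
}

fn main() {
    let mut a = LamportClock::default();
    let mut b = LamportClock::default();

    let ts_b = b.send(); // B inserts some text.
    a.receive(ts_b);     // A integrates B's insertion.
    let ts_a = a.send(); // A inserts on top of it.

    // A's insertion causally follows B's, so its timestamp is greater.
    assert!(ts_a > ts_b);
}
```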
We can now handle conflicting insertions by sorting them in descending order by their Lamport timestamp. To find an insertion’s final position we start at the insertion’s `Anchor`, then skip blocks until we find the first one whose Lamport timestamp is less than or equal to the one of the insertion that’s being integrated.
There’s one last case we need to handle: conflicting and concurrent insertions, i.e. insertions with the same `Anchor` and Lamport timestamp. From the user’s point of view there’s no order that’s more “correct” than the others, so all we care about is consistency between peers. cola breaks the tie by sorting in ascending order on the `ReplicaId` of the peer that made the insertion.
Concurrent insertions with the same `Anchor` and Lamport timestamp are sorted by their `ReplicaId` in ascending order: here the ‘R’ is inserted by Peer 1 and the ‘D’ by Peer 2, so the ‘R’ goes before the ‘D’.
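Putting the two rules together, the total order on conflicting insertions can be sketched as a comparator (the struct and field names here are illustrative, not cola’s):

```rust
use std::cmp::Ordering;

/// An insertion that conflicts with others at the same `Anchor`.
#[derive(Debug)]
struct ConflictingInsertion {
    lamport_ts: u64,
    replica_id: u64,
}

/// Higher Lamport timestamp first (the "last" edit wins the spot), then
/// ascending `ReplicaId` to break ties between concurrent insertions.
fn conflict_order(a: &ConflictingInsertion, b: &ConflictingInsertion) -> Ordering {
    b.lamport_ts
        .cmp(&a.lamport_ts)
        .then(a.replica_id.cmp(&b.replica_id))
}

fn main() {
    let mut insertions = vec![
        ConflictingInsertion { lamport_ts: 3, replica_id: 2 }, // the 'D' by Peer 2
        ConflictingInsertion { lamport_ts: 3, replica_id: 1 }, // the 'R' by Peer 1
        ConflictingInsertion { lamport_ts: 5, replica_id: 7 },
    ];
    insertions.sort_by(conflict_order);

    // The causally-latest insertion comes first; the concurrent tie goes
    // to the smaller ReplicaId, so the 'R' precedes the 'D'.
    assert_eq!(insertions[0].lamport_ts, 5);
    assert_eq!(insertions[1].replica_id, 1);
    assert_eq!(insertions[2].replica_id, 2);
}
```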
Deletions
With insertions out of the way, let’s now focus on deletions. As I’ve briefly mentioned, every time a peer deletes some text we transform the start and end offsets representing the deleted range into `Anchor`s, which we can broadcast to the other peers.
Deletions are a bit easier to reason about because, unlike with insertions, we
don’t have to worry as much about concurrency or causality issues. Two peers
concurrently deleting the same region of text produces the same result as if
only one of them had done it: the text is gone.
There are however three problems we need to tackle before we can consider deletions solved:

- what do we do with the deleted regions of text? We can’t entirely remove them from the document because there may be edits we haven’t yet seen whose `Anchor`s lie within them;

- how do we know when a remote deletion is ready to be integrated? We’ll diverge if a peer integrates a deletion before it has received all the content that the peer who created the deletion had when they made it;

- within the region of text included between the start and end `Anchor`s of a deletion, how do we determine which characters should be deleted and which should be kept? We’ll diverge if the peer who integrates a deletion deletes text that wasn’t yet received by the peer who created the deletion.
T