Clocks and Causality – Ordering Events in Distributed Systems by alexzeitler


Introduction

System events can be arranged in order based on the time they occurred. Clocks keep time and produce timestamps. Conventional clocks (such as time-of-day clocks) use a common reference to learn the time. That reference could be internal hardware or a common service that serves time using a protocol like NTP. However, because of clock drift and assumptions about network delays, timestamps from conventional clocks on different nodes are not always mutually comparable, and therefore events cannot be reliably ordered using them.
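As a rough illustration of the problem, the sketch below models two time-of-day clocks with fixed offsets from true time. The `WallClock` class and the skew values are hypothetical and chosen only to show how a small skew can invert the apparent order of two events.

```python
from dataclasses import dataclass


@dataclass
class WallClock:
    """A time-of-day clock whose reading is offset from true time by a fixed skew."""
    skew_ms: int

    def now(self, true_time_ms: int) -> int:
        return true_time_ms + self.skew_ms


# Hypothetical skews: node A runs 30 ms fast, node B runs 20 ms slow.
clock_a = WallClock(skew_ms=30)
clock_b = WallClock(skew_ms=-20)

# Event x occurs on A at true time 1000; event y occurs on B 40 ms later.
ts_x = clock_a.now(1000)  # 1030
ts_y = clock_b.now(1040)  # 1020

# Sorting by timestamp places y before x, although x really happened first.
ordered = sorted([("x", ts_x), ("y", ts_y)], key=lambda e: e[1])
print(ordered)
```

Real clocks drift over time rather than holding a constant offset, so this understates the problem.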

A logical clock is a custom clock that is designed to produce timestamps that can be reliably compared. (We will see shortly that timestamps from logical clocks take a different form than timestamps from time-of-day clocks.) If multiple nodes in a distributed system can rely on a centralized logical clock, then most issues discussed in this article become irrelevant. However, a centralized clock is by definition neither fault-tolerant nor able to offer performance beyond a certain limit. In this article, therefore, we are primarily interested in distributed logical clocks spread across multiple nodes.

For a distributed logical clock to function, we expect each of the participating nodes to have its own clock that cooperates with clocks on other nodes in order to produce the next timestamp. When dealing with distributed logical clocks, we are primarily interested in the time of occurrence of system events on any node so that we could order such events across nodes according to their timestamps.

Obviously, events cannot take into account the effects of future events. However, because events may occur faster than nodes can notify each other about them, events may also be unaware of some past events. Events that occur without knowledge of each other are called concurrent events. At best, events can be ordered by their originating timestamps in real time only when there are no concurrent events. That is, none of the logical clocks discussed in this article help with ordering events in real time in the presence of concurrent events. That said, not all clocks can order events in real time even in the absence of concurrent events. This article discusses why such clocks are still useful.

The article will also demonstrate that after the fact (i.e., not in real time), we can arrange events in a total order using some clocks. Total order means that if each of the nodes has a collection of events, then all nodes can individually arrive at the same order of events. Total order respects happened-before (and causality) relationships. The Event Ordering section at the end of this article discusses total order in detail.

Clock Designs

The various logical clock designs we will study in this article essentially implement a variation of the following solution:

Each timestamp produced by a logical clock consists of two components:

an id for the current event, and
the ids of some or all events the node is aware of thus far (aka history).

Tracking all historical events in a timestamp is space-consuming. Richness in the timestamp therefore needs to be purchased with space or time complexity, and different designs make different choices in this area. However, a common compaction scheme used by different clock designs is to track history by using numbers as event ids. Under this compaction scheme, an event with timestamp [ 5 ] not only indicates that the event id is the number 5, but also that the node generating that event knows of the existence of some event [ 4 ] (there could be multiple [ 4 ] events) and possibly all events up to and including [ 4 ], depending on the clock.
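To make the two-component idea concrete, here is a minimal sketch of the uncompacted form, in which every timestamp carries the full set of event ids the node knew about. This is not one of the clocks discussed in this article; the `Timestamp` type, the helper functions, and the event ids are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timestamp:
    event_id: str
    history: frozenset  # ids of all events the node knew of when this event was generated


def happened_before(a: Timestamp, b: Timestamp) -> bool:
    """a happened before b if b's node already knew about a."""
    return a.event_id in b.history


def concurrent(a: Timestamp, b: Timestamp) -> bool:
    """Two events are concurrent if neither one knew about the other."""
    return not happened_before(a, b) and not happened_before(b, a)


# Nodes A and B each generate an event without hearing from the other.
a1 = Timestamp("a1", frozenset())
b1 = Timestamp("b1", frozenset())
# B then learns about a1 and generates b2.
b2 = Timestamp("b2", frozenset({"a1", "b1"}))

print(concurrent(a1, b1))       # True
print(happened_before(a1, b2))  # True
```

The history set grows with every event, which is the space cost that the numeric compaction scheme is meant to avoid.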

Note that causality and history (i.e., happened-before events) are related concepts, but not identical. If the knowledge from event A is used to produce event B, then A caused B. If B merely happened before C, but event C did not use the knowledge from event B, then B and C are not causally related, just temporally so. Most applications, for simplicity, use temporal relationship as a proxy for causality.

Let us look at various logical clock designs.

Lamport Clock

Timestamps produced by a Lamport clock take the least amount of space, O(1) in terms of the number of nodes in the system, compared to other clock designs. A Lamport timestamp captures the event id and some history of events the node is aware of at the time the event is generated, all using a single number. When a node generates event id [ 5 ], the node claims to have knowledge of some event that is numbered [ 4 ] and no knowledge of any other event that is numbered [ 5 ] or above.

Lamport timestamps do not capture which node generated the event. Therefore, there might be events with the same timestamps from different nodes circulating in the system. Consequently, looking at the timestamps of two events, say [ 4 ] and [ 5 ], one cannot answer whether event [ 5 ] is aware of that particular event [ 4 ] or a different event [ 4 ]. Overall, given that event [ 4 ] has knowledge of some event [ 3 ], and [ 3 ] of some [ 2 ], and so on, it is understood that event [ 5 ] has knowledge of the existence of some events [ 4 ] through [ 1 ].

A Lamport clock can be implemented as follows:

Timestamps are sequential numbers associated with events.
Each node maintains its own sequence starting with number 0.
When an event is generated, the node increments its number by one and associates that number with the event.
When a node learns of an event from another node, it sets its number to the higher of its own number and the number of the event it learned about.
- In this article, we consider a learning event as any other event, and have the node increment its number by one.
- Some applications do not timestamp learning events.
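The steps above can be sketched in a few lines of Python. The class and method names (`LamportClock`, `tick`, `receive`) are illustrative assumptions, and the sketch follows this article's convention of counting a learning event as an event in its own right.

```python
class LamportClock:
    """A per-node Lamport clock; the sequence starts at 0."""

    def __init__(self) -> None:
        self.counter = 0

    def tick(self) -> int:
        """Generate a local event: increment the number and return it as the timestamp."""
        self.counter += 1
        return self.counter

    def receive(self, remote_timestamp: int) -> int:
        """Learn of a remote event: take the higher number, then timestamp the learning event."""
        self.counter = max(self.counter, remote_timestamp)
        return self.tick()


node_a = LamportClock()
node_b = LamportClock()

a1 = node_a.tick()       # 1
a2 = node_a.tick()       # 2
b1 = node_b.tick()       # 1
b2 = node_b.receive(a2)  # max(1, 2) + 1 = 3
a3 = node_a.receive(b1)  # max(2, 1) + 1 = 3
print(a1, a2, b1, b2, a3)
```

An application that does not timestamp learning events would skip the final `tick()` in `receive` and only update the counter.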

Overall, Lamport timestamps are strictly increasing numbers in any given node, and at least monotonically increasing across nodes. See Figure 1 for an illustration. Arrows point from causes to effects.

Why are Lamport timestamps useful? Lamport timestamps can be used to arrange events in a historical (i.e., happened-before) order after the fact. Specifically,

Events across nodes can be ordered in a way that also honors local order: Because each timestamp produced by a node is a strictly increasing number, when events are ordered based on timestamps, all local events of a node will retain their order of generation even when events from other nodes are in the mix.
Events can be arranged based on history (aka happened-before): While there are duplicate event timestamps across nodes, when events are arranged in an increasing order based on timestamps, an event with the knowledge of another event will always follow it in the order (because the event id reflects a higher number than the ids of the events it is aware of).
Events can be causally ordered: Causes are by definition known events; therefore, causal ordering follows from the previous point.

Lamport clocks have three deficiencies:

A total order of events is not deterministically possible with Lamport timestamps. Events have duplicate timestamps, e.g., multiple events with id [ 4 ], and those events cannot be ordered in any deterministic way.
While events can be arranged in a way that honors historical and causal ordering, you can only do so after the fact, not in real time. A node may know about or have generated event [ 5 ] first and then learned about event [ 4 ] some time later. As a result, if preserving order while processing non-concurrent events in real time is important, a Lamport clock is not adequate.
Relatedly, it is not evident from Lamport timestamps if an event definitely occurred after another event or if the two events are concurrent. For example, in Figure 1, it is not clear if event [ 5 ] is an effect of event [ 4 ] produced on node B or event [ 4 ] produced on node C.
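The first deficiency can be seen in a small sketch. Two nodes hold the same four events but learned of them in a different order; the event labels and timestamps are hypothetical. Sorting by Lamport timestamp alone respects happened-before on both nodes, yet the tie between the two [ 1 ] events is broken by arrival order, so the nodes disagree.

```python
# Each event is (lamport_timestamp, label). Both lists hold the same events,
# in the order in which each node happened to learn of them.
events_seen_by_a = [(1, "a1"), (2, "a2"), (1, "b1"), (3, "b2")]
events_seen_by_b = [(1, "b1"), (1, "a1"), (2, "a2"), (3, "b2")]


def by_timestamp(events):
    # sorted() is stable, so events with equal timestamps keep their arrival order.
    return [label for _, label in sorted(events, key=lambda e: e[0])]


order_a = by_timestamp(events_seen_by_a)
order_b = by_timestamp(events_seen_by_b)

print(order_a)  # ['a1', 'b1', 'a2', 'b2']
print(order_b)  # ['b1', 'a1', 'a2', 'b2']
print(order_a == order_b)  # False
```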

Lamport Origin Clock

A variant of the Lamport clock, hereafter referred to as the Lamport Origin Clock, produces a timestamp that is a doublet consisting of [node id, Lamport timestamp]. Lamport origin timestamps can be used to arrange events after the fact in a predictable order by sorting on the Lamport timestamp first and using the originating node id as the second sort property. This removes the first deficiency of Lamport clocks discussed above, but it does not remove the other deficiencies, such as not knowing the order in real time even for non-concurrent events.
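A minimal sketch of that sort, assuming node ids are comparable strings: the `OriginTimestamp` type and the sample events are hypothetical. Two nodes that hold the same events in different arrival orders arrive at the same result.

```python
from typing import NamedTuple


class OriginTimestamp(NamedTuple):
    node_id: str
    counter: int  # the Lamport timestamp


def sort_key(ts: OriginTimestamp):
    # Lamport timestamp first, originating node id as the tie-breaker.
    return (ts.counter, ts.node_id)


seen_by_a = [
    OriginTimestamp("A", 1), OriginTimestamp("A", 2),
    OriginTimestamp("B", 1), OriginTimestamp("B", 3),
]
seen_by_b = [
    OriginTimestamp("B", 1), OriginTimestamp("A", 1),
    OriginTimestamp("A", 2), OriginTimestamp("B", 3),
]

order_a = sorted(seen_by_a, key=sort_key)
order_b = sorted(seen_by_b, key=sort_key)

print([tuple(ts) for ts in order_a])  # [('A', 1), ('B', 1), ('A', 2), ('B', 3)]
print(order_a == order_b)             # True
```

The tie-break by node id is arbitrary but deterministic. It says nothing about which of two equally numbered events occurred first in real time.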

Figure 2 is an update of Figure 1 with origin information included in the timestamp.

Lamport origin timestamp can also be used as a unique identifier for events because the combination of node id and Lamport timestamp is unique across events from any node. The
