nullprogram.com/blog/2022/05/14/
This article was discussed on Hacker News.
While considering concurrent queue design I came up with a generic,
lock-free queue that fits in a 32-bit integer. The queue is “generic” in
that a single implementation supports elements of any arbitrary type,
despite an implementation in C. It’s lock-free in that there is guaranteed
system-wide progress. It can store up to 32,767 elements at a time — more
than enough for message queues, which must always be bounded. I
will first present a single-consumer, single-producer queue, then expand
support to multiple consumers at a cost. Like my lightweight barrier,
I’m not presenting this as a packaged solution, but rather as a technique
you can apply when circumstances call.
How can the queue store so many elements when it’s just 32 bits? It only
handles the indexes of a circular buffer. The caller is responsible
for allocating and manipulating the queue’s storage, which, in the
single-consumer case, doesn’t require anything fancy. Synchronization is
managed by the queue.
Like a typical circular buffer, it has a head index and a tail index. The
head is the next element to be pushed, and the tail is the next element to
be popped. The queue storage must have a power-of-two length, but the
capacity is one less than the length. If the head and tail are equal then
the queue is empty. This “wastes” one element, which is why the capacity
is one less than the length of the storage. So already there are some
notable constraints imposed by this design, but I believe the main use
case for such a queue — a job queue for CPU-bound jobs — has no problem
with these constraints.
Since this is a concurrent queue it’s worth noting “ownership” of storage
elements. The consumer owns elements from the tail up to, but excluding,
the head. The producer owns everything else. Both pushing and popping
involve a “commit” step that transfers ownership of an element to the
other thread. No elements are accessed concurrently, which makes things
easy for either caller.
Queue usage
Pushing (to the front) and popping (from the back) are each a three-step
process:
- Obtain the element index
- Access that element
- Commit the operation
I’ll be using C11 atomics for my implementation, but it should be easy to
translate these into something else no matter the programming language. As
I mentioned, the queue fits in a 32-bit integer, and so it’s represented
by an _Atomic uint32_t
. Here’s the entire interface:
int queue_pop(_Atomic uint32_t *queue, int exp);
void queue_pop_commit(_Atomic uint32_t *queue);
int queue_push(_Atomic uint32_t *queue, int exp);
void queue_push_commit(_Atomic uint32_t *queue);
Both queue_pop
and queue_push
return -1 if the queue is empty/full.
To create a queue, initialize an atomic 32-bit integer to zero. Also
choose a size exponent and allocate some storage. Here’s a 63-element
queue of jobs:
#define EXP 6 // note; 2**6 == 64
struct job slots[1<<EXP];
_A
!--more-->