We’re once again at the end of our internship season, and it’s my task
to provide a few highlights of what the dev interns accomplished while
they were here.
The program was big! We had 152 software engineering interns, drawn
from 58 schools across 19 different countries. And that’s not even
counting the 31 tech interns in areas like production engineering, IT
engineering, network engineering, and more.
The intern program is so big and diverse that it’s impossible to
faithfully summarize it with just a few projects. But, something is
better than nothing, so here are the projects I’ll discuss this time
around:
-
Rajeev Godse wrote a query language based on Linear Temporal Logic
for querying complex events out of system logs. -
Semyon Savkin worked on our AI Assistants team, building an
efficient tokenizer in OCaml. -
Sasha Hydrie added concurrency support to our tracing syntax, making
it suitable for use with our asynchronous programming frameworks.
Now let’s dive into the details!
Concord (which we’ve
talked
about
before) is a platform for building systems that let counterparties
connect to us to trade directly with or through us.
Concord’s architecture is built around a single, totally-ordered
stream of transactions. One nice thing about this approach is that
the transaction stream is a great debugging aid: an incredibly
detailed source-of-truth that you can dive into when you’re trying to
figure out why the system is misbehaving.
Unfortunately, the tooling we had for digging into this treasure trove
was a bit limited. All we really had was the ability to find and grab
individual messages. But sometimes you want more than that! You want
to search for sequences of events that match some specified criteria.
We did have some tooling for this. In particular, one of our
engineers had built a stream query system based on linear temporal
logic, or, LTL
for short. LTL is a well-studied logic that takes basic propositional
logic and adds to it two key operators: next
and until
.
Roughly, p next q
means that predicate p
holds currently, and that
predicate q
holds thereafter. And p until q
means that p
holds
currently, and will continue to hold until q
starts holding.
If it’s not obvious to you how to use these two operators to build
meaningful queries, well, join the club. It can be a bit of a puzzle
to figure out how to convert meaningful queries into the fairly
low-level logical statements that LTL is built on. To make matters
worse, the only syntax we had for writing these queries was a fairly
awkward s-expression based format. As a result, almost no one used
the LTL query engine.
That’s where Rajeev’s project came in. Rajeev’s goal was to build an
easier-to-use, SQL-like query language to act as a frontend to the
LTL query engine. The language wouldn’t be quite as expressive as
LTL, but it would be a lot easier to use.
We don’t really have space to go into detail on how the language
works, but here’s an example of a query for retrieving and printing
out retail orders paired with the first execution received by each
order:
FIND wholesaling.retail_order.new Retail_order
THEN FIRST wholesaling.route.fill.external Fill
WHERE .retail_order_id = Retail_order.retail_order_id;
PRINT
, Fill.time - Retail_order.time AS order_to_fill_time
, Retail_order.retail_order_id AS retail_order_id
, Retail_order.time AS arrival_time
;
He built the parser for this new language on top of
Angstrom, a
parser-combinator library for OCaml. It wasn’t too hard to get a
working parser; the biggest challenge was getting good error
messages. But after some careful wrestling with the system, Rajeev
was able to get it to track enough context to generate good error
messages in the cases that mattered.
In addition to getting the basic system in place, Rajeev had time to
add a few interesting temporal operators to the language, including:
-
LAST p BEFORE q
, which matches messagesM1
andM2
such that
M2
satisfi