Can chess, with regular expressions?
Yes. Can chess, with regular expressions.
Over the holidays I decided it’s been too long since I did something
with entirely no purpose. So without further ado, I present to you …
Regex Chess: sequence of 84,688 regular expressions that,
when executed in order, will play a (valid; not entirely terrible) move given a chess board as input.
Here, I’ll show you.
Current executing regular expression will show here…
Specifically, this is the entirety of the program that is playing a move against you (no really, I’m not kidding, it really is this short):
let regex_list = [/* a very long list of regular expressions */]
let board = “rnbqkbnr / pppppppp / 8 / 8 / 8 / 8 / PPPPPPPP / RNBQKBNR w KQkq – 0 1”;
for (regex of regex_list) {
board = re.replace(regex.pattern, regex.target)
}
display(board)
By the end of this post you’ll (hopefully)
understand why this sequence of regular
expressions is possible,
and also what the specific regular expressions do.
(If you’re someone who subscribed to this blog in the last six months or so,
and have gotten accustom to me writing about “serious” and “important” things,
please treat this as your fair warning that this is MY WEBSITE and I MAKE THE RULES HERE
and so today you’re learning about RegexChess whether you like it or not.)
As always, code for this project is available on GitHub.
Getting Started: A Regular Expression CPU
So how are we going to get regular expressions to play chess?
Well, by making a regular expression computer, of course!
More specifically, we’re going to design a Branch-Free,
Conditional-Execution,
Single-Instruction Multiple-Data instruction set.
And then make a sequence of regular expressions that interpret these instructions.
(Kind of like a GPU instruction set. And a bit of ARM.
But a lot slower.)
And then from there we can program our new computer to play chess.
So let’s get started.
(Some people may say I have an unhealthy obsession with building
weird computers, c.f. my game of life computer
or my printf computer.
Those people are wrong,
and just have an unhealthy obsession with the banal and ordinary.)
Computer Design
Let me get started by explaining how I’m going to organize the data
that the computer is going to operate over.
Because we’re working with regular expressions,
the current state
of the computer is going to be represented as
a single string containing both the program “stack”, and all variables in the following format:
%%
#stack:
top item on stack
second item on stack
….
#variable1: value 1
#variable2: value 2
…
#variablek: value k
Each instruction
will either manipulate some variables on the stack,
or will read or write to a given variable. Let’s look at some basic instructions.
Basic Stack Operations
The Push
Instruction
Here’s the implementation of the push
command that adds a value to the top of the stack:
def push(const):
return [(r“(%%n#stack:n)”,
r“g<1>“+const+r“n”)]
You should read the return type of these functions as a list of tuples.
Each tuple represents a regex transformation to apply,
where the left element is the pattern to match, and the right element is the replacement.
As a brief regular expression refresher.
Each tuple in this list has two parts: the regular expression, and the replacement.
A regular expression will match a string if it can find a substring of whatever it’s being applied against (the state, in our case).
Most characters in a regular expression match themselves, but parentheses create
a “match group” that can be referenced later.
The second argument is the replacement string.
Again, most characters mean “Replace with this character”,
but special sequences like g<1> are back-references that refer to previously captured groups.
In this case, g<1> references the first captured group (anything matched within the first set of parentheses)—which in this case is the “%%n#stack:n” header.
So, as a result of this operation executing on the stack, we find the occurrence of
“%%n#stack:n” in the state,
and insert the constant value below that (to the top of the stack).
Let’s see this in practice. Say we start with an empty stack:
%%
#stack:
If we execute push("hello")
, the regular expression will:
- Match the pattern
%%n#stack:n
at the start of our state - Capture this header into group 1 (the parentheses in the pattern create this capture group)
- Replace it with the captured group (
g<1>
) followed by our constant “hello” and a newline
This gives us:
%%
#stack:
hello
If we then execute push("world")
, the same process repeats and we get:
%%
#stack:
world
hello
The regular expression always matches at the top of the stack area, so new items get pushed on top while preserving the existing stack contents below them.
The Pop
Instruction
The pop
instruction removes the top element from the stack:
def pop():
return [(r“(%%n#stack:n)([^n]*)n”,
r“1”)]
Here we start to see a few of the special operators that make regular expressions powerful.
The [^n] block means “match any character that isn’t a newline” and the * means “match zero or more of these.”
So taken together we’re looking for a line that starts with “%%n#stack:n”,
and then on the next line, zero or more characters that aren’t a newline (so, the entire line).
The replacement is just the first line, which has the effect of removing the second line,
popping the top of the stack.
Let’s see how this works in practice. Say we start with this stack:
%%
#stack:
world
hello
When we execute pop()
, the regular expression will:
- Match the pattern beginning with
%%n#stack:n
(captured in group 1) - Match any characters until the next newline (captured in group 2 – the “world”)
- Replace everything matched with just group 1 (the header), effectively removing the top element
This gives us:
%%
#stack:
hello
Each pop operation removes exactly one element from the top of the stack, preserving any remaining elements below it.
Variable <-> Stack Instructions
Variable Lookup
To load the contents of a variable onto the top of the stack:
def lookup(variable):
# Find the variable’s value and push it onto the stack
return [(r“(%%n#stack:)([^%]*n#”+variable+“: )([^n]*)n“,
r“1n323n”)]
This regular expression is a bit more complex than our previous ones. Let’s break down what it does:
- The
[^%]*
matches basically any character (% only appears at the start of the program)
and so lets us find any variable anywhere in the program. - The
[^n]*
matches the variable’s value by capturing everything until the end of the line - The replacement creates a copy of the value and places it on top of the stack
Let’s see how this works in practice. Say we start with this state:
%%
#stack:
#foo: hello
#bar: world
#baz: 42
If we execute lookup("bar")
, the regular expression will:
- Match the stack header in group 1
- Match everything up to and including “#bar: ” in group 2
- Match “world” in group 3
- Use these groups to reconstruct the state with the value copied to the top of the stack
And so running the replacement will result in the following state:
%%
#stack:
world
#foo: hello
#bar: world
#baz: 42
The lookup operation preserved the original variable and its value while also placing a copy of the value on top of the stack. This allows us to read variable values without modifying them.
Variable Assignment
Assigning to a variable presents an interesting challenge: we don’t know if the variable
already exists. We need to handle both cases: updating an existing variable or creating
a new one.
Let me show you the implementation and then I’ll walk you through it case by case.
def assign_pop(varname):
return [
(r“(%%)n#stack:n([^n]*)n” +
“([^%]*#” + varname + r“: )[^n]*”,
r“1`n#stack:n32”),
(r“(%%)([^`]n?#stack:n)([^n%]*)n([^%]*)”,
r“1`24#” + varname + r“: 3n”),
(r“%%`”,
r“%%”)
]
To begin let’s assume the variable already exists. That is, the stack starts
off looking like this, and assume we’re calling assign_pop(“bar”):
%%
#stack:
world
#foo: hello
#bar: something
#othervar: othervalue
When we run this regular expression list, we create the following capture groups:
%%
#stack:
world
#foo: hello
#bar: something
#othervar: othervalue
After the replacement operation, we get this output:
%%`
#stack:
#foo: hello
#bar: world
#othervar: othervalue
Now we proceed on to the next instruction, and we don’t match it because the second regex fails if there’s a back-tick after the program start %%. So nothing happens. And then finally, the third regex cleans things up for us.
Handling Non-Existent Variables:
Let’s consider what happens if the variable doesn’t already exist. Again, assume we’re calling assign_pop("bar")
:
%%
#stack:
world
#foo: hello
#othervar: othervalue
The first regex tries to match but fails because there is no “#bar” anywhere. So it doesn’t do anything. But now the second regex tries to match and succeeds. It creates the following capture groups:
%%
#stack:
world
#foo: hello
#othervar: othervalue
From here, we perform the rewrite and get the following output:
%%
#stack:
#foo: hello
#bar: world
#othervar: othervalue
And then the third regex is applied and does nothing.
There are lots of instructions that use this trick to make sure we don’t
apply things in the order we don’t want. For example, as an exercise for
yourself, try to understand how the “is equal” instruction works:
def eq():
return [
(r“(%%n#stack:n)([^n]*)n2n”,
r“1`Truen”),
(r“(%%n#stack:n)([^`][^n]*)n([^n]*)n”,
r“1Falsen”),
(Branch-Free) Conditionals
Programming languages, in order to be interesting, usually need to have some kind of
control flow. It’s very hard to write some program without ever having an if statement.
So let’s now show how we’re going to do this.
(And I hope you did your homework, because we’re going to use the
same conditional execution trick again!)
Here’s the implementation of a conditional instruction:
def cond(tag):
return [(r“%(%n#stack:nTrue)”,
r“%1`”),
(r“%(n#stack:nFalse)”,
tag+r“1`”),
(r“n(True|False)`n”,
“n“)]
Let’s walk through how this is going to work, starting with the case where the
top of the stack is False.
%%
#stack:
False
#variable: value
Initially, the first regex will fail to match, because the top element on the stack
isn’t True. So we go to the next regex, and see if it applies. This one does match,
and makes the corresponding match groups.
%%
#stack:
False
#variable: value
After we apply the replacement, we get the following stack.
%tag
#stack:
False`
#variable: value
(And finally, using the same cleanup trick, we’ll remove the used marker.)
Now see what happened here? The program no longer begins with %%
.
This means that EVERY instruction will fail to match, because they always make sure
that the program begins with %%. So nothing else will happen…. until we reactivate
it later with the following simple instruction:
def reactivate(tag):
return [(r“%”+tag+r“n([^%]*)”,
r“%%n1”)]
Let’s now return to the True case for the conditional. This is the easy case: we
basically don’t do anything at all. We replace the stack with True` on the second
regex, and then delete this line on the third. Easy.
Notice that this means our code is actually branch-free,
because every instruction is a conditional instruction. (Kind of like ARM’s predicated execution, where most instructions can be conditionally executed based on status flags rather than using explicit branch instructions.)
Loops (are impossible)
Because our program just consists of a sequence of regular expressions,
you can’t loop at all! That, technically, means we can’t actually perform
Turing Complete
But we can do any bounded computation by just unrolling any loops we may have.
And fortunately computing the next move in a chess position is a bounded computation, so we can do just that.