[Simon Tatham, 2025-02-14]
- Introduction
- XOR in boolean logic
- Bitwise XOR on integers
- Some applications of XOR
- Mathematical concepts that look like XOR
- Footnotes
Introduction
Recently I was called on to explain the ‘XOR’ operator to
somebody who vaguely knew of its existence, but didn’t have a
good intuition for what it was useful for and how it
behaved.
For me, this was one of those ‘future shock’ moments when you
realise the world has moved on. When I got started in computers,
you had to do low-level bit twiddling to get anything very
interesting done, so you pretty much couldn’t avoid
learning about XOR. But these days, to a high-level programmer,
it’s much more of an optional thing, and you can perfectly well
not know much about it.
So I collected some thoughts together and gave a lecture on
XOR. Slightly to my own surprise, I was able to spend a full
hour talking about it – and then over the course of the next
couple of weeks I remembered several other things I could
usefully have mentioned.
And once I’d gone to the effort of collecting all those
thoughts into one place, it seemed a waste not to write it all
down somewhere more permanent. (The collecting is the hardest
part!)
So here’s a text version of my XOR talk – or rather, the talk
that it would have been if I’d remembered to include
everything the first time round.
XOR in boolean logic
We’ll start by looking at XOR as a boolean logic operator, or
equivalently a logic gate: a function that takes two individual
bits, and outputs a single bit.
What is it?
If I’m going right back to first principles, then I should
start by actually defining XOR.
Any boolean function of two inputs can be defined by showing
its truth table: for each of the four combinations of inputs,
say what the output is. So I’ll start by showing the truth table
for XOR.
a | b | a XOR b |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
The same function can also be drawn as a two-dimensional grid,
with a choosing the row and b the column:

a XOR b | b = 0 | b = 1 |
---|---|---|
a = 0 | 0 | 1 |
a = 1 | 1 | 0 |
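If you want to check this table for yourself, here's a quick way
in Python, whose ^ operator computes XOR (we'll meet that
operator again later):

```python
# Print the XOR truth table: ^ is Python's XOR operator.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, a ^ b)
```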
But just saying what it is doesn’t give a good
intuition for what things it’s useful for, or when to use it. So
in the next few sections I’ll present a few different ways
of thinking about what XOR does.
For such a simple operation, it’s possible to describe what it
does in a lot of quite different-looking ways. But all of them
are true at once! So you don’t need to keep thinking
about XOR as being just one of the following concepts. You can
suddenly switch from thinking of it one way to another, any time
that’s convenient. Nothing stops being true if you do.
“Exclusive OR”
The X in ‘XOR’ stands for the word “exclusive”. (Some assembly
language syntaxes abbreviate it as ‘EOR’ instead, on the grounds
that “exclusive” doesn’t actually begin with an X.
But most people prefer the X spelling, in my
experience.)
In the normal English language, the conjunction ‘or’ has an
ambiguity: if I say ‘you can do this or that’, and in
fact someone tries to do both this and that, does that
count? It depends on the context. A parent telling a child “You
can have this dessert or that one” certainly means ‘but
not both’ – that’s the whole point. But on the other hand, if
the same parent wants the child to do something useful, and says
“Do your homework, or tidy your room!”, they’d probably be
extra pleased if the child did both.
So, when you want to be clear about which version of ‘or’ you
mean, you might say that you’re including the ‘both’
case, or excluding it.
In computing, ‘OR’ as a boolean operator is always taken to
mean the inclusive version: A or B or
both. And when you want A or B but not
both, you talk about exclusive OR.
Looking at the truth tables above, you can see that that’s
exactly what the XOR operation is doing:
- If a = 1, or if b = 1, then a XOR b = 1.
- But not both: if a and b are both 1, then a XOR b = 0.
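A quick Python check that this ‘or, but not both’ reading really
does match the truth table:

```python
# 'a or b, but not both' agrees with XOR on every input pair.
for a in (0, 1):
    for b in (0, 1):
        assert a ^ b == int((a or b) and not (a and b))
```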
The ‘not equals’ operator
Another way to look at the same truth table
is: a XOR b = 1 whenever the inputs a
and b are different from each other. If they’re
the same (either both 0 or both 1),
then a XOR b = 0.
So you could look at a XOR b as meaning the same
thing as a ≠ b: it’s a way to compare two
boolean values with each other.
Conditional inversion
A third way to look at the same truth table is to consider each
value of one of the inputs a, and look at what XOR does
to the other variable b in each case:
- If a = 0, then a XOR b is the same thing as just b.
- If a = 1, then a XOR b is the opposite of b: 0 becomes 1 and 1 becomes 0.
So another way to look at the XOR operation is that you’re
either going to leave b alone, or invert it (swapping 0
and 1), and a tells you which one to do. If you imagine
XOR as a tiny computing device, you could think of the
input b as ‘data’ that the device is operating on,
and a as a ‘control’ input that tells the device what to
do with the data, with the possible choices being ‘flip’ or
‘don’t flip’.
a XOR b means: invert b, but only
if a is true.
But the same is also true the other way round,
because a XOR b is the same thing
as b XOR a. You can swap your point of view to
thinking of a as the ‘data’ input and b as
‘control’, and nothing changes – the operation is the same
either way round.
That is, a XOR b also
means: invert a, but only if b is
true!
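Here's that ‘control and data’ view written out as a quick
Python sketch, just to make the idea concrete:

```python
# XOR as conditional inversion: 'a' decides whether to flip 'b'.
for a in (0, 1):
    for b in (0, 1):
        flipped = (1 - b) if a else b   # invert b only when a is set
        assert a ^ b == flipped
        assert a ^ b == b ^ a           # and the two roles are symmetric
```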
Parity, or sum mod 2
Here’s yet another way to look at the XOR
operation. a XOR b tells you whether an odd
number of the inputs are true:
- If a = b = 0, then zero of the inputs are true. 0 is even,
  and a XOR b = 0.
- If a = 1 and b = 0, or the other way round, then one of the
  inputs is true. 1 is odd, and a XOR b = 1.
- If a = b = 1, then two of the inputs are true. 2 is even,
  and a XOR b = 0 again.
Put another way: if you add together the two inputs,
and then reduce the result modulo 2 (that is, divide by 2 and
take the remainder), you get the same answer
as a XOR b.
In particular, if you XOR together more than two
values, the overall result will tell you whether an odd or even
number of the inputs were 1 – no matter how many inputs you
combine.
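For example, here's a small Python sketch showing that XORing a
whole list of bits together gives exactly its parity:

```python
from functools import reduce
from operator import xor

bits = [1, 0, 1, 1, 0, 1]
# The XOR of many bits is their sum mod 2: the parity of the list.
assert reduce(xor, bits) == sum(bits) % 2 == 0
```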
Difference mod 2
So XOR corresponds to addition mod 2. But it also corresponds
to subtraction mod 2, at the same time: if you take the
difference a − b and reduce that mod 2,
you still get the same answer a XOR b!
- If a = b, then the difference a − b is 0, and so
  is a XOR b.
- If a ≠ b, then the difference a − b is either +1 or −1.
  Mod 2, those are the same as each other – they’re both just 1.
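Both correspondences are easy to check exhaustively:

```python
# XOR agrees with both the sum and the difference mod 2.
# (Python's % returns a non-negative result here, so (0 - 1) % 2
# is 1, not -1.)
for a in (0, 1):
    for b in (0, 1):
        assert a ^ b == (a + b) % 2 == (a - b) % 2
```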
What properties does it have?
If you have a complicated boolean expression involving lots of
XORs, it’s useful to know how you can manipulate the expression
to simplify it.
To begin with, XOR is both commutative
and associative, which mean (respectively)
that
a XOR b = b XOR a
(a XOR b) XOR c = a XOR (b XOR c)
In practice, what this means is that if you see a long list of
values or variables XORed together, you don’t need to
worry about what order they’re combined in, because it
doesn’t make any difference. You can rearrange the list of
inputs into any order you like, and choose any subset of them to
combine first, and the answer will be the same regardless.
(This is perhaps most easily seen by thinking of XOR as
‘addition mod 2’, because addition is also both commutative and
associative – whether or not it’s mod anything – so XOR inherits
both properties.)
Secondly, 0 acts as an identity, which means
that XORing anything with zero just gives you the same thing you
started with:
a XOR 0 = 0 XOR a = a
So if you have a long list of things XORed together, and you
can see that one of them is 0, you can just delete it from the
list.
Thirdly, everything is
self-inverse: if you XOR any value with itself,
you always get zero.
a XOR a = 0
One consequence of this is that if you have a long list of
variables being XORed together, and the same variable occurs
twice in the list, you can just remove both copies. Anything
appearing twice cancels out. For example, the two copies
of b can be removed in this expression:
a XOR b XOR c XOR b = a XOR c
Another consequence is that if you have a
value a XOR b, and you want to recover
just a, you can do it by XORing in another copy
of b (if you know it), to cancel the one that’s already
in the expression. Putting something in a second time is the
same as removing it:
(a XOR b) XOR b = a XOR (b XOR b) = a XOR 0 = a
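If you don't want to take any of these laws on trust, they're
all easy to spot-check, for instance in Python:

```python
import random

# Spot-check the XOR laws on random single bits.
a, b, c = (random.getrandbits(1) for _ in range(3))
assert a ^ b == b ^ a                  # commutative
assert (a ^ b) ^ c == a ^ (b ^ c)      # associative
assert a ^ 0 == a                      # 0 is the identity
assert a ^ a == 0                      # everything is self-inverse
assert (a ^ b) ^ b == a                # a second copy of b cancels
```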
Bitwise XOR on integers
Now we’ve seen what XOR does on individual bits, it’s time to
apply it to something larger.
To a computer it’s most natural to represent an integer in
binary, so that it looks like a string of 0s and 1s. So if you
have two integers, you can combine them bitwise (that
is, ‘treating each bit independently’) using a logical operation
like XOR: you write the numbers in binary one above the
other, and set each output bit to the result of
combining the corresponding bits of both input numbers via
XOR.
Here are a couple of examples, one small and carefully chosen,
and the other large and random-looking:
 | Binary | Hex | Decimal |
---|---|---|---|
a | 1010 | A | 10 |
b | 1100 | C | 12 |
a XOR b | 0110 | 6 | 6 |
 | Binary | Hex | Decimal |
---|---|---|---|
a | 11001111001000011111000101111011 | CF21F17B | 3475108219 |
b | 10011011010000100100111001011101 | 9B424E5D | 2604813917 |
a XOR b | 01010100011000111011111100100110 | 5463BF26 | 1415823142 |
The small example contains one bit for each element of the XOR
truth table. If you look at vertically aligned bits in the
binary column of the table, you’ll see that in the rightmost
place both a and b have a 0 bit; in the leftmost
place they both have a 1 bit; in between, there’s a position
where only a has a 1, and one where only b has a
1. And in each position, the output bit is the boolean XOR of
the two input bits.
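You can verify both example calculations directly in Python,
where ^ is the bitwise XOR operator:

```python
# The small, carefully chosen example...
assert 0b1010 ^ 0b1100 == 0b0110                 # 10 XOR 12 == 6
# ...and the large random-looking one.
assert 0xCF21F17B ^ 0x9B424E5D == 0x5463BF26
```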
Of course, this idea of ‘bitwise’ logical operations – taking
an operation that accepts a small number of input bits, and
applying it one bit at a time to a whole binary integer – is not
limited to XOR. Bitwise AND and bitwise OR are also well defined
operations, and both useful. But this particular article is
about XOR, so I’ll stick to that one.
Still has all the same properties
Earlier I discussed a number of
mathematical laws obeyed by the one-bit version of XOR: it’s
commutative, associative, has 0 as an identity, and every
possible input is self-inverse in the sense that XORing it with
itself gives 0.
In the bitwise version applied to integers, all of these things
are still true:
- a XOR b = b XOR a for any integers a and b
- (a XOR b) XOR c = a XOR (b XOR c) similarly
- a XOR 0 = 0 XOR a = a, where 0 means the integer zero, i.e. with all its binary bits 0
- a XOR a = 0
So if you have a complicated expression containing bitwise
XORs, you can simplify it in all the same ways you could do it
with single-bit XORs.
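The same spot-checks from earlier pass just as well on whole
integers:

```python
import random

# The single-bit XOR laws carry over verbatim to 32-bit integers.
a, b = random.getrandbits(32), random.getrandbits(32)
assert a ^ b == b ^ a
assert a ^ 0 == a
assert a ^ a == 0
assert (a ^ b) ^ b == a
```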
Bitwise difference
With individual bits, I said earlier
that a XOR b means the same thing
as a ≠ b: given two inputs bits, it returns 1 if
the bits are different from each other, and 0 if they’re the
same.
Of course, applied bitwise to whole integers, this is true
separately in every bit position: each bit of the output integer
tells you whether the two corresponding input bits were unequal.
So if a = b, then a XOR b = 0. And
if a ≠ b, then a XOR b ≠ 0, because
two unequal integers must have disagreed in at least one bit
position, so that bit will be set in their XOR.
So in some sense you can still use bitwise XOR to tell you
whether two entire integers are equal: it’s 0 if they are, and
nonzero if they’re not. But bitwise XOR gives you more detail
than that: it also gives you a specific list of which bit
positions they differ in.
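For instance, here's a tiny Python sketch that recovers the list
of differing bit positions from the XOR of two integers:

```python
a, b = 0b1010, 0b1001
diff = a ^ b                      # nonzero, so a != b
# Each set bit of 'diff' marks a position where a and b disagree.
positions = [i for i in range(diff.bit_length()) if (diff >> i) & 1]
assert positions == [0, 1]        # they differ in the bottom two bits
```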
Bitwise conditional inverter
I also said earlier that you could see
XOR as a ‘conditional inversion’ operator: imagine one
input b to be ‘data’, and the other input a to be
a ‘control’ input that says whether you want to flip the data
bit.
Applied bitwise to whole integers, this is still true, but more
usefully so: the control input a says which data
bits you want to flip.
For example, in the Unicode character encoding (and also in its
ancestor ASCII), every character you might need to store in a
string is represented by an integer. The choice of integer
encodings has some logic to it (at least in places). In
particular, the upper-case Latin letters A, B, C, …, Z and their
lower-case equivalents a, b, c, …, z have encodings that differ
in just one bit: A is 65, or binary 01000001, and a is 97, or
binary 01100001. So if you know that a character value
represents a Latin letter, you can swap it from upper case
to lower case or vice versa by flipping just that one
bit between 0 and 1.
And you can do that most easily, in machine code or any other
programming language, by XORing it with a number that has just
that one bit set, namely binary 00100000, or decimal 32. For
example, in Python (where the ^ operator means bitwise XOR):

```
>>> chr(ord('A') ^ 32)
'a'
>>> chr(ord('x') ^ 32)
'X'
```
Of course, the other thing I said earlier also applies: it
doesn’t matter which of the inputs to XOR you regard as
the data, and which as the control. The operation is the same
both ways round.
In particular, suppose you’ve already XORed two values a
and b to obtain a number d that tells you which
bits they differ in. Then you can turn either of a
or b into the other one, by XORing with the
difference d, because that flips exactly the set of bits
where a has the opposite value to b.
In other words, these three statements are
all equivalent – if any one of them is true, then so
are the other two:
a XOR b = d
a XOR d = b
b XOR d = a
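Or, as a small Python demonstration:

```python
a, b = 0b1010, 0b1100
d = a ^ b            # d records which bits a and b differ in
assert a ^ d == b    # flipping those bits turns a into b...
assert b ^ d == a    # ...and turns b back into a
```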
Addition or subtraction without carrying
Another thing I mentioned about XOR on single bits is that it
corresponds to addition mod 2, or
equivalently, subtraction mod 2.
That isn’t quite how it works once you do it bitwise on whole
words. Each individual pair of bits is added mod
2, but each of those additions is independent.
One way to look at this is: suppose you were a schoolchild just
learning addition, and you’d simply forgotten that it was
necessary to carry an overflowed sum between digit positions at
all. So for every column of the sum, you’d add up the input
digits in that column, write down the low-order digit of
whatever you got, and ignore the carry completely.
If this imaginary schoolchild were to do this procedure in
binary rather than decimal, the operation they’d be performing
is exactly bitwise XOR! Bitwise XOR is like binary
addition, without any carrying.
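Here's a tiny example of the difference that makes:

```python
a, b = 0b0101, 0b0011
print(bin(a ^ b))    # 0b110:  each column added mod 2, carries dropped
print(bin(a + b))    # 0b1000: ordinary addition, carries propagated
```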
Later on I’ll show an interesting
consequence of this, by considering what does happen to
the carry bits, and how you can put them back in again
afterwards. There’s also a mathematically
meaningful and useful interpretation of the corresponding
‘carryless’ version of multiplication, in which you
make a shifted version of one of the inputs for each 1 bit in
the other, and then add them together in this carryless XOR
style instead of via normal addition.
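Here's a minimal Python sketch of that carryless multiplication,
built directly from the shift-and-add description above:

```python
def carryless_mul(a, b):
    """Shift-and-add multiplication, but with XOR in place of addition."""
    result = 0
    while b:
        if b & 1:          # for each set bit of b...
            result ^= a    # ...XOR in a suitably shifted copy of a
        a <<= 1
        b >>= 1
    return result

# (x + 1) squared is x^2 + 2x + 1, and the middle term vanishes mod 2.
assert carryless_mul(0b11, 0b11) == 0b101
```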
Some applications of XOR
Now we’ve seen what XOR is, and a few different things
it’s like, let’s look at some things it’s used for.
Cryptography: combine a plaintext with a keystream
XOR is used all over cryptography, in many different ways. I
won’t go into the details of all of those ways – I’m
sure I don’t even know all of them myself! – but I’ll show a
simple and commonly used one.
Suppose you have a message to encrypt. One really simple thing
you could do would be to take an equally long stream of
random-looking data – usually known as a ‘keystream’ – and
combine the two streams, a byte or a word at a time, in some way
that the receiver can undo. So if your message was “hello”, for
example, you might simply transmit a byte made by combining “h”
with keystream byte #1, then one that’s “e” combined with
keystream byte #2, and so on.
This seems absurdly simple – surely real cryptography is
terrifyingly complicated compared to this? But it really is a
thing that happens. As long as each byte is combined with the
keystream byte in a way that makes it possible to recover the
original byte at the other end, this is a completely sensible
way to encrypt a message!
That’s not to say that there isn’t terrifyingly complicated
stuff somewhere in the setup. But in this particular
context, the complicated part is in how you generate
your stream of random-looking bytes; the final step where you
combine it with the message is the easy part. One option is for
your keystream to be a huge amount of genuinely random data, as
big as the total size of all messages you’ll ever need to send;
this is known as a ‘one-time pad’, and is famously actually
unbreakable – but also amazingly impractical for almost all
purposes. More usually you use a stream cipher, or a block
cipher run in a counter mode, to expand a manageably small
actual key into a keystream as long as you need.
Anyway. I’ve so far left out the detail of exactly how you
“combine” the keystream with the message. But given the subject
of this article, you can probably guess that it’s going to be
XOR.
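As a sketch of the whole scheme in Python – with os.urandom
standing in for a real keystream generator, which both sides
would somehow have to share:

```python
import os

message = b"hello"
keystream = os.urandom(len(message))   # stand-in for a shared keystream

# Sender: XOR each message byte with the matching keystream byte.
ciphertext = bytes(m ^ k for m, k in zip(message, keystream))
# Receiver: exactly the same operation takes the keystream off again.
recovered = bytes(c ^ k for c, k in zip(ciphertext, keystream))
assert recovered == message
```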
XOR isn’t the only thing that would work. If your
message and your keystream are each made of bytes, then there
are plenty of other ways to combine two bytes that let you
recover one of them later. For example, addition mod 2⁸
would be fine: you could make each encrypted byte
by adding the message byte and keystream byte, and then
the receiver (who has access to exactly the same keystream)
would subtract the keystream byte from the sum to recover the
message byte.
But in practice, XOR is generally what’s used, because it’s
simpler. Not so much for software – CPUs generally have an ‘ADD’
instruction and an ‘XOR’ instruction which are just as fast and
easy to use as each other – but encryption is also often done in
hardware, by specially made circuitry. And if you’re building
hardware, addition is more complicated than XOR, because you
have to worry about carrying between bit positions, which costs
more space on the chip (extra logic gates) and also extra time
(propagating signals from one end to the other of the addition).
XOR avoids both problems: in custom hardware, it’s far cheaper.
(It’s also very slightly more convenient that the
sender and receiver don’t have to do different
operations. With XOR, the sender who applies the keystream and
the receiver who takes it off again are both doing exactly the
same thing, instead of one adding and the other
subtracting.)
Pixel graphics: draw things so it’s easy to undraw
them again
Let’s set the wayback machine to the mid-1980s, and go back in
time to when computers were smaller and simpler (at least, the
kind you could afford to have in your home). Home computer
graphics systems stored a very small number of bits for each
pixel on the screen, meaning that you could only display a
limited number of different colours at a time; and even with
that limitation on framebuffer size, home computers had so
little RAM in total that it was a struggle to store two entire
copies of what was displayed on the screen and still have enough
memory for anything else (like whatever program
was generating that screenful of graphics).
In an environment limited like that, what do you do if you want
to draw an object that appears and disappears, or that moves
around gradually?
If your moving object is opaque, then every time you
‘undraw’ it, you have to restore whatever was supposed to be on
the screen behind it. That means you have to remember
what was behind it – either by storing the actual pixels, or by
storing some recipe that knows how to recreate the missing patch
of graphics from scratch. Either one costs memory, and the
second option probably costs time as well, and you don’t have a
lot of either to spare.
Nevertheless, you do that when you really have to. But when
you don’t have to, it’s always helpful to take
shortcuts.
So one common graphical dodge was: don’t make graphical
objects opaque if you don’t have to. Any time you can get
away with it, prefer to draw a thing on the screen in a way that
is reversible: combine each pixel M of the
moving object with the existing screen pixel S, in such a
way that you can undo the operation later, recovering the
original screen pixel S from the combined
value C by remembering what M was.
As in the previous section, there are plenty of ways to do this
in principle. You could imagine treating the bits of each pixel
as an n-bit integer for some small n, and doing
addition and subtraction on them (again, mod
2ⁿ). For example, you could draw by addition,
setting C = S + M, and undraw by
subtraction to recover S = C − M.
But these systems typically had far fewer than 8 bits per
pixel, so each byte of the screen would have more than one pixel
packed into it. Or, worse, the screen might be laid out in ‘bit
planes’, with several widely separated blocks of memory each
containing one bit of every pixel. Doing ordinary addition on
the pixel values is very awkward in both cases.
In particular, consider the first of those possibilities, where
you have several pixels packed into a byte. Suppose the 8 bits
of the byte are treated as two 4-bit integers, or four 2-bit
integers. In order to do parallel addition of each of
those small chunks, you can't use the CPU's normal addition
instruction, because a carry off the top of one pixel value
propagates into the next, causing unwanted graphical effects. So
you’d somehow need to arrange to suppress the
carry between pixels, but leave the
carry within a pixel alone.
So it’s much easier not to try this in the first place.
Combining the pixel values via XOR instead of addition means
that you automatically avoid carries between pixels, because
there are no carries at all. This is also more
convenient in the ‘bit plane’ memory layout, because each bit of
a pixel is treated independently; if you tried to do ordinary
addition in that situation, you’d have to make extra effort to
propagate carries between corresponding bits in each bit
plane.
Here’s a simple example, in which two lines have been drawn
crossing each other. You can see that in the second diagram,
drawn using XOR, there’s a missing pixel at the point where the
two lines cross. That pixel is part of both lines, so
it’s been drawn twice. Each time it was drawn, the pixel value
flipped between black and white, so drawing it twice sets it
back to the background colour.
That missing pixel looks like a tiny blemish. The first version
of the picture looks nicer. But it makes it harder to
undraw one of those lines later. If you just undraw it by
setting all the pixels of one line back to white, then you leave
a hole in the remaining line. And if you don’t want to leave a
hole, then you need some method of remembering which
pixel not to reset to white.
In the XOR version of the picture, it’s the easiest thing in
the world to undraw one line and leave the other one undamaged.
Simply draw the line again that you want to
undraw. Putting a second copy of something into an XOR
combination is the same as removing it. This kind of
graphical ‘blemish’ was considered a reasonable price to pay, in
situations where efficient undrawing and redrawing was
needed.
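Here's a minimal Python sketch of the idea, modelling a
1-bit-deep framebuffer as one integer bitmask per scanline (a
toy model, of course, not real 1980s video hardware):

```python
def xor_blit(screen, sprite, x, y):
    """XOR each row of 'sprite' into 'screen' at offset (x, y)."""
    for dy, row in enumerate(sprite):
        screen[y + dy] ^= row << x

screen = [0b10101010, 0b01010101, 0b10101010, 0b01010101]
before = list(screen)
sprite = [0b11, 0b11]                # a 2x2 block

xor_blit(screen, sprite, 3, 1)       # draw the object
xor_blit(screen, sprite, 3, 1)       # 'undraw' is the identical call
assert screen == before              # the background is fully restored
```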
Most types of 1980s home computers had an option to draw in XOR
mode, for this kind of reason. It was one of the easiest ways
for a beginning computer programmer to draw pretty animations. A
common type of demo was to simulate two points moving around the
screen along a pair of independent paths, and draw a line
between the two points at each moment – but to allow the
previous few versions of the line to persist as well as the
latest one, undrawing each one after it had been on the screen
for a few frames.
Here’s an example of the kind of thing. In this example each
end of the line is following a path given by a different
Lissajous
curve:

Because we’re drawing the lines in XOR mode, the procedure for
turning each frame of this animation into the next involves
drawing only two lines: the new one that we’re adding at the
leading edge of the motion, and the one that’s about to vanish
from the trailing edge. So the program only has to remember the
endpoint coordinates of the 10 (or however many) lines it
currently has on screen – or perhaps not even bother with that,
and just recalculate the (n − 10)th coordinate in each
frame to work out what line to undraw.
So this lets you produce pretty and interesting results, with
almost no memory cost (you don’t have to store extra copies of
lots of pixels). It uses very little CPU as well (you don’t have
to redraw all the lines in every frame, only the ones
you’re adding and removing), which makes the animation able to
run faster.
This style of animation was a common thing for beginning
programmers to play with, but it was also used by professionals.
Perhaps most famously, a moving line drawn in this style acted
as the main antagonist of the 1981 video
game Qix.
Even in the 1990s, with slightly larger computers, XOR-based
drawing and undrawing was still a useful thing: in early GUI
environments like the Amiga, when you dragged a window around
the screen with the mouse, the system didn’t have enough CPU to
redraw the entire window as you went. Instead it would draw an
outline frame for where the window was going to end up, and once
you’d put it in the right place, you’d let go of the mouse
button and the window would be properly redrawn in the new
location, just once. And again the outline frame would be drawn
using XOR, so that it didn’t cost extra memory to remember how
to undraw it on each mouse movement.
But going back to the above animation: if you watch it
carefully, you might notice that when the moving endpoints
change direction and a lot of the lines cross each other, the
missing pixels at the crossing points give rise to a sort of
interference effect. Those missing pixels started off as a
necessity to save memory, but it turns out they’re also quite
pretty in their own right.
Another favourite kind of 1980s graphical program for beginners
was to explore those interference patterns more fully. Here’s an
image created by drawing a sequence of lines, all with one
endpoint in the bottom left corner of the frame, and with the
other endpoints being every single pixel along the top row. In
ordinary drawing mode this would just be a slow way to fill in a
big black triangle. But in XOR mode, the lines interfere with
each other wherever more than one of them passes through the
same pixel:
