I wrote a peephole optimizer (for QBE) by ingve

Share This Article

Sed ut perspiciatis unde.

Send to HN

academic, developer, with an eye towards a brighter techno-social life

[prev]

[next]

2022-03-30
I wrote a peephole optimizer (for QBE), part 1

All source code for this blog post can be found here.

One of the things we did not tackle when writing our PL/0 compiler was an optimizer. Our code generator output C and then we let our C compiler handle the rest. If we wanted optimizations, we could turn them on in the C compiler. That’s great but it didn’t help us learn anything about optimizations.

I test a lot of C compilers when testing oksh. One of those compilers is called C11 compiler that compiles C into QBE and then has QBE transform its intermediate language into assembly, which then follows the usual pipeline of being run through the system assembler and ultimately the system linker.

We saw QBE before when we wrote our brainfuck compiler in QBE IL. When observing how QBE transforms its IL into assembly, we noticed that it missed a chance at optimization in the case where you moved a zero into a register. The generated assembly used a mov instruction to move an immediate of zero into the register. But the fastest and smallest way to put a zero into a register (at least, on amd64 machines) is to xor the register with itself.

I noticed the lastest cproc, which uses the latest QBE, still fails to make this optimization. At one point in time, the author of cproc sent me a diff for QBE that did make the optimization but I guess it was never applied to the QBE repository. As the savings for this optimization really can add up over a large project, and since it gave us an opportunity to look at peephole optimizers, I decided to write a small program that could be run in the cproc pipeline in between QBE and the system assembler to fix up this optimization. Maybe we could find more optimizations as well.

A peephole optimizer

The simplest type of optimizer that we can write without having to dive into the vast literature on compiler optimizations is a peephole optimizer. In this technique, we read in some number of lines of assembly and perform basic pattern matching on those lines looking for instructions that we know can be replaced with better ones (or entirely eliminated). “Better” in this case brings up the first interesting question about optimization: better can be contextual based on your goals. Do you want instructions that take less time to execute? Do you want instructions that require fewer bytes to encode, making for a smaller program? Someting else? What is faster may not be smaller. What is smaller may not be faster. For us with our first optimization, we are lucky that the optimized version is both faster and smaller.

The peephole, or window, for our first optimization is just one line. There are some other peephole optimizations that can be performed with a peephole of just one line: removing useless instructions, such as moving a register into itself or adding zero to a register, immediately come to mind.

After we’ve completed the mov to xor optimization, we’ll tackle an additional optimization that requires a larger peephole to enable.

We will name our peephole optimizer O in honor of the C compiler flag (-O) that is commonly used to tell the C compiler to turn on its optimizer.

Choosing a language

I am going to write O in C. Not because C is necessarily the best language for this task; it probably isn’t. But because if we write O in C, then we can run cproc on O itself and determine the size savings on itself. I am also going to rely on direct knowledge of cproc to make our lives easier when figuring out what strings to match on. There is also a little bit of argc and argv handling with that knowledge so that O can be plugged into cproc when we are finished.

With all of that said, our main function looks like this:

static int
usage(void)
{

	(void) fputs("usage: O in.s [-o out.s]n", stderr);

	return 1;
}

int
main(int argc, char *argv[])
{
	FILE *fp;

	if (argc == 4) {
		if (strcmp(argv[2], "-o") != 0)
			return usage();

		if (freopen(argv[3], "w+", stdout) == NULL) {
			(void) fprintf(stderr, "O: error: couldn't open %sn",
			    argv[3]);
		}
	} else if (argc != 2) {
		return usage();
	}

	if (!strcmp(argv[1], "-")) {
		O(stdin);

		return 0;
	}

	if ((fp = fopen(argv[1], "r")) == NULL) {
		(void) fprintf(stderr, "O: error: couldn't open %sn",
		    argv[1]);

		return 1;
	}

	O(fp);

	(void) fclose(fp);

	return 0;
}

Because of how cproc works, when the final stage of the compilation is to output assembly (-S), cproc automatically appends -o [file.s] so O needs to support that. If cproc is going to write an object file or binary, then O will write to stdout and the assembler will read the output in via a pipe. Additionally, input from cproc is always stdin, demarcated as -, so we need to support that too.

To avoid too much complexity, if we do have to output to a file, we can freopen(3) that file to stdout. Most of the time, we will be outputting to stdout so it makes sense to make outputting to a file the special case.

Reading in one line at a time

We can use the getline(3) function to read in one line at a time into a buffer. While the getline(3) function is “new” with POSIX.1-2008, I think all modern Unix systems have the function. If your system does not, you can get a copy of getline(3) here.

Reading in one line at a time might look something like this:

static void
O(FILE *fp)
{
	char *line = NULL;
	size_t size = 0;

	while (getline(&line, &size, fp) != -1)
		(void) fputs(line, stdout);

	free(line);
}

Yes, you really do need the call to free(3) at the end there.

This would read in one line at a time and then immediately print it out. Of course, we need to perform a check to match for potential lines to replace, so let’s improve this a bit:

static void
one(const char *line)
{

	if (xorq(line))
		return;

	if (xorl(line))
		return;

	(void) fputs(line, stdout);
}

static void
O(FILE *fp)
{
	char *line = NULL;
	size_t size = 0;

	while (getline(&line, &size, fp) != -1)
		one(line);

	free(line);
}

Now we are reading in one line at a time, and checking if it’s a movq line that can be replaced with an xorq line. If not, perhaps it is a movl line that can be replaced with an xorl line. If not, then it is not a line that can be optimized this way so we should just print it out as-is.

Find and replace

Because we know exactly what QBE is going to output, we can use that to our advantage. We don’t need to tokenize the output; we can simply check for exact strings. That saves us a lot of logic. A movq line that we want to match will always look like this:

	movq $0, %r__

Where __ reflects any of the 64-bit registers. Therefore, we can match exactly that and since that’s the only form this line can be in, it is the only thing we need to check for. We want to replace that movq line with this:

	xorq %r__, %r__

If we were to write that as a function, it might look like this:

static int
xorq(const char *line)
{

	if (!strncmp("tmovq $0, %r", line, 12)) {
		(void) fprintf(stdout, "txorq %%r%c%c, %%r%c%cn", line[12],
		    line[13], line[12], line[13]);

		return 1;
	}

	return 0;
}

If you wanted to be paranoid, you could check that strlen(line) > 13 before calling fprintf.

We check to see if the first twelve characters of the line match a movq line that would be better if it was optimized into a xorq line. If it does, we instead write the equivalent xorq line. We are able to use the fact tha

I wrote a peephole optimizer (for QBE) by ingve

I wrote a peephole optimizer (for QBE) by ingve

Share This Article

Newsletter

[prev]

[next]

2022-03-30
I wrote a peephole optimizer (for QBE), part 1

A peephole optimizer

Choosing a language

Reading in one line at a time

Find and replace

HackTech

Leave a comment Cancel reply

Editor's Choice

I wrote a peephole optimizer (for QBE) by ingve

I wrote a peephole optimizer (for QBE) by ingve

Share This Article

Newsletter

[next]

2022-03-30I wrote a peephole optimizer (for QBE), part 1

A peephole optimizer

Choosing a language

Reading in one line at a time

Find and replace

Leave a comment Cancel reply

Editor's Choice

Sign Up to Our Newsletter

2022-03-30
I wrote a peephole optimizer (for QBE), part 1