For reasons I won’t get into, I’ve been working on a tricky reverse engineering puzzle recently: how to patch the operating system of a 26-year-old synthesizer. To be specific, the Kurzweil K2500, a sample-based synthesizer released in 1996.
As with many digital musical instruments, this synthesizer is really just a computer with some extra chips. In this case, it’s a computer built around a CPU that was popular at the time: the Motorola 68000, which was also famously used in the original Macintosh and the Sega Genesis. I want to patch the operating system of this beast to do all sorts of other things, most of which I’ll leave to the imagination in this already-very-long post.
Finding the Operating System
Modifying the operating system sounds great, but how do we get access to the code in the first place? Luckily, the K2500 operating systems are still provided by the manufacturer on what looks like an old FTP site. Downloading and unzipping the operating system gives us a .KOS file, which seems to be a custom format. Opening the file in Hex Fiend shows its bytes directly:
Unfortunately, nothing stands out here. There seems to be a human-readable 4-byte header at the top: SYS0, possibly followed by other header bytes, but it’s really hard to tell. Regardless, we already know that this operating system runs on a Motorola 68000 CPU. Let’s just try interpreting the data as a binary, and see how far we can get.
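To make the header guess concrete, here is a minimal Python sketch that splits the first bytes of an image into a 4-byte ASCII tag and a few big-endian 32-bit words (the 68000 is big-endian). Only the SYS0 tag comes from the real file; the function name `describe_header` and every byte after the tag in `sample` are invented for illustration.

```python
import struct

def describe_header(data: bytes, magic_len: int = 4, peek: int = 16) -> dict:
    """Split the start of a firmware image into an ASCII tag and raw 32-bit words."""
    magic = data[:magic_len]
    # Only unpack whole 32-bit words, read big-endian like the 68000 would.
    usable = min(len(data), peek) // 4 * 4
    words = struct.unpack(f">{usable // 4}I", data[:usable])
    return {
        "magic": magic.decode("ascii", errors="replace"),
        "words": [f"0x{w:08X}" for w in words],
    }

# Hypothetical stand-in for the first 16 bytes of a .KOS file.
# Everything after b"SYS0" is made up, not taken from the real image.
sample = b"SYS0" + bytes.fromhex("0001e240") + b"\x4e\x71" * 4
info = describe_header(sample)
print(info["magic"])
print(info["words"])
```

To try it on a real file, replace `sample` with `open(path, "rb").read()`. Words that look like plausible lengths or addresses are candidates for header fields, but nothing here proves where the header ends.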
Enter Ghidra
The operating system file we’re using is probably raw machine code: literally the instructions and data interpreted by the CPU itself. To make any sense of this whatsoever, we’re going to need to disassemble it, to turn it back into assembly code – and hopefully eventually decompile it back into C-style code.
To do this, let’s use a tool called Ghidra: an open-source reverse-engineering program built, maintained, and released by the United States National Security Agency. (Yes, that one. Really.) To start, let’s import the .KOS file directly into Ghidra and analyze it with the default settings, which will search for instructions.
Scrolling through the file shows that parts of the data have been analyzed by Ghidra as valid 68k instructions, but much of the file remains unanalyzed. Strangely, scrolling further through the file shows that Ghidra has correctly identified a number of human-readable strings (great!), but the code seems to refer to addresses that are off by some fixed amount from where those strings start, so the references show up as cut-off strings in Ghidra.
This is because we just loaded the entire .KOS file into Ghidra, ignoring the fact that it has a header and likely some other extra bytes. This is a pretty big problem. Any cross-references between functions will be inaccurate as we continue to reverse-engineer the data, sending us in the wrong direction nearly every time we try to follow a reference. We need to fix this first.
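Here is a toy Python illustration of why unaccounted-for header bytes turn references into cut-off strings. Everything in it is made up: the two strings, the 8-byte header length, and the helper `c_string_at`. The point is only that an address that is correct for the in-memory image lands in the wrong place when the same address is applied to the file with its header still attached.

```python
def c_string_at(buf: bytes, addr: int) -> str:
    """Read a NUL-terminated ASCII string starting at addr."""
    end = buf.index(b"\x00", addr)
    return buf[addr:end].decode("ascii")

# Toy "memory image": two C strings back to back (invented for this example).
image = b"Loading system...\x00Bad checksum\x00"
ref = image.index(b"Bad checksum")  # the address the code would reference

# Hypothetical on-disk file: the same image with an 8-byte header in front.
HEADER_LEN = 8
on_disk = b"SYS0" + bytes(HEADER_LEN - 4) + image

print(c_string_at(image, ref))                 # Bad checksum
print(c_string_at(on_disk, ref))               # stem...
print(c_string_at(on_disk, ref + HEADER_LEN))  # Bad checksum
```

The middle line is the situation we are in: the reference resolves, but into the tail of a neighbouring string. Once the header length is known, either the header is stripped before import or every address is shifted by that amount.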
Reverse Engineering the Bootloader
To reverse-engineer the .KOS file, it would be extremely useful to dig into the code that creates or consumes these files. We don’t have the creation code, but we do have access to the code that consumes these files: the bootloader for the synth itself, which is also still available online! Let’s load it into Ghidra and make an assumption to make our lives easier: let’s guess that the first 8 bytes of the file are part of a header.
Where did that number come from? Well, I tried 0, +4, +8, +12, +16, and +20 byte offsets, and +8 produced the cleanest disassembly. Yes, this took a while. In hindsight, all of this also happens to work because the code in the file gets loaded into address 0x0 in memory. If it were loaded somewhere else, we’d have to figure out what that location is before we could effectively disassemble the code.
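The trial-and-error over offsets can be made a little less tedious by generating the trimmed candidates up front. The sketch below is one way to do that, not the process actually used for this post: the function names and the output naming scheme are hypothetical, and the list of offsets is simply the one tried above. Each output file can then be imported as a raw 68000 binary at base address 0x0 and compared by eye.

```python
from pathlib import Path

OFFSETS = (0, 4, 8, 12, 16, 20)  # the header lengths tried in the text

def strip_candidates(data: bytes, offsets=OFFSETS) -> dict:
    """Return {offset: data with that many leading bytes removed}."""
    return {off: data[off:] for off in offsets if off <= len(data)}

def write_candidates(src: Path, out_dir: Path) -> list:
    """Write one trimmed copy of src per candidate offset and return the paths."""
    data = src.read_bytes()
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for off, body in strip_candidates(data).items():
        # Hypothetical naming scheme: <original name>_skip08.bin, and so on.
        dest = out_dir / f"{src.stem}_skip{off:02d}.bin"
        dest.write_bytes(body)
        paths.append(dest)
    return paths

# Quick self-check on synthetic data: skipping 8 bytes leaves byte 8 first.
demo = strip_candidates(bytes(range(32)))
print(sorted(demo))
print(demo[8][0], len(demo[8]))
```

Judging which candidate "disassembles the most correctly" is still a manual step here; the script only removes the repetitive trimming.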
Just like before, let’s look for something human-readable first. Searching through the strings brings up a couple of error strings that seem like they might get thrown by the code we care about:
Ghidra has identified what it calls XREFs here – cross-references, indicating that these strings are referenced from a certain place in the code. Let’s follow this reference:
Aha! Now we’re getting somewhere. This looks an awful lot like a switch statement, decompiled by Ghidra here as an if tree. It seems like there are a series of error codes (0x100 through 0x105, then 0x200, 0x201, etc.) that each correspond to an error string that presumably gets printed on the screen. Let’s keep pulling on this thread. Using Ghidra’s “Find References” function, we end up at this function:
We’re getting closer! Ghidra’s done something great for us here: the decompiled code includes some variable names, automatically determined based on the strings that those variables point to. Given that we