You might think of the Intel 386 processor (1985) as just an early processor in the x86
line, but the 386 was a critical turning point for modern computing in several ways.1
First, the 386 moved the x86 architecture to 32 bits, defining the dominant computing
architecture for the rest of the 20th century.
The 386 also established the overwhelming importance of x86, not just for Intel, but for the entire computer
industry.
Finally, the 386 ended IBM’s control over the PC market, turning Compaq into the architectural
leader.
In this blog post, I look at die photos of the Intel 386 processor and explain what they reveal
about the history of the processor, such as the move from the 1.5 µm process to the
1 µm process.
You might expect that Intel simply made the same 386 chip at a smaller scale, but there were
substantial changes to the chip’s layout, even some visible to the naked eye.2
I also look at why the 386 SL had about three times as many transistors as the other 386 versions.3
The 80386 was a major advancement over the 286: it implemented a 32-bit architecture,
added more instructions, and supported 4-gigabyte segments.
The 386 is a complicated processor (by 1980s standards), with 285,000 transistors, ten times the number of the original 8086.4
The 386 has eight logical units
that are pipelined5 and operate mostly autonomously.6
The diagram below shows the internal structure of the 386.7
The 386 with the main functional blocks labeled. I created this image using a die photo from Antoine Bercovici.
The heart of a processor is the datapath, the components that hold and process data.
In the 386, these components are in the lower left: the ALU (Arithmetic/Logic Unit), a barrel shifter to shift data, and the registers.
These components form regular rectangular blocks, 32 bits wide.
The datapath, along with the circuitry to the left that manages it, forms the Data Unit.
In the lower right is the microcode ROM, which breaks down machine instructions into
micro-instructions, the low-level steps of the instruction.
The microcode ROM, along with the microcode engine circuitry, forms the Control Unit.
The 386 has a complicated instruction format.
The Instruction Decode Unit breaks apart an instruction into its component parts
and generates a pointer to the microcode that implements the instruction.
The instruction queue holds three decoded instructions.
To improve performance, the Prefetch Unit reads instructions from memory before they are
needed, and stores them in the 16-byte prefetch queue.8
The 386 implements segmented memory and virtual memory, with access protection.9
The Memory Management Unit
consists of the Segment Unit and the Paging Unit:
the Segment Unit translates a logical address to a linear address, while the Paging Unit
translates the linear address to a physical address.
The segment descriptor cache and page cache (TLB) hold data about segments and pages;
the 386 has no on-chip instruction or data cache.10
The Bus Interface Unit in the upper right handles communication between the 386 and the external
memory and devices.
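The two translation steps performed by the Segment Unit and the Paging Unit can be sketched in a few lines of Python. This is a simplified model, not a description of the chip's circuitry: the function names, the example segment, and the dictionary-based page tables are invented for illustration, and the protection checks are reduced to a single limit test. The 10/10/12-bit split of the linear address and the 4 KB page size follow the 386's documented paging format.

```python
PAGE_SIZE = 4096  # the 386 uses 4 KB pages


def segment_translate(base, limit, offset):
    """Segment Unit step: logical address (segment + offset) -> linear address."""
    if offset > limit:
        # Simplified stand-in for the 386's protection checks.
        raise ValueError("offset exceeds segment limit")
    return (base + offset) & 0xFFFFFFFF  # linear addresses are 32 bits


def page_translate(linear, page_directory):
    """Paging Unit step: linear address -> physical address (two-level walk)."""
    dir_index = (linear >> 22) & 0x3FF    # top 10 bits pick a page table
    table_index = (linear >> 12) & 0x3FF  # next 10 bits pick a page frame
    offset = linear & 0xFFF               # low 12 bits are the offset in the page
    frame = page_directory[dir_index][table_index]
    return frame * PAGE_SIZE + offset


# Hypothetical segment and page tables, invented for this example.
linear = segment_translate(base=0x00400000, limit=0xFFFF, offset=0x1234)
physical = page_translate(linear, {1: {1: 0x5A}})
print(hex(linear), hex(physical))
```

In this model every access walks the tables. On the real chip, the segment descriptor cache and the TLB hold recently used entries so that most translations avoid those extra memory reads.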
Silicon dies are often labeled with the initials of the designers. The 386 DX, however,
has an unusually large number of initials. In the image below, I have enlarged the tiny initials so they are visible.
I think the designers put their initials next to the unit they worked on, but
I haven’t been able to identify most of the names.11
The 386 die with the initials magnified.
The shrink from 1.5 µm to 1 µm
The original 386 was built on a process called CHMOS-III that had 1.5 µm features (specifically the gate channel length for a transistor).
Around 1987, Intel moved to an improved process called CHMOS-IV, with 1 µm features,
permitting a considerably smaller die for the 386.
However, shrinking the layout wasn’t a simple mechanical process. Instead, many changes were
made to the chip, as shown in the comparison diagram below.
Most visibly, the Instruction Decode Unit and the Protection Unit in the center-right are
horizontal in the smaller die, rather than vertical.
The standard-cell logic (discussed later) is considerably more dense, probably due to
improved layout algorithms.
The datapath (left) was already highly optimized in the original, so its layout remained essentially unchanged, just smaller.
One complication is that the bond pads around the border needed to remain the same size so bond wires could be attached.
To fit the pads around the smaller die, many of the pads are staggered.
Because different parts of the die shrank differently, the blocks no longer fit together as compactly, creating wasted space at the bottom of the die.
For some reason, the numerous initials on the original 386 die were removed.
Finally, the new die was labeled 80C386I with a copyright date of 1985, 1987; it is unclear what “C” and “I” indicate.
Comparison of the 1.5 µm die and the 1 µm die at the same scale. Photos courtesy of Antoine Bercovici.
The change from 1.5 µm to 1 µm may not sound significant, but it reduced the die area by
60%.
This allowed more dies on a wafer, substantially dropping the manufacturing cost.12
This approach, shrinking an existing processor to a new process before designing a new microarchitecture
for that process, later became Intel's tick-tock strategy.
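A quick back-of-the-envelope calculation shows why a seemingly small change in feature size matters so much. The sketch below assumes that the 60% figure refers to die area and that the number of dies per wafer scales inversely with die area, ignoring wafer-edge losses and yield. The function name is hypothetical.

```python
def ideal_area_ratio(old_feature_um, new_feature_um):
    """Area ratio if every dimension scaled linearly with the feature size."""
    return (new_feature_um / old_feature_um) ** 2


# A pure optical shrink from 1.5 um to 1 um scales each side by 2/3.
ideal_ratio = ideal_area_ratio(1.5, 1.0)
ideal_reduction = 1 - ideal_ratio          # roughly 56%

# Figure quoted in the text, taken here as an area reduction (assumption).
reported_reduction = 0.60
dies_per_wafer_gain = 1 / (1 - reported_reduction)

print(f"ideal area reduction: {ideal_reduction:.1%}")
print(f"dies per wafer, relative to the original: {dies_per_wafer_gain:.2f}x")
```

Under these assumptions, a pure linear shrink would cut the area by a little over half, and the quoted 60% corresponds to roughly two and a half times as many dies from the same wafer area.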
The 386 SX
In 1988, Intel introduced the 386 SX processor, the low-cost version of the 386,
with a 16-bit bus instead of a 32-bit bus.
(This is reminiscent of the 8088 processor with an 8-bit bus versus the 8086 processor
with a 16-bit bus.)
According to the 386 oral history,
the cost of the original 386 die decreased to the point where the chip’s package cost about as
much as the die.
By reducing the number of pins, the 386 SX could be put in a one-dollar plastic package
and sold for a considerably reduced price.
The SX allowed Intel to segment the market, moving low-end customers from the 286 to the 386 SX, while preserving the
higher sales price of the original 386, now called the DX.13
In 1988, Intel sold the 386 SX for $219, at least $100 less than the 386 DX.
A complete SX computer could be $1000 cheaper than a similar DX model.
For compatibility with older 16-bit peripherals, the original 386 was designed to support a mixture of 16-bit and 32-bit buses, dynamically
switching on a cycle-by-cycle basis if needed.
Because 16-bit support was built into the 386, the 386 SX didn't require much design work,
unlike the 8088, which required a redesign of the 8086's bus interface unit.
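To get a feel for what the narrower bus costs, here is a simplified model of how many bus cycles a transfer needs at a given bus width. It assumes that each cycle moves one naturally aligned bus-width unit and ignores wait states, pipelining, and other timing details. The function name and the addresses are made up for the example.

```python
def bus_cycles(address, size_bytes, bus_width_bytes):
    """Count the aligned bus-width units that a transfer touches.

    Simplified model: one bus cycle per aligned unit, no wait states.
    """
    first_unit = address // bus_width_bytes
    last_unit = (address + size_bytes - 1) // bus_width_bytes
    return last_unit - first_unit + 1


# An aligned 32-bit read on a 32-bit bus versus a 16-bit bus.
print(bus_cycles(0x1000, 4, 4))
print(bus_cycles(0x1000, 4, 2))

# A misaligned 32-bit read on a 16-bit bus touches three 16-bit units.
print(bus_cycles(0x1001, 4, 2))
```

In this model, an aligned 32-bit access takes one cycle on a 32-bit bus and two on a 16-bit bus, which is the basic trade-off the SX makes in exchange for fewer pins.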
The 386 SX was built at both 1.5 µm and 1 µm.
The diagram below compares the two sizes of the 386 SX die.
These photos may look identical to the 386 DX photos in the previous section,
but close examination shows a few differences.
Since the 386 SX uses fewer pins, it has fewer bond pads, eliminating the staggered pads of
the shrunk 386 DX.
There are a few differences at the bottom of the chip, with wiring in much of the 386 DX’s
wasted space.
Comparison of two dies for the 386 SX. Photos courtesy of Antoine Bercovici.
Of the two SX revisions,
the larger die is labeled “80P9”; Intel’s internal name for the chip was “P9”, using their
confusing series of P numbers.
The shrunk die is labeled “80386SX”, which makes more sense.
The larger die is copyright 1985, 1987, while the shrunk die (which should be newer) is copyright 1985 for some reason.
The larger die has mostly the same initials as the DX, with a few changes.
The shrunk die has about 21 sets of initials.
The 386 SL die
The 386 SL (1990) was a major extension to the 386, combining a 386 core and other functions on one chip to save power and space.
Named “SuperSet”, it was designed to corner the notebook PC market.14
The 386 SL chip included an ISA bus controller, power management logic, a cache controller
for an external cache, and the main memory controller.
As the die photo below shows, the 386 core itself takes up about 1/4 of the SL’s die.
The 386 core is very close to the standard 386 DX, but there are a few visible differences.
Most visibly, the bond pads and pin drivers have been removed from the core.
There are also some circuitry changes. For instance, the 386 SL core supports System Management
Mode, which suspends normal execution, allowing power management and other low-level hardware
tasks to be performed outside the regular operating system.
System Management Mode is now a standard part of the x86 line, but it was first introduced in the 386 SL.
The 386 SL die with functional blocks labeled. Die photo courtesy of Antoine Bercovici.
In total, the 386 SL contains 855,000 transistors,15 about three times as many as the regular 386 DX.
The cache tag RAM takes up a lot of space and transistors.
The cache data itself is external; this on-chip circuitry just manages the cache.
The other new components are largely implemented with standard-cell logic (discussed