
Future Chips Will Be Hotter Than Ever
For over 50 years now, egged on by the seeming inevitability of Moore’s Law, engineers have managed to double the number of transistors they can pack into the same area every two years. But while the industry was chasing logic density, an unwanted side effect became more prominent: heat.
In a system-on-chip (SoC) like today's CPUs and GPUs, temperature affects performance, power consumption, and energy efficiency. Over time, excessive heat can slow the propagation of critical signals in a processor and lead to permanent degradation of a chip's performance. It also causes transistors to leak more current and, as a result, waste power. In turn, the increased power consumption cripples the energy efficiency of the chip, as more and more energy is required to perform the exact same tasks.
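To see why leakage matters so much, consider a common rule of thumb: subthreshold leakage grows roughly exponentially with temperature, often approximated as doubling every 10 °C or so, though the exact rate depends on the process. The short Python sketch below illustrates that relationship; the reference leakage, temperatures, and doubling interval are illustrative assumptions, not measured values.

```python
# Illustrative model of subthreshold leakage power versus temperature.
# The "doubles every ~10 degC" rate is a rough rule of thumb and varies
# by process technology; all values here are assumptions for illustration.

def leakage_power(temp_c, p_ref_w=1.0, temp_ref_c=25.0, doubling_c=10.0):
    """Leakage power (W) at temp_c, given p_ref_w of leakage at temp_ref_c."""
    return p_ref_w * 2 ** ((temp_c - temp_ref_c) / doubling_c)

for t in (25, 45, 65, 85):
    print(f"{t:>3} degC: {leakage_power(t):5.1f} W")
# 1.0 W at 25 degC grows to 64.0 W at 85 degC under these assumptions:
# the same chip doing the same work wastes far more power when it runs hot.
```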
The root of the problem lies with the end of another law: Dennard scaling. This law states that as the linear dimensions of transistors shrink, voltage should decrease such that the total power consumption for a given area remains constant. Dennard scaling effectively ended in the mid-2000s, at the point where any further reductions in voltage were not feasible without compromising the overall functionality of transistors. Consequently, while the density of logic circuits continued to grow, power density did as well, generating heat as a by-product.
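The arithmetic behind Dennard scaling is simple enough to sketch. Dynamic power per transistor goes roughly as C·V²·f; if linear dimensions, capacitance, and voltage all shrink by a factor k while frequency rises by k, power per device falls by k², exactly offsetting the k² more devices packed into the same area. Hold the voltage flat, as has effectively been the case since the mid-2000s, and power density instead grows as k². The Python sketch below makes the comparison explicit, using idealized first-order scaling factors rather than real node data.

```python
# Dennard scaling in one function: a transistor's dynamic power modeled
# as P = C * V^2 * f, packed at k^2 devices per unit area after scaling.

def power_density(k, voltage_scales=True):
    """Relative power density after scaling linear dimensions by 1/k."""
    c = 1.0 / k                                 # capacitance shrinks with dimensions
    v = (1.0 / k) if voltage_scales else 1.0    # classic Dennard vs. post-Dennard
    f = k                                       # frequency rises as gates shorten
    per_transistor = c * v**2 * f               # dynamic power per device
    return per_transistor * k**2                # k^2 more devices per unit area

for k in (1, 2, 4):
    print(f"k={k}: classic={power_density(k):4.1f}x  "
          f"fixed-V={power_density(k, voltage_scales=False):5.1f}x")
# Classic scaling holds power density at 1.0x at every k.
# With voltage fixed, power density grows as k^2: the heat problem.
```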
As chips become increasingly compact and powerful, efficient heat dissipation will be crucial to maintaining their performance and longevity. To ensure this efficiency, we need a tool that can predict how new semiconductor technology (processes to make transistors, interconnects, and logic cells) changes the way heat is generated and removed. My research colleagues and I at Imec have developed just that. Our simulation framework uses industry-standard and open-source electronic design automation (EDA) tools, augmented with our in-house tool set, to rapidly explore the interaction between semiconductor technology and the systems built with it.
The results so far are inescapable: The thermal challenge is growing with each new technology node, and we’ll need new solutions, including new ways of designing chips and systems, if there’s any hope that they’ll be able to handle the heat.
The Limits of Cooling
Traditionally, an SoC is cooled by blowing air over a heat sink attached to its package. Some data centers have begun using liquid instead because it can absorb more heat than gas. Liquid coolants—typically water or a water-based mixture—may work well enough for the latest generation of high-performance chips such as Nvidia’s new AI GPUs, which reportedly consume an astounding 1,000 watts. But neither fans nor liquid coolers will be a match for the smaller-node technologies coming down the pipeline.
Heat follows a complex path as it's removed from a chip, but 95 percent of it exits through the heat sink. [Image: Imec]
Take, for instance, nanosheet transistors and complementary field-effect transistors (CFETs). Leading chip manufacturers are already shifting to nanosheet devices, which swap the fin in today's fin field-effect transistors for a stack of horizontal sheets of semiconductor. CFETs take that architecture to the extreme, vertically stacking more sheets and dividing them into two devices, thus placing two transistors in about the same footprint as one. Experts expect the semiconductor industry to introduce CFETs in the 2030s.
In our work, we looked at an upcoming version of the nanosheet called A10 (referring to a node of 10 angstroms, or 1 nanometer) and a version of the CFET called A5, which Imec projects will appear two generations after the A10. Simulations of our test designs showed that the power density in the A5 node is 12 to 15 percent higher than in the A10 node. This increased density will, in turn, lead to a projected temperature rise of 9 °C for the same operating voltage.
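A rough way to see how a power increase of that size maps onto a temperature rise is the standard lumped thermal-resistance model, T_junction = T_ambient + R_th × P. The sketch below uses hypothetical round numbers for power and thermal resistance, chosen only to show the shape of the calculation; they are not the inputs to our A10 and A5 simulations.

```python
# Back-of-envelope junction temperature with a lumped thermal-resistance
# model: T_junction = T_ambient + R_th * P. All numbers are hypothetical.

T_AMBIENT_C = 45.0     # assumed air temperature at the heat sink
R_TH_C_PER_W = 0.65    # assumed junction-to-ambient thermal resistance

def junction_temp(power_w):
    return T_AMBIENT_C + R_TH_C_PER_W * power_w

p_a10 = 100.0          # assumed baseline chip power (W)
p_a5 = p_a10 * 1.135   # midpoint of the 12 to 15 percent power increase

print(f"A10-like: {junction_temp(p_a10):.1f} degC")
print(f"A5-like:  {junction_temp(p_a5):.1f} degC")
print(f"Rise:     {junction_temp(p_a5) - junction_temp(p_a10):.1f} degC")
# With these assumptions, the ~13.5 percent power increase yields a rise
# of about 8.8 degC, in the ballpark of the projected 9 degC.
```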
Complementary field-effect transistors will stack nanosheet transistors atop each other, increasing density and temperature. To operate at the same temperature as nanosheet transistors (A10 node), CFETs (A5 node) will have to run at a reduced voltage. [Image: Imec]
Nine degrees might not seem like much. But in a data center, where hundreds of thousands to millions of chips are packed together, it can mean the difference between stable operation and thermal runaway—that dreaded feedback loop in which rising temperature increases leakage power, which increases temperature, which increases leakage power, and so on until, eventually, safety mechanisms must shut down the hardware to avoid permanent damage.
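The runaway loop itself is straightforward to simulate. The Python sketch below iterates the cycle (temperature sets leakage, leakage adds power, power raises temperature) using a lumped thermal model with invented parameters; the point is the two qualitative regimes, settling versus runaway, not the behavior of any real chip.

```python
# Minimal thermal-runaway loop: temperature raises leakage power, which
# raises temperature. All parameters are invented for illustration.

def simulate(p_dynamic_w, r_th=0.3, t_ambient_c=45.0,
             leak_ref_w=5.0, t_ref_c=25.0, doubling_c=20.0,
             t_shutdown_c=125.0, max_steps=100):
    t = t_ambient_c
    for step in range(max_steps):
        p_leak = leak_ref_w * 2 ** ((t - t_ref_c) / doubling_c)
        t_new = t_ambient_c + r_th * (p_dynamic_w + p_leak)
        if t_new >= t_shutdown_c:
            return f"runaway: shutdown threshold reached at step {step}"
        if abs(t_new - t) < 0.01:  # settled to a stable operating point
            return f"stable at {t_new:.1f} degC"
        t = t_new
    return f"no steady state after {max_steps} steps (near {t:.1f} degC)"

print(simulate(p_dynamic_w=60))    # modest power: the loop settles near 70 degC
print(simulate(p_dynamic_w=150))   # higher power: temperature diverges to shutdown
```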
Researchers are pursuing advanced alternatives to basic liquid and air cooling that may help mitigate this kind of extreme heat. Microfluidic cooling, for instance, uses tiny channels etched into a chip to circulate a liquid coolant inside the device. Other approaches include jet impingement, which involves spraying a gas or liquid at high velocity onto the chip's surface, and immersion cooling, in which the entire printed circuit board is dunked in a coolant bath.
But even if these newer techniques come into play, relying solely on cooling won't be enough to handle the heat of future chips; the way chips and systems themselves are designed will have to change as well.