The physics of semiconductor scaling require thinking about die area utilization in a new way.
The concept of dark silicon sounds almost mysterious, but it is a simple matter of physics. With advances in technology nodes and the ability to pack more and more transistors onto the same die, design engineers are hitting a wall where only a fraction of a design can be powered on at any one time because of power and thermal constraints.
Moreover, the challenges that force this kind of complex power management will only continue to grow at each new process node, observed Greg Yeric, senior principal engineer at ARM, during a short course at the 2014 IEEE International Electron Devices Meeting. Projections estimate the dark silicon fraction at about one-third of total die area at the 20nm technology node (including 16/14nm finFETs), increasing to as much as 80% by the 5nm node. Real products are likely to achieve better results, but power consumption clearly imposes an increasingly severe design constraint.
Put another way, dark silicon is the gap between what can be done geometrically and the power density seen when trying to utilize the silicon 100%, explained Steve Carlson, vice president of marketing for low power and mixed-signal solutions at Cadence. “At the same time, you’re trying to improve the computational efficiency that you have in your SoC, and there’s a well-documented 100X to 1000X difference in energy efficiency between hard-coded functionality in transistors and a programmable processor. So where the use model for a particular device is well understood, you take some of the stuff that you would have had in software, you put it into hardware, and now you’re going to have a longer battery life and a cooler-running device.”
While there are a number of these specialized processors that fill up all of the silicon that can be put on advanced geometries, they can’t all be used at the same time or there will be a power density problem, he said. “You don’t want to go to a programmable solution that would let you use fewer transistors because it’s not going to be energy competitive, so you have to go through this complex orchestration of turning things on and off. And it turns out that most of those specialized processors are off most of the time, and that’s where you get this high percentage of dark silicon.”
Arvind Shanmugavel, director of application engineering at Ansys-Apache, noted that over the past couple of decades, because of the sensitivity around power and thermal, dark silicon hasn’t been discussed explicitly. Rather, it is built into the design methodologies designers adopt to achieve good scaling with both power and thermal in mind, and it is reflected in the architecture.
Dark silicon implications
More than designing for dark silicon, the challenge is how to design around it, Shanmugavel asserted. “That’s really the bigger question that we have to answer: How are we designing with power and thermal in mind in today’s designs?”
One approach involves architecture design. “By using different types of architectures, overall power can be reduced at the hardware architecture level. What we mean by that is, at the hardware level, when we do power gating or clock gating, we’re essentially able to power off parts of the design so that they do not consume any power. That is done to reduce the total thermal design power (TDP) envelope of chips when they are operating.”
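The budgeting idea behind this can be sketched in a few lines. This is a toy model, not any vendor's methodology; the block names, power figures, and TDP value are all invented for illustration.

```python
# Toy model: power-gating dark blocks to stay inside a TDP envelope.
# All block names and wattages below are illustrative assumptions.

BLOCKS = {            # block -> (active power W, residual power when gated W)
    "cpu_cluster": (2.0, 0.0),   # power-gated: leakage effectively eliminated
    "gpu":         (3.5, 0.0),
    "video_codec": (1.2, 0.0),
    "modem":       (0.8, 0.0),
}

TDP = 5.0  # watts (assumed budget)

def total_power(active):
    """Sum active power for enabled blocks, residual power for dark ones."""
    return sum(p_on if name in active else p_off
               for name, (p_on, p_off) in BLOCKS.items())

# Everything on at once blows the budget (7.5 W > 5 W)...
assert total_power(set(BLOCKS)) > TDP
# ...but a realistic use case, say video playback, fits with the rest dark.
print(total_power({"video_codec", "modem"}))  # well under the 5 W TDP
```

The point is simply that the sum over simultaneously-on blocks, not the sum over all blocks, is what must fit under the TDP.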
Waking them up, and making sure other parts of the chip know they are awake, isn’t so simple, though. “Suppose block A wants to talk to block B, but B is dark,” said Bernard Murphy, CTO at Atrenta. “First, A has to realize that B is dark and go into a wait state until B wakes up. Next, B has to wake up. Finally, A has to know that B has woken up so it can re-initiate its request. All of this requires handshaking in the communication. Next you have to deal with how long it takes for B to wake up before it returns to a usable state. If this is too long, you might be saving power but quality of service (QoS) could be severely impacted (choppy phone calls, for example). As a result, you have to consider whether you should retain some state in B through retention logic, even when it is dark, which would allow B to wake up faster. Retention logic impacts design not only in adding the logic to save state and recover on wakeup, but also in adding multiple reset types, including a reset that does not affect retention logic (e.g., a software reset) versus a full reset (a hardware reset).”
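The handshake Murphy describes can be modeled as a small state machine. The sketch below is a behavioral illustration under assumed numbers, not real power-controller logic: the wake-up latencies, and the assumption that retention cuts restore time to a quarter, are made up for the example.

```python
# Behavioral sketch of the A/B wake-up handshake. Latencies are assumptions.
import enum

class State(enum.Enum):
    DARK = "dark"
    WAKING = "waking"
    AWAKE = "awake"

class Block:
    def __init__(self, name, wakeup_cycles, retention=False):
        self.name = name
        self.state = State.DARK
        self.wakeup_cycles = wakeup_cycles
        self.retention = retention
        self._countdown = 0

    def request_wake(self):
        if self.state is State.DARK:
            self.state = State.WAKING
            # Retention logic preserved state, so restore is faster
            # (the 4x speedup here is an illustrative assumption).
            self._countdown = (self.wakeup_cycles // 4 if self.retention
                               else self.wakeup_cycles)

    def tick(self):
        if self.state is State.WAKING:
            self._countdown -= 1
            if self._countdown == 0:
                self.state = State.AWAKE

def send(request, b):
    """Block A holds its request in a wait state until B is usable."""
    cycles = 0
    b.request_wake()
    while b.state is not State.AWAKE:   # A waits; handshake incomplete
        b.tick()
        cycles += 1
    return cycles                        # wake-up latency as seen by A

plain = Block("B", wakeup_cycles=100)
with_ret = Block("B", wakeup_cycles=100, retention=True)
print(send("req", plain), send("req", with_ret))  # -> 100 25
```

The QoS tradeoff shows up directly: the retention variant answers in a quarter of the cycles, at the cost of the extra save/restore logic and reset types Murphy mentions.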
Murphy noted that dark silicon has an impact on the design process, as well. “You may decide that B can go dark, but during verification you realize that a part of B is being called on more frequently than you had expected and you can’t afford the latency required to turn B on each time, or the increased power drain that is also implied. So you decide that you need to pull that piece of logic out of B and put it into an always-on domain. This requires re-partitioning, restructuring the power intent, changing timing constraints and the floorplan (implementation these days is evolving almost in parallel with the design) and very possibly changing assertions and testbenches. It’s pretty messy. This is why there is growing demand for tools to help with this restructuring.”
Similarly, Shanmugavel pointed to different types of architectures that are evolving, such as ARM’s big.LITTLE architecture, which makes sure the optimal cores are used for different tasks. “We have one big core that can perform the really heavy lifting, whereas the smaller cores can perform smaller tasks but consuming only a fraction of the power. At a hardware architecture level we are able to achieve this level of powering on and doing power efficient design by using these techniques.”
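The placement decision at the heart of big.LITTLE-style scheduling can be reduced to a one-line policy. The sketch below is a deliberately simplified illustration; the throughput and power numbers, and the threshold policy, are assumptions, not ARM's actual scheduler.

```python
# Simplified big.LITTLE-style task placement. Numbers are illustrative.
BIG    = {"perf": 4.0, "power": 2.0}   # relative throughput, watts (assumed)
LITTLE = {"perf": 1.0, "power": 0.3}

def place(task_load, threshold=1.0):
    """Send a task to the big core only when it exceeds what a LITTLE
    core can handle; otherwise take the lower-power option."""
    return "big" if task_load > threshold * LITTLE["perf"] else "little"

for load in (0.2, 0.8, 3.5):
    print(load, "->", place(load))   # light tasks -> little, heavy -> big
```

Real schedulers use load tracking and migration hysteresis rather than a fixed threshold, but the energy argument is the same: run each task on the cheapest core that meets its demand.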
Software-level architecture, which entails scheduling at run-time, is another option for dealing with dark silicon. “As the processor is operating, runtime power and thermal management is becoming very important. For example, scheduling different instructions at different periods of time can optimize the overall power consumption, such as performing dynamic voltage and frequency scaling (DVFS) and getting feedback on how many instructions are queued up, or on whether my TDP (thermal design power) is going to be exceeded if I operate at this frequency for a sustained period of time. Those are questions that the software architecture takes into account before kicking in DVFS types of schemes,” he continued.
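A minimal DVFS governor following that logic might look like the sketch below. The operating points, the effective-capacitance power model, the TDP value, and the queue-depth threshold are all illustrative assumptions, not any real governor's parameters.

```python
# Hypothetical DVFS governor sketch: pick the fastest operating point whose
# projected dynamic power stays inside the TDP. All numbers are assumptions.

OPP = [  # (frequency GHz, voltage V) operating performance points
    (0.5, 0.70), (1.0, 0.85), (1.5, 1.00), (2.0, 1.15),
]
C_EFF = 2.0   # effective switched capacitance (assumed units give watts)
TDP   = 4.0   # watts

def dyn_power(f_ghz, v):
    # Dynamic power: P = C * V^2 * f  (quadratic in voltage, linear in f)
    return C_EFF * v * v * f_ghz

def choose_opp(queue_depth):
    """Deep instruction queues demand speed, but never past the TDP."""
    candidates = OPP if queue_depth > 100 else OPP[:2]  # light load: stay low
    legal = [(f, v) for f, v in candidates if dyn_power(f, v) <= TDP]
    return max(legal)  # fastest point that fits the budget

print(choose_opp(queue_depth=500))  # -> (1.5, 1.0): 2 GHz would exceed 4 W
```

Even with a deep queue, the governor refuses the top operating point because sustained operation there would break the TDP, which is exactly the feedback loop Shanmugavel describes.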
Another approach is to look at the actual hardware techniques people are using to reduce the overall power. That becomes very important, especially as designs push into finFET technology nodes. “Reducing the operating voltage of the chip can significantly reduce the overall power consumed, and we’re not only getting into simple voltage reduction techniques but also into near-threshold computing, where devices are literally operating near the threshold of the actual device with very low margin for noise,” Shanmugavel said. “Near-threshold computing can significantly reduce the overall power consumed because power scales quadratically with voltage. We are also seeing techniques where people are trying to lower the overall switching capacitance and the switching activity during the place-and-route phase, where several different hardware techniques are being employed to reduce the overall power of the SoC.”
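The quadratic relationship is worth making concrete. The two supply voltages below are illustrative assumptions (performance loss and noise margin, which near-threshold designs must also manage, are deliberately left out of this arithmetic).

```python
# Why near-threshold operation pays off: dynamic power scales with V^2,
# so halving the supply cuts switching power to a quarter.
# The voltage values are illustrative assumptions.

V_NOM = 0.9    # nominal supply, volts (assumed)
V_NT  = 0.45   # near-threshold supply, close to the device Vt (assumed)

ratio = (V_NT / V_NOM) ** 2
print(f"dynamic power drops to {ratio:.0%} of nominal")  # -> 25%
```

The catch, as the quote notes, is that maximum frequency drops too and noise margins shrink, which is why near-threshold operation suits throughput-oriented, parallelizable workloads rather than latency-critical ones.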
There are reliability implications to design approaches, of course—both operational and lifetime reliability. Both can be affected by the design style chosen for dark silicon.
In a large SoC, in order to meet the TDP envelope, design engineers tend to turn on only some parts of the die, and only for small fractions of time. Because there are a large number of power delivery networks along with large fluctuations in the overall power noise, ESD (electrostatic discharge) and electromigration (EM) issues increase and become more important to model.
Carlson said the added complication of dark silicon has pushed design techniques toward the extreme in terms of applying things like power shutdown and DVFS, given the increasing number of power domains in design today. “It’s not uncommon to see 200 or 300 different power domains that are all managed and switched independently. There are a lot of use cases that need to be examined to make sure that you don’t inadvertently turn too many of those on at once, or you run into thermal issues.”
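The use-case audit Carlson describes amounts to checking, for each scenario, whether the simultaneously-enabled domains fit the thermal budget. The sketch below is a toy version of that check; the domain count, per-domain powers, budget, and use-case definitions are all invented for illustration.

```python
# Toy use-case power audit across many independently switched power domains.
# Domain count, powers, and the TDP budget are illustrative assumptions.
import random
random.seed(7)  # deterministic made-up data

# A few hundred domains, each drawing a small active power.
domains = {f"pd{i}": random.uniform(0.01, 0.10) for i in range(250)}
TDP = 6.0  # watts (assumed)

def check_use_case(name, enabled):
    """Return the scenario's total power and whether it fits the budget."""
    power = sum(domains[d] for d in enabled)
    status = "OK" if power <= TDP else "VIOLATION"
    return name, round(power, 2), status

# A realistic scenario enables a few dozen domains; "everything on" cannot.
print(check_use_case("camera_capture", list(domains)[:40]))
print(check_use_case("everything_on", list(domains)))
```

In practice this kind of check runs over power-intent descriptions and measured or simulated per-domain power, and the interesting failures are exactly the use cases where too many domains are inadvertently on at once.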
Beyond just thermal issues, dark silicon complexities may mean an SoC is over-designed, which is a waste of valuable resources. Adding more cores doesn’t mean they will actually be used, for example.
“You may actually have an eight-core system, but because of this problem four of them might have to be dark,” said Krishna Balachandran, product marketing director for low power at Cadence. “As such, the computations you think you’re going to be able to perform or the speed at which you’re going to be able to perform a task will be impacted because of the power concern and that’s how the dark silicon materializes.”
The complexity of managing everything dark silicon demands is going through the roof, and the verification side of the design task is driving the need for more resources, such as emulation, into play.
“Physics demand that this is the new normal. If you’re really trying to squeeze the most out of your battery life or control temperature to the greatest degree, you have to use techniques like this. As long as there is leakage, you’re going to have to shut stuff down and make it dark. Otherwise, you’re at a disadvantage,” Carlson added.
Shanmugavel agreed. “At bleeding-edge nodes, the dark silicon issue is only going to get worse. As we go into 10nm we can certainly fit more transistors in the same area. We are able to scale the transistor dimensions. We are able to scale the metal geometries and die sizes, and functionality is going to constantly increase. Moreover, we are going to rely more heavily on architectural partitioning to understand what parts of the die can operate at what periods of time. That is going to be the single biggest implication as we pack more functionality onto the same die.”
Looking back over the past 15 years of ASIC design, he continued, designers tended to design with pure performance in mind. “Performance was the only metric that people thought about while designing silicon, and they wanted to pack in as many transistors as possible, have the highest frequency, get the best performance. Today, the philosophy is very different. The first question is: What is my end product? Am I going to satisfy the TDP of the end product? From that, you work backwards to your design. It’s a complete paradigm shift in terms of how we think about design going forward.”
Mark Milligan, vice president of marketing at Calypto, observed that dark silicon dispels the notion that transistors are free. “Power is the number one reason for the high percentage of dark silicon; hence, it should be considered during the design phase as an important design metric. Designers need to control overall project cost and thus reduce the percentage of dark silicon in their design. Designers need to be educated on the choice of low-power micro-architectures for their design and the tradeoffs involved.”