Methodology, techniques and expectations need to be redefined at every process node.
The emphasis on lowering power in everything from wearable electronics to data centers is turning into a perfect storm for the semiconductor ecosystem. Existing methodologies need to be fixed, techniques need to be improved, and expectations need to be adjusted. And even then the problems won’t go away.
In the past, most issues involving power—notably current leakage, physical effects such as electromigration, electrostatic discharge, RC delay and reduced battery life from inefficient designs—were dealt with by large, sophisticated engineering teams at leading-edge process nodes. When they couldn’t solve those problems the foundries stepped in and adjusted their processes. But with 55nm now considered a mainstream process for the Internet of Things, and most designs now using multiple cores and power domains—sometimes as many as 100 power domains per design—everyone is being forced to grapple with incredibly complex power techniques.
To make matters worse, process engineers can’t bail them out anymore. The manufacturing side already is wrestling with its own power-related problems, such as shrinking gate oxides between ever-thinner wires, increasing dynamic power density at 16/14nm and beyond, and a massive and very costly effort to create next-generation processes to handle increasingly complex designs. That leaves even the best design engineering organizations struggling to make everything work within a very tight power budget and on an increasingly tight schedule.
“The ITRS roadmap was optimistic,” said Krishna Balachandran, low power product management director at Cadence. “The prediction was that at every node from 45nm to 10nm, power would decrease 4.5X while performance would increase 1.3X, and you would double the number of transistors. More recently ARM CTO Mike Muller said that you would only get a 2.4X increase in performance and power would drop about 60%.”
Verification and the flow
Power is hardly a new concept when it comes to design. The classic tradeoff between power, performance and area, or PPA, has been around for decades. The reality, though, is that until the advent of smart phones, the power component was largely an afterthought. If a design didn’t make the power budget in its first generation, chances were good it could be fixed, or at least improved, in the next.
“I’ve been doing low power design for 22 years,” said Srikanth Jadcherla, low power verification architect at Synopsys. “Classic low power used to be multipliers and adders. In the 1990s and early 2000s, we did OS management. The second generation was SoC firmware. Now we’re in the third generation, which is low power by default.”
In effect, power is now a critical component in every design, and in some it is the most critical.
“Power is four numbers—density, which is the heat aspect; delivery, which is management of peaks; leakage, which is idle power; and lifetime, which is reliability,” said Jadcherla. “Automotive and medical are textbook cases for outbound power management. The focus is on what you are controlling and connected control. In an automobile, at least you have a 12-volt battery. In medical, though, it’s really tough because you may be relying on fluorescence to measure whether someone has a disease. That has to go from 0 to multi-amp draw very quickly. We’re also moving into the IoT era, so people need to move from client/server architectures to client/aggregated server architectures. There is a lot of innovation happening there.”
Given the decreasing benefits of scaling, innovation is required on all fronts. Cadence’s Balachandran noted that at least part of the reason for reduced power and performance benefits from scaling is due to the advent of multicore architectures, which don’t match up with the ITRS predictions.
Architectural changes remain the biggest knob to turn in power reduction. Power needs to be thought about at the architectural level and integrated into the design, and the design needs to be adjusted to optimize power. Heterogeneous multicore is one new twist on that model. Some chipmakers are even attempting to add hardware accelerators onto an SoC to keep within their thermal and power budgets.
“We’re also seeing much more interest in near-threshold and sub-threshold designs,” said Balachandran. “You have to decrease the power of the processor and memory access power. If you can streamline memory access, that makes the power problem much better. 3D-IC is yet another way.”
There have been continual improvements everywhere, of course. Arvind Shanmugvel, senior director of applications engineering at Ansys, said the current generation iPhone offers 50 times the performance of the first generation with a 4X reduction in power. But it’s also getting harder to achieve those kinds of gains at advanced nodes.
“At 16/14nm and 10nm, you have a lot of challenges involving power integrity, reliability and thermal issues,” Shanmugvel said. “We’re seeing dynamic voltage drop in the middle-end-of-line metal layers, which can be 10% to 20% of the overall drop. That’s a large amount. We’re also seeing electromigration at lower nodes due to higher drive strength. RC delay is increasing. And at 10nm, thermal analysis will be foundry-mandated.”
Tools and techniques
Concern about power has prompted companies to start looking at a variety of “new” techniques—sub-threshold and near-threshold approaches, for example, along with various packaging approaches, new memory types—high-bandwidth memory, Hybrid Memory Cube, ReRAM, MRAM—and custom logic. While all of them are well into the R&D phase, with some production versions in the market, the challenge is reaching critical mass so there is enough history to make good choices.
Engineers typically don’t have time to think about what’s on the horizon. The metrics that matter most are reliability—increasingly a function of power over time—and the time it takes to get a complex design out the door. That complexity is exacerbated by an increasing number of power domains and power states in order to keep most of the silicon dark. And this isn’t just happening at the leading edge nodes. It’s increasing at every node, including 55nm, where power is a major market consideration for IoT devices. But it does get much worse as the nodes progress.
“At 65 and 130nm, we had about 10 characterized PVT (process, voltage, temperature) corners,” said Wolfgang Helfricht, platform marketing director for the physical design group at ARM. “At 16/14nm, there are 50 or more default PVT corners giving designers a lot of options for optimizing power and performance. The challenge is turnaround time. You want to integrate IP as fast as possible at each voltage domain, but that’s a logistics challenge because you also need to make sure you deal with all the corners and verify that your IP and SoC are working in all cases. Additionally, if you’re utilizing power gating and different sleep modes, each mode requires different characterizations and verification.”
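The arithmetic behind that logistics challenge can be sketched in a few lines. The corner counts quoted above (10 versus 50-plus) set the scale; the specific process splits, supply points, temperatures and sleep modes below are hypothetical placeholders, since the real lists are foundry- and library-specific:

```python
# Sketch: characterization effort grows as the product of PVT corners and
# power modes. The corner and mode names here are hypothetical examples.
from itertools import product

processes = ["ss", "tt", "ff"]                  # assumed process splits
voltages = ["0.72V", "0.80V", "0.88V"]          # assumed supply points
temps = ["-40C", "25C", "125C"]                 # assumed temperatures
modes = ["active", "retention", "power_gated"]  # assumed sleep modes

corners = list(product(processes, voltages, temps))  # 27 PVT corners
runs = len(corners) * len(modes)                     # one run per corner per mode

print(f"{len(corners)} corners x {len(modes)} modes = {runs} characterization runs")
```

Even this modest example yields 81 characterization runs; with 50-plus default corners and more sleep modes, the run count quickly dominates turnaround time.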
This is showing up on the verification side, as well.
“There are now hundreds of power domains, and you need much finer-grain control for more power states,” said Ellie Burns, product marketing manager for design verification technology at Mentor Graphics. “The complexity is exploding in verification. And we’re seeing the need for much finer-grain control of power in all markets. You need to look at all possible interactions because if you don’t verify them, a device might not even come out of power-down on reset.”
Burns said what’s changed is that having one power-aware tool is no longer enough. They all have to be power-aware, and there needs to be more formal analysis of power.
“The tools and methodology were okay to put together an SoC and verify it,” she said. “But with 100 power domains and RTL interactions, that’s no longer viable. It’s all breaking. We need a change in methodology so you can design IP with power in mind and pass it into the SoC. We are starting to see a trend to understand coverage and understand the state space, but the state space is huge.”
Methodologies are difficult to shift, though. They require rethinking entire processes within a design organization: what steps are taken, when, and by whom. Mobile device chipmakers have been wrestling with these kinds of issues for the past few process nodes. In a large organization, this kind of change is difficult at best.
On a positive note, though, standards are getting better, and the tools are at least becoming more versatile. Anand Iyer, director of product marketing for the low power platform group at Calypto, said the ability to estimate and then take more accurate readings is critical in large designs because of the need to understand system as well as localized power issues.
“Modeling and standardization by themselves do not solve the problem,” Iyer said. “You need to be able to do power analysis at a high level, then figure out what to do with that data.”
Given the number of IP blocks, transistors, memories, and wires on a chip, and the complex scheme for keeping most of them dark most of the time, it’s no surprise that power has become a major issue. But the effects are compounding as more things are added, including “always on” activities such as security. Good security by itself may require rethinking how the pieces go together, what stays on, and what’s needed to minimize the power needed to maintain a certain level of protection.
“One thing that is really growing in importance is power integrity,” said Steven Woo, vice president of enterprise solutions technology at Rambus. “A good analogy is what happens if you turn on all the water inside a building. You lose pressure everywhere. For a chip, if you turn on every subsystem, that’s devastating. You may not have enough voltage to turn on everything, and power integrity goes down.”
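Woo’s water-pressure analogy can be captured in a toy IR-drop model: a shared power delivery network has an effective resistance, so every subsystem switched on pulls the rail down for everyone else. All the numbers below (supply, PDN resistance, block currents, minimum operating voltage) are hypothetical, chosen only to illustrate the mechanism:

```python
# Toy model of rail droop on a shared power delivery network (PDN).
# All electrical values are assumed for illustration.

VDD = 0.80      # nominal supply (V), assumed
R_PDN = 0.01    # effective PDN resistance (ohms), assumed
V_MIN = 0.72    # minimum operating voltage (V), assumed 10% margin

subsystem_currents = [2.0, 3.5, 1.5, 4.0, 2.5]   # amps per block, hypothetical

def rail_voltage(active_currents):
    """On-die supply once the IR drop across the PDN is subtracted."""
    return VDD - R_PDN * sum(active_currents)

# Turning blocks on one at a time: each addition lowers the rail further.
for n in range(1, len(subsystem_currents) + 1):
    v = rail_voltage(subsystem_currents[:n])
    status = "OK" if v >= V_MIN else "BROWN-OUT"
    print(f"{n} blocks on: rail = {v:.3f} V  [{status}]")
```

With these assumed values, the first three blocks fit within the margin, but switching on the fourth drops the rail below the minimum, which is exactly the “turn on every subsystem” failure Woo describes.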
Security requires power to operate, but the flip side is that power is noisy. “When you activate circuits you can monitor that noise,” said Woo. “There’s a growing problem with differential power analysis. What it really comes down to is that you’re trying to give confidence for some period of time, so now you have to determine what is a useful lifetime and how long you’re going to guard it.”
Companies selling into the automotive industry are facing other constraints, in part because of the design cycle and in part because there are so many standards that parts need to adhere to. Power is a growing concern in those markets, in part because of the heat already being generated in a small space, and in part because of the impact on reliability of parts. The hotter the parts run, the more their reliability suffers.
“There has always been a debate about architectural and system-level power savings versus doing everything in the guts of the silicon,” said Bernard Murphy, CTO at Atrenta. “Saving power by clock gating late in the game is almost not worth it by the time you reach RTL, although at the IP level it may make sense. But the MCU guys aren’t doing any of that because of the qualification cycle with the automakers. They’re only doing biasing at this point, not clock gating or voltage islands.”
He said one technique that is getting more popular is clock gating of memories, using redundant write detection and more intelligent ways to shut down part of the memory.
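The redundant-write idea can be sketched behaviorally: if a write would store the value an address already holds, suppress it and keep the memory’s clock gated for that cycle. The interface below is a hypothetical software model, not any vendor’s implementation; in real hardware, detecting redundancy costs a compare against the stored or recently written data, so the technique pays off only when traffic is repetitive:

```python
# Behavioral sketch of memory clock gating via redundant-write suppression.
# This class and its counters are hypothetical, for illustration only.

class GatedMemory:
    def __init__(self, size):
        self.data = [0] * size
        self.writes_done = 0    # cycles the memory clock actually toggled
        self.writes_gated = 0   # redundant writes suppressed (power saved)

    def write(self, addr, value):
        if self.data[addr] == value:
            self.writes_gated += 1     # same value already stored: gate the clock
        else:
            self.data[addr] = value
            self.writes_done += 1      # real write: clock the memory

mem = GatedMemory(16)
for value in [5, 5, 5, 7, 7, 5]:       # repetitive traffic, common in practice
    mem.write(0, value)

print(f"performed {mem.writes_done} writes, gated {mem.writes_gated}")
```

In this contrived trace, half the write cycles are redundant and the memory clock stays gated for them, which is the kind of saving that makes the technique attractive without touching the qualification-sensitive parts of the design.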
Power has rapidly emerged as one of the thorniest issues in design, and there is no simple fix. More components mean more dark silicon, with battery life or energy consumption now considered part of the value proposition in electronics of all sorts. And at each new process node, the problems are getting harder to solve, more numerous, and more time consuming.
This isn’t a formula for success. Schedules are breaking, verification coverage is under pressure, and reliability is being questioned. Despite all the attention being paid to the ability to churn out chips at the next few process nodes, it may not be the process that holds up progress in upcoming designs. Power is a limiting factor, and it’s getting harder to solve.