What’s Wrong With Power Signoff

First of two parts: At advanced nodes, power signoff is no longer a single checklist item, and that has major implications for design teams.


Power signoff used to be a checklist item before a design went to tapeout. But as power has become a critical factor in designs, particularly at advanced nodes, signing off on power now needs to happen at multiple points throughout the design flow. That alone adds even greater complexity to already complex design processes, because it requires fixed reference points and scenarios for taking measurements. Moreover, it has to be done with far more precision, taking into account myriad possible interactions and power domains, and with much more uncertainty about how use cases ultimately will affect the chip’s total power budget.

Companies working on advanced designs are finding there are so many overlapping points in the flow where power needs to be addressed, and constantly fine-tuned against the power budget, that in many cases entire flows need to be shifted. As bad as this sounds, it gets worse when you consider that every measurement and change has to be synchronized. Power is a global issue with complex interactions that can easily push a design well past its allotted power budget. And it gets even more difficult to sort out once software is added into the mix, in part because early software development has to be reworked repeatedly just to stay current with all the power-management-driven changes in other parts of the design.

“We’ve managed to control the leakage current with finFETs and improve performance, but what’s happened is that power signoff is being magnified by these changes,” said Aveek Sarkar, vice president of product engineering and support at ANSYS/Apache. “So we’ve got reduced noise margin, reduced EM and ESD tolerance, and increased temperature effects.”

The solution to some of these problems has been architectural—more power domains, more switching, and even new transistors and more sophisticated packaging approaches. But the greater cell density, increased dynamic power density and more metal layers also increase the impedance and inductance, said Sarkar. “We’re seeing more noise fluctuation or deviation from a constant supply. The only way to solve this is to really understand how big a problem it is and improve the accuracy of the simulation. The power grid needs to be modified. You need to factor in voltage supply and ground. And the circuits have to be simulated while the circuit-switching scenario is actually happening. Plus, you have to include the package and the PCB. You may not have enough routing layers in the package.”
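
As a rough illustration of why the package and board have to be part of the analysis, the sketch below estimates supply droop from a lumped RL model of the power delivery network. This is a minimal sketch with hypothetical resistance, inductance and current values; real power-grid signoff works on a fully extracted network with vector-based switching activity, not a single lumped event.

```python
# Minimal sketch: first-order supply-droop estimate for a lumped power
# delivery network (die grid + package + board). All values are
# hypothetical placeholders, not characterized numbers.

R_PDN = 0.005       # ohms, effective series resistance of the PDN
L_PDN = 50e-12      # henries, effective loop inductance (mostly package/board)

I_block = 2.0       # amps, current drawn while the block is switching
dI = 1.5            # amps, change in current during the switching event
dt = 0.5e-9         # seconds, rise time of that current step

ir_drop = I_block * R_PDN       # resistive (IR) drop
ldi_dt = L_PDN * (dI / dt)      # inductive (L*di/dt) noise

print(f"IR drop:  {ir_drop * 1e3:.1f} mV")
print(f"L*di/dt:  {ldi_dt * 1e3:.1f} mV")
print(f"Total droop seen at the die: {(ir_drop + ldi_dt) * 1e3:.1f} mV")
```

Even with these placeholder numbers the inductive term dominates, which is one reason faster switching edges and a package short on routing layers show up directly as supply noise.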

A growing concern, regardless of market sector
What started out as a critical issue for chips in the mobile market has shifted to all markets at advanced nodes, and even at 28nm and 40nm, which are now considered mainstream. Missing power budgets on processors and advanced SoCs used to mean fixing the problem in the next release, particularly if the first release was a prototype rather than a final design. But at the most advanced nodes, the chip may not work at all.

“Because power is less precisely quantified than timing, it tends to be pushed a little bit, so designers who are meticulous about eliminating negative slack will allow some overage in power numbers,” said Rob Aitken, an ARM fellow. “This manifests itself in products. It’s easy to see that a product operates at 2.1 GHz, but it’s more difficult to determine its power numbers, because they are both challenging to measure and dependent on workload. But leakage numbers are also dependent on process distribution, and there can be an order of magnitude difference or more between best and worst case. There is also some process dependence on dynamic power, but it’s less.”

Aitken noted that power signoff occurs at the block level because there is a power budget for each block. And with IP, he said, you can depend on a vendor’s characterization if you know how it was done and what the numbers really mean: which corners were simulated, what switching activity was assumed for dynamic power, and how all of it ties into expected workloads. But that’s still only part of the picture.

“There’s a need to be able to think about this problem more holistically and to move power information around easily between levels of abstraction in a system,” he said. “There are tools that do this and models that can support it, but they are incomplete. There needs to be a better flow than using Excel spreadsheets to try to connect McPAT models for architects with SPICE for circuit designers. There’s also a need to understand the assumptions and margins that are built in at various levels and to make sure that those are necessary and accurate. The differences between average and worst-case workloads are key here, as is the influence of dynamic and (relatively) static IR drop, package design, thermal behavior and so on.”
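
To make that idea concrete, here is a minimal sketch in Python of carrying power numbers between abstraction levels with their assumptions attached, rather than as bare cells in a spreadsheet. The data structure, field names and values are hypothetical illustrations, not the format of any particular tool.

```python
# Minimal sketch: a power estimate that travels between abstraction levels
# together with the assumptions behind it. All names and numbers are
# hypothetical.

from dataclasses import dataclass

@dataclass
class PowerEstimate:
    block: str
    level: str          # e.g. "architectural", "RTL", "gate", "SPICE"
    corner: str         # process/voltage/temperature the number was taken at
    workload: str       # "average" or "worst-case" activity assumption
    margin_pct: float   # guard-band applied at this level
    dynamic_mw: float
    leakage_mw: float

    def budgeted_mw(self) -> float:
        """Total power with this level's margin applied."""
        return (self.dynamic_mw + self.leakage_mw) * (1.0 + self.margin_pct / 100.0)

estimates = [
    PowerEstimate("cpu_cluster", "architectural", "TT 0.80V 85C", "average", 20.0, 380.0, 30.0),
    PowerEstimate("cpu_cluster", "gate", "SS 0.72V 125C", "worst-case", 5.0, 445.0, 48.0),
]

for e in estimates:
    print(f"{e.level:14s} {e.workload:10s} margin {e.margin_pct:3.0f}% -> {e.budgeted_mw():.0f} mW")
```

Keeping the corner, workload and margin next to each number makes it possible to ask whether the guard-bands stacked up across levels are necessary and accurate, which is the question Aitken raises.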

What needs to be fixed and when
The challenge is getting enough points of reference for power early enough in the flow to actually do something about them. In the past, that was done with multiple iterations at a single process node. Those reference points are not available at the most advanced nodes, though.

“If you use a system over time, you can go back and look at the data,” said the president and CEO of Synapse Design. “Now you capture the data and compare that against the target as soon as you run the first step of the design. So you may have 60,000 instances that tell you you’re off 30%, but the data needs to be 90%-plus accurate. So it’s a series of capture and predict for area, timing and runtime. And then you find the code is not written in an optimal manner. You don’t get accurate data from that until you’re at the last 10% to 20% of your schedule, and that’s too late. So the first time you can really measure that is too late.”

The best solution—and still not a perfect one for any chipmaker—is to build in a rigorous methodology and predict, through experience, where the flaws are likely to show up.

“You also have power management constraints that have to be taken into account,” said Erich Marschner, verification architect at Mentor Graphics. “From an IP perspective, you need to know what flexibility you have in an IP block and how you can communicate to an end user what they can and cannot do with that block. In the UPF world, you may have a power domain that cannot be subdivided. Or you have isolation requirements, and you have to do that in a way that can meet the power budget.”
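
The kind of constraint Marschner is describing can be sketched in a few lines: any power domain that can be switched off must isolate the signals it drives into domains that stay powered. The domain names and connectivity below are hypothetical; in a real flow this intent is written in UPF and checked by power-aware verification tools.

```python
# Minimal sketch of an isolation check on power-domain crossings. Domains,
# signals and connectivity are hypothetical examples.

power_domains = {
    "PD_AON":   {"switchable": False, "isolated_outputs": set()},
    "PD_GPU":   {"switchable": True,  "isolated_outputs": {"gpu_irq", "gpu_rdata"}},
    "PD_MODEM": {"switchable": True,  "isolated_outputs": {"modem_irq"}},
}

# Signals crossing domains: (driving domain, signal, receiving domain)
crossings = [
    ("PD_GPU",   "gpu_irq",     "PD_AON"),
    ("PD_GPU",   "gpu_rdata",   "PD_AON"),
    ("PD_MODEM", "modem_irq",   "PD_AON"),
    ("PD_MODEM", "modem_rdata", "PD_AON"),   # no isolation strategy -- should be flagged
]

for driver, signal, receiver in crossings:
    domain = power_domains[driver]
    if domain["switchable"] and signal not in domain["isolated_outputs"]:
        print(f"ERROR: '{signal}' from {driver} to {receiver} needs an isolation cell")
```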

He said the larger issue is how to manage power from the system level. “One of the things we’ve been discussing in regard to system-level power modeling is how to get a more abstract power model that allows the process to be more streamlined. The software is trying to make decisions, but if you have an abstract power model you may be able to make good enough decisions. That requires heuristics and learning, though.”
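
What such an abstract power model might look like, in rough outline: per-block power states with modeled costs that policy software can query against a budget. This is a minimal sketch under those assumptions; the states, numbers and fallback heuristic are invented for illustration, not a real governor.

```python
# Minimal sketch: an abstract, state-based power model that software-level
# policy code could query. States and numbers are hypothetical.

BLOCK_STATES_MW = {
    "cpu":   {"off": 0.0, "retention": 2.0, "active": 450.0},
    "gpu":   {"off": 0.0, "retention": 5.0, "active": 650.0},
    "modem": {"off": 0.0, "retention": 1.0, "active": 180.0},
}

BUDGET_MW = 900.0

def estimated_power(config):
    """Sum the modeled power of each block in its requested state."""
    return sum(BLOCK_STATES_MW[block][state] for block, state in config.items())

request = {"cpu": "active", "gpu": "active", "modem": "retention"}

if estimated_power(request) > BUDGET_MW:
    # A real policy would use heuristics or learned behavior; this fallback
    # simply parks the GPU in retention to get back under budget.
    request["gpu"] = "retention"

print(request, f"~{estimated_power(request):.0f} mW against a {BUDGET_MW:.0f} mW budget")
```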

And it requires experience in working with complex power management schemes, which grow more complicated at each new process node and with the introduction of new transistor types, new materials, and more advanced software.

“The whole point of power management is the ability to stay within the power budget,” Marschner said. “But you also have a continuum based on whether every block has to be power sensitive or not. Some blocks are on all the time, so you don’t need power management for them. And then with system-level power, the hardware design is only half the problem. A lot of it is controlled by how the software utilizes the hardware. If you optimize the software, that goes well beyond the hardware.”

Coming in part two: Methodologies and tool strategies for dealing with power—and picking your battles to maximize resources.


