New tools, standards, languages and methodologies will be necessary to address growing power challenges at all process nodes.
Analyzing and managing power at the system level is becoming both more difficult and more important, yet the practice has been slow to catch on.
There are several reasons for this. First, design automation tools have lagged behind an understanding of what needs to be done. Second, modeling languages and standards are still in flux, and what exists today is considered inadequate. And third, while system-level power has been a growing concern, particularly at advanced nodes and for an increasing number of mobile devices that are being connected to the Internet, many chipmakers are just now beginning to wrestle with complex power management schemes.
On the tools front, some progress has been made recently.
“It might not be 100% there yet, but the tools are now starting to become available,” said Rob Knoth, product management director for Cadence’s Digital & Signoff Group. “So we’re at a bit of an inflection point where maybe a year or five years from now we’ll look back and see this is about the time when programmers started moving from the ‘Hey, we really need to be doing something about this’ stage into the ‘We are doing something about it’ mode.”
Knoth pointed to technologies such as high-level synthesis, hardware emulation, and more accurate power estimation being coupled together, along with the ability to feed data from software workloads all the way through silicon design and PCB design to knit the whole system together.
There has been progress in the high-level synthesis area, as well, in part because engineering teams are developing new algorithms and want to know how much power those algorithms will consume.
“It’s no longer acceptable to just look at an old design and try to figure it out,” said Ellie Burns, product manager of the Calypto Systems Division at Mentor, a Siemens Business. “It doesn’t really work very well anymore. So you have to be able to say, ‘I want to experiment with an algorithm. What power does it have during implementation?’”
This can mean running the design through to implementation as quickly as possible to determine power numbers. “Power is most accurate down at the gate level,” Burns said. “We’re a million miles from that, so what do you do? We’ve also seen some applications of machine learning where you start to learn from the gate-level netlist, etc., and can begin to store that and apply that from emulation.”
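As a rough illustration of the kind of machine learning Burns describes, the sketch below fits a simple regression from RTL- or emulation-level activity features to gate-level power numbers. The features, labels and model choice are all assumptions for illustration, not any vendor’s flow.

```python
# Sketch: learn a mapping from RTL-level activity features to gate-level power.
# Assumes training data has already been harvested from designs/scenarios where
# both RTL activity and signed-off gate-level power numbers exist.
import numpy as np
from sklearn.linear_model import Ridge

# Hypothetical features per time window: toggle counts, clock-gating duty, frequency.
X_train = np.array([
    # toggles_datapath, toggles_memio, clk_gating_duty, freq_mhz
    [1.2e6, 3.4e5, 0.62, 800],
    [0.8e6, 1.1e5, 0.45, 800],
    [2.1e6, 5.0e5, 0.80, 1000],
])
y_train_mw = np.array([412.0, 268.0, 655.0])   # gate-level power for the same windows

model = Ridge(alpha=1.0).fit(X_train, y_train_mw)

# Later, windows captured from emulation of a real software workload can be
# scored without re-running gate-level power analysis.
X_new = np.array([[1.5e6, 2.2e5, 0.55, 1000]])
print(f"estimated power: {model.predict(X_new)[0]:.0f} mW")
```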
All of these techniques and others are becoming important at 10/7nm, where dynamic current density has become problematic, and even at older nodes where systems are required to do more processing at the same or lower power.
“Part of this is optimizing signal integrity,” said Tobias Bjerregaard, CEO of Teklatech. “Part of it is to extend timing. What’s needed is a holistic approach, because you need to understand how power affects everything at the same time. If you’re looking at power integrity and timing, you may need to optimize bulk timing. This is not just a simple fix. You want to take whatever headroom is available and exploit what’s there so that you can make designs easier to work with.”
Bjerregaard said these system issues are present at every process node, but they get worse as the nodes shrink. “Timing, routability and power density issues go up at each new node, and that affects bulk timing and dynamic voltage drop, which makes it harder to close a design and achieve profitability.”
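A first-order sketch of why dynamic voltage drop makes designs harder to close: as the supply sags, critical-path delay pushes out and setup slack evaporates. The delay sensitivity and path numbers below are invented placeholders; real values come from library characterization at the target node.

```python
# Sketch: how dynamic voltage drop erodes timing headroom, to first order.
NOMINAL_VDD_MV = 750.0
CLOCK_PERIOD_PS = 1000.0          # 1 GHz target
PATH_DELAY_AT_NOMINAL_PS = 920.0  # critical-path delay with a clean supply (assumed)
DELAY_SENSITIVITY_PS_PER_MV = 1.8 # assumed delay push-out per mV of droop

def slack_under_droop(droop_mv: float) -> float:
    """Remaining setup slack (ps) when the supply sags by droop_mv."""
    pushed_out_delay = PATH_DELAY_AT_NOMINAL_PS + DELAY_SENSITIVITY_PS_PER_MV * droop_mv
    return CLOCK_PERIOD_PS - pushed_out_delay

for droop in (0, 20, 40, 60):
    print(f"droop {droop:2d} mV -> slack {slack_under_droop(droop):6.1f} ps")
```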
PPA
Design teams have always focused on the power/performance/area triumvirate, but at the system level power remains the biggest unsolved problem. Andy Ladd, CEO of Baum, said virtual platform approaches try to bring performance analysis to the system level, but power is not there yet.
“Power is all back-end loaded down at the gate and transistor level, and something needs to shift left,” Ladd said. “For this we need a faster technology. A lot of the tools today just run a piece of the design, or a segment of a scenario. They can’t run the whole thing. If you really want to optimize power at the system level, you have to include the software or realistic scenarios so the developers know how that device is going to run in a real application. Something needs to change. The technology has got to get faster, and you still have to have that accuracy so that the user is going to have confidence that what they are seeing is good. But it has to change.”
Graham Bell, vice president of marketing at Uniquify, agreed there is a real gap at the system level. “We don’t see solutions that really understand the whole hierarchy, from application payloads down to all the different power states of each of the units or blocks inside the design, whether they are CPUs, GPUs or specialized memory interfaces. All of these things have different power states, but there is no global management of that. So there needs to be some work done in the area of modeling, and there needs to be some work done in the area of standards.”
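A toy sketch of the kind of global bookkeeping Bell says is missing: each block exposes its power states, and a system-level view sums whatever combination of states is currently selected. The block names, states and numbers are invented for illustration.

```python
# Sketch: a toy global view of per-block power states across an SoC.
from dataclasses import dataclass

@dataclass
class Block:
    name: str
    states_mw: dict        # power state -> typical power in mW (invented numbers)
    current: str           # currently selected state

    def power(self) -> float:
        return self.states_mw[self.current]

soc = [
    Block("cpu_cluster", {"off": 0, "retention": 15, "run": 1200}, "run"),
    Block("gpu",         {"off": 0, "idle": 40,      "run": 900},  "idle"),
    Block("ddr_phy",     {"self_refresh": 25, "active": 300},      "active"),
]

total = sum(b.power() for b in soc)
print(f"current SoC power estimate: {total} mW")
for b in soc:
    print(f"  {b.name:12s} {b.current:12s} {b.power():6.0f} mW")
```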
The IEEE has been actively pushing along these lines for at least the last few years, but progress has been slow.
“There have been some initial efforts there but certainly instead of being reactive, which a lot of solutions are today, you really want to have a more proactive approach to power management,” Bell said.
The reactive approach is largely about tweaking gates. “You’re dealing with the 5% to 10% of power,” said Cadence’s Knoth. “You’re not dealing with the 80% you get when you’re working at the algorithm level, at the software level, at the system level, and that’s why power is really the last frontier of PPA. It requires the entire spectrum. You need the accuracy at the silicon and gate level, but you also need the knowledge and the applications to truly get everything. You can’t just say, ‘Pretend everything is switching at 25%,’ because then you are chasing ghosts.”
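The arithmetic behind Knoth’s warning follows from the standard dynamic-power relation P ≈ α·C·V²·f. The sketch below compares a blanket 25% switching assumption against a workload-derived activity mix; all capacitance, voltage and activity figures are illustrative assumptions.

```python
# Sketch: why a blanket switching-activity assumption "chases ghosts."
# Dynamic power per net ~ alpha * C * V^2 * f. Numbers below are illustrative.
C_EFF_F = 2e-15        # effective switched capacitance per net (2 fF, assumed)
VDD_V = 0.75
FREQ_HZ = 1.0e9
NUM_NETS = 5_000_000

def dynamic_power_w(activity: float) -> float:
    return activity * C_EFF_F * VDD_V**2 * FREQ_HZ * NUM_NETS

blanket = dynamic_power_w(0.25)                 # "everything switches at 25%"
# Workload-derived picture: most nets are quiet, a small hot region is busy.
measured = 0.9 * dynamic_power_w(0.03) + 0.1 * dynamic_power_w(0.40)

print(f"blanket 25% assumption: {blanket:.2f} W")
print(f"workload-derived mix:   {measured:.2f} W")
```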
Speaking the same language
One of the underlying issues involves modeling languages. Several different proposals exist, but languages by themselves are not enough.
“I look at some of those modeling languages that look at scenarios, and they are great, but where do they get their data from?” asked Mentor’s Burns. “That seems to be the problem. We need a way to take that, which is good for the software, but you need to bring in almost gate-level accuracy.”
At the same time, there has to be a path to implementation, Ladd said. “You can’t create models and then throw them away, and then implement something else. That’s not a good path. You’ve got to have an implementation path where you are modeling the power, and that’s going to evolve into what you’re implementing.”
Consistent algorithms could be helpful in this regard, with knobs that help the design team take a design from the high level down to the gate level.
“The algorithm itself needs to be consistent,” said Knoth. “Timing optimization, power measurement — if you’re using the same algorithms at the high level as well as at the gate level, that gives the correlation. We’re finally at the point where we’ve got enough horsepower that you can do things like incredibly fast synthesis, incredibly large capacities, run actual software emulation workloads, and then be able to harvest that.”
Still, harvesting that data is difficult because the vectors are gigantic, which makes full gate-level netlist power estimation for an SoC impractical. The data must somehow be abstracted, because it’s tough enough to get within 15% accuracy at RTL, let alone carry that all the way back up to the algorithm level.
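One way to make those gigantic vector dumps tractable, sketched below under invented assumptions, is to reduce them to per-window toggle densities that an RTL-level power estimator could consume instead of raw cycle-by-cycle data.

```python
# Sketch: boiling emulation vector dumps down to per-window toggle densities.
# The in-memory format and window size here are invented for illustration.
from collections import defaultdict

WINDOW_CYCLES = 10_000

def activity_profile(events):
    """events: iterable of (cycle, signal) toggle events captured from emulation."""
    profile = defaultdict(lambda: defaultdict(int))   # window -> signal -> toggle count
    for cycle, signal in events:
        profile[cycle // WINDOW_CYCLES][signal] += 1
    # Convert counts to toggle rates (toggles per cycle) per window.
    return {
        w: {sig: n / WINDOW_CYCLES for sig, n in sigs.items()}
        for w, sigs in profile.items()
    }

demo = [(5, "core.alu_en"), (12, "core.alu_en"), (10_003, "bus.valid")]
print(activity_profile(demo))
```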
At smaller geometries, thermal is increasingly a consideration that cannot be left out of the equation.
Baum’s Ladd noted that once the power is understood, thermal can be understood.
“This is exactly why we’ve all been chasing power so much,” Knoth said. “If you don’t understand the power, thermal is just a fool’s errand. But once you understand the power, then you understand how that’s physically spread out in the die. And then you understand how that’s going to impact the package, the board, and you understand the full, system-level componentry of it. Without the power, you can’t even start getting into that. Otherwise you’re back into just making guesses.”
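A first-order illustration of why thermal follows power: with a single junction-to-ambient thermal resistance (a gross simplification of a real package and board model, and an assumed value here), die temperature tracks total power directly, so any error in the power estimate propagates straight into the thermal answer.

```python
# Sketch: first-order link from a power map to temperature.
AMBIENT_C = 45.0
THETA_JA_C_PER_W = 8.0          # junction-to-ambient thermal resistance (assumed)

region_power_w = {"cpu": 2.1, "gpu": 1.4, "modem": 0.6, "uncore": 0.9}  # invented

total_w = sum(region_power_w.values())
die_temp = AMBIENT_C + THETA_JA_C_PER_W * total_w
print(f"total power {total_w:.1f} W -> estimated die temperature {die_temp:.1f} C")

# If the power estimate is off by 30%, the thermal answer is off by several degrees:
print(f"with +30% power error: {AMBIENT_C + THETA_JA_C_PER_W * total_w * 1.3:.1f} C")
```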
Fitting the design to the power budget
While power has long been a gating factor in semiconductor design, understanding the impact at the system level has been less clear. This is changing for several reasons:
• Margin is no longer an acceptable solution at advanced nodes, because the extra circuitry can impact total power and performance;
• Systems companies are doing more in-house chip design for complex systems; and
• More IP is being reused in all of those designs, and chipmakers are choosing IP partly on the basis of total system power.
Burns has observed a trend whereby users are saying: ‘This is my power budget. How much performance can I get for that power budget? I need to be pretty accurate because I’m trying to squeeze every bit of juice out. This is my limit.’ As a result, she said, the levels of accuracy at the system level have to be really, really high.
This requires some advanced tooling, but it also may require foundry models because what happens in a particular foundry process may be different than what a tool predicts.
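The budget-driven question Burns quotes can be framed as picking the fastest operating point that fits under a power cap. The voltage/frequency/power table below is invented for illustration; in practice the numbers would come from characterized silicon or foundry models.

```python
# Sketch: "this is my power budget, how much performance can I get?"
# Pick the fastest operating point that fits under the cap.
OPERATING_POINTS = [
    # (freq_mhz, vdd_v, est_power_mw) -- power grows roughly with V^2 * f
    (600,  0.65, 310),
    (800,  0.70, 480),
    (1000, 0.75, 700),
    (1200, 0.80, 980),
]

def best_point(budget_mw: float):
    feasible = [p for p in OPERATING_POINTS if p[2] <= budget_mw]
    return max(feasible, key=lambda p: p[0]) if feasible else None

print(best_point(750))   # -> (1000, 0.75, 700)
print(best_point(300))   # -> None: budget cannot be met at any operating point
```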
“If an IP vendor can provide power models, just like performance models, that would benefit everybody,” said Ladd. “If I’m creating an SoC with all of these blocks and I had power models for those, that would be great because then I can analyze it. And when I develop my own piece of IP later, I can develop a power model for that. However, today so much of the SoC is already made up of third-party IP. There should be models for that.”
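A minimal sketch of the kind of vendor-supplied power model Ladd describes, with a leakage term plus an activity-dependent dynamic term. The interface and coefficients are invented; this is not UPF, IEEE 2416 or any other standard format.

```python
# Sketch: a per-IP power model an SoC integrator could compose and query.
class IpPowerModel:
    def __init__(self, leak_mw: float, ceff_nf: float):
        self.leak_mw = leak_mw      # leakage at nominal voltage/temperature (assumed)
        self.ceff_nf = ceff_nf      # effective switched capacitance in nF (assumed)

    def power_mw(self, activity: float, vdd_v: float, freq_mhz: float) -> float:
        # nF * V^2 * MHz works out to mW, so no extra unit conversion is needed.
        dynamic_mw = activity * self.ceff_nf * vdd_v**2 * freq_mhz
        return self.leak_mw + dynamic_mw

# Hypothetical DDR controller model supplied by an IP vendor.
ddr_ctrl = IpPowerModel(leak_mw=12.0, ceff_nf=0.8)
print(f"{ddr_ctrl.power_mw(activity=0.2, vdd_v=0.75, freq_mhz=1600):.1f} mW")
```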
UPF has been touted as the solution here, but it doesn’t go far enough. Some vendors point to hardware emulation as the only way to fully capture the functionality.
“You need the activity all together, throughout the design,” said Burns. “That’s the difficult part. Even if you had the model on the UPF side (and we need that), how do we take the many millions of vectors required to get real system-level activity, and maybe different profiles for the IP that we could deliver?”
Knoth maintained that if the design team is working at a fine enough granularity, it is dealing with gates. “UPF for something like an inverter, flip-flop or even a ROM is fine, but when you abstract up to the level of an ARM core or something like that, suddenly you need a much more complex model than what UPF can give you.”
While the UPF debate is far from over, Bell recognized there really is a gap in terms of being able to do the system-level modeling. “We’re really trying to do a lot of predictive work with the virtual prototyping and hardware emulation, but we’re still a long way away from actually doing the analysis when the system is running, and doing it predictively. We hear, ‘We’ll kind of build the system, and see if all of our prototyping actually plays out correctly when we actually build the systems.’ We’ve played with dynamic voltage and frequency scaling, we do some of the easy things, or big.LITTLE schemes that we see from ARM and other vendors, but we need to do a lot more to bring together the whole power hierarchy from top to bottom so we understand all of the different power contributors and power users in the design.”
Further, he asserted that these problems must be solved as more low-power IP appears in the marketplace, such as IP for DDR memories.
“We’re moving to low power schemes, we’re moving to lower voltage schemes, and what we’re trying to do with a lot of that IP is to reduce the power footprint. The piece that designers need to struggle with is what happens to their ability to maintain noise immunity and reliability in the system. As we push to lower power in the system, we’re reducing voltages, and then we are reducing noise margins. Somehow we have to analyze that and, ideally, in the actual running design somehow predictably adjust the performance of the design to work with actual operating conditions. When you power up an Intel processor, it actually sets the supply voltage for the processor. It will bump it up and down a certain number of millivolts. That kind of dynamic tuning of designs is also going to have to be a key feature in terms of power use and power management,” he said.
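A sketch of the closed-loop tuning Bell describes, where the supply is bumped up or down in small steps based on an observed margin. The margin sensor, step size and thresholds are invented placeholders for whatever the silicon actually provides.

```python
# Sketch: closed-loop adjustment of supply voltage in small millivolt steps.
STEP_MV = 5
TARGET_MARGIN_PS = (30, 60)   # keep measured timing margin inside this band (assumed)

def adjust_vdd(vdd_mv: int, measured_margin_ps: float) -> int:
    lo, hi = TARGET_MARGIN_PS
    if measured_margin_ps < lo:
        return vdd_mv + STEP_MV        # too little margin: bump voltage up
    if measured_margin_ps > hi:
        return vdd_mv - STEP_MV        # excess margin: trade it for lower power
    return vdd_mv

vdd = 750
for margin in (72, 65, 58, 40, 25):    # margins reported over successive intervals
    vdd = adjust_vdd(vdd, margin)
    print(f"margin {margin:3d} ps -> vdd {vdd} mV")
```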