Power Budgets Optimized By Managing Glitch Power

While not a focus until now, glitch power can be estimated earlier in the design flow, giving designers a better understanding of its impact.


“Waste not, want not,” says the old adage, and in general, that’s good advice to live by. But in the realm of chip design, wasting power is a fact of physics. Glitch power – power that gets expended due to delays in gates and/or wires – can account for up to 40% of the power budget in advanced applications like data center servers. Even in less high-powered circuits, such as those found on the edge or in IoT devices, glitch can still pose a substantial problem for power budgets.

“Typically, what we have seen with a lot of mobile companies is that they have roughly about 15% glitch power,” said Qazi Faheem Ahmed, technical product manager at Siemens EDA. “The additional waste power dissipation could exceed 40 to 45% if they have AI accelerators or other things on top of that. It depends on what kind of applications are being run. All compute-intensive applications would have more glitch power dissipation. It’s also dependent on the technology node they’re using. Advanced nodes below seven nanometers would have highly pronounced glitch power dissipation compared to mature nodes.”

For many years, glitch power was largely ignored. Because of the nature of glitches, it was difficult, if not impossible, to get an accurate reading early in the design flow, including at the RTL level. Now it has become possible to get some data earlier for mature nodes and, sometimes, even for more advanced ones. This is an important development given the increased focus on issues like battery life and power dissipation in mobile and automotive devices.

While real-time EDA glitch power tools are likely years away, new tools are emerging that make glitch problems easier to solve. There are also some best practices, such as lowering voltage, that designers should keep in mind to stay within their power budgets.

Glitch power issues on edge vs. advanced node
Glitch power issues occur when the interconnect delay exceeds the gate delay, leading to an imbalance of toggles. This results in extra transitions that consume more power, and the problem is evident even in mature nodes.

“You can think of glitch as multiple transitions on a given clock cycle,” said William Ruby, senior director of product management for low power solutions at Synopsys. “Before the output settles into its final value, you have multiple transitions taking place, and only the final one results in the correct value settling down. Everything else before that during a clock cycle is fundamentally wasted power.”
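Ruby's description can be illustrated with a toy model. The following is a minimal sketch, not how any EDA tool computes glitch power: it assumes a single XOR gate whose two inputs both switch from 0 to 1 in the same cycle, a simple transport-delay model (no inertial filtering of short pulses), and hypothetical arrival times and function names.

```python
def xor_output_waveform(arrival_a, arrival_b, gate_delay=1.0):
    """Return the (time, value) changes on y = a XOR b during one clock cycle.

    Both inputs start at 0 and switch to 1 at their own arrival times.
    All times are in arbitrary units (hypothetical values).
    """
    events = sorted([(arrival_a, "a", 1), (arrival_b, "b", 1)])
    state = {"a": 0, "b": 0}
    y = 0  # 0 XOR 0
    changes = []
    i = 0
    while i < len(events):
        t = events[i][0]
        # Apply every input event that happens at exactly this time together.
        while i < len(events) and events[i][0] == t:
            _, name, value = events[i]
            state[name] = value
            i += 1
        new_y = state["a"] ^ state["b"]
        if new_y != y:
            y = new_y
            changes.append((t + gate_delay, y))
    return changes


def count_glitches(changes, initial=0):
    """Count output transitions that were not needed to reach the final value."""
    final = changes[-1][1] if changes else initial
    useful = 1 if final != initial else 0
    return len(changes) - useful


balanced = xor_output_waveform(2.0, 2.0)  # inputs arrive together
skewed = xor_output_waveform(2.0, 3.5)    # input b arrives late

print(count_glitches(balanced))  # 0: the output never moves
print(count_glitches(skewed))    # 2: the output pulses high, then returns to 0
```

In the skewed case the output ends the cycle at the same value it started with, so both transitions are wasted switching energy.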

While glitch power may only account for around 15% of the budget in these cases, that’s still a considerable amount. The glitch power in these situations happens “primarily because they’re trying to reduce the performance in order to meet the power budgets,” said Ahmed. “When you reduce the performance, you’re packing more computation into a single cycle, which is also causing the same end results, basically more glitches.”

Until recently, that wasted power wasn’t seen as a particularly big concern for edge or IoT devices. But the rise of mobile phones has made it a larger priority. While the software of mobile devices “has become smart in making hardware idle” when it’s not needed, according to Suhail Saif, principal product manager at Ansys, there are so many apps on a single phone that it’s very difficult to put the complete hardware to idle, because there is always something going on. “When something is going on, there is a glitch power involved. So, the battery drain is very real there.”

But even on more pedestrian devices, reducing glitch power has increasingly become a priority. In the past five years, the lack of proper glitch power-specific EDA tools meant it was rarely a focus unless it had to be, such as in power-intensive applications, Saif noted.

“There was no robust analysis tool. Five or 10 years back, the glitch analysis was very rudimentary. This EDA solution is guessing and telling me that it is 6% of my total power, but is it really 6%? I doubt that. Maybe it’s 3%. These reasons were supporting the perspective that maybe it’s not worth doing anything. ‘Let me just bite the bullet and pay attention to the more important stuff.’ That is changing now, and the reason is, these numbers are going up as the technology advances, as the transistor shrinks, and as we rely more and more on GPU computing. These numbers are no longer ignorable.”

In some cases, lowering glitch power has become a priority not just as a matter of conserving a power budget, but due to safety implications. Prakash Madhvapathy, product marketing director at Cadence, said there are automotive considerations where, despite the presence of a large battery, excess power can lead to other problems.

“When you have, let’s say, an infotainment system, which is mostly housed in the head unit, that unit doesn’t have any room to dissipate power,” Madhvapathy said. “It’s very constrained in its ability to have air flow through that or to have metal things to dissipate the power, so power becomes very important in that case.”

The limits of tools
Because the gate delays and signal delays in advanced node technology are often comparable, it can be difficult to detect glitch power early in design. Essentially, it’s difficult to separate where the delays are coming from, making it hard to get an accurate picture. With more mature nodes, Madhvapathy said it’s easier to get a read on those delays at the synthesis level.

“You do have tools that can simulate and give you active power, and there are tools that you can run to also get the glitch power,” Madhvapathy explained. “But the way glitches work, glitch power mainly shows up more accurately once you have a layout. At the RTL level, you can see that there are levels of logic, but you cannot tell at the logical level if the paths are balanced or not. It’s only when you synthesize that you can tell. Further, it’s only when you lay that out that you can tell the actual delay, because the wires, the connectors inside the SoC, will have an RC delay that will add to the delays of all the different components, the different gates. That means the actual delay will be different from what you just got from pure synthesis or looking at the RTL design. So really, it has to be post-layout. I’m not saying that this is the best approach, but the reality is that when designers are doing iterations on the design, they cannot afford to go all the way through place-and-route, get the final, most optimal layout, perform analysis, then come back and redo the design.”
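The effect of wire delay on path balance can be shown with simple arithmetic. This is a minimal sketch under stated assumptions, not a timing engine: each path is modeled as a list of (gate delay, wire delay) pairs, all delay values are hypothetical and in picoseconds, and the function names are illustrative only.

```python
def arrival_time(path, include_wires=True):
    """Sum the delays along a path of (gate_delay_ps, wire_delay_ps) stages."""
    return sum(gate + (wire if include_wires else 0) for gate, wire in path)


def arrival_skew(path_a, path_b, include_wires=True):
    """Difference in arrival time between two paths feeding the same gate."""
    return abs(arrival_time(path_a, include_wires)
               - arrival_time(path_b, include_wires))


# Two paths with the same number of logic levels and identical gate delays
# (hypothetical numbers). Only the routed wire lengths differ.
short_route = [(10, 2), (10, 3)]
long_route = [(10, 15), (10, 20)]

print(arrival_skew(short_route, long_route, include_wires=False))  # 0
print(arrival_skew(short_route, long_route, include_wires=True))   # 30
```

With gate delays alone the two paths look perfectly balanced. Once the hypothetical RC wire delays are included, one input arrives 30 ps after the other, which is the kind of skew that produces a glitch.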

For most designs, this means glitch power can only be calculated at the sign-off stage, no matter how many simulations are run.

“When you’re at the RTL stage, that’s zero delay, which means any glitches would not be visible when you do power estimation,” said Siemens’ Ahmed. “The glitch power estimation part would be missing in your estimates. As you go down the design process, and you finally get to the sign-off stage, you might have delay-aware vectors at the gate level, which are expensive to create, simulate, and then get the results. It’s all time consuming. You can do it for 200 cycles, but then you see this hidden power coming out.”

One solution, proposed by Cadence’s Madhvapathy, is to calculate glitch power close to the final design, “but not go all the way to the final design to start measuring. In between, you have to have runs that will go all the way out to layout. Once you’re there, you can measure glitch power once or twice in your design process and figure out if it’s playing a major role in the power you measured. If you know that glitch power is playing a major role, and you do know that you have logic levels that are not matched, you’re going to face similar problems when you get to the final design and final layout.”

It’s still an inexact fit, but the incorporation of AI into EDA tools has led to big leaps in the accuracy of estimates.

Historically, doing all the work to determine glitch power was “very computationally intensive to get the RTL to be physically aware, because at RTL level, you haven’t implemented the design yet,” said Joe Davis, senior director of product management for Calibre interfaces and mPower power integrity analysis tools at Siemens Digital Industries Software. “But in order to get these effects, you have to have some model, some idea of what the implementation is going to look like. That’s a ton of work. Now, you can build some AI model, and that’s going to depend on your implementation flow, your synthesis tool, and so forth. In the last few years, the estimation tools have been building in this physically aware flow that’s doing the work, and now you can get that glitch estimation earlier in the flow, at a cost.”

Because those kinds of simulations are so compute-heavy, Davis’ colleague Ahmed said that, for the moment, they are usually reserved for cases where energy efficiency is a paramount concern. For everything else, the best practice remains building in a margin at the start of design, and hoping you don’t have to restart once the calculations come in.

Solutions
There are steps that can be taken to ensure the hard work doesn’t go to waste due to unforeseen glitch power issues. Particularly in mobile, where a heavy emphasis is on fine-tuning systems to ensure battery life is as long as possible, it’s possible to lower glitch power considerably. Ansys’s Saif said that when it comes to major design powerhouses, such as Intel and Samsung, “the first order of business for them is to say, ‘let me figure out which of my designs are more prone to glitch power, and which are not.’ Everybody has finite amounts of time, so they focus on the designs that are more prone to glitch power wastage.”

Some of those companies can better avoid the pitfalls of glitch power because of how finely tuned their proprietary libraries are.

“Intel does their own libraries, Samsung does their own because they have foundry control,” added Saif. “They have better control on this because they are able to design it from the library gate itself. From the time they are designing library cells, they are making sure that these cells are kind of glitch proofed, instead of falling to the glitch trap easily. This is a new era, really. Everybody, from foundry to design houses, to RTL architects, to implementation layout engineers, is coming to the realization that they need to do something smart about this as early as possible, so that these numbers don’t blow out of the chart.”

Cadence’s Madhvapathy added that there are simple ways to attenuate the effects of glitch power, even without going through the entire process to get an accurate reading. Small reductions in voltage “can have a square effect, and you can reduce the power quite a bit,” he said. “How you apply that is the question, because you don’t have separate power domains in a particular design. Maybe in an SoC you’ll have different power islands, and those are all powered by separate elements that are on the board, which are power management ICs. There, you’re able to shut down power to one part of design when it’s not active, and then thereby control the power that way. The problem is when you are trying to apply similar techniques to reduce the voltage on a leg of an AND gate input.”
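The “square effect” follows from the standard first-order model of CMOS dynamic power, P ≈ α · C · V² · f, where α is switching activity, C is switched capacitance, V is supply voltage, and f is clock frequency. The sketch below is a rough illustration only: it assumes activity, capacitance, and frequency stay fixed while voltage changes, it ignores leakage and short-circuit power, and the function name and voltages are hypothetical.

```python
def dynamic_power_ratio(v_old, v_new):
    """Ratio of new to old dynamic power when only the supply voltage changes.

    Assumes P is proportional to V squared, with activity, capacitance,
    and frequency held constant (a first-order approximation).
    """
    if v_old <= 0 or v_new <= 0:
        raise ValueError("supply voltages must be positive")
    return (v_new / v_old) ** 2


# Hypothetical example: lowering the supply from 1.0 V to 0.9 V.
ratio = dynamic_power_ratio(1.0, 0.9)
print(f"dynamic power falls to about {ratio:.0%} of its original value")
```

Under these assumptions, a 10% voltage reduction removes roughly 19% of dynamic power, glitch-related switching included. In practice a lower voltage also slows the gates, so the achievable reduction depends on timing margin.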

Perhaps the best method of ameliorating glitch power issues can be summed up by the Boy Scouts’ motto: be prepared. While detecting the exact amount of glitch power may normally come later in the process, Ahmed observed that there are still steps you can take earlier on to avoid problems.

“In order to do anything with power, irrespective of whether that’s a contribution of glitches or just a standard average or dynamic power, you have to start early,” he said. “The first thing you want to do, when you’re in the RTL stage, is just have functional vectors coming from the test bench. These are not representative of the real scenarios your end device would be used in, so they’re not usually used for power measurement. You could even be in a very early stage; you don’t even have any sorts of vectors to use. What you can do at that time is identify potential glitchy logic and make sure that you code it for low power RTL, so that your RTL does not exhibit glitches when it converts into a design. There, you can do some structural checking of your RTL itself, using power tools to do a quick check and say, ‘Yes, you have a lot of reconvergence in your logic.’”
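A structural check for reconvergence can be sketched in a few lines. This is an illustrative toy, not the method used by any commercial power tool: it assumes a purely combinational netlist with no loops, represented as a hypothetical dictionary that maps each gate output to the signals driving it, and it measures imbalance only in logic levels, not in real delay.

```python
def find_reconvergence(fanins):
    """Report (source, gate, level_mismatch) for reconvergent paths.

    `fanins` maps each gate output to the list of signals driving it.
    Signals that never appear as keys are treated as primary inputs.
    Assumes the netlist is a combinational DAG (no feedback loops).
    """
    memo = {}

    def cone(signal):
        # Map each signal in the fan-in cone of `signal` (itself included)
        # to its (shortest, longest) distance from `signal` in gate levels.
        if signal in memo:
            return memo[signal]
        depths = {signal: (0, 0)}
        for src in fanins.get(signal, []):
            for anc, (lo, hi) in cone(src).items():
                old_lo, old_hi = depths.get(anc, (lo + 1, hi + 1))
                depths[anc] = (min(old_lo, lo + 1), max(old_hi, hi + 1))
        memo[signal] = depths
        return depths

    report = []
    for gate, srcs in fanins.items():
        reached_via = {}
        for src in srcs:
            for anc in cone(src):
                reached_via.setdefault(anc, set()).add(src)
        for anc, via in reached_via.items():
            if len(via) > 1:  # same source reaches this gate through 2+ inputs
                lo, hi = cone(gate)[anc]
                report.append((anc, gate, hi - lo))
    return report


# Hypothetical netlist: y = AND(a, n2), n2 = NOT(n1), n1 = NOT(a)
netlist = {"n1": ["a"], "n2": ["n1"], "y": ["a", "n2"]}
print(find_reconvergence(netlist))  # [('a', 'y', 2)]
```

Here signal `a` reaches gate `y` both directly and through two inverters, so the two inputs of `y` are two logic levels apart. A mismatch of zero would still mark reconvergence, but whether it actually glitches could only be judged once real gate and wire delays are known.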

Conclusion
Glitch power has long been difficult to calculate early in the design process, but for mature nodes, that’s getting easier. While real-time analysis remains a possibility for future tools, current advanced EDA tools do offer some shift-left capabilities for detecting glitch power.

While it has primarily been considered a problem solely for high-power uses, such as data centers, the rise of mobile and automotive has made conserving battery power and safely dissipating power important for applications on the edge as well.

While lowering voltage and maintaining an extensive proprietary library can help avoid glitch issues, the best solution short of running extensive simulations is to properly budget for the issue early in design, as well as have functional vectors from the test bench.



