Why dynamic power, static leakage and thermal issues need to be dealt with throughout the design process.
Implementation is still the step that makes or breaks power budgets in chip design, despite improvements in power estimation, power simulations, and an increase in the number of power-related architectural decisions. The reason: All of those decisions must be carried throughout the design flow.
“If implementation decides to give up, then it doesn’t really matter at the end of the day,” said Krishna Balachandran, product management director for low power at Cadence. “You’re not going to meet the power target.”
One issue involves the tool algorithms at the synthesis and place-and-route stages. Some years ago, the algorithms for both of these steps were tuned for timing and area; power typically was not optimized until after timing had been met. That left power savings on the table, because power was not considered alongside timing at every step, Balachandran said.
But with power now a leading design consideration, implementation tools had to be re-architected. They were revamped from the ground up to include power as a cost function throughout the optimization process, whether logic synthesis, floorplanning, placement, or routing. In placement, for example, power is a fundamental factor in deciding whether the current placement state should be replaced by an alternative state on the way to a better final result.
Balachandran explained that an ROI engine had to be built into the tools, meaning the engine studies the impact of each candidate move and accepts or rejects it based on a cost function. With leakage and dynamic power built in as components of that cost, the placement engine can decide when and where to optimize power as it makes these calculations. “It cannot be something you do afterwards, because anything you do afterwards is not going to be as good as it could have been,” he said.
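The move-acceptance idea Balachandran describes can be sketched in a few lines. This is a toy annealing-style loop, not any vendor's actual engine; the cost weights and state fields are invented for illustration. The point is that power terms sit inside the cost function itself, rather than being applied after timing closes.

```python
import math
import random

def cost(state, w_timing=1.0, w_area=0.5, w_power=0.8):
    """Weighted cost of a placement state. Power (leakage + dynamic)
    is a first-class cost term alongside timing and area, not an
    afterthought applied once timing is met."""
    return (w_timing * state["wirelength"]      # wirelength as a timing proxy
            + w_area * state["area"]
            + w_power * (state["leakage"] + state["dynamic"]))

def accept_move(current, candidate, temperature):
    """ROI check on a candidate move: accept if it lowers total cost,
    or probabilistically otherwise (to escape local minima)."""
    delta = cost(candidate) - cost(current)
    if delta <= 0:
        return True
    return random.random() < math.exp(-delta / temperature)

cur = {"wirelength": 10.0, "area": 5.0, "leakage": 1.0, "dynamic": 2.0}
cand = {"wirelength": 9.0, "area": 5.0, "leakage": 1.0, "dynamic": 2.0}
better = accept_move(cur, cand, temperature=1.0)   # True: strictly lower cost
```

Because power is inside `cost()`, a move that improves wirelength but spikes dynamic power can lose to one that balances both, which is exactly the behavior a post-hoc power pass cannot reproduce.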
The rising importance of dynamic power
“It comes into architectural decisions, as well as implementation decisions,” said Rod Metcalfe, product management group director at Cadence. “The architectural decisions tend to be done slightly earlier in the flow, and the final implementation decisions are done during place and route, primarily. Certainly there are many things the implementation tools can do to improve power, and that’s really what much of the new technology has done. We’re able to look at dynamic activity, optimize power based on dynamic objectives, as well as static leakage power. So implementation can have quite an effect over and above the architectural decisions, like the clock gating and like the architecture selection, which also has a huge impact on power.”
At the same time, more engineering teams are making use of power analysis based on implementation, and they may make changes based on what they see even if these changes tend to be less drastic.
“When you make an architectural change, that is something very fundamental that happens earlier on in the flow,” Metcalfe said. “But based on the final analysis, when you have implementation you have all the final parasitics and you have more detailed information, so you can do more analysis with this high quality information. That will drive architectural decisions. I’ve seen [engineering teams] changing clock-gating strategies based on what they see during implementation, but the cycle is longer. You can make an RTL change much quicker than having to go all the way through implementation.”
Like everything else in hardware, it’s better to do as much as possible earlier on, which is why there is increasing usage of power estimation and power optimization at the earlier stages rather than at implementation. “Implementation is all about, I’ve got a spec. I’ve got it tightened up. Now I’ve got to make sure I can deliver that spec. I don’t want to lose it. I’ve got to make sure I can deliver what I thought I could at an earlier stage. I don’t want any surprises at that stage,” Balachandran said.
FinFETs add a new wrinkle. While dynamic power density increases with finFETs, one of the big advantages is that they can be operated at a lower voltage—in some cases below 0.5 volts.
“This immediately provides dynamic power savings, but can introduce new timing challenges due to variation and waveform distortion effects,” said Mary Ann White, director of product marketing for the Galaxy Design Platform at Synopsys. “There is a lot more variation with the smaller finFET process geometries, especially at 10nm and below, due to the shrinking node process and wire alignment of the various lithographic effects. At ultra-low voltages, the variation is more magnified where waveform distortion also happens due to increased wire resistance and Miller effects (higher capacitance).”
EDA tools can handle the process variation effects with more accurate parametric on-chip variation, based on optimization and analysis from synthesis through place and route and sign-off. In addition, they can take into account waveform distortion at advanced nodes to provide tighter sign-off correlation for ultra-low voltage operation, White said.
Thermal issues bring challenges
As an adjunct to power, thermal issues also affect implementation, particularly when design teams place numerous buffers or high-speed drivers in the same area.
“Let’s say you have a very wide bus, and you put all of those registers next to each other,” said Aveek Sarkar, vice president of product engineering and support. “This creates two scenarios in cases when all of these bus registers or buffers fire at the same time. It creates massive voltage or power flashes on the chip that pretty much nobody can design for, so implementation-wise if there was something you could do to reduce that, it would be very helpful. People end up creating thermal hot spots because they cluster a lot of high power cells together.”
In these cases, thermal and voltage drop issues go hand-in-hand because of the high-speed drivers and high-strength cells placed close to each other. It’s the things engineering teams miss that create problems down the line, he stressed. “That clustering of buffers together — those are obvious things that you can take care of. The placement, where you floor-plan it, where you place some of the temperature sensors — those are the things you start to look at carefully.”
Power is a competitive advantage
For some companies, being able to save every drop of power translates into a competitive advantage. As a result, micro-architects and designers are looking at saving power earlier, according to Abishek Ranjan, director of engineering at Mentor Graphics.
As such, newer tools are addressing the growing demand for RTL designers to get more involved. “The RTL designers, since they are working at the RTL level, have limited foresight into what is going to happen at the back-end level. There are many changes they can do. Many are micro-architectural and sequential in nature, and many are combinational,” he said.
Designers have at their disposal combinational clock gating, which is mostly retained throughout the implementation flow, but they also play around with the re-positioning of the data-path operators, such as shifters, multipliers, multiplexors, and adders. “While at the RTL it might look very tempting to replace a multiplication with two additions, and this will give significant power savings, a lot of these combinational decisions that are taken by RTL designers based on the feedback that they are getting purely at RTL are more often than not undone by the implementation tool in favor of performance, timing, and area requirements. At the end, what happens is that you have a chip that is performing worse in terms of power than what you had estimated early on,” Ranjan said.
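Ranjan's multiplier-versus-adders example is a tradeoff between energy and critical-path delay, which is why implementation tools sometimes undo it. A toy comparison makes the tension concrete; the per-operation energy and delay figures below are invented for the sketch, not characterized library data.

```python
# Illustrative per-operation costs for 32-bit datapath operators.
OPS = {
    "mul32": {"energy_pj": 3.1, "delay_ns": 2.0},
    "add32": {"energy_pj": 0.4, "delay_ns": 0.8},
}

def chain_cost(ops):
    """Total energy and delay of a serial chain of datapath operators."""
    energy = sum(OPS[op]["energy_pj"] for op in ops)
    delay = sum(OPS[op]["delay_ns"] for op in ops)
    return energy, delay

mul_impl = chain_cost(["mul32"])             # (3.1 pJ, 2.0 ns) - one multiplier
add_impl = chain_cost(["add32", "add32"])    # (0.8 pJ, 1.6 ns) - two chained adds
```

Here the two-adder version looks like a clear win at RTL. But if those adds land on a path with no slack, the implementation tool may upsize cells or restructure the logic to make timing, burning back the power the RTL change was supposed to save, which is the mismatch Ranjan describes between early estimates and the finished chip.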
Ellie Burns, senior product manager at Mentor Graphics, said it’s really important to have the designers make the shift in their heads. “We think of the functionality of the design as ones and zeroes, and what its data is. But if the RTL designers think more that the functionality also includes power, and that power has a leak or a problem, then it is a functional problem. And if you change it in your RTL, then that by definition lasts through the implementation because you have actually made a functional change. So, it is important to recognize that a functional change in the RTL by definition lasts all the way through implementation. And the tools are getting better to see things in the RTL that are too complex for the designers to see. The tools take this complexity and reduce it to say, ‘Here’s a functional problem that you have.’”
At the same time, Tobias Bjerregaard, CEO of Teklatech, asserted that it’s important to define what we mean by “power.” There is power consumption or total power, as well as power integrity, which is the power delivery and how the power on chip gets to the circuits.
“The funny thing about power integrity is that implementation is what the mess is all about, because we are trying to deliver something through an imperfect physical system,” Bjerregaard said. “The whole concept of trying to build something perfect using imperfect materials —metal, which has resistance, and inductance at a package level—we are trying to deliver power in a perfect way with imperfect materials. So the physical implementation is what’s causing the problem in the first place. From a power integrity perspective, you can say that implementation impacts it in a very negative way. It is a power integrity issue in the first place.”
Total power is different because the devices themselves consume power. Discussing how implementation affects it negatively usually means looking at the back-end flow, from synthesized netlist through placement and routing, and what happens along the way.
“From a total power perspective, power is impacted negatively because we use power-hungry methods to achieve a lot of things like, most importantly, timing,” Bjerregaard said. “With implementation you can’t talk about one thing without also talking about another, so it’s always a question of how we manage and balance these different metrics like power and timing and area, which are the basic three. The problem there is that during implementation, routability is one of the biggest issues at 10nm and below, and the reason is that wires don’t scale as fast as cells. So we can pack the transistors and the cells closer but we can’t route the design. That really determines the area, and therefore cost, at advanced nodes.”
That spills over to timing because with routability issues, the design must be routed all the way around to get a wire through, and that causes timing problems. So how do you fix those timing problems?
“We fix them by buffering up the path so they are faster, and that costs a lot of power,” he said. “The problem here, which we see quite a lot at advanced nodes — even at 16 and 14nm — is that the amount of power burned in just closing timing is quite significant. The power impact on closing timing in these advanced nodes can be 20%, depending on the design. This is power being burned in buffering, and every time I hear about the overhead of closing timing due to one thing or another, I see it as a possibility to not use that power.”
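The 20% figure Bjerregaard cites can be framed as simple arithmetic: buffers inserted purely to close timing add their own switching and leakage power on top of the functional design. A back-of-the-envelope sketch, with made-up buffer counts and per-buffer power:

```python
def timing_closure_overhead(base_power_mw, buffers_added, buffer_power_uw):
    """Fraction of total chip power burned in buffers inserted purely
    to close timing. All input numbers here are illustrative."""
    buffer_power_mw = buffers_added * buffer_power_uw / 1000.0
    return buffer_power_mw / (base_power_mw + buffer_power_mw)

# 800 mW functional design, 400k timing-fix buffers at 0.5 uW each:
overhead = timing_closure_overhead(800.0, 400_000, 0.5)
# 400k * 0.5 uW = 200 mW of buffering -> 200 / 1000 = 0.20, the ~20% figure
```

Seen this way, any technique that closes timing with fewer or smaller buffers recovers power directly, which is the opportunity Bjerregaard points to.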
Interestingly, one of the things that happens during implementation that impacts power integrity in a negative way, Bjerregaard said, is that the tools make some bad calls because they don’t have the full picture. “At the early stages of the design flow, we make a lot of assumptions. For instance, we make the assumption that the power grid has an even distribution of power, that everything is hooked up in a sensible way, and so on. And then everything looks fine—or at least we can tell how bad it looks. But once we start implementing, then things start getting more heterogeneous. Maybe a RAM doesn’t have as strong of a power connection because there was a need for routing resources in this area, or we didn’t get enough vias in, and so on.”
There is a widespread belief that early stage power must correlate with late stage, but Bjerregaard thinks that’s the wrong way to look at it. “What’s important to understand is that early stage tools make assumptions, so if you go down the flow and you suddenly see a discontinuity in what you’d expected — for instance, worst voltage drop increases dramatically from one step to the other, or total power goes up — it may mean that you’ve implemented something that wasn’t according to your assumptions.”
If it deteriorates gently, i.e., it gets a little worse, a little better, a little worse, that’s fine because you can’t predict every physical detail, he said. But if suddenly there is a jump or discontinuity, it means that you’ve done something that is not according to your assumptions. That needs to be fixed immediately, not after sign-off, because after sign-off you have a long design flow loop to close, and that is not guaranteed to be converging. In that sense, he said the early stage assumptions become the spec, and that goes hand in hand with tracking power integrity throughout the physical implementation flow.
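The discontinuity check Bjerregaard describes is easy to automate: track worst-case IR drop at each flow stage and flag any stage-to-stage jump beyond a tolerance. The stage names, drop values, and 25% threshold below are assumptions chosen for the sketch.

```python
def flag_discontinuities(ir_drop_by_stage, max_jump=0.25):
    """Compare worst IR drop (mV) across consecutive flow stages.
    Gradual drift is expected; a sudden jump beyond max_jump (relative)
    suggests an implementation step violated the early-stage assumptions
    and should be investigated now, not after sign-off."""
    stages = list(ir_drop_by_stage.items())
    flags = []
    for (prev_stage, prev), (stage, cur) in zip(stages, stages[1:]):
        if prev > 0 and (cur - prev) / prev > max_jump:
            flags.append((prev_stage, stage))
    return flags

drops = {"floorplan": 40, "place": 44, "cts": 43, "route": 70}  # worst drop in mV
suspect = flag_discontinuities(drops)   # [("cts", "route")]: a 63% jump
```

In this toy run, the small wobble from floorplan through CTS is the benign "a little worse, a little better" drift, while the jump at routing is the kind of discontinuity that signals a broken assumption, perhaps a starved power grid or missing vias.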
When it comes down to it, to be aggressive about power savings means leakage must be controlled at the transistor level, Drew Wingard, CTO of Sonics, pointed out. “To do that, there are two choices if you ever want them to run fast: you can reduce the voltage to some transistors, or you can cut off the voltage to those transistors. If you don’t need them to ever run fast, then you can avoid short channels or use transistors with high threshold voltages that don’t leak as much. But that’s not a very attractive solution to most people who have some times when they want to run fast.”
If the goal is to reduce the supply voltage sometimes, this enters the realm of dynamic voltage and frequency scaling, with the accompanying challenges associated with characterizing the behavior of the circuit at multiple different voltage operating points, and trying to make sure all of the electrical requirements are met.
“The characterization associated with implementation tends to blow up a bit,” Wingard said. “You’d like to think that whatever you did for the fast operating point will be safe at the slow operating point, and that is true if you don’t want to be aggressive about the frequency you operate at in that slower operating point in general. But most people want to be a bit aggressive, so they end up re-characterizing.”
Along with that, the design must be analyzed at a number of different operating points in order to perform DVFS.
Finally, all of that characterization needs information from the back end of the implementation. “Once you have the full chip laid out, then you know the actual value of all these capacitances,” he said. “Now you can begin to estimate how much charge is required to recharge all this capacitance, and then you can figure out how slowly you need to go in order to bring up the circuit safely.”
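Wingard's point about bringing up a circuit safely reduces to charge arithmetic: the extracted capacitance of a power domain fixes how much charge a wake-up must deliver, and the rail's inrush-current limit fixes how fast that can happen. A minimal sketch, with illustrative values (real flows take the capacitance from extracted parasitics):

```python
def wakeup_ramp_time(c_total_f, vdd, i_max_a):
    """Minimum time to charge a power domain's total capacitance to Vdd
    without exceeding the rail's inrush-current limit.
    Q = C * V, and at a capped current I, t >= Q / I."""
    charge_c = c_total_f * vdd        # total charge needed (coulombs)
    return charge_c / i_max_a         # minimum ramp time (seconds)

# 5 nF power domain, 0.8 V supply, 100 mA inrush limit:
t_min = wakeup_ramp_time(5e-9, 0.8, 0.1)   # 4e-8 s, i.e. a 40 ns minimum ramp
```

This is why the characterization depends on back-end data: until layout pins down the actual capacitances, the safe ramp rate is only a guess.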
Given the complexities of designs today, the impact of implementation on power, as well as the impact of power on implementation, are not solved problems. And as bleeding edge nodes move closer to reality, the challenges are only set to continue growing.