Dynamic voltage and frequency scaling resurfaces, but it’s not the only option to improve performance and lower power.
Design teams are beginning to consider dynamic power management techniques as a way of pushing the limits on performance and low power, leveraging approaches that were sidelined in the past because they were considered too difficult to deploy.
Dynamic voltage and frequency scaling (DVFS), in particular, has resurfaced as a useful approach. DVFS was originally intended to balance performance and power consumption dynamically at runtime, and chipmakers experimented with it for several years before pushing it aside. Now, this approach and others are getting more attention for a number of reasons. Among them:
Still, it’s unclear how popular these techniques ultimately will become. This is still difficult engineering, and at this point it’s well beyond the low-power capabilities of many design teams.
“The number of devices per technology is drastically increasing,” said Roland Jancke, head of the department for design methodology at Fraunhofer IIS’ Division of Engineering of Adaptive Systems. “I/O devices differ from core devices and memory devices, and all of them are optimized for their specific tasks and operating conditions. However, the principle of DVFS has recently been reintroduced with the invention of adaptive body biasing. This allows the technology to be drastically simplified while the designer still has the freedom to choose at runtime between low power or high performance, theoretically for each individual device, and in practice for specific blocks or areas of the design.”
In practice, DVFS is being employed at the very high end, where it is used to push performance and overdrive a device, as well as at the very low end, where it is being utilized for sub-threshold computing by constantly modulating the voltage.
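At its core, a DVFS controller makes one basic decision, sketched below: it picks the slowest voltage/frequency pair that still covers the current workload, because dynamic power scales roughly with V²·f. The operating-point table, the 10% headroom, and the names `OperatingPoint`, `OPP_TABLE`, and `select_opp` are hypothetical. Real tables come from silicon characterization, and real controllers also account for transition latency and temperature.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OperatingPoint:
    freq_mhz: float
    volt: float


# Hypothetical V/f table, sorted by ascending frequency (illustrative values only).
OPP_TABLE: List[OperatingPoint] = [
    OperatingPoint(200.0, 0.60),
    OperatingPoint(500.0, 0.75),
    OperatingPoint(800.0, 0.90),
    OperatingPoint(1200.0, 1.05),
]


def select_opp(demand_mhz: float, headroom: float = 0.1) -> OperatingPoint:
    """Return the lowest operating point whose frequency covers demand plus headroom."""
    target = demand_mhz * (1.0 + headroom)
    for opp in OPP_TABLE:
        if opp.freq_mhz >= target:
            return opp
    return OPP_TABLE[-1]  # saturate at the fastest point


def relative_dynamic_power(opp: OperatingPoint, ref: OperatingPoint) -> float:
    """Dynamic power ratio, assuming P_dyn ~ V^2 * f at fixed activity and capacitance."""
    return (opp.volt ** 2 * opp.freq_mhz) / (ref.volt ** 2 * ref.freq_mhz)


if __name__ == "__main__":
    peak = OPP_TABLE[-1]
    for demand in (150.0, 450.0, 1000.0):
        opp = select_opp(demand)
        print(f"demand {demand:.0f} MHz -> {opp.freq_mhz:.0f} MHz @ {opp.volt:.2f} V, "
              f"{relative_dynamic_power(opp, peak):.2f}x peak dynamic power")
```

Voltage enters the model squared. As a result, stepping down from the top point to the 500 MHz point cuts estimated dynamic power by more than the frequency ratio alone.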
“They do this not just to save power,” said Gideon Intrater, CTO at Adesto. “It’s also because the capabilities of their transistors change with temperature. So they actually need to dynamically modulate the voltage all the time to make sure that they get the right performance levels of their transistors.”
Using DVFS to push designs has been discussed at conferences in the past, but nearly all implementations so far have been test chips. “There’s a lot of work in the design effort,” said Intrater. “Obviously, you also need a variable voltage source, but that is a reasonably simple thing to design.”
Along with that, expertise in this area is sparse. “In theory, it’s great if you know how to do it because it accelerates everything,” said Raphael Mehrbians, vice president and general manager of the memory products division at Adesto. “But there may only be one person in the world who knows how to do that.”
Low-power churn
Still, if one area is open to change, it’s low-power design. Power has been seen as the primary gating factor for performance for some time, and it remains one of the biggest impediments to device scaling. SoCs typically have multiple power domains and voltages, and thermal dissipation has been a major concern for more than a decade.
“As soon as you get to the finFET nodes, the problem changes and moves back to being more about dynamic power than about leakage power,” said Pete Hardee, director of product management in the system and verification group at Cadence. “The dominant power intent standard today is UPF, which allows the components that need to be inserted into the design to manage power gating, primarily for controlling leakage power. Leakage power became a bigger issue for planar CMOS, because it gets impossible to fully turn off the transistors. There’s always leakage running through the transistor even when the gate is turned off. That problem actually is addressed by the finFET quite well, in that the transistors behave themselves a lot better and you actually can turn them fully off. Meanwhile, we’re also seeing that leakage power is a huge component of battery-operated equipment that’s off a lot of the time.”
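For reference, the textbook first-order CMOS power model separates the two components Hardee describes. Here α is the switching activity, C_eff the switched capacitance, V_DD the supply voltage, f the clock frequency, and I_leak the leakage current. The model is a simplification and ignores short-circuit current.

```latex
P_{\text{total}} = \underbrace{\alpha \, C_{\text{eff}} \, V_{DD}^{2} \, f}_{P_{\text{dyn}}}
                 + \underbrace{V_{DD} \, I_{\text{leak}}}_{P_{\text{leak}}}
```

In these terms, clock gating reduces the effective αf seen by idle logic, power gating removes the leakage term for a domain that is switched off, and DVFS lowers V_DD and f together.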
At the most advanced nodes, leakage is cropping up as a problem once again. A 7nm finFET is leakier than a 16/14nm finFET, and at 3nm Samsung plans to move to gate-all-around nanosheet FETs to control leakage.
That’s only part of the problem, though. High-end server and AI chips are becoming so dense that dynamic power density is causing thermal issues, exacerbated by the fact that some of the processing elements in these chips are always on.
To help manage that, designs are being partitioned into many separate clock domains, and engineering teams are trying to run each calculation or sub-processing element on these big chips at the most efficient clock frequency possible, Hardee said. “There’s an explosion of clock domains in the designs that we see in order to be more efficient for dynamic power.”
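A toy version of that per-block choice is sketched below. Given how many cycles each block needs per job and a shared deadline, it picks the slowest available clock that still meets the deadline, then groups blocks that land on the same frequency into one clock domain. The block names, cycle counts, and available frequencies are invented for illustration. A real flow would also weigh clock-generation cost and the synchronizers that every new domain boundary requires.

```python
from typing import Dict, List, Tuple

# Hypothetical frequencies (MHz) the clock generators can supply, ascending.
AVAILABLE_MHZ = [100, 200, 400, 800, 1600]


def min_frequency(cycles_per_job: int, deadline_us: float) -> int:
    """Lowest available frequency that finishes cycles_per_job within deadline_us."""
    required_mhz = cycles_per_job / deadline_us  # cycles per microsecond equals MHz
    for f in AVAILABLE_MHZ:
        if f >= required_mhz:
            return f
    raise ValueError("no available clock meets this block's deadline")


def assign_clock_domains(blocks: Dict[str, Tuple[int, float]]) -> Dict[int, List[str]]:
    """Group blocks by the slowest clock that meets each block's deadline."""
    domains: Dict[int, List[str]] = {}
    for name, (cycles, deadline_us) in blocks.items():
        domains.setdefault(min_frequency(cycles, deadline_us), []).append(name)
    return domains


# Hypothetical blocks: (cycles needed per job, deadline in microseconds).
BLOCKS = {
    "dma": (5_000, 100.0),
    "npu_tile": (150_000, 100.0),
    "ctrl": (8_000, 100.0),
}

if __name__ == "__main__":
    for mhz, names in sorted(assign_clock_domains(BLOCKS).items()):
        print(f"{mhz} MHz domain: {names}")
```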
This, in turn, has made clock domain crossing (CDC) an important element of advanced-node chip verification. It also is shifting the perception of CDC from an implementation problem, and often an implementation afterthought, to an active concern for low-power verification.
“Not only do you want to run at the most efficient clock frequency possible, but in order to save as much dynamic power as possible, you want to turn the clock off to as much circuitry as possible when it’s not being used,” Hardee said. “As a result, there is a huge emergence of clock gating optimizations being made at the RTL stage. To some degree, synthesis tools can optimize dynamic power by introducing clock gating. That’s very much done at a leaf level, considering whether each individual register is active and gating the clock when it’s not. But engineering teams are looking for bigger savings than that. They’re looking at being able to turn off bigger blocks of circuitry for a longer period of time, in terms of not having them driven by the clock the whole time. When I say ‘turn off,’ I don’t mean power gating. I mean clock gating. This is driving the emergence of verification tools that are necessary to check that the optimizations made at the RTL are not adversely affecting the circuit functionality. Sequential equivalence checking can help here.”
Sequential equivalence checking compares the functionality of a reference design, used as the spec, against its implementation. In the case of clock gating, the spec is the design without clock gating, which has the full functionality, while the implementation is the design with clock gating introduced. A sequential equivalence checking app then verifies that the two are functionally equivalent.
“That becomes a sequential equivalence checking problem rather than a logical equivalence checking problem because, of course, we’re changing the clock schedule of the circuitry, and that usually goes beyond the ability of logical equivalence checking tools to check,” he said.
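A toy, exhaustive bounded check can illustrate the spec-versus-implementation comparison. The sketch models a 4-bit accumulator as the spec, clocked every cycle with a mux that holds the value when `en` is low. It compares that spec against two clock-gated implementations, one correct and one whose gating enable arrives a cycle late, by simulating every input sequence up to a fixed depth from reset. All names are hypothetical, and this is a stand-in for the real technique: commercial sequential equivalence checkers use formal engines that prove equivalence for all depths rather than enumerating traces.

```python
from itertools import product
from typing import Any, Callable, List, Optional, Tuple

MASK = 0xF                # 4-bit accumulator keeps the state space tiny
Inp = Tuple[int, int]     # (en, x) sampled at one clock edge
Step = Callable[[Any, Inp], Tuple[Any, int]]  # (state, input) -> (next_state, output)


def spec_step(acc: int, inp: Inp) -> Tuple[int, int]:
    """Spec: free-running clock; a feedback mux holds acc when en == 0."""
    en, x = inp
    nxt = (acc + x) & MASK if en else acc
    return nxt, nxt


def gated_step(acc: int, inp: Inp) -> Tuple[int, int]:
    """Implementation: the register only receives a clock edge when en == 1."""
    en, x = inp
    if not en:                # clock gated off: no edge, state is retained
        return acc, acc
    nxt = (acc + x) & MASK    # hold mux removed; the gated clock does the holding
    return nxt, nxt


def late_gated_step(state: Tuple[int, int], inp: Inp) -> Tuple[Tuple[int, int], int]:
    """Deliberately buggy implementation: the gate uses en delayed by one cycle."""
    acc, en_q = state
    en, x = inp
    nxt = (acc + x) & MASK if en_q else acc
    return (nxt, en), nxt


def bounded_seq_equiv(spec: Tuple[Any, Step], impl: Tuple[Any, Step],
                      depth: int, alphabet: List[Inp]) -> Optional[List[Inp]]:
    """Compare outputs cycle by cycle for every input trace of length `depth`.

    Returns the first mismatching trace, truncated at the failing cycle, or None.
    """
    (s0, s_step), (i0, i_step) = spec, impl
    for trace in product(alphabet, repeat=depth):
        s, i = s0, i0
        for cycle, inp in enumerate(trace):
            s, out_s = s_step(s, inp)
            i, out_i = i_step(i, inp)
            if out_s != out_i:
                return list(trace[: cycle + 1])
    return None


ALPHABET = [(en, x) for en in (0, 1) for x in range(4)]

if __name__ == "__main__":
    print(bounded_seq_equiv((0, spec_step), (0, gated_step), 4, ALPHABET))
    print(bounded_seq_equiv((0, spec_step), ((0, 0), late_gated_step), 4, ALPHABET))
```

The correctly gated version matches the spec on every trace. The late-enable version is caught with a short counterexample. That bug only shows up across clock cycles, because the gate reacts one edge late. This makes it a sequential property rather than a combinational one.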
“To make sure all of those components that the power intent introduces are introduced correctly (things like isolation cells, state retention registers, power switches), all of that checking is still important. But we’re seeing a big increase in the need to optimize dynamic power, which is creating this enormous explosion of clock domains. And that is making clock domain crossing (CDC) a verification problem rather than an implementation problem. We’re also seeing people looking to optimize clock gating a lot more broadly, which is driving a big boost in the need for sequential equivalence checking.”
Still too difficult?
DVFS is still on the fringe of all of this. “I don’t necessarily see that it’s been done dynamically as much as people thought it would be,” he said. “There was a lot of talk about that, but people found it pretty difficult to do in practice. What we are seeing is IPs being designed that have various modes, with multiplexers selecting various clock rates that can be used in different modes, but I’m not sure we’re seeing them switch dynamically in the way that, a few years ago, we were talking about DVFS as a big thing. Maybe that was a little too difficult to implement. We’re definitely seeing multiple modes of IPs where different clock frequencies can be multiplexer-selected. When people talk about multi-mode CDC, you really want to be able to verify all of the modes, all the clock frequencies that the circuitry can run with, and you want to be able to do that verification at one time for that chip. But are people switching those frequencies dynamically, or are they just selecting ones for the overall chip?”
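A multi-mode CDC pass can be sketched as a simple table check. For each link between blocks, it lists every mode in which the two ends run on different clocks. The modes, block names, and clock names are hypothetical. The check is also deliberately conservative: it treats only identically named clocks as synchronous, whereas real CDC tools understand derived and related clocks.

```python
from typing import Dict, List, Tuple

Mode = Dict[str, str]   # block name -> clock driving it in this mode
Link = Tuple[str, str]  # (source block, destination block)


def crossings_by_mode(modes: Dict[str, Mode], links: List[Link]) -> Dict[Link, List[str]]:
    """For each src->dst link, list the modes in which it crosses clock domains.

    Conservative assumption: blocks are synchronous only when driven by the
    identically named clock; related (e.g. divided) clocks count as asynchronous.
    """
    result: Dict[Link, List[str]] = {}
    for src, dst in links:
        crossing = [name for name, clocks in modes.items() if clocks[src] != clocks[dst]]
        if crossing:
            result[(src, dst)] = crossing
    return result


# Hypothetical multiplexer-selected clocking in two modes.
MODES: Dict[str, Mode] = {
    "perf": {"cpu": "clk_fast", "dsp": "clk_fast", "io": "clk_io"},
    "eco": {"cpu": "clk_slow", "dsp": "clk_fast", "io": "clk_io"},
}
LINKS: List[Link] = [("cpu", "dsp"), ("dsp", "io")]

if __name__ == "__main__":
    for link, mode_names in crossings_by_mode(MODES, LINKS).items():
        print(link, "crosses clock domains in modes:", mode_names)
```

In this example the `cpu` to `dsp` link crosses domains only in the `eco` mode. A check run in the `perf` mode alone would miss it, which is why all modes need to be covered.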
Others agree. “[DVFS] started as theory, and there were Ph.D.s who applied it,” said Phil Dworsky, global head of strategic alliances at SiFive. “There were a couple of companies in the world who could do it and, probably 15 years ago or so, the industry developed methodology and tooling to try to enable the masses on this so there would be a much bigger group of people who were then able to design chips using these crazy Ph.D.-only techniques, with automation that would help them do it and achieve the benefits. But then there was still a software level that was key to making it all work, and that still took a while after that.”
Today, regions of a chip are routinely shut down, and power control, from architectural clock gating to low-level, fine-grained automatic clock gating, is everywhere.
“That structural or architectural clock gating is really just another form of an on/off switch that says, ‘I don’t need to have everything on at the same time.’ Then the question is, there’s a whole software impact of whether I have to reload everything to restart or do I have things that can just be shut down and started again in context without having to reload the context. And there is a whole class of things that can do that. I see that everywhere, this block-level shutdown, but [with] dynamic voltage scaling, there were a couple of big processor companies that really went deep into that, and you saw publicly Apple and Intel both get very deep into very fine-grained control or sweeping these things over to try to maintain battery life,” Dworsky said.
This is particularly hard at the system level. “The implementation of it is solved at the gate level and the subsystem level, and there are power controllers that can do a lot of this stuff for you,” Dworsky said. “There are coarse-level things being done, but I don’t know about dynamic shifting all over the place because there’s ramp up and ramp down times, and the whole question of running fast and stopping, or running slower and using the time available — these are scheduling issues. You don’t want to miss your slot or you can lose data, and that kind of thing you have to finish in the time allotted. It’s hard at the system level to know how to make that work.”
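A first-order energy model can put numbers on Dworsky's trade-off between running fast and stopping versus running slower. The sketch charges dynamic energy as C·V² per cycle, which makes it independent of frequency, and charges leakage for as long as the block stays powered. It rejects any operating point whose ramp time plus run time misses the deadline. Every parameter value is illustrative rather than drawn from a real chip, as is the assumption that a power-gated block leaks nothing and wakes for free.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Point:
    freq_hz: float
    volt: float


def job_energy(cycles: float, deadline_s: float, p: Point, c_eff_f: float,
               i_leak_a: float, ramp_s: float = 0.0,
               power_gate_idle: bool = False) -> Optional[float]:
    """Energy (J) to finish `cycles` within `deadline_s` at operating point `p`.

    Returns None if ramp time plus run time misses the deadline (a missed slot).
    Assumed model: dynamic energy = C_eff * V^2 per cycle; leakage = V * I_leak
    while powered (including the ramp); after finishing, the block either stays
    powered but clock-gated (still leaking) or is power-gated (leaks nothing).
    """
    run_s = cycles / p.freq_hz
    if ramp_s + run_s > deadline_s:
        return None
    leak_w = p.volt * i_leak_a
    dynamic_j = c_eff_f * p.volt ** 2 * cycles
    idle_s = deadline_s - ramp_s - run_s
    idle_w = 0.0 if power_gate_idle else leak_w
    return dynamic_j + leak_w * (ramp_s + run_s) + idle_w * idle_s


# Illustrative numbers: 1M cycles in a 10 ms slot, 1 nF effective switched capacitance.
FAST = Point(freq_hz=1.0e9, volt=1.0)
SLOW = Point(freq_hz=1.2e8, volt=0.7)
CYCLES, DEADLINE_S, C_EFF_F = 1e6, 10e-3, 1e-9

if __name__ == "__main__":
    for leak_a, gate in ((0.05, False), (0.5, True)):
        fast = job_energy(CYCLES, DEADLINE_S, FAST, C_EFF_F, leak_a, power_gate_idle=gate)
        slow = job_energy(CYCLES, DEADLINE_S, SLOW, C_EFF_F, leak_a, power_gate_idle=gate)
        print(f"I_leak={leak_a} A, power-gated idle={gate}: fast {fast:.2e} J, slow {slow:.2e} J")
    print("slow point with a 2 ms ramp:",
          job_energy(CYCLES, DEADLINE_S, SLOW, C_EFF_F, 0.05, ramp_s=2e-3))
```

With modest leakage, the slower, lower-voltage point wins because the V² term dominates. With heavy leakage and the option to power-gate once finished, racing to idle wins instead. Adding a 2 ms ramp makes the slow point miss the 10 ms slot entirely. Which regime a real design falls into depends on exactly the system-level scheduling question Dworsky raises.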
At the end of the day, managing dynamic power is always going to be a challenge, but tools, methodologies and approaches are maturing to enable engineering teams to achieve their goals across the board in power management. Now the question is whether enough people will actually use them to leverage more advanced power techniques.