Increasing interactions and complexity require more tools and cross-domain techniques.
Low-power design in advanced nodes and advanced packaging is becoming a multi-faceted, multi-disciplinary challenge, where a long list of issues needs to be solved both individually and in the context of other issues.
With each new leading-edge process node, and with increasingly dense packaging, the potential for problematic interactions is growing. That, in turn, can lead to poor yield, costly re-spins, and field failures. As a result, architects and designers are examining possible interactions and various use cases much earlier in the design process. And they are looking for ways to manage increasingly complex designs, which can introduce a variety of bugs that may not show up in isolation.
“Many power management techniques, including multi-voltage power shutdown, can add significantly higher complexity to the design because it actually shuts down part of the operation of a design,” said Renu Mehra, R&D group director for the Digital Design Group at Synopsys. “As all the different parts of the design are talking to each other, it is easy to send a corruption from a dead part of the design to other parts of the design. This means we need to be very careful that we have properly isolated those pieces that will be shut down, so that the other pieces that are live are not getting corrupted. It’s very important to make sure that this is working well right from the very beginning. Also, it’s important to have a complete power intent right in the beginning, before running simulation. Most simulation tools natively understand UPF and power intent, and are able to simulate the shut-down parts of the design simultaneously with the live parts.”
Peter Greenhalgh, vice president of technology and fellow at Arm, agreed. “Certainly, low power design can introduce bugs,” he said. “Clock gating is one example where you want to be as aggressive as possible to save power, but risk being too aggressive in disabling the clock and creating a functional bug. Fortunately, there’s no verification difference between an overly aggressive clock gate enable and any other functional bug, which means standard verification techniques are sufficient to catch clock gate enable bugs.”
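To see why such a bug is caught like any other functional error, consider the toy Python model below. It is purely illustrative, with invented enable logic and data values: a register behind an over-aggressive clock-gate enable silently drops a valid write of zero, and the mismatch against an ungated reference register surfaces in simulation like any ordinary functional failure.

```python
# Toy model of a clock-gated register versus an ungated reference register.
# The gated version uses an over-aggressive enable term, so a valid write of
# zero is silently dropped -- the bug shows up as an ordinary functional
# mismatch that standard simulation checks would catch.

def simulate(cycles, data_valid, data_in):
    ref_q = 0        # ungated register: captures whenever data_valid is high
    gated_q = 0      # gated register: only updates when its gate enable is high
    mismatches = []

    for t in range(cycles):
        # Over-aggressive clock-gate enable: also requires data_in to be
        # non-zero, which is wrong -- zero is a legal value to capture.
        buggy_enable = data_valid[t] and data_in[t] != 0

        if data_valid[t]:
            ref_q = data_in[t]
        if buggy_enable:
            gated_q = data_in[t]

        if ref_q != gated_q:
            mismatches.append((t, ref_q, gated_q))
    return mismatches

# A valid write of 0 at cycle 2 exposes the bug.
print(simulate(4, data_valid=[1, 0, 1, 0], data_in=[7, 0, 0, 0]))
# -> [(2, 0, 7), (3, 0, 7)]
```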
It’s not that existing design tools don’t work. It’s that more of them are needed at precisely the right time, and design teams need to be aware of, and have access to, all of them. Static electrical rule checkers, for example, are necessary to make sure electrical rules are not violated.
“If you’re going from one voltage to a different voltage, check if a level shifter has been specified in between,” said Mehra. “Or if you’re going to be using retention techniques to, say, retain the state during the off periods, some of these retention techniques might be more complicated than others. There is a very efficient way to implement retention using zero-pin retention cells, but if you have that, you also have to check that the clock line and the reset line are properly isolated for this kind of retention use. This static check is a very important part of getting the design right from the very beginning. The earlier the intent is provided as the design goes through implementation, the better. Some people might provide intent written in terms of specific signals (x, y, or z), but that will not accurately implement exactly the same thing that was simulated. This is because synthesis is going to do many optimizations, like constant propagation, and the structure of the circuit might change. The intent that you wrote may be complete when you use it in implementation after synthesis, but when you use it in the place-and-route stage, it might apply slightly differently and might not exactly implement the intent that you had in mind. The recommendation is to specify it right up front.”
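The zero-pin retention check Mehra mentions can be pictured as a simple structural rule. The Python sketch below is a rough illustration only, with an invented netlist representation rather than any real tool's database: for each retention register in a switchable domain, it verifies that the clock and reset nets are isolated or held at safe levels while the domain is off.

```python
# Minimal sketch of a static check for zero-pin retention. The netlist
# representation below is hypothetical, not a real tool's database.
# Zero-pin retention cells only hold their state through power-down if their
# clock and reset are kept at safe, inactive levels while the domain is off,
# so the check verifies both nets are isolated or driven from always-on logic.

RETENTION_REGS = [
    # register name, clock net,   reset net
    ("u_core/r0",    "clk_core",  "rst_core_n"),
    ("u_core/r1",    "clk_core",  "rst_aon_n"),
]

# Nets known to be held at a safe level during shutdown (isolated or always-on).
SAFE_DURING_OFF = {"rst_aon_n"}

def check_zero_pin_retention(regs, safe_nets):
    violations = []
    for name, clk, rst in regs:
        for net in (clk, rst):
            if net not in safe_nets:
                violations.append((name, net))
    return violations

for reg, net in check_zero_pin_retention(RETENTION_REGS, SAFE_DURING_OFF):
    print(f"VIOLATION: {reg}: net '{net}' is not isolated/held during power-down")
```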
Rob Knoth, product manager for digital implementation and sign-off at Cadence, views low power techniques like a layer cake. “This shows up in many places, not just low power,” Knoth said. “Design for test is another area, along with a growing importance of design for safety, where there are the same kinds of concerns. In all of these areas, architects and designers have to be aware of how the design is being modified, at what point the design is being modified, and how the design is being verified. In the early days of any of these technologies, whether it’s design for test, design for safety, or low power techniques, all of the techniques were very manual or scripted, unique to each customer doing the work. There, you really had to rely so much more on the verification side, such as running functional vectors, dealing with it in terms of formal verification techniques, etc., and those sorts of verification techniques are still what’s absolutely required today.”
Once techniques get a little more mainstream, automation replaces custom scripts. “With low power techniques, while it may have started with some really advanced users who were doing some of the first mobile products, now it’s mainstream,” Knoth said. “This is the foundation of the layer cake. It has gone from being custom to being pervasive through things like power intent. You’re writing your RTL, but you’re writing it almost 100% functional, such that, ‘This is functionally what my design has to do,’ and so the verification is all centered around functionality in general. But then the EDA tools during implementation are taking that power intent and automating the insertion and the modification, and you’ve got things like power islands, always-on cells, isolation cells, etc., and the modification of the design is now more predictable, more tested. There’s unit testing of the software and flows going on before they even reach your design. And while you’ve got a little more confidence, you still have to do things like formal verification, all the low-power checks, checking with the power intent file, etc. Another part of the process includes functional tests to make sure the RTL is functionally good, where the low-power check makes sure that any of the newly inserted logic during synthesis and place-and-route is obeying what you intended it to do.”
The next layer is the RTL. If it isn’t power efficient enough, what else can be done? Some of this is obvious, such as where clock gating or memory usage is not efficient. From there, you edit the RTL and climb up the stack.
“Low-power formal verification isn’t going to help you, but functional verification can,” Knoth said. “This is where things like sequential equivalency are absolutely bread and butter once you start adopting them. It’s a technique that a lot of pure RTL designers are familiar with, but if you’re going over the wall to someone who’s used to getting RTL as an input, and they’re used to just doing synthesis and place-and-route, they might not be aware of this. More and more, the lines are blurring a bit about who owns the RTL. This is a critical piece, because that’s free power. It doesn’t require you to do a bunch of architecture work with always-on cells and decisions about where you’re going to shut things off. It’s just, ‘Here’s my design, run the deep analysis based on functional vectors.’ Where do the tools point out that you could have saved more power? That’s a big growing area.”
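A simplified picture of that vector-driven analysis, with made-up traces and signal names: count the cycles in which a register toggles but nothing downstream consumes the value. Those wasted toggles are dynamic power that a clock-gate enable could remove without touching the architecture.

```python
# Toy activity analysis over functional vectors (all traces invented).
# Toggles on a register whose output is not consumed in that cycle are wasted
# dynamic power that a clock-gating enable could eliminate.

def wasted_toggles(data_trace, consume_trace):
    prev = data_trace[0]
    wasted = total = 0
    for data, consumed in zip(data_trace[1:], consume_trace[1:]):
        if data != prev:
            total += 1
            if not consumed:
                wasted += 1   # the register toggled but nothing used the value
        prev = data
    return wasted, total

data    = [0, 3, 5, 5, 2, 7, 7, 1]
consume = [1, 1, 0, 0, 0, 1, 1, 1]   # cycles where downstream logic samples it
w, t = wasted_toggles(data, consume)
print(f"{w} of {t} toggles were wasted -> candidate for a clock-gate enable")
```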
At the very top of the stack, before any RTL exists, is where the newest and widest range of low-power techniques is found. “This gets back to the system architecture, and the person who is working at the C code level or the MATLAB level,” he said. “There’s a much better and a much more exhaustive set of tools out there now that let you do things like high-level synthesis. With a traditional high-level synthesis regime, you take the C code, you put it through high-level synthesis, you spit out RTL.”
Whether it is chip or IP, the same rules apply. The challenge is understanding the potential interactions of all the various pieces at once.
“One of the most common things that you do to control power is use clock gating techniques to either slow down or disable pieces of the IP or the chip that aren’t used,” said Matt Jones, general manager of IP cores at Rambus. “As you slow things down, and run at rates that are not just opening up the pipe and letting it blow and go, you introduce this notion of both clock domain crossings as well as the startup and shutdown cycles that go along with it. Clocking is not simple. The higher the speeds get in terms of clock speeds and data rates, turning them off, turning them on, and playing with them certainly does open up some design complexity. Getting that right is a definite art form.”
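Jones doesn't name a specific mitigation here, but the classic one for a single-bit clock domain crossing is a two-flop synchronizer. The sketch below is a rough behavioral model of that idea, with illustrative names and an abstracted notion of time: downstream logic only ever sees the second stage, so a potentially metastable first stage gets a full destination-clock cycle to settle.

```python
# Rough behavioral model of a two-flop synchronizer for a single-bit crossing.
# Metastability itself can't be modeled functionally; the point is that
# downstream logic only ever sees sync2, one destination-clock cycle behind
# the first-stage sample, so a metastable first stage has a full cycle to settle.

class TwoFlopSync:
    def __init__(self):
        self.sync1 = 0   # first stage: may go metastable in real hardware
        self.sync2 = 0   # second stage: the only value downstream logic uses

    def dest_clock_edge(self, async_in):
        self.sync2 = self.sync1   # previous first-stage sample moves to stage 2
        self.sync1 = async_in     # sample the asynchronous input
        return self.sync2

sync = TwoFlopSync()
src = [0, 1, 1, 1, 0, 0]
print([sync.dest_clock_edge(bit) for bit in src])   # [0, 0, 1, 1, 1, 0]
```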
Power management techniques such as clock gating create design complexity, which in turn increases the chance for errors, particularly with blocks or entire chips moving in and out of various states, and with components aging at different rates. Different companies handle this differently, but in general issues need to be dealt with at both the system and the component level, and design teams need to span both worlds. Memory choices and strategies, such as in-memory and near-memory compute, for example, may save significantly more power than an array of low-power techniques.
“If you’re working with a 5nm device, the less flash memory you can do at that node, the more you can save on SRAM,” said Sandeep Krishnegowda, senior director of marketing and applications for memory solutions at Infineon. “That reduces the power, as well, on the SoC. So we look at all these different concepts and say, it’s not just about a 10X improvement in performance. It’s also about a 2X or 3X reduction in power, evolving to higher-performance, non-volatile memories for direct code execution.”
Pushing the limits
Put simply, the number of tradeoffs design teams need to make to get these systems, and systems of systems, to work is increasing. All of that begins at the chip level.
“The industry has been on this path for a while with dynamic voltage and frequency scaling, and it’s this interesting idea of, ‘Maybe I can drop the voltage until I can’t anymore. Maybe I can change the speed until I just can’t anymore,’” said Steven Woo, fellow and distinguished inventor at Rambus. “It allows you to ride closer to the edge of reliable behavior. What also plays into all of that is the fact that different process corners of the silicon will behave a little bit differently, which means there are more physical effects that matter, including what the neighboring pieces of IP are doing, since they can introduce noise that can impact the voltage margin if your voltage isn’t high enough.”
Noisy neighbors can start affecting the correctness, as well. “The challenge in some of these techniques is physical effects become much more important, and what’s going on around you becomes much more important as well,” Woo said. “Disciplines like signal integrity and power integrity have to be well-understood when you’re using these techniques where you’re trying to get much lower power through voltage manipulation and frequency manipulation. What I’ve watched over the last couple of decades is that the chip design and architecture has moved from being in some ways isolated, or insulated a bit, from the physical realities of what’s going on in the system. These days you have to be so much more aware of what’s going on physically in the system. So it’s understanding the voltage, noise, power of elements around you, and the dynamic nature of how voltages may move. Power integrity is really important along with signal integrity. Chip architects and system designers have to become much more aware of the physical environment they’re operating in.”
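A highly simplified sketch of the tradeoff Woo describes is shown below. The operating-point table, the noise guard-band, and the C·V²·f power model are all illustrative assumptions rather than real silicon data: the policy picks the lowest frequency/voltage pair that meets the performance demand while keeping enough voltage margin to absorb supply noise from neighboring IP.

```python
# Simplified DVFS operating-point selection (all numbers are illustrative).
# Dynamic power scales roughly with C * V^2 * f, so the policy picks the
# lowest frequency/voltage pair that meets the workload demand, while the
# chosen voltage keeps a guard-band above Vmin to absorb supply noise from
# neighboring IP.

OPERATING_POINTS = [   # (frequency in MHz, supply voltage in V), low to high
    (400, 0.60),
    (800, 0.70),
    (1200, 0.80),
    (1600, 0.95),
]

V_MIN = 0.55          # minimum voltage for reliable operation (assumed)
NOISE_MARGIN = 0.05   # guard-band for noise from neighbors (assumed)
C_EFF = 1.0e-9        # effective switched capacitance in farads (assumed)

def dynamic_power(freq_mhz, vdd):
    return C_EFF * vdd**2 * freq_mhz * 1e6   # P ~ C * V^2 * f, in watts

def pick_operating_point(required_mhz):
    for freq, vdd in OPERATING_POINTS:
        if freq >= required_mhz and vdd >= V_MIN + NOISE_MARGIN:
            return freq, vdd, dynamic_power(freq, vdd)
    freq, vdd = OPERATING_POINTS[-1]          # fall back to the fastest point
    return freq, vdd, dynamic_power(freq, vdd)

print(pick_operating_point(700))   # -> (800, 0.7, ~0.39 W)
```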
Dynamic and leakage power
Power needs to be addressed on multiple levels. Leakage current became increasingly problematic from 40nm down to 16/14nm, when the introduction of finFETs solved the problem for a couple of process nodes. But leakage has been steadily increasing since then, requiring a new gate structure to control static leakage below 5nm. That leakage continues to sap batteries even when a device is powered down, and it can increase the heat generated by a device.
Dynamic power, in contrast, has become steadily worse at each new node due to increased density. Heat needs to be channeled, and logic often needs to be designed in the context of other nearby components, sometimes using a checkerboard type of approach to prevent a variety of physical effects.
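As a back-of-the-envelope illustration (all numbers below are invented, not process data), dynamic power scales roughly as α·C·V²·f and disappears when the clocks stop, while leakage power is roughly V·I_leak and persists whenever the rail is up, which is why it continues to drain batteries in standby.

```python
# Back-of-the-envelope split between dynamic and leakage power.
# All values are invented for illustration; they are not process data.

ALPHA = 0.15      # average switching activity factor
C_EFF = 2.0e-9    # effective switched capacitance (F)
VDD = 0.75        # supply voltage (V)
FREQ = 1.5e9      # clock frequency (Hz)
I_LEAK = 0.080    # total leakage current with the rail up (A)

p_dynamic = ALPHA * C_EFF * VDD**2 * FREQ   # present only while clocks toggle
p_leakage = VDD * I_LEAK                    # present whenever the rail is up

print(f"active:  {p_dynamic + p_leakage:.3f} W "
      f"(dynamic {p_dynamic:.3f} W + leakage {p_leakage:.3f} W)")
print(f"standby: {p_leakage:.3f} W (clocks gated, rail still powered)")
```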
“In older technology nodes, active power is more dominant, so all these clocks where the data is not active can be gated — and there are a lot of techniques for doing clock gating,” said Mallik Vusirikala, director, product specialist at Ansys. “But with the move into technology nodes where leakage power is more dominant, we started using power switches. Instead of just gating logic, we were gating the whole power. With each of these techniques there are different kinds of issues.”
For clock gating, logical checking is needed. “But from a power supply standpoint, what happens if you suddenly ungate a huge amount of power with the clock gate? If the clock is controlling a huge amount of logic and suddenly it is ungated, a sudden demand for power comes in,” Vusirikala explained. “How do you control that type of power demand? The power has to come through a battery, and there’s a package that sits in between. This means there’s a lot of inductance. The nature of an inductor is that it will not allow a rapid change of current. This means the huge amount of current demand on the chip needs to be satisfied well enough that it doesn’t cause a dynamic voltage drop. To this point, there are ways to analyze the transient nature of the power supply using analysis tools that measure the amount of voltage drop a particular ramp-up causes. The design feedback to that is you cannot ungate so many clocks in one go. You might have to phase it out. You might have to look ahead 20 cycles or so, and then slowly start ungating the clocks.”
This situation can benefit from power transient analysis, which varies the power demand and analyzes the resulting voltage drop. The analysis can be done in a variety of ways, from the individual instance level, to determine the voltage drop impact, up to the package/die interface, where tools model the whole chip as a SPICE sub-circuit and vary the power demand to see the effect of power supply noise, including the contribution of the package.
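A toy version of that kind of transient check, with invented inductance, current, and timing values and a bare L·di/dt droop estimate that ignores on-die decoupling: releasing every clock gate in a single cycle produces a large current step through the package inductance, while phasing the same enables over several cycles keeps the per-cycle droop far smaller.

```python
# Toy comparison of abrupt versus phased clock ungating (illustrative numbers).
# The package is treated as a bare inductor, so the droop from a current step
# is approximated as V = L * di/dt, ignoring on-die decoupling capacitance.

L_PKG = 50e-12       # effective package inductance (H), assumed
T_CYCLE = 1.0e-9     # clock period (s) at 1 GHz, assumed
I_PER_BLOCK = 0.2    # extra current drawn by each ungated block (A), assumed
N_BLOCKS = 20        # number of gated blocks being woken up

def worst_droop(blocks_per_cycle):
    di = blocks_per_cycle * I_PER_BLOCK      # largest single-cycle current step
    return L_PKG * di / T_CYCLE

print(f"all at once:         {worst_droop(N_BLOCKS) * 1000:.0f} mV droop")
print(f"phased, 2 per cycle: {worst_droop(2) * 1000:.0f} mV droop")
```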
Once a power switch is turned on, there’s a huge amount of uncharged domain that starts getting charged up, Vusirikala said. “This means there’s a huge amount of current that needs to go in to charge the whole switched power supply. This again means the switches have to be turned on in sequence, and different strengths of transistors will be used during turn-on. Here, weak switch transistors can be used while the block is turning on. But once the block is turned on, it has to run at speed, which means its current demand is much, much higher, so there will be different strengths of transistors used while a particular block is ramping up and while it is functioning.”
Determining the optimal sequence of power switches is yet another analysis that should be done to avoid problems in low-power design: how many switches are needed, the sequence in which they are turned on, and, once they are on, what the impact is on other instances. This has to be analyzed carefully because it drives the cell placement.
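The sketch below models that staged wake-up in a very rough way. The switch resistances, domain capacitance, and thresholds are invented for illustration, and the gated domain is treated as a single RC node: weak switches charge the rail slowly to limit inrush, and the strong switches are enabled only once the rail is close to its final value.

```python
# Toy model of staged power-switch turn-on (all values invented).
# Weak switches charge the power-gated domain slowly to limit inrush current;
# the strong switches are enabled only once the rail is near its final value,
# after which the block can run at speed without a large voltage drop.

VDD = 0.75             # target rail voltage (V)
C_DOMAIN = 50e-9       # total capacitance of the gated domain (F), assumed
R_WEAK = 200.0         # effective resistance of the weak switch chain (ohms)
R_STRONG = 2.0         # effective resistance with all strong switches on (ohms)
DT = 10e-9             # simulation timestep (s)
STRONG_ON = 0.9 * VDD  # rail voltage at which strong switches are enabled

v = 0.0                # domain rail starts fully discharged
t = 0.0
peak_inrush = 0.0

while v < 0.999 * VDD:
    r = R_WEAK if v < STRONG_ON else R_STRONG
    i = (VDD - v) / r                  # inrush current through the switches
    peak_inrush = max(peak_inrush, i)
    v += i * DT / C_DOMAIN             # simple RC charging step
    t += DT

# For comparison, enabling the strong switches at v = 0 would draw
# VDD / R_STRONG = 375 mA of inrush in one step.
print(f"ramp time ~{t * 1e6:.1f} us, peak inrush ~{peak_inrush * 1000:.1f} mA")
```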
Logical checks also need to be done to determine the impact of power supply noise and how best to implement the power. “If there is a signal coming from one switched domain to another domain, isolation cells are needed between power rails, because if a gate terminal is left hanging, driven from a switched power supply that has been switched off, and it is driving a transistor in a functional domain, the gate goes into a transient state and it will leak heavily. To account for all those things, there are logical checks that should be done when there are signals crossing from one power domain to another power domain. Do you have an isolation cell or a level shifter? The level shifter is another key aspect of low power design that must be accounted for. If a signal is being driven from, let’s say, 0.8 volts into a domain of 1.2 volts, a level shifter is needed, because the switching threshold in the 1.2-volt domain is different from that of a 0.8-volt signal, so the signal can drive both the PMOS and NMOS stacks to be on at the same time. These are the kinds of checks that ensure proper isolation cells and level shifters are added where signals cross power domains.”
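Those rules translate naturally into a structural check. The sketch below uses a hypothetical netlist representation, not any tool's actual rule deck: it walks a list of cross-domain signals, flagging a missing isolation cell when the driving domain can be powered off and a missing level shifter when the two domains run at different voltages.

```python
# Sketch of a static power-domain crossing check (hypothetical netlist format).
# Rule 1: if the driving domain can be switched off, the crossing needs an
#         isolation cell so the receiver never sees a floating gate.
# Rule 2: if the two domains run at different voltages, the crossing needs a
#         level shifter so the PMOS and NMOS stacks can't both turn on.

DOMAINS = {
    "core": {"voltage": 0.8, "switchable": True},
    "aon":  {"voltage": 0.8, "switchable": False},
    "io":   {"voltage": 1.2, "switchable": False},
}

CROSSINGS = [
    # signal,       from,   to,    has_isolation, has_level_shifter
    ("core_req",    "core", "aon", True,  False),
    ("core_data",   "core", "io",  True,  False),   # missing level shifter
    ("dbg_status",  "core", "aon", False, False),   # missing isolation
]

def check_crossings(domains, crossings):
    errors = []
    for sig, src, dst, iso, ls in crossings:
        if domains[src]["switchable"] and not iso:
            errors.append(f"{sig}: needs an isolation cell ({src} can power off)")
        if domains[src]["voltage"] != domains[dst]["voltage"] and not ls:
            errors.append(f"{sig}: needs a level shifter "
                          f"({domains[src]['voltage']}V -> {domains[dst]['voltage']}V)")
    return errors

for err in check_crossings(DOMAINS, CROSSINGS):
    print("VIOLATION:", err)
```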
Conclusion
As designs chase after lower and lower power, techniques have become much richer.
“There are two angles to this,” said Cadence’s Knoth. “One is in the pre-implementation phase, where you can audit your power efficiency, and we can tell you which lines to edit to recover X number of milliwatts. The other angle is for an RTL reuse situation, where you’re not allowed to edit the RTL. The beauty here is the same tool that’s doing the power efficiency improvements and that guided power reduction is integrated with the synthesis and place-and-route environment. That way, during the implementation process, some of those things that would have required you to edit the RTL can be automatically applied. Formal verification can be critical here because it can do some of this in an automated way. You don’t have to edit the RTL, but you do still have to dot the i’s and cross the t’s; you still have to make sure it’s formally equivalent.”
From a technology perspective, this is a big one, Knoth said, because the bigger changes that are made will dramatically lower power, particularly energy. “The big energy wins are an architecture play,” he said. “They’re not achieved by sprinkling some high VT cells in here. We’re talking about changing the clock rate, or going from fixed point to floating point, or how wide does a certain bus need to be? Big, meaty things that make a major difference when it comes to power and energy.”