Experts at the Table, part 2: What does a power model look like and how do you ensure software utilizes power control properly?
Semiconductor Engineering sat down to discuss power modeling and analysis with Drew Wingard, chief technology officer at Sonics; Tobias Bjerregaard, chief executive officer for Teklatech; Vic Kulkarni, vice president and chief strategy officer at Ansys; Andy Ladd, chief executive officer of Baum; Jean-Marie Brunet, senior director of marketing for emulation at Mentor, a Siemens Business; and Rob Knoth, product manager at Cadence. What follows are excerpts of that conversation. Part one can be found here.
SE: What is a power model? Are we talking about instantaneous power, average power, total energy, thermal impact on the chip? Isn’t all of this part of a power model?
Ladd: It is all tied together. A power model gives you the power profile of what the IP is doing during a certain scenario. To be useful to a designer, it has to show them time-based power. Cycle-based power will identify areas where peak power is too high. For example, we found a case where an idle state was consuming twice as much power as it should, but without the ability to look at power over time and see when something is wrong, you have no way to drill into where the problem is.
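To make the idle-state example concrete, here is a minimal Python sketch of how a time-based trace could be checked against a per-state budget. The trace format (one `(state, power_mw)` pair per time step), the function names and the budget numbers are all assumptions made for illustration; they do not describe any panelist's tool.

```python
from collections import defaultdict


def average_power_by_state(samples):
    """Average power per state from (state, power_mw) samples, one per time step."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, power_mw in samples:
        totals[state] += power_mw
        counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}


def states_over_budget(samples, budget_mw):
    """Return the states whose average power exceeds the (hypothetical) budget."""
    averages = average_power_by_state(samples)
    return {
        state: avg
        for state, avg in averages.items()
        if state in budget_mw and avg > budget_mw[state]
    }


# Hypothetical trace: the idle state draws twice its 10 mW budget.
trace = [("active", 100.0), ("active", 120.0), ("idle", 20.0), ("idle", 20.0)]
budget = {"active": 150.0, "idle": 10.0}
over = states_over_budget(trace, budget)
print(over)
```

A single average over the whole run would hide the problem here, because the active samples dominate. Grouping by state is what exposes the idle overage.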
Brunet: As an industry, there is no standard for what a power model is. This is a big problem.
Wingard: I love the vision, but I don’t know how to plug it together. So you built a power model for this block or for that block, but their behavior is interdependent. Do I have to run a full chip simulation to get a power model for this IP block?
Ladd: You have to understand how a block consumes power, and for certain blocks you don’t care about it because they consume so little. But you need to drill into the blocks that are generating problems.
Wingard: What we know from performance analysis is that there is tight interaction between the behaviors of the components because they share resources and, in particular, they share memory. So the behavior of your component under loaded memory conditions may be very different than its behavior under an unloaded memory system. So you have to model the bulk of the functionality of the whole chip to get a time domain model that makes sense.
Bjerregaard: You are asking what is a power model, and that depends upon who is looking at it and what they are using it for. What you really want is something that goes across abstraction levels because the need for understanding power depends upon where you are in the flow. If it is total power or dynamic power in the physical back-end, you have power integrity issues, etc. So for the model to be complete, it needs all of those aspects. And then you get into the dilemma of how precise you can be for something to work at the highest abstraction level all the way down to the lowest abstraction level. When you get to the back-end, people have never gotten the power down enough. They always want another 5%, no matter what you did at the architectural level. We see that with implementation methods for physical back-end today. There is headroom, and a lot of that comes from timing. With advanced nodes at 10nm and 7nm, timing closure is a huge issue, and while it has always been an issue, it is getting bigger and it is costing power. A higher proportion of power is being used to close timing, and that headroom is up for grabs. You can fix it at the architectural level to make sure pipeline stages are balanced, but you can also fix it in the physical back-end without having to worry about the architecture by scheduling events such that things become more balanced. Idle time can be found at all levels of detail.
Brunet: It would be nice to conduct a study about how many times you have delayed a part to optimize 5% of power at the gate level, when in reality your power is off by 5X from where you thought it should be. It costs millions of dollars to be late.
Kulkarni: It is not just peak power. If you have to look at billions of cycles (for example, Android boot is about 40 billion cycles) then how do you profile that? Let’s say you slice and dice it into 1-billion-cycle chunks. Then, using an RTL power profiler, you can see the slope and nature of power consumption, and with that you can look at peak power. Another effect is the L di/dt effect, the rate of change, which can mess things up downstream in the physical world in power grid design and cause hot spots. Those two are important parts of a power model, along with input/output characteristics and the various states of leakage, dynamic power, time-based power and L di/dt power for large blocks. AMD found that it is not enough to analyze power once and think it will be fine. Instead, they tracked the power profile over four months. Initially, their idle versus active had a shallow slope in the sense that idle consumption was pretty high. In their regressions they tracked the power profile changes on a daily basis and found newer areas where they could reduce the power. Over the four months they created a steep slope. So power is not a one-time event. It requires best-practice methodologies that need to be developed. There are several companies joining hands under the Si2 umbrella and IEEE P2416. They are trying to unify the way of describing the power model from transistor level and higher levels of model abstraction so that you can depend on the decision that you make. We find convergence of accuracy from +/- 15% down to +/- 3% from RTL down to gate level and final signoff. It has just been donated to IEEE.
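The chunked profiling described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the per-cycle power list, the window size and the function names are hypothetical, and the window-to-window step is a crude stand-in for the L di/dt effect (at a fixed supply voltage, a jump in power implies a jump in current), not a power grid analysis.

```python
def window_profile(power, window):
    """Average power per fixed-size window of cycles (the last window may be shorter)."""
    if window <= 0:
        raise ValueError("window must be positive")
    profile = []
    for start in range(0, len(power), window):
        chunk = power[start:start + window]
        profile.append(sum(chunk) / len(chunk))
    return profile


def peak_window(profile):
    """Index and value of the window with the highest average power."""
    index = max(range(len(profile)), key=profile.__getitem__)
    return index, profile[index]


def max_step(profile):
    """Largest absolute change between adjacent windows (rough rate-of-change proxy)."""
    return max((abs(b - a) for a, b in zip(profile, profile[1:])), default=0.0)


# Hypothetical per-cycle power values; a real run would use far larger windows.
trace = [1.0, 1.0, 3.0, 5.0, 2.0, 2.0]
profile = window_profile(trace, 2)
print(profile, peak_window(profile), max_step(profile))
```

Working on window averages first keeps a multi-billion-cycle run tractable. The windows that show the peak or the steepest step are the ones worth re-examining at finer granularity.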
Brunet: Is this just for logic or memory as well?
Kulkarni: For logic. Memory is to come later.
Wingard: How do you see that interacting with the power modeling put into UPF?
Kulkarni: UPF is a little higher level in terms of the domains and state dependence and level shifters at the architectural level.
Wingard: Do you expect that we could plug these models into UPF3 power models?
Kulkarni: Exactly. So 2416 will go into IEEE 1801 UPF, so there should be a merge between them. They do have a few proof points with some small testcases. It defines in a reasonable way what a power model should contain.
Wingard: The ECO point is well taken. We found that, as well. Nightly regressions should profile power so that if a developer makes a change, we can make sure that power did not change by more than a certain amount. Otherwise it would be considered a regression failure.
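A nightly check of this kind might look like the following Python sketch. The scenario names, the 5% tolerance and the function name are assumptions for illustration only, and the sketch treats a change in either direction as a failure, which is one possible reading of "did not change by more than a certain amount."

```python
def power_regressions(baseline_mw, current_mw, tolerance=0.05):
    """Return scenarios whose power moved more than `tolerance` (as a fraction) from baseline.

    Assumption: increases and decreases are both flagged, since an unexpected
    drop can also mean that something stopped working.
    """
    failures = {}
    for name, base in baseline_mw.items():
        if name not in current_mw or base <= 0:
            continue  # nothing to compare against
        change = (current_mw[name] - base) / base
        if abs(change) > tolerance:
            failures[name] = change
    return failures


# Hypothetical numbers: boot power is within tolerance, idle power rose by 20%.
baseline = {"boot": 100.0, "idle": 10.0}
tonight = {"boot": 102.0, "idle": 12.0}
failures = power_regressions(baseline, tonight)
print(failures)
```

In a regression flow, a non-empty result would fail the run in the same way a functional mismatch does.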
Kulkarni: We also came across a study where they talked about an active gaming device, especially those with GPUs, where the game stops but certain things remain and consume power. They called it the idle tail – things that get left over. We have seen this in IoT devices and gaming devices and HDTV.
SE: We are putting power control into designs and hoping that software will use it correctly, but it never does. What are we learning as an industry in terms of power control strategies? And do we have to remove this from software?
Knoth: The problem is not localizing and segmenting who will control power. It is democratizing it and making sure the software guys are at the table. You can’t just say it will all be done in hardware and make it software agnostic. You could, but it would mean that you are leaving opportunity, big opportunity, on the table.
Wingard: Power control is no different than any other functional element. Those of us who use personal computers have tons of hardware that Microsoft still hasn’t figured out how to turn on. It takes multiple generations of CPUs before they finally write software to turn on a new function. In many ways power control is like that. There is a vast set of chips that have power control features, and yet no software has been written for them. The models and the chip people may have assumed it was turned on, so we do have a disconnect between what we are modeling and what is often actually done. We can do better, and we can build chips that close the loop in hardware without taking away the opportunity for software people to make improvements. We can incorporate identification of opportunities to save power, the sequencing of the power state transitions, and the detailed control circuits to manage di/dt and inrush. We can put that all into hardware. We can still have software interfaces, but that becomes optional software and the base platform just runs out of the box.
Bjerregaard: That is required because if you expect the software designers to actually bother about hardware, they don’t. We can put all of these smart features in and it will never reach the surface.
Wingard: The market where it is most obvious is IoT, because right now IoT is mostly about adding wireless capabilities to microcontrollers. You have to look at the software deployment model for the average microcontroller. The microcontroller user has no expectation, and the microcontroller company does not have the infrastructure for delivering the Android drivers because it won’t run Android. In many cases we have no choice but to raise the hardware abstraction so that many of these things become automatic.
Bjerregaard: IoT represents an opportunity to get software designers to care because power is so important and it is such specialized functionality that they will walk that extra mile.
Wingard: That is what I mean when I say you can’t take the ability away – there will be smart companies that will take advantage of that.
Bjerregaard: They will win based on power.
Knoth: But have we provided the tools for software engineers to be able to make sure that power management is working effectively?
All: We are getting there.
Wingard: It is a multi-layered problem. There are things that you want to do during development.
Brunet: Sometimes there is just one software guy.
Wingard: Yes, for IoT it sometimes will be a small setup.
Brunet: For the big mobile guys it is very different.
Wingard: So if IoT is driven by the smart phone guys it will look very different than if it is driven by the microcontroller guys. The mobile guys have millions of lines of software code that they deliver with every application processor they ship, or every baseband processor that they ship.
Brunet: They have to realize they have a problem. Otherwise, they don’t change methodology.
Knoth: It has already been beaten into them that power is an important part of the product. It has already been done there.
Kulkarni: Even on the IoT side, when we see ADAS designs, these are still IoT devices, but they are getting to be very complex. A car may have 100 CPUs and full cameras working at 150°C worst case and -40°C. This is worse than military range, and they want to do power profiling. We are also talking about variability. They are not at the 28nm node. They are looking at 10nm and 7nm.
Knoth: There is also the safety-critical aspect. Power is not just battery life. It is reliability. It is accelerating aging. These have a huge impact on the overall FIT of the device. That raises the stakes. You need to manage it, model it.