Last of three parts: Stacked die effects; hardware-software co-design; synchronizing models; working with UPF and CPF; what’s needed for testing.
By Ed Sperling
Low-Power/High-Performance Engineering sat down to discuss low-power design with Leah Clark, associate technical director at Broadcom; Richard Trihy, director of design enablement at GlobalFoundries; Venki Venkatesh, engineering director at Atrenta; and Qi Wang, technical marketing group director at Cadence. What follows are excerpts of that conversation.
LPHP: If you are going to 2.5D and 3D, what’s the real benefit?
Clark: One of our time-to-market struggles is getting a chip into the hands of our customers so they can play with it and tell us what features they want and what features they don’t want. So we spin it and give them the final product. Unless we get something into their hands we don’t get that kind of feedback. We would be able to do a first tapeout that would work. We wouldn’t have to fight the new technology pain. It would save us a whole mask set—in theory.
Venkatesh: I agree.
LPHP: We need software to run on all of this stuff. How much of a problem is that, because that has power implications, as well?
Venkatesh: The place to solve major power problems is at the hardware-software stage. We don’t know what the power and performance will be. If you go down to the architecture you can do hardware-software co-design. There is so much power you can save by doing that. All the leading mobile companies are saving a lot of power on the software side.
Wang: Many people don’t have the resources to put into high-level design. But if you just look at the software itself you can save a lot of power. One example is that in both iOS and Android there are power-management APIs. But the silicon designer, as well as the software designer, may not be aware of those APIs, so they can’t take advantage of them. But how do you catch the problem? You have to run power simulation for the software. There’s a modeling issue. It also depends on what platform you use to run it. Simulation is too slow. Virtual prototyping or emulation are now becoming popular.
Clark: But for switching activity you still have to run that on your model because otherwise it won’t be accurate.
Wang: But traditionally people ran functional vectors to get at switching activity. Those vectors have no connection to the application you’re profiling.
Clark: You don’t know what the software will look like until you give the hardware to your customers because they’re going to think of stuff you never thought of. That’s an iterative process. We could optimize for that.
Venkatesh: You can get a power number for each lower-level instruction, but when you put a couple of instructions together they have different power. So you have to have a real model on an instruction-by-instruction, mode-by-mode level, to the point where, when someone writes a piece of application software, you can predict how much power it will consume. That’s where a lot of work needs to be done.
Clark: You can predict per operation, but how do you know how many instructions they’re going to use, and in what combination? You could optimize for instructions 1 through 10, and it turns out people are using instructions 15 through 20.
Venkatesh: You have a power number for your instructions. Then you write algorithms for different combinations. Early in the game you can make those kinds of decisions.
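The kind of instruction-by-instruction model the panelists describe can be sketched as a lookup table of per-instruction energy numbers plus pairwise correction terms for adjacent-instruction effects, since two instructions together can draw different power than the sum of each alone. All instruction names and energy values below are hypothetical, for illustration only.

```python
# Sketch of an instruction-level power model: each instruction gets a
# base energy number, and adjacent instruction pairs get a correction
# term. All names and numbers here are hypothetical.

# Base energy per instruction, in picojoules (invented values).
BASE_ENERGY_PJ = {"load": 12.0, "store": 14.0, "add": 3.0, "mul": 9.0}

# Corrections for specific adjacent pairs (invented values).
PAIR_CORRECTION_PJ = {("load", "mul"): 1.5, ("mul", "store"): -0.5}

def estimate_energy_pj(trace):
    """Estimate total energy for a sequence of executed instructions."""
    total = sum(BASE_ENERGY_PJ[op] for op in trace)
    # Add pairwise corrections for each adjacent pair in the trace.
    for prev, curr in zip(trace, trace[1:]):
        total += PAIR_CORRECTION_PJ.get((prev, curr), 0.0)
    return total

trace = ["load", "mul", "store"]
print(estimate_energy_pj(trace))  # 12.0 + 9.0 + 14.0 + 1.5 - 0.5 = 36.0
```

With a characterized table like this, different instruction mixes for the same algorithm can be compared early, before silicon, which is the kind of decision-making Venkatesh describes.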
Wang: The low-hanging fruit is having compiler technologies that are aware of those power management APIs at the operating system level. That can be done as a first step without changing your architecture or modifying the operating system. Just being aware of the power when you do the programming.
LPHP: Don’t the teams have to be restructured? You can’t just have a hardware team handing off the design to the software team.
Venkatesh: Yes, it needs to be co-designed.
LPHP: Are you seeing realignment in the supply chain because of the things that have to be done for power?
Trihy: It goes even beyond just the software. A lot of things we’re talking about here are the problems faced by early adopters. They’re designing as the process is coming up. If it’s more established, then a lot of these things are already fleshed out and there is more collateral and more experience. What we see is customers who are co-designing their chips while we’re designing the process. That, in itself, adds to the sheer complexity of this. Everything is moving so fast and moving concurrently.
Clark: And some things take longer than others. When we get our standard cell libraries delivered we have many different flavors that have to be characterized, because we build our own libraries with the technology models. The process takes three times as long as it used to. We’re already synthesizing and designing before the library is complete.
Wang: The same thing happens at the system level. You have hardware formed and the operating system. That’s why FPGA-based prototyping is growing so well. People want to get ahead on their system before silicon comes out.
Clark: There was a push for that in the early 1990s.
Wang: It’s not new, but it’s definitely booming now. There is a very significant increase in the last two years. One reason is time to market. You need to validate your platform before it becomes silicon. Another trend is that with the SoC-level simulation there is a lot you need to do. If you get a chip and you want to run logic simulation, you will never finish.
LPHP: In the past, process technology has always bailed out designers. Is the bag of tricks as deep as it used to be?
Trihy: The finFETs are a major change. You can get significantly less power or much higher performance. You have lower leakage than planar devices, and that’s going to be a game changer at the 14nm node. Those technologies are evolving. The foundry continues to look at tweaking as much as we can, but we face new challenges. At 20nm we have double patterning. A lot of our energy goes into working with EDA vendors to make sure their flows can account for new effects we see at 20nm and below. We’re trying to pre-solve issues.
LPHP: As we move into stacked die, we start getting lots of models—power, software, TLM. Are these models synchronized?
Wang: You need to cross your fingers. Things are getting very complicated. From a design and EDA vendor side, these models eventually will consolidate and a methodology will be created. Then there will be tools to facilitate that.
Clark: Every time there’s a new cost function like power there’s churn. There are a couple different models, and the different vendors have their own. It takes a while for the industry to converge on one. Liberty is sort of an industry standard.
Venkatesh: There is a lot of work to be done in modeling. You need good models at each of the design stages. Good is defined by how compact and how accurate they are. The other piece is whether they’re in sync with other models like the timing model. Along with modeling go the estimation tools. How accurate are they in predicting power? And if you’re predicting power, how accurate is that versus silicon?
LPHP: And that information has to go in two directions, right? It has to be measured both in the models and at RTL, and they have to be synchronized repeatedly.
Venkatesh: These models have to be synchronized with each other and down the design chain. There is a lot of room for progress here.
Clark: We’re struggling now with the UPF-CPF issue. UPF 2.0 is more compatible with CPF, but a lot of our tools don’t support UPF 2.0 yet. We’re at 1.0 or 1.1. We’ve been working on our own source code format that we translate as a common source into UPF and CPF so we have some measure of confidence the UPF and CPF match each other. We want to use one for implementation and one for verification.
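As a rough illustration of the power intent that has to stay consistent across formats and tools, a minimal UPF 1.0-style fragment might declare one switchable domain with isolation on its outputs. The domain, instance, and signal names below are invented, and a real file would also need supply nets, power switches, and level shifters.

```tcl
# Minimal UPF 1.0-style power intent sketch (hypothetical names throughout):
# a switchable domain whose outputs are clamped low when it powers down.
create_power_domain PD_CORE -elements {u_core}

# Isolate outputs of the switchable domain.
set_isolation iso_core -domain PD_CORE \
    -clamp_value 0 -applies_to outputs
set_isolation_control iso_core -domain PD_CORE \
    -isolation_signal iso_en -isolation_sense high
```

Keeping a fragment like this equivalent to its CPF counterpart is exactly the translation-and-checking problem Clark describes.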
Wang: Even within the same format you have pre-DC (Design Compiler) UPF and post-DC UPF. This model validation is a problem. We’re looking at tool enhancements to check the consistency of these models.
Clark: The initial UPF is from a high-level functional standpoint. But then you add test. How do you add test to UPF with the right intent?
Wang: A lot of times people talk about CPF’s format being different. Yes, the format is different, but a lot of times so is the methodology. With DFT, UPF and CPF both have a problem. You should be able to automatically abstract out DFT logic and compare the power intent.
Clark: And you don’t want the DFT to be on when you’re in functional mode. How do you verify that? By having our own internal format our RTL and system guys don’t have to understand UPF and CPF. If we can put it in a specification format so that we can extract the right information out of it, then we can get the system architects more involved in the power architecture details.
Venkatesh: A big piece in lowering power is power intent verification. You can have a lot of great techniques, but unless you can verify that—pre-synthesis UPF, post-synthesis UPF—you will have silicon failure.
Clark: And being able to review it is critical. If a system architect writes a spec and I translate it into UPF and CPF, they can’t read my UPF file.
Trihy: When you get down to the circuit level and you want to run a power analysis tool, are you even able to go back and validate it?
Venkatesh: There are two aspects. One is how much power it is consuming. The second is the power intent. If you have a block, how many domains does it have and is there isolation logic?
Clark: When block A is on, what is block B doing? Is it on or off?
Trihy: At some point you want to predict the power. But can you actually measure it today?
Venkatesh: That’s different from the power intent. At the gate level, you can get within 10% accuracy. At RTL, you can get within 20%.
Wang: It all depends on the patterns. That’s why you run emulation.
Clark: Yes, for statistical patterns.
Wang: Accuracy depends on your abstraction level. But in addition to the power intent model, the power consumption model is still a problem. Typically you look at what’s happened in the past. If you look at power consumption of IP at the bit level, you have to model it at some higher level and take into account the bus activity. That is completely new. There is a lot of research work going on there. At the cell level, we have done a pretty good job. Liberty is pretty comprehensive and accurate.
Trihy: I don’t agree. In practice Liberty is not. For timing, we have much higher expectations: with static timing analysis we expect to match the frequency we get in silicon. For power, we don’t.
Wang: If you have 10 gates and you miss one gate, you’re done. But if you have 100 million gates on your chip and you miss one gate, who cares? If you look at average or dynamic power, accuracy is very crude.
Clark: But power usually isn’t a showstopper. If you use a little too much power, the battery life might not be as good as you hoped.
Venkatesh: Or you drop an application.