Last of three parts: What’s missing; optimization vs. analysis; challenges of mixing old IP with new IP; the increasing value of high-level synthesis for power estimation beyond 45nm.
Low-Power Engineering sat down to discuss the advantages of dealing with power at a high level with Mike Meyer, a Cadence fellow; Grant Martin, chief scientist at Tensilica; Vic Kulkarni, senior vice president and general manager at Apache Design; Shawn McCloud, vice president of marketing at Calypto; and Brett Cline, vice president of marketing and sales at Forte Design Systems. What follows are excerpts of that conversation.
LPE: What’s missing from our tools arsenal? If you have enough experience you get a feel for what works, but how does the average engineer get there?
Martin: You need to build some tools that can actually do high-level system bookkeeping. That’s really what it’s all about. When you do power or energy estimation, it produces traces of activity that can feed into an overall system model. The various IP models could all be calibrated to send out that information, and people could run scenarios and estimates that way. It still seems to be missing. I recall talking to people about this two or three years ago.
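The kind of bookkeeping Martin describes can be sketched in a few lines. The fragment below is a minimal illustration in C++, not any vendor’s tool; it assumes each IP model has been calibrated to emit (block, state, duration) activity records, and the block names and milliwatt figures are invented for the example.

```cpp
#include <iostream>
#include <map>
#include <string>
#include <vector>

// One activity record emitted by a calibrated IP model during a scenario run.
struct ActivityRecord {
    std::string ip;       // which block reported the activity
    std::string state;    // e.g. "active", "idle", "sleep"
    double duration_ms;   // time spent in that state
};

// Accumulates per-block energy from activity traces using calibrated
// average-power numbers (mW) per block and state.
class EnergyLedger {
public:
    void calibrate(const std::string& ip, const std::string& state, double mw) {
        power_mw_[ip][state] = mw;
    }
    void record(const ActivityRecord& r) {
        energy_uj_[r.ip] += power_mw_[r.ip][r.state] * r.duration_ms;  // mW * ms = uJ
    }
    void report() const {
        double total = 0.0;
        for (const auto& entry : energy_uj_) {
            std::cout << entry.first << ": " << entry.second << " uJ\n";
            total += entry.second;
        }
        std::cout << "scenario total: " << total << " uJ\n";
    }
private:
    std::map<std::string, std::map<std::string, double>> power_mw_;
    std::map<std::string, double> energy_uj_;
};

int main() {
    EnergyLedger ledger;
    ledger.calibrate("cpu0",  "active", 120.0);  // hypothetical calibration data
    ledger.calibrate("cpu0",  "idle",    15.0);
    ledger.calibrate("accel", "active",  40.0);

    // Replay one scenario's activity trace and report the energy breakdown.
    std::vector<ActivityRecord> trace = {
        {"cpu0", "active", 2.0}, {"accel", "active", 5.0}, {"cpu0", "idle", 8.0}};
    for (const auto& r : trace) ledger.record(r);
    ledger.report();
}
```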
Kulkarni: The industry seems to be doing a lot of optimization in ESL, but not a lot of analysis. We have been doing that for several years at the RT level. A lot of us can do RTL power optimization, but from an analysis standpoint there are very few who can do it. It takes so long to be accurate about how to stimulate RTL without having a synthesis engine underneath. ESL synthesis experts still don’t have an analysis engine for ESL. That’s a missing piece, along with ESL power models.
McCloud: We have very good point tools out there today. We have good HLS tools producing very complex, high-quality hardware accelerators. We have some pretty good power optimization, power analysis and power integrity tools. What’s missing is a way of productizing the integration of all those tools. It’s standardizing UPF and CPF and getting that propagated through the tools. And then what’s missing on the TLM side is better standardization around annotating power and getting the right fidelity into the TLM models. That’s one of the primary purposes of the TLM model: to be able to run real software on it and get accurate estimates. What we need to do is put these good tools together.
Meyer: And we need to recognize that it’s not just an issue of hardware vs. software. There’s also the issue of multiple cores. People have the choice to run some things on a high-speed core and other things on a lower-speed core. Those types of decisions don’t come easily. There’s a fair amount of effort needed to put together a prototype so you get an idea of what the power is with two cores vs. one core and an accelerator. A lot of modeling has to be done to come up with an answer there, even for very early estimates.
Cline: There are very few people with 30 years of experience who have to trade off among one core, two cores or four cores and the custom logic that goes around them. You won’t sell a lot of tools into that market. Most of the people with 30 years of experience are grinding RTL every day or writing software. For EDA vendors, you get paid for optimization with an extra zero vs. what you get paid for analysis. It’s the way the world works. If you have a tool that squeezes out an extra 10% at the end of the process, you get paid for that, especially if you’re putting out a fire because you don’t meet timing or something else that’s critical. What’s the time value of a week over the course of a project? At the beginning of a design a week isn’t worth anything. At the end of the project the value of a week is huge. If you sell a product into the last week of a project and it squeezes out another 5% or 10%, you’re a hero.
Martin: On the other hand, the decisions you made in those first few weeks may cost you downstream. There are a lot of design teams out there with very little accumulated experience that are confronted with these problems and trying to do design very rapidly. Stepping back from the tools and the analysis, you have to ask, ‘What is the overall design methodology?’ Is that experience taught to design teams so they can make choices about how many cores, what kinds of cores, what kinds of hardware blocks, and whether they all fit together? That architectural expertise is gained through years of experience.
Kulkarni: It’s almost as if power is where timing was 15 years ago in terms of the knowledge base.
LPE: Except that the people who know timing now need to know power, as well, right?
Kulkarni: Yes. It’s not just mobile or cloud computing. Everything is focused on power, from disk drives to memory. Customers that have 65 watts per chip want to move to 60 watts. They want to move from 5 milliwatts to a few milliwatts. The knowledge of power is limited, though, in terms of power management, power optimization, and the decisions that have an impact downstream. How do we, as an industry, make power analysis and power decisions pervasive? Even my own company has not given a complete recipe for how to do RTL-to-GDSII design. It’s in many people’s heads. The end customer also changes their mind on the fly, but if we can all produce a recipe book that covers at least 80% of the design, then we all benefit. UPF and CPF have started that work already. But when you go to a customer, typically they have some re-used IP and some new IP. UPF/CPF may apply to the brand-new IP but not to the old IP, so mix-and-match flows are another challenge. How do you make sure the parts of the circuit that worked in the previous design still work in the new one?
LPE: What you’re talking about is flexibility in modeling, right?
Kulkarni: That’s correct.
LPE: HLS has been around for more than a decade and it still isn’t mainstream. Will power force a change in perception?
Cline: It’s certainly going to be a factor. But what drives it is the ability to get your job done in the right amount of time.
Meyer: Yes, it’s time to market.
Cline: It’s also time to results. You have to hit some sort of metric for your results. I was just in Japan, and the concern there is how they’re going to get to $5 per chip. To do that they need twice the features and twice the speed. So pick your favorite HDTV company. If it’s not Vizio, then Vizio is undercutting them by 50%. How do they get their chips to where they’re competitive?
McCloud: I wouldn’t underplay the potential significance of power becoming a critical factor for people adopting HLS. We’re just now scratching the surface in terms of what we can do in HLS. Memories consume 60% of the power in a typical HDTV. There’s a whole slew of memory optimization we can do around the way we slice the memory, around memory-enable gating, light-sleep mode and deep-sleep mode. Those are things that are perfectly suited for an HLS tool, which has a detailed understanding of the state of the design and the data path. The tools already do sequential and combinational clock gating. But as we start to go past 45nm, it’s not just battery life. Thermal and power integrity issues also become critical. That’s when HLS will really become close to a requirement.
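To make McCloud’s memory point concrete, here is a rough sketch of how an idle-aware sleep policy changes a memory’s energy over an access trace. It is a toy model in C++: the per-state power numbers and idle thresholds are invented, and in practice they would come from the memory compiler’s datasheet and the HLS tool’s schedule.

```cpp
#include <iostream>
#include <vector>

// Hypothetical per-state memory power numbers (mW); real values come from a
// memory compiler datasheet, not from this sketch.
struct MemPowerModel {
    double active_mw;
    double light_sleep_mw;
    double deep_sleep_mw;
};

// Estimate energy (in mW-cycles) for a per-cycle access trace, dropping the
// memory into light sleep after a short idle window and deep sleep after a
// long one -- the kind of policy an HLS tool can derive from its schedule.
double estimate(const std::vector<bool>& accessed, const MemPowerModel& m,
                int light_threshold = 4, int deep_threshold = 32) {
    double energy = 0.0;
    int idle = 0;
    for (bool acc : accessed) {
        if (acc) {
            idle = 0;
            energy += m.active_mw;
        } else {
            ++idle;
            if (idle >= deep_threshold)       energy += m.deep_sleep_mw;
            else if (idle >= light_threshold) energy += m.light_sleep_mw;
            else                              energy += m.active_mw;
        }
    }
    return energy;
}

int main() {
    // A burst of 20 accesses followed by a long idle stretch.
    std::vector<bool> trace(200, false);
    for (int i = 0; i < 20; ++i) trace[i] = true;

    MemPowerModel model{10.0, 2.0, 0.2};
    std::cout << "with sleep modes: " << estimate(trace, model) << " mW-cycles\n";
    // Setting the thresholds beyond the trace length disables sleeping,
    // which shows the margin the policy buys on a bursty trace.
    std::cout << "always active:    " << estimate(trace, model, 1000, 1000)
              << " mW-cycles\n";
}
```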
Martin: Any technology that lets you explore the design space for different alternatives, whether it’s HLS or configurable processors, lets you examine more alternatives more quickly if you’re keen on performance. If it’s power and it really does let you explore that space, that’s also key. If it’s just within a few percentage points of what you can do in RTL by hand, that’s not going to drive the market. It has to offer a wide margin.
Cline: It’s also what’s called ‘change and check.’ You can change something very quickly and check the results. We see a lot of engineers who need to change a number of things and then check them, and then change and check more. You can’t do that in RTL. It gives them a whole other set of options.
McCloud: One of the things we really need to do is shrink-wrap the methodology. Then the smaller companies can pick it up and run with it. Right now it takes a big company to put that investment behind it.
Cline: I agree, except that the very small companies use it because they can’t get the funding to go build a $50 million chip. In the middle of the curve are the guys who can’t move just yet.
Kulkarni: The ideal solution for designers would be HLS for optimization of power, timing and area, and then a quick check against RTL power analysis. The reason is that at the RTL level you can capture physical effects. That becomes a linchpin between the physical world and the ESL world. But you also have to have good power models.
LPE: Don’t you also need more standardization?
Meyer: You certainly need to be sure that you’re not double counting. A lot of times you’re modeling more than just the software: you’re trying to estimate the power there, and you start to include what’s in the memory and the cache. Then you look at the cost somewhere else and add that cost in again. When you aggregate it all, you have to have a way to make sure nothing is double-counted.
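One simple way to address Meyer’s aggregation concern is to attribute energy only to the leaves of a component hierarchy, so a shared cache or memory contributes exactly once. The sketch below is a hypothetical illustration; the hierarchy and energy numbers are made up.

```cpp
#include <iostream>
#include <string>
#include <vector>

// A component either carries its own measured energy (a leaf) or is just a
// grouping of children; only leaves contribute to the aggregate, so a shared
// block such as a cache is counted exactly once.
struct Component {
    std::string name;
    double energy_uj;                 // meaningful only for leaves
    std::vector<Component> children;  // empty for leaves
};

double aggregate(const Component& c) {
    if (c.children.empty()) return c.energy_uj;
    double sum = 0.0;
    for (const auto& child : c.children) sum += aggregate(child);
    return sum;
}

int main() {
    // Hypothetical hierarchy: the cache's energy lives in exactly one leaf
    // rather than being folded into both the CPU estimate and a separate
    // memory-system estimate.
    Component soc{"soc", 0.0, {
        {"cpu_subsystem", 0.0, {
            {"cpu0", 350.0, {}},
            {"l2_cache", 120.0, {}}}},
        {"ddr_memory", 480.0, {}}}};
    std::cout << "total: " << aggregate(soc) << " uJ\n";
}
```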