Reaching For ROI

Calculating the return on investment for power and performance is getting more complicated.


The simplest way to assess power and performance ROI of a chip design is to ask if the chip works and whether it meets the design specifications. But chips can be used in very different ways, and a single chip may have a number of operational modes, so that formula isn’t so clear anymore.

“Preventing failures is the No. 1 priority when it comes to ROI,” said Aveek Sarkar, vice president of product engineering and support at Ansys. “If a chip burns out in four years instead of six, that’s expensive. There can be a lot of reasons for that. It may be that when you’re designing a chip that goes on a PCB and you put the temperature sensors in the wrong place and the board warps so that it no longer maintains electrical or mechanical connection. You have to replace the board in that case, so it can move beyond the chip.”

Beyond making sure that chips work, there are no simple answers or clear-cut approaches. Much of the focus is on analyzing power and performance early enough to make changes, with as much context about how a design will be used as possible. Once the design reaches tapeout, those changes are costly and much more painful to make.

Jean-Marie Brunet, marketing director for Mentor Graphics’ Emulation Division, observed that over the past three or so years, a shift has been underway among verification engineers with respect to power due to two factors. First, Moore’s Law has enabled more transistors on the chip, and combined with SoCs getting bigger, this requires an understanding of static power consumption as well as dynamic power consumption for every transistor. Second, with more advanced nodes and the move to finFETs, dynamic power has increased significantly and is now a major design consideration.

Those two factors have created the perfect storm for power. For companies developing mobile or multimedia SoCs aimed at consumer markets, there is a requirement for multiple I/O interfaces and protocols used within the context of an operating system that has to boot and run a live application. Phones, tablets and PCs all share a profile that requires chips to be tested in a live application context: booting the OS, issuing a boot sequence to the chip, and starting up the first level of firmware. These chips can contain hundreds of millions to billions of transistors, and require a significant number of simulation or verification cycles. Moreover, they need to be looked at in the context of how they will be utilized.

“If you compare this versus 10 years ago, or even 5 years ago, they didn’t have to worry too much about this,” said Brunet. “Chips were smaller, in bulk CMOS everybody was talking about standby current much more than dynamic current, and we didn’t have that explosion in applications. Therefore, chips were verified by using traditional testbenches for power.”

Brunet points to the need for a benchmark reference in the future: an environment that contains the minimum subset needed to verify how a chip will be utilized, all in the context of an emulation platform.

Most engineering teams don’t have access to all the tools and capacity they need to do these analyses, however. Simon Davidmann, CEO of Imperas, said engineers always will simulate whatever they can using the equipment they have. “If your technology only allows you to do power analysis on 5 million cycles, then the only approach you can do is try and find a representative 5 million cycles. If you could simulate 50 billion cycles, you wouldn’t simulate 5 million, you’d simulate your whole boot, your whole phone call, your whole everything. But the fact of the matter is it’s all too slow, so the only thing you can do is analyze tiny little bits of it and hopefully you find representative bits.”
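One way to picture the search for "representative bits" is as a window-selection problem over an activity trace. The following is a minimal sketch, not a method described by Davidmann: it assumes a hypothetical per-interval activity trace and treats "representative" as "the contiguous window whose mean activity is closest to the mean of the whole run." The function name, the trace values, and that definition of representative are all assumptions made for illustration.

```python
def most_representative_window(activity, window):
    """Return (start index, window mean) of the contiguous window whose mean
    activity is closest to the mean of the full trace.

    Ties are resolved in favor of the earliest window.
    """
    if not 0 < window <= len(activity):
        raise ValueError("window must be between 1 and len(activity)")
    overall = sum(activity) / len(activity)

    # Sliding-window sum so each step costs O(1) instead of O(window).
    running = sum(activity[:window])
    best_start, best_mean = 0, running / window
    for start in range(1, len(activity) - window + 1):
        running += activity[start + window - 1] - activity[start - 1]
        mean = running / window
        if abs(mean - overall) < abs(best_mean - overall):
            best_start, best_mean = start, mean
    return best_start, best_mean


# Hypothetical activity counts per interval; not measured data.
trace = [90, 80, 10, 20, 30, 70, 40, 60]
start, mean = most_representative_window(trace, 2)
print(f"analyze intervals {start}..{start + 1}, mean activity {mean}")
```

A window chosen this way matches the average but can still miss peaks, which is one reason a small sample may not be representative for worst-case power.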

One important part of this analysis is an understanding of how the chip will be used. “One of the key things that we’ve seen is that it’s all about the software running on the chip. This has a dramatic effect on the extremes that the power and performance can have. Yes, the architecture of the chip is fundamentally important. You have things like MIPS multi-threaded processors with virtual processing elements, and ARM has the big.LITTLE configurations so you can switch processors on and off. There’s all these hardware technologies to try and make chips more efficient for the performance and power that you’re targeting. But what’s the software doing? How is the chip going to actually be used? How will the software use the hardware? The use scenarios and modes of operation of the software components are becoming more and more important, and they’re having a huge impact on the performance/power of the chip.”

Davidmann observed that a lot of people are starting to understand that the dynamics of the software do have an enormous impact. “It’s changing, and we’re at the frontier of this now. I don’t think everybody’s got all the solutions yet, but the process the industry is going through is to try and understand how they can analyze the dynamic nature of these chips. The tools and the methodologies being used for power and performance are still evolving, and there needs to be more integration between software and the hardware.”

Further, he contends that the current approaches, cycle-accurate simulators and expensive hardware emulators, are far too slow, arrive far too late in the process because they require RTL, and are difficult to use.

That view doesn’t jibe with the strong emulator and specialized simulator sales, though. Frank Schirrmeister, senior group director, product management in the System & Verification Group at Cadence, maintained that a lot of ROI comes from doing analysis much earlier and bringing in the data for activity analysis. “Bringing in emulation for all of this is what a lot of engineering teams do because they essentially run the power analysis this way. The ROI is hard to pin down because you really don’t know what the R is until you really have a bug and had to recall some chips.”

Krishna Balachandran, product management director at Cadence, agrees that the simplest answer to boosting ROI is starting early. “I was at a customer just last week and they said they had silicon and they were estimating power numbers very early on using a spreadsheet method, but unfortunately what turned out in silicon was much more pessimistic. They didn’t meet their power spec versus what they had estimated early on. This type of estimation worked a bit in the past, but they are now facing a problem where power estimation with spreadsheets is no longer working. If they designed their chip, ran the tools at the end, and it said the power was much higher than what was predicted, that’s a problem. If the power is lower than what was predicted, it’s also not great because they could have traded off that power for higher speed, or better area. But at least the chip has met the power target. When they taped out they knew what the performance and area was, said it was okay and taped it out. But now, if the power number is more, then all of a sudden you have a problem. You can imagine a chip supplier company talking to a system company. The system company already has gone and made certain plans and specs for their system, and if the power doesn’t come in within that limit for this particular chip, then that system company has to adjust that power somewhere else and has to squeeze the power out somewhere else so they can stay competitive.”

In a situation like this, the chip design company could lose credibility with the system company buying their design.

“Obviously you cannot get it exactly on the dot, but getting the power estimation right is becoming extremely critical,” Balachandran said. “A lot of the effort is spent up front to make sure that the power estimate is good, and once you get the power estimate, and once you see you have hot spots/problem areas in your design, to go about and alleviate those at the early stages — at the RTL stage — as much as possible.”

Location, location, location
Understanding design issues early is a well-recognized goal. How to get there is not.

“The way to get the best ROI is to get the most information about the design choices as early in the design process as possible because there you have the opportunity to make the biggest optimization, whether it’s for function, performance, or power,” said Drew Wingard, CTO of Sonics. “At the architecture level you can make the biggest difference.”

There are many people who argue that accuracy is needed for this estimation. Wingard argues that only comes from implementation. “This is about the degree of accuracy to make good architectural choices. RTL offers almost perfect accuracy but is available so late that you can’t use it for a new design. However, it’s a pretty well-trod path to take advantage of what you learned from your prior design. The total number of actually new designs is so small as to be almost insignificant. Even new generations of platforms leverage the heritage of the prior generations. The real question is, ‘How difficult is it to make abstractions from models you can extract from older designs?’ The people who are really good at doing system architectures are quite good at this. The challenge with using something like an emulator is that there is nothing abstract about an emulator. You have to have the function correct in order to do your performance model, or to do your power model, and that’s a pretty high bar to get over. I would rather do things even further forward, even with a bigger abstraction. But if I want to do power modeling, I need to know the behavior of my decoder running different bit rate streams. Are those numbers going to be precise? No, they’re not. In fact, they’re not even precise for RTL.”

The reason is that most engineers don’t know what inputs will be used. “You end up making choices that are based upon a nominal value with error bars around it,” said Wingard. “And as you start to put together things with error bars, sometimes the error bars accumulate.”
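To make the accumulation of error bars concrete, here is a minimal sketch under stated assumptions. The block names and numbers are hypothetical and do not come from any real design or from Wingard. It compares two standard ways of combining uncertainties when nominal estimates are summed: a worst-case bound, where every error points the same way, and a root-sum-square (RSS) figure, which is appropriate only if the errors are independent.

```python
import math

# Hypothetical per-block estimates: (nominal power in mW, +/- uncertainty in mW).
# Illustrative values only.
blocks = {
    "cpu": (400.0, 40.0),
    "gpu": (300.0, 45.0),
    "video_decoder": (150.0, 30.0),
    "memory_subsystem": (250.0, 25.0),
}


def combine(estimates):
    """Return (nominal total, worst-case error, root-sum-square error)."""
    nominal = sum(n for n, _ in estimates.values())
    # Worst case: all errors push in the same direction, so they add linearly.
    worst_case = sum(e for _, e in estimates.values())
    # Independent errors: combine in quadrature.
    rss = math.sqrt(sum(e * e for _, e in estimates.values()))
    return nominal, worst_case, rss


nominal, worst_case, rss = combine(blocks)
print(f"total {nominal:.0f} mW, worst case +/-{worst_case:.0f} mW, RSS +/-{rss:.1f} mW")
```

In this made-up example no single block is off by more than 20%, yet the worst-case error on the total exceeds the entire nominal power of the smallest block, which is the kind of accumulation the quote warns about.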

Wingard said that at an architectural level, it’s rare to make a decision based upon something being 2% better than something else. “We are so far away from being that optimized in most of what we do from an SoC architecture perspective today that if my numbers are right within 10%, it’s probably good enough for everywhere but memory. Being off by 10% on memory throughput is a big deal, but for most of the rest of the system, I can make these choices with a coarser comb. So the challenge is just how do I bring forward the information from old designs.”

And that, he added, is the best place to invest: taking existing blocks and characterizing them for power and for traffic characteristics, so that when the SoC is going to be integrated there are traffic models for these different things based upon how the hardware will really behave in different modes. And when the power modeling is being done, the power dissipation of these blocks must be understood in their different modes of operation. After that it’s pretty easy to assemble those higher-level models.
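The idea of assembling a higher-level model from per-mode block characterizations can be sketched as follows. This is an illustration under assumptions, not Sonics’ methodology: the `BlockModel` class, the mode names, the power values, and the `video_playback` scenario are all hypothetical. Each block carries a characterized power figure per mode, and a use scenario states what fraction of time each block spends in each mode.

```python
from dataclasses import dataclass


@dataclass
class BlockModel:
    """Characterized power per operating mode, in mW (hypothetical values)."""
    name: str
    mode_power_mw: dict


def scenario_average_power(blocks, scenario):
    """Time-weighted average power in mW for one use scenario.

    `scenario` maps block name -> {mode: fraction of time in that mode}.
    The fractions for each block must sum to 1.
    """
    total = 0.0
    for block in blocks:
        residency = scenario[block.name]
        if abs(sum(residency.values()) - 1.0) > 1e-9:
            raise ValueError(f"mode fractions for {block.name} must sum to 1")
        total += sum(block.mode_power_mw[mode] * frac
                     for mode, frac in residency.items())
    return total


blocks = [
    BlockModel("video_decoder", {"off": 0.0, "idle": 5.0, "decode_hd": 120.0}),
    BlockModel("cpu", {"sleep": 2.0, "light": 80.0, "heavy": 400.0}),
]

# Hypothetical scenario: decoder busy most of the time, CPU mostly idle.
video_playback = {
    "video_decoder": {"idle": 0.1, "decode_hd": 0.9},
    "cpu": {"sleep": 0.5, "light": 0.4, "heavy": 0.1},
}

avg = scenario_average_power(blocks, video_playback)
print(f"video playback: {avg:.1f} mW average")
```

Because the block characterizations are reused, evaluating another scenario only requires a new residency table, which is what makes this kind of model cheap to run early in the architecture phase.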

To improve ROI in power and performance, engineering teams need the right methodology and tool solutions, ones that are fast enough to run lots of different software use scenarios and available early enough to have an impact. And they need to be accurate and current enough to be effective.