Getting The Right Return On Invested Power Consumption

Power is limited, and the best way to deal with it is by using early and accurate power analysis.


Three weeks ago, I participated in a panel on low power and modeling at the system level at DesignCon 2015 in Santa Clara, together with representatives from AMD, Avago, and Qualcomm. Interestingly enough, it gave me the opportunity to set straight some of the myths and disinformation about power consumption in emulation, but more on that later. The panel was moderated by Steve Schulz, who at the time was with Si2.

Steve emphasized that power has been an issue for some time now in all types of devices, from mobile gadgets like cell phones, tablets, and wearables, to tethered devices like servers and routers. He discussed how power is limited and the trend is not good across the categories of battery limits, supply limits, and thermal and reliability limits. The reason for putting the panel together was really driven by the need to discuss how to support effective project decision-making with early and accurate power analysis.

The way Steve set up the challenge took me on a brief trip down a personal memory lane. He compared the trends for transistor counts, single-thread performance, frequency, power consumption, and number of cores per chip. I remember that in “The Free Lunch Is Over: A Fundamental Turn Toward Concurrency in Software,” Herb Sutter argued the imminent challenge so successfully that I moved from a low-power synthesis company to a multicore optimization company.

Looking at the trend in this updated chart (see below), Herb Sutter’s predictions were right on. Multicore and the associated software helped address the low-power challenge, allowing transistor counts to increase further while top frequencies and power consumption flattened out.

[Chart] Source: “Data Processing in Exascale-Class Computing Systems,” Chuck Moore, AMD Corporate Fellow and CTO of Technology Group, presented at the 2011 Salishan Conference on High-speed Computing. Original data collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten; dotted-line extrapolations by C. Moore.

Brian Fuller has already written about the actual DesignCon panel, but just to re-emphasize one of the points I advocated during the panel: We are now going from PPA (i.e., power/performance/area tradeoffs) to PPTA (i.e., power/performance/thermal/area tradeoffs). I referred to an article that talked about thermal issues throttling performance. Specifically, if a design runs into thermal issues, the AnTuTu score (a widely adopted performance benchmark) may suddenly drop because the chip’s protective measures kick in. Later, Ed Sperling and I had a conversation about thermal issues in the context of changing reliability definitions as well.

Another point I made was about user-required features that cause specific power consumption: the power ROI, so to speak. We discussed how the continued desire for better graphics, more apps, and better computation stresses the power consumption in mobile devices. This was a good opportunity to set the record straight on another power consumption issue, namely the power required for emulation. The vendors of FPGA-based emulation keep pointing out that Cadence Palladium emulation requires more advanced cooling. This is somehow construed as a design mistake we made. Well, nothing could be further from the truth. The power decisions about the Palladium platform were deliberate. We even have customers telling us that they would happily have Palladium consume more power in exchange for further improvements in the features that power enables.

The analogy I used during the panel was one we all know. When the iPhone came out, it entered a market in which the existing players had arrived at two weeks of operating time on one battery charge. How could the iPhone revolutionize the market with a charge that lasted less than a day? It enabled something unique, something otherwise not available in the market. The same is true for the processor-based emulation capabilities in the Palladium platform. When the requirements were set, fast bring-up time, optimized gate utilization, simulation-like debug, efficient memory-to-gate ratio, and fine granularity with multi-user access were deemed most important.

As a result, FPGA- and processor-based techniques were considered. And by the way, in the whole emulation discussion it is often overlooked that we do have both: the Palladium platform for processor-based emulation and the Protium platform for FPGA-based prototyping. Emulation users expect an automated flow. If it works in simulation, they will call you up if it does not work in emulation. Users of traditional FPGA-based prototyping are used to months of manual modifications to optimize the RTL to run in an FPGA. That’s where the speed comes from. It turns out that, when comparing processor-based emulation and FPGA-based emulation, we see the following benefits today:

  • A gate is not a gate. Period! When using a processor-based emulator such as the Palladium platform, if the spec says it is 256M gates, then a 256M-gate design will fit. But due to the routing utilization in FPGA-based emulation, you have to account for 70% or less utilization.
  • Your design contains memories, right? Memory has always been a sketchy issue, even in big FPGAs. Processor-based emulation is better in terms of memory-to-gate ratio. That can be used to map on-chip memory and to collect debug traces.
  • You need to do advanced debug, right? Thanks to its deeper trace buffers and fast compile speeds above 70M gates per hour (remember, processor-based emulation does not have to route FPGAs), a processor-based emulator needs fewer runs to collect the same debug data as an FPGA-based system. Yes, it may be the crossover SUV of the industry and consume more power, but it won’t stop like an electric car before the job is done.
  • You have more than one user, right? Processor-based emulation offers utilization with fine granularity down to 4M gates for smaller verification tasks, serving up more users than an FPGA-based system.
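The figures in the list above lend themselves to a quick back-of-the-envelope check. The sketch below is only illustrative: the function names are hypothetical, the 70% utilization, 70M gates per hour, and 4M-gate granularity are taken from the bullets, and treating them as exact constants (and using a 256M-gate system as the example) is an assumption.

```python
# Figures quoted in the list above; treating them as exact constants is an assumption.
FPGA_UTILIZATION = 0.70               # "70% or less" usable capacity in FPGA-based emulation
COMPILE_GATES_PER_HOUR = 70_000_000   # "above 70M gates per hour" (processor-based)
GRANULARITY_GATES = 4_000_000         # smallest slice, "down to 4M gates"


def fpga_capacity_needed(design_gates, utilization=FPGA_UTILIZATION):
    """Nominal FPGA gate capacity needed to fit a design at the given utilization."""
    return design_gates / utilization


def compile_hours(design_gates, rate=COMPILE_GATES_PER_HOUR):
    """Compile time in hours at the quoted rate (an upper bound, since the rate is a minimum)."""
    return design_gates / rate


def max_slices(system_gates, granularity=GRANULARITY_GATES):
    """Number of minimum-size slices a system of this capacity can be divided into."""
    return system_gates // granularity


if __name__ == "__main__":
    design = 256_000_000  # the 256M-gate example from the first bullet
    print(f"Nominal FPGA capacity needed: {fpga_capacity_needed(design) / 1e6:.0f}M gates")
    print(f"Compile time at 70M gates/hour: {compile_hours(design):.1f} hours")
    print(f"4M-gate slices in a 256M-gate system: {max_slices(design)}")
```

Under these assumptions, a 256M-gate design needs roughly 366M gates of nominal FPGA capacity, compiles in under four hours on the processor-based system, and a 256M-gate system divides into 64 minimum-size slices. Real numbers depend on the design and the specific system configuration.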

These are just some of the main reasons why users continue to rely on processor-based emulation and why the user base is growing. Yes, the power consumption is higher. Users get real benefits, real returns for it, that they do not get in FPGA-based emulation.

So does this mean that we are against FPGA-based systems? Far from it! With the Palladium-adjacent flow to map designs into our Protium platform, users can get designs that run in emulation brought up in the Protium platform within days, with limited out-of-the-box speeds between 3MHz and 10MHz. If users then want to invest more manual effort, they can get to the same speeds that standard FPGA-based prototyping systems are marketed at: 50MHz and above. It just takes more effort and time.
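To put those clock speeds in perspective, here is a minimal sketch of how long a fixed workload takes at each of them. The workload size of one billion cycles is a hypothetical value chosen for illustration, and the function name is not from any tool; only the 3MHz, 10MHz, and 50MHz figures come from the text.

```python
def run_seconds(cycles, clock_hz):
    """Wall-clock seconds to execute a number of design cycles at a given clock rate."""
    return cycles / clock_hz


WORKLOAD_CYCLES = 1_000_000_000  # hypothetical workload: one billion design cycles

# Clock rates quoted in the text: out-of-the-box range and the hand-optimized target.
SPEEDS_HZ = {
    "3MHz (out of the box, low end)": 3_000_000,
    "10MHz (out of the box, high end)": 10_000_000,
    "50MHz (after manual optimization)": 50_000_000,
}

if __name__ == "__main__":
    for label, hz in SPEEDS_HZ.items():
        print(f"{label}: {run_seconds(WORKLOAD_CYCLES, hz):.0f} s")
```

The tradeoff is bring-up time against run time: the out-of-the-box flow gets a design running within days at the lower speeds, while the extra manual effort only pays off if the workload is long enough for the faster clock to matter.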

So all in all—just like in the iPhone example—users get specific results for larger power consumption. For Palladium, it was a deliberate decision, not some sort of design mistake as the competition portrays it. In the engine continuum of simulation, emulation, and FPGA, users can choose the right engine for the right task, at the required power consumption!