Emulation for Power

Attacking the power problem in SoCs today demands heavy metal.


Solving power problems in today’s leading-edge SoCs requires not only the best architectural choices but also advanced tools and techniques to determine the right path to take. In practice, that means combining hardware emulation with power analysis and optimization software.

Design teams today need real-life scenarios to accurately predict the power impact of their architectural decisions, and the EDA R&D machine is churning away to provide them.

“In the context of the continuum of verification engines concept that Jim Hogan has been discussing in the industry lately, bringing up real-life scenarios does not happen completely in emulation,” said Frank Schirrmeister, group director, product management and marketing, system and verification group at Cadence. “It’s really about the different engines in conjunction, meaning a lot of people boot OSes on a virtual prototype first. As a matter of fact, when we do our hybrids [a combination of virtual prototyping and emulation], we typically bring the OS up on the virtual platform first to make sure there are no ‘easy’ bugs. Then, when you get into the timing-critical things, where the pieces you want to be in emulation count, and you do that analysis, you bring in the emulator.”

For ‘real’ workloads, meaning those that touch on system validation, the OS should be booted and the design intent then validated to confirm that everything works as expected and as specified. “That difference between verification and validation is quite important, and a lot of people actually do that on FPGA as well. So there is a continuum of engines, all of them doing OS boot. In emulation, we definitely have users doing this,” Schirrmeister said.

In one example, Nvidia, unbeknownst to Cadence, made a video about its Maxwell platform that discusses the use of emulation.

Another interesting use case involves running AnTuTu benchmarks, Schirrmeister said.

Outstanding challenges
Evidence of designs overheating and ultimately failing continues to mount, so new approaches are entering the market. Mentor Graphics has thrown its hat into the ring following an R&D collaboration with Ansys/Apache.

Jean-Marie Brunet, director of marketing in the emulation division of Mentor Graphics, said that extensive conversations with very large customers point to a clear shift in how power is being approached.

“If you think about a tablet or phone, there is roughly the same bill of materials, roughly the same application processor — but the use cases are slightly different depending on the apps used,” Brunet said. “The problem you have is that while the architectures and designs are roughly the same, the usage is hard to predict. If you look at what they do, they are booting an OS, and they run a live application. To be able to predict accurately the amount of power that those designs are going to consume, you need to put them under a testbench or a system that is a fair representation of how they are going to be used. Most of the time, [users], when they run power, take a functional testbench that they adapt and run their power analysis tool based on this. That’s what we call the wrong way. Basically, you have a very small number of cycles that are run and you have no idea if it’s an accurate representation of what’s happening.”

Mentor compared a functional testbench, which is what is typically used for power analysis, to running a live application on an emulator. “What is very interesting throughout this exercise is the testbench gives you a false power peak,” he said. “What we capture through this activity is an activity plot. We are able to capture the activity of every single net within the design. Now imagine capturing every single net activity through a large SoC design, through hundreds of millions more cycles, which is a lot of data. If you run through this and look at that activity now, compared with the testbench, you’ll see that the testbench is a very small amount of cycles because it’s usually based on simulation and the peak power is not the same. The old methodology gives you a false power peak. It will give you a wrong impression about what the power peak is going to be.”

He added that simulation doesn’t work here because of the sheer number of cycles involved. An FPGA prototype could be used, but the limited visibility into internal signals that comes with the FPGA structure is a constraint.
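The false-peak effect Brunet describes comes down to which cycles are observed. The toy model below is an illustration only, not Mentor’s or Ansys/Apache’s power model. It assumes every toggle costs the same energy, 0.5·C·V², and uses hypothetical values for C, V and clock frequency. It converts per-cycle toggle counts into average power per window and reports the peak and mean windows.

```python
from typing import List, Sequence, Tuple

# Hypothetical constants for illustration only; not taken from any vendor flow.
CAP_PER_NET_F = 1e-15  # assumed 1 fF switched per toggle, identical for every net
VDD_V = 0.9            # assumed supply voltage in volts
CLOCK_HZ = 1e9         # assumed 1 GHz clock


def window_power(toggles_per_cycle: Sequence[int], window: int) -> List[float]:
    """Average dynamic power (W) in each full, non-overlapping window of cycles.

    Energy per toggle is approximated as 0.5 * C * V^2. A trailing partial
    window is ignored.
    """
    energy_per_toggle_j = 0.5 * CAP_PER_NET_F * VDD_V ** 2
    window_seconds = window / CLOCK_HZ
    powers = []
    for start in range(0, len(toggles_per_cycle) - window + 1, window):
        toggles = sum(toggles_per_cycle[start:start + window])
        powers.append(toggles * energy_per_toggle_j / window_seconds)
    return powers


def peak_and_average(toggles_per_cycle: Sequence[int], window: int) -> Tuple[float, float]:
    """Return (peak window power, mean window power) in watts."""
    powers = window_power(toggles_per_cycle, window)
    if not powers:
        raise ValueError("trace is shorter than one window")
    return max(powers), sum(powers) / len(powers)
```

Running `peak_and_average` on a short slice of a trace and then on the full trace shows how strongly the reported peak depends on how many cycles were captured, and which ones.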

Once the emulator has produced the activity data, how is the power number computed? Because Mentor Graphics does not have its own signoff power analysis tool, the company formed a partnership with Ansys/Apache.

Vic Kulkarni, senior vice president and general manager of the RTL power business at Ansys/Apache, noted that accuracy is important, and that it takes more than having a flow between emulation and RTL power analysis, which many teams may or may not have depending on how they work. Accurate power analysis also requires time-based access to the activity data, and mismatches in that handoff hurt performance. “Many people base it on internal formats for how the handshake happens,” he said. “Between the emulation world there could [be] compromise of certain critical signals, could be compromise of what constitutes average power versus peak power. You might miss those.”

This is becoming a hot topic of debate in the emulation world, where all of the Big Three EDA vendors are heavily pushing hardware acceleration, both because selling full systems is profitable and because of the monumental amount of data that needs to be processed. The race is now on to find more efficient ways to use that data.

“Think about a full SoC for a very large customer: hundreds of millions of cycles, and for every clock cycle, every net, I’ll have the activity,” Brunet said. “The data you dump in this case is unbelievable. For example, for a 200 million-gate design, if you run 10 million cycles, that file will be several hundred gigabytes. Then, if you boot the OS and run the live application, we need to talk about 300 million cycles, minimum. Just the economy of scale makes that system extremely difficult, because we need time to write that file, and the recipient on the other side, which is the power analysis tool, needs to be able to read that file efficiently. That methodology was really broken.”
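Brunet’s figures support a quick back-of-envelope estimate. The sketch makes three assumptions, none of which come from Mentor: dump size grows linearly with cycle count for a fixed design and net set, 300 GB stands in for “several hundred gigabytes,” and storage sustains a write rate of 1 GB/s.

```python
def scaled_dump_gb(ref_size_gb: float, ref_cycles: int, target_cycles: int) -> float:
    """Scale a reference dump size linearly with cycle count.

    Assumes the same design and the same set of recorded nets, so the
    number of bytes per cycle stays constant.
    """
    return ref_size_gb * target_cycles / ref_cycles


def hours_to_write(size_gb: float, throughput_gb_per_s: float) -> float:
    """Time in hours to stream size_gb at a sustained throughput."""
    return size_gb / throughput_gb_per_s / 3600


# Assumed stand-in for "several hundred gigabytes" at 10 million cycles.
REF_GB = 300.0
os_boot_gb = scaled_dump_gb(REF_GB, 10_000_000, 300_000_000)
print(f"{os_boot_gb:,.0f} GB")                                # 9,000 GB
print(f"{hours_to_write(os_boot_gb, 1.0):.1f} h at 1 GB/s")  # 2.5 h
```

Under these assumptions, the 300-million-cycle OS-boot run is 30 times the 10-million-cycle case. That is roughly 9 TB to write and then read back, before any power computation starts.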

Mentor’s solution in this case is a dynamic read waveform API, intended to let an ecosystem of tools external to either company read waveform data without exchanging massive files. The eventual goal is to offer this as an industry standard, though Synopsys and Cadence have different ideas.
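The article does not describe the interface of Mentor’s API, so what follows is only a hypothetical illustration of the general idea. Instead of writing one giant dump, the emulator side hands activity to the power tool one window at a time, and the consumer keeps only running totals. `ActivityWindow`, `read_activity` and `summarize` are invented names, not part of any Mentor or Ansys/Apache product.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Tuple


@dataclass(frozen=True)
class ActivityWindow:
    """One chunk of activity passed from the emulator side to a power tool (hypothetical)."""
    start_cycle: int
    num_cycles: int
    toggles: Dict[str, int]  # net name -> toggle count within this window


def read_activity(raw_windows: Iterable[Dict[str, int]], window: int) -> Iterator[ActivityWindow]:
    """Yield windows lazily, one at a time, instead of materializing a full dump file."""
    for index, toggles in enumerate(raw_windows):
        yield ActivityWindow(start_cycle=index * window, num_cycles=window, toggles=toggles)


def summarize(stream: Iterable[ActivityWindow]) -> Tuple[int, int, int]:
    """Consume the stream while keeping only running totals.

    Returns (total toggles, toggles in the busiest window, start cycle of that
    window). The last two values are -1 if the stream is empty.
    """
    total, peak, peak_start = 0, -1, -1
    for w in stream:
        count = sum(w.toggles.values())
        total += count
        if count > peak:
            peak, peak_start = count, w.start_cycle
    return total, peak, peak_start


# Usage with made-up net names and toggle counts.
raw = [{"cpu.alu": 3, "gpu.sm0": 1}, {"cpu.alu": 10, "gpu.sm0": 2}, {"cpu.alu": 0, "gpu.sm0": 4}]
print(summarize(read_activity(raw, window=1000)))  # (20, 12, 1000)
```

Because `read_activity` is a generator, no window is built until the consumer asks for it. In a real flow the iterable would be backed by live emulator data rather than an in-memory list.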

Where power analysis is concerned, Schirrmeister said Cadence has been doing this for some time, providing solutions for power verification as well as dynamic power analysis (DPA). “Power verification is the notion of supporting UPF and CPF, and switching on and off the different regions in a design, whereas DPA is doing power in the context of software. It’s a use model that becomes more important because for some of the effects you need to run really long workloads, and that’s where emulation fits in. You also need to create and collect lots of data because you essentially collect your toggle data for the activity data in the design.”
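Collecting toggle data, as Schirrmeister describes for DPA, comes down to counting how often each net changes value. The snippet is a generic illustration with made-up net names and values, not a description of how Cadence’s tools work. It counts transitions between consecutive sampled cycles, then divides by the number of cycle-to-cycle steps to get a per-net activity factor. That activity term is what common dynamic-power estimates multiply by capacitance, voltage squared and frequency.

```python
from typing import Dict, List


def toggle_counts(samples: Dict[str, List[int]]) -> Dict[str, int]:
    """Count value changes (0->1 and 1->0) per net across consecutive sampled cycles."""
    return {
        net: sum(1 for prev, cur in zip(values, values[1:]) if prev != cur)
        for net, values in samples.items()
    }


def activity_factor(samples: Dict[str, List[int]]) -> Dict[str, float]:
    """Toggles per cycle-to-cycle step for each net; nets with fewer than two samples get 0.0."""
    counts = toggle_counts(samples)
    return {
        net: counts[net] / (len(values) - 1) if len(values) > 1 else 0.0
        for net, values in samples.items()
    }


# Hypothetical net names and per-cycle values.
samples = {
    "clk_gate_en": [0, 1, 1, 0, 1],
    "bus_data0": [0, 0, 0, 0, 0],
}
print(toggle_counts(samples))    # {'clk_gate_en': 3, 'bus_data0': 0}
print(activity_factor(samples))  # {'clk_gate_en': 0.75, 'bus_data0': 0.0}
```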

Behind the scenes, a group within Cadence is developing an RTL power analysis tool called Joules. According to the Cadence website, version 1.0 of Joules is currently in beta with several high-profile customers. Follow-up versions are set to add power linting and optimization capabilities, along with integration with Cadence synthesis, simulation and emulation products for a smooth RTL-to-GDS power flow that takes power/timing trade-offs into account.

With power heating up the emulation space, SoC design teams only stand to benefit from additional options for analyzing, managing, planning for and optimizing power in their designs.

Synopsys is also a player in the emulation space but declined to comment.