Pushing The Limits Of Hardware-Assisted Verification

Demand for big-iron tools is insatiable as complexity continues to grow.


As semiconductor complexity continues to escalate, so does the reliance on hardware-assisted simulation, emulation, and prototyping.

Since chip design first began, engineers have complained their design goals exceeded the capabilities of the tools. This is especially evident in verification and debug, which continue to dominate the design cycle. Big-iron tooling has enabled design teams to keep pace with growing complexity through a combination of parallelism, more and different processing elements, and the addition of machine learning.

At least for now, there doesn’t seem to be any softening to this trend. More functionality, more transistors, and more potential interactions within a chip and between chips and systems require massive amounts of computation. In some cases, this involves a single chip, but increasingly it involves multiple chips in a package. And pushing that trend even further is the explosion of advanced designs in safety- and mission-critical applications, where customers are demanding chips remain functional for longer lifetimes, often within the context of complex systems and systems of systems.

“Chip designs are getting bigger and bigger,” noted Simon Davidmann, CEO of Imperas Software. “What this means is that RTL simulation and verification are grinding to a halt in the same way as in the early days when people did gate-level simulation. They built accelerators to solve that problem. Then we moved to RTL, which gave some speed up. Now everything’s done to speed up RTL, and it’s all done in emulators. To speed that up, we move up in abstraction and use high-level simulation and performance analysis, or instruction-accurate simulation. Moore’s Law isn’t stopping.”

And when the design is akin to Apple’s M1, with 16 billion transistors, or an AI engine with 1.2 billion transistors, it’s not surprising that simulation and emulation are being used for verification, timing analysis, and architectural analysis. Predominantly, engineering teams employ emulators to speed up RTL simulation, as well as to get software up and running properly. The challenge with chips of these sizes is that more and more big iron is required, and that adds significantly to the design cost.

“There’s always been the need to speed up software simulation, and emulation is one of the answers to that problem using custom processors, using FPGAs. If you don’t have an emulator, you’re probably not serious about doing verification because it’s a tremendous expense,” Davidmann observed.

Hardware-assisted verification spending in Q4 2018 was larger than RTL simulation, according to Siemens EDA. Historically, RTL simulation was a bit less than $140 million per quarter, with a strong compound annual growth rate. More recently, that growth has flattened. In contrast, hardware-assisted verification has been growing at a 14% to 15% CAGR.

Four markets account for the majority of hardware-assisted verification use: traditional networking (wired networking, Ethernet, etc.); wireless communications and WiFi; computing and storage, such as computational storage devices; and transportation (including automotive, buses, taxis, planes, missiles). For all of them, two of the biggest concerns have always been power and performance.

“Power and performance are big challenges because they have to be considered within the context of software, and software performance defines semiconductor success,” said Jean-Marie Brunet, senior director, product management/engineering emulation and prototyping at Siemens EDA. “Twenty years ago, a semiconductor was measured based on how it behaves compared to the hardware functional spec. Today, there is still a spec, but a product marketing guy that sells semiconductors no longer just has a spec. They have to get a performance/power analysis. They have to look at a laundry list of workloads, frameworks, and benchmarks. They have to look at the SoC in the system. And the SoC is no longer just an SoC. It’s an SoC integrated in a system. Or there might be a semiconductor company launching what they call a chip, but in reality it’s a system, because it’s a multi-package SoC in the die. This needs to be verified, and it is becoming very much driven around, ‘I’m running a set of software, and how the software is reacting to this will dictate if the design in the semiconductor implementation is working or not.’ That creates a lot of interesting dynamics.”

Along with the benefits of emulators come challenges. “Emulators are great in the sense that they’re very fast, and you can do closer to real-time running of actual software on the simulated environment,” said Marc Swinnen, semiconductor product marketing director at Ansys. “The problem is this produces a lot of vectors. We’re talking millions and millions of vectors, and then it spews out of the machine at a high speed. The traditional way of handling this is to store all those millions of vectors in a giant database and then start perusing it to find out what’s going on.”

Power analysis plays an increasingly important role in all of this, and emulators have proven to be very good at it. “From a power point of view, there can be a direct stream interface from the emulator to power analysis software, so that as data is being produced, the power analysis tool can analyze the output stream on the fly,” Swinnen said. “This means when the emulator has completed its run, the vectors that are really interesting from a power point of view (such as where’s the peak power, what is typical of the typical power, which regions have different peak powers, which combination of activities create more power) have been identified. Those can then be isolated for further study.”
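The on-the-fly analysis Swinnen describes can be illustrated with a minimal sketch. It assumes the emulator's output has already been reduced to a stream of per-cycle power samples, and that "interesting" means the window with the highest average power. The names `PowerSummary` and `summarize_stream` are hypothetical and do not come from any vendor tool. The point is only that peak and typical power can be tracked in a single pass without storing the whole stream.

```python
from collections import deque
from typing import Iterable, NamedTuple, Optional


class PowerSummary(NamedTuple):
    samples: int                      # number of samples consumed
    mean: float                       # average ("typical") power over the run
    peak_window_start: Optional[int]  # index where the hottest window begins
    peak_window_avg: float            # average power inside that window


def summarize_stream(samples: Iterable[float], window: int) -> PowerSummary:
    """Single-pass summary of a power stream (hypothetical helper).

    Only the last `window` samples are kept in memory, so the input can be
    a generator fed directly by a data source.
    """
    if window <= 0:
        raise ValueError("window must be positive")

    buf = deque()
    window_sum = 0.0
    total = 0.0
    count = 0
    peak_start = None
    peak_avg = float("-inf")

    for i, p in enumerate(samples):
        count += 1
        total += p
        buf.append(p)
        window_sum += p
        if len(buf) > window:
            # Drop the oldest sample so the buffer holds exactly `window` items.
            window_sum -= buf.popleft()
        if len(buf) == window:
            avg = window_sum / window
            if avg > peak_avg:
                peak_avg = avg
                peak_start = i - window + 1

    mean = total / count if count else 0.0
    return PowerSummary(count, mean, peak_start, peak_avg)


# Example: a generator stands in for the live stream.
stream = (p for p in [1.0, 1.0, 5.0, 7.0, 1.0, 1.0])
summary = summarize_stream(stream, window=2)
print(summary)
```

In this sketch, the window starting at index 2 would be the one flagged for further study, while the rest of the stream is never stored.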

What else can it do?
One thing that has changed is that as companies invest in hardware-assisted tools, they are looking for other ways to leverage their investment.

Interestingly, emulation users today are looking for more flexibility in their big iron. “The big users especially are looking for enterprise class emulators, where they don’t just buy one for a project,” said Michael Young, director of product management at Cadence. “They may buy multiple of them and use them as enterprise resources to share between projects. Within projects, there are multiple uses for different purposes. Some are for verification, some are for hardware-software integration, some are for DFT testing, some are for low power, power analysis, or performance analysis. The old days of emulation, where you bought an emulator for one purpose only, are gone. Users are looking for very well polished emulators that are almost like a verification computer because it’s going to be used for many things.”

Fig. 1: Hardware-assisted verification productivity loop. Source: Cadence

On a daily basis, a design/verification team will build a database to run emulation. This must be very efficient. Whether this is an IP with 500,000 to 1 million gates, a subsystem with a few hundred million gates, or a full SoC that may have billions of gates, the DV team needs a way to build that database two or three times a day. Once they build it, they run it, find a bug, make the fixes, then do another build.

The turnaround that users are looking for is two or three builds a day, because they have teams across the globe and want a three-shift mentality, utilizing the machines 24/7. The system has to be able to scale from the tiniest piece of IP to chip-level emulation. In fact, many engineering teams today are doing multi-chip emulation, where two chips are brought together to see what the system-level effect looks like, along with the software.

Also, in the days of sharing at the enterprise level, queueing systems must be set up to prioritize what gets access to those machines, Young said. “Are you a priority job? Or are you low priority, or medium priority? All of that is about the efficiency of the allocation of the job. Then comes the runtime, as well as debug. People buy an emulator not because it looks pretty, not because it runs fast. It needs to be able to have high debug efficiency.”
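As a rough illustration of the prioritization Young describes, the following sketch orders emulation jobs by priority and, within a priority level, by submission order. The class `EmulatorJobQueue` and the three priority labels are assumptions made for this example. Real enterprise schedulers also weigh capacity, fairness, and machine topology, none of which is modeled here.

```python
import heapq
import itertools
from typing import List, Optional, Tuple

# Lower rank is served first. The three levels mirror the quote above.
PRIORITY_RANK = {"high": 0, "medium": 1, "low": 2}


class EmulatorJobQueue:
    """Hypothetical priority queue for jobs sharing one emulator."""

    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        # A monotonically increasing counter keeps FIFO order within a level.
        self._counter = itertools.count()

    def submit(self, name: str, priority: str) -> None:
        rank = PRIORITY_RANK[priority]  # raises KeyError on unknown labels
        heapq.heappush(self._heap, (rank, next(self._counter), name))

    def next_job(self) -> Optional[str]:
        """Return the next job to run, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, name = heapq.heappop(self._heap)
        return name

    def __len__(self) -> int:
        return len(self._heap)


queue = EmulatorJobQueue()
queue.submit("ip_regression", "low")
queue.submit("soc_bringup", "high")
queue.submit("power_run", "medium")
queue.submit("dft_check", "high")

while (job := queue.next_job()) is not None:
    print(job)
```

Both high-priority jobs are dispatched before the medium and low ones, in the order they were submitted.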

New use models
Pushing the envelope across the board in emulation also is giving rise to new use models.

“In the early days of emulation, DV teams knew maybe one or two ways to use an emulator,” Young said. “Today we’re tracking 20-plus different use models. Most teams will probably never do 20 different tasks in a single project, but across the board the thinking is evolving on how to apply emulation machines.”

Others report similar trends. Johannes Stahl, senior director of product marketing at Synopsys, pointed to demand for more application-specific emulators, which are performance-optimized for specific use cases such as power verification or software development. By contrast, general-purpose emulators deliver performance, capacity, debug, and flexibility for a wide variety of use cases.

In one application-specific example, the hardware can be used to exercise the design and the software. Then, power calculation software can take the activity that comes out of the design and quickly calculate the power within a single system.

“There’s so much data coming out of an emulator, with data sizes in the terabytes,” Stahl said. “You cannot do that if you have an isolated piece of software running somewhere else. You need to have the emulator connected very closely to the compute servers that actually calculate the power. Then you actually can turn a huge chunk of data into small pieces, or pieces that can be tackled, and maybe with a few hundred CPUs. That’s the secret we use to get to this short turnaround time.”
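The idea of cutting a large activity dump into pieces that many processors can tackle in parallel can be sketched as follows. This is a toy model rather than a description of any vendor's flow. It assumes activity is a list of per-cycle toggle counts, that energy is simply toggles multiplied by a constant `energy_per_toggle`, and that a local thread pool stands in for the compute servers. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_chunks(data: Sequence[int], chunk_size: int) -> List[Sequence[int]]:
    """Cut the activity data into consecutive pieces of at most chunk_size."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_energy(toggle_counts: Sequence[int], energy_per_toggle: float) -> float:
    """Toy power model (assumption): energy is proportional to toggle count."""
    return sum(toggle_counts) * energy_per_toggle


def total_energy(toggle_counts: Sequence[int], chunk_size: int,
                 workers: int, energy_per_toggle: float = 1.0) -> float:
    """Compute each chunk independently, then combine the partial results."""
    chunks = split_into_chunks(toggle_counts, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(
            lambda chunk: chunk_energy(chunk, energy_per_toggle), chunks))
    return sum(partials)


activity = list(range(10))  # stand-in for emulator activity data
print(len(split_into_chunks(activity, 3)))
print(total_energy(activity, chunk_size=3, workers=4, energy_per_toggle=2.0))
```

Because each chunk is independent, the same split-and-combine structure would apply whether the workers are threads, processes, or separate servers. Only the transport between the emulator and the workers would change.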

Other application-specific implementations include IP validation and networking design verification.

“What this means inside the emulator is how we integrate the application-specific part with the core emulation infrastructure, such as how we communicate, and how we handle large data sizes,” Stahl explained. “It’s all about getting data in and out of the emulator in an efficient way. Then it’s about pre-processing or post-processing the data in the right way to get the best turnaround time for that use case. For networking, it’s how many frames you can get through the system of generating frames, sending them to the emulator to emulate them, getting them back and measuring that they came out correctly. It’s the overall throughput. For power, it’s generating the activity and calculating the power. This is basically the turnaround time that you get on that side. For IP validation, it comes down to how quickly you can complete the regression run for this IP before you do the next RTL drop, or the next software drop that you want to bring on the box.”

Another emulation use model is hybrid, which has become more commonplace over the past few years.

“Emulators are getting bigger because of the capacity they need to deliver, and users are finding that they can do things like hybrid models with software models of the processors to speed the process up,” said Davidmann. “That’s not much good if you’re trying to verify the processor, but if you’re buying an IP from an Arm, for instance, it’s already pre-verified so you don’t need to worry. You’re just using it as a tool, and it’s the software you’re really worried about.”

Given the many considerations in verification, whether it is machine learning, AI, deep learning, ADAS, or high-performance computing, what is common to every application is power and performance. Everything else is secondary, said Siemens EDA’s Brunet.

“To accurately measure power and performance, you can’t be cheating,” he said. “You’ve got to be able to measure it. To measure it, if you step back from this, you need to have the ability to have visibility. If you cannot debug, if you cannot have visibility, then you don’t really measure it. You measure an abstract view of it. So then you must ask, ‘When I give a reference number on the workload, am I accurate?’ It’s important because that’s the way the software will tell you if your semiconductor has value or not. The level of accuracy is not just a high-level standard. It has forced the methodology to be able to accurately extract data. Since you have visibility and you’re running something, if you find a problem, you need to be able to debug because you need to move on with the chip.”

Moving forward, Synopsys’ Stahl expects that general-purpose machines will always exist, but not in isolation. “They will always go through the generational changes with whatever are the compelling events for next generations, and typically it’s no secret in the emulation world that the next generation emulation chip drives the next generation hardware. That will continue to go on. It’s the same on the prototyping side. Finally, we’ll continue to see more of the application-specific emulators that address users’ top concerns.”
