Hybrid Emulation

Experts at the Table, part 1: Using a single execution engine for verification tasks is quickly becoming the exception as users try to balance performance, accuracy and context.


Semiconductor Engineering sat down to discuss the growing usage of hybrid verification approaches with Frank Schirrmeister, senior group director of product management & marketing for Cadence; Russ Klein, program director for pre-silicon debug products at Mentor, a Siemens Business; Phil Moorby, chief architect for Montana Systems; and Kalpesh Sanghvi, technical sales manager for IP & Platforms at Open-Silicon. What follows are excerpts of that conversation.

SE: We have transformed from an era of single-engine verification into one that requires multiple engines throughout the design flow. Today we are beginning to progress into hybrids of those engines where multiple are used together to solve verification problems. Where are we today in terms of hybrid execution?

Schirrmeister: Each of the big EDA vendors has a large graphic depicting four or five verification engines, depending upon how you count them. On the hardware side you can combine those amongst each other, and then you combine them with the software world. The reason for combining them is to enable software. Formal is less important for this topic, although there are some formal aspects that we could talk about. Simulation can be bucketed into the notion of low-level software, where debug is still applicable. When you are running at a couple of kHz, it is well beyond the cup-of-coffee level of interactivity that a software developer wants. If it doesn’t come back with an answer after I have gone to get coffee, then it is no good. That is how the software developer acts. But even for RTL simulation, we have people bringing in ARM fast models and connecting those to simulation, so you have mixed abstraction both for software and for verification. The two big engines are emulation and FPGA prototypes. First you combine them and sometimes they are combined with real silicon. This happens when you want to re-use things that are already stable, and you can connect those into virtual platforms for the purpose of software bring-up. In an ideal world, you should be able to take a block diagram for the chip, and for each of the modules define if it should be in TLM or RTL. Are we there yet? No, but I have seen users hand-building most of those engine combinations. It is necessary to balance speed with accuracy and time of availability. So fidelity, speed and time of availability are the deciders.

Klein: There is a mix of verification goals. They need to be able to pick the right engine at the right level of abstraction to get the right level of detail to meet the need. This will change over time. Joining virtual prototypes with emulation or FPGA prototyping, or even with traditional simulation, enables you to run your software payloads a lot faster, and they bring with them environments where software folks can come into the mix. Early on they can validate a certain set of activities. There are also folks who want to be able to verify cache coherency, and we can’t use a virtual prototype for that because they usually do not properly model the cache. So then you need to go down to RTL. The ability to mix and match is something the EDA companies have been working on, and even if we have not reached Nirvana, we are able to mix and match engines in ways that enable customers to get the right mix for the task.

Sanghvi: There are more and more system companies that are moving from standard parts to making ASICs, and they don’t want to do everything with the RTL. The key thing is that they want the software to be up and running and they want the software team to be engaged right from the design phase of the ASIC. Most systems companies want a mixed solution where they can run firmware plus use cases, which is the application level. The approach we have taken is to provide virtual prototyping to enable the firmware verification. This is because of the limitation of the interfaces that are available. We can model some of the flash memory and some of the other interfaces using virtual prototyping, which are required for firmware development. Then the FPGA is mainly used for system-level use cases. It would be great to see how they could be combined, because then we could provide one solution that can be used to run both firmware and use cases.

Klein: What does it mean to combine these two? What is lacking today?

Sanghvi: We tried a couple of times using a TLM-based interface to connect the FPGA with the virtual models, and we found issues. This is why we partitioned it so that we can do something in one environment and the rest in the FPGA.

Klein: It is possible to bring together a virtual prototype plus emulation. This is a fairly well-worn path at this point. We are also investing in FPGA prototyping, and we fully expect to be able to take all of the same capabilities from emulation to be able to run with FPGA prototyping. Mixing those engines should be something that the EDA vendors can provide, in terms of the connectivity points between the engines.

Sanghvi: Is that with a TLM-based interface between them?

Klein: The interface is somewhat buried from the user. The user should not need to know about the low-level interface. At a conceptual layer, the user should be able to choose the abstraction level for each block and the interface takes care of the glue between them. Deep under the hood, this is using the Accellera SCE-MI interface. There is a level of infrastructure on top of that, which eliminates some of the low-level details. At the user level, they don’t have to worry about the details, but underlying it all is the standard that we can move to various engines.
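As a rough illustration of the idea of choosing an abstraction level per block and letting the infrastructure supply the glue, the sketch below tags each block of a made-up block diagram as TLM or RTL and lists the connections that cross an abstraction boundary. Those are the places where a transactor would be needed. The block names, the `Abstraction` enum and the `transactor_boundaries` helper are all hypothetical; this is not any vendor's configuration format or API, and it does not model SCE-MI itself.

```python
from dataclasses import dataclass
from enum import Enum


class Abstraction(Enum):
    TLM = "tlm"  # transaction-level model, e.g. inside a virtual prototype
    RTL = "rtl"  # register-transfer level, e.g. on an emulator or FPGA


@dataclass(frozen=True)
class Link:
    """A connection between two blocks in the block diagram."""
    src: str
    dst: str


def transactor_boundaries(levels, links):
    """Return the links whose two endpoints use different abstraction levels.

    Each such link is a point where glue between engines would be required.
    """
    return [link for link in links if levels[link.src] is not levels[link.dst]]


# Hypothetical SoC partition; names and choices are illustrative only.
levels = {
    "cpu": Abstraction.TLM,
    "interconnect": Abstraction.RTL,
    "dma": Abstraction.RTL,
    "flash_ctrl": Abstraction.TLM,
}
links = [
    Link("cpu", "interconnect"),
    Link("interconnect", "dma"),
    Link("interconnect", "flash_ctrl"),
]

boundaries = transactor_boundaries(levels, links)
for link in boundaries:
    print(f"transactor needed: {link.src} <-> {link.dst}")
```

Moving a block from TLM to RTL in this sketch is a one-line change to `levels`; the list of boundaries is recomputed rather than maintained by hand, which is the property Klein describes at the user level.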

SE: The industry has a couple of standards in place – SCE-MI and OSCI TLM. Are these adequate?

Schirrmeister: TLM 2.0 is the glue that holds SystemC pieces together. If they don’t use SystemC and are coming from the bottom up, they sometimes use standard C interfaces. Accelerated verification IP, or transactors, come with different interfaces, including C interfaces and a TLM interface to hook into a SystemC environment. There are some interfaces on the Verilog side, as well. These interfaces allow you to connect to the hardware. Which one you use depends on whether you are coming top down or bottom up. This is what sets the interface preferences. They may also be using UVM to drive verification traffic. That depends on whether it is software or verification data being driven into the system.

Moorby: I have done work around emulators since 1992. I worked with Virtual Machine Works, which was acquired by Ikos, which was then acquired by Mentor. It was a valuable experience that enabled me to learn about that market. Customers were trying to solve a couple of problems. Since then, there has been a lot of merging of different needs and different customers trying to solve different kinds of problems. If we fast forward to today, we have a machine with multiple engines and it is very easy for the whole thing to go out of balance. You will have something already running on the emulator that can go at speed. The emulation market demands the ability to solve system issues including software. They may also be interfaced with the rest of the system that is real hardware, which has to operate at a certain minimum speed. That provides a bound that they have to reach. But if we talk about bringing other components in, such as a module coded in SystemVerilog, they are using UVM, which is simulating on a software simulator, and here the speed is very different. I hear that 80% of the time is spent in the testbench. For decades we have been trying to speed up the simulator, measured in clock cycles per second. At the gate level you are lucky to be above 1 cps. Software simulators, running on a standard platform, such as x86, can get to a reasonable speed, but it is only perhaps 10 to 100 cps. Now, when we put UVM into the mix, and if 80% of the time is spent in that, then you have made things 10 times slower and that is going in the wrong direction. The gap between the software solution and emulation, which typically runs at 100,000 cps or more, is huge and growing. Overall, the software simulator is not getting faster. So what can be done that takes advantage of both of these worlds? One solution is to get away from the standard x86 platform, which has remarkable properties in terms of cost, and running standard C or C++ software.
However, for RTL synthesizable simulation, it is not the best of architectures. Just look at cache miss rates and you will see the issues. So how do we break out of that and get to a processor model, perhaps running on an FPGA, which provides the necessary speed-up compared to software simulation? We can learn from the lessons of emulation and find ways to balance the engines. Why spend millions of dollars on an emulator when you have a component that is left running slow?
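To put the quoted rates side by side, the short calculation below converts clock cycles per second into wall-clock time for a fixed workload. The rates (10 cps, 100 cps and 100,000 cps) are the figures Moorby cites; the workload size of one million cycles is an assumption chosen only to make the arithmetic easy to follow.

```python
def wall_clock_seconds(cycles, cycles_per_second):
    """Time needed to run `cycles` design clock cycles at a given engine rate."""
    return cycles / cycles_per_second


# Assumed workload size, for illustration only.
WORKLOAD_CYCLES = 1_000_000

# Engine rates as quoted in the discussion above (clock cycles per second).
rates = {
    "software simulation (low end)": 10,
    "software simulation (high end)": 100,
    "emulation": 100_000,
}

for name, cps in rates.items():
    secs = wall_clock_seconds(WORKLOAD_CYCLES, cps)
    print(f"{name}: {secs:,.0f} s ({secs / 3600:.2f} h)")
```

At these rates the same million cycles take roughly a day at 10 cps, a few hours at 100 cps, and about ten seconds on the emulator, which is the imbalance being described.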

Klein: As you go closer to the complete system, you tend to find that the testbench component gets lighter and lighter. So when you have the final system running, it is software, it is a clock and reset, and it is real-world stimuli coming from outside. It is no longer a UVM testbench that has to figure out the different corners to put a block of IP into. One of the benefits of bigger systems that include a more realistic environment is that the testbench gets smaller. At the block level you are right – it may be 80% testbench and working on the 20% won’t affect you much.

Schirrmeister: To add to that, at the system level, you need to understand the scope of what you are verifying. For IP, you are right, you have the specific randomized testing, UVM, etc. If you look at the Accellera Portable Stimulus effort, the testbench is no longer a testbench in the classical sense. It is software that resides in the processors of the design, stimulating everything and figuring out how the interconnect works. When we do an acceleration assessment, the first thing we do is split them. The majority of the market is still using ICE where we don’t have a testbench. The actual function is what you do. You run the software, you make the phone call in the context of a virtual environment. The acceleration here is dependent on the time spent in the DUT. Given that a testbench is not synthesizable, you may find that the combination does not make sense to provide DUT acceleration. So they need to be separated. But you also have to consider the types of bugs you are looking for. If you are finding bugs in the blocks when doing full system simulation, then something has gone seriously wrong. You need to figure out the right balance between the engines. I don’t want to use simulation for the wrong purposes, or I shouldn’t go to emulation too early because of the bugs I am finding, and I could be using emulation for other stuff.
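The dependence of acceleration on the time spent in the DUT can be made concrete with Amdahl's law. The sketch below assumes the testbench stays on the software simulator at its original speed while only the DUT portion is accelerated; the 20% DUT share matches the block-level split mentioned above, while the 95% share and the 1,000x DUT speed-up are illustrative assumptions, not measured figures.

```python
def overall_speedup(dut_fraction, dut_speedup):
    """Amdahl's law: total speed-up when only the DUT share of runtime is accelerated.

    dut_fraction: share of the original runtime spent in the DUT (0..1).
    dut_speedup:  factor by which the DUT portion alone gets faster.
    """
    testbench_fraction = 1.0 - dut_fraction
    return 1.0 / (testbench_fraction + dut_fraction / dut_speedup)


# Block level: testbench dominates, so accelerating the DUT barely helps.
block_level = overall_speedup(dut_fraction=0.20, dut_speedup=1000)

# System level (assumed split): the testbench is light, so the DUT speed-up shows through.
system_level = overall_speedup(dut_fraction=0.95, dut_speedup=1000)

print(f"block level:  {block_level:.2f}x")
print(f"system level: {system_level:.2f}x")
```

With 80% of the time in an unaccelerated testbench, the overall gain can never exceed 1.25x no matter how fast the DUT runs, which is why the two are assessed separately before deciding whether acceleration makes sense.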

Moorby: But don’t you see the big gap between simulation and emulation as a problem? If it is a big transition to go between them, it creates two different types of development teams and there is nothing in between.

Schirrmeister: That is no longer the case. We are almost there, and it is a lot easier. It is no longer two different teams. The only area where this remains true is on the FPGA side. In emulation, helped by things such as coding guidelines, we have been able to bring the DUT up in a matter of days or weeks. We have taken care of things like memory models and clocking – the items that in the FPGA world have been difficult in the past. In an FPGA you had to figure out how to synchronize all of the clocks; you are remapping into the lookup tables of an FPGA, which are meant to do something fundamentally different, and that is done manually. You don’t have the same level of memory resources compared to an ASIC, so in the past that did require a different team. That has also been improved lately and we can now take a design for emulation and run it on the FPGA. If you really want to get to 100MHz, then you can invest additional time in remodeling.

Klein: EDA vendors have recognized that people are moving from simulation to emulation to FPGA prototyping and we have done a lot of work to smooth out that transition. Common front ends are being used and people can take the same meta-data across the environments. This is a common theme from all of the vendors. We are not completely there yet, but we have made a lot of progress.

Schirrmeister: The race is on now to get to Nirvana. You have a verification payload and you have to decide where the best place to run it is (simulation, emulation or FPGA) and let the system figure that out based on priority and other work demands.
