Experts At The Table: SoC Verification

Second of three parts: Trust IP, but still verify it; what can go wrong; the danger of bugs in even non-critical IP; abstractions and use cases.


By Ed Sperling
System-Level Design sat down to discuss the challenges of verification with Frank Schirrmeister, group director for product marketing of the System Development Suite at Cadence; Charles Janac, chairman and CEO of Arteris; Venkat Iyer, CTO of Uniquify; and Adnan Hamid, CEO of Breker Verification Systems. What follows are excerpts of that discussion.

SLD: How important is it to verify all third-party IP?
Iyer: I’ve used IP from a bunch of people. Part of this involves whether you’ve done your homework and understand what you’re buying.
Janac: But you have to test the module because you’ve configured that.
Iyer: Correct, but when you’re translating from one protocol to another, it doesn’t matter to us. At some point I have to trust that you, Arteris, have done your homework. For me at the SoC level, I’m more worried about the fabric and whether it works. Performance is one aspect, and connectivity is a huge problem. We’re struggling with whether the IP is talking to the other IP properly, not whether it’s doing its job.
Janac: We have a tool that will verify switches and the traffic pattern that was put in place from the VIPs is correct.
Schirrmeister: That’s the communication piece of it. But to be functionally correct you do have to verify the module.
Janac: That’s why you need to trust the IP guy.
Hamid: You have to trust the IP guy and the fabric guy, and then you have to trust that each guy is talking to everyone else in the same language. We have an example of a perfect IP and a perfect fabric, but one talks Big Endian and the other talks Little Endian. No one caught this until the system-level test.
Janac: You do a conversion of Little Endian to Big Endian in the network.
Hamid: You could, if you knew ahead of time this would be a problem.
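The endianness mismatch described above can be illustrated with a minimal Python sketch. The function names and the 32-bit word width are assumptions made for illustration, not anything the panelists specified: each side is internally consistent, yet the value changes in transit unless a byte-swap stage sits between them.

```python
import struct

def ip_write_word(value: int) -> bytes:
    """Hypothetical IP block: emits a 32-bit word in big-endian byte order."""
    return struct.pack(">I", value)

def fabric_read_word(raw: bytes) -> int:
    """Hypothetical fabric port: interprets the same 4 bytes as little-endian."""
    return struct.unpack("<I", raw)[0]

def fabric_read_word_swapped(raw: bytes) -> int:
    """Same port with a byte-swap conversion stage placed in front of it."""
    return struct.unpack("<I", raw[::-1])[0]

sent = 0x11223344
received = fabric_read_word(ip_write_word(sent))       # mismatch: bytes reversed
fixed = fabric_read_word_swapped(ip_write_word(sent))  # conversion restores value

print(hex(sent), hex(received), hex(fixed))
```

Both blocks would pass their own unit tests here; only a check that compares what was sent with what was received across the interface exposes the problem.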
Schirrmeister: It’s interesting that the fabric is becoming a buffer to take care of that. But still, the fundamental premise is true that you have to trust the individual components. And equally important, you have to specify the scenarios under which they are supposed to work. If I have five components and I want them to work together, someone has to specify that this system needs to work under the following conditions. And that person comes up with 20 scenarios that he needs to support, and then you verify all of those against that spec. What becomes increasingly difficult is defining the right performance, which is done with the VIP. From the side of the subsystem providers, the challenge is defining the requirement under which the subsystem or IP will actually work. It’s the conditions that it’s supposed to work in. You need to make sure the environment gives you the right data at the right time and consumes the right data at the right time.

SLD: So does verification work across an IP ecosystem? The challenge is whether they’ve all been verified using the same tools, or same kinds of tools.
Hamid: We can do plug-and-play verification checks. The first step is to define how you flow data through various IPs. You can’t just put down two pieces of IP and expect them to work well together. You have to test interactions concurrently and you have to imagine multiple processors and multiple memories. We believe AMBA works as a standard, but there is no answer about how to plug and play the testing.
The software guy looks at the world completely differently than the IP guy and the system guy. Arguing that they can play in the same sandbox may be wrong. They have different languages and different requirements. The only thing in common between all of those players is they all agree on what the system is supposed to do.
Schirrmeister: What do you mean they all agree? IP vendors don’t always know how their IP will be used.
Hamid: If you look at a video decoder, its role is to decode a frame. That is agreed upon by everyone. The knowledge of how to get a DMA to move data is understood. At the IP level we have to translate transactions for UVM. At the integration level you have to turn that into a self-verifying C program that first has the DMA move the data and then has the video decoder decode it. We now have the scenario. At the software level we abstract it again and ask for some drivers to prove the same thing works. And then we have to do it all concurrently, because our job is to prove that when you’re watching a video on your cell phone and a call comes in, the screen won’t tear.
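The integration-level scenario described above (the DMA moves the data, the decoder decodes it, and the test checks its own result) can be sketched roughly as follows. Hamid refers to a C program; this illustration uses Python for brevity, and the memory model, the `dma_copy` and `decode_frame` helpers, and the XOR "decoder" are all hypothetical stand-ins rather than any real tool's output.

```python
def dma_copy(memory: bytearray, src: int, dst: int, length: int) -> None:
    """Hypothetical DMA model: copy `length` bytes from src to dst in memory."""
    memory[dst:dst + length] = memory[src:src + length]

def decode_frame(memory: bytearray, addr: int, length: int) -> bytes:
    """Stand-in decoder: inverts each byte (a real decoder is far more complex)."""
    return bytes(b ^ 0xFF for b in memory[addr:addr + length])

def run_scenario() -> bool:
    """Self-checking scenario: the DMA moves the frame, then the decoder runs."""
    memory = bytearray(64)
    encoded = bytes(range(8))
    memory[0:8] = encoded                          # place the encoded frame at 0
    dma_copy(memory, src=0, dst=32, length=8)      # step 1: DMA moves the data
    decoded = decode_frame(memory, 32, 8)          # step 2: decoder reads it
    expected = bytes(b ^ 0xFF for b in encoded)    # reference result
    return decoded == expected                     # the test verifies itself

print("PASS" if run_scenario() else "FAIL")
```

The point of the sketch is that the test carries its own expected result, so the same scenario can in principle be rerun on whatever engine executes it.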
Schirrmeister: That’s like looking into the same house through different windows that all have different colors. But they’re all looking at the same thing. Underneath you have engines executing it all. The software guy looks at the hardware executed in simulation or emulation or an FPGA with his software debugger. The hardware guy always looks at it below the register.
Hamid: With the same RTL I can get very different answers. In verification, I can get different answers for what the hardware guys, the software guys and the system guys want. Why not come up with an abstracted view that defines the scenarios, understand what needs to be done, and then go to test generation that can target IPs, systems, software, simulation, emulation and post-silicon?
Iyer: We do some of that. Our tests run inside simulators and emulators and in FPGAs. When a chip comes back, even without any software, we can prove where everything is going.
Hamid: And you can figure out the different constraints and optimize for different objectives.
Iyer: Yes, but the other important thing is that the levels you verify to are radically different. If you’re going into OMAP the areas you’re trying to verify are going to be a lot harder to define because you have to know what the customer is going to do with the part. If you put it into a cable modem, your data flow is well defined. In that sense, the verification requirements are a lot easier to define.
Schirrmeister: The higher up you get, the smaller the number of scenarios. There may only be seven general-use models for a phone, but how that falls out as you do the divide and conquer becomes very complex very fast. The number of rational data points is limited, but it folds out.
Hamid: You need use cases for what you are trying to do.
Schirrmeister: That’s where automation kicks in. If you can generate the lower piece from the higher piece, then it’s valuable.
Janac: You have to trust people and then you verify. If you can’t trust them you cannot verify enough. One thing that has always mystified me about the IP industry is how concentrated it is. Every category has one major player, and then lots of small players, except in the memory interface area. It’s a trust issue. You figure out you can trust ARM, Arteris, Synopsys, Cadence analog, and then you stick with them and make sure they have a unified verification methodology within their IP portfolio. Otherwise the trust issue becomes unmanageable.
Schirrmeister: You also put in checks and balances. As a customer, you take the competition’s VIP because you don’t want to be sitting in the same bathwater. Those checks and balances are like the analogy of trust and verify.

SLD: Have we gotten to the point where the time spent in verification will rise to reach a sufficient confidence level?
Janac: For the chip-level guys, it has to go up slightly. For the ecosystem, there needs to be a huge increase. There are lots of pieces coming in, and each piece has to increase its verification slightly. The chip verification, the SoC verification gets bigger and more complex, but in the ecosystem it gets huge.
Hamid: The integration guys have to tape out. They don’t get the opportunity to wait. They know they’re doing a quarter to 10% of the verification they want to do, and where they get hit is in the validation when the chip comes back. The software guys take the hit because the bugs in the chip take a very long time to find and debug—or worse, the chipmaker’s customer finds it, which takes even longer to debug.
Janac: The problem is there is some mission-critical stuff. With an LTE modem you don’t want to have a hardware bug because your communication to the phone stops. There is a cache coherency subsystem. If you have any bugs at all, it brings the whole thing down. So there are some really critical pieces where you can’t do 15% of the verification because you’ll get burned.
Hamid: If it’s not mission-critical and it doesn’t work, it still may bring down the system, and by the time you debug you lose your market.
Janac: What you’re saying is you want the hardware and software verification to cooperate.
Hamid: Or you take the knowledge of what’s been tested on the DMA and be able to move it all the way up the stack.
Janac: But you also want it to go down the stack.
Schirrmeister: That’s the more important point. It has to go down. You can still express it in an abstract way, but then you need to fold it out to the 50,000 or so cases. There is no one-size-fits-all solution. You need to attack it from all angles. At the bottom, the IP pieces need to be cleaner. That’s a given. Then there are never enough cycles to execute verification. That’s why we’re working on different or smarter ways to execute. Even though formal is great, you’ll never rely on that alone because you want to see the thing do something. That’s why the hardware-assisted verification markets are so popular. And then from the top of that, it’s more and more difficult to define the cases in which the device is supposed to work. And then you have to do something virtual in RTL, you put it together, and then you have the chip. Everything has to be overlaid.
Hamid: We need vertical re-use from IP to system, and we need horizontal re-use from horizontal models to simulation, emulation and post-silicon.