Experts At The Table: Verification At 28nm And Beyond

Last of three parts: Bounding problems; the impact of complexity on verification; unexpected gotchas; the effect of 3D stacking.


By Ed Sperling
Low-Power Engineering sat down to discuss issues in verification at 28nm and beyond with Frank Schirrmeister, director of product marketing for system-level solutions at Synopsys, Ran Avinun, marketing group director at Cadence, Prakash Narain, president and CEO of Real Intent, and Lauro Rizzatti, general manager of EVE-USA. What follows are excerpts of that conversation.

LPE: The big challenge is bounding whatever you can?
Schirrmeister: Exactly. One interesting thing about the application domain is it often determines verification requirements. In the wireless area, if I have to reboot my phone once a day that’s annoying. Someone has made a risk analysis of when it’s annoying enough to buy another phone. There are other areas, like in mil/aero, where the verification has to be much more complete. In mil/aero, automotive and medical, designers are much more open to adopting repeatable and checkable processes beyond just RTL. UML adoption is great in those areas. You have a higher-level model, generate your code, and then you have formal checkers.

LPE: Does the verification process become more complex as we move from 28nm to 22nm and eventually into 3D, or is there just more data we have to deal with?
Narain: Yes, it becomes much more difficult.
Avinun: With functional verification it’s based on complexity and size. What’s new is the number of cores and embedded software you have to integrate. We don’t care if it’s 28nm or 22nm. You can have devices that may be more complex and challenging at 28nm than at 22nm.
Narain: Timing closure and power become very different.
Avinun: But I don’t see a bump from one process node to another. For functional verification it’s design complexity, the number of processor cores and the amount of embedded software.
Schirrmeister: But then you have to verify all the other aspects.
Narain: Yes, it’s timing, power, test modes, timing closure. All of these things become serious issues. If you have a 250-million-gate design, how many machines do people have to run all these processes at that level? The way you sign off is very different. You need to put a methodology in place so that when it’s broken down the pieces are done correctly and the overall process is still correct.

LPE: It sounds like we’re talking about a much broader definition of verification, right?
Narain: Definitely. The chips are failing not just because of functional issues. A bad timing constraint is just as bad as a functional issue in the design. Many times you can redo software. People can mask off functional modes. But how do you solve a clock-domain issue that makes an interface unreliable? That device is dead. It gets less attention because an attempt has been made to make those problems more bounded. But as far as risk factors go, these physical effects are as catastrophic as the functional aspects.
Avinun: At each new node you have more opportunity to get more complexity, more features and more software into your device, so naturally it’s also increasing the functional verification challenges. One of our customers told us that by moving to the next node they got 3x more capacity to lower power. It is easier to implement those features when you have more real estate and more gates. Otherwise it would be more difficult. You will use more real estate because it’s free from a material point of view. But it’s certainly not free from other standpoints.
Schirrmeister: On top of this, all these timing effects have to be verified. One element of making this more bounded is that the components are pre-verified. There are only a couple of IDMs out there. What’s changing is which tasks the foundry user does versus the OEM, and that will change even further. From 28nm to 22nm there will be more need to bound the problem. More IP will have to be pre-verified so the user doesn’t have to worry anymore that the processor core won’t work.
Narain: As an example, there’s an IP with an asynchronous reset. People re-used it somewhere else, but when the noise levels went up the reset got asserted. It became a vulnerability point. These are $1 million bugs. You don’t think this can happen, but methodologies are breaking down.
Schirrmeister: The divide is getting bigger. Gary Smith said verification is going up and going down. It’s going down to the electron spin. At the top, at the software verification level you’re worried about having 12 cores on a design. The divide between these two areas is getting bigger. The software can break. Ptolemy (software environment) stopped working at UC Berkeley when they ran it on a multicore PC because there were a couple of deadlocks that were not properly programmed. Once you introduce this complexity, things break at the software and system level.
Narain: EDA vendors develop software. Software bugs tend to be more forgiving because software can be rewritten. It’s much harder to change things in silicon.
Avinun: But if they need to spend two or three months in the lab, it doesn’t matter. It’s forgiving once you find it, but if you miss a market window that’s not forgiving.
Schirrmeister: The guy who has the unforgiving three-month schedule has to design smarter. You have to switch the fabric. In an FPGA fabric the hardware is forgiving because it’s programmable. That’s partly bounded because it’s all bounded for timing, and you can do a completely new thing where the hardware can be changed.
Rizzatti: It all depends on what kind of software you’re talking about. There are three major classes: the drivers, the operating system and the application software. When you deal with drivers you have less forgiveness. We developed two demos. In the first one we injected an error into the description of the driver. The error shows up in verification as a hardware error. In the other demo we injected an error in the hardware and it showed up as an error in the driver.

LPE: What happens when we go to stacked die? Is it even harder?
Schirrmeister: It’s just another component to make it more complex. You can model that as an interconnect if you need to, and at the base level you need to make sure the electrons take the right path. It’s another step in verification.
Avinun: It’s not so much functional verification. But there is more complexity.
Narain: The physical effects that were in the second and third order that could be ignored can no longer be ignored. They’re now all first-order effects.