Reaching a sufficient level of confidence that SoC designs will work as intended is becoming much harder.
Verification is becoming much more difficult at 16nm/14nm, driven by the sheer complexity of SoCs, the growing volume of content that has to be verified, and physical effects that now intrude on what used to be exclusively the realm of functional verification.
The questions these changes raise are daunting, and for many engineers rather unnerving. The whole validation, verification and debugging process keeps turning up new bugs as SoCs are rolled out. Some of them can be fixed in software, some can be fixed in the next rev of a chip—often a re-spin of a pre-production chip—but some also make their way into the market, where they can wreak havoc. And it’s no longer just the hardware that has to be verified.
“Ten years ago we were mainly focused on verifying hardware, which made it simpler to fix,” said Harry Foster, chief verification scientist at Mentor Graphics. “From an ASIC perspective, the cost was more constrained. IP verification is still done in a way that’s similar to how ASICs were done years ago, but SoCs have changed everything. You cannot just verify IPs independently. You have to put them together and see how they work. There are behaviors you can’t verify until they’re fully integrated.”
This shift has been gradual, despite the fact that alarms have been sounded by the most advanced chipmakers. Several years ago during a DAC panel, one of the topics of discussion was the growing cost of post-silicon validation. Large chipmakers typically commit to at least one re-spin to see what goes wrong when everything is fully assembled, and then they debug it from there.
Finding bugs earlier
That isn’t exactly the most efficient way of dealing with bugs, though. According to Foster, if a bug isn’t fixed in the stage at which it was created, it costs 10 times more to fix at each successive stage. And post-silicon, it costs another 10 times more to fix.
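As a rough, back-of-the-envelope illustration of that rule of thumb, the short Python sketch below compounds a hypothetical fix cost through a few assumed stages. The stage names and the $1,000 baseline are illustrative assumptions, not Foster’s figures.

```python
# Rough illustration of the "10x per stage" rule of thumb cited above.
# The stage names and the $1,000 baseline are hypothetical, not Mentor's numbers.

BASE_COST = 1_000  # assumed cost to fix a bug in the stage where it was created
STAGES = ["block/IP", "integration", "full-chip verification", "post-silicon"]

cost = BASE_COST
for stage in STAGES:
    print(f"{stage:>24}: ${cost:>12,.0f}")
    cost *= 10  # each escape to the next stage multiplies the fix cost by ~10

# A bug created at the block level but caught only post-silicon ends up
# roughly 1,000 times more expensive to fix than one caught at its source.
```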
In fact, the most efficient approach is to design with fewer bugs in the first place. That may sound unrealistic, but the truth is that methodologies across the industry and throughout the design flow need to be cleaned up and modernized. Many of them don’t account for new IP or physical effects, and verification comes too late in the flow to compensate for that.
“The answer for proper verification is design,” said Frank Schirrmeister, group director for product marketing of the System Development Suite at Cadence. “You need to make sure bugs are not in the design in the first place, and to do that you need a good methodology and tools. The tools are not answers to everything, but they do enable the methodology.”
But how you actually construct your methodology, whether it’s a top-down system-level approach or a bottom-up approach from the block level, is a matter of debate. The Big Three EDA companies all push integrated tool flows, while smaller companies say that approach is not efficient. There are adherents to both approaches, even within individual chipmakers.
Pranav Ashar, CTO at Real Intent, believes the best way to tackle the problem is layer by layer. “There is more complexity, but the complexity has layers to it,” he said. “Verification of the whole SoC can be seen as verification of a number of narrow steps. Complexity means there are more steps, but those steps need to have three things. One is that the specification needs to be precise and automatic and implicit. Second, those steps need to be addressed with static analysis that is viable and meaningful, so you don’t have to put it on a simulator. And third, debug needs to be narrow enough that it can be precise. If you have all of these, then the end effect is positive.”
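To make the idea of a narrow, statically checkable step concrete, here is a minimal Python sketch of a toy clock-domain-crossing check. The netlist representation and the rule are illustrative assumptions, not a description of Real Intent’s products; the point is that the specification (every crossing between clock domains must pass through a synchronizer) is implicit, the check is static, and the resulting debug output is narrow.

```python
# Toy static check in the spirit of the layered approach described above.
# The data model and the rule are illustrative assumptions only.

from dataclasses import dataclass

@dataclass
class Signal:
    name: str
    clock_domain: str
    synchronized: bool = False  # True if the signal passes through a synchronizer


def find_unsafe_crossings(connections):
    """Flag every connection that crosses clock domains without a synchronizer."""
    return [
        (driver.name, receiver_domain)
        for driver, receiver_domain in connections
        if driver.clock_domain != receiver_domain and not driver.synchronized
    ]


if __name__ == "__main__":
    # Each entry: (driving signal, clock domain of the receiving logic)
    connections = [
        (Signal("req", "clk_a"), "clk_b"),                     # unsafe crossing
        (Signal("ack", "clk_b", synchronized=True), "clk_a"),  # properly synchronized
    ]
    for name, domain in find_unsafe_crossings(connections):
        print(f"Unsynchronized crossing: {name} -> domain {domain}")
```

No simulation is involved; the analysis walks the design description directly, and the report points at a specific signal and domain rather than a failing waveform.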
Companies that sell integrated flows—Mentor, Synopsys and Cadence—all believe that emulation/simulation is a critical part of the equation and tend to view verification more from a system perspective. Emulation has bought some time there because it can exercise far more of the design in a given amount of time than simulation can. Yet everyone agrees that the verification problem is getting more difficult to solve and that methodology is critical to solving it.
But will chipmakers really focus on methodology? The answer is that it depends on the market, the chipmaker, and sometimes the various markets served by a single large chipmaker.
“It’s hard to get customers to do better planning up front,” said Michael Sanie, senior director of verification marketing at Synopsys. “The verification architecture is going to be more efficient if it’s part of the design set-up. But what a lot of companies do is re-use it from the previous generation.”
Varying levels of confidence
A key reason behind the disparate approaches by chipmakers is the vertical markets for which they develop chips. While a single bug may be unacceptable in a processor, it might not matter in a consumer electronics device. Frequently bugs can be fixed with software after a product has hit the market, which explains why there are so many updates to operating systems, fewer updates to automotive firmware, and almost none in critical systems.
But all of these markets are undergoing changes because, particularly below 28nm, functionality is no longer determined solely by transistors, memory and the signal path. Heat can disrupt all of them, as can software, IP, noise, electromigration, electrostatic discharge and, at future nodes, the mobility of electrons. “Physical effects are a new wrinkle in verification,” said Schirrmeister.
That appears to be the general consensus in the functional verification world. You can’t ignore physical effects anymore.
“We’re beginning to see chips that may be functionally correct but their physical behavior is different,” said Synopsys’ Sanie. “A lot of this is post-silicon debug. There is no way to functionally know about the problem before silicon, so you have to take something you find post-silicon back. This is where a lot of innovation has to happen because there are no tools to deal with this now.”
Conclusions
All of this leads back to two very critical questions:
1. When is verification done?
2. Will the cost of verification ever go down?
The answer to the first question is never. “We used to call it purgatory after tapeout,” said Mentor’s Foster. “But there are metrics that people use to determine when they’re done. Coverage is the more important metric, and it works well at the IP level. There is work to do on the system level.”
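As a toy illustration of coverage used as a done-ness signal at the IP level, the sketch below compares the coverage bins a test suite has hit against a full coverage model. The bin names and the 95% closure target are assumptions for the example, not figures from Mentor.

```python
# Toy illustration of functional coverage as a "when are we done?" metric.
# The bins and the closure target are hypothetical assumptions.

covered_bins = {"reset", "single_read", "single_write", "burst_read"}  # bins hit so far
all_bins = covered_bins | {"burst_write", "error_response"}            # full coverage model

coverage = len(covered_bins) / len(all_bins)
CLOSURE_TARGET = 0.95

print(f"Functional coverage: {coverage:.0%}")
print("Coverage closure reached" if coverage >= CLOSURE_TARGET else "Keep adding tests")
```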
The answer to the second question is most likely no, the cost will never drop, although there are ways of preventing it from going through the roof. “A lot of engineers are still being deployed to verify, not design,” said Real Intent’s Ashar. “The real measure of benefit for new tools is the ability to repurpose the budget from manpower to tools, where you need fewer people to look at the output of the simulation run, for example. We need to do something, because the number of people is not growing with the rising complexity of SoCs.”
Adds Synopsys’ Sanie: “I don’t believe verification costs will ever go down. The question is how to manage it and become more efficient. That has two aspects to it. One is resources, and more efficient tools will help. The second is a human cost—how much time do you spend on debug and setting up testbenches. That’s a methodology issue plus integration between tools.”