Complexity is causing huge increases in the number of unknowns in a design, from architectural modeling all the way through to manufacturing.
By Ed Sperling
The number of unknowns is growing in every segment of SoC design, all the way through manufacturing, raising the stakes in the tradeoff between reliability and the compromises necessary to meet market windows.
Tools are available to deal with some of these unknowns, or X’s, but certainly not all of them. Moreover, no single tool can handle all unknowns, some of which can build upon other unknowns. Information about X’s, and the constraints needed to make sure they’re under control, needs to be passed from one phase of a design to the next, and frequently back again.
While this all makes sense on paper, reality is far more complicated than a flow diagram might indicate—and X’s frequently get lost along the way. The more features, the more interactions, the more modes of operation, the more power islands and voltage rails, and the more IP—both commercially and internally developed—the greater the chance of unknown behavior. Add to that engineering change orders, physical effects caused by increasing power density, and the looming threat of quantum effects at 10nm and beyond where current isn’t always released consistently over time, and suddenly this problem begins looking like some mythic, hydra-headed beast.
“There has been a lot written about X propagation and how to solve it,” said Bernard Murphy, chief technology officer at Atrenta. “What’s changed is the shared complexity of the problem. For one thing, we’re compounding uncertainty at each step. On top of that, there’s been a rush to create general-purpose X propagation tools. But those run into the same problem as general-purpose assertion-based verification—most designers don’t have time to do it effectively. The alternative is that you need application-specific solutions, but what applications are applicable? You can’t use a generic solution to X propagation. You need a way to isolate the problems.”
The impact of IP
A new wrinkle in all of this is the growing volume of IP, which generally is bought or implemented as a black-box technology. In the case of commercially developed IP, most of it is extremely well characterized. In the case of internally developed IP, the assumption is that a team is in place that understands how that IP was put together and developed. But that doesn’t always hold true on either count, in part because IP isn’t always used as intended and in part because it’s impossible to fully characterize IP for every possible scenario. In some cases, those scenarios haven’t even been invented yet.
“You’re bound to have unknowns assembling things that you never used before because you don’t know all of their quirks,” said Laurent Moll, CTO at Arteris. “You can map some of this ahead of time with models like TLM 2.0. Or with RTL, before you do the integration work, you do a minimal amount of performance testing. You also can set up a whole-system environment. What more and more people building large-scale devices are recognizing is that the most difficult job is system verification. It used to be unit verification.”
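TLM 2.0 itself is a SystemC/C++ standard, but the practice Moll describes, running a minimal amount of performance testing against an abstract model before any integration work, can be sketched in a few lines. The Python analogue below is purely illustrative; the Interconnect class, the latencies, and the 20-microsecond budget are all invented for the example.

```python
# Schematic analogue of transaction-level performance checking (TLM 2.0 is
# really a SystemC/C++ standard; this Python version only illustrates the idea).
# Per-hop and per-target latencies are guesses, so gross performance problems
# surface before any RTL integration work is done.

class Interconnect:
    def __init__(self, hop_latency_ns, targets):
        self.hop_latency_ns = hop_latency_ns
        self.targets = targets          # target name -> access latency in ns

    def read(self, target):
        # Approximately timed, in the spirit of TLM 2.0's loosely/approximately
        # timed coding styles: return a latency estimate, not cycle accuracy.
        return self.hop_latency_ns + self.targets[target]

noc = Interconnect(hop_latency_ns=5, targets={"dram": 60, "sram": 8})

# Minimal performance test: a DMA-like burst of 256 reads must fit in 20 us.
total_ns = sum(noc.read("dram") for _ in range(256))
assert total_ns <= 20_000, f"budget blown: {total_ns} ns"
print(f"256 DRAM reads: {total_ns} ns")
```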
One strategy, and the basis of the network-on-chip approach, is to push complexity out to the edges of the design. That same strategy will be a central theme inside stacked die when they become popular. While that helps, it still doesn’t resolve all of the issues involving unknowns.
“For each IP you have to analyze the X’s as you analyze the SoC,” said Pranav Ashar, CTO at Real Intent. “X’s can influence the implementation of simulation. It’s hard to get the initialization step right when you don’t fully understand the problem. With RTL simulation you may be weak in understanding all the X’s, so you fall back to gate-level simulation. That errs on the side of more X’s, so it’s pessimistic, and then you have more X’s to debug. Some companies mandate all flops be reset. Others don’t because the designers refuse to do that, so you have longer verification time.”
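The optimism-versus-pessimism tradeoff Ashar describes can be made concrete with a small three-valued logic model. The sketch below is illustrative Python, not any simulator’s actual implementation: a 2-to-1 mux whose select signal comes from an uninitialized flop behaves differently under RTL-style and gate-level-style evaluation.

```python
# Values are 0, 1, or "X" (unknown). A toy model, not a real simulator.

def rtl_mux(sel, a, b):
    # RTL semantics: an unknown select still takes a definite branch
    # (Verilog's `if` treats X as false), silently hiding the unknown.
    return a if sel == 1 else b

def gate_and(p, q):
    if p == 0 or q == 0:
        return 0
    return 1 if (p == 1 and q == 1) else "X"

def gate_or(p, q):
    if p == 1 or q == 1:
        return 1
    return 0 if (p == 0 and q == 0) else "X"

def gate_not(p):
    return {0: 1, 1: 0}.get(p, "X")

def gate_mux(sel, a, b):
    # Gate-level semantics: the X on the select propagates through the
    # AND/OR network of the mux.
    return gate_or(gate_and(sel, a), gate_and(gate_not(sel), b))

sel = "X"                    # e.g., a flop that was never reset
print(rtl_mux(sel, 1, 0))    # -> 0   optimistic: a real unknown is hidden
print(gate_mux(sel, 1, 0))   # -> X   the unknown survives to be debugged
print(gate_mux(sel, 1, 1))   # -> X   pessimistic: real hardware would output 1
```

The last case is the debugging burden Ashar mentions: gate-level simulation reports an X even where the hardware’s answer is well defined, which is exactly why it “errs on the side of more X’s.”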
False positives, false negatives
One issue with unknowns involves false positives, and their darker and far more insidious counterparts, false negatives. Chasing down false positives takes time. Not knowing where the unknowns are is worse, because they may not show up in verification at all, and most engineers cringe at some of the highly public problems that have shown up in the past few years in everything from smartphones to cars. The best that can be hoped for, in that case, is to fix them after sale with software updates.
One tack is simply improving the accuracy of existing tools. The alternative is to take a pessimistic approach and add more margin. But while margin can provide a buffer against unknowns, it also defeats the purpose of moving to the next process node: it decreases performance, increases power requirements, and at 40nm and beyond it adds to the number of physical effects that have to be dealt with and verified.
“One of the problems is that timing has been graph-based,” said Anirudh Devgan, corporate vice president of silicon signoff and verification at Cadence. “The better approach is path-based. You can recover about 3% of the margin with a path-based solution. In the past, the tools were not accurate enough to bound the problem. And when you consider on-chip variation, that only increases the number of unknowns.”
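Devgan’s distinction is easy to see in miniature. In the hypothetical model below (toy numbers, not Cadence’s algorithm), graph-based analysis merges the worst arrival time and the worst slew at a converging node even when they come from different paths; path-based analysis retraces each path with its own slew, and the difference comes back as recovered margin.

```python
def cell_delay(slew, base=1.0, k=0.5):
    # Toy delay model: delay grows with input slew (transition time).
    return base + k * slew

# Two timing arcs converge at cell C: (arrival_time_ns, slew_ns) per input.
arcs = {"from_A": (5.0, 0.2), "from_B": (3.0, 1.0)}

# Graph-based: keep one worst-case arrival AND one worst-case slew at C,
# even though they come from different paths. Pessimistic but cheap.
gba = max(a for a, _ in arcs.values()) + cell_delay(max(s for _, s in arcs.values()))

# Path-based: retrace each path with its own slew, then take the true worst.
pba = max(a + cell_delay(s) for a, s in arcs.values())

print(f"graph-based : {gba:.2f} ns")   # 5.0 + delay(1.0) = 6.50
print(f"path-based  : {pba:.2f} ns")   # max(5.0 + 1.10, 3.0 + 1.50) = 6.10
print(f"recovered   : {gba - pba:.2f} ns of pessimism")
```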
This challenge is particularly evident at the verification and signoff stage, where chipmakers get to see everything that is known to have gone wrong so far. Unknowns can stack up there in a giant logjam and turn the verification and debugging process into a nightmare. Fortunately, most of the companies that have been wrestling with these issues have seen this coming for a while and are at least aware of what needs to be done on the design side.
A second approach is to isolate various sections of the design, through approaches such as behavioral modeling (see related story). What’s different about that strategy is that while it follows the usual divide-and-conquer approach, it requires raising the level of abstraction to get there. Either way, there is more to deal with in the same or less time, and there are interactions that no one has encountered in the past because the number of features and overall density are increasing.
“There are certainly more complexities to deal with,” said Erich Marschner, product marketing manager at Mentor Graphics. “What some users don’t address is things that go beyond the basic checks.”
FinFETs, process variation and other surprises
Unknowns aren’t confined to the design process, either. At the most advanced nodes they have taken on a new dimension, with process variation and unexpected interactions, some involving software.
“For exploration and feasibility you can tune your constraints, and for established nodes that’s not so difficult,” said Mary Ann White, director of product marketing at Synopsys. “But there are good reasons why customers are moving to finFETs, and there are a lot of unknowns there. With real silicon you’ve also got on-chip variation.”
Add to that the sheer complexity of devices that may be plugged into a wall outlet sometimes, reliant on batteries at other times, and chock full of different voltages that may or may not be used depending on the user, the functionality or the location.
“The foundries give us slow-slow and fast-fast, but an SoC has to operate at all of these conditions and more,” White said. “That’s yet another variation of unknowns, and MCMM (multi-corner, multi-mode) is yet another constraint.”
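White’s point about MCMM is essentially combinatorial: every operating mode has to be signed off at every corner, so the number of analyses is the product of the two lists. The sketch below uses invented corner and mode names purely to show how the count multiplies; a real flow would use the foundry’s characterized PVT corners.

```python
# Illustrative only: corner and mode names are made up, not a foundry's set.
from itertools import product

corners = ["ss_0.81V_125C", "ff_0.99V_-40C", "tt_0.90V_25C"]   # process/voltage/temp
modes   = ["functional", "sleep", "scan_shift", "scan_capture"]

runs = list(product(modes, corners))
print(f"{len(modes)} modes x {len(corners)} corners = {len(runs)} signoff analyses")
for mode, corner in runs:
    print(f"  check timing: mode={mode:12s} corner={corner}")
```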
Conclusions
X’s are not new. Unknowns have been dealt with by verification teams for years. What’s changed, though, is the sheer number of unknowns, the potential impact of one unknown on another, and the need to consider them as part of a system rather than in isolation.
Tools, methodology and education all help. But none of them alone, and in some cases not even all of them together, can predict or solve every potential adverse interaction. Complexity in design is spilling over in all directions, and the number of X’s is a direct result of that complexity. At best the problem can be contained; it will never be completely solved.