Dealing With Deadlocks

Understanding dependencies and IP interactions is becoming essential to SoC design.


Deadlocks are becoming increasingly problematic as designs become more complex and heterogeneous.

Rather than just integrating IP, the challenge is understanding all of the possible interactions and dependencies. That affects the choice of IP, how it is implemented in a design, and how it is verified. And it adds a whole bunch of unknowns into an already complex formula for return on investment.

“IP reuse on paper sounds like the right way to go, but how much of the rework is necessary, and where does it start defeating the whole purpose of IP reuse if I’m actually trying to rework all the intent on the SoC side?” asked Piyush Sancheti, senior director of marketing, verification group at Synopsys. “This is happening more, and it is typically broken into two major classes. One is at the IP level, where the focus is more on intent-level verification. You want to make sure that the IP is verified for all the key functions that it was originally designed for. VIPs play a role in this context because they provide an exhaustive environment for verifying all of the underlying protocols and all the key functions that that IP was designed for,” he said.

At the system level, the focus shifts to verifying whether the IPs play nicely with each other. But end-device behavior isn’t always as well defined as the chip’s, and as complexity grows, so does the risk that something can go very wrong.

“In functional verification the goal of the engineer is to make sure every IP is functionally correct by itself, that it’s doing the right thing in its own universe,” said Rajesh Ramanujam, product marketing manager at NetSpeed Systems. “Every IP uses silicon resources, and sometimes they share silicon resources, but the goal is making sure that every IP is functionally correct. Once you put them together at a system level, however, there could be contention for resources. For example, you have all these parallel universes where we assumed that every IP was working correctly. Once you put them together, they clash. There are shared assets and there is contention of resources, which is where deadlocks come into the picture.”

To get a clearer picture, imagine Processor A is waiting on data from Processor B before it responds, and Processor B is doing the same thing. “Neither is doing anything incorrectly,” Ramanujam explained. “They are both doing the right thing in their own universe. But when they are put together, the engineer did not take into consideration that there could be sharing of resources between them, causing a deadlock. As a result, nothing is moving forward because everybody is waiting on something else and there is a cyclic loop. Unfortunately what happens is that the chips get deployed, and a few years later the whole system goes to its knees.”


Fig. 1: Interconnect with deadlock. Source: NetSpeed Systems
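
The scenario Ramanujam describes, and the cyclic loop in Fig. 1, is the textbook circular wait. As a minimal sketch (the thread and resource names below are illustrative, not drawn from any particular design), two Python threads can reproduce it: each is locally correct, but each grabs one shared resource and then waits for the one the other is holding. A timeout is used only so the demo terminates; real hardware has no such escape hatch and simply hangs.

```python
import threading
import time

# Two shared resources, e.g. two response channels both processors need.
res_a = threading.Lock()
res_b = threading.Lock()

def processor_a():
    with res_a:                                   # A holds resource A...
        time.sleep(0.1)
        # ...and now waits for resource B, which B already holds.
        if res_b.acquire(timeout=2):
            res_b.release()
        else:
            print("Processor A: stuck waiting for resource B (circular wait)")

def processor_b():
    with res_b:                                   # B holds resource B...
        time.sleep(0.1)
        # ...and now waits for resource A, which A already holds.
        if res_a.acquire(timeout=2):
            res_a.release()
        else:
            print("Processor B: stuck waiting for resource A (circular wait)")

ta = threading.Thread(target=processor_a)
tb = threading.Thread(target=processor_b)
ta.start(); tb.start()
ta.join(); tb.join()
```

Each function passes a standalone test with flying colors; only the combination deadlocks, which is exactly why IP-level verification alone does not catch it.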

There are two primary places where this can be addressed effectively. One is at the very beginning of the design cycle. The other is after the various components are already integrated into an SoC.

“The architect needs to carefully think about cases that can cause deadlocks and architect around them,” said Chirag Gandhi, verification manager at ArterisIP. “In terms of verification, I would always recommend running IPs in a system-level environment. You might hit some deadlock cases with random stimulus. In most cases, architects and RTL designers highlight cases they think can potentially result in a deadlock. In those cases, verification engineers can use irritators to artificially fill up resources and run simulations (sometimes very long ones) to hit these deadlock cases.”
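
In a real flow that irritator would live in the SystemVerilog/UVM testbench; the Python sketch below only illustrates the principle, and the credit pool, agent behavior, and numbers are assumptions made for the example. An irritator thread artificially occupies part of a shared credit pool, so two agents that each need two credits end up holding one apiece and starving, surfacing a resource-exhaustion hang that a lightly loaded simulation would never reach.

```python
import threading
import time

CREDITS = 4
pool = threading.Semaphore(CREDITS)    # shared resource, e.g. buffer credits

def agent(name):
    pool.acquire()                     # take the first credit...
    time.sleep(0.1)
    # ...then wait for a second one to complete the transaction.
    if pool.acquire(timeout=2):
        print(f"{name}: transaction completed")
        pool.release()
        pool.release()
    else:
        # Real hardware has no timeout -- the transaction is simply stuck.
        print(f"{name}: holding one credit, starved of the second (deadlock hit)")

def irritator():
    # Artificially soak up credits, modeling other traffic that keeps
    # the shared resource occupied for the whole scenario.
    pool.acquire()
    pool.acquire()
    threading.Event().wait()           # never returns; daemon thread dies with the process

threading.Thread(target=irritator, daemon=True).start()
agents = [threading.Thread(target=agent, args=(n,)) for n in ("Agent A", "Agent B")]
for t in agents:
    t.start()
for t in agents:
    t.join()
```

Without the irritator the four credits are always sufficient and the bug never shows up, which is precisely the point Gandhi makes about needing to fill resources artificially.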

Coverage issues
At its root, a deadlock is a scenario that was never captured in verification coverage.

“If an organization can say, ‘We’re not making progress,’ that’s actually not a bad thing,” said Harry Foster, chief verification scientist at Mentor, a Siemens Business. “The worst thing is when an organization doesn’t have a clue. This is so fundamental. For example, how do we know we’re making progress? We have different metrics in place. The obvious one is coverage, and an organization will look at that, and if they are looking at it correctly, they can ask why they are not closing coverage and whether they should move to a different strategy. If they can’t answer that—if they don’t even know they’re not making progress—they’ve got a problem. Where a lot of organizations have a problem is that they don’t really partition the verification problem correctly.”

One common issue involves a mix of internally developed and third-party IP.

“Where organizations make a mistake is when they attempt to close coverage at too high a level,” said Foster. “That’s where they will run into deadlock. They’re not making progress because it is too hard to close coverage. For example, if I waited until I entirely assembled the chip to try to verify coherency of the design, I’m in trouble.”

Foster said the proper way to go about this is to first verify the IPs and achieve the coverage that’s appropriate at that level, but not to try re-achieving that at a higher level. “Once I do that I assemble them into subsystems, and using coherency as an example, I’m verifying the fabric but I’m not verifying everything—only the fabric. I close only the appropriate coverage in these interactions between the IP. I’m not trying to close coverage down within the IP. Once I get that working, I then go to that next level of integration, which is integrating the fabric with higher-level or other pieces of IP, until I ultimately get up to the pure system. Organizations have problems where they’re not closing coverage at a low enough level. They’re focusing predominantly on, ‘Let’s go to the state where we integrate everything.’ That’s a problem.”

Methodology counts
Much of this can be attributed to growing complexity and many more possible use cases. But some problems can be spotted earlier with a better verification plan and tools flow.

“Proper decisions at this stage will save us from deadlock during the project,” said Zibi Zalewski, general manager for Aldec’s Hardware Division. “It is very important to decide the methodology at an early stage and have all the required resources available when needed. A well-thought-out verification strategy along with an experienced team will help avoid project failure. The proper tools portfolio and verification strategy are absolutely critical for system-level design, and the ability to use the tools in multiple verification modes, interconnectivity with different applications via standard interfaces or data formats, and multi-language support is a must to avoid unexpected verification blockers.”

Others agree. Frank Schirrmeister, senior group director, product management in the System & Verification Group at Cadence, said deadlock must be approached from a number of perspectives, starting with solid engines and debug approaches.

“That’s where you need tooling that can allow you to run everything in lock step, at the right accuracy,” Schirrmeister said. “You need to be able to look at the software running on the different processors, or you need to be able to look at all the different IP blocks. And you need to be able to, in lock step, debug it. You need a solid environment (both hardware and software) for execution with debug around it. Second, you need to be able to write tests and stimulate your design so that you actually trigger those [deadlocks], because you cannot rely on just executing the software or switching it on and waiting for the bug to happen. You actually want to stress the corner cases and develop tests specifically in that domain.”

On top of that, he said that design teams need to consider how these bugs can be avoided in the first place, and they need to use formal tools to root out those that do exist.

Going formal
Pete Hardee, product management director in the System & Verification Group at Cadence, noted that deadlock verification often is carried out with formal verification tools, and that different apps can be used depending on the nature, scope, and design stage of the subsystem being verified. Deadlocks can occur in a range of circuits, from simple finite state machines (FSMs) to complex protocols—in particular, cache coherence protocols.

“Fundamentally, property-based formal verification is a good solution for deadlock verification, because very specific timing of transactions may be necessary to provoke the deadlock condition. This is often difficult to achieve with non-deterministic timing of events in constrained-random simulation, but formal exhaustively exercises every possible sequence of events to expose everything that could possibly happen,” he said. Then, for verifying the implementation of common industry-standard protocols including cache coherence, Hardee said it’s very common to use formal property verification in conjunction with assertion-based verification IPs (ABVIPs) to fully check the design-under-test’s adherence to the protocols.
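
A hedged illustration of what “exhaustively exercises every possible sequence of events” means: the toy Python model checker below enumerates every reachable state of a two-agent, two-resource system (the model and names are invented for the example; commercial formal tools work on RTL properties with SAT/BDD engines, not a hand-written transition function) and flags any non-final state with no outgoing transitions, which is exactly the circular-wait deadlock described earlier.

```python
from collections import deque

# Agent "A" claims r0 then r1; agent "B" claims r1 then r0.
# Each agent releases both resources the moment it finishes.
ORDER = {"A": ("r0", "r1"), "B": ("r1", "r0")}

def successors(state):
    """Yield (action, next_state). state = (progress_A, progress_B, owner_r0, owner_r1)."""
    prog = {"A": state[0], "B": state[1]}
    owner = {"r0": state[2], "r1": state[3]}
    for agent in ("A", "B"):
        p = prog[agent]
        if p == 2:                        # agent already finished
            continue
        want = ORDER[agent][p]
        if owner[want] is not None:       # resource held by the other agent
            continue
        new_owner, new_prog = dict(owner), dict(prog)
        if p == 0:
            new_owner[want] = agent       # grab the first resource
            new_prog[agent] = 1
        else:
            new_prog[agent] = 2           # grab the second, finish, release both
            for r in ORDER[agent]:
                new_owner[r] = None
        yield (f"{agent} takes {want}",
               (new_prog["A"], new_prog["B"], new_owner["r0"], new_owner["r1"]))

init = (0, 0, None, None)
seen, queue = {init: []}, deque([init])
while queue:                              # breadth-first reachability analysis
    s = queue.popleft()
    moves = list(successors(s))
    if not moves and s[:2] != (2, 2):     # no move possible, yet nobody finished: deadlock
        print("Deadlock state:", s, "reached via:", " -> ".join(seen[s]))
    for action, nxt in moves:
        if nxt not in seen:
            seen[nxt] = seen[s] + [action]
            queue.append(nxt)
```

Running it prints the single deadlocked state, where each agent holds one resource and waits for the other, along with the event sequence that reaches it, which is the kind of trace a formal tool would return as a counterexample.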

Sancheti agreed: “Formal verification is becoming a big part of it, for the simple reason that with techniques like formal or structural you can do very dedicated testing of specific interface concerns that you may have. However, formal is not really meant for system-level verification. But when it comes to specific, interface-level testing or verification, you can apply formal techniques. You’re not testing the entire system in that context.”

There are still many situations when traditional verification methods are used. However, Roger Sabbagh, vice president of applications engineering at Oski Technology, said the challenge of verifying the absence of system-level hardware deadlocks is not something that is well addressed by traditional verification methods. “Design teams have relied on SoC RTL simulations and in-circuit emulation (ICE) to verify system-level requirements, such as absence of deadlock. However, due to the complexity of modern designs, the coverage of corner-case scenarios using these methods is predictably low and subtle deadlock bugs may survive the verification process.”

He believes verification of control-oriented problems such as deadlock is a perfect fit for formal verification, because formal can provide complete coverage, equivalent to that achieved by simulating all possible scenarios, leaving no bugs behind. However, he admits formal model checking suffers from the exponential challenge associated with solving PSPACE-complete problems, and that proving the absence of deadlock on the RTL model of an entire system, with its multitude of states, is impractical.

Instead, this could be addressed with architectural formal verification. Sabbagh said this approach leverages formal’s exhaustive analysis capability to explore all corner cases with abstract architectural models to overcome complexity barriers and enable deep analysis of design behavior. “This forms a powerful combination for effective system-level requirements verification useful to target areas not well covered by traditional verification methods, such as deadlock.”

And because the methodology does not rely on RTL model availability, it can be deployed early in the design phase, allowing architectural bugs to be detected and fixed before they are propagated throughout the implemented design. In contrast, fixing late-stage architectural bugs found through full-chip simulation or emulation may require many RTL design changes, since fixing these types of bugs often has a ripple effect that spans many blocks. It is preferable to find these bugs early, avoiding code churn that can result in a significant setback in verification maturity and lead to costly project delays, he said.

ArterisIP’s Gandhi also believes formal verification can be used to tease out architectural deadlocks, although functional verification at a system or multi-block level can be used as well, in conjunction with specific traffic and with the help of irritators, to hit deadlock cases. Running on emulators also can expose deadlocks, because emulation runs much faster and can push far more transactions through the design.

But formal also has its limits. Mentor’s Foster said formal techniques fall apart at a high integration level unless an abstract model of the design is created, and architectural aspects are being proven. “When you get beyond designer-sized blocks, it takes tremendous skills, and it’s beyond the tools at that point because you’ve already encountered state explosion. What is required at that point is formal expertise in how to abstract the design, and there is a bag of tricks that formal verification experts use. Part of the problem in terms of these interactions we’re talking about in a full SoC — the interactions between this IP and that IP — is also hardware and software interactions. There you absolutely need emulation or FPGA prototyping. You need something because you just won’t get enough cycles in simulation.”

Ramanujam added that formal techniques currently on the market are not targeted toward deadlocks. That would require a dependency graph. “With functional bug verification, the formal techniques that are used in functional verification always make sure that the IP is safe against the specification,” he said. “An architect needs to provide the formal tool a specification of how they want the IP to work functionally. The formal technique uses that as the golden benchmark and makes sure that the IP is correct against that mathematically. Similarly, on the deadlock side of things, you want to have a formal technique to verify deadlocks. You need a specification of all the dependencies that could happen in the system, all the resources that could be shared, and the data needs to be presented in a particular format.”
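
One way to picture the dependency specification Ramanujam describes is as a directed graph in which an edge X -> Y means “X can wait on Y”; any cycle in that graph is a candidate deadlock. The sketch below is an assumption-laden illustration (the node names and edges are made up, and this is not NetSpeed’s actual format or algorithm), using a depth-first search to report one such cycle.

```python
# Hypothetical dependency specification: an edge X -> Y means "X can wait on Y".
# The nodes and edges below are illustrative only.
deps = {
    "cpu_req":  ["noc_vc0"],
    "noc_vc0":  ["mem_ctrl"],
    "mem_ctrl": ["noc_vc1"],
    "noc_vc1":  ["cpu_req"],    # closes the loop: a potential deadlock cycle
    "dma_req":  ["noc_vc0"],
}

def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}
    stack = []

    def dfs(node):
        color[node] = GRAY
        stack.append(node)
        for nxt in graph.get(node, []):
            if color.get(nxt, WHITE) == GRAY:        # back edge: cycle found
                return stack[stack.index(nxt):] + [nxt]
            if color.get(nxt, WHITE) == WHITE:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for n in graph:
        if color[n] == WHITE:
            cycle = dfs(n)
            if cycle:
                return cycle
    return None

print(find_cycle(deps))
# -> ['cpu_req', 'noc_vc0', 'mem_ctrl', 'noc_vc1', 'cpu_req']
```

A real interconnect specification would be far larger, encoding channels, message classes, and buffers, but the core check is the same search for cycles.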

Ramanujam stressed this is a preventive measure, not a post-mortem, and should be deployed before the IP is created. “Once you build an IP, how do you make sure deadlocks don’t happen? This doesn’t exist. You have to go back and run multiple simulations from use cases, and stress the IP in multiple different ways to make sure deadlocks don’t happen. The space is huge. This needs to be deployed before the IP has been constructed so that it is free of deadlocks because you have already formally proven all the dependencies.”

Conclusion
The overarching challenge with building systems today has moved from the IP to the integration level, because now the concern is about the interaction between the IP blocks rather than just whether each block is functionally correct.

“Many times the problems that occur, such as the interactions, aren’t well understood,” said Mentor’s Foster. “For example, I can go buy a bunch of IPs and integrate them. In my mind I know what I want, but I didn’t realize that when ‘this’ does ‘this’ it’s going to affect this other IP in this way. It’s a difficult thing.”

Related Stories
Verification’s Breaking Points
Depending on design complexity, memory allocation or a host of other issues, a number of approaches simply can run out of steam.
System Coverage Undefined
What does it mean to have verified a system and how can risk be measured? The industry is still scratching its head.
Portable Stimulus Status Report
The Early Adopter release of the first new language in 20 years is under review as the deadline approaches.


