Deadlocks are more prevalent than anyone wants to admit, so how do we deal with them?
I am sure there is an anonymous group – like Alcoholics Anonymous – headquartered in Silicon Valley, meeting every quarter to discuss the deadlocks that have paralyzed their products, roadmap and deployments. In discreet venues in every town, small groups of engineers huddle together to share war stories about the disgruntled customers whose trust was lost because of a deadlock discovered only after the product was shipped. Deadlocks are more prevalent than anyone wants to admit, but there is a reason why they do not surface to the top of the news. It’s not the kind of thing anyone wants to advertise, of course, and it’s not a problem that is easy to find or fix. If you have been bitten by this deadly issue, then speak out, and read along, and save yourself from being prey to this in the future.
What are you doing?
Do you really want to be part of this anonymous group? Perhaps the real question is, what are you doing to not be part of that group? Hopefully, your answer is not to just wait it out and see how the silicon behaves. And even if it does behave well in post-silicon validation, that doesn’t guarantee it won’t lock up in the field with your most prestigious flagship customer. Ouch! Basically, you can never sleep peacefully. So now you also need to add the weekly Insomniac Anonymous meeting to your calendar.
What is a deadlock?
Let me share an example of a deadlock from my personal life. I have a 7-year-old daughter and I hope that, a few years from now, she does not come back home and say, “Dad, this is my boyfriend, and I plan to spend the rest of my life with him.” I can’t live without her, she can’t live without him, and I can’t live with him. So, what do we have here? A deadlock. But I know better than to just sit back and hope. So, I fill up her head with all kinds of information about how horrible boys can be and tell her to stay away from them. My hope is that by preparing her and preventing the situation, I can avoid the problem and not have to deal with it later. But since this is real life, I also have a baseball bat behind the front door, just in case.
Kidding aside, when it comes to deadlocks, heed the words of Albert Einstein: “Intellectuals solve problems; geniuses prevent them.” Let’s look at some of the causes and cures for deadlocks.
On-chip Interconnects play a key role
With today’s complex SoCs, the interconnect plays a key role in connecting various IPs that have a multitude of information flowing among them. All this data varies in size and bandwidth and, to complicate things, there is a high level of system dependency between the various traffic flows. The dependencies can be introduced at different layers, including transport, routing, protocol, and system-level dependencies.
To put it simply, any kind of resource dependency can cause a deadlock and the on-chip interconnect plays a vital role since it connects all the IPs together.
Getting to the bottom (debugging)
Why do I refer to it as deadly? Because that is what it is. Root cause analysis of a deadlock takes an enormous amount of time, and you’d be lucky to actually find the source of the issue. The number of man-hours spent on debugging a hung system is just plain madness. It’s like shooting arrows in the dark. As a solution provider, one needs to be on top of any issue that might be reported by a customer and be able to respond promptly with a solution or workaround. Neither the chip maker nor the customer wants to spend weeks or months trying to figure out what the issue is in the first place. It is not a pleasant experience!
Old school methods
The traditional methodology of writing directed and constrained random testcases to cover these scenarios is not scalable anymore. More to the point, this approach does not guarantee that you have swept through the entire space, especially when you are catering to the needs of multiple customers in different application and market segments. Customers tend to be highly secretive about what they consider their “secret sauce,” which they do not want to share with vendors. And understandably so. Then how can we ensure we find and solve these problems? And is this even the right approach?
Why formal techniques?
If you think about functional verification, directed testcases have given way to random and constrained random verification methodologies. And now we have formal techniques to prove functional correctness against a specification. Formal techniques have come into this space because they provide a mathematical proof of correctness instead of relying on scenarios made up by engineers. And the industry has accepted them with open arms.
Deadlock verification and avoidance is seeing the exact same approach to make it scalable and, more importantly, correct for ALL cases. Formal techniques for deadlock avoidance are here to stay and NetSpeed has pioneered this approach.
The new age
The beauty of using mathematical techniques to formally prove that interconnects do not have deadlocks is that the problem is addressed during build time rather than late in the development process during testing. Hence the designs are correct by construction, systematically, by using the methods below:
Every dependency is captured and every resource is tracked in a Resource Allocation Graph (RAG). Using formal techniques, we prove that deadlocks cannot occur in the system. So, no more writing unending pathological test cases hoping to cover the various possibilities statically and temporally. Customers have access to this RAG, which they can then stuff inside their pillow and sleep peacefully.
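To make the idea concrete, here is a minimal sketch of the classic check behind a resource dependency graph: if "A waits on B" edges form a cycle, a circular wait (and hence a deadlock) is possible. This is only an illustration, not NetSpeed's actual RAG format or proof engine; the function name `find_cycle` and the channel names `ch0` to `ch2` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical dependency graph: an edge A -> B means "A waits on B".
Graph = Dict[str, List[str]]

def find_cycle(graph: Graph) -> Optional[List[str]]:
    """Return one dependency cycle as a list of nodes, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    colour = {n: WHITE for n in nodes}
    stack: List[str] = []  # nodes on the current depth-first path

    def visit(node: str) -> Optional[List[str]]:
        colour[node] = GREY
        stack.append(node)
        for nxt in graph.get(node, []):
            if colour[nxt] == GREY:
                # Back edge to a node on the current path: circular wait.
                return stack[stack.index(nxt):] + [nxt]
            if colour[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle is not None:
                    return cycle
        stack.pop()
        colour[node] = BLACK
        return None

    for node in sorted(nodes):  # sorted for a deterministic result
        if colour[node] == WHITE:
            cycle = visit(node)
            if cycle is not None:
                return cycle
    return None

# Illustrative (made-up) channel dependencies.
ring = {"ch0": ["ch1"], "ch1": ["ch2"], "ch2": ["ch0"]}
chain = {"ch0": ["ch1"], "ch1": ["ch2"], "ch2": []}
print(find_cycle(ring))   # a cycle through ch0, ch1, ch2
print(find_cycle(chain))  # None
```

A test suite can only sample traffic patterns, whereas a graph check like this covers every path through the recorded dependencies at once, which is why it can run at build time.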
How about customer-specific system dependencies?
Great question. As an IP provider, we have information on transport, routing, protocol, and most of the system dependencies, but customer-specific system dependencies are not typically shared with us. To account for this, NetSpeed also provides a platform called NocStudio that allows users to specify other system dependencies without sharing them with us directly. These user-specified dependencies, along with the implicit ones that the platform is already privy to, become the golden specification against which the formal techniques are applied. This ensures that the design generated by the platform, including topology, routing and resource sharing, is deadlock-free.
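The sketch below illustrates why the two sets of dependencies have to be checked together: each set can be acyclic on its own while their union contains a cycle. It is an assumption-laden toy, not NocStudio's interface; `is_deadlock_free`, the edge lists, and the node names are all hypothetical, and acyclicity of the dependency graph is used here as a simple sufficient condition.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]  # (waiter, resource it waits on); hypothetical format

def is_deadlock_free(edges: Iterable[Edge]) -> bool:
    """True if the dependency edges form an acyclic graph (Kahn's algorithm)."""
    successors: Dict[str, Set[str]] = {}
    indegree: Dict[str, int] = {}
    for src, dst in edges:
        successors.setdefault(src, set())
        successors.setdefault(dst, set())
        indegree.setdefault(src, 0)
        indegree.setdefault(dst, 0)
        if dst not in successors[src]:  # ignore duplicate edges
            successors[src].add(dst)
            indegree[dst] += 1
    # Repeatedly remove nodes that nothing depends on any more.
    ready = deque(n for n, d in indegree.items() if d == 0)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for nxt in successors[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)
    # Any node left over sits on, or is fed by, a cycle.
    return removed == len(indegree)

# Made-up example: dependencies known to the IP provider...
implicit = [("cpu_req", "mem_req"), ("mem_rsp", "cpu_rsp")]
# ...and dependencies only the customer knows about.
user = [("mem_req", "mem_rsp"), ("cpu_rsp", "cpu_req")]

print(is_deadlock_free(implicit))         # True
print(is_deadlock_free(user))             # True
print(is_deadlock_free(implicit + user))  # False
```

Neither list reveals a problem by itself; only the combined specification exposes the circular wait, which is the point of letting users add their own dependencies to the ones the platform already knows.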
Bottom line, you don’t have to be part of this deadlock anonymous group. To learn more about developing deadlock-free designs, register for the upcoming webinar, “Debug, Analytics, NoC, and beyond… Exploring uncharted galaxies of interconnects!” presented by NetSpeed Systems and UltraSoC Technologies.