Being able to isolate problems and bring together the tools and expertise to deal with them may be the hardest challenge yet.
For years, the motto among design and verification engineers has been to look at the individual pieces of a design because it’s impossible to have a single tool or even an integrated collection of tools that can debug everything. That approach isn’t changing, but the method for getting there is.
The driver behind this shift is a familiar one—growing complexity. Even platforms and subsystems are too complex to verify and debug all at once, and as the number of transistors on a piece of silicon continues to increase, that challenge will only become more difficult.
The new approach, and one that has been showing up in bits and pieces for years, is to abstract out narrower pieces based on everything from function to interaction. There is no single tool to do this, and sometimes even identifying what needs to be tracked and tested is difficult. But once the function or path is identified, it can be analyzed using a combination of techniques such as modeling, formal verification, trace and statistical analysis—basically creating a distribution of how the likely usage model will look and how it could affect or be affected by other parts of a design.
This is like attacking a disease with a cocktail of drugs. A single drug might stop one symptom, but it doesn’t stop everything. In this case, “everything” means an understanding of what other problems might erupt, how that could change if a number of conditions are met, and how that might change again if something else were substituted, such as a different memory or processor configuration. It requires a multidisciplinary team of people who don’t ordinarily work together—including but not limited to software engineers, DFM experts, formal experts, verification experts, system engineers, and even some with implementation and RTL expertise. And it requires enough knowledge of all the tools in use to be able to build something that solves their specific problem for the device-critical paths.
A lot of this could be avoided at older nodes just by guard-banding a design, but at the most advanced nodes any extra margin raises the question of why move to the next node at all. It affects performance and power, and it eats up area—the full spectrum of PPA. The solution is lots of slices within a design, alongside the normal way of designing and verifying the overall system-level design.
Put in perspective, what’s happening is a push to rationalize and more effectively use limited resources within a design organization. While there has been much talk about breaking down silos inside engineering groups, the reality is that there need to be silos to get some things done and no silos to get other things done—often involving the same people. The confluence of software with hardware, the compression of the back end with the front end, and the realities of implementation and integration all need to come together at specific times and not at others.
This is a new kind of flow management, and while it may look ad hoc on the outside, it needs to be carefully orchestrated and coordinated on the inside—around narrowly defined abstractions. Complexity in organizations needs to follow complexity in designs in the only way it can be understood. And that may be the hardest job yet.
—Ed Sperling