Getting the design right the first time has higher stakes than ever before.
As designs move to 7-nanometer (nm) process nodes, engineering and production costs increase dramatically, raising the stakes of getting the design right the first time. The question you face is: how confident are you in your design analysis coverage?
Tighter noise margins, increasing power density, faster switching currents, and greater interdependencies across multiple domains and physics are key challenges facing design teams. Traditional divide-and-conquer methodologies and guard-banding can no longer ensure that your product is delivered on time and functions as expected. Existing tools are limited in capacity and performance, which makes it difficult to simulate billion-plus-instance designs in a reasonable amount of time. As a result, achieving adequate simulation coverage for these expensive, complex devices is an increasingly difficult problem. To meet schedule constraints, you are forced to run only a handful of analyses focused on best- and worst-case conditions. In addition, the single-domain nature of existing tools makes multi-domain and multi-physics simulation infeasible. If you can run only a fraction of a second of best-guess vectors, with no visibility into the interactions between different domains and physics, how confident are you in signing off your $20M project?
Increasing design verification coverage is key to achieving first silicon success. To increase coverage, you need fast, flexible, and smart simulation environments that let you quickly simulate, analyze, and fix your designs. Complemented with a flow that considers the impact of multiple variables and multiple scenarios, such an environment provides the needed coverage and confidence.
Analyses need to consider both local and global effects
To better understand how your design will perform under real-life conditions, you need to analyze it for both local and global effects. Focusing specifically on power and its related impacts, local effects include factors such as current density, grid resistance, and decap sizing and placement. Interactions between these factors affect timing and reliability at a regional level. Global effects, on the other hand, include chip-package-board resonance arising from the switching states of macros, IPs, and memories, which can cause a significant collapse of the supply voltage at the chip-package boundary. These two classes of effects are largely independent of each other: even if you design a power grid very well from the bump down to the transistors, your chip can still fail if there is a resonance condition (as noted in this paper).
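To first order, the resonance mentioned above can be approximated by treating the package inductance and the on-die capacitance as a lumped LC tank, whose resonant frequency is f = 1/(2π√(LC)). The sketch below is a minimal illustration under that assumption: the lumped values `L_PKG`, `C_DIE`, and `R_PKG` are hypothetical and not taken from any real design, and a real chip-package-board PDN has many more poles than this single tank. It computes the LC estimate and then sweeps the impedance seen from the die to locate its peak.

```python
import math

def resonant_frequency_hz(l_pkg_h: float, c_die_f: float) -> float:
    """First-order resonant frequency of a lumped LC tank: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l_pkg_h * c_die_f))

def pdn_impedance_ohm(freq_hz: float, l_pkg_h: float, r_pkg_ohm: float, c_die_f: float) -> float:
    """|Z| seen from the die: package branch (R + jwL) in parallel with die capacitance 1/(jwC)."""
    w = 2.0 * math.pi * freq_hz
    z_pkg = complex(r_pkg_ohm, w * l_pkg_h)   # series R-L package branch
    z_cap = 1.0 / complex(0.0, w * c_die_f)   # on-die capacitance branch
    return abs(z_pkg * z_cap / (z_pkg + z_cap))

# Hypothetical lumped values, for illustration only.
L_PKG = 50e-12   # 50 pH effective package inductance
C_DIE = 200e-9   # 200 nF on-die capacitance (intrinsic + decap)
R_PKG = 1e-3     # 1 mOhm package resistance

f0 = resonant_frequency_hz(L_PKG, C_DIE)
# Sweep 1 MHz .. 1 GHz on a logarithmic grid and find where |Z| peaks.
freqs = [10 ** (6 + 3 * k / 999) for k in range(1000)]
f_peak = max(freqs, key=lambda f: pdn_impedance_ohm(f, L_PKG, R_PKG, C_DIE))
print(f"LC estimate: {f0 / 1e6:.1f} MHz, swept |Z| peak: {f_peak / 1e6:.1f} MHz")
```

With these values the tank resonates at roughly 50 MHz. Switching activity whose current spectrum has energy near that frequency sees a much higher impedance than the grid resistance alone would suggest, which is why a well-designed on-chip grid does not rule out a resonance-driven failure.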
Because local power grid failures arise from a combination of localized simultaneous switching current and a weak power grid, a typical approach is to "over-design" the power/ground metal routing globally across the chip. However, this is an extremely expensive proposition, not only in routing resources but also in schedule, because it makes timing closure considerably more difficult. To solve timing problems, you may end up upsizing drivers, which draw higher switching currents, which in turn increase the voltage drop, creating a vicious cycle. Over-design will not necessarily solve the problem either, because some conditions may still fall outside the design parameters.
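The vicious cycle can be made concrete with a first-order lumped model. This is a minimal sketch, assuming the drop in one grid region is I·R (resistive) plus L·di/dt (inductive) with di/dt ≈ I_peak/t_rise, and assuming peak switching current scales linearly with driver strength. All constants are hypothetical, and the helper names `voltage_drop_v` and `max_grid_resistance_ohm` are illustrative only.

```python
def voltage_drop_v(i_peak_a, r_grid_ohm, l_eff_h, t_rise_s):
    """First-order drop: resistive I*R plus inductive L*di/dt, with di/dt ~ I_peak / t_rise."""
    return i_peak_a * r_grid_ohm + l_eff_h * i_peak_a / t_rise_s

def max_grid_resistance_ohm(budget_v, i_peak_a, l_eff_h, t_rise_s):
    """Largest grid resistance that keeps the drop within budget, or None when the
    inductive term alone exceeds the budget, so widening on-chip metal cannot help."""
    inductive_v = l_eff_h * i_peak_a / t_rise_s
    if inductive_v >= budget_v:
        return None
    return (budget_v - inductive_v) / i_peak_a

# Hypothetical lumped values for one grid region, for illustration only.
I_BASE = 0.5      # A, peak switching current before upsizing
R_GRID = 0.02     # ohm, effective grid resistance
L_EFF = 5e-12     # H, effective inductance in the current path
T_RISE = 50e-12   # s, current rise time
BUDGET = 0.08     # V, allowed drop

for upsize in (1.0, 1.5, 2.0):
    # Assumption: peak switching current scales linearly with driver strength.
    i_peak = I_BASE * upsize
    drop = voltage_drop_v(i_peak, R_GRID, L_EFF, T_RISE)
    r_max = max_grid_resistance_ohm(BUDGET, i_peak, L_EFF, T_RISE)
    fix = "no grid fix possible" if r_max is None else f"grid R must be <= {r_max * 1e3:.1f} mOhm"
    print(f"driver x{upsize}: drop {drop * 1e3:.0f} mV, {fix}")
```

In this toy model, each upsizing step raises the drop, and once the inductive term alone exceeds the budget, no reduction in grid resistance (that is, no amount of added metal) brings the drop back within limits.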
Multi-domain statistical analyses with directed scenarios increase design coverage
Instead of over-designing the grid for every possible condition, a better approach is to start with a reasonable power grid and then isolate and fix its weak areas. To gain confidence that you have sufficient coverage in identifying those weak areas, you need multi-domain statistical analyses coupled with directed and scored transient scenarios. These analyses need to be performed across hundreds of different switching scenarios against various power grid quality metrics. The switching scenarios can come from a combination of use-case vectors (RTL or gate level) and logically coherent VectorLess activity modes.
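One simple way to score scenarios against several power grid quality metrics is to normalize each metric across all scenarios and combine the results with weights, so the most stressful scenarios rise to the top. This is a minimal sketch: the metric fields, the weights in `WEIGHTS`, and the scenario names are hypothetical placeholders, not the metrics or scoring of any particular tool.

```python
from dataclasses import dataclass

@dataclass
class Scenario:
    name: str
    worst_drop_pct: float   # worst dynamic voltage drop, % of VDD (hypothetical metric)
    peak_di_dt: float       # peak di/dt, A/ns (hypothetical metric)
    weak_grid_hits: int     # switching instances in high-resistance grid regions (hypothetical)

# Hypothetical relative importance of each metric; weights sum to 1.
WEIGHTS = {"worst_drop_pct": 0.5, "peak_di_dt": 0.3, "weak_grid_hits": 0.2}

def rank_scenarios(scenarios, weights=WEIGHTS):
    """Min-max normalize each metric across scenarios, combine with weights,
    and return (score, name) pairs sorted from most to least severe."""
    norm = {}
    for metric in weights:
        values = [getattr(s, metric) for s in scenarios]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0   # avoid divide-by-zero when all scenarios tie
        norm[metric] = [(v - lo) / span for v in values]
    scores = [sum(weights[m] * norm[m][i] for m in weights) for i in range(len(scenarios))]
    return sorted(zip(scores, (s.name for s in scenarios)), reverse=True)

scenarios = [
    Scenario("rtl_video_decode", 7.5, 12.0, 340),
    Scenario("gate_boot", 4.0, 6.0, 90),
    Scenario("vectorless_max_toggle", 9.0, 15.0, 410),
]
for score, name in rank_scenarios(scenarios):
    print(f"{name}: {score:.2f}")
```

Min-max normalization keeps metrics with different units comparable; a production flow might instead normalize against fixed sign-off limits so that scores stay meaningful when scenarios are added.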
Big data analytics techniques enable efficient and targeted design fixes
Traditional solutions lack the flexibility and performance to analyze large volumes of data while overlaying them with other design attributes, such as power grid weakness and timing clustering, to highlight areas of possible local failures. To rapidly profile a design across multiple scenarios, variables, and operating conditions, it becomes necessary to employ big data analytics techniques, as shown in Figure 1. Such a targeted approach lets you apply fixes only where they are needed, freeing up routing resources across most of the chip.
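The overlay idea can be illustrated by binning results into chip tiles and flagging only the tiles where a large voltage drop coincides with a cluster of timing-critical endpoints. This is a minimal sketch, assuming hypothetical inputs (drop samples pooled across all scenarios, and endpoint slacks with coordinates) and hypothetical thresholds; `TILE_UM`, `tile_of`, and `hotspots` are illustrative names, and this is not how any specific product implements the overlay.

```python
from collections import defaultdict

TILE_UM = 100.0  # hypothetical tile edge length in micrometres

def tile_of(x_um, y_um):
    """Map a die coordinate to the integer index of the tile that contains it."""
    return (int(x_um // TILE_UM), int(y_um // TILE_UM))

def hotspots(drop_samples, endpoints, drop_limit_mv=50.0, slack_margin_ps=10.0, min_critical=2):
    """Flag tiles whose worst drop exceeds drop_limit_mv AND that contain at least
    min_critical endpoints with slack below slack_margin_ps. Worst tiles first."""
    worst_drop = defaultdict(float)
    for x, y, drop_mv in drop_samples:          # samples pooled across all scenarios
        t = tile_of(x, y)
        worst_drop[t] = max(worst_drop[t], drop_mv)
    critical = defaultdict(int)
    for x, y, slack_ps in endpoints:
        if slack_ps < slack_margin_ps:          # endpoint is timing-critical
            critical[tile_of(x, y)] += 1
    flagged = [
        (t, worst_drop[t], critical[t])
        for t in worst_drop
        if worst_drop[t] > drop_limit_mv and critical[t] >= min_critical
    ]
    return sorted(flagged, key=lambda item: (item[1], item[2]), reverse=True)

# Hypothetical sample data: (x_um, y_um, drop_mv) and (x_um, y_um, slack_ps).
drops = [(50, 50, 62.0), (150, 50, 35.0), (250, 250, 71.0), (260, 240, 40.0)]
paths = [(40, 60, -3.0), (70, 20, 5.0), (150, 60, -8.0), (255, 255, 20.0)]
print(hotspots(drops, paths))
```

In this sample, tile (2, 2) has the largest drop but no timing-critical endpoints nearby, so under these thresholds it is not flagged; the fix effort is directed at tile (0, 0), where drop and timing criticality overlap.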
Figure 1: Multi-variable scenario ranking for power coverage.
Multi-core SoCs need fresh approaches for design exploration and sign-off
Global switching rail collapse, on the other hand, is a real and growing problem for multi-core SoCs. Switching events on multiple blocks can create issues that are hard to find and fix, such as instantaneous voltage drop caused by combined di/dt and aggressive regional supply demand, as well as chip-package-board resonance. A global PDN collapse cannot be fixed by over-designing the power grid at the chip level alone; it requires a comprehensive look across the chip, package, and board PDNs. Often, fixing such multi-scenario conditions requires changes in architecture, package, or even software. These changes need to be addressed early in the design process to minimize downstream impact.
It is virtually impossible to identify the global switching scenarios that can cause catastrophic failures using a traditional simulation-based approach. To filter through thousands of combinations of SoC-level switching modes, you need rapid, big-data-driven analysis techniques that show which issues to focus on and point to possible fixes.
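The combinatorial nature of the problem shows up even in a toy model: with three modes per block, N blocks give 3^N combinations. The sketch below enumerates every per-block mode combination, discards combinations that a hypothetical power-management rule forbids, and ranks the rest by a first-order droop estimate V ≈ L·Σ(di/dt). It is a minimal sketch under stated assumptions: the block names, per-mode di/dt values, shared inductance `L_PKG_NH`, and the `at_most_two_bursts` rule are all hypothetical, and a simple sum ignores the frequency dependence that resonance introduces.

```python
import heapq
from itertools import product

# Hypothetical per-block di/dt (A/ns) in each switching mode; illustrative values only.
BLOCK_MODES = {
    "cpu0": {"idle": 0.0, "active": 2.0, "burst": 5.0},
    "cpu1": {"idle": 0.0, "active": 2.0, "burst": 5.0},
    "gpu":  {"idle": 0.0, "active": 4.0, "burst": 9.0},
    "ddr":  {"idle": 0.0, "active": 1.5, "burst": 3.0},
    "isp":  {"idle": 0.0, "active": 1.0, "burst": 2.5},
}
L_PKG_NH = 0.005  # hypothetical shared inductance in nH (nH * A/ns = V)

def at_most_two_bursts(assign):
    """Hypothetical legality rule: power management allows at most two blocks in burst."""
    return sum(mode == "burst" for mode in assign.values()) <= 2

def worst_combinations(block_modes, l_nh, is_legal, top_k=3):
    """Enumerate all per-block mode combinations, keep the legal ones, and return the
    top_k by estimated droop V = L * sum(di/dt), plus the number of legal combinations."""
    names = list(block_modes)
    legal = []
    for modes in product(*(block_modes[n] for n in names)):
        assign = dict(zip(names, modes))
        if is_legal(assign):
            legal.append(assign)

    def droop(assign):
        return l_nh * sum(block_modes[n][m] for n, m in assign.items())

    worst = heapq.nlargest(top_k, legal, key=droop)
    return [(assign, droop(assign)) for assign in worst], len(legal)

worst, n_legal = worst_combinations(BLOCK_MODES, L_PKG_NH, at_most_two_bursts)
print(f"{n_legal} legal combinations evaluated")
for assign, volts in worst:
    print(f"{volts * 1e3:.1f} mV  {assign}")
```

Because the legality rule excludes the all-burst case, the worst legal combination is not obvious by inspection and has to be found by search. At SoC scale the same search becomes far larger, which is why the text calls for rapid, data-driven filtering rather than full simulation of every combination.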
Traditional silo-based, margin-driven approaches are no longer sufficient for designs at 7nm process nodes. To ensure first silicon success, design teams need confidence in the coverage provided by their tools and methodologies. That coverage must be comprehensive for both local and global switching effects, avoiding over-design while providing sign-off confidence.