Timing is increasingly dependent on vectors. Can static techniques continue to deliver the necessary results? Maybe.
The chip industry traditionally has relied on margins to mitigate timing problems, but a growing number of factors now influence timing. Can static timing analysis evolve to address these problems?
Static timing analysis (STA) was a cornerstone technology for the acceptance of the register transfer level (RTL) abstraction. It showed that functionality would not be impacted by timing, so long as the longest combinatorial path settled within the clock period. During the ’90s, that simply meant counting the gates on a path, multiplying by the gate delay, and comparing the result against the clock period. Then wires started to impart greater delays than gates, and the calculation became dependent on place-and-route. That drove adoption of physical synthesis. But the delay calculations were still fixed.
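The basic check described above can be sketched in a few lines of Python. This is a minimal illustration, not how any commercial STA tool is implemented. The function names, the netlist representation, and the fixed per-gate delays in picoseconds are all assumptions made for the example.

```python
from graphlib import TopologicalSorter


def worst_arrival(gates, fanin, gate_delay):
    """Return the latest arrival time (ps) at each gate output.

    gates:      iterable of gate names (hypothetical netlist)
    fanin:      dict mapping a gate to the gates that drive it
    gate_delay: dict mapping a gate to its fixed delay in ps
    """
    graph = {g: fanin.get(g, ()) for g in gates}
    arrival = {}
    # static_order() yields every driver before the gates it feeds.
    for g in TopologicalSorter(graph).static_order():
        latest_input = max((arrival[d] for d in fanin.get(g, ())), default=0)
        arrival[g] = latest_input + gate_delay[g]
    return arrival


def meets_timing(arrival, endpoints, clock_period):
    """True if the slowest endpoint settles within the clock period."""
    return max(arrival[e] for e in endpoints) <= clock_period
```

No vectors are involved. Every path is covered by propagating the worst case through the graph once, which is the property that made the static approach scale.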
Today, many things impact timing, and most of them are activity-dependent. To make things even more complicated, that activity matters across many orders of magnitude of time, ranging from nanoseconds to the operational lifetime of the device.
Until the establishment of STA, all designs had to be simulated at the gate level. “That is not scalable,” says Wei Lii Tan, director of product management at Siemens Digital Industries Software. “We were not able to run full custom simulation or transistor-level simulation on designs back then, and we certainly can’t today. Designs are getting larger. There is an increasing number of effects that have to be taken into account during the digital flow. Having said that, STA has been evolving and will continue to evolve to address these new impacts that are coming. The alternative is that I have to custom time everything, and that’s not scalable.”
This is not impacting everyone today. “Most people just take general margins,” says Marc Swinnen, director of product marketing at Ansys, now part of Synopsys. “With general margins, you assume that every gate could be up to this much voltage derated, and hence every gate could be slowed down by this much. That may be an expensive assumption to make across the entire design, when in fact it’s only a tiny fraction of the gates that actually experience a voltage drop anywhere near that big. You make everybody pay for the sins of a few. Static tools are generally not activity-driven. The whole beauty of STA is that it is activity independent.”
There are many factors to consider. “We use the classical method, which is to put in margins for unknown problems,” says Andy Heinig, head of department for efficient electronics in Fraunhofer IIS’ Engineering of Adaptive Systems Division. “Not everything we do is that advanced today, but I expect it will become necessary at some point. It’s really related to products. Which type of product, what frequency is it intended to operate at, and so on. That can make the decision very hard.”
It is a balancing act. “When silicon first comes back from the fab, it tends to fall short in operational frequency,” says Ansys’ Swinnen. “And while margins are considered to be safer, and you won’t miss anything since you’ve seen the worst of it for everybody, that is not always the case. You might find that with a detailed analysis, you have missed the worst case. It all depends on the risks that you are willing to take.”
Several factors impact timing. IR drop acts on the shortest timescale, thermal effects on a medium timescale, and aging on the longest. All of these are activity-dependent. New manufacturing technologies, such as 3D stacking, are adding effects such as stress-induced timing shifts. All of them need to be taken into account if static timing analysis is to remain relevant.
Small timescale
IR drop happens when transistors demand more current than can be supplied at a given moment in time. “With the newer nodes you get more transistors crammed into the same area,” says Swinnen. “In addition, these transistors switch more quickly, which means that you have more intense, sudden current draws. You get more change of current over time (dI/dt), and that current has to be sourced locally. While there are capacitors off-chip, from the transistor’s point of view that is far away, and there is so much resistance between it and that capacitor that the current will never get to you in time. If the current can’t come, the voltage will drop. It is a localized problem, and harder to solve using decap capacitors.”
This has been a problem for advanced nodes for the past 10 years. “Customers understand the impact, and they are already designing for it,” says Manoz Palaparthi, senior staff product manager at Synopsys. “IR is highly dependent on vectors, so having a knowledge of the worst-case impact that’s going to be caused by different vectors needs to be considered as a part of IR-STA.”
That does require an extra analysis step, however. “Most tools have instance-based IR drop static timing analysis,” says Siemens’ Tan. “The tool is given information about the IR drops for each instance. The impact is calculated based on the voltage drop going into each instance, and then the .lib contains information that tells the STA tool how the logic blocks are going to behave based on that IR drop.”
Each of these gates needs to have its performance derated based on the voltage. “You need multiple libraries characterized for multiple voltages, and then they interpolate between those to find the performance of the gate at a specific supply voltage,” says Swinnen. “Given the voltage values, the STA tool can then do a regular analysis. It just changes the library information for each of those gates based on its real voltage. But it’s a lot of information, and typically it is only done for critical paths or timing-sensitive paths.”
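The interpolation step Swinnen describes can be illustrated with a short Python sketch. It assumes simple linear interpolation between characterized points and clamps outside the characterized range. Real tools may use different schemes. The table layout, the cell name, and the function names are hypothetical, with voltages in millivolts and delays in picoseconds.

```python
from bisect import bisect_right


def delay_at_voltage(char_points, vdd_mv):
    """Interpolate a cell delay between characterized supply voltages.

    char_points: list of (voltage_mv, delay_ps) pairs sorted by voltage,
                 one pair per characterized library (assumed layout)
    """
    volts = [v for v, _ in char_points]
    # Assumption: clamp to the nearest library outside the characterized range.
    if vdd_mv <= volts[0]:
        return char_points[0][1]
    if vdd_mv >= volts[-1]:
        return char_points[-1][1]
    hi = bisect_right(volts, vdd_mv)
    (v0, d0), (v1, d1) = char_points[hi - 1], char_points[hi]
    return d0 + (d1 - d0) * (vdd_mv - v0) / (v1 - v0)


def path_delay(path, instance_mv, cell_tables):
    """Sum delays along a path using each instance's own supply voltage.

    path:        list of (instance_name, cell_name) pairs
    instance_mv: dict instance_name -> supply voltage after IR drop (mV)
    cell_tables: dict cell_name -> characterized (voltage, delay) points
    """
    return sum(
        delay_at_voltage(cell_tables[cell], instance_mv[inst])
        for inst, cell in path
    )
```

Only the instance that actually sees the droop is slowed down, in contrast to a design-wide margin that would derate every gate on the path.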
Because the voltage drop is activity-dependent, there is a risk that the worst case is not considered. “Some designs have experienced significantly more voltage drop than expected,” says Swinnen. “As you crank up the speed, at some point you’re going to have dynamic voltage drop that chokes off the performance. It results in not quite being able to achieve your supposed target performance. And it’s typically undetected or escaped voltage-drop conditions.”
Medium and long timescale
An emerging problem is heat, which will become a major contributor to delay when 3D stacking becomes more common. “So far, it hasn’t been a problem,” says Synopsys’ Palaparthi. “Customers are dealing with it by setting a uniform derate that applies to the whole die. But that is changing because of HPC designs, when you have multi-die stacking. Now, the impact of temperature is not uniform across the die. If you go with a single derate, a single margin, either you are missing something or you are over-designing. That’s where thermal-aware STA is becoming really important.”
Thermal impacts are being incorporated into an increasing array of tools. “Even on large chips, you do have temperature gradients,” says Swinnen. “Traditionally, place-and-route tools haven’t gone to the point of actually measuring temperature. They use power density as a proxy. Power density is the amount of power being used by the gates, and you count them up in a square and give it a number. It’s a relative measure of how much power is being produced in each little sector. Areas that have a higher power density will have higher temperatures, and you can differentiate parts of the chip being at a different PVT corner.”
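The power-density proxy amounts to binning instance power into a grid of squares. A rough Python sketch follows. The instance tuple format, the units (micrometers and milliwatts), and the function names are assumptions for illustration, and no actual temperature is computed.

```python
import math


def power_density_grid(instances, die_w, die_h, tile):
    """Sum instance power per square tile and divide by the tile area.

    instances: list of (x_um, y_um, power_mw) tuples (assumed format)
    die_w, die_h, tile: die dimensions and tile edge length in um
    Returns a rows x cols grid of power density in mW per square um.
    """
    cols = math.ceil(die_w / tile)
    rows = math.ceil(die_h / tile)
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, power in instances:
        # Clamp so instances on the far die edge land in the last tile.
        col = min(int(x // tile), cols - 1)
        row = min(int(y // tile), rows - 1)
        grid[row][col] += power
    area = tile * tile
    return [[p / area for p in row] for row in grid]


def hottest_tile(grid):
    """Return (row, col) of the tile with the highest power density."""
    cells = ((r, c) for r in range(len(grid)) for c in range(len(grid[0])))
    return max(cells, key=lambda rc: grid[rc[0]][rc[1]])
```

The result is a relative measure only. Tiles with higher density would be expected to run hotter and could be assigned a different temperature corner than the rest of the chip.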
Aging and manufacturing variation started becoming concerns about 10 years ago and are especially important in industries such as automotive, where product lifecycles are longer. “We started out by putting in a flat derate,” says Tan. “I’m going to add an X percent derate, just to margin in the variation that’s going to happen. That was too pessimistic. It evolved first into on-chip variation settings, and later evolved into instance based. For each instance, the tool will calculate the impact of variation. The .lib evolved to be more granular in its approach – it is not a one size fits all, which is more pessimistic.”
Aging takes a similar approach. “Customers apply a set margin for the whole design to get aging, but with the process nodes coming down, and with the supply voltages coming down, any margin is leaving something on the PPA side — usually leaving performance on the table,” says Palaparthi. “Today, we are doing a true native aging analysis that computes the aging impact on timing, taking different BTIs, different activities, different time horizons, and then we can do native aging analysis. This is becoming a mainstream application.”
As the industry continues to adopt 3D stacking, many of these issues become worse. “Thermal densities become larger,” says Tan. “Co-optimization comes into play. For example, how do you optimize logic between two adjacent chiplets? At the very basic level, it’s still STA, but now you have more factors to take into account. STA will have to evolve to be more efficient and with more granularity. Design flows have to become more aware of timing, all the way from floor-planning to the final route and the sign-off stages.”
This will become a necessary step before third-party chiplets can thrive. “As you look at chiplet designs, you’re not going to be able to time across all the chiplets,” says Joe Davis, senior director of product management at Siemens Digital Industries Software. “It’s going to behave slightly differently, depending on how it’s integrated, whether it’s integrated onto a silicon substrate or an organic substrate, or if it’s sandwiched between different things. But you’ve got interfaces and handshakes between everything, and these enable you to make it a solvable problem. Future innovations will occur to bring in more of these challenges, one at a time.”
The manufacturing process also introduces new issues. “Stress is something that’s emerging,” says Palaparthi. “This is not as prominent as IR, thermal, and aging today, but it will be in a year or two because of multi-die and HBM stacking. The stacks generate so much heat and warpage that stress will become important for those markets. Backside metal also creates complications. When you have heat coming from the bottom, it will impact very non-uniformly. If it is 10° or 20° difference between hot and cold points, that may be manageable. But if it’s more than that, you need to model it because it will impact timing.”
It all becomes a little complex. “The problem is, how do you assign which gate is at which temperature, at which activity, and which age,” says Swinnen. “Putting all that information together to then drive timing, that’s where the problem is. It’s not the final calculation that’s the issue. It’s assembling all the necessary information to accurately capture the variation across the chip.”
Methodologies
There is no standard methodology that everyone uses. It is very dependent on the markets you are addressing, the technology nodes being used, and the degree to which frequency is under pressure.
“We learn from old tape outs,” says Fraunhofer’s Heinig. “For yield, you get a good feeling about what is possible and what is impossible. If your next product is very similar regarding the frequency and technology, you have good knowledge that you can re-use. But if you switch to a new technology, you may need one or two rounds to learn more. You can improve it by redesign, or you will in the next design. If you are designing a product with high volume, many companies do a respin to improve yield. When you have a product with a smaller volume, you may not be that aggressive with your margin.”
Companies can progressively tackle the problem, depending on the level of the problem they expect. “If you look at the digital implementation flow, it’s not one homogeneous flow,” says Tan. “There are stages where you are making broad strokes, like synthesis and placement, and then there is a stage toward the end where you’re doing ECOs. You can do a lot of fine-tuning. There’s a spectrum of where you use derates and broad strokes, and then when you get towards the latter part of the flow you can build in a lot of granularity — instance-based ECOs and instance-based timing analysis.”
Help is required to ensure the effort is spent in the right places. “We have something called graph-based analysis that looks at the full design,” says Palaparthi. “It gives you a report of the whole timing, and then for the worst and critical paths you can do a more exhaustive path-based analysis (PBA). That gives you more detail and highly accurate local analysis for both critical-path timing and sensitivity in terms of IR. For example, I can have a path that is highly sensitive to IR. Those are important to annotate back to the IR tool, get the IR impact, and then do the timing analysis again for those paths.”
All of that needs to be considered early. “This is where architects earn their money,” says Siemens’ Davis. “This is why floor-planning in chip design is such a critical aspect. There are tradeoffs between getting higher performance by putting things closer together, because it results in less delay, but it also creates more heat, which is going to cause it to slow down. There’s a balance.”
Many of the factors are vector-dependent, and while vectors may be useful for analyzing IR drop, a vector-driven approach is less realistic for thermal and aging. “You can provide vectors as an input,” says Palaparthi. “You can take an FSDB file that shows activity for each node, and then do activity propagation for the full design. Based on that, we do a quick power analysis to get the timing impact. We do consider that a part of aging analysis, but it’s computationally intensive. Another approach is to apply some static ways of solving this problem. You can apply a toggle rate or a static probability for the whole design, which is based on the designer’s expertise.”
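The difference between vector-driven and static activity can be shown with a small power estimate. The Python sketch below uses the textbook dynamic-power relation, 0.5 x toggle rate x C x V^2 x f, and falls back to a design-wide toggle rate wherever no simulation data exists. The function name, the default value of 0.1, and the data layout are assumptions, not any tool's interface.

```python
def switching_power(nets, vdd, freq_hz, vector_toggle=None, default_toggle=0.1):
    """Estimate dynamic switching power in watts.

    nets:           dict net_name -> load capacitance in farads
    vdd:            supply voltage in volts
    freq_hz:        clock frequency in hertz
    vector_toggle:  optional dict net_name -> toggles per clock cycle,
                    for example extracted from a simulation dump
    default_toggle: design-wide toggle rate used where no vector data
                    exists (a designer's estimate, assumed here)
    """
    vector_toggle = vector_toggle or {}
    total = 0.0
    for net, cap in nets.items():
        toggle = vector_toggle.get(net, default_toggle)
        # Each toggle charges or discharges the load once.
        total += 0.5 * toggle * cap * vdd ** 2 * freq_hz
    return total
```

Calling it without `vector_toggle` gives the static, vectorless estimate. Supplying activity for even a few nets shifts the result toward what the workload actually does, at the cost of producing those vectors.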
Solutions have to balance accuracy and computational cost. “As you get to very large systems, you have to use a vectorless approach,” says Davis. “But a vectorless approach isn’t going to tell you anything about when something is going to happen. Where are my hotspots? When are my hot spots? Is this floor plan going to work? Early simulation becomes more critical from a physical standpoint. That is a challenge today, because a lot of that simulation doesn’t come until late in the process. Shift left — how can I get a good enough model to be useful, while I am getting to the point where it’s really known? I need data early that is perhaps not perfectly accurate, but useful information.”
Not all parts of the design may warrant such deep analysis. “Some companies may want to be very aggressive for the CPU block because that’s where they can make a lot of difference in terms of getting the whole chip performance maxed out,” says Palaparthi. “They’ll spend a lot of time tweaking the margins as much as they can for that block. For other blocks, not as much. The ROI is not that significant. It varies based on the project cycle, the complexity, the risk tolerance. It also varies based on the application they’re doing.”
If the risk of escapes keeps you up at night, Davis offers another alternative. “One technique that has picked up steam is putting IR drop sensors or temperature sensors in the die to do dynamic correction of the clock as you detect excursions. That really changes your perspective on what you’re designing for if you have a safety valve that says, ‘If there’s an IR drop excursion that has the potential to create race conditions, or performance issues, I’m going to lower my clock until I get past that, and then I’m going to recover.’”
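The safety valve Davis describes is, at its simplest, a control loop that lowers the clock while a sensor reports an excursion and raises it again afterward. The Python sketch below is a toy model of that idea. The thresholds, step size, and function names are invented for illustration and do not reflect any particular product.

```python
def next_clock_mhz(current_mhz, sensed_mv, nominal_mhz,
                   droop_limit_mv, step_mhz, floor_mhz):
    """Pick the next clock frequency from one supply-voltage sample.

    All parameters are hypothetical: a real controller would be tuned
    to the design and would typically run in hardware.
    """
    if sensed_mv < droop_limit_mv:
        # Excursion detected: back off, but never below the floor.
        return max(floor_mhz, current_mhz - step_mhz)
    # Supply is healthy: recover toward the nominal frequency.
    return min(nominal_mhz, current_mhz + step_mhz)


def run_throttle(trace_mv, nominal_mhz=2000, droop_limit_mv=700,
                 step_mhz=100, floor_mhz=1500):
    """Apply the controller to a sequence of sensor readings."""
    clock = nominal_mhz
    history = []
    for sample in trace_mv:
        clock = next_clock_mhz(clock, sample, nominal_mhz,
                               droop_limit_mv, step_mhz, floor_mhz)
        history.append(clock)
    return history
```

With such a mechanism in place, the design target shifts from never violating timing under the worst droop to recovering gracefully when a droop occurs.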
Conclusion
While there are many added challenges, they all can be solved with enough computation. “The reports of STA’s death are greatly exaggerated,” says Davis. “There’s always a scope within which it’s applicable. It is about defining that scope, and as we incrementally add capabilities, different voltages, different temperatures, or different process corners, you can analyze the effect. Timing is a worst-case propagation. It says worst case, I’m going to arrive before this, and worst case, I’m going to switch in this time. If we partition, you can always define an area where that’s going to be valid.”
EDA companies are making rapid progress. “This is a really interesting time in the sense that the complexity is increasing and so many new effects are coming into play,” says Palaparthi. “IR, aging, thermal, stress – these are extremely important for accuracy and margin reduction. The other side of the problem is that with the growing number of chiplets, the growing number of scenarios, the growing number of cell instances in each die – it puts a huge strain on STA tools in terms of the compute needs and the TAT needs. A lot of innovation is going into this area to solve these problems.”