The issues may be familiar, but they’re more difficult to solve and can affect everything from performance to yield.
Timing closure issues are increasing in magnitude at 7/5nm, and issues that were often considered minor in the past can no longer be ignored.
Timing closure is an essential part of any chip design. The process ensures that all combinatorial paths through a design meet the necessary timing so that it can run reliably at a specified clock rate. Timing closure hasn’t changed significantly over the past few decades, largely because each new process node presents similar challenges and physical phenomena. The same effects can be seen at earlier nodes. But those effects have reached the point where they are now impacting yield as well as power, performance and area.
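To make the concept concrete, here is a minimal sketch of the setup check that timing closure ultimately enforces on every path. The clock period, path delay, setup time, and uncertainty values are invented for illustration and are not tied to any particular node or tool.

```python
# Illustrative only: a toy setup-slack check for one combinatorial path.
# Real sign-off STA also handles launch/capture skew, derates, and many
# corners/modes; the numbers below are assumptions for the example.

def setup_slack(clock_period_ns, path_delay_ns, setup_time_ns, clock_uncertainty_ns=0.0):
    """Slack = required arrival time - actual arrival time."""
    required = clock_period_ns - setup_time_ns - clock_uncertainty_ns
    return required - path_delay_ns

# A 1 GHz target (1.0 ns period): the path closes timing only if slack >= 0.
print(setup_slack(clock_period_ns=1.0, path_delay_ns=0.82,
                  setup_time_ns=0.05, clock_uncertainty_ns=0.05))  # 0.08 ns of margin
```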
“In the past, a typical design flow was built out of point tools,” said Igor Keller, distinguished engineer at Cadence. “Each tool did something separate. Each may have been developed by a different organization, and they were then put together in a loosely integrated flow. One piece did the delay calculation, for example, and its output was fed into another step of the flow, such as the timer, and so on. There was a loose integration and exchange of files or data, but it was not really well integrated or architected with a single engine.”
Over the past couple of nodes, the precision of modeling timing has improved. But there has never been more uncertainty about the accuracy of timing results relative to silicon. Today’s pain points come in the form of waveform effects, process variability, on-chip variation, and tightening power, performance, and area (PPA) requirements, to name just a few, according to Jim Dodrill, senior principal design engineer in the physical design group at Arm. And while these have been around at all nodes, they are steadily getting worse.
“PPA is what designers are interested in achieving during their design cycle,” said Cadence’s Keller. “They need to meet their marketing objectives. The chip needs to have certain performance per clock cycle, certain power consumption, and certain area. But pretty quickly they realize this is not enough. They also need to include yield in their consideration, because today it is easy to achieve PPA on a very small subsystem where the use will be low, and that’s not acceptable either. Good PPA numbers are not enough. You also need to be able to have good yield for your silicon. That requirement from design houses to EDA vendors is definitely something new that has arisen in the past few years, and it is more specific to 7/5nm, where the pressure to squeeze everything possible in the PPA space is so tremendous that people are sacrificing yield. Engineering teams discuss PPA optimization all the time, and they are looking at the quality of the timing closure from the perspective of how good the PPA is that is seen in the end. It’s definitely important to have a flow which is easy to use, which is stable, and fast, but if those aspects are not giving you good PPA, it’s not very useful. They need to see good PPA coming out of the timing closure.”
None of this is a complete surprise to hardware engineers. “Transitioning to 7nm makes the challenges familiar from 10nm that much harder,” said Benny Winefeld, solutions architect at ArterisIP. “That includes timing characteristics differing between mask1 and mask2 due to self-aligned multi-patterning, higher variability across the die, more corners and modes, lower voltages and higher susceptibility to noise, crazy DRC rules, high wire resistance on the low layers, and scarcity of resources on the top layers. All of this is compounded by the fact that while cell delays scale down, wire delays don’t.”
This translates to somewhat different implications for the design of high-performance cores and SoCs. For cores, it mostly comes down to the quality of physical implementation tools, their ability to handle complex DRC rules, multiple timing scenarios, and the balance between different requirements (timing, power, area) – all within reasonable run-time, Winefeld explained. “In contrast, the top-level SoC design primarily has become an art of stitching a growing variety of IPs, including CPU cores, memories, hardware accelerators and peripherals. It’s characterized by long ‘flight’ distances and complex interconnect topology, through which data traverses while going from one IP to another.”
For the SoC, efficient early planning, starting from pre-RTL, of the interconnecting fabric becomes essential.
“This has a critical impact on whether design goals will be met, both architectural (latency, throughput, QoS) and physical (timing, area and power),” Winefeld said. “A network-on-chip architect should be able to take both logical and physical factors into account. For example, simply delivering signals from one chip corner to another may take several clock cycles. This requires insertion of pipeline stages. If you insert too few, or put the pipe stages in the wrong locations, timing won’t be met no matter how hard the place-and-route tool works. Or, if pipes are injected too aggressively, timing may be easily met, but the price is higher latency, power and area.”
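As a rough illustration of Winefeld’s pipelining trade-off, the sketch below estimates how many pipe stages a long route might need. The route length, wire delay per millimeter, and per-stage timing budget are assumed values chosen only for the example, not figures from the article or from any foundry.

```python
# Hypothetical back-of-the-envelope estimate of pipeline stages on a long NoC route.
import math

def pipeline_stages_needed(route_length_mm, wire_delay_ns_per_mm,
                           clock_period_ns, timing_budget_fraction=0.6):
    """Registers needed so that no wire segment consumes more than a
    fraction of the clock period (the rest is left for logic and margin)."""
    usable_ns_per_stage = clock_period_ns * timing_budget_fraction
    reachable_mm_per_stage = usable_ns_per_stage / wire_delay_ns_per_mm
    segments = math.ceil(route_length_mm / reachable_mm_per_stage)
    return max(segments - 1, 0)   # N segments need N-1 intermediate registers

# E.g., a 6 mm corner-to-corner route at 2 GHz (0.5 ns) with ~0.15 ns/mm wire delay.
print(pipeline_stages_needed(6.0, 0.15, 0.5))  # -> 2 pipe stages
```

Too small a budget fraction inserts extra stages and costs latency, power and area; too large a fraction leaves the place-and-route tool with paths it cannot close, which is exactly the balance described above.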
Fig. 1: Routing congestion can cause physical implementation problems, leading to timing closure issues. Source: ArterisIP
A more subtle example is planning of asynchronous clock domain crossings. “These are expensive, but unavoidable, as we have more heterogeneous IPs on the same die. Also, balancing large clock domains becomes exceedingly difficult due to on-die variability,” he said.
Physical effects and variation
Mark Olen, product marketing manager at Mentor, a Siemens Business, has observed similar problems, particularly with low-power designs. “People are designing with more and more clock domains, and that causes all kinds of challenges. At 28nm and below, glitches get introduced during the logic synthesis process that are unavoidable, and it has to do with the geometry. Interestingly, at 28nm and above, you can get away with doing all of the CDC analysis in RTL and then everything is correct by construction. But when we’re working at 28nm or below, clock domain glitches are occurring at reconvergent fan-out. This is driving the move toward gate-level/sign-off CDC.”
There are myriad complicating factors involving timing. For example, when designing the topology of NoC switches, one way to improve performance is to have fewer hops from origin to destination, which also can lower contention. “But without knowing how these switches will be positioned in the layout, the efficient design of a logical topology is not really possible,” Winefeld said.
Design teams have been using a variety of techniques and tools to help with these types of issues. Advances in gate delay modeling, for example, have helped reduce the excessive margining inherent in the on-chip variation (OCV) methods that have been used in the past. But as one problem is solved, another pops up.
“No significant advances have been made to model the effects of net delay variation, voltage variation, and delay changes due to aging effects like bias-temperature instability (BTI),” said Arm’s Dodrill. “The engineering community will need to advocate for better solutions and help EDA and IP providers prioritize which uncertainties to address first.”
Shekhar Kapoor, director of product marketing for Synopsys’ Design Group, agrees: “With progression to 7/5nm, the impact of process variation has increased even more, exacerbated by near-threshold operations. Variation in delay is becoming larger than nominal delay at lower voltages, and non-linear effects are becoming worse. This is necessitating the use of advanced margining approaches, which are no longer optional, such as advanced parametric on-chip variation (POCV) margining solutions, as well as extended timing/noise variation models. In addition, interconnect RC parasitic variation is another challenge that has emerged, requiring fresh review and tightening of corners to reduce pessimism.”
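The appeal of statistical margining approaches such as POCV can be shown with a toy calculation. This is a generic sketch of the underlying idea, not Synopsys’ actual algorithm, and every delay and sigma value below is an assumption chosen only to show the contrast with a flat derate.

```python
# Rough sketch of why statistical margining is less pessimistic than a flat derate:
# independent per-stage variation adds in quadrature along a path.
import math

stage_delays_ps = [12.0, 15.0, 9.0, 20.0, 11.0]   # nominal per-gate delays (assumed)
stage_sigmas_ps = [3.0, 4.0, 2.5, 6.0, 3.0]        # per-gate variation, large at low Vdd (assumed)

nominal = sum(stage_delays_ps)

# Flat OCV-style derate: assume every stage is simultaneously 3-sigma slow.
flat_worst = sum(d + 3 * s for d, s in zip(stage_delays_ps, stage_sigmas_ps))

# POCV-style statistical view: independent variation adds in quadrature.
path_sigma = math.sqrt(sum(s * s for s in stage_sigmas_ps))
statistical_worst = nominal + 3 * path_sigma

print(nominal, flat_worst, round(statistical_worst, 1))  # 67.0, 122.5, ~93.2 ps
```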
The challenge is not new, but it is much more acute. There is increased complexity, including more IP blocks, overall increases in design sizes, and tighter budgets.
“These challenges, combined with more corners and more timing and power scenarios, mean longer design cycles,” Kapoor said. “There is a continued escalation of the need for high performance, capacity and more resource-efficient options for analysis and timing closure. Physical awareness is yet another critical aspect for timing closure at 7nm/below. Physical rules are much more complicated, and physical context or proximity effects can have a non-trivial impact on timing. It requires sign-off-quality, full-chip-capacity physical ECO solutions with full-feature physical design rules support to accelerate design closure.”
So while the leading-edge process technologies have made it possible to build processors with greater parallelism running at frequencies beyond several gigahertz, the rate of change in current which these processors can produce is increasing, said Arm’s Dodrill.
“A processor can go from a low-power state to a high-power state, or vice versa, within a few nanoseconds,” Dodrill said. “This rapid change in current, di/dt, results in large voltages across the parasitic inductances in the package and can induce ringing on the power supplies. Quantifying, margining for, and suppressing the voltage droop caused by rapid current changes is the next frontier for successful timing closure. Because of the complexity of modeling timing degradation due to transistor aging, there is no EDA solution on the horizon, so the industry will have to coalesce behind some common approaches to solving this problem.”
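Dodrill’s di/dt concern follows directly from the first-order relationship V = L * di/dt across the package inductance. The sketch below plugs in assumed, order-of-magnitude values to show how a current swing over a few nanoseconds turns into a meaningful supply droop; none of the numbers come from the article.

```python
# Illustrative first-order droop across package/PDN parasitic inductance.
package_inductance_h = 50e-12      # 50 pH effective inductance (assumed)
current_step_a = 20.0              # core swings 20 A between power states (assumed)
transition_time_s = 5e-9           # ...within about 5 ns (assumed)

di_dt = current_step_a / transition_time_s   # 4e9 A/s
droop_v = package_inductance_h * di_dt       # V = L * di/dt

print(f"{droop_v * 1000:.0f} mV of first-order supply droop")   # ~200 mV
```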
Where it fits in the flow
Put in perspective, timing closure is the last hurdle in any chip design.
“You do functionality, you do verification, you do validation, and so on,” said Rajesh Ramanujam, product marketing at NetSpeed Systems. “Finally, you have to make sure the chip you’re building is able to meet the timing requirements. You want to make sure that all of the wires are running as fast as they have to, that all of the gates are running as fast as they have to, and that you can actually implement a viable product. It takes quite a few months to close these things, and it affects the time to market. As such, physical timing closure has become a very big deal, especially in the lower process technologies. Particularly with 7 and 5nm, the wires are getting relatively slower than the gates. The gates are improving, but the wires are actually pulling us down. Designs have become much more sensitive to the number of wires.”
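Ramanujam’s point that the wires are “pulling us down” comes from the roughly quadratic growth of unbuffered wire delay with length, while gate delay keeps shrinking. The sketch below uses a simple Elmore-style approximation with assumed per-millimeter resistance and capacitance; the values are representative orders of magnitude, not foundry data.

```python
# Elmore-style illustration of wire delay growing quadratically with length.
def unbuffered_wire_delay_ns(length_mm, r_ohm_per_mm, c_ff_per_mm):
    """Distributed-RC (Elmore) delay of an unbuffered wire: 0.5 * R_total * C_total."""
    r_total = r_ohm_per_mm * length_mm            # ohms
    c_total = c_ff_per_mm * length_mm * 1e-15     # farads
    return 0.5 * r_total * c_total * 1e9          # seconds -> ns

gate_delay_ns = 0.010                              # ~10 ps gate delay (assumed)
for length_mm in (0.1, 0.5, 1.0):
    wire_ns = unbuffered_wire_delay_ns(length_mm, r_ohm_per_mm=800.0, c_ff_per_mm=200.0)
    print(f"{length_mm} mm wire: {wire_ns:.4f} ns vs {gate_delay_ns} ns gate")
# The quadratic growth is why long routes dominate closure and need buffering or pipelining.
```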
To this point, wire reduction is a significant task, and many engineering teams are utilizing on-chip interconnects to help with this, because an interconnect takes a big interface and channels it down, reusing wires as opposed to using dedicated wires, he explained. “That’s one way of reducing the wires at a very high level. But it’s not all physical. There is also an aspect of how these IPs are built. Even with those at 5nm and 7nm, there are physical implications. What really matters is how these IPs are built right at the conceptual stage, and physical designers need guidance. They need the guidance from the IP providers because the IP vendors understand the base technology better than the physical designers. Physical designers understand transistors, they understand how to implement, but they don’t really understand the functionality of those IPs and what the repercussions could be. So they need guidance from the IP providers,” Ramanujam said.
Still, the biggest problems are occurring where design teams are best equipped to deal with them. “Strangely enough, at the most advanced nodes, this seems to be getting easier,” said Mike Gianfagna, vice president of marketing at eSilicon. “The number of foundries there is decreasing and the depth of technology knowledge is increasing. So the confidence level in SPICE models and timing closure are better, although they still aren’t perfect by any means. What we’re seeing at those nodes is that power is becoming a bigger challenge than timing. Power closure is the number one challenge for us.”
Timing closure and power closure are related, though. “If steps of the flow are very different in the way they treat waveform effects or variability, or in the way they model different physical effects, and if they are not consistent with each other, there will not be a good PPA result in the end because the flow is going to iterate all the time and will never converge,” said Cadence’s Keller. “If we don’t see consistent PPA metrics from the early stage of the design to the later stage of the design flow, that will never lead to good PPA in the end. Therefore, the number one challenge is to create a flow, starting from synthesis and ending at timing, which has a very consistent analysis engine. In earlier generations of solutions, the place-and-route timer was different from sign-off, and the numbers were not matching well. Underlying this, the algorithm for static timing analysis must be robust, such that a change in any input data, whether it is parasitics, library, or user input, will not cause a dramatic change in the output.”
Physical designers are now involved in the design from earlier stages because they may not know what to expect from nodes like 7nm and 5nm, and that means more work up front in the design cycle. That includes providing hooks at the IP level.
“One way this is done is by modularizing the interconnect by parsing the problem,” said NetSpeed’s Ramanujam. “If we modularize and structure it down to the point that the physical designers can focus on one small aspect, they can pretty much solve the entire interconnect as opposed to building a giant interconnect, which is not modularized. If it is flattened, the physical designer has to solve a much bigger problem and the iterative loop is much worse. No matter what the process node, there are challenges that are common across the board. It doesn’t matter if it is 7 or 5nm, or 10 or 14nm. If you solve those things up front, there is much less need to solve them at a later stage. Otherwise these common challenges become really huge.”
Related Stories
Timing Closure Issues Resurface
Adding more features and more power states is making it harder to design chips at 10nm and 7nm.
The Problem With Clocks
Clocks are power- and area-hungry, and difficult to distribute in a controlled manner. What is being done to rein in these unwieldy beasts?
Tech Talk: Timing Closure
Why timing closure is suddenly a problem again and what to do about it.