Adding more features and more power states is making it harder to design chips at 10nm and 7nm.
Timing closure has resurfaced as a major challenge at 10nm and 7nm due to more features and power modes, increased process variation and other manufacturing-related issues.
While timing-related problems are roughly correlated with rising complexity in semiconductors, they tend to generate problems in waves—about once per decade. In SoCs, timing closure problems have spawned entire methodologies, and they have created demand for a number of new tools and capabilities across the design flow. Successive tool enhancements and rollouts have kept these issues under control since 130nm, when copper interconnects were first introduced.
But new challenges are surfacing that stretch beyond the capabilities of current tools and methodologies. Among them:
• The continuation of Moore’s Law has opened the door to many more features, all of which need to share resources such as memory and I/O. Many of those are connected together using longer, thinner wires.
• Power and thermal issues are forcing design teams to look at multiple options, including multi-die packaging, where chips, packaging and boards can affect overall throughput and power, which in turn has an impact on timing.
• Multiple patterning, process variability and other manufacturing-related issues are now impacting timing. That is forcing design teams to deal with these issues much earlier in the flow, or risk yield and reliability problems later on.
More area, more features
Shrinking features creates more room on the same die, and a typical smartphone is a good example of just how many can be crammed onto a piece of silicon today. Some of those features, such as a global positioning system, were sold as separate devices until several process nodes ago. Moving them onto a PCB was a first step, but moving them onto an SoC dramatically reduced the cost and improved their overall performance.
That may have seemed like a simple enough solution when there were a few key features. However, this approach has been repeated so many times, transforming devices or chips into IP blocks or subsystems, that it now requires sophisticated floor planning and routing to connect all of them to memory and to each other. Wires are longer, and the cross-sections of those wires are smaller. That results in increased RC delay, more power required to drive signals, more heat, and a number of other physical effects that can disrupt electrical signals.
“If you exceed certain distances between two IPs, then you have to add repeaters, or pipeline those signals, to keep operating at the same speed,” said Charlie Janac, chairman and CEO of Arteris. “Otherwise the drive strength is not enough from the IP. This was okay at 90/65/45nm, but at 28nm distances got longer. On top of that, voltage dropped to 0.5 volts. Relative resistance has increased.”
He said that with finFETs there may be as many as 6,000 pipelines and 60 timing parameters. “You have to automate this. We had one customer that refused to do a product review, and they couldn’t close timing. The damage was huge. They lost $200 million.”
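The tradeoff Janac describes—drive strength versus wire length—can be sketched with a first-order Elmore delay model. Splitting a long wire with repeaters trades the quadratic delay of an unbuffered wire for several short segments plus the repeaters' own delay. All constants below are illustrative placeholders, not process data:

```python
import math

def wire_delay_ns(length_mm, n_repeaters, r=1.0, c=0.2, t_buf=0.05):
    """First-order Elmore delay for a wire split by repeaters.

    r     -- wire resistance per mm (kOhm/mm, made up for illustration)
    c     -- wire capacitance per mm (pF/mm, made up for illustration)
    t_buf -- intrinsic delay of one repeater (ns, made up for illustration)

    Splitting a wire of length L into n+1 equal segments replaces the
    quadratic unbuffered delay r*c*L^2/2 with (n+1) short-segment delays
    plus n repeater delays.
    """
    segments = n_repeaters + 1
    seg_len = length_mm / segments
    return segments * (r * c * seg_len**2 / 2) + n_repeaters * t_buf

def optimal_repeaters(length_mm, r=1.0, c=0.2, t_buf=0.05):
    """Closed-form optimum: n+1 = L * sqrt(r*c / (2*t_buf))."""
    return max(0, round(length_mm * math.sqrt(r * c / (2 * t_buf))) - 1)
```

With these toy numbers, a 5mm wire wants several repeaters, while a 0.5mm wire needs none—which is why distances that were fine at 90/65/45nm began demanding pipelining and repeater insertion as wires lengthened and resistance climbed.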
Integrating more features raises other problems, as well.
“There is only so much logic you can cram between flops,” said Bill Neifert, director of models technology at ARM. “You need to design and configure changes to meet timing, but at each new node you crank the voltage down, speed everything up, and add more functionality. It’s not just nodes, though. It’s also each successive generation of a design. Even at the same node people are spending a lot of time architecting for throughput.”
How much of this can be done in advance varies greatly from one design to the next, from one company to the next, and even from one design team to the next within the same company. But there are several common challenges that need to be addressed.
“Timing closure is primarily a process of solving two challenges—frequency requirements and floor-planning needs,” said Rajesh Ramanujam, product marketing manager at NetSpeed Systems. “The frequency challenge by itself can be divided broadly into fast clocks and the interaction between various clock domains, while the floor-planning needs arise from silicon resource requirements and the physical design iteration cycle. From the floor-planning perspective, this would require front-end architecture exploration and design tools to import data from the early floor plan, being aware of the challenges, and hence account for it as part of the architecture automation process. This, combined with the synthesis and physical automation, will provide an overall system-automated solution. The other aspect of floor-planning is wiring congestion.”
Ramanujam said the first reaction for dealing with this complexity is to divide everything into more manageable pieces. While this worked well enough at older nodes, the number of blocks has exploded in SoCs. Chipmakers report as many as 100 separate IP blocks and subsystems, often running at different clock frequencies and often developed using different methodologies.
“One team might be providing the coherent subsystem using timing methodology X, while the memory subsystem used methodology Y,” he said. “Floor planning is not an easy job because almost every component wants to talk to every other component. The divide-and-conquer approach doesn’t solve the problem. It puts all the burden on the interconnect that needs to connect these subsystems together—without compromising on performance. Coherency adds a few other functional checkpoints in the data path, and instead of solving it separately, the problem must be looked at as a whole, requiring physically aware coherent IPs. It is not so much that the actual problem has gotten worse, but more importantly that the solution needs to be scalable.”
Power plays an increasingly important role here. Throughput, signal integrity, and various modes of operation are all intricately woven together, with power as the common thread. All of them can affect timing closure.
“Timing closure is a huge challenge, but power closure is up there alongside it,” said Mike Gianfagna, vice president of marketing at eSilicon. “Budgets are tighter and at more advanced designs, you have finFETs with shorter channel lengths. The more fundamental issue, though, is that you have lots of IP and more and more IP from more and more vendors. Some of those vendors are small, some are large, and some are cutting-edge. So now you need to figure out the right operating points, the right IP, and that affects timing and power.”
Not all of these IP blocks are characterized the same way, either, which can affect the different modes of operation.
“During implementation, it’s normal to work only with a subset of dominant functional modes,” said Bernadette Mortell, senior product marketing manager for the PrimeTime Suite at Synopsys. “But during ECO for timing closure and timing signoff, designers want to validate all the operating modes against the expected process, voltage and temperature corners. In static timing analysis (STA) speak, you need to verify ‘all the modes’ against ‘all the process corners’ to sign off your design for timing. Writing timing constraints to represent all the intended operating modes is outside the scope of timing analysis, but where implementation and STA tools can contribute is by compressing the available operating modes down into the smallest set of unique modes that still reflect the intended functionality. This includes, but is not limited to, always on, or sleep or standby states. The constraint-merging process can be effectively automated by a tool that understands how STA interprets timing constraints and thereby which constraints can be compressed without loss of fidelity. When the tool doing the mode compression is also the core timing engine for the timing closure flow, it can provide confidence that using the compressed set of timing modes is safe for signoff.”
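The mode compression Mortell describes can be illustrated with a toy model: reduce each operating mode to its set of timing constraints, and collapse modes whose constraint sets are identical into a single analysis run. The mode names and constraint strings below are hypothetical, not from any real flow:

```python
def compress_modes(modes):
    """Collapse operating modes with identical timing constraints.

    modes -- dict mapping a mode name to an iterable of constraint strings.
    Returns a dict mapping each unique constraint set (as a frozenset)
    to the list of mode names it covers.
    """
    merged = {}
    for name, constraints in modes.items():
        key = frozenset(constraints)
        merged.setdefault(key, []).append(name)
    return merged

# Hypothetical modes: sleep and standby happen to share constraints,
# so STA only needs three unique runs instead of four.
modes = {
    "functional": {"clk_core 1.0ns", "clk_io 2.5ns"},
    "scan_test":  {"clk_core 4.0ns", "clk_io 4.0ns"},
    "sleep":      {"clk_core disabled", "clk_io 2.5ns"},
    "standby":    {"clk_core disabled", "clk_io 2.5ns"},
}
unique = compress_modes(modes)
```

A real tool must understand how STA interprets constraints—exceptions, clock relationships, case analysis—before declaring two modes equivalent; simple set equality is only the idea in miniature.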
There is no shortage of approaches to dealing with these problems, and plenty of opinions about the best way to do that. To some extent, these differences depend on the starting point, which is often determined by the most important considerations in a design.
“Generally speaking, the most accurate way to analyze a design is flat, meaning no hierarchy,” said Ruben Molina, product management director for timing signoff at Cadence. “For static timing analysis of a flat design, it makes little difference if a portion of the design is turned on or not. Capacity is the main challenge. Capacity challenges are typically addressed with hierarchical design flows where blocks in the design are optimized based on boundary conditions and changes are made to the internals of the block. This allows multiple blocks to be timing closed in parallel. This can be scripted by reading in the block-level design and the top-level constraints to do the optimization.”
Still, getting this done using just one tool becomes more difficult as geometries shrink.
“The issues with capacity affect the implementation phase because a full physical database requires more memory than a timing database,” Molina explained. “Design sizes handled by today’s place-and-route tools are well under the expected 100M to 1B cell designs that are projected to be designed at the 7nm process node. This problem is exacerbated by the number of timing modes and corners that need to be analyzed.”
One approach to this problem is to separate out some of the features that are now integrated into a single SoC and connect them with a high-speed interconnect. This so-called “advanced packaging” approach is gaining favor from all of the major chipmakers, foundries, and OSATs, but it still isn’t seamless. And it raises a number of issues in terms of timing closure that have not yet been fully documented and automated because many of these interconnect technologies are new.
“With heterogeneous integration, you need a multi-physics analysis loop,” said Andrew Kahng, professor of computer science and electrical engineering at UC San Diego. “That includes timing, IR drop, workload, and thermal. Architectural rebalancing is possible with heterogeneous integration.”
Back end vs. front end
Not all of this is in the hands of design teams and architects, though. Lithography may seem a long way off from the chip design phase, but the need for multiple patterning at the most advanced nodes dictates the shapes and layout of designs in the design phase. If it can’t be printed, then it doesn’t matter whether the timing works in a model.
“Pre-SADP (self-aligned double patterning), you would fabricate each layer differently,” said Andy Inness, principal architect for product deployment at Mentor Graphics. “Now, when you fabricate layer one, half the wires are in one step and half are in another. You have ‘even’ tracks, and then you fit the ‘odd’ tracks in between them with a new layer of deposition. The timing characteristics are different, and that can impact the different layers.”
Increased process variability can impact timing closure, as well. Variability has to be accounted for by tools, but the tolerances for variability decrease at each new process node while the variability goes up. Typically, that has been handled with extra circuitry, or margin, but that approach doesn’t work at advanced nodes because it can impact power and performance, which are two of the main reasons why companies migrate from one node to the next.
This has led to an increase in restrictive design rules, which have been creeping into advanced semiconductors for some time. The reason they have become less of a focus over the past decade is that companies operating at the leading-edge nodes have generated large enough chip volumes that foundries were willing to work with them on more customized layouts. As the market for mobile phones flattens, and as the mainstream pushes forward all the way to 28nm, or 22nm FD-SOI, the rules become much more rigid for more chipmakers.
“In the past, you could do place and route and it would be clean and it wouldn’t affect timing closure,” said Sudhakar Jilla, group marketing director for Mentor’s IC Implementation Division. “At 10nm and 7nm, we’re moving from double patterning to SADP (self-aligned double patterning). Now you have to deal with cycle violations. SADP adds much more stringent rules. But with two different layers, the RC can be different, depending on the context. That means routers need to do proper layer assignment.”
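The context-dependent RC that Jilla mentions can be shown with a toy layer-assignment check: the same net routed on an 'even' versus 'odd' SADP track sees different per-unit resistance and capacitance, so a layer-aware router picks a track that still meets the delay target. All the RC numbers here are invented for illustration; in a real flow they come from extraction decks:

```python
# Illustrative per-mm RC for the two SADP track populations (made up).
TRACKS = {
    "even": {"r": 1.0, "c": 0.20},   # kOhm/mm, pF/mm
    "odd":  {"r": 1.3, "c": 0.24},   # different deposition step, worse RC
}

def net_delay_ns(length_mm, track):
    """Lumped first-order delay estimate for a net on a given track."""
    rc = TRACKS[track]
    return rc["r"] * rc["c"] * length_mm**2 / 2

def assign_track(length_mm, target_ns):
    """Return a track that meets the delay target, or None if neither does.

    Tries the slower 'odd' track first, reserving the faster 'even'
    tracks for nets that actually need them.
    """
    for track in ("odd", "even"):
        if net_delay_ns(length_mm, track) <= target_ns:
            return track
    return None
```

The point is the asymmetry itself: without layer awareness, a router that treats both track populations as interchangeable will close timing in its own model and miss it in extraction.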
That gives engineering teams far less flexibility in designing those chips, however.
“You have to build a floorplan that is useful,” said Arteris’ Janac. “You need to avoid funky geometries and strange shapes and leave enough channel space. A standard SoC will not fix everyone’s problems. But for the average SoC, it also takes three months to add in pipelines manually, and if it doesn’t work, you need to do that again. That requires interconnect changes, and it leads to overdesign on the interconnect, which in turn causes an increase in latency, power and area. And when you get done with all of that, you have huge blocks, which pushes timing closure.”
New markets, new opportunities
As more devices are connected together, timing closure takes on new meaning, as well. What used to be just a normal step in the design flow is much more than that in a safety-critical system such as a car, or in a connected industrial system. A car that is about to collide with some other object or car has to react in real time, and timing needs to work flawlessly to make that happen.
The tradeoffs are similar to those in complex SoCs at advanced nodes. In fact, some of the same kinds of SoCs that are likely to be used in cars will be used for other machine learning and artificial intelligence applications, and timing in these systems can be very complex.
“There is a large impact of increased functional modes on the timing closure signoff phase of design,” said Synopsys’ Mortell. “It can increase the number of timing scenarios (modes x corners) that need to be analyzed and then closed to meet timing and declare the design ready for tapeout. This can significantly increase the cost of signoff, in terms of turnaround time on the analysis and ECO timing closure runs, and the machine resources required to complete the runs in a timely manner. Design managers must decide to spend the time and resources to analyze all the possible combinations of functional and test modes and process corners to be comprehensive in their timing signoff validation, or choose to run only a subset of the modes and corners.”
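The scenario explosion Mortell describes is simple multiplication: every functional or test mode must be checked at every process-voltage-temperature corner. A back-of-the-envelope estimate, with hypothetical mode and corner counts and hypothetical runtime and machine numbers:

```python
def signoff_scenarios(n_modes, n_corners, hours_per_run=2.0, parallel=8):
    """Estimate STA signoff effort as modes x corners runs.

    hours_per_run and parallel (concurrent machines or licenses) are
    hypothetical knobs, not measurements from any tool.
    """
    runs = n_modes * n_corners
    wall_clock_hours = runs * hours_per_run / parallel
    return runs, wall_clock_hours

# Hypothetical design: 12 modes x 15 corners = 180 runs. At 2 hours per
# run on 8 machines, that is 45 hours of wall-clock time per ECO loop,
# which is why teams are tempted to sign off on a subset.
runs, hours = signoff_scenarios(12, 15)
```

Either lever—compressing modes or pruning corners—cuts the product directly, which is the economic argument behind the mode-compression approach described earlier.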
Timing closure is a growing problem at leading-edge nodes, and it will remain troublesome regardless of whether chipmakers continue to shrink features or turn to advanced packaging approaches. There are more features to deal with, more problems in manufacturing, and more interactions that are driven by and affected by power.
The result will likely be a combination of new tools, enhanced tools, and new methodologies that span from the front end of design all the way through to manufacturing. Timing closure is a well-understood problem, but every decade or so it becomes so complex that it exceeds the systems and approaches put into place to deal with it. At 10nm and 7nm, it won’t be the only challenge facing design teams, but it certainly is one that will demand increasing attention over the next few years.