Concerns about electrostatic discharge and electromigration at advanced nodes are prompting some unusual steps; the challenge is reconciling full-chip analysis with divide-and-conquer design approaches.
The move to the next stop on the Moore’s Law road map isn’t getting any less expensive or easier, but it is becoming more predictable. Tools and programs are being expanded to address physical effects such as electrostatic discharge (ESD), electromigration and thermal effects from increased current density.
Any or all of these three checklist items can affect the reliability of a chip. And while they were scarcely even a consideration at 40nm, they are becoming first-order concerns with 16/14nm finFETs using a 20nm back-end-of-line process. That doesn’t slow down the progression of feature shrinks, but unlike past nodes where one or two tools or different process techniques could solve the problem, the most advanced node now requires collaborative efforts spanning an entire ecosystem.
Consider the recent expansion of TSMC’s IP quality program. First instituted a decade ago as a way of ensuring that IP could be manufactured, the program last month added ESD checks as IP is integrated into designs.
“This is on top of manufacturing rules that are already in place,” said Dan Kochpatcharin, deputy director of IP alliances at TSMC. “We don’t know ahead of time how the IP is being connected. This is an extra check on how the customer has integrated it. We’re now seeing so many different IPs from different providers that we have to make sure that integration is not an issue.”
Waiting until tape-out to address ESD is far too late, of course, and tools to control chip-killing shocks have been available for years from multiple EDA vendors. But as the amount of IP used in designs continues to climb, both to reduce complexity and to speed time to market, how that IP is integrated is raising serious concerns at the most advanced process nodes. TSMC struck a deal last week to use Mentor Graphics’ tools to help certify IP for ESD once it’s integrated into SoCs.
“What’s new is that we’re now checking for circuit relationships,” said Carey Robertson, Mentor’s director of product marketing for LVS and extraction. “If you use IP differently, the IP version being used has to be validated against TSMC’s requirements to maintain electrical intent. The IP vendors don’t know if you’re going to put their IP in an environment with high current or overstress conditions, so the checks done at the IP level need to be revalidated at the IC level.”
Mentor isn’t alone in seeing these problems. Throughout the supply chain, the quality of IP is being measured increasingly in context at advanced nodes. “The question is how you bring IP into the context of the SoC,” said Mark Baker, technical marketing director at Atrenta. “This now goes way beyond specific rule checks.”
Nor is this just confined to IP blocks that go into a chip. It also applies to verification IP. At new process nodes, one of the trouble spots is the interface, particularly with multicore configurations that require cache coherency. “We’re seeing coherency bugs because the processors are giving a response that’s too lax,” said Scott Meeth, senior engineer for methodology and IP development at Jasper. “That allows data corruption when processes give other processes permission to change things. The challenge is to detect when things go wrong at the interface.”
The challenge is to somehow bridge the gap between the divide-and-conquer approach used to develop the pieces of a chip and issues that span the full SoC, and even the full device.
“You can do place and route on part of the chip, but with power signoff you have to do the whole chip together,” said Anirudh Devgan, corporate vice president and chief technology advisor for silicon signoff and verification for Cadence’s Digital and Signoff group. “There are a lot more power domains and an increasingly complex power grid, and you no longer can overdesign for power because you can’t afford the margin.”
But reducing margin isn’t so simple. An unhappy reality at 16/14nm is that complexity far exceeds the ability of the human brain to even comprehend all the possible interactions. What is clear, though, is that some of these interactions are fatal to designs.
“We’ve got a lot of additional complications with finFETs,” said Rob Aitken, an ARM fellow. “Local current density is important because it can produce electromigration, and then you have rules for ESD. As people have become more interested in multicore, that has led to development of cores by power envelope. So one app may use 500 milliwatts, another 700 milliwatts and another 1,300 milliwatts. You need to understand the power allocation for multiple cores and cache, which may be different on the low power setting than on the higher one. The IP vendor has to target standard applications, configurations and processors, and average current is still key.”
However, foundries also are beginning to demand that chips be tested for peak current, as well. That’s a much more difficult number to arrive at in complex SoCs, Aitken said. “You don’t know the capabilities of the power delivery system. And with post-28nm nodes, this isn’t just automatic. FinFETs have added performance, but they also have added complexity.”
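The budgeting arithmetic behind those per-core power envelopes can be sketched in a few lines. Everything here is an illustrative assumption rather than foundry data: the supply voltage, the per-core figures (echoing the 500/700/1,300 milliwatt envelopes above) and the transient factor used to bound peak draw.

```python
# Illustrative sketch: average vs. peak supply current from per-core
# power envelopes. All values are assumptions for the example.

VDD = 0.8  # assumed core supply voltage in volts

# Per-core power envelopes in watts (hypothetical cores)
core_power = {"core_a": 0.5, "core_b": 0.7, "core_c": 1.3}

def average_current(power_by_core, vdd):
    """Average current if every core runs at its envelope power."""
    return sum(power_by_core.values()) / vdd

def peak_current(power_by_core, vdd, transient_factor=1.5):
    """Crude peak estimate: simultaneous switching can briefly exceed
    the sum of average envelopes; transient_factor is an assumption."""
    return transient_factor * sum(power_by_core.values()) / vdd

print(f"average: {average_current(core_power, VDD):.2f} A")
print(f"peak:    {peak_current(core_power, VDD):.2f} A")
```

The point of the second number is the one Aitken raises: foundries increasingly want a bound on worst-case simultaneous draw, not just the sum of average envelopes.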
All of this complexity has to be captured and shared, too. And while the large IP vendors have been extremely fastidious about characterizing what they sell for as many conditions as they can think of, not all IP is used as planned—and not all of it comes from third parties. But crossing boundaries and sharing information does have an upside.
“When you package a part, each part goes through an ESD check,” said Mary Ann White, product marketing director for the Galaxy Implementation Platform at Synopsys. “We also have to check for EM. So there is a reliability aspect that needs to be considered when moving to the next node. But what we’re hearing from a lot of IP vendors is that although some pieces age faster at the most advanced nodes, once you get that stabilized it stays stable for a longer time.”
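The electromigration lifetime tradeoff White mentions is commonly reasoned about with Black’s equation, which ties median time to failure to current density and temperature. A minimal sketch, with placeholder constants rather than any foundry’s qualified values:

```python
# Black's equation: MTTF = A * J^(-n) * exp(Ea / (k*T)).
# The constants a, n and ea are technology-dependent; the values
# below are placeholders for illustration, not qualified data.
import math

K_BOLTZMANN = 8.617e-5  # Boltzmann constant in eV/K

def black_mttf(j, temp_k, a=1.0, n=2.0, ea=0.9):
    """Median time to failure as a function of current density j
    (relative units) and absolute temperature temp_k."""
    return a * j ** (-n) * math.exp(ea / (K_BOLTZMANN * temp_k))

# With the common exponent n = 2, doubling current density at fixed
# temperature cuts lifetime by roughly 4x.
base = black_mttf(j=1.0, temp_k=378.0)
stressed = black_mttf(j=2.0, temp_k=378.0)
print(f"lifetime ratio: {base / stressed:.1f}")
```

This is why rising local current density at 16/14nm turns electromigration from an afterthought into a first-order signoff check.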
Another area that needs to be considered across multiple facets of the design process is the power grid. At 40nm, and even as late as 28nm, the grid could be dealt with later in the design flow, but at 16/14nm there is no wiggle room. It has to be created as part of the architecture, and bottlenecks need to be identified early.
“With current density increasing at 16/14nm, these are not just static problems anymore,” said Arvind Shanmugavel, director of application engineering at Apache Design. “Now you’ve got high speed clocks, voltage islands and package inductance. You can have instantaneous voltage drops and timing failures.”
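Shanmugavel’s point that the problem is no longer static comes down to one formula: supply droop is the resistive IR drop plus an inductive L·di/dt term from the package. A back-of-the-envelope sketch, with all component values assumed for illustration:

```python
# Dynamic supply droop sketch: resistive IR drop plus the L*di/dt
# contribution of package inductance. Component values are assumptions.

def supply_droop(i_avg, di, dt, r_grid, l_pkg):
    """Instantaneous droop in volts: I*R plus L*(di/dt)."""
    return i_avg * r_grid + l_pkg * (di / dt)

# Hypothetical numbers: 2 A average draw, a 1 A current step in 1 ns,
# 10 mOhm grid resistance, 50 pH package inductance.
droop = supply_droop(i_avg=2.0, di=1.0, dt=1e-9, r_grid=0.010, l_pkg=50e-12)
print(f"droop: {droop * 1000:.1f} mV")  # 20 mV static + 50 mV dynamic
```

With these (assumed) numbers the inductive term dominates the static IR drop, which is why a fast clock edge can cause a timing failure that a static analysis would never flag.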
Coupled with this is the need for a detailed map of thermal density. While peak current is difficult to map, detailed thermal density is almost required—and therein lies a growing issue.
“We’re beginning to see localized heating effects,” said Shanmugavel. “The only way to deal with this is with a proper model of the die—a chip-thermal model. With low-power electronics, heat is very important. A smart phone is less than a half-inch thick and you’re trying to fit in DDR, a microprocessor, and all the supporting components. If you don’t model this properly you can have thermal runaway issues. Most smart phones are passively cooled, so there is no heat sink or fan.”
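The thermal-runaway feedback loop Shanmugavel warns about is simple to state: leakage power rises with die temperature, which raises temperature further, and whether the loop settles depends on the thermal resistance of the package. A minimal fixed-point sketch; every parameter below is an illustrative assumption, not a calibrated die model:

```python
# Thermal runaway sketch: iterate T = T_amb + theta * (P_dyn + P_leak(T)),
# where leakage doubles every t_double degrees C (assumed behavior).

def settle_temperature(p_dynamic, theta_ja, t_ambient=25.0,
                       leak_ref=0.1, t_ref=25.0, t_double=20.0,
                       max_iters=200):
    """Return the settled die temperature in C, or None on runaway.
    theta_ja is junction-to-ambient thermal resistance in C/W."""
    t = t_ambient
    for _ in range(max_iters):
        p_leak = leak_ref * 2 ** ((t - t_ref) / t_double)
        t_new = t_ambient + theta_ja * (p_dynamic + p_leak)
        if abs(t_new - t) < 1e-6:
            return t_new       # converged to a stable operating point
        if t_new > 300.0:
            return None        # clearly diverging: thermal runaway
        t = t_new
    return t

print(settle_temperature(p_dynamic=1.0, theta_ja=30.0))  # settles
print(settle_temperature(p_dynamic=1.0, theta_ja=80.0))  # runs away
```

In a passively cooled phone the effective thermal resistance is high, which is exactly why, without a proper chip-thermal model, the runaway case can go undetected until silicon.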
This is going to become even more of an issue with devices such as Google Glass or smart watches. While heat may be bearable in a handheld device, it’s going to be much more noticeable next to your temple or strapped to your wrist.
Everything is connected
In some way or form, all discussions ultimately lead back to two issues. One is partitioning. The other is reliability and coverage.
On the partitioning side, the shift under way is toward partitioning on a much bigger scale. In some cases, that may even change the basic architecture of the chip and the board, as in the case of fan-outs and ultimately stacked die.
“We’re seeing a lot more decisions being made around how a design is partitioned for power, timing and area,” said Atrenta’s Baker. “We saw the first entry point for this kind of discussion at 28nm, but it’s becoming a much bigger discussion at 16/14nm. We’re also seeing a lot more interest in 2.5D, particularly with a direct interconnect rather than an interposer.”
But how exactly do chipmakers know that, given all these issues, relatively new technologies such as finFETs and even 2.5D will work? The answer is as complex as the problem.
“One thing that definitely helps is to insert coverage points in the design because figuring out the best way to exercise a single coverage point is not easy,” said Shawn McCloud, vice president of marketing at Calypto. “The big question is how do you know when a design is done and what’s your level of confidence. If it’s not a critical function or there’s a workaround in software, that’s relatively straightforward. But if you’re dealing with a modem, for example, you have to guarantee there are no errors because there is no workaround.”
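The coverage-point idea McCloud describes can be illustrated with a toy tracker (not any vendor’s tool; all names and scenarios here are hypothetical): instrument the conditions of interest, run stimulus against them, and report the points that were never exercised.

```python
# Toy coverage tracker: sample watched conditions during stimulus and
# report unexercised points. Names and stimulus are hypothetical.

class CoveragePoint:
    def __init__(self, name):
        self.name = name
        self.hits = 0

    def sample(self, condition):
        """Record a hit whenever the watched condition holds."""
        if condition:
            self.hits += 1

points = {name: CoveragePoint(name)
          for name in ("fifo_full", "retry_after_nak", "clk_domain_cross")}

# Pretend stimulus; in a real flow these samples come from simulation.
for fifo_depth in [3, 7, 8, 8, 2]:
    points["fifo_full"].sample(fifo_depth == 8)
    points["retry_after_nak"].sample(False)  # scenario never fires

uncovered = [p.name for p in points.values() if p.hits == 0]
print("uncovered:", uncovered)
```

The uncovered list is the answer to McCloud’s “how do you know when a design is done” question: holes that remain after all stimulus has run are exactly where confidence is lowest, which matters most for blocks like a modem where no software workaround exists.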