Variation Issues Grow Wider And Deeper

New sources, safety-critical applications and tighter tolerances raise new questions both inside and outside the fab.


Variation is becoming more problematic as chips become increasingly heterogeneous and as they are used in new applications and different locations, sparking concerns about how to solve these issues and what the full impact will be.

In the past, variation in semiconductors was considered a foundry issue, typically at the most advanced process node, and largely ignored by most companies. New processes came online every couple of years, and foundries had to figure out how to correlate what got manufactured with the original spec, which was mostly dealt with in the process rule deck. But enough chips are being built far enough down the process roadmap, for markets such as automotive or anything involving AI, that these issues are no longer someone else’s problem.

As chips are added into a slew of new markets, the number of sources of variation is increasing. There is variation in the processes, the manufacturing tools, the analog circuitry, and even in use cases and environmental conditions. And to make matters worse, these are all additive. So while one type of variation may not interrupt the functionality of a chip, or turn a die from good to bad, variation across multiple manufacturing steps, use cases, or across multiple dies that need to work together can render entire systems unacceptable. And in the case of automotive, industrial or medical devices, those inconsistencies could have life-threatening consequences.
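To make the “additive” point concrete, here is a minimal Python sketch under two stated assumptions: independent variation sources combine as a root-sum-square of their standard deviations, and a multi-die system works only if every die is in spec, with dies failing independently. The sigma values and the 98% per-die figure are hypothetical, chosen only for illustration.

```python
import math

def combined_sigma(sigmas):
    """Root-sum-square of independent variation sources (variances add)."""
    return math.sqrt(sum(s * s for s in sigmas))

def system_yield(per_die_yield, n_dies):
    """Probability that all n dies are in spec, assuming independent dies."""
    return per_die_yield ** n_dies

# Hypothetical per-source sigmas (arbitrary units): process, tool, analog, environment
sources = [1.0, 0.8, 0.5, 0.6]
print(f"combined sigma: {combined_sigma(sources):.3f}")
print(f"fully correlated worst case: {sum(sources):.1f}")

# Hypothetical 98% per-die yield, for systems built from 1 to 8 dies
for n in (1, 2, 4, 8):
    print(f"{n} dies: system yield {system_yield(0.98, n):.3f}")
```

With these inputs the combined sigma is 1.5, larger than any single source; if the sources were fully correlated, the worst case would be the plain linear sum (2.9). System yield drops geometrically as dies are added, which is why several individually acceptable parts can add up to an unacceptable system.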

“We were in a phase where we basically understood that there was some variation inherent to a lot of these things—process integration,” said David Fried, CTO at Coventor, a Lam Research Company. “We decided we were going to try to understand it very well, and as long as we could understand variation at a deep level we could design technology and design rules that would encompass the variation and we could survive. That’s the historical stance on variation. Process tools tried to reduce it and the integration schemes tried to be less sensitive to it, but at the end of the day you could understand it and the design rules had to eat it. We are in the second phase now, which is that we can’t take that stance anymore. The design rules are too tight. There is simply not enough fat on the table.”

On top of that, the number of sources of variation is growing. Combine that with tighter tolerances and less room for guard-banding, and variation becomes a much more critical element from design through manufacturing.

Fig. 1: Types of variation. Source: KLA

“As we develop and fabricate devices with smaller feature sizes, there is a need to control all types of variation due to tighter tolerances,” said Chet Lenox, yield consultant at KLA. “With advanced design nodes, there are two key trends for variation measurement and reduction. One is a stronger emphasis on edge placement error (EPE) rather than individual components of CD, overlay, and LER. EPE can be thought of as the sum of all errors in the measurement of distance between two features. More and more, IC manufacturers are talking in terms of EPE margin—rather than CD plus overlay plus LER plus process margin—and subsequently they want to measure and control EPE. The second trend involves measuring buried features, which became a critical part of IC manufacturing with the move to 3D device types, including finFETs and 3D NAND. The importance of measuring and controlling buried features will continue with the emergence of design technology co-optimization (DTCO) structures to extend logic scaling, and future logic device architectures such as lateral nanosheets, CFETs (complementary FETs), and eventually full 3D logic.”

Variation extends to all parts of the manufacturing flow, and while it is present even at 28nm and 22nm, the problem becomes more acute as feature sizes shrink beyond the capabilities of fab equipment. For the better part of a decade, the big challenge was being able to print fine enough features quickly enough. That problem is now somewhat under control with EUV, which currently is being used by both Samsung and TSMC. But other equipment is running out of steam, too, in areas such as inspection, metrology and test, which makes spotting variation across an increasing number of potential trouble spots more difficult.

One area that continues to be particularly problematic is edge placement error, where photomasks need to be aligned to print extremely tiny features.

“The mask makers make the masks to spec, which incorporates an error budget assigned to the mask manufacturing process,” said Aki Fujimura, CEO of D2S. “In turn, the wafer fab manufactures the chips to spec, which incorporates both the error budget for the mask and the wafer processes. Standard cells and interconnect and vias and contacts are all designed according to design rules, which are in turn determined by carefully studying what can be manufactured reliably according to the wafer and mask specs. And if process variation on the wafer can be contained to 0.1nm instead of 2nm, speed and power specifications of a standard cell (or any circuit) would have less variation in the specified corners. The corners, in turn, drive RTL design, synthesis, place-and-route to design a chip that will perform at the target speed, making sure that the signal goes from one clocked device (stage) to the next on time, and also that the signal stays there long enough for the next clock cycle to arrive.”
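Fujimura’s last point, that corners must guarantee a signal arrives on time and also stays put long enough, corresponds to the classic setup and hold checks in static timing analysis. The sketch below is a simplified, hypothetical illustration assuming zero clock skew and made-up corner delays for a single flop-to-flop path (the `Corner` class and all numbers are illustrative, not from any real PDK). It compares a wide slow-to-fast corner spread with a tighter one.

```python
from dataclasses import dataclass

@dataclass
class Corner:
    name: str
    clk_to_q_ns: float    # launching flop clock-to-Q delay at this corner
    path_delay_ns: float  # combinational delay between the two flops

def setup_slack(c: Corner, period_ns: float, setup_ns: float) -> float:
    # Data launched at one edge must arrive setup_ns before the next edge.
    return period_ns - setup_ns - (c.clk_to_q_ns + c.path_delay_ns)

def hold_slack(c: Corner, hold_ns: float) -> float:
    # Data must stay stable hold_ns after the capturing edge (zero clock skew assumed).
    return (c.clk_to_q_ns + c.path_delay_ns) - hold_ns

def worst_slacks(corners, period_ns, setup_ns, hold_ns):
    """Worst (minimum) setup and hold slack across all corners."""
    ws = min(setup_slack(c, period_ns, setup_ns) for c in corners)
    wh = min(hold_slack(c, hold_ns) for c in corners)
    return ws, wh

# Hypothetical corner data for one path
wide = [Corner("SS", 0.12, 0.85), Corner("TT", 0.10, 0.70), Corner("FF", 0.08, 0.55)]
tight = [Corner("SS", 0.11, 0.76), Corner("TT", 0.10, 0.70), Corner("FF", 0.09, 0.64)]

for label, corners in (("wide corners", wide), ("tight corners", tight)):
    ws, wh = worst_slacks(corners, period_ns=1.0, setup_ns=0.05, hold_ns=0.03)
    print(f"{label}: worst setup slack {ws:+.2f} ns, worst hold slack {wh:+.2f} ns")
```

The typical-corner path is identical in both cases; only the spread changes. With the wide spread the slow corner misses setup, while the tighter spread closes timing at the same 1ns clock, which is the sense in which less process variation translates into tighter corners and easier design closure.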

Meeting those specs requires complex coordination among many parties, through a chain of exchanges, specifications, permissible tolerances and error budgets. Some of this used to be handled by the now-defunct International Technology Roadmap for Semiconductors (ITRS), which served as a central authority for handing out error budgets and tolerances for future generations, so that all equipment suppliers were coordinated to ensure reliable manufacturing of future nodes.

How this figures into future generations of chips remains to be seen.

“If you decide you’re just going to eat the variation, you are done scaling,” said Coventor’s Fried. “Now we’re in this middle phase where the reduction of process variation is absolutely essential. The understanding of it is critical to the reduction. And then you start working the two of these pieces together. So I understand the variation. Sometimes I can predict or compensate for this variation, and that becomes a control scheme. We’re there now. The design rules just can’t tolerate as much variation as we’ve been able to tolerate previously. We’re in this middle phase. But we are not on the other side of this yet, where variation becomes a benefit somehow. We haven’t seen that. We are still in a phase where if we can control variation and compensate for it at the equipment level and at the integration-scheme level, then people can still see the path to scale technology. We’re not going to get out of that phase until we can’t see the path anymore.”
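Fried does not describe a specific control scheme. One widely used way to “predict or compensate” for variation at the equipment level is run-to-run control with an exponentially weighted moving average (EWMA), swapped in here as an illustration. The sketch assumes a simple linear process model, output = offset + gain × recipe setting, where the offset drifts and is unknown to the controller; the function name, λ value and numbers are hypothetical.

```python
def ewma_run_to_run(target, gain, true_offsets, lam=0.3, initial_offset_est=0.0):
    """Simulate EWMA run-to-run control of a process y = offset + gain * u.

    true_offsets: per-run process offset (unknown to the controller).
    Returns the measured output of each run.
    """
    offset_est = initial_offset_est
    outputs = []
    for offset in true_offsets:
        u = (target - offset_est) / gain   # recipe setting from the current model
        y = offset + gain * u              # measured result after the run
        outputs.append(y)
        # Blend the newly observed offset into the running estimate.
        offset_est = lam * (y - gain * u) + (1 - lam) * offset_est
    return outputs

# Hypothetical process: target 100, gain 2, constant but unknown offset of 5
controlled = ewma_run_to_run(100.0, 2.0, [5.0] * 30)
uncontrolled = [5.0 + 2.0 * (100.0 / 2.0)] * 30  # fixed recipe, offset never corrected
print(f"first run {controlled[0]:.2f}, last run {controlled[-1]:.4f}")
print(f"uncontrolled error stays at {uncontrolled[-1] - 100.0:.1f}")
```

The controller’s error shrinks by a factor of (1 − λ) each run, while the fixed recipe carries the full offset forever. A plain EWMA still lags a steadily drifting tool, which is one reason fabs layer more elaborate models on top.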

From a manufacturing standpoint, that still leaves plenty of variation issues to resolve.

“The major technical problem of the line-cut (LC) process is edge placement error (EPE),” according to TEL in a paper submitted at SPIE last summer. “EPE is defined as the sum of variations that induce placement error of the blocking mask and process shift. The EPE calculation in this LC process includes three parts: SAQP variations, blocking mask variations, and other variations such as local critical dimension uniformity, run-to-run variations, etc. SAQP variations are calculated by the root-mean-square method from line CDU, pitch walking distributions, and line roughness performance. Blocking mask variations are calculated by the root-mean-square method from line CDU, line roughness performance, and lithography overlay shift. By inputting three times the standard deviation of these variations into the calculation, the expected EPE is larger than 7.9nm.”
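The EPE budget TEL describes can be sketched as nested square-root-of-sum-of-squares combinations, which is how we read the paper’s “root-mean-square method.” Two assumptions are labeled here: the three top-level parts are also combined this way, and every input below is a hypothetical 3-sigma value in nanometers, not TEL’s data, so the result should not be compared with the 7.9nm figure.

```python
import math

def rss(*terms_nm):
    """Square root of the sum of squares of independent 3-sigma contributions (nm)."""
    return math.sqrt(sum(t * t for t in terms_nm))

# Hypothetical 3-sigma inputs in nm (illustrative only)
saqp = rss(1.2, 1.0, 1.5)    # line CDU, pitch walking, line roughness
block = rss(1.8, 1.5, 3.0)   # line CDU, line roughness, litho overlay shift
other = rss(1.0, 0.8)        # local CDU, run-to-run

# Assumption: the three parts are themselves independent and combine by RSS.
epe = rss(saqp, block, other)
print(f"SAQP {saqp:.2f} nm, blocking mask {block:.2f} nm, other {other:.2f} nm")
print(f"expected EPE (3 sigma) {epe:.2f} nm")
```

Because nested RSS flattens into a single RSS over all terms, the largest single contributor (overlay in this made-up budget) dominates, which is why EPE margin is tracked as one combined number rather than as separate CD, overlay and roughness specs.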

New sources of variation
But even without extending scaling, variation is starting to creep into chips from new sources.

“If you go to a more brain-inspired, neuromorphic type of approach, you no longer can hide the variation behind digital,” said Sanjay Natarajan, corporate vice president at Applied Materials. “That variation is all smaller than the clock speed, for example. Transistor ‘A’ might switch fast, transistor ‘B’ might switch slowly, but as long as all of them are finished switching within one clock period, no one notices that variation. The digital world has basically buried that variation. With analog, it’s more energy-efficient, but then you’ve got to get the variation under control. Since you can’t hide the variation, you have to eliminate that variation or minimize it.”
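Natarajan’s argument can be illustrated with a small Monte Carlo sketch. It assumes device delays and analog contributions are independent and normally distributed; all parameter values are hypothetical. In the digital case only the slowest device matters, and only if it misses the clock edge. In the analog case every device’s deviation flows straight into the result.

```python
import random
import statistics

def timing_failure_rate(n_devices, mean_ps, sigma_ps, clock_ps, trials=2000, seed=1):
    """Fraction of cycles in which the slowest of n devices misses the clock edge."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        slowest = max(rng.gauss(mean_ps, sigma_ps) for _ in range(n_devices))
        if slowest > clock_ps:
            misses += 1
    return misses / trials

def analog_output_spread(n_devices, mean, sigma, trials=2000, seed=1):
    """Std dev of a summed analog output (e.g. summed device currents)."""
    rng = random.Random(seed)
    totals = [sum(rng.gauss(mean, sigma) for _ in range(n_devices)) for _ in range(trials)]
    return statistics.stdev(totals)

# Hypothetical: 100 devices, 50 ps mean delay, 5 ps sigma, 100 ps clock
print("digital miss rate:", timing_failure_rate(100, 50.0, 5.0, 100.0))
print("analog output std dev:", round(analog_output_spread(100, 1.0, 0.05), 3))
```

With a generous clock the digital miss rate is zero even though every device varies, while the analog sum carries a spread of roughly sigma times the square root of the device count. Nothing absorbs it, so it has to be reduced at the source.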

Sometimes, solving variation issues in one area can result in unexpected variation in another.

“From a performance standpoint, you’re worried about cross-field variation in your device performance in a critical path,” said Klaus Schuegraf, vice president of new products and solutions at PDF Solutions. “So you can wire out your transistors with a critical path to one side and another, and those need to be controlled with a certain tolerance. Cross-field is a major problem because some of these AI chips are full-field. The field size is about 600 or 700 square millimeters. That’s a huge die. So you’ve got device variability. And at 7nm the interconnect is very congested, so people have introduced an intermediate metal layer to relieve the congestion between the gate and the drain contact. The drain contacts are now bars, so you have better contact resistance and less variability. The result is you have more capacitance between gate and drain. But when you have a metal bar in there, you have to add a contact onto the source and drain. The way you do that is to add another metal layer. That adds variability. The overlay is quite tight. So as that moves around, the resistance changes. As that overlay changes, the resistance changes. Now you have a whole new source of variability. So you’ve solved one problem and created another.”
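Schuegraf’s overlay-to-resistance chain can be approximated with a deliberately simplified model: contact resistance equals specific contact resistivity divided by contact area, and an overlay shift reduces the overlap width between the contact and the bar underneath. The geometry, the 1e-9 Ω·cm² resistivity and the function name are hypothetical assumptions, not PDF Solutions’ data; real contacts also have spreading and line resistance this ignores.

```python
def contact_resistance_ohm(rho_c_ohm_cm2, width_nm, length_nm, overlay_shift_nm):
    """Simplified contact resistance: rho_c / overlap area.

    An overlay shift along the width direction reduces the overlap width.
    """
    overlap_nm = width_nm - abs(overlay_shift_nm)
    if overlap_nm <= 0:
        return float("inf")  # contact misses the bar entirely
    area_cm2 = overlap_nm * length_nm * 1e-14  # 1 nm^2 = 1e-14 cm^2
    return rho_c_ohm_cm2 / area_cm2

# Hypothetical 20 nm x 30 nm contact landing on a metal bar
for shift in (0.0, 2.0, 5.0, 8.0):
    r = contact_resistance_ohm(1e-9, 20.0, 30.0, shift)
    print(f"overlay shift {shift:.0f} nm -> {r:.1f} ohm")
```

Because resistance scales with the inverse of the remaining overlap, the sensitivity grows as the shift approaches the contact width, so a tight overlay budget turns directly into a resistance (and delay) spread.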

Design impacts
The problem isn’t just confined to manufacturing. It’s showing up all the way across the supply chain, from initial design all the way through to materials, equipment, manufacturing and final test.

Some of this is the same problem, with less wiggle room. “When we first were learning about variation, back in 1977, the recommendation was to use standard tools to deal with dose and focus,” said John Sturtevant, director of technical marketing at Mentor, a Siemens Business. “Now you have smaller and smaller budgets, and everything has tightened up. We are starting to look at the probability of failure to protect yield and how much variation is acceptable. Within a die you may have more than 40 billion vias, variations in edge placement, and you now have to think out to 7 sigmas. With the sheer number of patterns, even ignoring for the moment random effects, you need to look at the very edge of distributions. And everyone needs to be better informed than they were in the past.”
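The arithmetic behind “7 sigmas” is simple to sketch. Assuming a normally distributed via parameter and a one-sided failure limit (both assumptions), the expected number of out-of-limit vias is the via count times the normal tail probability at that many sigma.

```python
import math

def one_sided_tail(k_sigma):
    """P(X > mean + k * sigma) for a normal distribution."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2))

n_vias = 40e9  # the "more than 40 billion vias" per die from the quote
for k in (4, 5, 6, 7):
    print(f"{k} sigma: expected outliers per die ~ {n_vias * one_sided_tail(k):.3g}")
```

At 6 sigma the tail probability is roughly one in a billion, which still leaves about 40 expected outliers per die at this via count; only around 7 sigma does the expectation fall to roughly 0.05.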

That tightening affects every piece of the design puzzle, from floor planning to design for manufacturing (DFM) models and tooling.

“Because floorplans are being changed, we are seeing an impact on the way we do floor-planning for some of our high-speed interfaces,” said Navraj Nandra, senior director of marketing for interface IP at Synopsys. “You have the potential for a very wide bus, but you are limited by a height requirement, so you have to do high-speed routing across the chip rather than up the chip. When that happens you have to look at process gradients because they manifest themselves across the die. If you try to calibrate the sheet resistance in a process, ideally you have the same value if you are probing across the chip. You want the same value of resistance everywhere. Typically, because of process variation, that resistance value changes as you probe along the chip. That is the process gradient. You have to design around that using offset cancellation techniques.”
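A process gradient of the kind Nandra describes can be estimated by fitting a straight line to sheet-resistance probe readings taken along the die. The sketch below does an ordinary least-squares fit on hypothetical readings, then shows one common layout-level offset-cancellation technique, a common-centroid (ABBA) arrangement, which cancels a purely linear gradient. Nandra does not say which techniques Synopsys uses; circuit-level calibration is another option not shown here.

```python
def fit_gradient(positions_mm, rs_ohm_sq):
    """Ordinary least-squares line fit: returns (intercept, slope per mm)."""
    n = len(positions_mm)
    mx = sum(positions_mm) / n
    my = sum(rs_ohm_sq) / n
    sxx = sum((x - mx) ** 2 for x in positions_mm)
    sxy = sum((x - mx) * (y - my) for x, y in zip(positions_mm, rs_ohm_sq))
    slope = sxy / sxx
    return my - slope * mx, slope

def mismatch(intercept, slope, a_positions, b_positions):
    """Difference in average sheet resistance seen by two nominally matched devices."""
    def avg(ps):
        return sum(intercept + slope * p for p in ps) / len(ps)
    return avg(a_positions) - avg(b_positions)

# Hypothetical probe readings (ohm/sq) every 2 mm across the die
pos = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
rs = [100.0, 100.4, 100.8, 101.2, 101.6, 102.0]
b0, b1 = fit_gradient(pos, rs)
print(f"fitted gradient: {b1:.3f} ohm/sq per mm")
print(f"side-by-side (AABB) mismatch: {mismatch(b0, b1, [0, 1], [2, 3]):+.3f}")
print(f"common-centroid (ABBA) mismatch: {mismatch(b0, b1, [0, 3], [1, 2]):+.3f}")
```

Placing segments of each device symmetrically about a shared center makes both see the same average value of any linear gradient; higher-order gradients need more elaborate patterns.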

The use of finFETs at 16/14nm to control leakage current introduced much more regular shapes into designs, which limited design teams’ freedom but also removed some of the variation caused by differing shapes.

“This hasn’t actually made it easier, though, because now these different areas have more influence on each other,” said Steven Lewis, marketing director at Cadence. “So now each of these things has more influence on each other because of the extreme size of the gates and the closer proximity of the transistors and the complex routing. What we found is that you now need to use a combination of tools and methodologies to solve these problems. For analog, they’ve always bristled at explaining their methodology. And for finFETs, the front-end, back-end and verification walls need to disappear because you can’t wait until the end of the line to figure out if someone made a mistake. That doesn’t mean you forget everything you’ve learned with corners and DRC correctness, but you need to bring a lot more things together at the same time to work more cohesively with each other. So during layout, for example, you want to know the electrical expectations of the route and its effect on the transistor. Just following design rule checking isn’t good enough anymore.”
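Knowing “the electrical expectations of the route” during layout can start with something as simple as an Elmore delay estimate of the route modeled as an RC ladder. The sketch below is a hypothetical illustration of that idea, not Cadence’s method: each segment’s resistance and capacitance are lumped, capacitance sits at the segment’s far node, and all values are made up. It shows how a single high-resistance via shifts the estimate.

```python
def elmore_delay(driver_r, seg_r, seg_c, load_c):
    """Elmore delay (seconds) of an RC ladder: driver -> wire segments -> load.

    seg_r, seg_c: per-segment resistance (ohm) and capacitance (F), with each
    segment's capacitance lumped at its far node.
    """
    delay = 0.0
    upstream_r = driver_r
    for r, c in zip(seg_r, seg_c):
        upstream_r += r            # total resistance between driver and this node
        delay += upstream_r * c    # each node's cap is charged through upstream R
    delay += upstream_r * load_c   # the load sits at the far end of the route
    return delay

# Hypothetical route: 200 ohm driver, 10 segments of 20 ohm / 2 fF, 5 fF load
seg_r = [20.0] * 10
seg_c = [2e-15] * 10
nominal = elmore_delay(200.0, seg_r, seg_c, 5e-15)

# Same route with one via (segment 3) at twice its nominal resistance
slow_via = list(seg_r)
slow_via[3] = 40.0
shifted = elmore_delay(200.0, slow_via, seg_c, 5e-15)
print(f"nominal {nominal * 1e12:.2f} ps, with resistive via {shifted * 1e12:.2f} ps")
```

A via near the driver affects every downstream node, so the same resistance change costs more delay there than at the far end of the route.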

A similar idea, checking results during the process rather than waiting until the end, is being echoed further down on the manufacturing side, as well.

“If you can measure and deposit simultaneously, you can reduce variation because you stop depositing when you’re done,” said Applied’s Natarajan. “If the chamber is running slowly today, and tomorrow it’s running fast, it doesn’t matter.”
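Natarajan’s point can be sketched by comparing a fixed-time deposition recipe with an idealized in-situ endpoint loop. The assumptions are loud ones: the measurement is perfect, noise-free and instantaneous, and the deposition rate is constant within a run but differs from day to day. All numbers are hypothetical.

```python
def timed_deposition(rate_nm_per_s, time_s):
    """Fixed-time recipe: final thickness tracks the chamber's rate."""
    return rate_nm_per_s * time_s

def endpoint_deposition(rate_nm_per_s, target_nm, dt_s=0.1):
    """Deposit in small steps, measuring after each, and stop once the target is reached."""
    thickness = 0.0
    while thickness < target_nm:
        thickness += rate_nm_per_s * dt_s
    return thickness

# Hypothetical chamber running slow, nominal and fast on different days
for rate in (0.9, 1.0, 1.1):
    print(f"rate {rate} nm/s: timed {timed_deposition(rate, 50.0):.2f} nm, "
          f"endpoint {endpoint_deposition(rate, 50.0):.2f} nm")
```

The timed recipe inherits the full ±10% rate swing, while the endpoint loop overshoots the target by at most one step’s worth of deposition. In practice, metrology noise and latency set the real limit.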

Whether approaches like these solve the growing variation problems across the industry remains to be seen, but chipmakers are beginning to discuss the problem at a number of conferences.

“Single-dimension metals have forced us to promote the vias very quickly to higher levels, where most of the power is used at the higher-level metals, and the vias resistances have become a large portion of the delay contribution,” said P.R. “Chidi” Chidambaram, vice president of engineering at Qualcomm. “The margin and variation of these vias management is quite critical. The methodology we use—like LPE extraction, the parasitic capacitance management and the error—if you take a thousand paths in a chip and calculate how much error you get, it is in the order of 5% to 10% on the worst case paths. For my product-level performance, these worst cases are the ones that matter. Even though nominally, with the majority of the parts you are hitting the right target, the worst case ends up determining the final specs. All of this manifests in an unpredictability that we have to margin for. The ellipses show what we can design to. You take the SS or SSG corner and then you design into that box. But when I look at the kind of data that I get from large blocks, across many technologies, you always end up with a wide tail that you didn’t predict. Today, we are able to live with it simply by over-margining the part. Improving the predictability will actually get us quite a bit of value in these technologies and the ability to scale further going forward.”
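Chidambaram’s observation, that the average path hits its target while the worst-case prediction error sets the spec, can be sketched with synthetic data. The function names, the 1,000 paths and the 3% error spread are hypothetical assumptions for illustration, not Qualcomm data.

```python
import random

def prediction_errors(predicted_ps, measured_ps):
    """Relative error per path: positive means silicon was slower than predicted."""
    return [(m - p) / p for p, m in zip(predicted_ps, measured_ps)]

def required_margin(predicted_ps, measured_ps):
    """Guard band needed so every measured path still meets its predicted target."""
    return max(0.0, max(prediction_errors(predicted_ps, measured_ps)))

# Synthetic data: 1,000 paths whose measured delays scatter around the predictions
rng = random.Random(0)
predicted = [rng.uniform(200.0, 800.0) for _ in range(1000)]
measured = [p * (1.0 + rng.gauss(0.0, 0.03)) for p in predicted]

errors = prediction_errors(predicted, measured)
print(f"mean error {sum(errors) / len(errors):+.2%}")
print(f"margin needed for worst path {required_margin(predicted, measured):.2%}")
```

Even with a near-zero mean error, the slowest of a thousand paths sits several sigma out, and that tail, rather than the nominal fit, dictates how much over-margining the part needs.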

Chidambaram noted that the goal is to get the unpredictability error down in the back-end. “The front-end with SPICE and TCAD are pretty good, but the variation reduction to be achieved by better predictability is real. Any prediction error that stays—when there is variation in the part we can improve the process and bring it down, but the unpredictability error will stay through the life of the technology, so to decrease that prediction error is very valuable.”

Tighter tolerances, new applications and widely different use cases are adding to variation concerns across the supply chain. What used to be almost exclusively a manufacturing problem is rapidly becoming a manufacturing and design problem.

This is both a challenge and an opportunity, and there is no shortage of activity behind the scenes to understand and resolve this issue. But at this point, there’s still a lot of work to do.

— Brian Bailey contributed to this report.


