What used to be someone else’s problem is now everyone’s problem.
Today, most design engineers don’t pay much attention to variation. It’s generally considered to be a manufacturing problem.
Even within the fab, various job functions are segmented enough that variation in one part of the process, such as the photomask shop, doesn’t necessarily come to the attention of the people doing deposition and etch or those polishing the wafers.
But increasingly, variation is becoming a team sport. An issue in one area can affect overall quality or yield. Or even worse, variation across a number of elements in a system can cause reliability issues that can crash a semi-autonomous vehicle or an industrial AI system—even though all of the components or processes used to build those systems are within acceptable quality distributions.
There are several main sources of variation, each with multiple contributing factors. For manufacturing, where variation is best understood, the big challenge is to align what gets designed and translated into GDSII with what gets printed on silicon or whatever substrate material is being used. This becomes more difficult at each new node because tolerances are in the range of several atoms, and just being able to count those atoms and make sure that enough are deposited, or that too many aren't etched away, is one of the most incredible feats in engineering. It's even more incredible when you consider that the equipment used to make that happen ages differently, with discrepancies between chambers, and that EUV scanners have to be calibrated against each other because no two are exactly alike.
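As a rough illustration of what "several atoms" means, here is a back-of-envelope sketch (not from the article; it assumes a silicon-silicon bond length of about 0.235 nm, and the feature widths are purely illustrative, since node names no longer correspond to physical dimensions):

```python
# Rough, illustrative back-of-envelope: how many atomic spacings fit across a
# small feature, and what a one-atom variation means as a fraction of it.
# Assumes a Si-Si bond length of ~0.235 nm; real dimensions depend on the
# specific layer and process.
SI_BOND_NM = 0.235

for feature_nm in (12.0, 7.0, 5.0):
    atoms = feature_nm / SI_BOND_NM
    one_atom_pct = 100.0 / atoms
    print(f"{feature_nm:4.1f} nm feature ~ {atoms:4.1f} atomic spacings; "
          f"one atom of variation ~ {one_atom_pct:.1f}% of the dimension")
```

At a few nanometers, a feature is only a couple dozen atoms across, so a variation of one or two atoms is already a several-percent change in the dimension.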
The second main source of variation comes from the design side. Unlike in recent years, when the bulk of designs were focused on fairly predictable implementations of von Neumann architectures, an increasing number of chips are being designed for smaller batches and much more targeted applications. This creates a number of challenges, particularly around the ability to leverage historical data about what works best, what goes wrong and why, and how to fix any problems that might arise.
In the past, the majority of chips being developed met a set of specs for a socket. Expertise was built up over decades and systematically used to train new hires. But in markets such as automotive, where 7nm AI chips are now under development, none of the companies involved has a full complement of such expertise. Systems companies such as Apple and Waymo, and consumer chip giants such as Qualcomm, have never built automotive systems, or anything that needs to last 18 years under harsh environmental conditions. Tier 1s such as Bosch and Delphi, meanwhile, have never built chips at 7nm, where noise, electromigration, thermal issues such as self-heating, and complex routing are extremely difficult to manage.
There are other sources of variation, as well, some of which have never been considered in the past. Use models can produce widely different results. So can driving conditions and the exact location of sensors, memory and logic within cars. And so can the packaging used for those devices. Each of those adds a level of systemic variation, and any one of them can cause problems for the others.
Yet another source of variation involves materials. Carmakers, which are keenly aware of the liability issues they are taking on, are demanding seven sigma purity levels. That may or may not be realistic, but at the moment no one is quite sure because those numbers are purely theoretical. No one has ever tried to measure defects in the parts-per-trillion range. And while that may be statistically possible over time by measuring problems in vehicles on the road, how do you trace a defect in a dopant or gas when the equipment used to make those measurements isn't capable of spotting those issues at that scale? And what is an acceptable level of random defects?
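For perspective, a quick illustrative calculation (not from the article) shows why seven sigma, read as a plain normal-distribution tail, lands in the parts-per-trillion range:

```python
# Illustration: one-sided tail probability of a standard normal at k sigma.
# Note: some industry "sigma" conventions include a 1.5-sigma mean shift,
# so this is only one reading of "seven sigma purity."
import math

def upper_tail(k: float) -> float:
    """P(X > k) for a standard normal distribution."""
    return 0.5 * math.erfc(k / math.sqrt(2))

for k in (6, 7):
    p = upper_tail(k)
    print(f"{k} sigma -> {p:.2e}, roughly {p * 1e12:.1f} parts per trillion")
```

Seven sigma works out to roughly one event per trillion, which is exactly the scale at which today's metrology cannot verify anything.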
Adding up these variation sources has broad implications for the entire supply chain, because variation itself is additive. Variation in one area may not be significant, but when combined with something else—variation in hardware, software or manufacturing processes—it can cause an entire system to malfunction. In the case of a smart phone, that might be an inconvenience. In the case of an implantable medical device or a satellite, the impact is much more significant.
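A minimal sketch of how this plays out, assuming independent, roughly Gaussian variation sources with hypothetical numbers: for independent sources the variances add, so individually tolerable spreads combine into a system-level spread larger than any single contributor.

```python
import math

# Hypothetical standard deviations for independent variation sources
# (arbitrary units, for illustration only).
source_sigmas = {
    "lithography": 0.8,
    "etch": 0.6,
    "deposition": 0.5,
    "packaging": 0.7,
}

# For independent sources, variances add, so the combined sigma is the
# root-sum-square of the individual sigmas.
combined = math.sqrt(sum(s ** 2 for s in source_sigmas.values()))

print(f"largest single source sigma: {max(source_sigmas.values()):.2f}")
print(f"combined system-level sigma: {combined:.2f}")
```

Even though no single source here exceeds 0.8, the combined spread is about 1.3, which is why a system can drift out of spec while every individual step stays within its own distribution.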
It’s time for the entire supply chain to begin sharing information about variation to understand the scale of the problem and what needs to be addressed where and how. Data is only valuable if it’s useful, and having part of the picture is like having part of a cipher. It doesn’t solve the puzzle.
Related Stories
Variation Issues Grow Wider And Deeper
New sources, safety-critical applications and tighter tolerances raise new questions both inside and outside the fab.
Variation’s Long, Twisty Tail Worsens At 7/5nm
Multiple sources of variability are causing unexpected problems in everything from AI chips to automotive reliability and time to market.
Process Variation And Aging
How the rapid pace of progress in the semiconductor industry is making transistor aging even more difficult to manage.
Why Chips Die
Semiconductor devices face many hazards before and after manufacturing that can cause them to fail prematurely.
Variability In Chip Manufacturing
Why consistency in materials is so critical at advanced nodes.
The three primary obstacles faced in manufacturing are: Unnecessary Complexity, Excessive Variability, and Intellectual Myopia. The most insidious of these is the propagation of variability. The three equations that must be understood to overcome these obstacles are Little's Law, the P-K equation, and the propagation of variability equation. Yet the vast majority of factory engineers haven't heard of them. Hopefully, your blog will encourage more interest in variability – the performance killer.
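For readers who haven't encountered them, the textbook forms of the results named above look roughly like this (standard queueing-theory and factory-physics notation; these are not quoted from the comment):

```latex
% Little's Law: average WIP equals throughput times average cycle time.
L = \lambda W

% Pollaczek-Khinchine (M/G/1) mean queue time, with utilization u = \lambda t_e,
% mean process time t_e, and squared coefficient of variation of process time c_e^2:
CT_q = \frac{1 + c_e^2}{2} \cdot \frac{u}{1 - u} \cdot t_e

% One common single-machine linking (propagation of variability) approximation,
% relating a station's departure variability c_d^2 to its arrival (c_a^2)
% and process (c_e^2) variability:
c_d^2 \approx u^2 c_e^2 + (1 - u^2)\, c_a^2
```

The last relation is what makes variability "propagate": the variability leaving one tool becomes the arrival variability of the next, so a noisy step upstream inflates queue times far downstream.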