The number of techniques for improving quality is rising, but so is complexity.
The semiconductor industry is making huge progress in understanding the causes and telltale signs of circuit aging and irregular behavior. But are devices actually getting more reliable?
The answer depends on a number of factors, none of which is easily measured. To be sure, circuits are much better designed and inspected than in the past, and the individual components are printed more accurately with EUV than with double patterning. Moreover, there are more techniques for ensuring that chips operate as expected throughout their lifetimes, such as in-circuit monitoring and external monitoring around the chips.
But reliability is a measure of quality over time, and it's getting harder to track that reliability across different implementations. This is particularly true in multi-chip packages, where reliability may be less a measure of the individual circuitry within a chip than of the combined system's ability to identify potential failures and re-route signals, or to identify heavy use patterns and load-balance the processing by turning some processing elements on and others off.
Everyone knows when a device doesn't work. But they don't necessarily notice when performance slips gradually over time, or when a device quietly relies on a workaround delivered through a software update. In the past, much of that performance degradation was attributed to software patches layered on top of other patches. As chips become more complex, the hardware may actually be the culprit. Short of testing each circuit in a device that has been in use in the field for some time, the cause is not always apparent.
In addition, more transistors per die, or more chips per package, increases the possibility that something will go wrong. This can include everything from thermal effects that only show up under some use cases to latent defects that may take years to appear. Finding these kinds of defects in manufacturing may be nearly impossible because, at the leading-edge nodes, there isn't enough history to point to the problem. In some cases, the cause may be an immature manufacturing process, which might affect only those transistors on one part of a wafer. But if it takes years for these problems to present themselves, the process itself likely will have changed by then, making the root cause much harder to trace.
Solving this comes down to whether approaches to improving quality can keep pace with increasing complexity and more of everything, from interconnects to software to heterogeneous structures on one or more die. With billions of transistors, even defect rates six places to the right of the decimal point can cause massive headaches. It's not clear at this point whether quality control can scale as fast as density, particularly as that density increasingly involves multiple dimensions rather than just two.
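To make the scaling argument concrete, here is a minimal back-of-the-envelope sketch. It assumes a hypothetical 10-billion-transistor die and treats every transistor as failing independently at the same rate, so the chance that a die has no defective transistors is (1 - p)^N. Real yield models account for defect clustering, redundancy, and repair, none of which is modeled here; the function names and the rates below are illustrative only.

```python
import math


def expected_defects(transistors: int, defect_rate: float) -> float:
    """Expected number of defective transistors, assuming independent failures."""
    return transistors * defect_rate


def prob_defect_free(transistors: int, defect_rate: float) -> float:
    """Probability that no transistor is defective: (1 - p) ** N.

    Computed as exp(N * log1p(-p)) so that very small rates are not lost to
    floating-point rounding. Very large exponents underflow to 0.0, which is
    the correct limit for this simplified (no redundancy, no repair) model.
    """
    return math.exp(transistors * math.log1p(-defect_rate))


if __name__ == "__main__":
    n = 10_000_000_000  # hypothetical 10-billion-transistor die
    for rate in (1e-6, 1e-9, 1e-12):  # illustrative per-transistor defect rates
        print(
            f"rate={rate:.0e}  "
            f"expected defects={expected_defects(n, rate):,.2f}  "
            f"P(defect-free)={prob_defect_free(n, rate):.4f}"
        )
```

Under these simplifying assumptions, a one-in-a-million rate leaves thousands of expected defective transistors per die, and a defect-free die only becomes likely once the rate moves several more places to the right of the decimal point.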
Finally, in addition to rising complexity, there is a push for more customization in nearly every market. Giant systems companies such as Google, Facebook and Alibaba are designing custom chips based upon their own goals for efficiency and performance, and others are looking to attain similar benefits through the use of standardized tiles or chiplets. In either case, it's not clear how well this will work. It took an entire ecosystem and years of evolutionary improvements to achieve reasonable yields at 10nm and 7nm.
For companies to do this by themselves using unique designs is an interesting approach, and it could improve performance by orders of magnitude. But it will be years before anyone can assess how well these systems perform over time. Even for customized multi-chip packages, where there are more implementations across multiple markets, the number of potential corner cases is unknown.
So while reliability is definitely improving with better techniques, better equipment, and better internal monitoring, how to measure all of this remains somewhat less obvious – and probably will remain so until the current batch of chips under development has been in the market for a number of years. Until then, keep your fingers crossed.