What Causes Semiconductor Aging?

Why this is becoming a bigger issue, and what can be done to mitigate the effects.

Semiconductor technology has evolved to the point where no one can assume chips will last forever. If not carefully considered, aging can shorten the life of an IC to less than its intended application requires.

Aging is well studied in technology circles. Those less directly involved may understand at a general level that it is a problem, but it’s not always obvious why. So what exactly are the physical mechanisms behind aging?

“Aging depends essentially on how fast we are driving the electrons through the transistor channels,” said Sathish Balasubramanian, head of product management for AMS at Siemens EDA.

This, in turn, drives a number of tradeoffs. “From a design perspective, nearly every designer is interested in something different in terms of aging,” said André Lange, group manager, quality and reliability at Fraunhofer IIS Engineering of Adaptive Systems Division.

That said, aging’s profile has risen as a key verification objective. “We focus on PPA all the time,” said Balasubramanian. “It should be PPA2,” making for four key considerations: performance, power, area, and aging.

There are only a handful of effects that are known to shorten chip lifetimes, and many of them involve charges being trapped where they don’t belong. Some can be mitigated with technology improvements, while others require careful design and verification. Aging simulation has proven to be a big help for designers, but there are no quick-and-easy solutions. Ultimately, in-chip monitoring can help to keep an eye on systems in the field as they age.

Why we are now worried about aging
Aging of circuits always has existed. In the past, however, there has been much more margin and less pushing the limits of what silicon and other materials can do. “Aging effects and degradation mechanisms have always been around in theory,” said Marc Hutner, senior director of product marketing at proteanTecs. “What we are starting to see is their widespread manifestation in high-performance applications, such as data centers. Large hyperscalers are reporting compute glitches and random defects they are not used to seeing. We expect this trend to increase as designs shrink, technology nodes advance, and performance and reliability requirements surge.”

In the past, device failure due to aging likely would have come long after the expected lifetime of the chip — if at all. With increasingly aggressive nodes, however, materials are being pushed harder, and there’s much less room for error. Aging can happen more quickly, with devices having expected lifetimes even shorter than that of their intended system if not properly handled.

While semiconductor aging can affect any application, one market has focused the spotlight on it more than any other. “Automotive is the main driver for considering device aging in design,” said Fraunhofer’s Lange.

That’s due to two factors — cars must last much longer than your average smartphone, and failure can have safety consequences. So OEMs and tier-1 suppliers have forced a good hard look at aging effects and how to protect systems from them.

But aside from automotive, any chip that has to last a long time will be impacted. “We are trying to do city planning things like sewage-control systems, mission-critical things in industrial IoT, or edge systems where each of these end applications for a given chip determines how to harden them for aging,” said Balasubramanian.

While there are many sources of information on how to mitigate aging effects, there is less that digs into the root causes of aging. It’s not immediately obvious how stable materials with electrons running through them would somehow wear out. It turns out that there are relatively few effects responsible for aging, and most relate to the energy those moving electrons impart when they run into something.

Metal migration
The first of the known effects, electromigration, is an old one, dating back decades. The issue here is that electrons literally can push metal ions around as they collide in a conductor. The effect is gradual, but over time, gaps can appear in conductors. Even without a connection fully breaking, its resistance will increase as it narrows.

This effect is driven by current density, which is the amount of current passing through a unit of conductor cross section. That creates a positive-feedback effect as voids form. If a conductor is being narrowed by electromigration, the available cross section shrinks, which raises the current density and worsens the problem.

Mitigation involves managing the current density. It can be reduced either by increasing the metal cross section (by widening the metal, since the height will be fixed) or by lowering the current. Simulation tools have long made it possible to run checks to ensure that current densities remain in bounds throughout a design.
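As a rough illustration of the kind of check described above, the following sketch computes current density for a rectangular wire and shows how a narrowed conductor can drift out of bounds. The limit, dimensions, and function names are hypothetical values chosen for illustration, not figures from any foundry or tool.

```python
def current_density(current_ma: float, width_um: float, thickness_um: float) -> float:
    """Return J = I / A in mA/um^2 for a rectangular wire cross section."""
    return current_ma / (width_um * thickness_um)


def min_width_um(current_ma: float, thickness_um: float, j_max: float) -> float:
    """Narrowest width that keeps J at or below j_max for a fixed metal thickness."""
    return current_ma / (thickness_um * j_max)


# Hypothetical values for illustration only; real limits come from foundry rules.
J_MAX = 2.0       # mA/um^2, assumed current-density limit
THICKNESS = 0.1   # um, fixed by the process
CURRENT = 0.05    # mA carried by the wire
WIDTH = 0.3       # um, drawn width

j_nominal = current_density(CURRENT, WIDTH, THICKNESS)
# A void that removes 20% of the width raises J by 1 / 0.8 = 1.25x.
j_voided = current_density(CURRENT, WIDTH * 0.8, THICKNESS)

print(f"nominal J = {j_nominal:.2f} mA/um^2, within limit: {j_nominal <= J_MAX}")
print(f"voided  J = {j_voided:.2f} mA/um^2, within limit: {j_voided <= J_MAX}")
print(f"minimum width at this current: {min_width_um(CURRENT, THICKNESS, J_MAX):.3f} um")
```

With these assumed numbers, the drawn wire passes the check while the partially voided wire does not, which is the positive-feedback concern in miniature. Widening the wire or lowering the current are the two available levers.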

There’s another form of metal migration that may or may not be an issue. Metal-dielectric interfaces can be a source of physical stress, and that can cause the metal to move. This was an issue early in the dual-damascene copper process with the stresses along the metal walls.

That generally is considered a fixed problem, but it’s unclear whether it may be rearing its head again on the most aggressive nodes.

The impact of trapped charges
The idea of conductors and insulators assumes, simplistically, that charge carriers move only through conductors and that insulators are electrically inert. While that can be true, the prevalent issue of trapped charges violates this principle.

A trapped charge is one that somehow has become embedded in an insulator. It might move around, but it’s most likely to stay where it is. Because these charges are not mobile, they won’t participate in any outright current flow, but they can enable leakage and eventually cause breakdown.

Charge trapping is, for the most part, an unwanted phenomenon. While a few applications, like flash memory, can leverage charge trapping as a storage mechanism, it’s otherwise unhelpful.

Traps can be thought of as defects that attract and hold electrons. This is of biggest concern in gate dielectrics, where the trapped charges affect the threshold voltage. They also make it easier for carriers in the channel to tunnel through the gate.

There are intrinsic traps, which are created during manufacturing, and there are extrinsic traps, which are created during operation. Due to the latter, trapped charges can accumulate over time, which is really what drives most aging.

Intrinsic traps may result both from imperfect dielectric formation and from the interface between the dielectric and silicon. Within the dielectric, charges can be trapped anywhere there’s a defect in an insulator’s crystal lattice.

While it may be possible to grow extremely high-quality oxides, the time it takes may make such a process uneconomical. “When you have good gate oxides, then you do not have too many traps inside,” said Lange. “Nevertheless, it’s a lot more effort to have these good gate oxides.”

Annealing may not be practical due to the temperatures and times involved, depending on where one is in the process. “It always depends on the previous process steps whether you can do a high-temperature process step,” Lange noted. “Some of the high-κ dielectrics do not really like high temperatures.”

The reality remains that there will always be some defects within a dielectric.

Traps at the interface
At the silicon interface, there will be open, or “dangling,” silicon bonds. These are passivated with hydrogen. But not every dangling bond may get filled, and those remaining ones will be traps.

In addition, holes in a channel may coax the hydrogen away from its position, opening up a trap. “It is mainly because of the breaking of the silicon/hydrogen bonds located at the silicon oxide interface,” said Ahmed Ramadan, product engineering director, analog and mixed signal at Siemens EDA.

In general, traps will depend on the materials and any additives used to prevent leakage. The switch from silicon dioxide to hafnium oxide (or other high-κ dielectrics) has not eliminated intrinsic traps.

These defects don’t cause a problem unless an electron finds its way in there. Even then, a single trapped electron is unlikely to be noticed. The aging effects relate to the fact that, once trapped, an electron may be hard to dislodge, and this can be a cumulative effect.

The following are specific causes and effects of charge trapping that have different names, depending on the impact they have. While they account for the majority of the aging in mainstream applications, some applications — especially those that must survive the rigors of space — may have other contributions.

“You have single event upsets due to alpha particles, for example,” noted Lange. “You also have permanent degradation due to high energetic ions, so total ionizing dose is also a problem.”

Hot-carrier injection
Carriers within a current will have a distribution of energies, and some of them will be more energetic — or “hot” — than the others. Those carriers can cause electrons to cross into adjacent insulators, embedding themselves in (or even creating) traps. This is referred to as hot-carrier injection (HCI).

“Hot carrier injection is due to the high electric field that takes place at the drain side,” said Ramadan. “This high electric field will actually push electrons that gain enough energy to collide with the silicon lattice atoms and generate electron/hole pairs. Holes will go usually to the substrate for the NMOS device. Electrons will be pushed toward the silicon/silicon-dioxide interface. They will collide with other silicon atoms, generating more electron/hole pairs in an effect that is called ‘impact ionization.’ These electrons can either reside inside the silicon-dioxide interface or pass to the gate.”

This tends to be more of a problem when the drain voltage is high relative to the gate. “It is said to be an issue when the drain voltage is double the gate voltage,” Ramadan noted.

Fig. 1: An energetic carrier hits a silicon atom, creating an electron/hole pair, with the electron becoming trapped in the nearby dielectric. The new electron may create other pairs along the way. The holes will move toward the substrate. Source: Bryon Moyer/Semiconductor Engineering

This mechanism is explicitly used in one side of flash-memory programming. Beyond that, it’s an unwanted phenomenon.

Negative- and positive-bias temperature instability
This is an effect that can gradually shift the threshold voltage of a transistor. Negative-bias temperature instability (NBTI) affects PMOS transistors and has been more of a concern over time. The positive version (PBTI) affects NMOS transistors, and it’s also being considered for aggressive nodes. Both are exacerbated at higher temperatures.

With NBTI, there may be two causes of the traps that capture the carriers. One is simply the presence of intrinsic traps within the gate dielectric. That’s also the main mechanism behind PBTI.

The other source of trapped electrons for NBTI is at the channel/oxide interface. Unlike HCI, where hot electrons are involved, these phenomena cause electrons to drift slowly into and through the gate dielectric under the influence of an electric field. Note that this aspect is not believed to be a mechanism affecting PBTI.

These electrons shift the threshold of the transistor. “The threshold voltage will go down, and that impacts the overall delay of a circuit, which is going to be worse,” said Ramadan.

Fig. 2: Carriers (blue circles) drifting from silicon into dielectric. Black circles are traps, some of which are occupied. Source: Bryon Moyer/Semiconductor Engineering

Charges captured by the bulk traps can gradually be released over time if the voltage across the oxide is removed. “If there are pre-existing traps from the manufacturing process, they will be filled with holes,” said Ramadan. “When you remove the [voltage] stress, it will go away.”

In theory, PBTI and a portion of NBTI may be reversible — making them not an actual aging issue.

But the relaxation time for undoing the charge trapping can be anywhere from the millisecond range to hours. By the timescales of an operating integrated circuit, those are long. That means that, even though the effect may technically be reversible, it may not have the opportunity to do so in some circuits. In that case, it also acts effectively as an aging mechanism.
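To make the timescale argument concrete, here is a toy first-order model of charge capture and release under a repeating stress/rest cycle. It is only a sketch: the exponential capture and release behavior, the time constants, and the function name are assumptions for illustration, not a calibrated BTI model.

```python
import math


def residual_trapped_fraction(t_on, t_off, tau_capture, tau_release, cycles=1000):
    """Fraction of traps still filled after repeated stress/rest cycles.

    Toy model: during stress, the filled fraction relaxes toward 1 with time
    constant tau_capture; during rest, it decays toward 0 with tau_release.
    """
    q = 0.0
    for _ in range(cycles):
        q = 1.0 - (1.0 - q) * math.exp(-t_on / tau_capture)  # stress phase
        q = q * math.exp(-t_off / tau_release)               # rest phase
    return q


# Same 1 ms on / 1 ms off duty cycle, with two assumed release time constants.
fast_release = residual_trapped_fraction(1e-3, 1e-3, tau_capture=1.0, tau_release=1e-4)
slow_release = residual_trapped_fraction(1e-3, 1e-3, tau_capture=1.0, tau_release=3600.0)

print(f"release much faster than the rest period: {fast_release:.2e}")
print(f"release much slower than the rest period: {slow_release:.2f}")
```

In this toy setting, traps that release quickly empty out during every rest period, while traps with an hours-long release time keep accumulating charge from cycle to cycle, even though each individual trap is reversible in principle.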

It’s less clear whether the interface traps will release their charges. “The traps created by collisions with holes that correspond to the silicon/hydrogen bonds. Removing the hydrogen bonds will leave a dangling silicon bond,” said Ramadan. “These are not recovered.”

Or put differently, the hydrogen will not return to the site even if the stress is removed.

“[NBTI/PBTI] and HCI are important for circuit designers because both of them lead to a gradual shift of transistor performance, and hence to a gradual shift of circuit performance,” said Lange.

Increased leakage
Another mechanism for electrons crossing a dielectric is tunneling, and Fowler-Nordheim tunneling is a very specific mechanism whereby a voltage across the dielectric narrows the tunneling barrier. The higher the voltage or the thinner the oxide, the easier it is for electrons to tunnel through.

Fig. 3: A simplified band diagram illustrating Fowler-Nordheim tunneling. On the left, with no applied voltage, the barrier is too wide to tunnel through. On the right, with an applied voltage, the thinner part can allow tunneling. Source: Bryon Moyer/Semiconductor Engineering
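The sensitivity to voltage and oxide thickness can be sketched with the commonly cited Fowler-Nordheim form, in which current density scales as E² · exp(−B/E), with E the field across the oxide. The prefactor is dropped here so that only ratios are meaningful, and the value of B is an assumed, order-of-magnitude constant for illustration rather than a measured parameter.

```python
import math

B_V_PER_CM = 2.5e8  # assumed illustrative barrier constant, in V/cm


def fn_relative_current(voltage_v: float, t_ox_nm: float, b: float = B_V_PER_CM) -> float:
    """Relative Fowler-Nordheim current density (arbitrary units).

    Uses J ~ E^2 * exp(-B / E) with the oxide field E = V / t_ox in V/cm.
    """
    field = voltage_v / (t_ox_nm * 1e-7)  # 1 nm = 1e-7 cm
    return field ** 2 * math.exp(-b / field)


base = fn_relative_current(voltage_v=4.0, t_ox_nm=5.0)
higher_voltage = fn_relative_current(voltage_v=5.0, t_ox_nm=5.0)
thinner_oxide = fn_relative_current(voltage_v=4.0, t_ox_nm=4.0)

print(f"4 V -> 5 V across 5 nm: current ratio {higher_voltage / base:.0f}x")
print(f"5 nm -> 4 nm at 4 V:    current ratio {thinner_oxide / base:.0f}x")
```

Because the field sits inside the exponent, modest changes in voltage or thickness produce very large changes in tunneling current, which is why thin gate dielectrics leave so little margin.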

This effect was intentionally utilized for the original electrically-erasable programmable ROMs (E²PROMs) and is still used for one side of the flash programming mechanism for bit cells using floating gates (with HCI used for the other side). But any time there’s a voltage across a thin barrier, tunneling can occur, whether or not it’s desired.

Those electrons can be trapped by defects along the way, meaning they may not make it across the dielectric. But they do lower the barrier for further tunneling, and that can cause increased leakage current through the dielectric.

Time-dependent dielectric breakdown
As more trapped electrons accumulate in a dielectric, its overall breakdown voltage comes down. While the charges accumulate slowly, at some point a “percolation path” forms, allowing the dielectric to fail. That causes “time-dependent dielectric breakdown,” or TDDB.

Unlike typical aging mechanisms, this means there is an abrupt failure rather than a gradual reduction of performance leading to failure. While this effect has been modeled, it may be that the models are too conservative for some designs.

“We had partners working in RF telling us that the static TDDB model would predict the circuit as immediately failing after switching on,” said Lange. “Experiments show this is not the case. The problem is that you have high voltage spikes that are so fast that they do not lead to the damage that you would expect from DC stress of that height.”

Existing models do not capture this behavior well, so more work is needed, since RF designs may be over-constrained today.

Temperature-cycling fatigue
Advanced packaging brings its own challenges independently of the individual dies that may be packaged within. “Another factor we need consider when discussing degradation and aging sources is advanced heterogeneous packaging,” said Hutner. “This means we are actually expanding the term ‘aging’ to include new phenomena.”

With complex multi-chip packaging, there are multiple materials with differing coefficients of thermal expansion (CTE). This means that as the temperature changes, the materials will expand and contract at different rates. Over time, those differences can result in discontinuities as metal connections fail.
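A back-of-the-envelope sketch of that mismatch follows. It assumes two materials joined across a given span and free to expand, so the result is an upper-bound displacement rather than an actual stress analysis. The CTE values are approximate room-temperature figures, and the span and temperature swing are hypothetical.

```python
# Approximate coefficients of thermal expansion, in ppm per kelvin.
CTE_PPM_PER_K = {"silicon": 2.6, "copper": 17.0}


def mismatch_um(material_a: str, material_b: str, span_mm: float, delta_t_k: float) -> float:
    """Difference in free thermal expansion between two materials, in micrometers."""
    delta_alpha = abs(CTE_PPM_PER_K[material_a] - CTE_PPM_PER_K[material_b]) * 1e-6
    return delta_alpha * delta_t_k * span_mm * 1000.0  # mm -> um


# Hypothetical case: a 10 mm span cycling through a 100 K temperature swing.
shift = mismatch_um("silicon", "copper", span_mm=10.0, delta_t_k=100.0)
print(f"silicon vs. copper over 10 mm and 100 K: {shift:.1f} um of mismatch")
```

Even a few micrometers of relative movement, repeated over many temperature cycles, is significant at the scale of the bumps and interconnects that join the materials.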

Devices with mechanical elements — like MEMS chips — also may have internal failures if such long-term temperature cycling hasn’t been adequately addressed in the design. Such a failure would result in the degradation of accuracy or outright death of the device.

Analysis pre-design and monitoring post-design
While an understanding of the underlying mechanisms can be helpful when working to mitigate aging effects in a chip, the details should be abstracted away by analysis tools. The quality of those tools will depend on the quality of the models they use.

“If designers would like to push their designs to their limits, they need to make sure that they have good models that are capturing these reliability effects,” noted Ramadan. “Most of the foundries are actually keen on having NBTI and hot-carrier injection in their models. We have also seen TDDB models specifically for automotive applications.”

But verifying aging isn’t as simple as pushing a button. “These simulations depend on the mission profile that you use,” cautioned Lange. “What is the circuit intended for? What are the critical or typical use conditions? The other question is, which models does one use to mimic the impact of device degradation or circuit performance? These models have to support the other transient waveforms in the simulations. And they have to accurately assess all the points in the IV curves that you’re interested in.”
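One way to see why the mission profile matters is to weight each operating temperature by how much it accelerates a thermally driven mechanism. The sketch below uses the Arrhenius form for that acceleration, which is an assumption here rather than something the quoted sources specify. The activation energy, reference temperature, and both profiles are hypothetical.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def arrhenius_factor(temp_c: float, ref_temp_c: float, ea_ev: float) -> float:
    """Acceleration at temp_c relative to ref_temp_c under an Arrhenius model."""
    t = temp_c + 273.15
    t_ref = ref_temp_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_ref - 1.0 / t))


def profile_factor(profile, ref_temp_c: float, ea_ev: float) -> float:
    """Time-weighted acceleration for (fraction_of_time, temp_c) pairs summing to 1."""
    return sum(frac * arrhenius_factor(temp_c, ref_temp_c, ea_ev)
               for frac, temp_c in profile)


EA_EV = 0.7      # assumed activation energy; it differs by mechanism and technology
REF_TEMP_C = 55  # assumed reference temperature for the lifetime model

# Hypothetical mission profiles for two locations in the same vehicle.
under_hood = [(0.2, 40.0), (0.6, 90.0), (0.2, 125.0)]
in_cabin = [(0.5, 25.0), (0.5, 55.0)]

print(f"under hood: {profile_factor(under_hood, REF_TEMP_C, EA_EV):.1f}x reference aging rate")
print(f"in cabin:   {profile_factor(in_cabin, REF_TEMP_C, EA_EV):.2f}x reference aging rate")
```

Under these assumptions, the same chip ages at very different rates depending on where and how it is used, so a single worst-case temperature can be either far too pessimistic or not pessimistic enough.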

As more is learned about aging, those models should be updated, making it easier to design around aging — even as the effects get worse with each generation.

“Basic understanding of what’s happening physically is fairly good already, even for the very, very small nodes,” said Lange. “But there’s a lot of work to be done to make these models available for the different technologies or the different transistor types.”

Pre-silicon verification, while critical, is insufficient. Aging is one of the main reasons that monitoring circuits are being included in advanced ICs. “SoCs need to be made intelligent, embedded with smart monitoring solutions, to provide real-time feedback on their health and performance degradation predictions,” said Hutner.

Critical parameters can be watched, with analytics pointing toward devices that may fail due to aging. Where appropriate, such chips — or the boards onto which they’re soldered — can be replaced before their failure causes a bigger problem.

It would appear, then, that aging effects are here to stay. But between design verification to mitigate aging and real-time monitoring to observe aging in progress, systems can be better protected as they pervade ever more of our lives.



6 comments

Bessel Func says:

If I remember correctly, wirebonding also has a wear-out issue. Interface metallization can and will degrade. Purple plague will continue to increase even in metal systems where such phenomena aren’t supposed to be a problem. Bessel

David Leary says:

I enjoyed reading your article. You are correct to mention that leading-edge Si technologies are challenged by erosion of wearout margin from node to node. Lifetime has a time to failure distribution, owing to more than defects. With eroding reliability margin, early wearout failures can result from wafer process variation. Examples are across-wafer, wafer-to-wafer, and lot-to-lot patterning, film thickness and stoichiometry variances. The key aging mechanisms are thermally-driven. With hot spots across the die being common, creating in many cases many tens of degrees of temperature variance across a die, variances in wafer processing can and do result in early life failures. For performance-optimized circuit designs, I’m advocating that production reliability screens (eg burn-in) need to deliver sufficient stress level to not only activate early life latent defects, but also identify product at risk of early wearout failures (eg < 10 years) arising from wafer processing variances.

Lu says:

David, you are absolutely right. I’m an integrator on smaller nodes and I don’t buy my own product. I know all fabs and tools have process variation but with smaller feature size it becomes exponentially more lethal. And defects get nasty and downright undetectable. I don’t want to lose my phone to a stupid field fail

Jan Hoppe says:

This is the best explanation I have seen. I read papers and got lost. Thanks Jan

Arnaud PHELIPOT says:

Thank you for this very interesting article, which reminds me of a long decade of work.

I would add that for all these failure mechanisms, the models (Arrhenius law in most cases) and their parameters are well known and agreed upon in the semiconductor community.
If you refer to JEDEC JEP122H publication you can get for all these failure mechanisms the model physical law and parameters that can help you on the user side to qualify your semiconductor parts vs your mission profile (or to use semiconductor manufacturer results to do so).
However you can see that the publication was not updated since 2016. This is because when entering deep submicron (below 45/32nm), leading nodes were developed with different technology choices depending on the manufacturer (FinFET, GAA, planar + SOI substrate, material stack up) that broke the model and model parameters’ agreement. This is what I understand in Lange quotation “But there’s a lot of work to be done to make these models available for the different technologies or the different transistor types.” (and this applies to III-V semiconductor, to optics electronics etc.)

So Physics of Failure models become something that is process-dependent, if not foundry-dependent, and whose knowledge can be critical for business because it can determine the capability of a technology to address various applications, from handheld devices to space systems.

Lange is perfectly right when saying that mission profile is critical (for a car under hood parts will see higher temperature than in-cabin electronics, a car will be ON only around 5000 hrs during its whole lifetime whereas a civil aircraft will be ON during the same number of hours per year), but evaluating the end of life of the product and its ability to be used in a system requires also a good knowledge of the specific technology in use.

For more information about lifetime prediction and models, you can read the good document from JPL with Prof. Bernstein’s participation: https://nepp.nasa.gov/files/16365/08_102_4_%20JPL_White.pdf. It is a very good scientific introduction to the models.

Arnaud

Phil Hollis says:

Thank you for this excellent summary, which brings back memories of some challenging times over 44 years of semiconductor manufacture.
