Transistor Aging Intensifies At 10/7nm And Below

Device degradation becomes limiting factor in IC scaling, and a significant challenge in advanced SoCs.


Transistor aging and reliability are becoming much more troublesome for design teams at 10nm and below.

Concepts like ‘infant mortality’ and ‘bathtub curves’ are not new to semiconductor design, but they largely dropped out of sight as methodologies and EDA tools improved. To weed out infant mortality, a burn-in process would be run, particularly for memories. Reliability then follows the shape of a bathtub curve: after the early failures there is a long, flat stretch of safe operating time, and as the device ages the failure rate climbs again on the other side.


Fig. 1: Bathtub curve. Source: Wikipedia/Wyatts derivative work

“The burn-in technique — heating and cycling the semiconductor — accelerated or precipitated those failures,” said Mick Tegethoff, director of AMS product marketing at Mentor, a Siemens Business. “Initially that was key. Then technology got better and better. The processes got better and it became less of an issue.”

While transistor aging and reliability analysis has long been standard practice in safety-critical, automotive, and aerospace applications, until very recently that was not the case for consumer devices. At 10nm and below, however, aging has become a general problem, regardless of the application area.

Specifically, there are three causes of aging in semiconductor devices:

Negative bias temperature instability (NBTI). This is caused by constant electric fields degrading the dielectric, which in turn causes the threshold voltage of the transistor to degrade. That leads to lower switching speeds. This effect depends on the activity level of the circuits, with heavier impact on parts of the design that don’t switch as often, such as gated clocks, control logic, and reset, programming and test circuitry.
Hot carrier injection (HCI). This is caused by fast-moving electrons inserting themselves into the gate and degrading performance. It primarily occurs on higher-voltage modes and fast switching signals.
Time-dependent dielectric breakdown (TDDB). This occurs when high electric fields eventually cause a complete breakdown of the gate dielectric, resulting in catastrophic failure of the transistor.
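For a sense of how such mechanisms are typically modeled, the sketch below evaluates a generic power-law expression for NBTI-induced threshold-voltage shift, of the kind often quoted in reliability literature. The prefactor, field and temperature dependence, and time exponent are placeholder values for illustration, not parameters from any foundry model.

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def nbti_delta_vth(t_seconds, temp_kelvin, eox_v_per_cm,
                   a=1.0e-3, gamma=0.5e-6, ea_ev=0.12, n=0.17):
    """Illustrative NBTI shift: dVth = A * exp(gamma*Eox) * exp(-Ea/kT) * t^n.
    All coefficients are placeholders, not foundry PDK data."""
    return (a * math.exp(gamma * eox_v_per_cm)
              * math.exp(-ea_ev / (K_BOLTZMANN_EV * temp_kelvin))
              * t_seconds ** n)

# Hypothetical 10-year shift at 125 degrees C with an assumed 5 MV/cm oxide field
ten_years_s = 10 * 365 * 24 * 3600
print(f"dVth ~ {nbti_delta_vth(ten_years_s, 398.15, 5.0e6) * 1000:.1f} mV")
```

The power-law time dependence is why NBTI-type degradation is front-loaded: most of the shift accumulates early, but it never fully saturates over a 10-year mission profile.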

“At modern geometries, the primary root cause of all these effects is the stress effect of high electric fields across the dielectric,” said João Geada, chief technologist at Ansys. “As geometries have got smaller, but voltages have not scaled at the same rate, the electric field across gates has increased, resulting in worse aging behavior.”
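To put rough numbers on that observation, the snippet below computes the vertical field across the gate dielectric as E = V / tox for two illustrative voltage/thickness pairs. The values are hypothetical and not tied to any specific node; the point is simply that when the dielectric thins faster than the supply voltage drops, the field goes up.

```python
def oxide_field_mv_per_cm(vdd_volts, tox_nm):
    """Vertical field across the gate dielectric, E = V / tox, in MV/cm."""
    tox_cm = tox_nm * 1e-7           # 1 nm = 1e-7 cm
    return vdd_volts / tox_cm / 1e6  # V/cm -> MV/cm

# Illustrative, not node-specific: supply voltage shrinks more slowly than tox
print(oxide_field_mv_per_cm(1.2, 2.0))    # older node  -> 6.0 MV/cm
print(oxide_field_mv_per_cm(0.75, 1.0))   # newer node  -> 7.5 MV/cm
```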

On top of that, new applications of advanced semiconductors in the industrial IoT and automotive spaces, which have formal requirements for the durability of parts, have put a critical focus on aging behavior and the ability to predict and/or control it. “It is one thing if a disposable gadget stops operating after a couple of years,” he said. “It is an entirely different issue if a life-critical ADAS system fails, either completely or by failing to meet its minimum operating requirements, within its expected lifetime.”

As with all changes in technology, there is a learning curve. Even established finFET nodes are revisiting reliability.

“We’ve seen a lot of renewed interest in reliability, even from the 16nm node, because you’ve gone from having planar devices that everyone’s been designing with now for many many years — and they’re experienced with — to devices that are three-dimensional,” said Art Schaldenbrand, senior product marketing manager for the Custom IC and PCB Group at Cadence. “Change is a little bit scary. There are also impacts that come with everything scaling down. We have to obey the laws of physics, and sometimes it’s difficult because we don’t understand what the rules are.”

Device stresses are different than in the past, and devices are being used in new ways. “In an application like a chip for the industrial IoT, you might sit quietly for 10 years, operating at a very low duty cycle, but the device is still aging during those 10 years even when it is mostly at rest,” he said. “So it’s not just the technology. There are also more challenging applications.”

Nature vs nurture
So what can be done about it? The answer may be as unique as a single use case.

“One of the maxims of engineering is that you can only fix what you can measure,” Geada noted. “This is where aging reveals one of its problems. It is extremely context-sensitive. The same gate in two different design contexts will see totally different age stress patterns, and therefore will age differently. This is true even within a single gate. Each transistor, depending on the gate’s design context, will see a different set of stress patterns, and thus each transistor experiences individualized aging. Its performance degradation will be different than the same transistor elsewhere in the same cell, or that same transistor in a different instance of the same cell in the same design.”

The usual way of dealing with this problem is guard-banding. But at advanced nodes, that extra margin comes at a cost in both power and performance.

“Traditionally, aging in design flows has been addressed by margining: degrading all transistors in a cell by a certain amount, computing the performance difference between this aged cell and the original cell, and using that ratio (old/new) to derate all cells of that type in the design,” he continued. “Though simple to understand and deploy, this type of approach has clear fundamental limitations. It totally fails to address the fact that aging is completely instance-specific and can have completely different behavior in different parts of the design due to the local stress patterns, such as switching rates and static probabilities.”
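The sketch below illustrates that difference under stated assumptions: a single blanket derate applied to every instance, versus a toy per-instance model in which an NBTI-like penalty tracks static probability and an HCI-like penalty tracks switching activity. The coefficients, delays, and instance names are invented for illustration and do not come from any characterized library.

```python
# Hypothetical contrast between blanket margining and instance-aware aging.
# The derate ratio and the toy stress model below are illustrative only.

FRESH_DELAY_PS = 100.0      # nominal cell delay
BLANKET_DERATE = 1.08       # single aged/fresh ratio applied to every instance

def instance_aged_delay_ps(fresh_ps, switching_activity, static_prob_stress):
    """Toy per-instance model: an NBTI-like term grows with the fraction of
    time the input sits at a stressing level, an HCI-like term with switching."""
    nbti_penalty = 0.10 * static_prob_stress   # placeholder coefficients
    hci_penalty = 0.03 * switching_activity
    return fresh_ps * (1.0 + nbti_penalty + hci_penalty)

instances = {"clk_buffer": (0.90, 0.50), "reset_sync": (0.01, 0.99)}
for name, (activity, static_prob) in instances.items():
    aged = instance_aged_delay_ps(FRESH_DELAY_PS, activity, static_prob)
    print(f"{name}: instance-aware {aged:.1f} ps "
          f"vs blanket {FRESH_DELAY_PS * BLANKET_DERATE:.1f} ps")
```

Even in this toy model, the single derate over-margins the frequently switching buffer and under-margins the mostly static reset path, which is exactly the failure mode of blanket margining that Geada describes.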

Circuit simulation
Most of this can be built into circuit simulation, which remains the primary tool for electrical analysis of these designs.

“What they do is heavy duty mathematics that models the degradation of a parameter,” Tegethoff said. “Each transistor is a model of mathematical equations, and these transistor models—whether BSIM3 or BSIM4—have a lot of parameters in them that determine the conductivity of the channel or the resistance of the drain, among other things. Part of the development of the PDK of a process is to come up with the parameters of the model, and it scales with length and width. Let’s say you create your models for your PDK. Then the question as to reliability and simulation is how those transistors are going to behave 10 years from now.”

Commonly, engineering teams will perform a two-pass simulation. First, a ‘today’ simulation is run to exercise the circuit dynamically and see what currents are flowing through each transistor and what voltages it is seeing. That data is collected and run through aging equations to determine the degradation of the parameters, based on that data and on bathtub curves. This allows the engineering team to predict how the parameters will degrade over 10 years, for example, and by how much.

“So if the resistance was X, now it’s gone up to 1.2 X,” he explained. “If the capacitance was ‘this,’ now it’s ‘this.’ Interestingly, now you go into the SPICE simulator, and for each transistor, based on where it’s located and what electrical stress it’s seeing, it will get new updated parameters. In other words, at the beginning of simulation you get the same model. As long as it’s the same width and length, everyone gets the same model. After you ‘degrade’ them, each one will have its own model. Then the simulation is run again for the same gain, or whatever parameters, and you see theoretically how it changes after 10 years of operation.”
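A minimal sketch of that two-pass flow is shown below, with the stress extraction and aging equations stubbed out as placeholders rather than any vendor's reliability engine. The point it captures is the one Tegethoff makes: every instance starts from the same nominal model, but after degradation each instance carries its own model into the second simulation pass.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class DeviceModel:
    vth: float      # threshold voltage (V)
    rdrain: float   # drain resistance (ohm)

def fresh_simulation(netlist):
    """Pass 1 (stubbed): run the nominal simulation and record per-device
    electrical stress. A real flow gets this from the circuit simulator."""
    return {inst: {"duty": duty} for inst, duty in netlist.items()}

def age_model(model, stress, years=10.0):
    """Placeholder aging equations mapping recorded stress to new parameters."""
    dvth = 0.03 * stress["duty"] * (years / 10.0)         # illustrative only
    new_rd = model.rdrain * (1.0 + 0.2 * stress["duty"])  # e.g. R -> 1.2R at full duty
    return replace(model, vth=model.vth + dvth, rdrain=new_rd)

# At time zero, every instance of a given W/L shares the same nominal model...
nominal = DeviceModel(vth=0.35, rdrain=100.0)
netlist = {"M1_clock_path": 0.90, "M2_reset_path": 0.05}

stress = fresh_simulation(netlist)                        # pass 1
aged = {inst: age_model(nominal, s) for inst, s in stress.items()}
for inst, model in aged.items():                          # inputs to pass 2
    print(inst, model)  # ...but each gets its own degraded model for re-simulation
```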

Degradation mechanisms also lead to a change in the electrical behavior of the devices, which results in shortened circuit lifetime and degraded circuit performance over time, said Deepa Kannan, technical marketing manager at Synopsys. “Device degradation has become a limiting factor in IC scaling and a significant challenge in designing advanced SoCs used in reliability-critical applications such as automotive electronics.”

As such, circuit performance failures due to device degradation need to be considered during the circuit design process to achieve reliable products. “Design for reliability is one of the key requirements for IC designs in modern electronics,” she noted. “Long and expensive testing is required to assess the degradation of circuit performance and failure in time (aging), thus increasing the overall manufacturing cost. Alternatively, designers use conservative rules to over-design the critical circuits, increasing the chip cost. Therefore, a cost-effective way to estimate the lifetime of circuits is essential.”

Dealing with data
As is the case with so many aspects of the semiconductor supply chain, the data already being generated can be collected and analyzed for a specific purpose. Here, that purpose is improving reliability and managing transistor aging.

In the area of burn-in data specifically, David Park, vice president of worldwide marketing at Optimal+, said that if the data being measured is at all correlated with what causes a device to fail in the burn-in chamber (a physical manifestation of the aging modeling done much earlier in the process), it can be used to create a quality index. The quality index can be made up of any number of tests, weighted however the team chooses.

In one situation he pointed to, a semiconductor company identified four specific factors it believed were highly indicative of what caused devices to fail in burn-in, which is an expensive step in the testing process. These included the number of re-tests the device had to go through before it passed, its location relative to the edge of the wafer, and how many parametric and bivariate parametric outliers were found for those types of devices, such as the number of failing tests. A quality index was created: each factor is graded per component and multiplied by its weighting percentage, and the sum across all factors is divided by the total weighting assigned for that device to produce the quality index number. This can be skewed more liberally or conservatively, depending on design goals.
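A minimal sketch of that calculation, with hypothetical factor names, grades, and weights (not Optimal+ defaults), might look like this:

```python
def quality_index(grades, weights):
    """Weighted quality index: sum(grade * weight) / sum(weight)."""
    total_weight = sum(weights[f] for f in grades)
    return sum(grades[f] * weights[f] for f in grades) / total_weight

# Hypothetical per-die grades (0 = worst, 1 = best) for factors of the kind
# described above: retest count, distance from the wafer edge, outlier counts.
grades = {"retests": 0.6, "edge_distance": 0.9,
          "parametric_outliers": 0.8, "failing_tests": 1.0}
weights = {"retests": 0.4, "edge_distance": 0.2,
           "parametric_outliers": 0.3, "failing_tests": 0.1}

qi = quality_index(grades, weights)
SKIP_BURN_IN_THRESHOLD = 0.85   # assumed cutoff, tuned against burn-in fallout
print(f"QI = {qi:.2f}, skip burn-in: {qi >= SKIP_BURN_IN_THRESHOLD}")
```

Raising or lowering that threshold is what trades burn-in reduction against the accepted DPPM risk described next.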

“If you want to be very, very highly confident, you can set it so that nothing bypasses burn-in unless it absolutely would not have failed in burn-in,” he said. “In this case, the customer eliminated 50% of burn-in time because of the number of devices that were simply never statistically going to fail burn-in. If they wanted to get really aggressive on the burn-in reduction, which depends on what their quality limits are, they could have set the quality index limit to accept 10 DPPM (defective parts per million). So there would still potentially be 10 devices skipping burn-in that most likely would have failed. But now the burn-in reduction has increased to 70%.”

Of course, the ability to feed these learnings back to the design engineers is a significant benefit. “If you can get back to the original design engineers that the devices failing burn-in the most are located closer to the edge of the wafer, then as part of the RTL to GDS2 manufacturing process you can advise them to put in more slack or tolerance for process variation at the edge of the wafer,” Park added. “This needs to become part of the aging criteria, because otherwise they are being too aggressive and sacrificing yield and, in turn, quality, in order to get the design done.”

Future aging challenges
With further advancement of semiconductor manufacturing nodes, the issues associated with reliability and transistor aging will only intensify. To this end, Cadence’s Schaldenbrand noted that there is always some concern about devices sitting in sleep mode for long periods of time, and whether the models account for that and for the way the device will actually be used.

“Another question that’s come up is related to the way models are typically fit,” he said. “You do what’s called accelerated aging by putting the device at a higher temperature and running it at a higher voltage to stress it more. The question always is, is the data you derive from accelerated aging really going to be predictive when you operate the device normally? That’s one of the big considerations for any of these modeling approaches. There’s a leap of faith involved, in that to model a device you have to stress it for some finite amount of time, because you can’t wait 10 years to develop a model. But when you do that, do you trust that extrapolation?”
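The extrapolation he describes is often expressed with a temperature acceleration factor in Arrhenius form. The sketch below uses an assumed activation energy and assumed stress/use temperatures purely for illustration; real qualification flows fit these constants to measured data and often add voltage-acceleration terms as well.

```python
import math

K_EV = 8.617e-5  # Boltzmann constant, eV/K

def arrhenius_acceleration_factor(ea_ev, t_use_c, t_stress_c):
    """AF = exp((Ea/k) * (1/T_use - 1/T_stress)), temperatures converted to kelvin."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / K_EV) * (1.0 / t_use_k - 1.0 / t_stress_k))

# Assumed activation energy and conditions, for illustration only.
af = arrhenius_acceleration_factor(ea_ev=0.7, t_use_c=55.0, t_stress_c=125.0)
stress_hours = 1000.0
print(f"AF ~ {af:.0f}: {stress_hours:.0f} h of stress ~ "
      f"{stress_hours * af:,.0f} h at use conditions")
```

Whether the fitted activation energy, and any voltage-acceleration term layered on top of it, actually holds at normal operating conditions is precisely the leap of faith he describes.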

At the end of the day, engineering teams of the past might not have employed every analysis option available for predicting reliability and transistor aging. With complexity and physics staring them in the face at 10nm and below, design teams will have to use every means possible to make sure devices work and yields stay high.

Related Stories
Power Challenges At 10nm And Below
Dynamic power density and rising leakage power becoming more problematic at each new node.
Quality Issues Widen
Rising complexity, diverging market needs and time-to-market pressures are forcing companies to rethink how they deal with defects.
Improving Transistor Reliability
Some progress is being made, but there are no easy answers.
Are Chips Getting More Reliable?
Maybe, but metrics are murky for new designs and new technology, and there are more unknowns than ever.


