Aging: Not Always A Bad Thing

Best practices for predicting how a device will operate over time—and what to watch out for.


By Ann Steffora Mutschler
When IC devices are produced and shipped to end customers, it is important that they function as specified in the application environment. Determining how a device will operate over time is a key aspect of overall reliability and is commonly referred to as ‘aging.’

Aging of electronics is not a new problem. In fact, analog and automotive designers have been analyzing aging for years. But it’s becoming an issue today for many different reasons.

“What is new is that the impact of aging is getting higher because it now impacts everybody,” said Hany Elhak, senior product marketing manager at Synopsys. “It impacts digital designers, it impacts the systems. With advanced nodes it will affect everybody. Device lifetime is getting shorter now.”

The phenomenon of device aging
Device aging refers to the degradation of device performance over time. Transistor performance degrades mainly because of damage to the gate dielectric and to the interface between the gate dielectric and the silicon.

There are different mechanisms that happen at the gate over time, he pointed out. “If you think of overall reliability, there are thermal issues, device aging issues and electromigration that happens at the level of the interconnect. Those three mechanisms affect the overall reliability of any circuit. Today, that’s more important than before.”

For device aging, there are two mechanisms. One is called hot carrier injection (HCI), which occurs when the device is biased. There is an electric field and carriers are supposed to move between the drain and the source, but some of those carriers bombard the gate. Over time the carriers injected into the gate start to change the electrical properties of the gate dielectric, and as they do that the threshold voltage of the transistor changes and the transistor simply doesn’t behave as it’s supposed to.

This impacts the gate because of the natural behavior of the transistors, Elhak explained. “In the transistor you have a gate, which has an electric field that is supposed to control the current flowing between the drain and the source, but there are random events. This electric field causes some of those carriers, instead of flowing between the drain and the source, to get injected into the gate. As more carriers get injected over time, the electrical properties of the gate start to differ, because it’s not supposed to have those carriers in it. That changes the properties of the whole device, because now the gate, which is supposed to control that electric field, is effectively made of a different material.”

The second mechanism that causes aging is called bias temperature instability (BTI), which happens when there is a constant bias on the device, meaning there is current flowing. Here, instead of being driven by the electric field, degradation is driven by bias and temperature. Again, charges start to get trapped in the gate, and as this happens the properties of the gate change, affecting the threshold voltage and the carrier mobility in the channel. “If you change the threshold voltage and if you change the mobility, then you have a different transistor,” he asserted.
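Both mechanisms are often captured, to first order, by an empirical power-law shift in threshold voltage over stress time. A minimal sketch in Python, with made-up coefficients standing in for the foundry-characterized values:

```python
def delta_vth(stress_time_s, a=1e-3, n=0.2):
    """Illustrative power-law aging model: the threshold-voltage
    shift grows as a * t^n with stress time. The coefficients a
    and n here are assumptions; real values come from the
    foundry's aging models for HCI and BTI."""
    return a * stress_time_s ** n

# Ten years of continuous stress, expressed in seconds.
ten_years_s = 10 * 365 * 24 * 3600
shift = delta_vth(ten_years_s)

# The power law means degradation is fastest early in life and
# then slows: each doubling of stress time adds proportionally
# less additional threshold shift at n < 1.
```

The sublinear exponent is why accelerated stress testing works: most of the shift accumulates early, so a short, harsh stress reveals the trend.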

A bigger concern
Device size and electrical field play a role in why aging is an increasing concern.

Because the electric field depends on device size, the same voltage on a smaller device produces a higher field, which means more carriers are injected into the gate and the gate material degrades faster, Elhak said. “The other thing that happens with advanced nodes is that not only is the device smaller, but the bias is lower and the threshold voltage is lower. A smaller threshold voltage means higher sensitivity to this kind of fluctuation. Simply put, with a smaller threshold voltage the same shift has a higher impact, so the device will be more sensitive to aging than before.”
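The scaling argument reduces to back-of-the-envelope arithmetic: approximating the vertical gate field as voltage over dielectric thickness, halving the thickness at the same bias doubles the field. A small sketch (the thicknesses and bias are illustrative, not data for any particular node):

```python
def oxide_field(v_gate, t_ox_nm):
    """Approximate vertical field across the gate dielectric, in
    MV/cm, as gate voltage divided by dielectric thickness."""
    t_ox_cm = t_ox_nm * 1e-7   # nm -> cm
    return v_gate / t_ox_cm / 1e6  # V/cm -> MV/cm

# Same 1.0 V bias on a thinner dielectric gives a proportionally
# higher field (roughly 5 vs. 10 MV/cm here), so carrier injection
# into the gate accelerates as devices shrink.
older_node = oxide_field(1.0, 2.0)  # ~2 nm-class dielectric
newer_node = oxide_field(1.0, 1.0)  # ~1 nm-class dielectric
```
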

The other aspect of aging is temperature. Bias temperature instability is proportional to the device operating temperature, and a smaller device heats up faster than a larger one.

This is analogous to toys. When they were made from steel and wood, parents passed them from one generation to the next. Now that they are made of plastic, they break more quickly. “These new nodes are just like that. With 16nm and 20nm, you need to take into account that the performance of your device is changing over time and it will break down. Devices are not going to last as long. If 90nm or 65nm was durable, 28nm and down is not. You need to take aging into account. You may ignore it and overdesign. That’s one way many people do it. They add margins so that when the device degrades it will still be operational, but this means that your overall system will be more expensive.”

One way or another
There are many internal and commercial approaches to prevent aging problems or perform aging analysis.

One approach, as in the case of Synopsys, first simulates the circuit with fresh transistors to determine the stimulus on every transistor, i.e., how each transistor is biased during operation of the circuit. This dynamic bias information is then applied to an aging model of the transistor, provided by the foundry, to calculate how the threshold voltage and mobility of every transistor in the circuit will change. From that, a new device model is created, called the aged device model, and the circuit is re-simulated against it to see how it will operate after aging, comparing the fresh behavior with the aged behavior.
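The flow can be sketched end-to-end with a toy square-law transistor and an assumed power-law aging model. All coefficients here are illustrative; in practice the aging model comes from the foundry and the simulation is a full SPICE run, not a single equation:

```python
def drain_current(vgs, vth, k=1e-4):
    """Toy square-law saturation current for a MOSFET:
    Id = k * (Vgs - Vth)^2, clamped at cutoff."""
    return k * max(vgs - vth, 0.0) ** 2

def age_vth(vth_fresh, stress_time_s, a=1e-3, n=0.2):
    """Apply an illustrative power-law threshold shift, standing
    in for a foundry-supplied aging model."""
    return vth_fresh + a * stress_time_s ** n

# 1) "Fresh" simulation establishes each device's bias conditions.
vgs, vth_fresh = 0.8, 0.35
i_fresh = drain_current(vgs, vth_fresh)

# 2) Feed that stress into the aging model to build the
#    "aged device model" (here, just a shifted threshold).
vth_aged = age_vth(vth_fresh, 10 * 365 * 24 * 3600)

# 3) Re-simulate with the aged model and compare against the
#    fresh behavior: drive current drops as Vth rises.
i_aged = drain_current(vgs, vth_aged)
assert i_aged < i_fresh
```

The fresh-versus-aged comparison at the end is the point of the exercise: it quantifies how much margin the circuit loses over its lifetime.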

Another approach is ‘protect and prevent.’ Matthew Hogan, product marketing manager at Mentor Graphics, said that from a SPICE simulation perspective, existing tools allow engineering teams to run simulation on the devices and attach aging models—similar to the Synopsys approach above. However, the company also looks at the issue from a full chip verification perspective, because when using aging simulations it is easy to get into a situation where there is limited capacity to run the tools.

As such, he said, another route would be to consider how to protect the devices, or to ensure that the overall design is laid out and the voltages and thresholds are of the correct values to avoid these reliability issues going forward. “Companies have whole teams of people that characterize these devices and come up with the right design rules. One of the things that we do is validate the voltages at a device level. This comes back down to the device-level verification conundrum that you had previously for low-power, for example, where a lot of the low-power folks were thinking of UPF just at the gate level and not actually going down into the devices to see what voltages are on each of the pins, what the wells are connected to, how those wells are biased, and how the bulk of the transistors is biased. Understanding that whole system, particularly at the chip level, allows you to make some very good choices as to whether the device is going to be at risk or not.”

One way to do this is with ‘voltage propagation,’ which lets the engineer look at the entire chip holistically by propagating these voltages through the design. The analysis can treat each device as an ideal switch, with no voltage drop across it, or it can account for both the forward and reverse bias voltage drops to give a more accurate, realistic picture of the voltages that will be presented to each device.
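A voltage-propagation pass can be sketched as a graph traversal over the netlist. This toy version (hypothetical three-node netlist, illustrative 0.7 V drop) shows how the ideal-switch view and the bias-drop view diverge:

```python
from collections import deque

def propagate(netlist, sources, diode_drop=0.0):
    """Propagate supply voltages through a netlist.
    netlist: {node: [downstream nodes]}; sources: {node: voltage}.
    With diode_drop=0.0 every device is an ideal switch; a nonzero
    drop models forward-bias losses. Returns the worst-case
    (highest) voltage reaching each node."""
    volts = dict(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for nxt in netlist.get(node, []):
            v = volts[node] - diode_drop
            if v > volts.get(nxt, float("-inf")):
                volts[nxt] = v
                queue.append(nxt)
    return volts

# Hypothetical netlist: vdd -> a -> b (two devices in series).
net = {"vdd": ["a"], "a": ["b"], "b": []}
ideal = propagate(net, {"vdd": 1.8})                  # ideal switches
real = propagate(net, {"vdd": 1.8}, diode_drop=0.7)   # with bias drops

# The ideal view says node "b" sees the full 1.8 V; accounting for
# two series drops shows it actually sees far less, which changes
# the reliability assessment for the devices hanging off it.
```
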

“That’s really how we try and encourage people to be more proactive about avoiding these sorts of situations. Instead of trying to simulate the artifacts of what you are left with in the design, let’s be proactive and validate and verify that the design you are putting in place is actually going to meet those reliability criteria, because the reliability guys have told you the thresholds you need to adhere to,” Hogan added.

According to Donna Black, senior director of product line management at semiconductor design and manufacturing services provider eSilicon, “The bottom line is that before we would consider a product complete and ready to go to production, one of the key aspects of verification and validation is that we correlate. We want to make sure we prevent any of the parts that pass through the ATE test environment from going into the field and having issues in customer applications or systems.”

An important part of this is the product/process qualification, which is performed to fully understand the reliability of the wafer fabrication, package assembly and device design. Tests are performed to determine the robustness of the combinations of manufacturing processes, product and design, as well as to emulate the device under stress over a period of time.

Upon completion of these tests, predictions can be made about how well a device will perform as it is aged in a system or application environment.

As seen in the table below, examples of these tests include High Temperature Operating Life, ESD, latch-up, temperature cycling, Highly Accelerated Stress Test, and many others, depending on the end application market. Conditions are set during these tests to best duplicate the stress a device will endure in the application, and life expectancy predictions are made based on the ability to meet those testing conditions.

[Table: reliability qualification tests and conditions]

Black said predictable product reliability performance is essential and critical for most semiconductor target applications. “Our customers have simply stated, ‘we expect the product to work without issue throughout its expected lifetime,’ which is typically greater than 10 years of 24/7 operation. Any realized field failures can result in equipment down time or loss of critical operation, which can have a cascading impact depending on the application and the end user.”

She added that eSilicon’s processes and procedures are aligned with JEDEC reliability tests that have been industry standard for many decades. However, eSilicon optimizes which tests are performed to ensure key resources are focused on the critical ones. “We consider the application conditions and the stress conditions while relying on the traditional Arrhenius Equation to predict reliability performance. Due to the nature of eSilicon’s business model, we afford a level of flexibility in many areas, which includes qualification. We also review specific customer requirements and help customers understand the trade-offs and values of specific reliability stresses.”
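The Arrhenius prediction mentioned above boils down to an acceleration factor between the stress temperature and the use temperature. A short sketch with illustrative numbers (the 0.7 eV activation energy and the temperatures are assumptions for the example, not eSilicon figures):

```python
import math

BOLTZMANN_EV = 8.617e-5  # Boltzmann constant, eV/K

def acceleration_factor(ea_ev, t_use_c, t_stress_c):
    """Arrhenius acceleration factor between use and stress
    temperatures: AF = exp(Ea/k * (1/T_use - 1/T_stress)),
    with temperatures in kelvin."""
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp(ea_ev / BOLTZMANN_EV * (1 / t_use_k - 1 / t_stress_k))

# Illustrative: 0.7 eV activation energy, 55 C use, 125 C stress.
af = acceleration_factor(0.7, 55.0, 125.0)

# Each hour at the 125 C stress condition then stands in for
# roughly af hours at 55 C, which is how a ~1,000-hour High
# Temperature Operating Life run can support a multi-year
# lifetime claim for the use condition.
```
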

Moving it up
Today, aging analysis is done at the transistor level, but to simulate big systems it has to happen at the gate level. Synopsys’ Elhak noted that some engineering teams have done aging at the gate level by running millions of transistor-level simulations on parallel machines. In one experiment, 100 CPUs ran for two or three days.

“In order to make this type of analysis appealing for digital designers, you need to abstract it and run it at the gate level, just like simulation versus static timing analysis,” he said. “With static timing analysis you’re running with a certain level of abstraction and you are getting the results faster. You can get the same result by simulation, but it will take forever. What needs to be done is taking this type of analysis and moving it up one level of abstraction.”
