The effects and mechanisms of chip aging, plus how to predict a device’s lifetime.
A Q&A with Moortec CTO Oliver King.
Why is understanding your chip’s age important?
Semiconductor devices age over time. We all know that, but what is often not well understood are the mechanisms of aging or the limits that will cause a chip to fail. In addition, there is bound to be a minimum lifetime requirement for a device. That requirement depends on the application, but could be two or three years for a consumer device and up to twenty-five years for a telecommunications device. Given that lifetime requirement and often poorly understood aging processes, many chips designed today are over-designed to ensure reliable operation. If you understand the aging process or, better still, can monitor it, then you can reduce the over-design and potentially even build chips that react and adjust to the aging effect, or predict when a chip is going to fail.
Chips at the moment are not getting anywhere near their total lifespan because, in most cases, there isn’t any in-chip monitoring taking place. I sometimes use the analogy of a rental car that you want to give back with an empty fuel tank. If your chip has a defined lifetime, then you want to run it as hard as you can so that it just performs within spec for that lifetime. Or, looking at it the other way, you want to hand your rental car back just as you run out of fuel.
What are the effects and mechanisms of aging?
A number of mechanisms contribute. The most notable are electromigration, hot carrier effects, and bias temperature instability (BTI). Whilst some of this can be mitigated through design techniques, and CAD tools exist to help with that, they can only go so far. In the case of bias temperature instability, the mechanisms are not fully known. Whilst traditionally only negative bias temperature instability (NBTI) was considered an issue, with the introduction of high-k metal gates at 28nm, positive bias temperature instability (PBTI) is now a problem as well.
The result of BTI is to raise threshold voltages, and the effect is very temperature dependent, so without a good model of how the device will be used it is hard to predict and thus to design for. In addition, aging effects in general are, by nature, hard to measure, because it takes a long time to get a device to end of life even with acceleration techniques such as high-temperature operating life (HTOL) testing.
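To give a feel for why accelerated testing still takes a long time, here is a minimal sketch using the widely used Arrhenius temperature-acceleration model. It is an illustration, not something described in the interview: the activation energy of 0.7 eV and the 55 °C use and 125 °C stress temperatures are assumed values, and real figures depend on the failure mechanism and the process. The model covers temperature acceleration only, not voltage.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_acceleration(ea_ev, t_use_c, t_stress_c):
    """Acceleration factor of a stress test at t_stress_c relative to use at t_use_c.

    ea_ev is the activation energy in eV (an assumed, mechanism-specific value).
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


def stress_hours_for_lifetime(lifetime_years, ea_ev, t_use_c, t_stress_c):
    """Stress hours needed to represent lifetime_years of use, under this model only."""
    use_hours = lifetime_years * 365.25 * 24
    return use_hours / arrhenius_acceleration(ea_ev, t_use_c, t_stress_c)


if __name__ == "__main__":
    # Illustrative numbers only: 0.7 eV, 55 C in the field, 125 C under stress.
    af = arrhenius_acceleration(0.7, 55.0, 125.0)
    hours = stress_hours_for_lifetime(25, 0.7, 55.0, 125.0)
    print(f"acceleration factor: {af:.1f}")
    print(f"stress hours for a 25-year lifetime: {hours:.0f}")
```

Under these assumed values the stress time for a twenty-five-year requirement still runs to thousands of hours, which is consistent with the point that end-of-life behavior is slow to observe.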
How can we help predict device lifetime?
From Moortec’s perspective, we are working on monitors that can be used to measure the aging of a device in the field. By building in reference structures and comparing them with live structures, we can track the difference between the two over time. That is one application in use at the moment. Another is using the information to adjust the supply to bring the chip back to the performance level that you expect or need. This is actually quite common, particularly in devices that must sustain a particular throughput.
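The reference-versus-live comparison and the supply adjustment can be sketched roughly as follows. This is a hypothetical illustration, not Moortec's implementation: it assumes the monitors are structures read out as counts (for example, ring oscillator cycles in a fixed window), that the reference structure is kept largely unstressed, and that the supply is nudged up in fixed steps. The names `AgingMonitor` and `next_supply_mv`, the 2% tolerance, the 10 mV step, and the 1000 mV ceiling are all invented for the example.

```python
from dataclasses import dataclass


@dataclass
class AgingMonitor:
    """Tracks a live structure against a reference structure (hypothetical model)."""
    ref_count_t0: int   # reference readout at time zero
    live_count_t0: int  # live readout at time zero

    def degradation(self, ref_count, live_count):
        """Fractional slowdown of the live structure relative to the reference.

        Working with the live/reference ratio, normalized to its time-zero
        value, is intended to cancel effects common to both structures at
        readout, such as temperature or supply differences between readings.
        """
        ratio_t0 = self.live_count_t0 / self.ref_count_t0
        ratio_now = live_count / ref_count
        return 1.0 - ratio_now / ratio_t0


def next_supply_mv(current_mv, degradation, tolerance=0.02, step_mv=10, max_mv=1000):
    """Raise the supply one step if degradation exceeds tolerance, capped at max_mv."""
    if degradation > tolerance:
        return min(current_mv + step_mv, max_mv)
    return current_mv


if __name__ == "__main__":
    monitor = AgingMonitor(ref_count_t0=1000, live_count_t0=990)
    d = monitor.degradation(ref_count=1010, live_count=960)
    print(f"degradation: {d:.3%}")
    print("next supply (mV):", next_supply_mv(900, d))
```

A real scheme would also need to account for the extra aging that a higher supply itself causes, which this sketch ignores.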
How does this help with choosing the lifetime of your chip?
The thing is that aging is complex and very dependent on use case and environment. In most modern applications neither of those is well known, and both often vary over time.
If we take the smartphone as an example, there will be modes where it is doing very little, where the clock frequency and the supply voltage are both low. At the other extreme it will be playing HD video, where the clock runs at a high rate and the supply is correspondingly high. Obviously, if you took that device and left it in the low-power state, it would age at a significantly lower rate than if you left it in the high-power state. The trouble is that at design time you don’t know what the ratio between the two will be. Of course, this example is already a simplified case, because more often than not there will be more than two states, so you have to make assumptions about the time spent in each state and build in margins to cope with the unknowns. By allowing the system to monitor that aging, you can potentially optimize DVFS schemes, predict lifetime or, perhaps, even rein in certain modes to ensure that a particular lifetime is met.
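The dependence on time spent in each state can be illustrated with a deliberately simple model. The sketch below assumes that aging damage accumulates linearly, so each state contributes in proportion to the fraction of time spent in it. That is a strong simplification, since real mechanisms such as BTI are nonlinear in time and partly recover, and the per-state lifetimes (20 years idle, 2 years in video playback) are invented for illustration.

```python
def lifetime_years(mix, years_if_always):
    """Estimated lifetime for a usage mix under a linear damage assumption.

    mix:             state name -> fraction of time spent in that state
    years_if_always: state name -> lifetime if the device stayed in that state
    """
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    damage_per_year = sum(frac / years_if_always[state] for state, frac in mix.items())
    return 1.0 / damage_per_year


def max_high_fraction(target_years, low_years, high_years):
    """Largest share of time in the high-power state that still meets the target.

    Two-state case only. Solves 1/target = f/high + (1 - f)/low for f.
    """
    if target_years > low_years:
        raise ValueError("target cannot be met even in the low-power state")
    if target_years <= high_years:
        return 1.0
    return (1.0 / target_years - 1.0 / low_years) / (1.0 / high_years - 1.0 / low_years)


if __name__ == "__main__":
    years = {"idle": 20.0, "video": 2.0}  # assumed values
    print(lifetime_years({"idle": 0.9, "video": 0.1}, years))
    print(max_high_fraction(target_years=5.0, low_years=20.0, high_years=2.0))
```

Even in this toy model a small change in the usage mix moves the estimated lifetime considerably, which is why a design-time guess at the ratio forces large margins and why in-field monitoring is attractive.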
Another example is bitcoin mining. This is at the other end of the scale, where devices are manufactured to sit in large arrays. Each chip will vary with process, and the chips will age differently, partly as a result of process variation and partly because their loads won’t always be equal. If you can monitor all those conditions, then you can optimize each of those chips to run at peak performance.