Circuit Aging Becoming A Critical Consideration

As reliability demands soar in automotive and other safety-related markets, tools vendors are focusing on an area often ignored in the past.

Circuit aging was considered somebody else’s problem when most designs were for chips in consumer applications, but not anymore.

Much of this reflects a shift in markets. When most chips were designed for consumer electronics, such as smartphones, designs typically were replaced every couple of years. But with the mobile phone market flattening, and as chips increasingly are used in automotive, industrial and medical applications, reliability has become much more important. Aging is a major component of reliability, and concerns are even starting to spill over to chips designed for mobile phones. Numerous industry insiders say mobile phone OEMs are demanding that new chips last at least four years rather than two, and in other markets they may have to remain functional for up to 20 years.

“For automotive, we want it to last longer, and now a car is the largest piece of electronic equipment we own,” said Seena Shankar, senior principal product manager at Cadence. “With self-driving cars, it’s something that’s very important. Aging is something that everyone is talking about, and it’s affecting not just automotive, but IoT, which is growing and getting into all spheres of our life as it gains traction. It is an important problem that we all have to figure out and solve soon.”

This creates additional problems, because IC aging is extremely context-sensitive.

“Two devices that were manufactured identically experience a different set of behaviors, such as a car driven in California versus a car in Massachusetts or Alaska,” said Joao Geada, chief technologist for the Semiconductor Division of ANSYS. “They experience a completely different set of aging patterns. In this way, everything is very context-sensitive, so the solutions we need to look at have to be able to address that, or in some way guardband or mitigate against the expected variability in how devices actually are going to behave. It’s not just automotive. Automotive is certainly a big area of demand, but so is any case where a device is put in a long-term critical position, sometimes because it’s life-critical, as in automotive and aerospace, and sometimes because it’s just really difficult to get to the device to maintain it. You find that in industrial IoT, baseband stations, and wind generators out at sea, where the only way to maintain them is with a helicopter. For anything that has a long lifetime and is difficult to maintain, you want to make sure the device will continue to work in its predicted environment long enough to get the return on investment that people want. So it covers a very large spread of things.”

Interest in aging has been escalating, particularly as chips find their way into more safety-critical applications.

“The automotive industry today is $35 billion,” said Siddharth Sawant, member of technical staff in the Design Enablement Group at GlobalFoundries. “It’s going to keep growing in the coming years with greater autonomy, so reliability becomes a major concern for a lot of these automotive chips. ADAS (advanced driver assistance systems) are subject to very harsh weather conditions, and they have a long product life. There is a lot of sensing and processing going on in these modern automotive chips. Hence, aging becomes very critical for a lot of our customers in terms of timing sign-off.”

The same is true for IoT and other safety applications, Sawant said. “Aging is a topic that is fairly complex. In terms of timing sign-off, it’s not well understood and there’s a lot more that needs to be explored. Designers today are mainly relying on a degradation-rate-based solution to model aging and adding more guard-banding to timing sign-off, which results in pessimistic designs. We want to come up with solutions that are more accurate to model aging.”

Others agree. “Specifically in the mobile area, we are sensing that the consumer is not necessarily going for the new version as often as in the past,” said Greg Curtis, senior product manager for AMS Verification at Mentor, a Siemens Business. “Therefore, the manufacturer or developer has to ensure that the product has a longer lifespan than was previously the case.”

What isn’t always apparent is the connection between aging and functional safety, and this is particularly true in the automotive market.

“For autonomous vehicles, we go into a safe state, but we don’t have reliable reliability data,” said Jörg Grosse, product manager for functional safety at OneSpin Solutions. “If you go to a foundry and ask for base failure rates for particular failure or fault types, you typically don’t get them. This is a very important input for the whole calculation process. In the end, what we really want to do is calculate the FIT (failure in time) rate for an IC, and if you don’t have reliable base failure rates, you can’t calculate it. Today it goes as far as still using a very old Siemens model for that data, and only to compare the chips with each other. We basically say the FIT rate is definitely wrong because we don’t have the right data, but at least if everybody is using the same data we can compare the chips with each other. That’s the state of the art.”
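To make Grosse’s point concrete, here is a minimal sketch of how a FIT rate rolls up from base failure rates, assuming a simple series model (the chip fails if any block fails) with constant failure rates. The block names and FIT values in `BASE_FIT` are hypothetical placeholders; the whole point of the quote is that trustworthy values are hard to obtain, so the output is only as good as these inputs.

```python
import math

# Hypothetical base failure rates in FIT (failures per 1e9 device-hours).
# Placeholder values for illustration only, not foundry or standard data.
BASE_FIT = {
    "logic": 12.0,
    "sram": 20.0,
    "io": 5.0,
    "package": 8.0,
}


def total_fit(base_fit: dict) -> float:
    """Series model: the IC fails if any block fails, so FIT rates add."""
    return sum(base_fit.values())


def mttf_hours(fit: float) -> float:
    """Mean time to failure in hours, assuming a constant failure rate."""
    return 1e9 / fit


def failure_probability(fit: float, hours: float) -> float:
    """Probability of at least one failure within `hours` (exponential model)."""
    rate_per_hour = fit / 1e9
    return 1.0 - math.exp(-rate_per_hour * hours)


if __name__ == "__main__":
    fit = total_fit(BASE_FIT)
    lifetime_h = 15 * 365 * 24  # assumed 15-year service life
    print(f"total FIT: {fit:.1f}")
    print(f"MTTF: {mttf_hours(fit):.3e} hours")
    print(f"P(failure within 15 years): {failure_probability(fit, lifetime_h):.4%}")
```

Because every output scales directly with the assumed base rates, two chips evaluated against the same (possibly wrong) table can still be ranked against each other, which is the comparison-only use Grosse describes.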

How to model aging
Much of the aging data today comes from foundries, which understand exactly how the transistors they manufacture will age. That data increasingly is being shared with EDA companies to ensure that aging models used for designs are accurate. Those models, in turn, are used to create timing solutions and sufficient margining for transistor derating.

“From a simulation characterization perspective, we use the models, we use those parameters and run some stress simulations,” Shankar explained. “We then come up with a model that can then be translated into a Liberty file. Once we’ve stressed the SPICE model and come up with an aged netlist, we can run it through a duty cycle for a specific number of years. Most of our customers are doing an analysis of 1, 5 or 10 years. I haven’t seen people looking beyond that. Then they’re trying to study how the transistor behavior is degrading. We always talk about the bathtub curve, where the rate of failure is pretty high in the beginning, then it stabilizes, and then there are failures that come at the end. That’s where aging comes into play.”
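The bathtub curve Shankar mentions is often modeled in reliability textbooks as the sum of three hazard terms: a decreasing Weibull term for early (infant-mortality) failures, a constant random-failure rate for the useful life, and an increasing Weibull term for wear-out, which is where aging lives. The sketch below assumes that decomposition; all shape (`beta`) and scale (`eta`) parameters are made up to produce the curve’s shape and are not fitted to any process.

```python
def weibull_hazard(t: float, beta: float, eta: float) -> float:
    """Weibull hazard rate h(t) = (beta / eta) * (t / eta) ** (beta - 1), for t > 0."""
    return (beta / eta) * (t / eta) ** (beta - 1)


def bathtub_hazard(t_years: float) -> float:
    """Hypothetical bathtub curve: early failures + random failures + wear-out."""
    early = weibull_hazard(t_years, beta=0.5, eta=500.0)    # beta < 1: falling hazard
    random_failures = 0.002                                 # constant useful-life rate
    wear_out = weibull_hazard(t_years, beta=5.0, eta=25.0)  # beta > 1: rising hazard (aging)
    return early + random_failures + wear_out


if __name__ == "__main__":
    for t in (0.1, 1, 5, 10, 15, 20):
        print(f"year {t:>4}: hazard = {bathtub_hazard(t):.4f} per year")
```

Printing the hazard over time shows the high early rate, the flat middle and the rising wear-out tail; the 1-, 5- and 10-year analyses Shankar describes sample different parts of that curve.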

Many factors go into aging, she noted. “It could be the frequency of use, temperature, process variation. All have an effect on aging. There are a lot of different parameters that go into it, and they should be handled in Liberty files so that the data is captured. After that, the libraries go through static timing analysis or timing and placement. Then you can really see the effect of it. That’s going to be quite expensive because it’s a lot of characterization. Most of our customers use a derate, but I call it a tax. It’s a flat tax you’re applying to all of your transistors across all cells and models, and this is where you tend to leave a lot of margin on the table. If you make it specific for each slew or load, or make it very granular, then it becomes more accurate. But there’s a long way to go to get there.”
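The difference between a flat derate and a granular, per-slew/per-load treatment can be seen on a single timing path. In this sketch, which is an illustration rather than how any particular STA tool works, `AGING_FACTOR` stands in for the aged-to-fresh delay ratios an aged Liberty library might provide per slew and load bin, and the flat derate is simply the worst of those ratios applied to every cell. Cell names, delays, bins and factors are all hypothetical.

```python
# Hypothetical cells on one timing path:
# (cell, input-slew bin, output-load bin, fresh delay in ps).
PATH = [
    ("INV_X1", "fast", "light", 12.0),
    ("NAND2_X1", "slow", "heavy", 30.0),
    ("BUF_X2", "fast", "heavy", 18.0),
    ("NOR2_X1", "slow", "light", 25.0),
]

# Hypothetical aged/fresh delay ratios per (slew, load) bin.
AGING_FACTOR = {
    ("fast", "light"): 1.03,
    ("fast", "heavy"): 1.05,
    ("slow", "light"): 1.06,
    ("slow", "heavy"): 1.10,
}

# The "flat tax": one worst-case factor applied to every cell.
FLAT_DERATE = max(AGING_FACTOR.values())


def flat_path_delay(path) -> float:
    """Aged path delay with a single derate on every cell."""
    return sum(delay * FLAT_DERATE for _, _, _, delay in path)


def granular_path_delay(path) -> float:
    """Aged path delay using the factor for each cell's own slew/load bin."""
    return sum(delay * AGING_FACTOR[(slew, load)] for _, slew, load, delay in path)


if __name__ == "__main__":
    flat = flat_path_delay(PATH)
    granular = granular_path_delay(PATH)
    print(f"flat derate: {flat:.2f} ps, granular: {granular:.2f} ps")
    print(f"margin recovered: {flat - granular:.2f} ps")
```

Here the flat derate charges every cell the 10% worst case, while the granular version charges only the slow-slew, heavily loaded NAND2 that much; the gap is the margin left on the table. The cost Shankar points to is that every bin has to be characterized.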

That type of specificity is accurate, but it’s also expensive.

“That’s the catch,” Geada said. “I personally dislike margining strategies. Margining is a good crutch. It’s a way of moving forward when you no longer have models at the appropriate abstraction levels, but it’s still a crutch. It’s not an engineering solution, and you can never margin correctly. If you margin for the worst case, you can’t build a chip that delivers the PPA you need, so you have to compromise between how much risk you’re willing to tolerate and what kind of designs you can sell. There is never a good tradeoff there, particularly when we’re dealing with life-critical applications, because you have to account for the worst case. When a device is used constantly under a heavy-load aging model, particular stress patterns exaggerate things. An Uber-like vehicle, whether fully automated or not, has a completely different use model than the standard family car, which stays parked in a particular state a lot of the time even though its electronics are always somewhat alive. There’s a completely different aging model, and you can’t guard-band both cases correctly. While we understand the physics, we need the parameterization the foundry gives to construct a model that describes how their particular transistors age, but then that model needs to be applied. This is where I disagree with the library approach. The library is easy, but you need to do it context-sensitively, based on what the design is actually doing under the simulation patterns it actually experiences. You need to age in place in the design, not in an idealized situation at the library level.”
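Geada’s robotaxi-versus-family-car contrast can be illustrated with a deliberately simplified aging model. The sketch assumes a power-law threshold-voltage shift, dVt = A·(AF·t_stress)^n, a form common in BTI literature, where t_stress is the time the circuit is actually under stress and AF is an Arrhenius temperature-acceleration factor. The prefactor `a_mv`, exponent `n`, activation energy `ea_ev` and both usage profiles are hypothetical, and the model ignores BTI recovery and voltage dependence.

```python
import math

K_BOLTZMANN_EV = 8.617333262e-5  # Boltzmann constant, eV/K


def arrhenius_factor(temp_c: float, ref_c: float = 25.0, ea_ev: float = 0.1) -> float:
    """Stress acceleration at temp_c relative to ref_c (hypothetical activation energy)."""
    t_k = temp_c + 273.15
    ref_k = ref_c + 273.15
    return math.exp((ea_ev / K_BOLTZMANN_EV) * (1.0 / ref_k - 1.0 / t_k))


def delta_vt_mv(years: float, active_fraction: float, temp_c: float,
                a_mv: float = 3.0, n: float = 0.2) -> float:
    """Hypothetical power-law Vt shift in mV: a_mv * (AF * stress_hours) ** n."""
    stress_hours = years * 8760.0 * active_fraction
    return a_mv * (arrhenius_factor(temp_c) * stress_hours) ** n


# Hypothetical usage profiles: (fraction of time under stress, junction temperature in C).
PROFILES = {
    "robotaxi": (0.80, 60.0),
    "family car": (0.05, 40.0),
}

if __name__ == "__main__":
    for name, (active, temp) in PROFILES.items():
        print(f"{name:>10}: dVt after 10 years = {delta_vt_mv(10, active, temp):.1f} mV")
```

Under these assumptions the two identical chips end up with noticeably different shifts, so a guard band sized for one profile over-margins or under-margins the other, which is the argument for aging in place using the design’s own activity.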

And this is where better utilization of data can help.

“The foundry now provides the added piece: the aging equations and the parameters that affect aging over a period of time,” said Mentor’s Curtis. “If a device is going to break down, it’s going to happen very early, not necessarily late. So many customers tend to age in very short increments, from zero to two years, and then maybe spread it out after that because the breakdown happens very early.”
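Why front-load the time steps? If degradation follows a power law, dVt ∝ t^n, with a small exponent (an assumption here; n = 0.2 is a hypothetical value, and Curtis doesn’t name a model), most of the eventual shift accrues early. The sketch builds a hypothetical schedule with quarter-year steps through year two and two-year steps after that, and reports what fraction of the 10-year shift each point has reached.

```python
def aging_schedule(fine_until: float = 2.0, fine_step: float = 0.25,
                   end: float = 10.0, coarse_step: float = 2.0) -> list:
    """Hypothetical aging time points (years): fine steps early, coarse steps later."""
    points = []
    t = 0.0
    while t < fine_until - 1e-9:
        t = round(t + fine_step, 6)
        points.append(t)
    while t < end - 1e-9:
        t = round(min(t + coarse_step, end), 6)
        points.append(t)
    return points


def fraction_of_final_shift(t: float, end: float = 10.0, n: float = 0.2) -> float:
    """Under dVt ~ t**n, the shift at t as a fraction of the shift at `end`."""
    return (t / end) ** n


if __name__ == "__main__":
    for t in aging_schedule():
        print(f"year {t:5.2f}: {fraction_of_final_shift(t):.0%} of the 10-year shift")
```

With n = 0.2, about 72% of the 10-year shift has already occurred by year two, so the coarser later steps lose relatively little information.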

Specifically, Sawant explained there are different solutions to model aging. “One solution adds derates and margining. Other solutions are more accurate, which is where you add the aging data in the library. Library aging is more granular, because there is aging data for each individual cell in your design, versus derate where you apply a flat tax on all cells in the design, so it’s not granular and hence it’s pessimistic.”

An aged library can be much more granular. “Based on our experiments, in terms of setup timing we’ve seen about 20% improvement when we compare this to a derate-based solution,” he said. “In terms of hold, we’ve seen more hold violations with library-based aging, which means it helps to capture critical fixes that might have been missed using a derate-based solution. In terms of leakage, we see about 7% to 11% improvement. Again, the derate-based solution does not model the leakage improvement that comes with aging. With aging, there’s a VT shift happening, an increase in the threshold voltage of the device. The derate-based solution does not capture that. Aged libraries capture it and help pass that benefit on to the designers. Also, slew degradation is a very critical effect of aging that’s not captured in the derate. So essentially there are two ways of doing things. One is the easier, less expensive way. The other is a more accurate way of modeling aging. It typically depends on the customer. Right now, aging is becoming a topic of hot interest, but not a lot is known about it.”
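Sawant’s leakage point follows from the exponential dependence of subthreshold leakage on threshold voltage, I_leak ∝ exp(-Vt/(m·kT/q)), where m is the subthreshold slope factor. An aging-induced increase dVt therefore scales leakage by exp(-dVt/(m·kT/q)), something a pure delay derate has no way to express. The sketch assumes that textbook relation; the slope factor of 1.5, the temperature and the dVt values are hypothetical, and the result is not meant to reproduce the 7% to 11% figure quoted.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant / electron charge, V/K


def thermal_voltage(temp_k: float) -> float:
    """kT/q in volts."""
    return K_OVER_Q * temp_k


def aged_leakage_ratio(delta_vt_v: float, temp_k: float = 300.0, slope_m: float = 1.5) -> float:
    """Aged/fresh subthreshold leakage for a threshold-voltage increase of delta_vt_v volts."""
    return math.exp(-delta_vt_v / (slope_m * thermal_voltage(temp_k)))


def derate_leakage_ratio(_delta_vt_v: float) -> float:
    """A delay-only derate leaves leakage unchanged, whatever the Vt shift."""
    return 1.0


if __name__ == "__main__":
    for dvt_mv in (2, 5, 10):
        aged = aged_leakage_ratio(dvt_mv / 1000.0)
        print(f"dVt = {dvt_mv:2d} mV: aged-library leakage x{aged:.3f}, "
              f"derate leakage x{derate_leakage_ratio(dvt_mv / 1000.0):.3f}")
```

The same Vt increase that slows the transistor (which a derate does approximate) also cuts its leakage, and only a model that carries the Vt shift itself can report both effects.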

Ideally, a customer would go with just one approach, Sawant said. “Because margining is a crutch, to start, when there’s no data available, you would start with a derate-based approach just to get going and then move on to an aged library, which is more accurate.”

Changing the mindset going into design may be another challenge. Curtis observed that customers initially design with margin in mind, but increasingly they cannot afford to leave margin on the table because of the cost of fabrication and the impact on power and performance. “Aging is one of the areas they are looking at to basically squeeze every bit of margin they can out of a design. It’s a much more accurate approach than just derating the whole design.”

Moreover, margin typically isn’t just one thing. It’s actually a stack.

“The foundry, with the models that they give us, includes a little bit of padding to cover themselves,” said ANSYS’ Geada. “And then the library vendor adds a little bit of padding, and nobody talks about what that is, but everybody adds up this stack of margin along the way. As a result, we’re no longer really approaching what silicon can give us. The approach that we have, the engineering that we need to do to get workable devices reliably, inherently leaves a lot of margin on the table. The world has become so competitive that we now need to peel off all of these layers, get closer to bare metal and really provide an advantage, because that’s how you differentiate your design from somebody else’s.”
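Geada’s stack of margin compounds. If each party’s padding is modeled as a multiplicative delay factor (an assumption; in practice padding takes many forms), the layers multiply rather than merely add. The sketch uses hypothetical padding values for the foundry model, the library and a design-level aging derate, and shows how much of a clock period is left for actual logic.

```python
import math

# Hypothetical padding at each layer, as fractional delay increases.
MARGIN_STACK = {
    "foundry model": 0.03,
    "library": 0.02,
    "aging derate": 0.05,
}


def stacked_factor(paddings) -> float:
    """Combined delay multiplier when each layer pads the one below it."""
    return math.prod(1.0 + p for p in paddings)


def usable_fresh_delay_ps(clock_period_ps: float, paddings) -> float:
    """Largest un-padded path delay that still meets the clock after all padding."""
    return clock_period_ps / stacked_factor(paddings)


if __name__ == "__main__":
    factor = stacked_factor(MARGIN_STACK.values())
    print(f"combined padding: {100 * (factor - 1):.2f}%")
    print(f"usable delay at 1 GHz: {usable_fresh_delay_ps(1000.0, MARGIN_STACK.values()):.1f} ps")
```

With these placeholder numbers, three modest paddings together consume roughly a tenth of a 1 GHz cycle before any real logic is placed, which is the layer-by-layer cost Geada wants to peel back.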

Related Stories
Why Chips Die
Semiconductor devices face many hazards before and after manufacturing that can cause them to fail prematurely.
Chip Aging Becomes Design Problem
Assessing the reliability of a device requires adding more physical factors into the analysis, many of which are interconnected in complex ways.
Minimizing Chip Aging Effects
Understanding aging factors within a design can help reduce the likelihood of product failures.
Aging In Advanced Nodes
Why aging and reliability no longer can be addressed with margining in finFETs and automotive applications.



4 comments

Jon Peddie says:

What design precautions were taken for the long-distance, long-running space probes that, in addition to aging, have to withstand cosmic and planetary radiation?

Charles R. says:

Like your article. It makes very good points. I think we are reaching the point where even consumer products must have a built-in self-check or even some type of calibration cycle, especially with RF devices, to address aging and drift over time.

David Botma says:

The article discusses aging and begins to include a “real world” approach to estimating the safe and useful lifetime of a circuit. The next step is to identify the mitigation of aging and the avoidance of circuit failure consequences. Should there be an estimated lifetime provided with a circuit? Or perhaps include an elapsed clock feature that can be queried during circuit usage?

Joe Dickson says:

The density of PCBs is not allowing redundant sensing circuits, as the challenge of even routing the basic required circuitry is substantial. Surrounding sensitive signal vias with ground vias is also increasing. These pressure points that were solutions for previous PCB technologies are not available for the long-life requirements of 5G applications. I see only two options today: dual systems with switchover capability, or new reliability standards (and materials) for PCBs. This isn’t really being addressed yet at an industry level. Only certain high-performance applications are looking at methods to solve this issue.
