Unknowns Driving Up The Cost Of Auto IC Reliability

More testing, metrology, data are required, but some chips will still fail.


Automotive chipmakers are considering a variety of options to improve the reliability of ICs used for everything from sensors to artificial intelligence. But collectively they could boost the number of process steps, increase the time spent in manufacturing and packaging, and stir up concerns about the amount of data that needs to be collected, shared, and stored.

Accounting for advanced process node variation, electrical and mechanical stress, and accelerated aging in an extreme environment collectively add up to a huge challenge, for which there is no precedent in mass-produced semiconductors. No one has ever used 5nm chips for safety-critical applications in extreme environments, and there is no industry learning about the behavior of different types of sensors, logic, and memory crammed into a single packaged system that relies on novel partitioning and prioritization schemes. This is made worse by the fact that automotive chips typically are exposed to various types of noise (power, electromagnetic interference), continuous vibration, and wide swings in temperature over long lifetimes.

Even under the most controlled conditions — especially inside a data center, where the temperature of server racks is closely monitored and controlled through air or liquid cooling — advanced-node chips can behave differently from each other. Those differences can become more pronounced over time, too. But in automotive applications, none of those controls are present, and potential problems are more difficult to predict. The fallback in new applications has always been industry learning, but in the case of automotive, that learning is limited at best.

“A few years ago, the traditional automotive companies were comfortable with mature technology nodes,” said Lee Harrison, automotive IC test solutions manager at Siemens EDA. “They had massive amounts of data that told them how long these systems would last and what the aging process was like. A lot of them still had their own fabs, as well, so they were pretty confident in their technology. That’s changed immensely. So instead of 120nm processes, now they’re at 14, 7, and 5nm. They don’t have the data to back that up. They don’t know if these things are going to last 1 year or 10 years. So we’re working with them now to effectively build those data models, which will allow them to predict when these chips are going to fail.”
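Data models like the ones Harrison describes typically start from standard accelerated-life math. As a minimal sketch — the activation energy, temperatures, and stress duration below are illustrative placeholders, not any vendor's actual figures — the Arrhenius acceleration factor relates hours survived in a high-temperature stress test to expected hours at operating temperature:

```python
import math

K_BOLTZMANN_EV = 8.617e-5  # Boltzmann constant in eV/K

def arrhenius_af(t_use_c, t_stress_c, ea_ev=0.7):
    """Arrhenius acceleration factor between stress and use temperatures.

    ea_ev is the activation energy of the failure mechanism in eV;
    0.7 eV here is only a generic illustrative value.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / K_BOLTZMANN_EV) * (1.0 / t_use_k - 1.0 / t_stress_k))

# e.g. 1,000 hours survived at a 125 degC stress test, extrapolated to 55 degC use
af = arrhenius_af(t_use_c=55, t_stress_c=125, ea_ev=0.7)
projected_use_hours = 1000 * af
```

Real lifetime models layer many such mechanism-specific terms (electromigration, NBTI, oxide breakdown) on top of measured field data; this single-mechanism formula is only the simplest ingredient.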

Jay Rathert, senior director of strategic collaborations at KLA, agreed. “How can the whole industry take yield learning — which turns into reliability and used to take years — and shrink it down to a period of months so we can be confident in putting a 5nm chip in a mission- and safety-critical role? That is the big question. It’s what’s driving the discussion across multiple companies. What can we do differently? One of the interesting things the OEMs are looking at is not just maturity, but any change. They say they want continuous improvement, but they also are risk-averse. So they want to be able to qualify change so it brings improvement in a reliable, repeatable, yet accelerated way,” he said.

Fig. 1: Not all latent defects will ruin a chip. Source: KLA

Business issues and benefits
The market dynamics for reliability are becoming increasingly complicated. In the past, the onus was on the OEM to make sure everything worked. Many automakers have carried that business model over as they swap mechanical for electrical parts, and add more driver-assistance features into vehicles. But as they make those transitions, they also are drawing from a wider sphere of new technologies and a supply chain filled with many different players. That makes it more difficult to characterize the various components, because they need to work with other components that were created without knowing the exact context in which they will be used. In many cases, development timelines don’t mesh, and because these designs are changing so often and so quickly, they may not be as compatible as automakers expect them to be.

The solution is to build in margin, but that’s expensive, both from a financial perspective as well as a resource overhead standpoint.

“Today, it’s being handled with redundancy,” said Mike McIntyre, director of software product management at Onto Innovation. “When you used to design a car, you’d say that axle needs to support 2,500 pounds of load, so I’m going to design it for 7,500 pounds. With electronics, you say you’re going put in a backup system. The problem is that’s very costly to create, and you can’t think of that like a structural engineer where you add in those safety margins — especially at the bleeding edge. You can add a little more margin at the older technology nodes, but not always.”

On the flip side, electrification and digitization of data can provide a more detailed picture of what is going wrong even before there is a problem, and that can be traced back to the source of the issue on a very granular level — a section of a wafer, timing of individual processes in manufacturing, why one die or chiplet is better than the one next to it, and so on. Moreover, that data can flow in multiple directions, from the field back to the manufacturer and OEM, and even across the supply chain when and where it makes sense.

“When you look at automotive design optimization, you’re typically talking to a Tier 1 company,” said Steve Pateras, senior director of marketing and business development at Synopsys. “It’s the same for production optimization, where you’re talking to a Tier 1 or even the OEM. But when you get into the field, you’re talking to rental agencies and companies like Uber, because you want to optimize performance over time. So the cone of opportunity expands as you go down into later lifecycle stages. That also means there’s a lot more complexity in terms of business models and the selling process for the chip ecosystem, but the opportunity is much, much bigger.”

One result has been a subtle but noteworthy shift toward resiliency. In the past, the primary focus of OEMs was zero defects, an obvious carryover from the mechanical world. That goal remains important as electronic content increases, but latent defects will always exist. More often than not, they go undetected and never cause a problem. Under extreme stress, however, they can result in a component failure.

The causes of these defects range widely, from process variation, which is endemic at advanced nodes, to chemical mechanical polishing, which can thin an already-thin dielectric to the point where it is more susceptible to heat or vibration. To make matters worse, different use cases and unique architectural demands can affect whether those chips work as expected throughout their lifetimes.

So rather than just focusing on zero defects, which is always a goal, there is a growing need for systems to be able to continue working — especially in conjunction with other systems. Some of this is set out in the ISO 26262 standard, which provides for graceful failover in case of a problem. But resilience can push well beyond that standard, allowing vehicles to continue operating without any noticeable impact throughout their lifetimes.

“We’ve got extensive capability in the manufacturing test side,” said Siemens’ Harrison. “But that only ensures these devices go out the door as defect-free as possible. And then we take over with system test and embedded analytics. Those analytics also look at other things, like bad software, cybersecurity attacks, and any other strange things.”

This is no longer optional, particularly in safety-critical designs, where a faulty chip could result in an accident. “Automotive companies create a mission profile for a device, and they show the curve of where the device was expected to be and where it actually is,” said Uzi Baruch, chief strategy officer at proteanTecs. “But when there’s a variance, they don’t always know how to explain it. What we’re doing is accurately projecting statistical variance, but based on actual measurements and correlation to usage. If you’re buying an expensive car and it gets stuck, you would be annoyed that it failed, even if statistically 98% of those same cars didn’t have that problem. We need to be able to assess each device in the car, and compare that to how you expected it to behave. It’s not about mean time to failure for all of the cars. It’s about each specific car, which may be operating in different environments, under different operating conditions and different driving styles.”
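The per-device assessment Baruch describes can be sketched as a simple outlier test of one device's measured parametric drift against the fleet's distribution. The function, units, and threshold below are hypothetical illustrations, not proteanTecs’ actual method:

```python
from statistics import mean, stdev

def flag_outlier_device(fleet_drifts_mv, device_drift_mv, z_threshold=3.0):
    """Flag a device whose measured drift deviates from the fleet trend.

    fleet_drifts_mv: drift readings (e.g. a Vmin shift in mV) from
    comparable devices at the same mileage/usage point.
    Returns (flagged, z_score).
    """
    mu = mean(fleet_drifts_mv)
    sigma = stdev(fleet_drifts_mv)
    z = (device_drift_mv - mu) / sigma
    return abs(z) > z_threshold, z

# Fleet ages normally; one device drifts far faster than its peers.
fleet = [10.1, 9.8, 10.4, 10.0, 9.9, 10.2, 10.3, 9.7]
flagged, z = flag_outlier_device(fleet, device_drift_mv=14.5)
```

In practice the comparison would also be conditioned on environment and usage (temperature history, driving style), as the quote notes, rather than a single fleet-wide distribution.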

In effect, this is more customized and personalized data, and it can have a big impact on user experience and brand affiliation. It also can play a significant role in improving safety, because rather than one-size-fits-all rules, the data can instigate a workaround in hardware and software that keeps the vehicle from stopping or crashing. In addition, relevant data can be fed back to the carmaker to determine what went wrong and how to avoid future problems — not just for the chip, but in the context of packaging and other systems.

“If you look at the statistics in the automotive industry, more than 50% of failures are due to packaging,” said Andrzej Strojwas, CTO at PDF Solutions. “That does not end at wafer manufacturing. It includes the whole spectrum, like final test, packaging, and so on. And the assembly process is very complex. That’s important for many industries, especially automotive and military. We need to track where a particular chip came from, what part of the wafer, but that’s just the beginning of the story. In the assembly process, you have to understand substrates and what is happening with the bonds when you do the attachment. And this data has to be represented and then acted upon to predict early failures.”
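The genealogy Strojwas describes — tying a packaged part back to its lot, wafer, die site, substrate, and assembly tools — can be sketched as a minimal traceability record. The field names here are illustrative, not any standard’s actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class DieTraceRecord:
    """Minimal genealogy record linking a packaged part to its origin.

    All field names are hypothetical examples of the kinds of data
    that would need to be tracked and retained.
    """
    lot_id: str
    wafer_id: str
    die_x: int           # die position on the wafer
    die_y: int
    substrate_lot: str   # package substrate genealogy
    bond_tool_id: str    # which bonder performed the die attach
    test_results: dict = field(default_factory=dict)

rec = DieTraceRecord("LOT123", "W07", die_x=14, die_y=3,
                     substrate_lot="SUB88", bond_tool_id="BND-02")
rec.test_results["final_test"] = "pass"
```

The point of keeping wafer coordinates and tool IDs together is that an early field failure can then be correlated back to a wafer region or a specific assembly step, as the quote suggests.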

Data retention and utilization
That data is extremely useful, but it also raises other questions, such as how long data is required to be kept for the various components in a system design. This becomes especially important with IP from startups, which can include everything from software to chiplets or integrated chipsets.

Danielle Baptiste, vice president and general manager for software at Onto Innovation, said that with banks and other high-security applications, data is required to be kept for 5 to 10 years. But with automotive and some industrial and aerospace designs, that data may have to be kept much longer, raising security and competitive issues.

“Data retention has lots of complexities around how certain pieces of data were interlocked with each other, when you could put them in cold storage, or completely purge them,” Baptiste said. “The other side of this is focused on the fact that I may need to use that data within the next 12 months because I want to do trends and analytics. It might be a little bit longer than a year, depending upon how many different fabs are involved or your relationship with your customers, and how many different vendors that end up involved in the whole traceability aspect of the solution. But what you share between those different segments — customer and vendor — becomes interesting from a security standpoint. How much of the data am I revealing? Is it just pure data, or is it my process? And are you making sure that we’re able to shield that effectively?”

What’s also changed is how that data is being used. “You want to share data across the lifecycle stages,” said Synopsys’ Pateras. “We haven’t talked about this in the past. If I have knowledge about my wafer level test, or even design characterization information, I may want to use this in the field to understand trends. If I get field failure information, or delay of a signal because the signal path degrades over time, I want to be able to cross-correlate that to my original wafer data and maybe even drive that back into the design. So there’s definitely this desire to feed data forward and backward. That works especially well if you’re a vertically integrated company. But if you’re not vertically integrated, you have to figure out how to share some of that data. That’s still somewhat of an unsolved problem.”

More of everything
One of the other changes underway is simply doing more inspection, metrology, and testing. None of these are new, but there is more of each, it is done for longer periods of time, and often at more insertion points in the design-through-manufacturing flow.

“We see a real trend from the customers we’re serving for ‘severe’ metrology,” said Samuel Lesko, product manager in Bruker’s Nano Services Division. “Before they were batch sampling and not really paying attention to quality control as closely because the processes were stable. Now they’re asking for multiple systems that are fully automated and really controlling every single wafer. They need to be able to assure their customers, through the supply chain, that they have manufacturers that are complying with a specification. Within metrology, there is a total shift toward more consistent and systematic control. Every wafer is stamped that it has the right etching depth, the right height for each layer being deposited, and the right depth for a via used for high-power electronics or for sensors used in automotive.”

The same is happening in inspection, where automakers are requiring 100% coverage. “Even if you have done everything functionally, and you checked it all out before the chip leaves the fab, as it goes through subsequent downstream processing — heating things up, soldering, literally gluing parts together, wire bonding — things shift,” said Subodh Kulkarni, CEO of CyberOptics. “So even though you’re not supposed to move anything, things apparently move enough to create a problem. The chips or packages are coming down the line and everything looks okay functionally, but by the time it’s shipped out into the field, you may have infant mortality or a failure six months down the road because the I/O check was done before everything settled into place. Everything is not as settled as you would think it should be.”
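The infant mortality Kulkarni mentions is commonly modeled with a Weibull distribution whose shape parameter is below 1, meaning the population’s failure rate is highest early in life and then declines — exactly the region that burn-in and stress screening try to clear out. A minimal sketch with purely illustrative parameters:

```python
def weibull_hazard(t, beta, eta):
    """Instantaneous failure rate h(t) of a Weibull(beta, eta) population.

    beta < 1 gives a decreasing hazard (infant mortality from latent
    defects); beta > 1 gives an increasing hazard (wear-out).
    """
    return (beta / eta) * (t / eta) ** (beta - 1)

# beta = 0.5: latent-defect-dominated population with decreasing hazard
early = weibull_hazard(t=10, beta=0.5, eta=10_000)    # 10 hours in
later = weibull_hazard(t=1_000, beta=0.5, eta=10_000)  # 1,000 hours in
```

Because the hazard falls with time for beta < 1, a short stress screen before shipment removes a disproportionate share of the parts that would otherwise fail in the customer's hands.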

As a result, every process is more focused, and often more time-consuming and expensive. “We do see significant accuracy requirements that are demanding more from a tester than previously,” said Chuck Carline, senior manager of factory applications in Teradyne’s Precision Power & Analog Division. “Because this is a safety space, there is an increased test intensity for high-voltage stress. The idea is that it will activate defects in the silicon that might otherwise be hidden. It’s a way to weed out latent defects, replacing what would previously be a burn-in step. It’s a different way of attacking the same problem.”

That adds other potential concerns, though. High-voltage stress can damage a chip or package, so it needs to be done with great accuracy. “Everyone wants to stress it as high as their process will allow,” Carline said. “It’s a little bit of each manufacturer’s secret sauce as to how long they do that, because you need to make sure you’re not damaging the part. They will do things like post-stress leakage to make sure parts are still good after the test, and that there are no issues with the gate oxide in the BCD (bipolar-CMOS-DMOS) process for analog chips.”

Others point to similar requirements. “The IC manufacturers will have more stringent requirements of the IC tester,” said Don Blair, business development manager at Advantest. “Automotive SoCs need high-accuracy DC and high power, so high voltage and currents, as well. That’s all required in an SoC tester. No one will buy the tester unless it meets all those requirements. We’re basically testing everything in one insertion. But for reliability, there may be more insertions, like in baking ovens. That’s another way to test the robustness of the chips. There’s also something called ‘shake-and-bake,’ where they have both mechanical vibration and heat to identify the weak chips.”

Once Tesla began gaining market share, automotive OEMs had little choice but to jump on the electrification bandwagon and race toward increasing levels of autonomy. But this turned out to be far more complicated than it initially sounded, despite the size of the opportunity and the risk of inaction. It took nearly a decade for carmakers to recognize what it would take to create an autonomous vehicle and what can go wrong along the way.

This is reflected in the amount of funding pouring into startups for everything from autonomous systems and battery technology to displays, power semiconductors, and sensors. Still, not all of those companies will survive, and what becomes of their IP when they are sold, merged, or disbanded is unknown at this point. And while the various manufacturing processes have been proven to the point where defectivity is minimal and yield is acceptable, plenty of unknowns remain. In fact, they increase as automakers progress from L1 and L2 all the way to L5, pushing reliability and safety in directions for which there is no precedent. And that should keep the entire automotive chip supply chain busy for years to come.


Martin Buehring says:

Great article, Ed. It hits on the next testing and diagnosis challenges of semiconductor aging. Without sufficient history, it remains an unknown for now.

Ray Barrett says:

Super awesome read. Enjoyed your turmoil, as once upon a time I worked for a startup that was fully integrated from growth to packaging. Of course, we were growing AlN, and every vendor we dealt with was confident they could do whatever we needed. Example: a polishing system. Three different vendors said, yup, we can do this or that with wafers. Turns out they had no data with AlN. We had to create every process from scratch. Long story long, changing technology is hard work. Keep up the good fight.
