Sensing Automotive IC Failures

Improving reliability and yield will require new ways to identify defects in chips.


The sooner you detect a failure in any electronic system, the sooner you can act. Together, data analytics and on-chip sensors are poised to boost quality in auto chips and add a growing level of predictive maintenance for vehicles.

The ballooning number of chips in cars makes it difficult to reach 10 defective parts per billion for every IC that goes into a car. And sustaining that over a 15-year lifetime is harder still. Nevertheless, chipmakers and systems vendors are getting much better at identifying defects before they can cause a problem, both at time zero and years later on the road.
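A rough calculation shows why chip count matters so much. The sketch below assumes defects are independent and that every chip just meets the same defect rate. The chip counts are hypothetical values chosen for illustration, not figures from any automaker.

```python
import math

def p_car_has_defect(chips_per_car: int, dppb: float) -> float:
    """Probability that at least one chip in a car is defective.

    Assumes independent defects and the same rate for every chip.
    """
    p_chip = dppb / 1e9
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(chips_per_car * math.log1p(-p_chip))

# Hypothetical chip counts, for illustration only.
for n in (100, 1000, 3000):
    p = p_car_has_defect(n, dppb=10)
    print(f"{n:>5} chips: about {p * 1e6:.1f} affected cars per million")
```

At these rates the per-car risk grows almost linearly with the number of chips, which is why a per-chip target that sounds tiny still matters at the vehicle level.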

This is a non-trivial achievement. Detecting defects at advanced process nodes is proving extremely challenging due to subtle physics that cause small changes in electrical parameters. While those may go unnoticed at older nodes, at 7nm and below tolerances become tighter and the effects of exceeding those tolerances can cause quality and reliability issues. These issues may manifest in premature aging and signal interruption.

Identifying these subtle defect mechanisms using part average testing that relies on a single metric is insufficient. The costly alternative, however, is more testing, which likely lowers yields. And for reliability defects, this approach assumes that a test can be developed to identify these elusive defects.

So while traditional testing approaches remain essential, almost everyone recognizes they need to be supplemented with in-circuit data. By combining those two approaches, high-level defects can be identified early, and testing for less-obvious and in some cases inaccessible defects can continue throughout the lifetime of a chip. But this all needs to be baked into the IC design as well as the manufacturing and test processes, which is why fabs, IC designers, test houses and data analytics companies have been working hard to meet these needs.

Automakers’ expectations
Automotive ICs always have been tightly controlled for quality. However, that was simpler when chips were developed using long-established process nodes. As more AI is added into vehicles with assisted and ultimately autonomous driving, the required processing power increases dramatically, and the only way to achieve that in a given power envelope is to increase the transistor density.

And this is where things begin to get really interesting, because no one has ever designed a leading-node chip that would be used outside of a highly controlled environment. The initial idea was that all of these chips would be redundant, in case one of them failed. But as the cost and weight of redundant parts is better understood, rather than redundancy most automakers have focused more on reliability.

“There is a growing volume of semiconductors in automobiles, driven by electrification, connectivity and autonomy,” said Jay Rathert, senior director of strategic collaborations at KLA. “The advanced chips are being asked to do more and more — important roles that are both mission-critical and safety-critical. Yet when we spoke to the head of R&D at one automaker, he said, ‘You really haven’t built a car until you’ve argued over a nickel’s price difference on a part, because margins are so tight.’ So the last thing they want to do with these expensive, high-end chips is put them in there in duplicate for redundancy.”

This is particularly true for the centralized logic in these devices. Advanced levels of ADAS (4 and 5) require large computational engines for AI/ML applications that only can be supported by advanced process nodes. In fact, some automakers are now working on 5nm chips, but some of the companies developing these chips have little experience in this market segment.

“You have start-up companies designing AI acceleration SoCs,” said Tom Wong, director for marketing design IP at Cadence. “These new entrants do not have the deep bench of automotive knowledge compared to established automotive SoC suppliers. Oftentimes, they rely on third-party IP partners to deliver key IP blocks so they can integrate them into a final SoC.”

Chips designed for consumer electronics and data centers are changed out every two to four years. Chips designed for automotive applications, in contrast, need to work reliably for 10 to 15 years. Robotic cars add even more requirements. “For automotive SoCs, there are a number of fundamental criteria addressing quality, reliability and functional safety,” Wong said. “These requirements cascade down to the third-party IP itself.”

AEC-Q specifies test conditions and practices to meet automotive quality and reliability goals. For CMOS ICs, reliability includes transistor-level and interconnect characteristics. Ideally, the manufacturing test processes can screen defective die/parts for time-zero failures and end-of-life failures.

This is different from many other market segments, where finding early life failures at time zero and with reliability product characterization guaranteeing end-of-life is good enough. But in automotive the lifetime is longer, the operating environment is harsher in terms of temperature, voltage and mechanical stress, and a wider range of functional operating conditions must be considered. All the above, combined with the advanced process nodes, means business as usual is no longer good enough. And this is where on-chip sensors increasingly become so important.

More data from within a chip
The adage among statisticians is, “You’re only as good as your data.” On-chip sensors, also known as in-circuit sensors, provide deeper and more targeted data than standard manufacturing test metrics offer. That data provides the necessary intelligence upon which both engineers and the engineered circuits can act.

In the beginning, on-wafer sensors focused on process monitoring and only resided within scribe line structures. Their measurements continue to support wafer acceptance test (WAT) and provide feedback on fabrication process health based on direct measurements of transistor characteristics and interconnects, contact and via structural integrity, and printability.

Process variability across the wafer prompted IC designers to employ on-chip sensors to assess the process skew for that die. The relatively simple ring oscillator became the de facto circuit design of choice. The ease of measuring frequency as a proxy for process skew has enabled simple insights into process skew with the caveat that this is an inferred insight.

Fig. 1: Ring oscillator illustrated with three inverters. Source: Matthew H. Plough CC SA 1.0
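The link between ring oscillator frequency and process skew can be sketched with the standard first-order relation f = 1 / (2 x N x t_pd), where N is the number of inverter stages and t_pd is the delay per stage. The function names, the nominal values and the 5% tolerance below are assumptions for illustration. As the text notes, the result is an inference about the process, not a direct measurement of it.

```python
def ring_osc_freq_hz(stages: int, stage_delay_s: float) -> float:
    """First-order oscillation frequency of an N-stage inverter ring."""
    if stages < 3 or stages % 2 == 0:
        raise ValueError("this sketch expects an odd stage count of at least 3")
    return 1.0 / (2 * stages * stage_delay_s)

def inferred_stage_delay_s(stages: int, measured_freq_hz: float) -> float:
    """Invert the same relation to estimate per-stage delay from frequency."""
    return 1.0 / (2 * stages * measured_freq_hz)

def classify_skew(measured_hz: float, nominal_hz: float, tol: float = 0.05) -> str:
    """Label a die fast/slow/typical; the 5% tolerance is a made-up value."""
    ratio = measured_hz / nominal_hz
    if ratio > 1 + tol:
        return "fast"
    if ratio < 1 - tol:
        return "slow"
    return "typical"

# Hypothetical 101-stage oscillator with a 10 ps nominal stage delay.
nominal = ring_osc_freq_hz(101, 10e-12)
measured = 0.92 * nominal  # pretend reading from one die
print(classify_skew(measured, nominal), inferred_stage_delay_s(101, measured))
```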

Yet as process nodes have advanced, on-chip sensors have become more sophisticated and more widely used within the IC design to detect process variability that can adversely impact chip performance. “Process variability has always been there,” said Andrzej Strojwas, chief technologist at PDF Solutions. “What has been happening is that due to the complexities from shrinking devices, design rules have given little margin for error. It has become much harder to characterize a semiconductor process to fully comprehend how best to design circuits for large devices. Circuit design and process interactions have significantly increased. For instance, systematic layout sensitivities (geometric patterns/spatial relationships) weren’t a concern until the 28nm process node.”

On-chip sensors are not restricted to just comprehending process variation. Designers use voltage and temperature sensors to modulate circuit/system behavior, as well as monitor chip-health in the system.

“You can place a number of these tiny sensors throughout the SoC,” said Cadence’s Wong. “They can be used for DVFS, slowing down the clock to reduce power consumption to reduce temperature.”
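A minimal sketch of that idea is a controller that steps the clock and voltage down when the hottest on-chip sensor crosses a threshold, and steps back up once the die cools. The operating points, thresholds and function name below are hypothetical. A production DVFS governor would also weigh workload, voltage droop and safety constraints.

```python
# Hypothetical (clock MHz, supply V) operating points, fastest first.
OPERATING_POINTS = [(2000, 0.90), (1500, 0.80), (1000, 0.70)]

def next_level(level: int, sensor_temps_c: list,
               hot_c: float = 95.0, cool_c: float = 80.0) -> int:
    """Pick the next operating-point index from distributed sensor readings.

    The gap between hot_c and cool_c adds hysteresis so the controller
    does not bounce between levels on every reading.
    """
    hottest = max(sensor_temps_c)
    if hottest >= hot_c and level < len(OPERATING_POINTS) - 1:
        return level + 1  # slow down to shed power and heat
    if hottest <= cool_c and level > 0:
        return level - 1  # cool enough to speed back up
    return level

level = 0
for temps in ([70.0, 96.5, 88.0], [85.0, 90.0, 84.0], [72.0, 78.0, 75.0]):
    level = next_level(level, temps)
    print(temps, "->", OPERATING_POINTS[level])
```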

Such operational sensors can chronicle a chip’s in-use history. That data, in turn, can be combined with other chip data that would enable analysis of relationships between in-use, circuit design and process. Put simply, more data from different sources can be used to identify patterns of behaviors and much finer groupings of chip behaviors.

“In our approach we classify the chips into families based upon inferred data from on-chip Agents, from which we extract many dimensions of process and design behavior,” said Evelyn Landman, CTO and co-founder of proteanTecs. “We can say they will behave very similarly around 1 sigma for many, many parameters. So it’s a much finer resolution than doing it by grouping per wafer.”

These refined chip groupings enable more sophisticated data analytics, which can guide pass/fail decisions and dictate adaptive testing steps.

While ring oscillator technology continues to be used, IC designers and data analytic companies now embrace a wider variety of on-chip sensors. This sensor menagerie can be used to comprehend what is going on in a particular part of a die, circuit or subsystem.

System-level learnings from your car
Despite defectivity targets, the automotive industry is preparing from a system perspective for reacting to a failure. With the projected assisted driving features in mind, ISO 26262 began to specify IC component requirements for functional safety as far back as 2011. The standard requires on-chip monitoring to detect random hardware failures due to extrinsic or intrinsic forces.

“From an IP perspective, we provide the capabilities from a functional safety standpoint in addition to qualification and reliability,” said Synopsys Senior Staff Product Marketing Manager Faisal Goriawalla. “These hooks enable the SoC’s different test phases. A safety manager will initiate, manage and schedule these tests. There is a safety network to provide the infrastructure to bring together all the mission-critical memories, logic, PHYs, as well as the in-field mission-mode testing.”

That’s one side of reliability testing. In the field, periodic testing is required to support functional safety, similar to the kinds of tests that an airplane pilot runs prior to takeoff. But because no one will sit around and test their vehicle every time they hit the ignition, those tests need to be done periodically, such as while charging or while idling at stoplights, or when some aberrant behavior trips an internal alarm.

The periodic testing required to support in-field functional safety provides on-chip sensor data. More analytics potential exists in functional safety periodic testing due to redundancy built into IP blocks and subsequent self-repair.

“The test solutions that our customers are using for functional safety have a certain degree of self-repair built in,” said Lee Harrison, Automotive IC Test marketing manager at Mentor, a Siemens Business. “The up-and-coming modern automotive devices delve into AI, which may have hundreds or thousands of identical processor cores. If you find a defect in a core, then you can switch out that core because you have two or three spares.”

This switch can occur during manufacturing test to boost yield, or while driving the car to handle failures. This incremental repair strategy for system failures generates insights into reliability, as well.

“That data becomes very useful,” Harrison said. “You can start to see how defects appear over time. ‘Am I having to repair additional memory elements, or am I having to swap out different cores?’ We can start reporting that data, and whether it’s the OEMs or Tier 1s looking at the data depends upon where it sits in the ecosystem.” The data can now be made available.
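The bookkeeping behind that kind of repair reporting can be sketched as follows. The class, its fields and the core names are hypothetical and do not reflect any vendor's implementation. The point is only that every swap leaves a timestamped record that can later be counted per time window.

```python
from dataclasses import dataclass, field

@dataclass
class CoreArray:
    """Hypothetical model of identical cores plus spares, with a repair log."""
    active: list
    spares: list
    repair_log: list = field(default_factory=list)

    def swap_out(self, core_id: str, hours_in_field: float):
        """Replace a failing core with a spare and log the event.

        Returns the spare's name, or None if no spare was left.
        """
        slot = self.active.index(core_id)  # raises ValueError if unknown
        spare = self.spares.pop(0) if self.spares else None
        if spare is not None:
            self.active[slot] = spare
        self.repair_log.append((hours_in_field, core_id, spare))
        return spare

    def repairs_between(self, start_h: float, end_h: float) -> int:
        """Count logged repairs with start_h <= time < end_h."""
        return sum(1 for t, _, _ in self.repair_log if start_h <= t < end_h)

cores = CoreArray(active=["c0", "c1", "c2", "c3"], spares=["s0", "s1"])
cores.swap_out("c2", hours_in_field=1200.0)
cores.swap_out("c0", hours_in_field=9500.0)
print(cores.active, cores.repairs_between(0, 5000))
```

A rising repair count per window, or a log entry with no spare left, is the kind of signal an OEM or Tier 1 could act on before a hard failure.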

The redundancy data then can be combined with other sensors like temperature, power and workload analysis to understand the system in-use data. A deeper dive into circuit-level behavior provides insights that might explain the need for repair.

Refining outlier tests
Chips in cars must work the same 15 years after passing production tests. Design engineers rely upon aging analysis and best-known design for reliability techniques to meet such stringent goals. Yet aging analysis comes from simulation, and thus has limitations based on assumptions being made in that analysis, including operational, environmental and transistor physics.

“Automotive Tier 2 IC design companies are super sensitive about reliability throughout the lifetime of the product,” said Dennis Ciplickas, vice president of Advanced Solutions at PDF Solutions. “Optimizing performance versus reliability tradeoffs, such as VDD values, clock speeds, and temperature mission profiles, is necessary but insufficient. Sensors that give insight into the physics of failure should be included, as well. Simply knowing that a product failed in the field doesn’t give insight as to why it failed, and data from RMAs and 8D processes is precious but extremely limited.”

The on-chip sensors focused on circuit performance provide this additional insight and come in several flavors, such as monitoring the duty cycle of a critical PLL or a set of PLLs on chip, said Synopsys’ Goriawalla. “We have a capability called the Measurement Unit, as part of our DesignWare STAR Hierarchical System (SHS), which integrates some of this clock and process monitoring capability by tracking these embedded sensors and monitors to record and confirm that these measurements meet some certain criteria on silicon.”

Timing margin and speed sensors represent design-specific on-chip sensors. In a 2018 International Reliability Physics Symposium paper, STMicroelectronics and TIMA Laboratory proposed critical-path sensors to monitor process and aging effects using the replica circuits concept. A simulation study of in-situ measurements of actual circuit performance showed promising results as far back as 2013 from joint research between UCLA and Arm (presented at the Design, Automation and Test in Europe conference). These have begun to be used in production designs.

“In the field we provide Margin Agents, which measure the margin in the design itself,” said proteanTecs’ Landman. “It’s not a separate circuit standing on the side. It’s really measuring the margin of millions of paths of the design while it’s working. Then machine learning analytics is applied to provide context and insight to those measurements.”

Design engineers find value in such sensors during silicon bring-up. The ability to observe the degradation in frequency over time brings value to reliability engineers. “A degradation of frequency is a very good precursor of failure for many physics of failure mechanisms like NBTI, electromigration, and hot carrier injection.”
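One simple way to turn frequency readings into an early warning is to fit a trend line and estimate when it would cross a guard-band limit. The sketch below uses a straight-line least-squares fit purely for illustration. Real aging mechanisms generally do not degrade linearly, and the readings, limit and function names are assumptions.

```python
def fit_line(times_h, freqs_mhz):
    """Ordinary least-squares slope and intercept for frequency vs. time."""
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_f = sum(freqs_mhz) / n
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times_h, freqs_mhz))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    slope = sxy / sxx
    return slope, mean_f - slope * mean_t

def hours_until_limit(times_h, freqs_mhz, f_min_mhz):
    """Extrapolate the fitted line to the guard-band limit.

    Returns None when no downward trend is present.
    """
    slope, intercept = fit_line(times_h, freqs_mhz)
    if slope >= 0:
        return None
    t_cross = (f_min_mhz - intercept) / slope
    return max(0.0, t_cross - times_h[-1])

# Hypothetical in-field readings: hours of operation, measured MHz.
hours = [0.0, 1000.0, 2000.0, 3000.0]
freqs = [1000.0, 998.0, 996.0, 994.0]
print(hours_until_limit(hours, freqs, f_min_mhz=980.0))
```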

On-chip sensors also support production test processes. For IC designers and test engineers working in the automotive space, meeting the AEC-Q100 requirements and the low defectivity targets becomes a tug of war between yield and quality. Stringent fail limits may meet quality goals but often sacrifice yield, so engineers strive for that balance by screening for outliers. Part average testing (PAT) screens for outliers using static or dynamically set statistical limits, which typically have been applied to a single measurement. But outlier detection becomes even more useful when multiple measurements are used in a multivariate analysis. On-chip sensors provide useful measurements in this space.
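The difference between single-metric PAT and a multivariate screen can be illustrated with a small, made-up data set. The limits below use a form often described for dynamic PAT: the median plus or minus six robust sigma, with robust sigma estimated from the interquartile range. The multivariate check uses the squared Mahalanobis distance on two measurements. The data, the 13.8 threshold (roughly the 99.9th percentile of a chi-square distribution with two degrees of freedom) and the function names are assumptions for illustration.

```python
import statistics

def dynamic_pat_limits(values, k=6.0):
    """Median +/- k robust sigma, with robust sigma = IQR / 1.35."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    robust_sigma = (q3 - q1) / 1.35  # IQR is about 1.35 sigma for a normal
    return median - k * robust_sigma, median + k * robust_sigma

def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each (x, y) from the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    return [(syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my)
             + sxx * (y - my) ** 2) / det for x, y in points]

# Made-up die data: (sensor frequency in MHz, leakage in uA).
# Faster die leak more, so the two measurements track each other.
dies = [(1000 + 10 * i, 50 + 2 * i + ((i % 3) - 1) * 0.5) for i in range(20)]
dies.append((1020, 84.0))  # slow die with high leakage: breaks the trend

freq_lo, freq_hi = dynamic_pat_limits([f for f, _ in dies])
leak_lo, leak_hi = dynamic_pat_limits([l for _, l in dies])
d2 = mahalanobis_sq_2d(dies)

suspect = dies[-1]
print("passes frequency PAT:", freq_lo <= suspect[0] <= freq_hi)
print("passes leakage PAT:", leak_lo <= suspect[1] <= leak_hi)
print("flagged by multivariate screen:", d2[-1] > 13.8)
```

In this example the suspect die sits inside both single-measurement limits, yet it stands out once the relationship between the two measurements is taken into account.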

“Knowledge that circuit or performance is drifting on one chip is difficult to generalize into predictive action beyond that chip,” said PDF’s Ciplickas. “On-chip sensors that track systematic trends in structural and electrical characteristics can show the underlying physics of the trend. Our experience from manufacturing and test is that models based on physics have more predictive power.”

That, in turn, can be used to find outliers that are not obvious without impacting the yield.

Couple that with new machine learning algorithms and domain expertise and the picture becomes much clearer, which is especially needed for complex devices used in safety-critical applications.

Identifying issues during fabrication
Detecting defective ICs before they can get into your car meets the quality goals. Finding them earlier in the manufacturing process achieves the same thing, while also supporting yield learning and providing insights for adaptive test flows. That requires some highly specialized on-chip sensors, though.

Fab process engineers and test engineers rely upon the electrical structures within scribe lines to detect processing errors. Those structures provide feedback on transistor properties, fundamental and physical design structures, as well as the quality of the wafer. Distinct from the architectural and circuit-based sensors already discussed, these structures have evolved to understand the intricacies of design-process sensitivity.

“Starting with the 28nm process node, the process step to design sensitivity has become challenging to predict with just pre-silicon simulation,” said PDF’s Strojwas. “This has prompted us to closely work with our customers on characterizing the silicon to process interactions with characterization vehicle test-chips and product-specific scribe line structures.”

With recent process nodes, these on-chip sensors have been developed to find defects that cannot be picked up by optical scans of wafers. Those defects do manifest electrical signatures, such as small leakages, increased resistance, and tiny Vth shifts. A purpose-built on-chip sensor can detect these subtle manifestations. A 2019 Electron Devices Technology and Manufacturing paper describes PDF Solutions’ approach using design-for-inspection circuit structures and a transistor/resistor array circuit structure.

Fig. 2: DFI filler cells. Source: PDF Solutions

Design for inspection structures are tiny and equivalent to the filler cells used to populate the empty space in physical designs. Thus, a single large die could contain tens of millions of DFI on-chip sensors, which provides a spatial resolution not possible with other on-chip sensors. DFI filler cells are designed to catch tiny leakages, as well as shorts and opens, and they can illuminate leakage path issues at FEOL/MOL/BEOL process steps.

Detecting defects early saves cost and improves both yield and quality. During an IC’s lifetime in a car, detecting defects can establish maintenance needs and enable safer automobiles. On-chip sensors play a greater role in detecting them due to machine learning algorithms, which can effectively leverage the richness of their data.

In-field data completes the end-to-end analytics required in this sector. “The ability to provide end-to-end (aka birth to product end of life) analytics has arrived,” said PDF’s Strojwas. “The industry now has data from every process step that can be combined with t=0 test and t>0 in-field-data. The business problem is having access to all that data. Machine learning algorithms on a subset of this data have limitations on the actions that can be taken. For the most effective machine learning from on-chip sensors, incorporating information about the die/parts history prior to in-field operation provides customers a much richer set of analytic capabilities and actionable decisions.”

And this is just the beginning for carmakers. “In the next couple of years this is going to become more and more mainstream,” said Mentor’s Harrison. “By adding additional structures in the design and looking at other metrics in the design, there is even more data you can collect. Going forward the OEMs are going to start to define what data they are going to want to collect out of these systems.”

The sooner you can predict a failure, the sooner you can take action to reduce IC manufacturing cost, and the safer and more trouble-free a car will become.

