Degradation Monitoring – From Vision to Reality

How to use in-circuit monitoring and off-chip machine learning to improve reliability and predictability.


Reliability physics has historically focused on models for time to failure, but that approach is reaching its limits. Those models were generally developed using data gathered from very simple test structures that could be stressed to failure. Today, with electronics playing such a critical role in our everyday lives, failure is no longer an option. The underlying ICs call for massive functionality, nanoscale manufacturing processes, advanced packaging and, eventually, nonstop operation. Manufacturers must ramp to volume production quickly and efficiently while adhering to strict requirements, especially in applications that demand high field reliability, such as automotive, datacenter and telecom. A data-centric paradigm shift is needed on a much broader scale.

So what will be the next big step in ensuring that sudden chip failures never occur? This is a vital question to answer as mission-critical electronics grow in complexity and scale and take center stage in almost every aspect of our lives.

The key is not just to identify failures, but to predict them. This is all about the Physics of Failure: estimating the remaining time to failure and issuing alerts in advance. The next paradigm shift in reliability assurance is performance-degradation monitoring and analysis as a precursor to failure.

Multiple physical mechanisms, including hot-carrier injection (HCI), bias temperature instability (BTI), electromigration (EM) and stress migration (SM), exhibit continuous degradation well in advance of failure. Relatively small monitoring circuits, strategically placed in many locations on the chip, can be used to forewarn of circuit degradation and alert the user to impending failure.

One such approach combines circuits embedded in the IC (proteanTecs calls these Agents) with off-chip machine-learning algorithms that draw inferences from the Agents' digital readouts throughout the IC's operational lifetime. The margin degradation of the IC, along with its other vital parameters and the environmental stress it experiences, is continuously monitored. This makes it possible to predict and prevent potential failures before they occur and to point to the underlying Physics of Failure, providing an estimate of the time to failure.
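The article does not describe how time to failure is computed. As a minimal sketch, assuming margin loss follows the empirical power law Δm(t) = A·t^n often used to describe BTI-type aging, one can fit A and n to periodic margin readouts in log-log space and solve for the time at which the margin would cross a failure threshold. The helper names (`fit_power_law`, `estimate_ttf`), the threshold and the synthetic readouts are hypothetical; this is not proteanTecs' algorithm.

```python
import math
from collections.abc import Sequence


def fit_power_law(times: Sequence[float], losses: Sequence[float]) -> tuple[float, float]:
    """Least-squares fit of loss = A * t**n in log-log space; returns (A, n)."""
    if len(times) != len(losses) or len(times) < 2:
        raise ValueError("need at least two paired readouts")
    if any(t <= 0 for t in times) or any(d <= 0 for d in losses):
        raise ValueError("times and margin losses must be positive")
    xs = [math.log(t) for t in times]
    ys = [math.log(d) for d in losses]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("readout times must differ")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    n = sxy / sxx               # slope in log-log space = power-law exponent
    a = math.exp(my - n * mx)   # intercept gives the prefactor A
    return a, n


def estimate_ttf(m0: float, times: Sequence[float], margins: Sequence[float],
                 fail_margin: float) -> float:
    """Hypothetical helper: projected hours (from time zero) until margin hits fail_margin.

    m0:          time-zero margin (ps)
    times:       readout times (hours, > 0)
    margins:     margin measured at each readout time (ps)
    fail_margin: margin (ps) treated as failure, e.g. a guard band
    """
    if fail_margin >= m0:
        raise ValueError("fail_margin must be below the time-zero margin")
    losses = [m0 - m for m in margins]
    a, n = fit_power_law(times, losses)
    if n <= 0:
        raise ValueError("margin is not degrading; no finite extrapolation")
    # Solve m0 - A * t**n = fail_margin for t.
    return ((m0 - fail_margin) / a) ** (1.0 / n)


if __name__ == "__main__":
    m0 = 100.0                                       # hypothetical time-zero margin, ps
    times = [10.0, 100.0, 1000.0, 10000.0]           # readout ages, hours
    margins = [m0 - 2.0 * t ** 0.25 for t in times]  # synthetic readouts
    ttf = estimate_ttf(m0, times, margins, fail_margin=60.0)
    remaining = ttf - times[-1]                      # life left after the latest readout
    print(f"projected failure at {ttf:.0f} h, {remaining:.0f} h remaining")
```

The function returns the projected failure time measured from time zero, so the example subtracts the age of the latest readout to get remaining life. A real system would also have to account for measurement noise, voltage and temperature dependence, and mechanisms that do not follow a single power law.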

There are several types of Agents:

  1. Process Classification and Design Profiling Agents. These can classify chips into “Families” with similar parametric characteristics, such as power (dynamic, static) and frequency. They provide an intrinsic baseline for yield and quality improvements, and a ruler for native correlation.
  2. Margin Agents. These measure the timing margin, relative to the operating frequency, of millions of paths in the host IC. The Margin Agents provide these measurements at time zero, during accelerated lifetime tests and throughout the operational lifetime of the IC. The measurements enable detection of, and alerts for, aging, reliability issues and latent defects that develop over time. They also provide a trace back to the underlying physical phenomenon.
  3. Operational Sensing Agents. These Agents can detect quality and reliability degradation caused by factors external to the chip and measure workload, feeding an expert system that pinpoints the issue.
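The article does not say how chips are classified into Families. One simple way to group chips with similar parametric characteristics is k-means clustering on standardized per-chip features. The sketch below assumes two hypothetical features (maximum frequency in MHz and static power in mW) and uses farthest-first seeding so the result is deterministic; k-means itself is an illustrative choice, not the method used by the Process Classification Agents.

```python
import math
from statistics import mean, pstdev


def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # guard against constant columns
    return [[(v - mu) / sd for v, mu, sd in zip(row, mus, sds)] for row in rows]


def kmeans(points, k, iters=100):
    """Plain k-means with deterministic farthest-first initialization.

    Returns one cluster label (0..k-1) per point.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing centroid.
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [mean(col) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical per-chip features: (Fmax in MHz, static power in mW).
    chips = [(2010, 50), (2000, 52), (1995, 49), (2200, 80), (2210, 82), (2190, 79)]
    families = kmeans(standardize(chips), k=2)
    print(families)  # [0, 0, 0, 1, 1, 1]
```

Standardizing first keeps the frequency axis, with its much larger numeric range, from dominating the distance metric. The number of Families `k` is an input here; choosing it would require separate analysis.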

Throughout the lifetime of a product embedded with a suite of Agents, a software-based platform feeds their combined outputs into machine-learning algorithms. Correlating readouts across the full population of a given product further enables highly reliable predictive maintenance in autonomous vehicles, hyperscale datacenters, medical instrumentation and other sectors where reliability is of prime importance.
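As a simple stand-in for the population-level correlation described above, the sketch below compares each unit's margin readout with those of its peers and flags statistical outliers using the modified z-score, 0.6745·(x - median)/MAD, with the commonly cited cutoff of 3.5. The fleet data and the `flag_outliers` name are hypothetical, and a robust z-score is a deliberately basic substitute for the machine-learning algorithms the article refers to; a production system would correlate many Agent readouts rather than a single value.

```python
from statistics import median


def flag_outliers(readouts: dict[str, float], cutoff: float = 3.5) -> list[str]:
    """Return IDs whose modified z-score |0.6745 * (x - median) / MAD| exceeds cutoff.

    Flags both unusually low and unusually high readouts.
    """
    values = list(readouts.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # MAD is zero when most units read exactly the median: flag any unit that differs.
        return [uid for uid, v in readouts.items() if v != med]
    return [uid for uid, v in readouts.items()
            if abs(0.6745 * (v - med) / mad) > cutoff]


if __name__ == "__main__":
    # Hypothetical worst-path margin readouts (ps) from a fleet of deployed units.
    fleet = {"unit-01": 41.8, "unit-02": 42.1, "unit-03": 41.5,
             "unit-04": 42.4, "unit-05": 41.9, "unit-06": 35.2}
    print(flag_outliers(fleet))  # ['unit-06']
```

Using the median and MAD instead of the mean and standard deviation keeps a single badly degraded unit from inflating the spread and masking itself.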

The paper was originally presented at the 2019 IEEE International Reliability Physics Symposium (IRPS) and is available as DOI 10.1109/IRPS.2019.8720527.
