Monitoring IC Abnormalities Before Failures

Deep and widespread dedicated circuitry for monitoring internal states supports deeper analytic insights for engineers


The rising complexities of semiconductor processes and design are driving an increasing use of on-chip monitors to support data analytics from an IC’s birth through its end of life, however long that projected lifespan may be.

Engineers have long used on-chip circuitry to assist with manufacturing test, silicon debug and failure analysis. Providing visibility and controllability of internal circuitry has supported economical testing, effective silicon debug, and the ability to diagnose problems in complex silicon devices. On-chip monitors enable specific parametric measurements. With each generation of CMOS process technology have come new ways for on-chip monitors to be used for optimizing performance, identifying thermal variability, and understanding process variation.

What’s new is the increased deployment of on-chip monitors to provide greater granularity for these parametric measurements throughout an IC device. Applying machine learning algorithms to this internal parametric data, along with other silicon data being collected, gives engineers deeper insights into the manufacturing process and in-field behavior. That, in turn, can be used to reduce defectivity in future generations of chips, and to improve the reliability of existing and new chips.

There is nearly universal agreement that visibility inside the chip can be helpful with the manufacturing process. Now, the semiconductor industry is becoming more proactive in using this data throughout the life of a product.

“The traditional ATE can be used to collect the data,” said Doug Elder, vice president and general manager of the Semiconductor Business Unit at OptimalPlus, an NI Company. “The sophistication of those sensors is getting better at the kind of information they can collect. In our talks with design engineers/organizations our discussions have focused on how to collect data.”

Semiconductor data analytics companies have become more involved with the on-chip monitors themselves, rather than only analyzing the large volume of data such circuitry can produce. Designing circuitry specific to an IC design with the end goal of analytics requires a more deliberate methodology.

“We view our technology as a different approach. It’s not a sensor company with software. It’s an analytics company based on high-coverage chip telemetry,” said Tamar Naishlos, director of marketing at proteanTecs. “So it’s really all about the deep data and the comprehensive picture that stems from many Agents, i.e., ‘feet on the ground.’ But they have to feed into the machine learning algorithms so that we can interpret everything in the best possible way.”

Companies that have focused on IP blocks that monitor clock jitter, voltage and temperature recognize their customers can use monitor data beyond its intended design optimization applications.

“We are very interested in this analytics enablement of making the data available, not just to directly optimize the individual chip. There’s also value in looking at the data from larger populations of chips,” said Richard McPartland, marketing director at Moortec. “You can draw parallels with health care. Rather than just looking at an individual patient, you look at the population and identify a particular illness or health problem. In the same way, on-chip monitoring can be applied. Make this data available and extract actionable information from large populations as well as just looking at the individual die. Our on-chip sensors are essential to provide visibility of all the conditions on the inside. And that has huge value.”

The added data from on-chip monitors cuts across all steps of the manufacturing process that lead up to a shipped product. The insights from this data enable a more sophisticated analysis that engineers can use to meet yield, quality, and reliability goals.

Monitor capabilities
Before engineers analyze data from on-chip monitors they need to understand the metrology needs for analytics and the metrology capabilities of the circuitry. A variety of on-chip monitors are used in today’s ICs.

“The most common on-chip sensor will be PVT monitors – process, voltage and temperature sensors. You can place a number of these tiny sensors throughout the SoC,” said Tom Wong, director for marketing design IP at Cadence. “Simple ring oscillators can tell you the process variation over time because you can measure frequency degradation over time. It gives you an indication of the health of the chip over the number of hours/days/months/years of usage.”

Ring oscillator circuitry has been the workhorse for process monitoring. “Ring oscillators have been around several decades. Scribe line structures used at wafer acceptance test provided an assessment of wafer process control and general process conditions,” said Andrzej Strojwas, chief technologist at PDF Solutions. “To get a more specific understanding of an IC/chip’s behavior with respect to process, then you look at a ring oscillator as a process monitor for a chip.”

By the mid 2000s, manufacturers like Intel, TSMC and Samsung relied upon a network of ring oscillators spread throughout the die to better comprehend process variation within an IC. By applying analytics with ring oscillator data, insights can be gained into process variation experienced across a wafer as well as throughout the die. Over the past 15 years, engineers have designed more sophisticated ring oscillators. In a 2011 IEEE Transactions on Electron Devices paper, Intel engineers described in detail their use of on-chip circuitry to measure process technology variation. The authors explained their use of ring-oscillators and the modifications made to track transistor properties like VT.
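As a rough illustration of that kind of analysis, the Python sketch below splits ring oscillator readings into within-die spread and die-to-die spread across a wafer. The die identifiers, the function name and the frequency values are hypothetical, and the code is a minimal example under those assumptions rather than the method used by any company named here.

```python
from statistics import mean, pstdev


def variation_summary(ro_freqs_mhz):
    """Summarize ring oscillator spread.

    ro_freqs_mhz maps a die id to the list of ring oscillator
    frequencies (MHz) measured at different locations on that die.
    Returns (within_die, across_wafer), both as sigma/mean ratios.
    """
    die_means = {die: mean(freqs) for die, freqs in ro_freqs_mhz.items()}
    # Within-die variation: spread of the oscillators on one die.
    within_die = {
        die: pstdev(freqs) / die_means[die]
        for die, freqs in ro_freqs_mhz.items()
    }
    # Across-wafer variation: spread of the per-die mean frequencies.
    means = list(die_means.values())
    across_wafer = pstdev(means) / mean(means)
    return within_die, across_wafer


if __name__ == "__main__":
    # Hypothetical readings for three dies, four oscillators each.
    wafer = {
        "die_0": [1000.0, 1004.0, 998.0, 1002.0],
        "die_1": [1020.0, 1025.0, 1018.0, 1021.0],
        "die_2": [985.0, 990.0, 987.0, 982.0],
    }
    within, across = variation_summary(wafer)
    for die, ratio in within.items():
        print(f"{die}: within-die sigma/mean = {ratio:.4%}")
    print(f"across-wafer sigma/mean = {across:.4%}")
```

Separating the two components matters because a tight within-die spread on a die that sits far from the wafer mean points to a different cause than a wide spread inside a single die.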

For microprocessors, monitoring temperature with on-chip thermal sensors became necessary at the 0.18 µm CMOS process node. With new-product introductions, the need to validate hot spots in the silicon has led to more thermal monitors being included throughout the die. For reliability purposes, engineers have an interest in monitoring temperature in the field.

A history of higher-than-normal temperatures in a circuit can predict a reliability failure in the field. Craig Hillman, director of product management for new and emerging technologies at Ansys, said that both thermal and electrical conditions impact electromigration. “One of the challenges with electromigration is that accurate analysis and prediction requires that you hit a steady state condition, because as I increase my current density, temperature goes up. As temperature goes up, resistance goes up, which means temperature goes up. This synergistic process accelerates electromigration.”
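The feedback loop Hillman describes can be sketched as a fixed-point iteration: current heats the wire, the hotter wire has higher resistance, and the higher resistance dissipates more power. The Python example below assumes a linear temperature coefficient of resistance and a single lumped thermal resistance. Those modeling choices, the function name and every default value are illustrative assumptions, not figures from Ansys.

```python
def steady_state_temperature(current_a, r0_ohm, t_ambient_c,
                             tcr_per_c=0.004, theta_c_per_w=50.0,
                             t_ref_c=25.0, tol_c=1e-6, max_iter=1000):
    """Iterate T -> T_ambient + theta * I^2 * R(T) until it settles.

    R(T) = r0 * (1 + tcr * (T - t_ref)) is an assumed linear model.
    Returns the steady-state temperature in deg C, or None if the
    iteration has not converged within max_iter steps (for example,
    when the thermal feedback runs away).
    """
    temp = t_ambient_c
    for _ in range(max_iter):
        resistance = r0_ohm * (1.0 + tcr_per_c * (temp - t_ref_c))
        new_temp = t_ambient_c + theta_c_per_w * current_a ** 2 * resistance
        if abs(new_temp - temp) < tol_c:
            return new_temp
        temp = new_temp
    return None


if __name__ == "__main__":
    # Hypothetical interconnect: 10 ohm at 25 C, 25 C ambient.
    for current in (0.05, 0.10, 0.20):
        temp = steady_state_temperature(current, r0_ohm=10.0, t_ambient_c=25.0)
        print(f"I = {current:.2f} A -> steady state {temp:.2f} C")
```

The point of iterating to convergence is the one made in the quote: a temperature computed from the cold resistance alone understates the final operating point, and the gap widens as current density rises.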

But that also can vary based upon use cases and chip layout, and sensors need to be able to account for all of these variations. “Our temperature sensors are offered with a range of accuracies,” said Moortec’s McPartland. “Many of our customers like the convenience of our sensors uncalibrated, as this offers reasonable accuracy with minimal test overhead. For customers looking for higher accuracies we provide convenient calibration schemes. Calibration offers higher accuracy that can be a good option if you need to throttle back clock frequency if the on-chip temperature exceeds a particular threshold level. By contrast higher levels of calibration offers good accuracy over a wider range of temperatures but can sometimes come with additional overheads in production test. At Moortec we have a very good understanding of these trade-offs and can help to implement the most appropriate solution for the application.”

Fig. 1: Sensor management hub. Source: Moortec

The timing relationship between signal and clock, expressed as margin, is a parameter of interest to digital circuit designers, and it also can feed into analytic algorithms.

There are two sides to understanding this margin — comprehending a clock’s duty cycle and jitter properties, and comprehending the path delay of combinational circuitry between two clocked storage elements.
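A simplified view of how the two sides combine is the textbook setup-margin calculation, sketched below in Python. It assumes a single full-cycle path, ignores clock skew and duty-cycle effects on half-cycle paths, and uses hypothetical parameter names and values chosen only for illustration.

```python
def setup_margin_ps(clock_period_ps, jitter_ps, clk_to_q_ps,
                    path_delay_ps, setup_ps):
    """Setup margin for one register-to-register path, in picoseconds.

    Clock side: jitter shortens the usable clock period.
    Data side: launch delay, combinational path delay and the
    capture register's setup time must fit inside that period.
    A negative result means the path fails timing.
    """
    usable_period_ps = clock_period_ps - jitter_ps
    required_ps = clk_to_q_ps + path_delay_ps + setup_ps
    return usable_period_ps - required_ps


if __name__ == "__main__":
    # Hypothetical 1 GHz clock (1000 ps period) with 50 ps of jitter.
    margin = setup_margin_ps(clock_period_ps=1000, jitter_ps=50,
                             clk_to_q_ps=80, path_delay_ps=700, setup_ps=60)
    print(f"setup margin: {margin} ps")
```

Either side can erode the margin: more jitter shrinks the first term, while aging or a slow process corner grows the path delay in the second.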

“Certainly on-chip monitoring is a critical requirement for SoCs targeting high reliability applications,” said Faisal Goriawalla, Synopsys senior staff product marketing manager. “These could be monitoring the duty cycle of a critical PLL or a set of PLLs on chip. As part of our design STAR hierarchical system, the measurement unit integrates some of the clock and process monitoring capability. It’s a lightweight digital IP core that can be easily integrated by SoC designers for the chip, once, twice, or hundreds of times.”

The more monitors, the better the coverage. “We use Margin Agents that measure the margin to the frequency of the design itself,” said Evelyn Landman, CTO and co-founder, proteanTecs. “It is not a separate circuit standing on the side. It’s really measuring the margin of millions of paths of the design in parallel while the chip is working.”

These monitors have applications during post-silicon design characterization, production test and in-field usage.

Monitor integration and design
On-chip monitors and access to their data need to be architected into an IC device, especially if engineers use a multitude of monitors throughout the IC for spatial granularity and parametric diversity.

First, to facilitate data collection, the monitor circuit often transforms the parameter of interest into a digital readout. This enables engineers to use available test access ports to permit analysis during the manufacturing process, new product introduction, and in-field assessment.

Second, to access a distributed network of monitor circuits, the infrastructure circuitry has to work within the constraints of the chip design. For instance, with a large set of ring oscillators, designers need to enable activation and selection of each ring oscillator to connect it to the external die pin for measurement. In addition, they may use frequency dividers to make it readable via a lower speed interface.
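The select-and-divide scheme can be sketched in Python as follows. The sketch assumes the tester counts divided clock edges over a fixed gate time. The `read_count` callback stands in for whatever test-access mechanism selects an oscillator and returns that count, and it is hypothetical, as are the function names and the numbers.

```python
def ring_oscillator_freq_hz(divided_count, gate_time_s, divider_ratio):
    """Recover the undivided oscillator frequency from a gated count.

    divided_count: edges counted at the pin during the gate time.
    divider_ratio: on-chip frequency divider between oscillator and pin.
    """
    if gate_time_s <= 0 or divider_ratio <= 0:
        raise ValueError("gate time and divider ratio must be positive")
    return divided_count * divider_ratio / gate_time_s


def scan_ring_oscillators(read_count, num_oscillators,
                          gate_time_s, divider_ratio):
    """Select each oscillator in turn and convert its count to Hz.

    read_count(index) is a hypothetical callback that enables
    oscillator `index`, routes it to the pin and returns the count.
    """
    return [
        ring_oscillator_freq_hz(read_count(i), gate_time_s, divider_ratio)
        for i in range(num_oscillators)
    ]


if __name__ == "__main__":
    # Stand-in for real hardware access: made-up counts per oscillator.
    fake_counts = [1000, 1012, 987, 1003]
    freqs = scan_ring_oscillators(lambda i: fake_counts[i],
                                  num_oscillators=len(fake_counts),
                                  gate_time_s=1e-3, divider_ratio=1024)
    for i, f in enumerate(freqs):
        print(f"RO {i}: {f / 1e9:.3f} GHz")
```

In this example a divide-by-1024 stage brings a roughly 1 GHz oscillator down to about 1 MHz at the pin, which is the kind of reduction that lets a lower-speed interface read it.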

With multiple on-chip monitors, having a well-integrated controller for those monitors provides easy adoption.

“We deliver a subsystem to digital design teams so they can easily integrate our monitors,” said Stephen Crosher, CEO of Moortec. “At the very heart of it is the precision analog circuitry. Then there is the subsystem around it, with a common standard interface for designers to connect into their architectures. Having deployed our monitoring subsystem over several years across a range of technology nodes, we have developed different configurations of the subsystem, depending on the application.”

Crosher noted that, depending upon the industry sector and size of the device, the type and number of subsystem instances varies. “The nature of sensor placement is critical, however — especially in automotive, where you’re concerned about particular supply levels across the chip or concerned about areas where you may have hot spots.”

The quality of the data feeding any analytic framework is important. That quality can be strained if the on-chip monitors lack the design robustness required to deliver reliable metrology. When using on-chip monitors to assess process variation and degradation over the life of a product, circuit designers need to assure their monitors are not impacted by semiconductor physics and manufacturing variation.

A number of design practices are used to obviate these factors. “Our design teams work very hard to develop sensors to a spec that already compensates for process variability,” said Crosher. “Using our evolved test suites, we conduct extensive corner and Monte Carlo simulation work, which also includes running extensive aging simulations. In providing embedded monitors, it is important that we seek to ensure that our circuits are unlikely to ‘fail first.’ As a design strategy, we adopt circuit structures and topologies that have a reduced exposure to stress effects and NBTI, which may otherwise manifest over long periods of time. We don’t use circuit approaches that can be sensitive to noise.”

For aging, design engineers can use duplicate circuits that experience different activity levels than the on-chip monitor circuit itself may experience. “It depends on the different Agents,” said Landman. “For example, we can measure for the presence of NBTI. With side measurements we can measure the two extreme conditions, see how much they will age, and see the difference. Then we can see the effect of aging on the Agent itself and take this into account.”
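One generic way to express such a comparison is a ratio between a stressed circuit and a lightly stressed duplicate, normalized to the ratio recorded at time 0. The Python sketch below assumes both circuits are ring oscillators and uses hypothetical names and values. It illustrates the differential idea only and is not a description of how proteanTecs Agents work.

```python
def aging_shift(stressed_freq_mhz, reference_freq_mhz, time0_ratio=1.0):
    """Fractional slowdown of a stressed oscillator versus its duplicate.

    Both oscillators see the same voltage and temperature at readout,
    so their ratio cancels most environmental effects. Dividing by
    time0_ratio (stressed/reference measured when the part was new)
    removes the built-in mismatch between the two circuits.
    """
    if reference_freq_mhz <= 0 or time0_ratio <= 0:
        raise ValueError("reference frequency and time-0 ratio must be positive")
    ratio_now = stressed_freq_mhz / reference_freq_mhz
    return 1.0 - ratio_now / time0_ratio


if __name__ == "__main__":
    # Hypothetical readings: the pair matched within 0.5% when new.
    shift = aging_shift(stressed_freq_mhz=965.0, reference_freq_mhz=1000.0,
                        time0_ratio=0.995)
    print(f"estimated aging-induced slowdown: {shift:.2%}")
```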

Monitor analytics during manufacturing
Analytic companies historically have used electronic test data to assist their customers in making decisions to support the quality and yield objectives. On-chip monitors provide more data, but they also provide data that better informs the machine learning algorithms used.

“The on-chip sensors are being used more and more. You primarily place a sensor on the device to collect parametric data, which can be used for performance verification and thereby assist with post-silicon design validation (for new product introduction),” said OptimalPlus’ Elder. “Such data also can be used during production test. Naturally, there are tradeoffs in silicon real estate and the test time to get it off the chip.”

Analytics based upon the spatially placed and design-aware monitors enables a level of refinement that has not been available before.

“We classify the chips into families,” said Landman. “Our approach is based upon the Agent readouts, and we have many dimensions that we are extracting from the process. By applying advanced analytics, we can say families will behave remarkably similar for many parameters at about 1-sigma. So it’s much more resolution than doing it per wafer. The wafer distribution is too wide to find outliers and the families are much tighter. You can find outliers significantly better and reduce DPPM by 10X without impacting the yield.”
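The effect of grouping can be shown with a small Python sketch. It assumes each chip already carries a family label and a single parametric reading, then flags chips that sit far from their own family's mean. The labels, the sigma limit and the data are all hypothetical, and how families are actually derived from Agent readouts is not described in the source.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_outliers(readings, sigma_limit=3.0):
    """Flag chips that deviate from their own group's distribution.

    readings: list of (chip_id, group, value) tuples, where group is
    a family label (or a wafer id, for comparison).
    Returns the ids of chips more than sigma_limit standard
    deviations away from their group's mean.
    """
    by_group = defaultdict(list)
    for _, group, value in readings:
        by_group[group].append(value)
    stats = {g: (mean(v), pstdev(v)) for g, v in by_group.items()}

    outliers = []
    for chip_id, group, value in readings:
        mu, sigma = stats[group]
        # Skip groups with no spread; nothing can be ranked there.
        if sigma > 0 and abs(value - mu) > sigma_limit * sigma:
            outliers.append(chip_id)
    return outliers


if __name__ == "__main__":
    # Hypothetical wafer holding two families with different centers.
    family_a = [(f"a{i}", "A", 100.0) for i in range(9)] + [("a9", "A", 110.0)]
    family_b = [(f"b{i}", "B", 130.0) for i in range(10)]
    chips = family_a + family_b

    print("per family:", find_outliers(chips, sigma_limit=2.0))
    per_wafer = [(cid, "wafer", val) for cid, _, val in chips]
    print("per wafer: ", find_outliers(per_wafer, sigma_limit=2.0))
```

In this made-up data set, chip `a9` stands out within its family but is hidden when the whole wafer is treated as one population, because the two family centers widen the wafer-level distribution.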

A new development in the on-chip monitor analytic space is understanding process variability and detecting defects prior to wafer probe test.

Taking advantage of the filler cells used at advanced process nodes to manage yield, PDF Solutions has developed technology called Design-For-Inspection (DFI) to identify process issues that can impact product quality and reliability. Using layout analysis, the filler cells can mimic nearby circuitry’s layout. These are probed using a proprietary e-beam tester at Metal0.

“DFI filler cells are designed to catch tiny leakages, as well as shorts and opens,” said Dennis Ciplickas, vice president of advanced solutions at PDF. “DFI filler cells are tiny.”

Leveraging filler cells for data collection enables a very high spatial resolution. “There are billions of DFI cells per wafer,” Ciplickas said. “A large chip, such as a 250 die/wafer might have 30 million DFI fillers per die, or more than 7 billion DFI fillers per wafer.”

Combined with wafer and unit test data, this extremely deep defect detection data provides a powerful tool to identify yield and reliability issues. “Based upon the similarity we and our clients have observed between the Pass/Fail results from failing DFI filler cells and P/F results from burn-in, HTOL and RMA samples, the signals captured by DFI filler cells indicate risk of field failure,” Ciplickas noted.

Product reliability and on-chip monitors
The increasing demands for part longevity in multiple industry sectors (IoT, automotive, telecom and data centers) have motivated silicon device companies, on-chip monitor IP providers and data analytic companies to increase their efforts to characterize reliability, to screen out latent defects and to anticipate field failures.

Assessing temperature during a product’s lifetime enables ongoing assessment of aging and reliability risk. “First and foremost, customers value our temperature sensors for reliability purposes as high temperatures play a key role in reliability and aging, with most aging processes/reliability issues being accelerated at high temperatures especially electromigration, NBTI and PBTI (negative and positive bias temperature instability) and TDDB (time-dependent dielectric breakdown),” said McPartland. “Moortec also has process monitors, which are designed to enable the amount of aging to be measured comparing aged and non-aged circuits. These can be used for predictive and adaptive maintenance to either swap out parts in a planned timely manner or adapt supply voltages to maintain performance.”

Added Landman: “During qualification we help engineers to see not only pass/fail issues, but also marginality or degradation that is going on inside the chip. Engineers may use HTOL or burn-in to stress a part. The analytics platform provides degradation monitoring by looking at the Margin Agent readouts.”

The ability to collect this same data during actual usage enables an analytic platform to compare data at time 0 and time N. Yet to fully understand in-system usage, engineers need to know more than just that the timing margin changed. As Landman noted, they have multiple on-chip monitors that look at environmental conditions that enable engineers to understand why something changed.

With deliberate placement of on-chip monitors throughout a design, engineers can gain deeper insights into what happens on silicon and, hence, provide guidance on how to react to observed manufacturing variance. Specifically, the data informs feedback on design and process improvement and guides feed-forward decisions for manufacturing test and for potential reliability failures.

Both circuit monitor providers and data analytic companies recognize the potential for powerful insights from on-chip data.

“Once sensor fabrics are integrated within a chip design, you are then able to collect data, not only across an entire product range but also for each individual device produced. By enabling embedded monitoring, we can be in every step of the chip’s lifecycle. This presents a powerful, new opportunity to make assessments of the chip during manufacture, test, packaging, and then into mission mode, leading eventually towards the end of life,” noted Crosher. “This really excites us, as we’re providing designers and the industry with valuable insights from within the chip, generating meaningful data that can be used for product and system lifecycle analytics.”

Ciplickas concurred. “On-chip performance and reliability sensors bring this systematic observability into the die itself and enable ongoing observability throughout a chip’s lifetime.”

Tying analytics to on-chip monitors yields specific information that targets the engineering actions the analytical platform can support.
