Ensuring Memory Reliability Throughout the Silicon Lifecycle

How to identify and mitigate potential failures prior to tapeout.


By Anand Thiruvengadam and Guy Cortez

Memories are everywhere in modern electronics. Discrete memory chips account for much of the space on printed circuit boards (PCBs). Embedded memories consume much of the floorplan in system-on-chip (SoC) devices. Many multi-die chip configurations, including 2.5D/3DIC devices, are driven by the need for faster memory access. Designing and verifying memories is a major portion of many projects.

Safety-critical applications such as autonomous vehicles, space-borne systems, implanted medical devices, and nuclear power plants are no exception. The ICs powering these applications contain a lot of memory, and the memory technology used must meet the same high reliability and functional safety standards as the rest of the electronics. Memory development is neither fully digital design nor fully hand-crafted analog circuitry. It has its own challenges and its own solutions.

The context for memory reliability is important as well. Today’s electronic systems demand high memory bandwidth, fast throughput, and low latency. Further, generic memory devices are giving way to application-specific chips with tight requirements on power, performance, and area (PPA). Memories are moving into a hyper-convergent space in which multiple technologies, protocols, and architectures are combined in one highly complex design.

A recent blog post discussed the increasing “digitization” in memory development, using digital electronic design automation (EDA) tools for design components on the periphery of the core array. Many of the functional safety techniques developed for digital logic can be adopted for memories as well. These techniques satisfy the requirements of safety standards such as ISO 26262 for road vehicles.

Although the terms “safety” and “reliability” are sometimes used interchangeably, the overlap is only partial. Functional safety requires building in safety mechanisms to detect faults in electronic devices and respond appropriately, as well as demonstrating through calculation that this detection and response achieve a high degree of fault coverage. Reliability demands that the chances of a fault occurring be reduced as much as possible in silicon design and manufacturing.

Both safety and reliability must span the entire silicon lifecycle, from design and verification through lab bring-up all the way to production use in the field. In the case of memory designs, the early and late stages of the lifecycle present the greatest challenges for reliability. Early chip failures (sometimes called infant mortality) shake out marginal devices, followed by a period (perhaps years) of low-risk operation. As silicon aging effects start to kick in, reliability goes down and failures become more common.
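
The lifecycle shape described above (high early failure rate, a long quiet period, then rising wear-out) is often drawn as a "bathtub curve." One common way to sketch it is as the sum of three hazard rates: a decreasing Weibull term, a constant term, and an increasing Weibull term. The Python below is only an illustration under that assumption; the function names and every parameter value are hypothetical and are not taken from any device data or from the tools discussed in this article.

```python
def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate of a Weibull distribution at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_years):
    """Sum of early-life, constant, and wear-out hazard rates (per year).

    All parameter values are made up for illustration.
    """
    infant = weibull_hazard(t_years, shape=0.5, scale=50.0)   # shape < 1: rate falls with time
    constant = 0.002                                           # flat rate during normal life
    wearout = weibull_hazard(t_years, shape=5.0, scale=15.0)   # shape > 1: rate rises with time
    return infant + constant + wearout


if __name__ == "__main__":
    for t in (0.01, 1.0, 5.0, 10.0, 15.0):
        print(f"{t:6.2f} years: hazard = {bathtub_hazard(t):.4f} per year")
```

With these illustrative numbers, the hazard is highest very early and late in life and lowest in the middle, which mirrors the early-life, normal-life, and end-of-life split used in the rest of the article.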

The memory development process must include robust static and dynamic analyses to identify and mitigate potential failures across the silicon lifecycle before tapeout:

  • Early life
    • Static analog and digital circuit checks
    • Analog fault simulation
  • Normal life
    • High-sigma Monte Carlo analysis
    • Static power/signal net resistance checks
  • End of life
    • Dynamic electromigration/IR drop (EMIR) analysis
    • Silicon aging analysis

Synopsys PrimeWave Reliability Environment delivers a unified workflow around all the reliability analysis technologies of Synopsys PrimeSim Reliability Analysis and the engines of Synopsys PrimeSim Continuum to improve productivity and ease of use. The process starts with Synopsys PrimeSim CCK, which extends traditional electrical rules checking (ERC) into the analog domain.

PrimeSim Custom Fault complements digital fault simulation to make functional safety and test coverage analysis practical for complete chips. It satisfies even the demanding requirements of ISO 26262 and other safety standards for complex and comprehensive failure modes, effects, and diagnostic analysis (FMEDA).
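
To make the idea of fault coverage concrete, the sketch below computes a simple coverage ratio from a list of fault-injection outcomes: of the faults that could violate a safety goal, what fraction did a safety mechanism detect? This is a deliberate simplification. A real FMEDA weights faults by failure rate rather than counting them equally, and the `FaultResult` record, its fields, and the fault names are hypothetical; they do not represent the output format of PrimeSim Custom Fault or any other tool.

```python
from dataclasses import dataclass


@dataclass
class FaultResult:
    name: str        # hypothetical fault identifier
    dangerous: bool  # True if the fault could violate a safety goal
    detected: bool   # True if a safety mechanism flagged the fault


def diagnostic_coverage(results):
    """Fraction of dangerous faults that were detected (unweighted count)."""
    dangerous = [r for r in results if r.dangerous]
    if not dangerous:
        return 1.0  # nothing dangerous to cover
    detected = sum(1 for r in dangerous if r.detected)
    return detected / len(dangerous)


# Made-up outcomes for illustration only.
example_results = [
    FaultResult("bitline_short", dangerous=True, detected=True),
    FaultResult("sense_amp_open", dangerous=True, detected=True),
    FaultResult("wordline_bridge", dangerous=True, detected=False),
    FaultResult("decoder_stuck", dangerous=True, detected=True),
    FaultResult("dummy_cell_open", dangerous=False, detected=False),
]

if __name__ == "__main__":
    print(f"coverage = {diagnostic_coverage(example_results):.2%}")
```

Faults classified as safe are excluded from the denominator, so an undetected but harmless fault does not lower the coverage figure.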

Synopsys PrimeSim AVA provides high-sigma (typically 4 to 7 sigma) Monte Carlo analysis. It uses machine learning (ML) techniques to run more efficiently while delivering accuracy to within 1% of the Synopsys PrimeSim HSPICE circuit simulator. ML reduces the number of runs by orders of magnitude compared with the traditional brute-force Monte Carlo simulation approach.
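
A rough calculation shows why brute-force Monte Carlo becomes impractical at high sigma. Assuming a Gaussian-distributed parameter and a one-sided failure criterion (both simplifying assumptions, not a description of how PrimeSim AVA works), the probability of landing beyond a given sigma shrinks so quickly that observing even a handful of failures needs an enormous number of runs. The function names and the choice of ten expected failures below are illustrative.

```python
import math


def tail_probability(sigma):
    """One-sided Gaussian probability of exceeding `sigma` standard deviations."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))


def brute_force_runs(sigma, expected_failures=10):
    """Number of random samples needed to expect `expected_failures` failures."""
    return math.ceil(expected_failures / tail_probability(sigma))


if __name__ == "__main__":
    for sigma in (3, 4, 5, 6, 7):
        p = tail_probability(sigma)
        print(f"{sigma} sigma: tail probability {p:.3e}, "
              f"about {brute_force_runs(sigma):.3e} runs for 10 failures")
```

Each added sigma multiplies the required run count by a large factor, which is why techniques that reduce the number of simulations matter for memory bit cells that are replicated millions or billions of times.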

Power/ground integrity analysis is provided by Synopsys PrimeSim SPRES, which is fast enough to run early in the memory development process. Similarly, Synopsys PrimeSim EMIR provides both high performance and foundry-certified signoff accuracy. This analysis covers the power distribution network (PDN) as well as the signals in the memory design. If issues are uncovered, what-if analysis and debug hints make it easier to find and fix the source of potential faults as the silicon ages.
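
The basic quantity behind IR drop analysis can be shown with Ohm's law on a single power rail. The sketch below assumes a rail fed from one end, split into resistive segments with a current tap at the end of each segment; every segment carries the current of all taps downstream of it. This is a static, one-dimensional simplification with hypothetical names and values. Signoff tools analyze full extracted networks with time-varying currents, which this example does not attempt.

```python
def rail_voltages(vdd, segment_ohms, tap_amps):
    """Voltage at each tap of a rail fed from one end.

    segment_ohms[i] is the resistance of the segment leading to tap i;
    tap_amps[i] is the current drawn at tap i.
    """
    if len(segment_ohms) != len(tap_amps):
        raise ValueError("need one tap current per rail segment")
    voltages = []
    voltage = vdd
    downstream_current = sum(tap_amps)
    for resistance, tap_current in zip(segment_ohms, tap_amps):
        # This segment carries the current of its own tap and all later taps.
        voltage -= resistance * downstream_current
        voltages.append(voltage)
        downstream_current -= tap_current
    return voltages


if __name__ == "__main__":
    # Illustrative values only: 1.0 V supply, two 0.1-ohm segments, 1 A per tap.
    for i, v in enumerate(rail_voltages(1.0, [0.1, 0.1], [1.0, 1.0])):
        print(f"tap {i}: {v:.3f} V")
```

The farthest tap sees the largest drop because the voltage lost in every upstream segment accumulates along the rail.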

Synopsys PrimeSim MOSRA checks for reliability risks due to silicon aging effects. It also offers high performance with foundry-certified accuracy. Combined with the other PrimeSim Reliability Analysis technologies within the Synopsys PrimeWave Reliability Environment, it gives memory designers confidence that their chips will be functionally safe and reliable throughout a long and productive silicon lifecycle.

Lastly, the memory reliability solution is complementary to the Synopsys integrated Silicon Lifecycle Management (SLM) family of products, which improves silicon reliability and performance at every phase of the device lifecycle:

  • In-Design phase: embed in-chip monitors to provide insight about the dynamic environmental conditions on the chip in later phases; the reliability analysis occurs in parallel with this phase
  • In-Ramp phase: focus on achieving an acceptable yield before mass production by identifying systematic failures within the silicon
  • In-Production phase: continuously monitor and analyze all related test manufacturing data to maintain high yield and reliability
  • In-Field phase: monitor aging and health of the device in real time while it is in use and extend its life by optimizing performance and power consumption

Reliability for the entire silicon lifecycle is key to many of the demanding applications driving memory design. Only a full spectrum of analysis, linked with a unified flow and certified by foundries, can provide the level of certainty stipulated by industry standards, demanded by customers, and needed by end users.

Guy Cortez is a senior staff product manager at Synopsys.
