Ensuring Multi-Die Package Quality And Reliability

When one problematic die can cause the entire system to fail, the quality of each die and the integrity of the interconnect are critical.

Multi-die designs are gaining broader adoption in a wide variety of end applications, including high-performance computing, artificial intelligence (AI), automotive, and mobile.

Despite clear advantages, there are new challenges that need to be addressed for successful multi-die realization. This article gives a high-level overview of the multi-die test challenges that go beyond the design phase, covering both manufacturing and deployment in the field. Support for multi-die test, monitor, and repair must span the entire scope of the silicon lifecycle. Methods and standards must provide access for monitoring resources as well as test and repair of both dies and assemblies. Analytics are required at every stage of the silicon lifecycle: in-design, in-ramp, in-production, and in-field. The goal is to ensure multi-die quality, reliability, and yield.

A comprehensive multi-die test solution

A single failed die in a multi-die configuration can cause the entire system to fail, so the quality of each die and the integrity of the interconnect are critical. Late-stage failures can be catastrophic if not resolved quickly. An effective solution requires a co-design platform, backed by several techniques and established standards. Figure 1 shows the key elements of a comprehensive multi-die test solution.

Fig. 1: Elements of a multi-die test solution.
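To see why a single bad die matters so much, consider a rough yield calculation. The sketch below assumes that die and interconnect failures are independent, which is a simplification, and the yield figures are hypothetical numbers chosen only for illustration. The function name `package_yield` is likewise illustrative.

```python
from math import prod


def package_yield(die_yields, interconnect_yield=1.0):
    """Probability that every die and the interconnect are all good.

    Assumes independent failures (a simplifying assumption).
    """
    return prod(die_yields) * interconnect_yield


# Hypothetical values: four dies at 95% yield each, 98% interconnect yield.
four_die = package_yield([0.95, 0.95, 0.95, 0.95], interconnect_yield=0.98)
print(f"{four_die:.3f}")  # prints 0.798
```

Under these assumptions, roughly one package in five would fail even though each die looks healthy on its own, which is why known-good-die testing before bonding is so valuable.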

For manufacturing, all aspects of intra-die, inter-die, and package-level testing can be accomplished during the pre-bond, mid-bond, and post-bond stages. Synopsys offers a DFx solution that includes automated design for test (DFT) insertion with die-to-die and stack-level access; die-to-die interconnect pattern generation with identification of faults; pattern porting for the die-to-die and stack levels; multi-die diagnosis and traceability; and in-system monitoring for purposes such as predictive maintenance.
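As a rough illustration of interconnect pattern generation with fault identification, the sketch below uses a classic walking-one test. It is not the pattern set of any particular tool. The fault model (stuck-at lanes and wired-OR shorts) and the names `walking_one_patterns`, `apply_faults`, and `diagnose` are assumptions made for this example.

```python
def walking_one_patterns(num_lanes):
    """Pattern k drives a 1 on lane k and 0 on every other lane."""
    return [[1 if lane == k else 0 for lane in range(num_lanes)]
            for k in range(num_lanes)]


def apply_faults(pattern, stuck_at=None, shorts=None):
    """Toy interconnect model (assumed): shorts behave as wired-OR pairs,
    stuck_at maps a lane index to the value it is forced to."""
    received = list(pattern)
    for a, b in shorts or []:
        received[a] = received[b] = pattern[a] | pattern[b]
    for lane, value in (stuck_at or {}).items():
        received[lane] = value
    return received


def diagnose(num_lanes, channel):
    """Apply walking-one patterns through `channel` and classify deviations."""
    findings = {}
    unexpected_high = {lane: set() for lane in range(num_lanes)}
    for k, pattern in enumerate(walking_one_patterns(num_lanes)):
        received = channel(pattern)
        for lane in range(num_lanes):
            if received[lane] == pattern[lane]:
                continue
            if lane == k:
                # Lane was driven to 1 but read back 0.
                findings[lane] = "stuck-at-0 or open"
            else:
                # Lane was driven to 0 but read 1 while lane k was high.
                unexpected_high[lane].add(k)
    for lane, aggressors in unexpected_high.items():
        if not aggressors:
            continue
        if aggressors == set(range(num_lanes)) - {lane}:
            findings[lane] = "stuck-at-1"
        else:
            lanes = ", ".join(str(a) for a in sorted(aggressors))
            findings[lane] = "short to lane(s) " + lanes
    return findings


# Hypothetical faulty interconnect: lane 5 stuck at 0, lanes 2 and 3 shorted.
result = diagnose(8, lambda p: apply_faults(p, stuck_at={5: 0}, shorts=[(2, 3)]))
print(result)
```

A walking-one set needs one pattern per lane, so production flows often use more compact sequences. The point here is only that comparing driven and received values per lane is what lets a test both detect and localize an interconnect fault.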

Solutions for IEEE 1838 and Lane Test and Repair (LTR)

The IEEE 1838 Standard for Test Access Architecture for Three-Dimensional Stacked Integrated Circuits provides the key DFT access architecture. This standard supports test of both individual dies and die-to-die interconnects, and it is intended for low-speed, low-volume lanes. IEEE 1838 is a die-centric standard: it applies to a die that is intended to be part of a multi-die design and defines die-level features. When compliant dies are combined into a stacked configuration, these features form a stack-level architecture for test of both intra-die circuitry and inter-die connections. The standard supports individual dies, partial stacks, and complete stacks, thus spanning the pre-packaging, post-packaging, and board-level stages. Supported interconnects include through-silicon vias (TSVs), wire bonding, and other technologies.

Lane test and repair (LTR) is required to prevent system failure caused by faulty high-volume, high-data-rate die-to-die lanes. Coupling faults are one example of a defect that can make a lane faulty. To achieve high test coverage, it is important to detect these types of faults at speed. Figure 2 shows an example of this technique. A faulty lane has been detected by die-to-die test, so signals are shifted from the faulty lane to a spare lane. The same lane shift is implemented in the dies on both sides of the interconnect, and the output I/O of the faulty lane is placed into standby mode.

Fig. 2: Faulty lane remapping with LTR.
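The remapping idea can be sketched in a few lines. The example below assumes one spare lane at the end of the bundle and a simple shift-by-one scheme for every signal at or beyond the faulty lane. Actual repair schemes vary by interface, and the function names `build_lane_map`, `transmit`, and `receive` are hypothetical.

```python
def build_lane_map(num_signals, faulty_lane=None):
    """Map each logical signal to a physical lane.

    Assumes num_signals + 1 physical lanes, the last one being the spare.
    Signals at or beyond the faulty lane shift by one toward the spare.
    """
    if faulty_lane is None:
        return list(range(num_signals))
    if not 0 <= faulty_lane <= num_signals:
        raise ValueError("faulty_lane out of range")
    return [s if s < faulty_lane else s + 1 for s in range(num_signals)]


def transmit(signals, lane_map):
    """Drive signals onto physical lanes; None marks a lane left in standby."""
    lanes = [None] * (len(signals) + 1)
    for signal_index, value in enumerate(signals):
        lanes[lane_map[signal_index]] = value
    return lanes


def receive(lanes, lane_map):
    """Recover logical signals using the same map as the transmitter."""
    return [lanes[physical] for physical in lane_map]


# Physical lane 2 was found faulty by die-to-die test (hypothetical case).
lane_map = build_lane_map(4, faulty_lane=2)
lanes = transmit([1, 0, 1, 1], lane_map)
print(lane_map)                   # [0, 1, 3, 4]
print(lanes)                      # [1, 0, None, 1, 1]
print(receive(lanes, lane_map))   # [1, 0, 1, 1]
```

Because both sides derive their mapping from the same faulty-lane index, the receiver recovers the original signal order without any extra signaling in this simplified model.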

External DRAM test and UCIe monitoring and test

The architecture defined in the previous section supports die-to-die test, including logic-to-logic and logic-to-memory. Dies are tested individually before packaging, then external memory and interconnects are tested, and the test access port (TAP) controllers are used to run tests after packaging. However, two more key technologies are needed for a complete multi-die test solution.

The High Bandwidth Memory (HBM) standard defines an interface for 3D-stacked synchronous dynamic random-access memory (DRAM) dies. It specifies the PHY-level logic-to-memory interconnection. Universal Chiplet Interconnect Express (UCIe) is an increasingly widely adopted PHY standard for die-to-die connectivity. Despite its popularity, UCIe presents challenges for effective silicon lifecycle management (SLM). Traditional probing is difficult at the bump pitch of current and emerging packages, so a built-in approach is required. A solution cannot mandate additional DFT signals beyond the mainband and sideband connections defined by the UCIe standard. It must also provide high interconnect defect and fault coverage, loopback test, incremental multi-corner test in manufacturing, and in-field detection of aging and degradation effects.
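A built-in loopback test can be pictured as a pattern generator and a checker on the same die, with the link in between. The sketch below is a generic illustration and does not reflect the test modes defined by the UCIe specification. It uses a PRBS7 sequence (polynomial x^7 + x^6 + 1), and the `channel` callable and the `degraded` error model are hypothetical stand-ins for the physical loopback path.

```python
def prbs7(seed=0x7F):
    """Yield bits of a PRBS7 sequence (x^7 + x^6 + 1, period 127)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def loopback_test(channel, num_bits=1270, seed=0x7F):
    """Send PRBS7 bits through `channel` and count mismatches.

    `channel` is a hypothetical callable: list of bits in, list of bits out.
    Returns (error_count, bit_error_ratio).
    """
    generator = prbs7(seed)
    sent = [next(generator) for _ in range(num_bits)]
    received = channel(sent)
    errors = sum(s != r for s, r in zip(sent, received))
    return errors, errors / num_bits


def ideal(bits):
    """Loopback path with no errors."""
    return list(bits)


def degraded(bits):
    """Assumed error model: every 100th bit is flipped."""
    return [b ^ 1 if i % 100 == 99 else b for i, b in enumerate(bits)]


print(loopback_test(ideal))        # (0, 0.0)
print(loopback_test(degraded)[0])  # 12
```

Running the same check periodically in the field and tracking the error count over time is one simple way to picture how aging or degradation of a lane might be flagged before it causes a functional failure.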

Summary

The growing importance of multi-die designs highlights the challenges associated with monitoring, testing, and repair of packaged parts. Traditional probe-based methods cannot meet the demands. A solution must not only meet these challenges but also span the full silicon lifecycle. The required DFx features include test and repair of different types of die-to-die interfaces, lane test and repair, test and diagnosis for known-good die and known-good stack, extensive BIST capabilities, and in-field interconnect monitoring for purposes such as predictive maintenance. Synopsys provides a comprehensive and scalable solution for multi-die monitor, test, and repair.

Read the full version of the Synopsys white paper, Effective Monitoring, Test, and Repair of Multi-Die Designs.


