Test requirements of AI chips that integrate multiple dies and memories on the same package.
By Rahul Singhal and Giri Podichetty
Part I of this article discusses the design-for-test (DFT) challenges of AI designs and strategies to address them at the die level. This part focuses on the test requirements of AI chips that integrate multiple dies and memories on the same package.
Many semiconductor companies are adopting chiplet-based design techniques for large-scale SoCs and AI/ML accelerators. This technique enables 2.5D/3D integration of heterogeneous dies from different process nodes, called chiplets, into a single package to meet power-performance and design development requirements. This design style can also potentially achieve higher yields by building a product from multiple smaller chiplets instead of one large monolithic die. A primary reason for using chiplet-based 2.5D/3D designs for AI accelerators is that they must access and process large amounts of data in parallel, which requires high-bandwidth memory transactions and high compute capability.
2.5D chiplet-based AI SoCs integrate high-bandwidth memory (HBM) stacks side-by-side with processor chiplets on an interposer, as shown in figure 1. HBM technology uses 3D stacking of memory chips for high-bandwidth transfers from memory, and multiple processor chiplets provide an effective alternative to a large monolithic CPU die for meeting compute and yield requirements. Some designs also use 3D integration, where multiple active dies are placed on top of each other (figure 2). Communication between the stacked dies in 3D designs is carried by through-silicon vias (TSVs).
Fig. 1: 2.5D chiplet-based design.
2.5D/3D devices present new design-for-test (DFT) challenges because their test requirements extend from the die level to the package level. Traditionally, a system-level test (SLT) is used to test the die and memory assembly, but integrating these components on a single package shifts the testing of this system from SLT to manufacturing test. In addition to the DFT solution for these devices, the test quality of individual dies is of paramount importance, because the yield of the final product depends heavily on the yields of its component dies: integrating one defective die into a multi-chip package renders the entire package defective. This new paradigm of 2.5D/3D chiplet-based designs requires advanced test tools, such as Synopsys TestMAX, that provide a high-quality and complete die-level to package-level test solution. The DFT strategy for these designs can be broadly formulated by addressing two challenges: delivering test data to the dies in the package, and defining the tests that multi-chiplet packages need.
In monolithic dies, the pins are easily accessible for delivering scan and test configuration data (test setup, memory BIST, logic BIST, and so on). With the integration of multiple chiplets and HBM stacks on the same package, many of these pins are permanently used for inter-die or inter-stack communication and are not available at the package level. This reduces die accessibility and the number of pins available for test. As a result, some dies must be accessed and tested through primary dies with higher pin accessibility, using an efficient test data delivery mechanism.
Figure 3 shows an example of a test data delivery structure for a 2.5D package with an interposer and package substrate. IEEE Std. 1838 defines a mechanism to deliver test configuration data to the stacked dies through primary and secondary test access port (PTAP and STAP) interfaces driven by the TAP pins on the package (shown in blue). Similarly, the path for scan data delivery, from the flexible parallel ports (FPPs) to the scan networks in the dies, is shown in black. FPPs are optional ports defined by IEEE Std. 1838 that allow scan data delivery to the dies in a stack to be configured. In figure 3, the FPPs can be configured to send scan data to either Die 1 or Die 2. For 3D designs, the test configuration data is delivered to all dies by extending the PTAP-STAP structure through the stack. For scan test, FPPs are used throughout the stack to either test or bypass each die, as shown in figure 4.
Fig. 3: 2.5D design showing test configuration (blue) and scan data (black) connection between dies through interposer. Die 1 is used to access the test networks in Die 2.
Fig. 4: 3D design showing test configuration (blue) and scan data (black) connection through dies. Die 1 provides access to all the dies above it.
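The test-or-bypass behavior of FPPs in a stack can be illustrated with a toy model. The sketch below is only an illustration: IEEE Std. 1838 describes hardware, not a software API, and the names `FppMode` and `scan_target` are hypothetical. It assumes that one die is selected for scan at a time and that the stack is listed from the bottom die upward.

```python
from enum import Enum


class FppMode(Enum):
    TEST = "test"      # scan data is delivered to this die's scan network
    BYPASS = "bypass"  # scan data is passed on to the next die in the stack


def scan_target(stack):
    """Return the name of the first die (bottom-up) whose FPP is in TEST mode.

    `stack` is a list of (die_name, FppMode) pairs ordered from the bottom
    die upward. Returns None if every die is bypassed.
    """
    for die_name, mode in stack:
        if mode is FppMode.TEST:
            return die_name
    return None


# Hypothetical three-die stack: Die 1 is bypassed so that Die 2 is tested.
stack = [
    ("Die 1", FppMode.BYPASS),
    ("Die 2", FppMode.TEST),
    ("Die 3", FppMode.BYPASS),
]
print(scan_target(stack))  # Die 2
```

In this model, reaching a die higher in the stack simply means placing every die below it in bypass mode.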
Integrating multiple chiplets in a 2.5D/3D package reduces the number of pins available for test and increases test time. In addition to the benefits for die-level scan test discussed in Part I of this article, a test-fabric such as a streaming fabric is well suited to this problem because it can operate at higher speeds than scan shift and deliver test data to wider parallel scan networks using few test pins. It also removes the need for FPPs, because the configurability that FPPs provide is inherent in the streaming fabric. Streaming fabric can provide even higher test bandwidth in conjunction with SiliconMAX High-Speed Access and Test (HSAT), a technology from Synopsys that uses high-speed functional I/Os (HSIO) such as PCIe or USB to drive the scan network. With HSAT, a few functional I/Os, operating at significantly higher speeds than GPIOs, can drive a wider streaming fabric, which in turn can drive a wider scan network. The same HSIO can be used to drive the TAP network to deliver test configuration data for the design. The implementation of streaming fabric and HSAT for 2.5D and 3D designs is shown in figures 5 and 6.
Fig. 5: 2.5D design showing HSAT logic driving test-fabric to transport scan data with no need for FPPs. HSAT logic uses high-speed functional I/Os (PCIe or USB) to deliver both scan and test configuration data to the die.
Fig. 6: 3D design with HSAT logic driving test-fabric.
HSAT logic can either be configured to use HSIO to deliver test data or be bypassed to use regular GPIO and TAP pins. Because HSAT uses functional HSIO for test data delivery, it enables reuse of manufacturing test patterns for in-field and system-level test.
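The bandwidth argument for driving a wide scan network from a few fast I/Os comes down to simple arithmetic. The sketch below is a back-of-the-envelope model, not a description of how streaming fabric or HSAT is implemented: the function name `max_scan_chains`, the `efficiency` parameter, and all numbers are illustrative assumptions. It considers only the scan-in direction and assumes each scan chain consumes one bit per shift cycle.

```python
def max_scan_chains(io_count, io_rate_mbps, shift_rate_mhz, efficiency=1.0):
    """Number of internal scan chains that the test I/Os can keep fed.

    Each chain consumes one bit per shift cycle, so the usable I/O bandwidth
    (in Mb/s) divided by the shift rate (in MHz) gives the chain count.
    `efficiency` models protocol overhead on the link (1.0 means none).
    """
    usable_mbps = io_count * io_rate_mbps * efficiency
    return int(usable_mbps // shift_rate_mhz)


# Illustrative numbers only, not taken from any product or from the article.
gpio_chains = max_scan_chains(io_count=8, io_rate_mbps=200, shift_rate_mhz=100)
hsio_chains = max_scan_chains(io_count=4, io_rate_mbps=8000, shift_rate_mhz=100)
print(gpio_chains, hsio_chains)  # 16 320
```

Under these assumed numbers, half as many pins feed twenty times as many chains, which is the effect the text describes: fewer, faster I/Os driving a wider internal scan network.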
This section discusses the tests needed for multi-chiplet packages from a 2.5D design perspective; the same concepts can be applied to 3D designs. With monolithic dies, the required tests are well known, but multi-chiplet packages need additional tests due to expanded design features such as interconnects between dies, different interface protocols, memory stacks on the package, and so on. Below are the key test categories needed for a chiplet-based AI design with HBM:
As mentioned earlier, high yield for multi-die products depends heavily on the yield of the component dies, so individual dies must be tested before they are integrated. The DFT methodology and tests needed for AI component dies are discussed in Part I of this article. Die-level testing employs hierarchical DFT and ATPG techniques to handle large AI designs with replicated processing units. Hierarchical sign-off blocks are identified, and both scan and test configuration patterns are ported from the block level to the die level. Part I also discusses the benefits of an advanced test-fabric with sequential compression over previous techniques for scan data delivery, as well as the architecture of the test configuration mechanism.
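The dependence of package yield on die-level test quality can be made concrete with a simple probability model. The sketch below is an illustration under stated assumptions, not a yield model from the article: it assumes die defects are independent, that good dies always pass die-level test, that a defective die is caught with a fixed probability, and that assembly itself adds no defects. The function names and all numbers are hypothetical.

```python
from math import prod


def known_good_fraction(die_yield, defect_catch_rate):
    """Fraction of dies passing die-level test that are actually good.

    Assumes good dies always pass and a defective die is caught with
    probability `defect_catch_rate`.
    """
    escapes = (1.0 - die_yield) * (1.0 - defect_catch_rate)
    return die_yield / (die_yield + escapes)


def package_yield(good_fractions):
    """Probability that every die in the package is good (independent dies)."""
    return prod(good_fractions)


# Four dies with 90% yield each: assembled untested versus after a die-level
# test that catches 99% of defective dies (illustrative numbers).
untested = package_yield([0.9] * 4)
tested = package_yield([known_good_fraction(0.9, 0.99)] * 4)
print(f"{untested:.4f} {tested:.4f}")  # 0.6561 0.9956
```

Because one defective die makes the whole package defective, the per-die probabilities multiply, so even a modest escape rate at the die level compounds quickly as the number of dies grows.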
Once the component dies are integrated into a single product, their operating conditions change, which could result in transient defects. The dies therefore need to be tested individually from the package level. For the scan network, the package-level test-fabric connects to the die-level test-fabric, which in turn connects to the core-level test-fabric; the TAP network is connected in the same way. This integration of networks enables reuse of the die-level test patterns for package-level die test by porting them from the die level to the package level. The process is similar to hierarchical test at the die level, where core patterns are ported to the die level.
There are three main types of die-to-die (D2D) interconnects used in chiplet-based products: (a) PHY-based high-bandwidth interconnects, (b) non-PHY-based interconnects, and (c) test-related interconnects. PHY-based interconnects, shown in figure 7 as the High-Bandwidth Interface (HBI), are used for high-speed signals between chiplets. The HBI logic usually has a built-in self-test (BIST) mechanism, which can be controlled using IEEE 1500 or IEEE 1687. All non-PHY-based interconnects for functional paths between dies require Die Wrapper Registers (DWRs), which provide isolation from other dies during internal test mode and enable die-to-die interconnect testing in external test mode. The corresponding mechanism at the core level is the isolation wrapper used for hierarchical test. DWRs are controlled using IEEE 1500, as shown in figure 7. The functional I/Os on the die that connect to the package-level pins are tested using boundary-scan registers (BSRs) driven by the PTAP using IEEE 1149.1. The test-related interconnects between the dies, such as PTAP-STAP, test-fabric, or test I/Os to the package level, are not tested explicitly.
HBMs are made up of 3D stacks of high-density DRAM memory modules with an optional base logic die. The memory modules in the HBM are connected to the PHY in the base logic die, which communicates with the PHY on the processing die (Die 1 in figure 7). To test the HBM stacks, this path is intercepted by an HBM test controller (HTC), which is controlled using IEEE 1500.
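The external test mode of the DWRs can be pictured as one die launching patterns onto the die-to-die wires and the other die capturing them for comparison. The sketch below is a simplified software model under assumptions: it uses a walking-ones pattern set, which is a common choice for interconnect test but is not stated in the article, and the `channel` callables stand in for the physical wires. All names are hypothetical.

```python
def walking_ones(width):
    """Patterns with a single 1 moving across the bus, plus an all-zeros pattern."""
    patterns = [[1 if i == j else 0 for i in range(width)] for j in range(width)]
    patterns.append([0] * width)
    return patterns


def run_interconnect_test(width, channel):
    """Launch patterns from one die's DWR and compare what the other die's DWR captures.

    `channel` is a hypothetical callable that models the die-to-die wires:
    it takes the driven bits and returns the captured bits.
    Returns the sorted indices of wires that ever captured a wrong value.
    """
    failing = set()
    for driven in walking_ones(width):
        captured = channel(driven)
        failing.update(i for i in range(width) if captured[i] != driven[i])
    return sorted(failing)


def good_channel(bits):
    """Fault-free interconnect: every wire delivers the driven value."""
    return list(bits)


def stuck_at_0_on_wire_2(bits):
    """Faulty interconnect: wire 2 always reads 0 (for example, an open)."""
    faulty = list(bits)
    faulty[2] = 0
    return faulty


print(run_interconnect_test(4, good_channel))          # []
print(run_interconnect_test(4, stuck_at_0_on_wire_2))  # [2]
```

In internal test mode the same wrapper cells would instead isolate the die from these wires, so that die-level patterns run without depending on the neighboring die.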
Fig. 7: Example network to control and enable die-to-die (D2D) and HBM test.
Test access to these interconnects typically goes through different serial interfaces, such as IEEE 1687, IEEE 1149.1, and IEEE 1500, because the interconnects may be based on different standards or use third-party IP. This makes it challenging to control their test from a single interface at the package level. An effective way to address this is to use a local controller that acts as a bridge between the package interface and the test access of the interconnects. Instead of the PTAP driving the STAP directly over IEEE 1149.1, the PTAP interfaces with the local controller over IEEE 1687, and the local controller then drives the STAP using IEEE 1149.1. The local controller also enables HBI and HBM test over IEEE 1500, with the appropriate test setup initiated from the IEEE 1149.1 package interface. Any IEEE 1687-compliant IP in the dies would also be operated through the local controller.
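The bridging role of the local controller can be summarized as a routing table from test targets to serial interfaces. The sketch below only restates the access paths described above in code form; it is not a model of any tool or IP. The target names and the function `access_path` are hypothetical, and the choice of IEEE 1687 for the last hop to 1687-compliant IP is an assumption.

```python
# Interface used on the last hop, from the local controller to each target.
LOCAL_INTERFACES = {
    "STAP": "IEEE 1149.1",
    "HBI_BIST": "IEEE 1500",
    "HBM_HTC": "IEEE 1500",
    "IJTAG_IP": "IEEE 1687",  # assumption: 1687-compliant IP is reached over 1687
}


def access_path(target):
    """Return the hops (source, destination, interface) from package pins to `target`."""
    if target not in LOCAL_INTERFACES:
        raise ValueError(f"unknown test target: {target}")
    return [
        ("package TAP pins", "PTAP", "IEEE 1149.1"),
        ("PTAP", "local controller", "IEEE 1687"),
        ("local controller", target, LOCAL_INTERFACES[target]),
    ]


# Print the three hops needed to reach the HBM test controller.
for hop in access_path("HBM_HTC"):
    print(" -> ".join(hop))
```

The first two hops are the same for every target, which is the point of the bridge: the package sees a single interface, and only the last hop varies with the standard the target logic uses.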
Several semiconductor companies are adopting 2.5D/3D chiplet-based designs for AI SoCs that deliver higher compute and memory-access performance. In addition to the tests needed for monolithic dies, these designs have new test requirements due to the integration of multiple devices, and they need new test structures to enable those tests. IEEE Std. 1838 defines a standardized and scalable test access architecture to transport test data and enable test in multi-die and stacked-die configurations. However, operating the different test logic could involve handling different serial interfaces. Advanced DFT tools are needed that provide a complete solution for 2.5D/3D designs, including automated test logic implementation, control of serial interfaces, and test pattern generation. To learn more about DFT for AI architectures, attend the tutorial on the topic at the SNUG conference.
Giri Podichetty is a product marketing director at Synopsys.