Left-shifting DFT, scalable tests from manufacturing to the field, enabling system-level tests for in-field debug.
By Srikanth Venkat Raman and Sri Ganta
Today’s large, highly complex system-on-chip (SoC) devices and systems pose challenges that extend from manufacturing test into the field, and these must be met while satisfying stringent requirements for test cost, test quality, yield, debug, and turn-around time. Addressing them requires efficient end-to-end test solutions that scale to large, complex SoC design cores, both for manufacturing test and across the entire silicon life cycle.
In this article we present four real-world cases, all innovative and successful applications, that use Synopsys Test products and solutions [1]. The first is a shift left of DFT to help with early validation of DFT, integration with design, and faster turn-around time during implementation, with an application on IBM processors and AI accelerators. Next, we cover the application of scalable and efficient test solutions from manufacturing all the way through field deployment, with successful results from MediaTek’s mobile processors. Third, we show the application and benefits of enabling system-level tests on Google mobile processors. Finally, we cover the successful application of in-field debug and repair on AWS custom AI chips used for training in the cloud.
Modern AI and HPC chips require sophisticated test architectures because of their size and complexity. They contain many IPs, many instantiations of each IP within an SoC, and hundreds of blocks within each, many of them identical but used and connected in different ways. As devices have become more complex, access to device cores has become more difficult, with most cores not directly accessible from test ports. At the floorplan level, abutted tiles may make it hard to find room for the structures needed to implement design-for-test (DFT) and built-in self-test (BIST) techniques [2].
Advanced nodes have new failure modes, so test engineers need to apply new fault models to ensure that the testing process covers these faults and detects most, if not all, defects. The additional patterns are leading to an exponential increase in test time and test data volume (TDV), driving up manufacturing costs and reducing profitability.
Many AI chips are used in safety-critical applications, where chip failure is a serious problem. A very low defective parts per million (DPPM) count is required, putting even more pressure on test time and coverage. Further, regular tests in the field are mandatory to find parts that are heading toward failure due to environmental or aging effects. The testing process and DFT hardware now span individual dies, assembled packages (including multi-die options), and on to the system level, all requiring in-field access.
Meeting these challenges requires a complete test solution. Advanced fault models such as transition delay, path delay, and cell-aware must be supported. Automatic test pattern generation (ATPG) must be able to generate tests with sufficiently high fault coverage to meet DPPM targets. Both ATPG software and the hardware test architecture must support intelligent compression and scheduling to reduce test time and minimize TDV.
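The link between fault coverage and DPPM can be sketched with the classic Williams-Brown defect-level estimate, DL = 1 - Y^(1 - T), where Y is process yield and T is fault coverage. This is a first-order textbook model, not a method described in this article; the yield figure and coverage points below are hypothetical values chosen only to show the trend.

```python
def defect_level_dppm(yield_fraction: float, fault_coverage: float) -> float:
    """Williams-Brown estimate DL = 1 - Y**(1 - T), returned in parts per million."""
    if not 0.0 < yield_fraction <= 1.0:
        raise ValueError("yield_fraction must be in (0, 1]")
    if not 0.0 <= fault_coverage <= 1.0:
        raise ValueError("fault_coverage must be in [0, 1]")
    return (1.0 - yield_fraction ** (1.0 - fault_coverage)) * 1e6


if __name__ == "__main__":
    ASSUMED_YIELD = 0.80  # hypothetical process yield, not from the article
    for coverage in (0.95, 0.99, 0.995, 0.999):
        dppm = defect_level_dppm(ASSUMED_YIELD, coverage)
        print(f"coverage {coverage:.3f} -> about {dppm:,.0f} DPPM")
```

Under this model, the escape rate falls steeply as coverage approaches 100%, which is why the last fractions of a percent of coverage matter for low-DPPM targets.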
For efficient testing, it is necessary to test multiple parts of the chip in parallel while respecting power limits. Since often many cores are identical, pattern reuse and broadcast testing are essential. The architecture must enable high-speed test access, generally by reusing functional interfaces, and support testing of replicated cores or replicated dies in a multi-core design. A unified test architecture using modular and hierarchical test strategies is essential.
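Parallel testing under a power limit is essentially a packing problem. The sketch below uses a simple first-fit-decreasing heuristic to group block tests into sessions whose combined power stays within a budget. The block names, test times, power figures, and budget are all hypothetical, and production schedulers take many more constraints into account (shared resources, thermal limits, clock domains).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockTest:
    name: str
    test_time_ms: float
    power_mw: float


def schedule_sessions(blocks, power_budget_mw):
    """Greedy first-fit-decreasing: longest tests first, each placed in the
    first session that still has power headroom."""
    sessions = []
    for block in sorted(blocks, key=lambda b: b.test_time_ms, reverse=True):
        if block.power_mw > power_budget_mw:
            raise ValueError(f"{block.name} alone exceeds the power budget")
        for session in sessions:
            if sum(b.power_mw for b in session) + block.power_mw <= power_budget_mw:
                session.append(block)
                break
        else:
            sessions.append([block])
    return sessions


def total_time_ms(sessions):
    """Blocks in a session run in parallel, so a session lasts as long as its
    slowest block; sessions run one after another."""
    return sum(max(b.test_time_ms for b in session) for session in sessions)


# Hypothetical SoC: four identical CPU cores plus a GPU and an NPU.
BLOCKS = [BlockTest(f"cpu{i}", 10.0, 300.0) for i in range(4)] + [
    BlockTest("gpu", 20.0, 500.0),
    BlockTest("npu", 15.0, 400.0),
]
POWER_BUDGET_MW = 1000.0

if __name__ == "__main__":
    plan = schedule_sessions(BLOCKS, POWER_BUDGET_MW)
    for index, session in enumerate(plan):
        print(index, [b.name for b in session])
    print("serial ms:", sum(b.test_time_ms for b in BLOCKS))
    print("scheduled ms:", total_time_ms(plan))
```

The heuristic is not optimal, but it shows the trade-off: a tighter power budget forces more sessions and a longer total test time.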
Test strategies must span manufacturing with automatic test equipment (ATE), system-level test (SLT), and in-field fault detection and recovery to ensure long-term reliability for mission-critical AI and HPC applications. Diagnostic support must efficiently find the cause of test failures without exploding test time or volume. Finally, all requirements for safety standards such as ISO 26262 [3] for automobiles and other road vehicles must be satisfied.
Synopsys TestMAX Advisor [4] performs RTL testability analysis and optimization, allowing users to fine-tune RTL early in the design cycle to predictably meet manufacturing and in-system test coverage goals. Advisor is commonly used throughout the design implementation flow at various handoff points when a design’s content changes.

Fig. 1. Synopsys TestMAX Advisor addresses testability issues early at RTL
Synopsys TestMAX Manager extends DFT automation to the RTL implementation phase to provide a comprehensive RTL integration flow while extending into the system and field-testing domains [5]. It assists with test planning at RTL, generates RTL IP for memory test, XLBIST, codec, and on-chip clocking, and instantiates and connects these DFT IPs into the design RTL as illustrated in figure 2.

Fig. 2. Synopsys TestMAX Manager shifts left RTL integration for DFT

Fig. 3. Shift Left DFT Implementation at IBM using Synopsys TestMAX
Leveraging the Synopsys test products and solutions, IBM was able to shift left the DFT on the processors and SoCs used in its high-reliability systems and on its enterprise-grade AI accelerator chips. This resulted in early validation of DFT, easier integration with design, and faster turn-around time during implementation, as shown in figure 3.
With Synopsys TestMAX, users can enable a consistent and scalable test approach across all stages of the silicon lifecycle of their products, from design through manufacturing test to system-level test and finally to in-field test. A scalable test framework has three key components. The first is a unified test compression architecture (DFTMAX SEQ) that serves both ATPG pattern compression and X-tolerant logic built-in self-test, for robust self-testing even in the presence of unknown values. The second is a Streaming Fabric (SF) designed for high-throughput testing by transporting large amounts of test data quickly and efficiently. The third is High-Speed Access & Test (HSAT), which supports fast and flexible access for wafer, system-level, and in-field test application through high-speed I/O interfaces such as PCIe and USB.
To meet DPPM targets, advanced fault models such as transition delay, path delay, and cell-aware must be supported with sufficiently high fault coverage. Both ATPG software and the hardware test architecture must support intelligent compression and scheduling to reduce test time and minimize test data volume. The scan compression architecture is critical for test cost savings because it directly affects both test time and test data volume. Additionally, the compression architecture is used not just for manufacturing tests but also for burn-in, in-system, and in-field testing applications for mission-critical devices.
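A back-of-the-envelope model shows why scan compression affects both test time and test data volume. The sketch assumes that shift cycles dominate test time, that a compression ratio R splits the same flops into R times as many (shorter) internal chains behind the same external channels, and that the pattern count stays constant. These are simplifications: real compression changes pattern counts and adds overhead, and every number here is hypothetical.

```python
import math


def scan_cost(flops, scan_channels, compression_ratio, patterns, shift_mhz):
    """Return (test_time_ms, tdv_bits) for a first-order scan test model."""
    internal_chains = scan_channels * compression_ratio
    chain_length = math.ceil(flops / internal_chains)
    shift_cycles = patterns * chain_length            # capture cycles ignored
    test_time_ms = shift_cycles / (shift_mhz * 1e6) * 1e3
    tdv_bits = shift_cycles * 2 * scan_channels       # stimulus in + response out
    return test_time_ms, tdv_bits


if __name__ == "__main__":
    # Hypothetical design: 1M scan flops, 10 external channels, 1000 patterns.
    for ratio in (1, 10, 100):
        time_ms, bits = scan_cost(1_000_000, 10, ratio, 1000, 100)
        print(f"ratio {ratio:>3}: {time_ms:8.1f} ms, {bits / 8e6:8.1f} MB")
```

In this model both quantities scale inversely with the compression ratio, which is the mechanism behind the cost savings the text describes.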
The Synopsys test solution is based on a Unified Compression Architecture (DFTMAX SEQ) as shown in figure 4. The Unified Compressor supports both compression of ATPG patterns and the application of Logic BIST. This infrastructure spans all test phases of the silicon lifecycle: manufacturing, burn-in (BI), SLT, and in-field/in-system test. The Unified Compression Architecture is a scalable and flexible solution designed to accelerate test throughput, reduce test data volume, and enable efficient diagnosis. The sequential (DFTMAX SEQ) compression/decompression infrastructure seamlessly integrates into various test environments, eliminating the need for environment-specific customization. Synopsys provides reusable IP blocks to embed compression logic in the design and reuse it across all test stages.
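Why unknown (X) values matter for logic BIST can be illustrated with a toy multiple-input signature register (MISR). The sketch below is a generic textbook model, not the DFTMAX SEQ or XLBIST implementation: the register width, feedback taps, and response data are arbitrary, and X handling is modeled as simply forcing X bits to 0 before compaction.

```python
WIDTH = 8
TAPS = 0b10111000  # arbitrary illustrative feedback taps (includes the MSB)


def misr_step(state: int, inputs: int) -> int:
    """Shift left by one with parity feedback, then XOR in the response word."""
    feedback = bin(state & TAPS).count("1") & 1
    shifted = ((state << 1) | feedback) & ((1 << WIDTH) - 1)
    return shifted ^ inputs


def pack(bits, x_value: int, mask_x: bool) -> int:
    """Pack one cycle of scan-out bits; None marks an unknown (X) value."""
    word = 0
    for position, bit in enumerate(bits):
        if bit is None:
            bit = 0 if mask_x else x_value  # masked X becomes a known 0
        word |= bit << position
    return word


def signature(response, x_value: int, mask_x: bool) -> int:
    """Compact a whole response (list of cycles) into one signature."""
    state = 0
    for bits in response:
        state = misr_step(state, pack(bits, x_value, mask_x))
    return state


# Hypothetical scan-out data with a single X in the second cycle.
RESPONSE = [
    [1, 0, 1, 1, 0, 0, 1, 0],
    [0, 1, None, 0, 1, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 0],
    [0, 0, 1, 1, 0, 1, 0, 1],
]

if __name__ == "__main__":
    for x in (0, 1):
        print("X =", x,
              "unmasked:", format(signature(RESPONSE, x, False), "08b"),
              "masked:", format(signature(RESPONSE, x, True), "08b"))
```

Without masking, the signature depends on how the X happens to resolve on silicon, so a good device cannot be compared against a single expected value. With masking, the signature is stable while known bits still contribute to fault detection.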
For efficient testing and test time savings, it is necessary to test multiple parts (blocks) of the chip in parallel while respecting power limits. Since many cores may be identical, pattern reuse and broadcast testing are essential. The architecture must enable high-speed test access, generally by reusing functional interfaces, and support testing of replicated cores or replicated dies in a multi-core design. A unified test architecture using modular and hierarchical test strategies is essential.
Synopsys Streaming Fabric (SF) [6] is a foundational infrastructure designed to address the increasing complexity and scale of testing in AI and HPC SoCs. It enables efficient test data delivery, minimizes bottlenecks, and supports advanced test methodologies across all test phases. Pattern porting retargets core-level pattern sets to this system-level access architecture.
Test strategies must span manufacturing with automatic test equipment (ATE), SLT, and in-field fault detection and recovery to ensure long-term reliability for mission-critical AI and HPC applications. Diagnostic support must efficiently find the cause of test failures without exploding test time or volume. Finally, all requirements for safety standards such as ISO 26262 [3] for automobiles and other road vehicles must be satisfied.
Synopsys TestMAX ALE and High-Speed Access & Test (HSAT) IP form the solution that leverages high-speed interfaces such as USB and PCIe® that exist on many semiconductor devices. The combined solution provides the ability to apply manufacturing tests through these functional interfaces, enabling an extremely high test-bandwidth compared to traditional test interfaces, while often reducing the number of dedicated test pins. Additionally, all tests are portable to the device, from die testing to in-system testing and at all stages of the product life cycle.

Fig. 4. Synopsys Scalable End-to-End DFT Solution
Figure 5 shows the scalable test architecture implemented by MediaTek on their mobile SoCs leveraging components of the Synopsys Test solution described above. The SEQ modules provide test compression, reduce test data volume, and shorten the overall test time. The Streaming Fabric delivers test data to multiple subsystems at the same time, making it possible to test many cores or blocks in parallel instead of sequentially.
The architecture simplifies DFT planning and implementation. With standard interfaces, assigning codec pins is straightforward and scales easily to support many cores. Since test resources and I/Os are limited, shared inputs and minimal outputs increase test bandwidth and reduce test time. The combination of Streaming Fabric and SEQ enables efficient and scalable testing for complex CPU subsystems with many embedded blocks. HSAT, which acts as the high-speed access interface that connects external hosts or testers directly to the internal test infrastructure, enables ATPG patterns to be executed on the die through the USB interface. Results from this application show a 60% reduction in test costs relative to a prior baseline.

Fig. 5. MediaTek Scalable Test Architecture Leveraging Synopsys Test Solutions
Mobile SoC designs are highly complex, integrating hundreds of IPs, including multi-cluster compute cores (CPU/TPU/GPU), multimedia IP, HSIOs, and security engines, across 2000+ clock domains and 100+ power domains, and supporting 20+ DVFS (dynamic voltage and frequency scaling) modes and AVS (adaptive voltage scaling) technology. Achieving low-DPPM test quality at advanced process nodes requires robust structural testing. Coverage requirements are typically >99.5% stuck-at and >95% at-speed transition delay fault coverage. Meeting them requires the deployment of advanced fault models and custom memory algorithms. The advanced fault models typically applied include bridging, hold time, small delay, and cell-aware. The custom memory algorithms typically required include retention, coupling, read/write disturb, and multi-port. This drives an exponential increase in test-pattern content, directly inflating both test time and test cost. These requirements drive the need for system-level tests to complement and supplement tests applied on a tester (ATE) during manufacturing testing. Figure 6 shows the successful implementation of Google’s test architecture for both ATE and SLT leveraging Synopsys Test solutions.

Fig. 6. Google Leveraging Synopsys Test Solutions for System-level Tests
Synopsys SMS IP [7] is a comprehensive, integrated test, repair, and diagnostic solution that supports repairable or non-repairable embedded memories across any foundry or process node. Silicon-proven on billions of chips across a range of process nodes, it is a cost-effective solution for improving test quality and repairing manufacturing defects found in advanced processes. In-field tests are important for addressing silent data corruption (SDC), silicon aging or degradation, and infant mortality that earlier test steps may have missed. In-field debug and repair have also become important for AI and HPC systems in which thousands of accelerators work in tandem for weeks or months to train large AI models such as LLMs. AWS adopted the Synopsys SMS solution for memory test because of its in-field benefits as well as its benefits for manufacturing test. Figure 7 shows its implementation in the AWS ML SoCs for in-field tests, with in-field test access and control enabled through PCIe. The capability was used to debug SRAM failures in the field: ECC errors appeared during execution of certain workloads, with certain columns failing. SMS was used to root-cause these failures to a metal overhang that slowed the discharge of a sense-amp node, so those devices required a higher Vmin for reliable operation. After the affected columns were swapped by forcing a soft repair in the field, the memory was retested to confirm a lower Vmin, and no ECC errors were seen on the workload after the fix.
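The test, repair, retest loop described above can be sketched with a toy SRAM model. The code runs the well-known March C- algorithm to find a failing column, remaps that column to a spare (a "soft repair" held in a register rather than fuses), and retests. This is only an illustration: the `Sram` class and its methods are hypothetical and unrelated to the SMS IP interface, and the defect is modeled as a stuck-at-0 column, whereas the failure in the article was a voltage-dependent slow sense-amp node.

```python
class Sram:
    """Toy bit-addressable SRAM with spare columns and a logical-to-physical map."""

    def __init__(self, rows, cols, spares, stuck_at_zero_cols=()):
        self.rows, self.cols = rows, cols
        self.cells = [[0] * (cols + spares) for _ in range(rows)]
        self.col_map = list(range(cols))                 # logical -> physical
        self.free_spares = list(range(cols, cols + spares))
        self.stuck = set(stuck_at_zero_cols)             # defective physical columns

    def write(self, row, col, bit):
        phys = self.col_map[col]
        self.cells[row][phys] = 0 if phys in self.stuck else bit

    def read(self, row, col):
        return self.cells[row][self.col_map[col]]

    def soft_repair(self, col):
        """Redirect a logical column to the next free spare column."""
        self.col_map[col] = self.free_spares.pop(0)


def march_c_minus(mem):
    """March C-: up/down(w0); up(r0,w1); up(r1,w0); down(r0,w1); down(r1,w0); up/down(r0).
    Returns the set of logical columns that produced a miscompare."""
    failing = set()
    up = [(r, c) for r in range(mem.rows) for c in range(mem.cols)]
    down = list(reversed(up))

    def element(order, expect, write):
        for row, col in order:
            if expect is not None and mem.read(row, col) != expect:
                failing.add(col)
            if write is not None:
                mem.write(row, col, write)

    element(up, None, 0)
    element(up, 0, 1)
    element(up, 1, 0)
    element(down, 0, 1)
    element(down, 1, 0)
    element(up, 0, None)
    return failing


if __name__ == "__main__":
    memory = Sram(rows=4, cols=8, spares=1, stuck_at_zero_cols={5})
    bad_columns = march_c_minus(memory)
    print("failing columns before repair:", sorted(bad_columns))
    for column in sorted(bad_columns):
        memory.soft_repair(column)
    print("failing columns after repair:", sorted(march_c_minus(memory)))
```

Because a soft repair only changes the column mapping, it can be applied and verified in the field without retesting on ATE, which matches the retest-after-repair step described in the text.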

Fig. 7. AWS Leveraging Synopsys Memory Test Solutions for In-Field Application
—Sri Ganta is a principal product manager at Synopsys for the Test line of business.
References: