Rising complexity and heterogeneous integration are reshaping test methodologies and fault coverage, but challenges persist.
Experts at the Table: Semiconductor Engineering sat down to discuss the rapidly changing landscape of design for testability (DFT), focusing on the impact of advancements in fault models, high-speed interfaces, and lifecycle data analytics, with Jeorge Hurtarte, senior director of product marketing in the Semiconductor Test Group at Teradyne; Sri Ganta, director of test products at Synopsys; Dave Armstrong, principal test strategist at Advantest; and Lee Harrison, director of Tessent automotive IC solutions at Siemens EDA. What follows are excerpts of that conversation.
L-R: Teradyne’s Hurtarte; Synopsys’ Ganta; Advantest’s Armstrong; Siemens EDA’s Harrison.
SE: How do you see DFT evolving with the increasing complexity of designs, especially in high-frequency mixed-signal devices and other new or complex technologies?
Hurtarte: The key trend we’re seeing is heterogeneous integration, particularly with chiplets. These can include digital, analog, or mixed-signal technologies, and the whole idea is to take the optimal solution for each application and co-package them using heterogeneous integration, whether it’s 2.5D or 3D. When we have this scenario, we have to consider how to test these chiplets. That means testing them at the singulated die level or at the wafer level, but also once they’re integrated into a package. Clearly, when you’re combining different technologies in a single advanced package, DFT approaches need to be rethought. Fortunately, the industry has been working on some standards over the years. Both Synopsys and Siemens have taken leadership roles in this area, with standards like IEEE 1838 for advanced packages. The main point here is that the trend toward integrating chiplets with different technologies and node types — aiming to optimize the overall solution — is where the complexity lies.
Ganta: I’d like to add some points from the DFT side, especially on the front end. The complexity of today’s designs — whether AI or high-performance computing — is growing significantly, and that drives the need for structural and architectural changes early in the process, primarily to address issues like test cost and test data volume. For instance, you need a cost-effective compression approach that reduces test data volume while still achieving high coverage. Within SoC design implementation, you also need a fabric or network that can generate and deliver test content efficiently to individual cores. How do you optimize that? That’s another challenge. Delivering this content from the tester to the chip also requires structural changes, including a very high-throughput interface. We’ve been addressing this with our sequential compression tool based on PRPG (pseudo-random pattern generator) and MISR (multiple input signature register), which helps achieve lower test data volumes while maintaining high coverage. Then we have a test fabric that delivers test data to multiple cores. This allows for parallel testing, which optimizes bandwidth and reduces test time. For external interfaces, leveraging existing functional interfaces like PCIe or USB for structural testing is critical.
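To make the compression idea concrete, here is a minimal Python sketch of the PRPG/MISR principle: a short seed is expanded on-chip into long pseudo-random scan patterns, and the scan-out responses are folded into a compact signature, so the tester only needs to store the seed and the expected signature. The register width, tap positions, and scan lengths are illustrative assumptions, not Synopsys' implementation.

```python
# Illustrative PRPG/MISR sketch. Polynomial taps, register width, and scan
# lengths are assumptions for demonstration, not any vendor's implementation.

def lfsr_step(state, taps, width):
    """Advance a Fibonacci LFSR by one cycle; return (new_state, output_bit)."""
    out = state & 1
    feedback = 0
    for t in taps:                      # XOR the tapped bits to form the feedback
        feedback ^= (state >> t) & 1
    state = (state >> 1) | (feedback << (width - 1))
    return state, out

def prpg_patterns(seed, num_patterns, scan_len, width=16, taps=(0, 2, 3, 5)):
    """PRPG: expand a short seed into long pseudo-random scan-in patterns."""
    state, patterns = seed, []
    for _ in range(num_patterns):
        bits = []
        for _ in range(scan_len):
            state, out = lfsr_step(state, taps, width)
            bits.append(out)
        patterns.append(bits)
    return patterns

def misr_compact(responses, width=16, taps=(0, 2, 3, 5)):
    """MISR: fold every scan-out response into one short signature."""
    signature = 0
    for response in responses:
        for bit in response:            # shift each response bit into the register
            feedback = bit
            for t in taps:
                feedback ^= (signature >> t) & 1
            signature = ((signature >> 1) | (feedback << (width - 1))) & ((1 << width) - 1)
    return signature

# The tester stores only the seed and the expected signature, far less data
# than the full pattern set; that is the test-data-volume saving described above.
patterns = prpg_patterns(seed=0xACE1, num_patterns=4, scan_len=32)
print(hex(misr_compact(patterns)))      # stand-in for real circuit responses
```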
Armstrong: There’s no question that DFT is evolving, but there’s still a lot of work to be done. For instance, we’re seeing test times going through the roof, partly because I don’t think our current DFT is catching everything. Professor Shawn Blanton published a paper analyzing the root causes of various defects, and he found that a lot of them weren’t even detected by the DFT in place. That points to a need for better fault models. High-speed I/O is another area where we need improvement. High-speed I/O operates like a digital interface transmitting analog signals, and we need better ways of tracking and testing those mixed signals if we’re going to make progress at speeds like 224 gigabits per second. Another point I’d highlight is functional testing. It’s becoming a fallback for the industry to catch test escapes, but the DFT for functional tests is still evolving. We need better fault models and methods to analyze what comes out of these tests. So, in short, there’s still a lot of ground to cover.
Harrison: We introduced our SSN (Streaming Scan Network) product back in 2020. It laid the foundation for test pattern delivery and optimization, and at the time, it was a significant step forward for DFT. Since then, the requirements have only grown. We are adding capabilities for high-speed I/O support, including interfaces like USB and PCIe, as well as enhancements to pattern delivery within the core infrastructure. Another area we’re focused on is addressing silent data errors. There are still faults that our DFT might miss. We’re working on functional fault grading and developing more advanced fault models to capture those errors. The next big direction is applying structural test patterns in functional environments to catch the faults that were missed at the ATE level. Functional monitors, silicon lifecycle management, and the ability to deliver patterns in-system are vital for success. For example, we recently launched an in-system test product at ITC that allows manufacturing test patterns to be applied in a functional environment. Things are definitely evolving at an incredible pace, and it’s a really exciting time to be in this field.
SE: What are the challenges for DFT, especially in achieving signal integrity and noise isolation in high-density devices where sensitivity is critical?
Ganta: Signal integrity and noise isolation are concerns both in monolithic devices and in multi-die systems. But they’re becoming even more pronounced with heterogeneous integration, where multiple dies with different functionalities are packaged together. These challenges are especially evident with high-speed and high-density interfaces. Traditional fault models like stuck-at and transition delay faults are no longer enough for detecting defects related to signal integrity or crosstalk. To tackle this, we have advanced fault models targeting specific challenges, such as bridge faults and slack-based faults. For inter-die interfaces, where you have high-speed or high-density connections across dies, we use SLM (silicon lifecycle management). For example, our UCIe monitor and test repair IP includes signal integrity testing capabilities for high-speed interfaces like the UCIe PHY. Additionally, for high-volume applications, we have a product called LTR (Lane Test and Repair). It’s similar to how redundant memory lanes are handled, allowing us to test and repair faulty lanes instead of discarding an entire chip or package. This redundancy is critical for maintaining signal integrity and reducing waste.
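As a rough illustration of the lane test-and-repair concept, the sketch below tests each lane of a die-to-die interface and remaps failures to spare lanes, much as redundant rows repair memories. The lane counts, the loopback check, and the remap table are assumptions made for illustration, not the actual LTR mechanism.

```python
# Minimal sketch of lane test-and-repair for a die-to-die interface. Lane counts,
# the loopback test, and the remap table are illustrative assumptions only.

NUM_DATA_LANES = 16
NUM_SPARE_LANES = 2

def test_lane(lane_id, loopback_ok):
    """Stand-in for a per-lane loopback/BER test; returns True if the lane passes."""
    return loopback_ok[lane_id]

def build_repair_map(loopback_ok):
    """Map each failing data lane to a spare; give up if the spares run out."""
    spares = list(range(NUM_DATA_LANES, NUM_DATA_LANES + NUM_SPARE_LANES))
    repair_map = {}
    for lane in range(NUM_DATA_LANES):
        if not test_lane(lane, loopback_ok):
            if not spares:
                return None            # unrepairable: more failures than spares
            repair_map[lane] = spares.pop(0)
    return repair_map

# Example: lanes 3 and 9 fail and are rerouted to the two spare lanes,
# so the package is repaired rather than discarded.
results = {i: True for i in range(NUM_DATA_LANES + NUM_SPARE_LANES)}
results[3] = results[9] = False
print(build_repair_map(results))       # {3: 16, 9: 17}
```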
Armstrong: There are two big challenges that come to mind. First, the vias — the connection points between the dies, especially for 3D integration — are becoming incredibly small. Treating these vias as simple wires is no longer realistic. We need to view them as resistive interfaces, effectively resistors whose values can change over time. That creates multiple challenges for developing DFT strategies to identify potential issues both up front and over the device’s lifetime. The second challenge is high-speed I/O. For example, Ethernet at speeds like 224 gigabits per second requires connectors and sockets with bandwidths up to 100 GHz. That’s no easy feat. When it comes to noise for these large parts, much of it stems from the power distribution network. Clean power supplies and effective voltage management are critical to keeping noise levels low while ensuring the device performs reliably. It’s all about managing voltage droop and recovery times to maintain signal integrity while keeping the system stable.
Harrison: With high-frequency devices and signal integrity, fault models play a crucial role. You want to exercise the device as close to its functional operation as possible. Advanced fault models are key here because traditional static models, like stuck-at faults, may not detect issues. For example, if the surrounding logic is static, a fault might go undetected. But if you toggle the logic into different scenarios, you might uncover a fault you hadn’t detected before. Another example of a silent data error is when a device behaves differently depending on its environment. For instance, when the device is tested on an ATE with clean, stable power supplies, it may exhibit a different profile compared to when it’s operating in-system, where conditions are less controlled. In the past, these types of issues were less common, but now they’re becoming more prevalent. This is especially true with the high-speed nature of today’s devices. For example, with our streaming scan network, we’re seeing customers running tests at extremely high frequencies, alongside their functional operating frequencies. All of this high-speed logic introduces additional challenges to the DFT process, making it more complex and demanding than ever before.
Hurtarte: We can look at it from a couple of other angles, too. With heterogeneous integration, you typically have an interposer, usually a silicon interposer, which provides the interconnect and relies on through-silicon vias (TSVs). Ensuring that these TSVs are functional is critical. We often focus on the die itself, but we can’t ignore these intermediate layers. The question is, ‘What can we do from a DFT perspective to ensure the integrity of these TSV structures?’ For example, could we embed some intelligence into the TSVs, essentially applying DFT principles to them? Right now TSVs are ‘dumb’ interconnects, but adding intelligence could help ensure these structures are robust for both DC and AC testing. That could play a big role in improving both signal integrity and noise isolation. So we need to think beyond the die to include these intermediate contributors to noise and signal integrity issues. The other angle I’m curious about is how companies like Synopsys and Siemens might approach testing random analog defects. Dave mentioned a paper by Sontag and Jurca at ITC about testing for such defects. Even though we think of digital circuits as purely digital, they’re implemented with transistors, which are fundamentally analog devices. Transistors operate in saturation or cutoff regions, but their analog characteristics can impact signal integrity and noise isolation. Could we use techniques from random analog defect testing to measure parameters in these transistors, particularly in I/O structures, to gain insights into signal integrity? This kind of testing might help bridge the gap between the analog and digital worlds in a meaningful way. It’s an interesting question. Can we expand our view of analog testing to include the analog behavior within digital circuits and use it to improve signal integrity?
SE: What recent advancements have improved fault coverage and test efficiency? And what’s next? Where do we go from here, especially as we move into the next set of nodes?
Armstrong: One of the most significant advancements in recent years has been the introduction of scan networks. These are really key because they allow for efficient pattern sharing between homogeneous cores, whether they’re on an interposer or a single die. This has major implications for test time, pattern generation, and a host of other things. What’s coming, and in some cases is already here, is scan over high-speed I/O. For example, with interfaces like UCIe or PCIe, you can embed scan bits directly into the payload and extract them later. This is a game-changer because GPIO (general-purpose I/O) interfaces for these bits are becoming impractical. Another shift is running scan streams over high-speed signals, sometimes up to 2 GHz. That’s a significant change, and it has a big impact on test time. Finally, I mentioned functional testing earlier. Functional testing at wafer probe, what we call ‘system-like testing,’ is becoming more critical. It allows us to identify known good die as early as possible in the flow, which has tremendous benefits and will see broader adoption in the future.
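A rough sketch of the "scan bits in the payload" idea follows: a slice of scan data is wrapped in a small frame, carried over the packetized link, and unpacked on the far side. The frame layout here, a one-byte channel ID plus a length byte, is an invented example rather than the UCIe or PCIe packet format.

```python
# Minimal sketch of carrying scan data over a packetized high-speed link.
# The frame layout (channel ID byte, length byte, packed bits) is an
# illustrative assumption, not the UCIe or PCIe protocol.

def pack_scan_frame(channel_id, scan_bits):
    """Wrap a slice of scan bits for one scan channel into a link payload."""
    payload = bytearray([channel_id, len(scan_bits)])
    byte, count = 0, 0
    for bit in scan_bits:              # pack bits MSB-first into payload bytes
        byte = (byte << 1) | bit
        count += 1
        if count == 8:
            payload.append(byte)
            byte, count = 0, 0
    if count:
        payload.append(byte << (8 - count))
    return bytes(payload)

def unpack_scan_frame(frame):
    """Recover (channel_id, scan_bits) from a received payload."""
    channel_id, nbits = frame[0], frame[1]
    bits = []
    for byte in frame[2:]:
        for i in range(7, -1, -1):
            bits.append((byte >> i) & 1)
    return channel_id, bits[:nbits]

frame = pack_scan_frame(channel_id=5, scan_bits=[1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
print(unpack_scan_frame(frame))        # (5, [1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
```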
Harrison: The scan network is foundational to everything we’re doing today. On top of that, functional patterns remain an important area. While we try to minimize functional testing because it’s less efficient, there’s always some need for it. We’re working on ways to stream functional patterns more effectively over the scan network. We’ve also introduced capabilities for functional fault grading. This allows us to sort a large suite of functional tests to prioritize the ones that are most effective at identifying defects. Customers often have functional tests that contribute little or nothing to overall testability, so optimizing this is a big step forward. Couple this with advanced fault models, and we’re seeing not just better fault coverage, but also improved defect coverage. That’s critical, especially when you consider large-scale deployments like AI data centers. Even with 99% defect coverage, the untested silicon still represents a significant area. Pushing that coverage as high as possible is vital.
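The ranking step in functional fault grading can be pictured as a greedy ordering problem: fault simulation reveals which faults each functional test detects, and the tests are then ordered by how many new detections each one adds. The sketch below shows that ordering on made-up fault lists; it illustrates the idea, not Siemens' algorithm.

```python
# Illustrative greedy grading of functional tests. The tests and fault lists
# are invented; real grading works from fault-simulation results.

def grade_tests(test_coverage):
    """Greedy ordering: repeatedly pick the test adding the most new faults."""
    covered, remaining, ranking = set(), dict(test_coverage), []
    while remaining:
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        gain = len(remaining[best] - covered)
        if gain == 0:                  # leftover tests add nothing new
            ranking.extend((t, 0) for t in remaining)
            break
        ranking.append((best, gain))
        covered |= remaining.pop(best)
    return ranking

suite = {
    "boot_selftest":   {"f1", "f2", "f3", "f7"},
    "dma_stress":      {"f3", "f4", "f5"},
    "legacy_checkout": {"f1", "f2"},   # adds nothing once boot_selftest runs
}
print(grade_tests(suite))
# [('boot_selftest', 4), ('dma_stress', 2), ('legacy_checkout', 0)]
```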
Hurtarte: One of the exciting trends is embedding smart agents into chips, especially larger and more complex ones. These agents monitor the health and performance of the chip and feed that information back to the ATE. This has been a recent innovation, maybe over the past three years, and there’s a lot of potential to build on. For example, metrology is primarily used in the front end to analyze structures like finFETs and their geometries. But what if we embedded additional metrology capabilities into these agents? That would require collaboration with third-party innovators to enhance the functionality of these embedded agents. It’s a promising area for development. But let’s not forget the importance of intelligent data sharing — what some call ‘shift-left’ or ‘shift-right’ insights, or accelerated insights across the lifecycle. If we can enable seamless data sharing between system-level test (SLT) tools, upstream design, and downstream processes like wafer probe and final test, it would be transformative.
Ideally, real-time data sharing could dynamically determine when to increase test coverage upstream or reduce testing downstream, optimizing both cost and quality in real time. That kind of capability would be fantastic for improving efficiency and maintaining reliability. Of course, there are challenges. Managing data flow and protecting intellectual property are significant hurdles. For integrated device manufacturers (IDMs), it’s easier to control this within a single organization. However, in a fabless model, collaboration between design houses and foundries becomes essential, adding another layer of complexity. The future lies in overcoming these barriers to create a more integrated and adaptive testing ecosystem.
Ganta: Going deeper into the transistor level is becoming more critical. Even though these are digital designs, they’re built on transistors, so we’re seeing more adoption of cell-aware testing. This isn’t entirely new, but it’s gaining traction with the tighter DPPM (defective parts per million) requirements of advanced nodes. Cell-aware testing works at the transistor level, using SPICE-based simulations to model and target defects that traditional ATPG misses. Traditional ATPG only targets the inputs and outputs of standard cells, which are purely digital in nature. But defects can occur at the transistor level within the cell, and cell-aware testing addresses those. The challenge is that achieving very low DPPM levels can dramatically increase test time and cost because it involves thousands of patterns. To address this, we’re leveraging AI and advanced data analytics to optimize these vectors while still achieving the required quality. Another important area is using analytics across the entire lifecycle — from design through manufacturing and in-system use. By feeding data forward and backward, we can continuously optimize testing.
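The cell-aware idea can be illustrated with a small example: characterization (in practice, SPICE simulation of injected transistor-level defects) yields a per-cell table mapping each internal defect to the cell input patterns that expose it at the output, and ATPG then targets those patterns. The defect names and detecting patterns below are invented for a hypothetical NAND2 cell.

```python
# Illustrative cell-aware table for a hypothetical NAND2 cell. In practice this
# table comes from SPICE simulation of injected defects; the entries here are
# invented for demonstration.

CELL_AWARE_MODEL = {
    "M1_drain_open": {(0, 0), (0, 1)},
    "M3_gate_short": {(1, 1)},
    "bridge_A_Z":    {(0, 0)},
}

def cell_aware_gap(applied_patterns, model=CELL_AWARE_MODEL):
    """Report internal defects a port-level pattern set misses, plus the
    cell input patterns that would detect them."""
    missed = {defect: pats for defect, pats in model.items()
              if not (pats & applied_patterns)}
    top_up = set().union(*missed.values()) if missed else set()
    return missed, top_up

# A minimal port-level stuck-at test set for a NAND2 is {01, 10, 11}; it never
# applies (0, 0), so any internal defect detectable only by that combination
# is missed until cell-aware patterns add it.
stuck_at_patterns = {(0, 1), (1, 0), (1, 1)}
missed, top_up = cell_aware_gap(stuck_at_patterns)
print(missed)   # {'bridge_A_Z': {(0, 0)}}
print(top_up)   # {(0, 0)}
```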
SE: Just to clarify, when you mention on-chip testing, are you referring to built-in self-test (BiST), or is this something different?
Ganta: It’s related but distinct. BiST focuses on embedding self-test capabilities into the chip. Cell-aware testing, on the other hand, targets transistor-level defects using models built from SPICE simulations. Traditional ATPG tests the digital behavior at cell boundaries, but cell-aware testing dives into how transistors within the cell behave. So while they share some similarities, they’re addressing different layers of the testing challenge.
Related Reading
Standardizing Defect Coverage In Analog/Mixed Signal Test
IEEE P2427 is poised to be the cornerstone in the testing and validation of AMS designs; full industry support is still developing.