Understanding connectivity issues and interactions is only part of the problem; ECOs can cause unexpected problems in other areas.
Key Takeaways
Multi-die assemblies are forcing engineering teams to map out how the various components will function and interact much earlier in the design process, and to develop a detailed plan for how they will be verified and tested.
While the “shift left” and “extend right” concepts have been in place for at least the past few years, the level of detail required at the leading edge of design and the amount of data that needs to be considered are exploding. Designs now must account for interconnects between chiplets developed at different process nodes, various types of memory, thermal mapping based on workload-specific gradients, monitors to track aging effects, and much more detailed characterization of hard and soft IP.
These designs can involve multiple chiplets and some type of interposer or advanced substrate, and there are even some full 3D-ICs in development. Development costs for the most advanced chips may run $100 million or more, so the stakes are high for ensuring these devices function properly and reliably.
“Designs are getting huge, and if you’re doing billion-gate designs, validating connectivity at RTL becomes important instead of waiting for validating connectivity at the net list, because you cannot load those designs,” said Kiran Vittal, executive director for product management at Synopsys. “Some techniques that are being used today, including ad hoc methods, may not even work on these kinds of designs. The size of the SoC is growing in terms of gate count, in terms of reusing IPs, and there are more and more power and clock domains.”

Fig. 1: IP for multi-die test, including embedded memory test, support for IEEE 1838 DFT, and high-speed access and test for intra-die logic test. Source: Synopsys
Any missteps throughout the flow can snowball, making it imperative to identify and solve problems as early as possible, particularly if there are changes in any part of the design. Any changes on the design side need to be reflected in updates to the design-for-test (DFT) plan, which has become increasingly complicated.
New DFT challenges
“A small change for a functional ECO can impact the DFT,” Vittal said. “This means the changes must be validated at RTL and then taken through the design implementation. Along with that, every design now uses complex DFT structures like test compression, which previously was added at the SoC level. But now the design team is also adding test compression at a block level or a subsystem level. There may be multiple layers of test compression, with connections from the block through the subsystem to the SoC, with numerous new connections within the DFT part.”
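One way to picture why a small functional ECO has to be re-validated is to compare the DFT connectivity before and after the change. The following is a minimal sketch under stated assumptions, not any vendor's flow: connections are modeled as simple (driver, load) pairs, and every signal and block name is invented for illustration.

```python
# Hypothetical sketch: connections are (driver, load) pairs; all names are invented.

def diff_connectivity(before, after):
    """Return the connections removed and added between two snapshots."""
    before, after = set(before), set(after)
    return {"removed": sorted(before - after), "added": sorted(after - before)}


def dft_impact(diff, dft_signals):
    """Keep only the changed connections that touch a DFT signal."""
    def touches(conn):
        driver, load = conn
        return driver in dft_signals or load in dft_signals

    return {kind: [c for c in conns if touches(c)] for kind, conns in diff.items()}


pre_eco = [
    ("scan_enable", "blk_a.se"),
    ("scan_enable", "blk_b.se"),
    ("clk", "blk_a.ck"),
]
# The ECO reworks blk_b and accidentally drops its scan_enable connection.
post_eco = [
    ("scan_enable", "blk_a.se"),
    ("clk", "blk_a.ck"),
    ("clk", "blk_b.ck"),
]

impact = dft_impact(diff_connectivity(pre_eco, post_eco), {"scan_enable"})
print(impact)
```

In this toy case the functional change to blk_b shows up as a removed scan_enable connection, which is the kind of side effect that would need to be caught at RTL before it propagates through implementation.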
That’s just for starters. “At RTL there are functional DFT issues, such as combinational loops, as well as Shift/Capture mode issues related to the controllability of clocks, resets, and clock gates,” explained Kanwarpal Singh, product engineering group director at Cadence. “There can be additional issues like clock/data intersection, memories not bypassed in test mode, and latches not being transparent.”
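Of the issues Singh lists, combinational loops are the easiest to illustrate in a few lines. The sketch below assumes the design has already been reduced to a hypothetical map from each net to the nets it drives through combinational logic only (paths through flops are omitted, because a flop breaks the loop). It uses a plain depth-first search, which is fine for a toy example but would not scale to a billion-gate design.

```python
def find_combinational_loop(comb_edges):
    """Return one cycle as a list of nets, or None if the graph has no cycle.

    comb_edges is a hypothetical model: net -> nets driven through
    combinational logic only.
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {}
    path = []

    def visit(net):
        color[net] = GRAY
        path.append(net)
        for nxt in comb_edges.get(net, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:
                # nxt is already on the current path, so the path closes a loop.
                return path[path.index(nxt):] + [nxt]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[net] = BLACK
        return None

    for net in list(comb_edges):
        if color.get(net, WHITE) == WHITE:
            cycle = visit(net)
            if cycle:
                return cycle
    return None


looped = {"a": ["b"], "b": ["c"], "c": ["a", "d"], "d": []}
clean = {"a": ["b"], "b": ["c"], "c": []}

loop = find_combinational_loop(looped)
no_loop = find_combinational_loop(clean)
print(loop, no_loop)
```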
The level of detail required for DFT needs to reflect the massive increases in complexity of these multi-die assemblies. This is made all the more challenging by the fact that not all the leads or signal paths may be accessible to testers, and chiplets and memories may be different heights and shapes. On top of that, many advanced packages are integral parts of the design, and may be highly customized.
“DFT is all about controllability and observability, and to ensure that the design is testable, designers need to make sure test clocks are properly connected, resets are reaching the desired flops, clock gates are enabled/controllable, and test signals are correctly connected to the memories,” Singh said. “Any issues in the connectivity of the DFT logic can lead to design flops becoming non-scannable and loss of test coverage. The RTL stage is the best place to find these issues and fix them. This will save costly iterations later.”
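The connectivity side of this can be sketched as a reachability question: starting from a test clock (or reset, or scan enable), which required endpoints are never reached? The example below is a simplified illustration, assuming a hypothetical fanout map with invented instance names. It checks structure only and does not model whether clock gates are actually enabled in test mode.

```python
from collections import deque


def unreached_endpoints(fanout, source, endpoints):
    """Breadth-first walk from `source`; return the endpoints it never reaches."""
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in fanout.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return sorted(set(endpoints) - seen)


# Hypothetical fanout map: node -> nodes it drives.
fanout = {
    "test_clk": ["clk_mux"],
    "clk_mux": ["cg0", "cg1"],
    "cg0": ["ff0.ck", "ff1.ck"],
    "cg1": [],  # clock gate output left unconnected by mistake
}
flop_clock_pins = ["ff0.ck", "ff1.ck", "ff2.ck"]

missing = unreached_endpoints(fanout, "test_clk", flop_clock_pins)
print(missing)
```

Any pin reported here corresponds to a flop that the test clock cannot control, which in Singh's terms would become non-scannable and reduce test coverage.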
Others agree. William Wang, CEO of ChipAgents, noted that most DFT failures today are integration failures, and that missing or broken propagation of scan, test, and reset signals across hierarchy, power domains, and reused IP are structural connectivity problems, not synthesis problems. “RTL is the last place where fixes are cheap. After DFT insertion and physical implementation, connectivity bugs cause ECO cascades and schedule slips. Catching them at RTL is the highest ROI point.”
That’s easier said than done, however. “Test controls, such as scan_enable, test_mode, test clocks, and resets, are not reaching all intended endpoints,” Wang said. “Power-aware DFT bugs where scan paths cross powered-down domains or isolation and retention are not test-safe. There are clock and reset inconsistencies between functional and test modes, and there are wrapper and top-level integration mistakes caused by parameterization and the generation of logic.”
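The power-aware case Wang describes can be illustrated with a toy model of a scan chain. In the sketch below, the chain order, the domain names, the set of domains that are off in test mode, and the list of isolated crossings are all hypothetical inputs. A real flow would derive them from the design's power intent rather than hand-written dictionaries.

```python
def cells_in_off_domains(chain, domain_of, off_in_test):
    """List scan cells that sit in a domain that is powered down in test mode."""
    return [cell for cell in chain if domain_of[cell] in off_in_test]


def unisolated_crossings(chain, domain_of, isolated):
    """List adjacent scan cells in different domains with no isolation declared."""
    hops = zip(chain, chain[1:])
    return [
        (a, b)
        for a, b in hops
        if domain_of[a] != domain_of[b]
        and (domain_of[a], domain_of[b]) not in isolated
    ]


# Hypothetical scan chain in shift order, with a power domain per cell.
chain = ["ff0", "ff1", "ff2", "ff3"]
domain_of = {"ff0": "PD_TOP", "ff1": "PD_GPU", "ff2": "PD_GPU", "ff3": "PD_TOP"}
off_in_test = {"PD_GPU"}             # assumed to be powered down in this test mode
isolated = {("PD_GPU", "PD_TOP")}    # isolation declared only in this direction

broken_cells = cells_in_off_domains(chain, domain_of, off_in_test)
bad_hops = unisolated_crossings(chain, domain_of, isolated)
print(broken_cells, bad_hops)
```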
What’s the plan?
Engineers may be contending with hundreds of different interface IPs in multi-die designs. “It’s very common nowadays to see a million connections,” Vittal said. “Recently, a customer told us that they have 6 billion connections/nodes that they need to verify at their SoC level. These are the kind of challenges that exist and need to be addressed. The shift left methodology to catch issues upfront at RTL is the best way to address these challenges.”
That means chip architects must determine where the testing will happen, what will be tested, and how the results will be interpreted.
“The diagnosis part is not too difficult,” noted John Ferguson, senior director of product management at Siemens EDA. “We have some of the standards already in place. And for the most part, if you’re LVS-clean and you’ve designed everything carefully, then from an input pin to an output pin, you can do the diagnosis anywhere within that three-dimensional system. There are some challenges around how to do the physical test. You can put a standalone die or chiplet on a test bench, and if it’s a known-good die, you can put this into the system, and everything’s going to work great. The problem is that when you put it into the system, it’s getting hot, it’s working, it’s being stressed, and it’s not going to behave the same. Now the question is whether it’s within my specs. That’s a whole new question. We need to figure that out.”
How this gets resolved can vary. “Some companies will do known-good die, known-good stacks, and known-good packages, and they’ll put those together and get it a little bit better,” Ferguson said. “But there’s still a problem. Even if you want to take the whole 3D-IC assembly and say, ‘The whole thing is known-good, and I can use this in whatever I’m putting it into,’ you still have a problem. It’s gone through this manufacturing process where you’ve heated it, and it has warpages. Now I put it on the test bench and probe it, but is it connecting to the right things? Probably not. So now you’ve got a whole new issue. There probably are some ways to do it right. We can do modeling of the warpages, and we can tell you where you may need to have a longer one in this location, and a shorter one in this location. But this still needs to be solved. It’s an outstanding issue.”

Fig. 2: DFT challenges in multi-die assemblies. Source: Siemens EDA
More dies equals more potential problems. “One is the performance testing,” said Chris Mueth, senior director of new markets and strategic initiatives at Keysight EDA. “It’s hard to get test points on a chiplet, but following some industry standards, test equipment companies will innovate around that. If you’re looking at how to get to a one-test solution, where you have a magic wand system-level test that will exercise the chip in such a way that you know you have a good assembly, that’s easier said than done. Built-in self-tests are important for these kinds of assemblies because you can’t probe everything. If you’re looking at the structural integrity of the die, which is the main bottleneck, how do you test that? You can test it with a TDR (time domain reflectometer) system, which allows you to peer into the package itself by probing on pins on the package. You can probe and essentially look inside like an X-ray machine, and you can deduce defects inside that package with a TDR system. So that’s one way of doing it. And, of course, the ultimate shift left is to simulate this stuff up front rigorously so that you have fewer concerns about the package integrity on the back end.”
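The TDR approach Mueth mentions rests on two standard transmission-line relations: the reflection coefficient at an impedance discontinuity, and the distance implied by the round-trip delay of the reflected edge. The sketch below works through both. The 50-ohm reference impedance, the effective dielectric constant of 4.0, and the 100 ps delay are assumptions chosen for illustration, not values from the article.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def reflection_coefficient(z_load, z0=50.0):
    """Fraction of the incident voltage reflected at an impedance step."""
    return (z_load - z0) / (z_load + z0)


def distance_to_discontinuity(round_trip_s, eps_eff):
    """One-way distance to a discontinuity from the round-trip delay.

    Propagation velocity is c / sqrt(eps_eff); the edge travels out and back,
    so the one-way distance is half of velocity * time.
    """
    velocity = C / math.sqrt(eps_eff)
    return velocity * round_trip_s / 2.0


gamma_mismatch = reflection_coefficient(75.0)   # 75-ohm section on a 50-ohm line
gamma_short = reflection_coefficient(0.0)       # dead short
d_m = distance_to_discontinuity(100e-12, 4.0)   # 100 ps round trip, eps_eff = 4.0
print(gamma_mismatch, gamma_short, round(d_m * 1000, 2), "mm")
```

With these assumed numbers, a reflection arriving 100 ps after the incident edge places the discontinuity roughly 7.5 mm along the trace, and the sign and size of the reflection hint at whether it looks more like a short or a higher-impedance defect.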
Given these persistent challenges and the complexity of multi-die testing, design teams must rethink their approach to verification and design-for-test strategies. This is where advanced verification methodologies come into play, bridging the gap between physical and logical testing demands.
In the verification realm, static and dynamic verification both play a role in addressing connectivity challenges. “Static verification can help catch issues early at the RTL stage, saving costly iterations later,” Cadence’s Singh said. “Dynamic simulation can also help check the correctness of the DFT circuitry in the RTL.”
Still, while static checks catch the majority of connectivity bugs, including reachability, completeness, illegal crossings, and constraint mismatches, “dynamic simulation is only effective after static correctness is guaranteed,” said ChipAgents’ Wang. This is why ChipAgents focuses on agent-driven static connectivity reasoning, then recommends minimal dynamic tests.
Building a plan
Putting the pieces together in a way that will work for a single workload is a challenge. But building a unique design for an AI data center, for example, is much harder.
Still, there are some common elements. Synopsys’ Vittal pointed to five initial steps that design teams need to take to address these challenges, including:
Other steps then need to follow. “Chip architects/designers need to be aware that ensuring proper connectivity of the test clocks/resets and other DFT logic helps make the design scan ready and helps with test coverage goals,” Cadence’s Singh said. “This means they need to plan for DFT at the very beginning and perform connectivity checks at the RTL stage to save on costly iterations later in the design cycle.”
And they need to approach it as part of the design, before it is sent to manufacturing. ChipAgents’ Wang said design teams should treat DFT connectivity as an interface contract, not a post-RTL script problem, as well as run connectivity checks continuously in CI, not once before tapeout. “Also, use power-aware and hierarchy-aware static analysis at RTL, and align DFT checks with functional connectivity, since many bugs affect both.”
Conclusion
Early and robust verification methodologies are essential for managing the complexity of modern chiplet and multi-die designs, with static verification at the RTL stage catching most connectivity and DFT issues before costly downstream errors occur. Scalable, design-agnostic tools and continuous connectivity checks — supported by features like schematic visualization and reusable macros — enable efficient debugging and adaptation across frequent design changes.
The goal is to understand where the problems are, and to address them as quickly as possible. “Using static tools can help catch DFT connectivity issues early at the RTL stage,” Singh added. “These checks can be done at all levels — IP, sub-system, and SoC. And having a full connectivity map of the DFT signals can help with scan chain insertion and test sign-off later.”
Related Reading
When To Move To Multi-Die Assemblies
Multiple factors are involved in deciding when and whether to disaggregate a planar SoC.