Multiple factors are involved in deciding when and whether to disaggregate a planar SoC.
As chip designs become larger and more complex, especially for AI and high-performance computing workloads, it’s often not feasible to fit everything onto a single planar die. But determining when to move to a multi-die assembly isn’t always straightforward.
Multi-die approaches have some well-documented benefits. They allow designers to split functions across different dies, which can improve yield. And they can reduce costs by using older, cheaper process nodes for some parts, which can enhance reliability. This has become somewhat easier with the adoption of standards and improvements in tools. Additionally, as performance demands increase and first-time silicon success at advanced nodes decreases, companies are motivated to adopt multi-die solutions to remain competitive and manage power, cost, and reliability more effectively.
Two main forces from opposite ends of the design-through-manufacturing flow are driving multi-die assemblies. “The first is disaggregation,” said Pratyush Kamal, director for central engineering solutions at Siemens EDA. “The dies are becoming so big, especially in very advanced nodes, that they are hitting the size of the reticle, which is an issue since we are limited by the reticle size in the wafer manufacturing process. We’re working to understand how Cerebras managed to connect the design across the reticle boundary with a new technique. Now we are using wafer-level packaging to do the same thing, using the backside RDL and all of that. Still, for a monolithic design, the boundary is the reticle. The second driver for advanced package adoption is system aggregation. Here, we are talking about embedded voltage regulators. We are bringing more components within the package. And fundamentally, if we are shrinking the footprint of our electronic design, we are saving on power, saving on performance. These two drivers are driving the adoption.”
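The reticle limit Kamal describes can be made concrete with a quick check. The sketch below assumes the commonly cited full-field reticle size of 26 mm × 33 mm (about 858 mm²). It tests whether a die fits in one exposure field and gives a lower bound on how many reticle-limited dies a design of a given area would need. The `utilization` parameter is a hypothetical allowance for area lost to die-to-die PHYs, seal rings, and similar overhead. Substitute your foundry's actual numbers.

```python
import math

# Commonly cited full-field reticle size, in mm. This is an assumption;
# use the exposure-field limit your foundry actually specifies.
RETICLE_W_MM = 26.0
RETICLE_H_MM = 33.0
RETICLE_AREA_MM2 = RETICLE_W_MM * RETICLE_H_MM  # 858 mm^2


def fits_in_reticle(width_mm: float, height_mm: float) -> bool:
    """True if a rectangular die fits in one exposure field, in either orientation."""
    return (width_mm <= RETICLE_W_MM and height_mm <= RETICLE_H_MM) or (
        height_mm <= RETICLE_W_MM and width_mm <= RETICLE_H_MM
    )


def min_dies_needed(total_logic_area_mm2: float, utilization: float = 1.0) -> int:
    """Lower bound on the number of reticle-limited dies for a design.

    `utilization` (hypothetical) is the fraction of each die left for logic
    after die-to-die interfaces and other per-die overhead.
    """
    usable_mm2 = RETICLE_AREA_MM2 * utilization
    return math.ceil(total_logic_area_mm2 / usable_mm2)


if __name__ == "__main__":
    print(fits_in_reticle(25, 32))        # fits
    print(fits_in_reticle(30, 30))        # too wide in both orientations
    print(min_dies_needed(1500))          # hypothetical 1,500 mm^2 design
    print(min_dies_needed(1500, 0.8))     # same design with 20% per-die overhead
```

Under these assumptions, a hypothetical 1,500 mm² design needs at least two dies. It needs three once 20% of each die is set aside for interface overhead.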
But which multi-die approach — and there are many — is best is a difficult question with no single answer. “You can move to a standard package with two dies side by side, connected via UCIe, or some other die-to-die connection,” explained Mayank Bhatnagar, director of product marketing at Cadence. “You can have an advanced package with a silicon interposer in between, which is, of course, much costlier. And you can even have a two-die stack, or vertically in a 3D stack. For each of them, the threshold is a bit different. If you’re talking about performance, if the design can be fully created within a monolithic die, at the latest node that is possible, then definitely that performance will be the fastest. But that has a lot of assumptions, which are not exactly true most of the time. That is why we see the highest-performance computing dies are the ones that are embracing this advanced packaging first. Performance is good if you can fit something into a monolithic die, but that’s a big ‘if,’ given the AI HPC workloads we see. And even when you can fit it into a die, if the die is too big, then the yield is low. So the criteria for going multi-die sometimes may not be so much about performance in a given design as being able to create a big enough design economically.”
In many cases, the decision to disaggregate a design is motivated by the need to pack more compute into an area larger than a single die, but there is still more to consider. “You may want to re-use some of the IPs, which are part of a chiplet, because that part of the design is not changing,” said Amlendu Shekhar Choubey, senior director of product management at Synopsys. “Additionally, you may want to have heterogeneous integration, where you use different process nodes that are more suitable for those kinds of functions. All these will come into play when you’re making the decision. We have flows that let system architects make these decisions and go through these tradeoffs even before they have RTL or a netlist. Once they have conceptualized or designed what their system needs to do, they can start playing around in our tool flow with the information they have on the possible technologies they want to use. Based on that partition, does a multi-die architecture make sense for them? Then, if they want to do that, how do they want to architect it? How do they want to partition? What are the tradeoffs? Do they want to go vertical? Do they want to go horizontal, or both vertical and horizontal, and for different die, depending on what technologies they want to use? All that tradeoff between the power, performance, and area can be evaluated before they start putting the RTL or the netlist together. That gives them a powerful option to make an optimal decision before they start doing any kind of front-end design.”
At that point, a number of top-level decisions need to be made. “Before we start a project, we look at power, performance, and area, which form the architecture-level specification,” said Esha Dubey, hardware engineering manager at Synopsys. “That’s where these decisions are made. Then you have to see the cost and power, and what kind of thermal estimation is needed. Those are decisions that a chip architect makes. If it’s a 2D or 2.5D design, or if we are going to the 3D stacking with multiple dies, then there’s a floorplan in place. Then you look at the different die connectivity checks, and since we also offer different IPs, you have to decide what kind of IP connectivity you want to establish.”

Fig. 1: Multi-die architecture flow. Source: Synopsys
The decision to move from a monolithic die to a multi-die approach depends on factors that stretch from architecture to manufacturing costs. This shift is both technical and strategic, and it impacts both design and manufacturing processes. To further understand this transition, it’s important to examine how these factors interact and influence the overall direction of chip development.
“We have to look at how we ended up having a chiplet ecosystem,” said Stephen Slater, integrating manager for EDA products at Keysight EDA. “The size of an individual die was becoming so large and taking up so much of the wafer that you ended up with manufacturing issues, such as little zones that are out of spec. That means you’ve got to throw the entire die away, and the yield is incredibly poor. By breaking it down into smaller functions, you can get more die out of a wafer, so you’re going to get a higher yield. The FPGA vendors, CPU vendors, and the Nvidias of the world led the way in bridging compute functions across multiple chips. It allowed them to reach performance goals that traditional scaling on a single wafer wouldn’t allow. The more we push a lot of the complexity into the packaging, the more we start to care about mitigating any potential performance limiters. Now you need to get the high-speed signals from one chip to another chip, so there’s a slight addition of latency and the potential for crosstalk as you’re making the connection. Most companies operating in the chiplet ecosystem will adhere to a digital standard for this high-speed interconnect, like UCIe. In UCIe, they have both standard and advanced packaging, where standard packaging is more like a traditional, organic package, but we’re sending the signals through the package, from one chip to the other. Then you get to advanced packaging, and that’s with a silicon interconnect. So that is its own IC, and needs to be assembled with the chiplets that it is forming the connection between. This latter advanced packaging is a high-density interconnect, so you can get to the fastest possible speeds, and the density of connections is the highest possible.”
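Slater's yield argument is usually illustrated with the textbook Poisson yield model, Y = exp(−A·D0), where A is die area and D0 is killer-defect density. The sketch below pairs that model with a common gross-dies-per-wafer approximation that subtracts partial dies at the wafer edge. The defect density of 0.1 defects/cm² is a placeholder, not a foundry figure. The comparison also ignores known-good-die testing and assembly yield, which real chiplet flows must add back.

```python
import math


def poisson_yield(die_area_mm2: float, defect_density_per_cm2: float) -> float:
    """Fraction of dies with zero killer defects under a simple Poisson model."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * defect_density_per_cm2)


def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Common approximation: wafer area / die area, minus partial dies at the edge."""
    d = wafer_diameter_mm
    a = die_area_mm2
    return int(math.pi * d * d / (4 * a) - math.pi * d / math.sqrt(2 * a))


def good_dies_per_wafer(die_area_mm2: float, defect_density_per_cm2: float,
                        wafer_diameter_mm: float = 300.0) -> float:
    """Expected number of defect-free dies per wafer."""
    return gross_dies_per_wafer(die_area_mm2, wafer_diameter_mm) * poisson_yield(
        die_area_mm2, defect_density_per_cm2)


if __name__ == "__main__":
    D0 = 0.1  # placeholder defect density, defects per cm^2
    big = good_dies_per_wafer(800, D0)          # one large monolithic die
    small = good_dies_per_wafer(200, D0) / 4    # four 200 mm^2 chiplets per system
    print(f"800 mm^2 monolithic: {big:.1f} good dies/wafer")
    print(f"4 x 200 mm^2 chiplets: {small:.1f} good sets/wafer")
```

With these placeholder values, the 800 mm² die yields about 45% and the 200 mm² chiplets about 82%. A wafer of chiplets therefore supplies roughly twice as many complete sets of silicon.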
As a result, the engineering team needs to spend more time on signal integrity and power integrity analysis. “This is an area we care about a lot, and we find our customers putting many more cycles into designing and optimizing these links,” Slater said. “We need to be careful about the layout, which is especially tricky since interposers don’t have a solid ground plane for return currents (it’s typically a hatched/meshed ground plane). And an increasingly important issue in chiplet systems is how to get the power to where it’s needed through a flat-impedance, low-resistance path. We have also seen that the applications using advanced packaging tend to draw a lot of current. It’s lower voltage, but a lot of current, and that creates big challenges for the design of the vertical supply of power.”
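The "lower voltage, but a lot of current" problem follows from two standard power-integrity relations. The first is the target impedance a power delivery network must stay below, Z_target = V_dd × ripple / ΔI. The second is resistive loss, P = I²R. The sketch below applies both to hypothetical supply values chosen for illustration. It shows how, at fixed power, a lower supply voltage raises the current, which tightens the impedance budget and increases I²R loss quadratically.

```python
def current_for_power(power_w: float, vdd: float) -> float:
    """Supply current needed to deliver a given power at a given rail voltage."""
    return power_w / vdd


def target_impedance_ohms(vdd: float, ripple_fraction: float, delta_i_amps: float) -> float:
    """Classic PDN target impedance: allowed voltage ripple divided by current step."""
    return vdd * ripple_fraction / delta_i_amps


def resistive_loss_watts(current_amps: float, path_resistance_ohms: float) -> float:
    """I^2 * R loss in the vertical power-delivery path."""
    return current_amps ** 2 * path_resistance_ohms


if __name__ == "__main__":
    POWER_W = 750.0        # hypothetical accelerator power
    RIPPLE = 0.05          # hypothetical 5% allowed ripple
    R_PATH = 1e-4          # hypothetical 100 micro-ohm delivery path
    for vdd in (1.2, 0.75):
        i = current_for_power(POWER_W, vdd)
        z = target_impedance_ohms(vdd, RIPPLE, i)   # worst case: full-load step
        loss = resistive_loss_watts(i, R_PATH)
        print(f"Vdd={vdd} V: I={i:.0f} A, Z_target={z * 1e6:.1f} uOhm, I^2R={loss:.1f} W")
```

Treating the full load current as the transient step is a deliberately pessimistic simplification. Real PDN analysis uses the expected current profile across frequency.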
Multi-die assemblies require more work, take more time, and depending on the approach and the target workload, they can cost significantly more — at least for the initial implementations. “It all comes down to an ‘absolutely must have, let’s move,’ situation,” said Bhatnagar. “When you have multi-die, it means you have multiple tapeouts, so the cost of all those masks and the packaging is higher.”
Part of the cost depends on the sophistication of the engineering team in working with these architectures. “If I look at the various tradeoffs like performance, power, area, cost, when it comes to performance, if it fits within a single die, that performance will be the fastest,” Bhatnagar said. “Whenever die-to-die happens, some bottlenecks get created, and data is moved, which is why partitioning is so important. You want to partition the design in such a way that you are moving a minimum amount of data from one die to another.”
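Bhatnagar's partitioning goal can be stated as a small optimization problem: split the blocks across two dies so that each die stays under an area limit while the traffic crossing the cut is as small as possible. The sketch below brute-forces that problem over a hypothetical five-block SoC. The block names, areas, and bandwidths are invented for illustration. Exhaustive search only works for a handful of blocks. Practical partitioners rely on heuristics such as Kernighan–Lin or Fiduccia–Mattheyses.

```python
from itertools import combinations

# Hypothetical blocks: area in mm^2.
AREA = {"cpu": 120, "gpu": 300, "npu": 200, "io": 80, "mem_ctrl": 60}

# Hypothetical pairwise traffic between blocks, in GB/s.
TRAFFIC = {
    ("cpu", "gpu"): 200,
    ("cpu", "io"): 50,
    ("cpu", "mem_ctrl"): 300,
    ("gpu", "npu"): 400,
    ("gpu", "mem_ctrl"): 900,
    ("npu", "mem_ctrl"): 600,
    ("io", "mem_ctrl"): 40,
}


def cut_traffic(die_a: set) -> int:
    """Total bandwidth on edges whose endpoints land on different dies."""
    return sum(bw for (u, v), bw in TRAFFIC.items() if (u in die_a) != (v in die_a))


def best_bipartition(max_die_area: float):
    """Exhaustively search two-die splits that respect the per-die area limit."""
    blocks = list(AREA)
    total = sum(AREA.values())
    best = None
    for k in range(1, len(blocks)):
        for combo in combinations(blocks, k):
            die_a = set(combo)
            area_a = sum(AREA[b] for b in die_a)
            if area_a > max_die_area or total - area_a > max_die_area:
                continue  # one of the two dies would exceed the area limit
            cost = cut_traffic(die_a)
            if best is None or cost < best[0]:
                best = (cost, frozenset(die_a))
    return best


if __name__ == "__main__":
    cost, die_a = best_bipartition(max_die_area=450)
    print(sorted(die_a), cost)
```

With these numbers, the best split keeps the GPU and memory controller together on one die. Their 900 GB/s link never crosses the boundary, leaving 1,540 GB/s of die-to-die traffic.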
Power consumption is also a significant concern when multiple dies are involved. “When you move data from one die to another, you have to sacrifice some power, but then you may be able to save power if you can keep some of your design in an older node and not go to a leakier one,” Bhatnagar said. “When you split a die, the regular standard cell connection becomes a die-to-die connection going to the package. So, per bit transferred, that is more expensive than if it were on the monolithic die. That is something that we see every customer wanting to aggressively reduce, and as a supplier of UCIe and die-to-die interfaces, that is probably our number one goal — to reduce the power consumption — because that is something the customer sees the moment they want to split the die into two.”
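The per-bit cost Bhatnagar describes turns into watts once it is multiplied by the bandwidth that crosses the die boundary: P = (energy per bit) × (bits per second). The sketch below computes the extra power of moving a given bandwidth off-die instead of keeping it on-die. The energy-per-bit figures are placeholders chosen for illustration. They are not UCIe specification values or Cadence data, so substitute measured numbers for a real link.

```python
# Placeholder energy-per-bit values in picojoules per bit (hypothetical).
ENERGY_PJ_PER_BIT = {
    "on_die_wire": 0.1,
    "advanced_package_d2d": 0.5,
    "standard_package_d2d": 1.0,
}


def link_power_watts(bandwidth_gbytes_per_s: float, energy_pj_per_bit: float) -> float:
    """Power to move a given bandwidth at a given energy cost per bit."""
    bits_per_s = bandwidth_gbytes_per_s * 1e9 * 8
    return bits_per_s * energy_pj_per_bit * 1e-12


def split_penalty_watts(cut_bandwidth_gbytes_per_s: float, link_kind: str) -> float:
    """Extra power of sending cut traffic die-to-die instead of on-die."""
    delta = ENERGY_PJ_PER_BIT[link_kind] - ENERGY_PJ_PER_BIT["on_die_wire"]
    return link_power_watts(cut_bandwidth_gbytes_per_s, delta)


if __name__ == "__main__":
    for kind in ("advanced_package_d2d", "standard_package_d2d"):
        print(kind, f"{split_penalty_watts(1000, kind):.1f} W extra for 1 TB/s")
```

Under these placeholder values, every terabyte per second that crosses the cut costs a few watts. That is why reducing energy per bit and minimizing cut traffic during partitioning work toward the same goal.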
The economics of multi-die assemblies
Economics plays a significant role in deciding whether to move from a planar SoC to a multi-die assembly.
“When you move your entire SoC to a very new process, like 2nm, the cost of the wafer — the cost of every single monolithic die — is high,” Bhatnagar explained. “Also, if it is a large design, the yield is low. That, combined with the high cost of the wafer, means your per-die cost is very high. Say you had a GPU or a high-performance computing core that you wanted to move. With multi-die, you could just move that piece and leave everything else at an older technology node. So in that case the cost is lower, not only in terms of the wafer cost and the per-die cost, but also the design cost. Let’s say you have an RF interface that is not going to get much benefit [from moving to the new node]. You might as well leave it there, as it’s all tested and silicon validated. That reduces the design cost and improves the reliability of those things. You don’t have to redesign all your analog portions into the new process.”
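The tradeoff Bhatnagar outlines can be roughed out as cost per good die. The sketch below compares a monolithic die on a new node against a split design that keeps only the compute half on the new node, moves the rest to an older node, and adds a packaging step. Every wafer cost, defect density, packaging cost, and assembly yield is a hypothetical placeholder. The model also leaves out the extra masks and NRE for multiple tapeouts mentioned earlier, test costs, and die-to-die interface area. It shows the shape of the calculation, not a real quote.

```python
import math


def poisson_yield(die_area_mm2: float, d0_per_cm2: float) -> float:
    """Simple Poisson yield model."""
    return math.exp(-(die_area_mm2 / 100.0) * d0_per_cm2)


def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> int:
    """Wafer area / die area, minus partial dies at the edge."""
    d, a = wafer_diameter_mm, die_area_mm2
    return int(math.pi * d * d / (4 * a) - math.pi * d / math.sqrt(2 * a))


def cost_per_good_die(wafer_cost: float, die_area_mm2: float, d0_per_cm2: float) -> float:
    """Wafer cost spread over the expected number of defect-free dies."""
    good = gross_dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2, d0_per_cm2)
    return wafer_cost / good


# Hypothetical process parameters: (wafer cost in $, defect density per cm^2).
NEW_NODE = (20_000.0, 0.2)
OLD_NODE = (8_000.0, 0.1)
PACKAGE_COST = 100.0      # hypothetical per-unit packaging and assembly cost
ASSEMBLY_YIELD = 0.95     # hypothetical; a failed assembly scraps all its dies


def monolithic_cost(area_mm2: float) -> float:
    return cost_per_good_die(NEW_NODE[0], area_mm2, NEW_NODE[1])


def split_cost(compute_mm2: float, rest_mm2: float) -> float:
    dies = (cost_per_good_die(NEW_NODE[0], compute_mm2, NEW_NODE[1])
            + cost_per_good_die(OLD_NODE[0], rest_mm2, OLD_NODE[1]))
    return (dies + PACKAGE_COST) / ASSEMBLY_YIELD


if __name__ == "__main__":
    print(f"monolithic 600 mm^2 on new node: ${monolithic_cost(600):.0f}")
    print(f"300 new + 300 old + package:     ${split_cost(300, 300):.0f}")
```

With these placeholders, the split design costs roughly half as much per unit. A real analysis would amortize the added NRE over expected volume before drawing that conclusion.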
Over the past few years, the cost of packaging has come down, as well. New players have entered the market, making advanced packaging more readily available, and developers are taking advantage of that.
Things to consider
The chip architect typically is the one who decides when to disaggregate a design. They start by defining the product based on some requirements. For example, a hyperscaler may require 112, 224, or 448 gigabits per second throughput.
“From there, they look at a portfolio of IP from which they will make a selection,” said Shawn Nikoukary, senior director of SoC engineering at Synopsys. “Does this IP support multi-die or not? Then it breaks down into power. Power is the most important requirement, especially in the data center. So at the architectural level, they have to look at all the power savings they can achieve, and that usually pushes the design into more advanced packaging. To support those IPs at really high data rates, at lower power and smaller nodes, it automatically goes into advanced packaging.”
There are other architectural and tool considerations. “The real breaking point is integration complexity, not just scaling,” said William Wang, CEO of ChipAgents. “Advanced packaging becomes necessary when system-level integration (latency, bandwidth, power domains, resets, clocks) can no longer be reasoned about reliably with monolithic RTL and late physical signoff. Chiplet boundaries turn architecture assumptions into hard contracts. And once logic spans dies, interface correctness, latency assumptions, protocol rules, and power and reset behavior must be explicit and continuously checked. Silent violations here are a major source of late failures.”
The tools themselves need to be robust, as well. “Tools break because the architecture intent is not machine-checkable,” Wang said. “Specs, diagrams, RTL, and integration scripts drift over time. ChipAgents’ strength is turning architectural intent into continuously checked constraints at RTL and integration time, before package and silicon decisions are locked.”
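Wang's point that chiplet boundaries become "hard contracts" can be illustrated with a simple machine-checkable contract. In the sketch below, each die-to-die link declares what it provides, each consuming block declares what it assumes, and a checker reports mismatches. The data model and field names are hypothetical. They illustrate the general idea of checking interface assumptions continuously and do not represent ChipAgents' product or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkSpec:
    """What a die-to-die link actually provides (hypothetical fields)."""
    name: str
    bandwidth_gbps: float
    latency_ns: float
    power_domain: str


@dataclass(frozen=True)
class LinkAssumption:
    """What a block on the far die was designed to expect (hypothetical fields)."""
    link: str
    min_bandwidth_gbps: float
    max_latency_ns: float
    power_domain: str


def check_contracts(specs: list, assumptions: list) -> list:
    """Return human-readable violations; an empty list means all contracts hold."""
    by_name = {s.name: s for s in specs}
    violations = []
    for a in assumptions:
        s = by_name.get(a.link)
        if s is None:
            violations.append(f"{a.link}: link not defined")
            continue
        if s.bandwidth_gbps < a.min_bandwidth_gbps:
            violations.append(f"{a.link}: bandwidth {s.bandwidth_gbps} < {a.min_bandwidth_gbps}")
        if s.latency_ns > a.max_latency_ns:
            violations.append(f"{a.link}: latency {s.latency_ns} ns > {a.max_latency_ns} ns")
        if s.power_domain != a.power_domain:
            violations.append(f"{a.link}: power domain {s.power_domain} != {a.power_domain}")
    return violations


if __name__ == "__main__":
    specs = [LinkSpec("d2d_0", bandwidth_gbps=512, latency_ns=4.0, power_domain="VDD_IO")]
    assumptions = [LinkAssumption("d2d_0", min_bandwidth_gbps=256, max_latency_ns=2.0,
                                  power_domain="VDD_IO")]
    for v in check_contracts(specs, assumptions):
        print(v)
```

Rerunning a check like this whenever RTL or integration scripts change is one way to catch the "silent violations" Wang mentions before package decisions are locked.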
Compounding and intertwined challenges abound, and despite progress, EDA tools still trail the rapid changes in packaging. “Simulation needs to be done, and signal integrity is key,” Nikoukary said. “It used to be a PCBM package. Now it’s more like silicon within the chip. The types of simulation and new tools are evolving there. It’s not a single thing going into the multi-die decision. It’s IP, architecture, ecosystem, and tools. It’s a very complicated, multi-pronged problem. It’s nice to be the packaging guys these days, to be able to be in the center of all this, to bring all the chiplets together and resolve the simulation. That includes thermal, electrical, EM/IR, mechanical, multi-physics. They also must work with the ecosystem to ensure that for the design we come up with, the PDKs are available. And by the time the design comes out, this new technology is available, and the yield is good.”

Fig. 2: Multi-die design methodology. Source: Synopsys
Still, the tools are making progress. “Compared to the advanced packaging tools from the past, tools today work 10 times faster than doing layout by hand,” Nikoukary added. “It’s all AI-enabled or -automated, and things are evolving. It’s not just one challenge on top of another challenge. There are solutions coming up to speed, and people are understanding how to do things faster. You’re talking about 50 chiplets going into a single package, so you cannot design it using the previous processes and tools.”
So why do it? “The goal is to reduce costs, but also improve performance,” said Siemens EDA’s Kamal. “HBM is a classic example. It’s an aggregation example. It’s not a disaggregation example. We had memory lying on the board somewhere, we had DDR, then we moved to HBM within the package.”
Similar trends are at work in 6G communications. “Governments are focused on 3DHI (3D heterogeneous integration) because what is happening in 6G communications is you’re talking about 100GHz-plus spectrum,” Kamal said. “Communications 101 says the size of your antenna is tied to the wavelength dimension. The pitch of the antenna cannot be below ‘this.’ So when you look at the wavelength lambda of 6G carrier waves, the pitch of the antenna is going to the micron level, which means you’re thinking about antenna-in-package — the whole 6G stack. DARPA wants to build this. Other governments also want to build this. Naturally, the physics drives in that direction. The U.S. is working on this. We call it NGMM (next-generation microelectronics manufacturing). It’s a DARPA and Texas government-funded project between the two entities. $1.5 billion was funded in 2024 for the Texas Institute for Electronics to build this 3D Heterogeneous Integration Facility, and Siemens is a partner. Twenty-plus companies within the U.S. and universities are partnering to make this happen. So 6G at full system impact is an end goal for us.”
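The wavelength argument is easy to check numerically. Wavelength is λ = c / f. A common rule of thumb for phased-array antennas places elements about λ/2 apart, and that rule is an assumption here rather than anything stated in the interview. The sketch below shows element pitch shrinking from several millimeters at today's mmWave bands into the hundreds of microns at a few hundred gigahertz. At that scale the array fits inside a package rather than on a board.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def wavelength_um(freq_ghz: float) -> float:
    """Free-space wavelength in micrometers for a carrier frequency in GHz."""
    return SPEED_OF_LIGHT_M_PER_S / (freq_ghz * 1e9) * 1e6


def half_wave_pitch_um(freq_ghz: float) -> float:
    """Element pitch under the common lambda/2 phased-array rule of thumb."""
    return wavelength_um(freq_ghz) / 2


if __name__ == "__main__":
    for f in (28, 100, 300):  # example carriers in GHz
        print(f"{f:>4} GHz: lambda = {wavelength_um(f):7.1f} um, "
              f"lambda/2 pitch = {half_wave_pitch_um(f):7.1f} um")
```

This uses free-space wavelength. Inside a dielectric, the guided wavelength is shorter by roughly the square root of the effective permittivity, which shrinks the structures further.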
The future
So what would make it easier for engineering teams to migrate to multi-die assemblies? A common answer is fewer choices.
“One challenge for users who want to go for an open chiplet economy or a chiplet marketplace is the number of variants that are possible,” said Cadence’s Bhatnagar. “Currently, there are too many. Since I manage the UCIe and custom Ultralink die-to-die IPs, I see the number of variants possible. I was giving training to some new folks who had joined last year, and I was telling them that in a single process node, I can make 32 UCIe variants in 2D and 2.5D alone, not counting a 3D stack. That is the problem, because once the market gets split down into such narrow segments, it becomes difficult for any one user to make a chiplet that has wide appeal. After all, they may end up targeting market ‘A,’ but two years later, by the time that chiplet is done, they may see the market is going in direction ‘C.’ A bit of cohesion, coalescence behind something will help.”