Options for how to build systems increase, but so do integration issues.
The migration from monolithic SoCs to chiplet-based designs is creating a confusing array of options and tradeoffs for design teams working at the leading edge, and the number of choices is only going to increase as third-party chiplets begin pouring into the market.
That hasn’t dampened the appetite for chiplets, however, which are deemed essential for future generations of semiconductors for several reasons:
“We predict that this year about 50% of high-performance compute will be multi-die,” said Mick Posner, vice president of product management for high-performance computing IP solutions at Synopsys. “That’s probably conservative. And 100% of AI designs are multi-die. Because of their requirements of high bandwidth and scaling, they have to be designed like that. The other trend we’re seeing, which is going to change everything yet again, is 3D, and this means logic-to-logic stacking. What we see happening in 2025 is that more customers are moving to a prototype stage, where they are preparing to do a 3D design. They’re not in production yet, because they need to prove out things like hybrid bonding and through-silicon vias between dies. Face-to-face bonding has an impact on their designs, and it has a lot of impact on IP.”
Fig. 1: Multi-die chiplet design. Source: Synopsys
An increasing amount of that IP is being hardened into chiplets. Large systems companies, which account for roughly 45% of leading-edge designs today, have limited experience developing that IP. And processor giants like Intel and AMD, which previously developed all IP internally, are looking to cut costs and speed time to market by leveraging third-party chiplets. That has engendered an entirely new ecosystem, this one focused on custom and semi-custom chiplets, as well as various assembly and packaging options, novel PHYs, and new materials and strategies for dissipating heat.
Siemens EDA has observed a number of changes as monolithic design evolved into chiplets. “In monolithic design, it used to be that signal integrity was done by a separate group of people on the PCB side, and they perfected that art,” said Subramanian Lalgudi, product specialist at Siemens EDA. “There was a process for how they wanted to sign off on compliance. Today for chiplets, there are different protocols, such as USB, PCIe, MIPI, and SATA. The process is clear. If you are a chip designer designing a transceiver, or if you are a board person like HP or somebody else designing the board, or if you are a repeater company trying to take that signal, amplify it, and send it on, that process is clear, and the standards evolved as to the compliance required at the transmitter. But what is the compliance required at the repeater? What is the compliance required at the receiver, both for serial standards and for parallel standards? Serial is point-to-point. Parallel is basically the DDR applications, but the energy per bit was all pretty high on the PCB, so they could tolerate it. It’s a bigger surface area.”
When chips were monolithic, there were just proprietary considerations. “There was no standardization,” Lalgudi said. “The moment chiplets came up, they needed to do static timing analysis, which is a clock-to-clock task that makes sure all the bits arrive on time before the register can latch them. There is a setup time, and there is a hold time. With chiplets, the producers of a chiplet may be different from the people who integrate them together. Intel and AMD have already shown that. Intel has taken FPGA designs, and they can mix and match. They can go with the processor on one technology node, and they can go with chiplets on older technology nodes. This is beneficial because now they can focus on what they’re really good at.”
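The setup and hold checks Lalgudi mentions reduce to simple slack arithmetic per path. Below is a minimal sketch of that bookkeeping for a single register-to-register path; every number is hypothetical, and real STA tools evaluate millions of paths across process, voltage, and temperature corners.

```python
# Minimal setup/hold slack check for one register-to-register path.
# All values are hypothetical placeholders, not from any real design.

def setup_slack(clock_period_ns, data_arrival_ns, setup_time_ns):
    # Data must arrive setup_time before the capturing clock edge.
    return clock_period_ns - (data_arrival_ns + setup_time_ns)

def hold_slack(data_arrival_ns, hold_time_ns):
    # Data must stay stable hold_time after the launching clock edge.
    return data_arrival_ns - hold_time_ns

period = 1.0        # 1 GHz clock
arrival = 0.82      # combinational + interconnect delay on the path
setup, hold = 0.05, 0.03

print(f"setup slack: {setup_slack(period, arrival, setup):+.2f} ns")  # +0.13 ns, meets timing
print(f"hold slack:  {hold_slack(arrival, hold):+.2f} ns")            # +0.79 ns, meets timing
```

A negative slack on either check means the path fails, which is exactly the kind of cross-die contract that gets harder to close when the chiplet producer and the integrator are different companies.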
Partitioning is paramount
The first thing the design team needs to understand is how to partition the chiplet system. Letizia Giuliano, vice president of IP product marketing and management at Alphawave Semi, explained, “The first things to naturally disaggregate are I/Os. Those types of building blocks don’t scale with process nodes. It’s easier to keep those in older process nodes, and to keep your compute power in advanced technology nodes. The first thing we do with our customers is help them disaggregate the system. So we talk about I/O disaggregation and memory disaggregation. And we talk about the compute, where they can take advantage of the latest technology and the latest power and performance benefits of using a leading-edge technology node.”
Where companies are today on the chiplet adoption curve can vary greatly. “We’re seeing two categories of customers,” said John Lupinski, vice president of product engineering at Blue Cheetah. “One is still learning about chiplets and trying to figure it out, and they know their product eventually will have to be chiplet-based. They’re trying to understand the interconnect packaging technology, what they can do, and the bandwidths they can achieve. The second category is trying to tape out real solutions to get to a production demo at conferences. They know they have so many hundreds of terabits per second, and they’re trying to move that from chiplet to chiplet.”
This is where much of the high-visibility chiplet work currently is focused, particularly for data centers and industrial and automotive applications. While UCIe and Bunch of Wires provide a standard way to connect devices, that’s just one important piece of a much bigger puzzle. Getting data to and from chiplets using those standardized protocols, and routing it to wherever it needs to go, opens the door to all sorts of possibilities, starting with moving data through a physical layer (PHY). In the past, PHYs were largely proprietary because most chiplets were developed in-house (the exception is HBM). But as more third-party chiplets are included in designs, there is a growing focus on how to improve data speeds and ensure the integrity of the data at every level possible.
“If you use one vendor’s PHYs on both sides, it’s pretty much guaranteed to work,” said Ramin Farjadrad, CEO of Eliyan. “And an important reason this interoperability has been slow to come to market is that big companies put fear into the minds of their customers that unless you use the same technology on both sides, you cannot guarantee that it will work smoothly or flawlessly. These PHYs are not any different than SerDes today. In fact, they’re simpler than a SerDes, because a channel is simple. There’s no difference, and it can easily be adopted and built to be interoperable.”
Others agree. “What customers are looking for is the maximum bandwidth at the lowest power profile,” said Blue Cheetah’s Lupinski. “There are only two ways to do that. One of them is with legacy architectures, like SerDes. Some of the UCIe vendors are just pumping the clock rate up. The problem is your picojoules per bit is going way up, too. And if you try to take one of those macros and do 100 terabits per second, your power consumption is huge.”
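The power problem Lupinski describes falls directly out of the definition of energy per bit, since link power scales linearly with aggregate bandwidth. A back-of-the-envelope sketch, using illustrative pJ/bit figures rather than any vendor's data:

```python
# Link power = energy per bit * aggregate bandwidth.
# 1 pJ/bit at 1 Tb/s works out to 1 W; the figures below are illustrative only.

def link_power_watts(pj_per_bit, terabits_per_sec):
    return pj_per_bit * 1e-12 * terabits_per_sec * 1e12

bandwidth_tbps = 100  # the aggregate figure cited above
for pj in (0.5, 1.0, 2.0):
    watts = link_power_watts(pj, bandwidth_tbps)
    print(f"{pj} pJ/bit @ {bandwidth_tbps} Tb/s -> {watts:.0f} W")
# 0.5 pJ/bit ->  50 W
# 1.0 pJ/bit -> 100 W
# 2.0 pJ/bit -> 200 W
```

At 100 Tb/s, every extra picojoule per bit costs another 100 W just to move data between dies, which is why pumping the clock rate on a legacy architecture becomes untenable.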
That’s a big area of focus today. How fast data moves from chiplet to chiplet, or chiplet to memory, has a big impact on the overall performance of a device, and there are multiple ways to approach it. Typically, clocking schemes are synchronized so that computations from multi-threaded applications can be parsed and then combined. Any delay at any point can increase latency, which reduces time to results. Or put simply, a system is only as fast as the slowest component in that chain.
But clocks also can be globally asynchronous and locally synchronous, which minimizes those kinds of delays. “With the constraints being put on modern designs with the chiplet interfaces, the timing constraints are becoming too complicated and too onerous with traditional techniques,” said Lee Vick, vice president of strategic marketing at Movellus. “If you have localized clocks, that typically happens across a NoC, which is part of most traditional architectures, anyway. An asynchronous approach is a little more work, but the freedom that it gives you compared to traditional clock design techniques makes it worthwhile.”
PHYs can be customized to improve performance, as well. “If I want to build an NVIDIA Blackwell 2 chip, I need the highest possible bandwidth, the lowest possible power, the largest possible bandwidth per millimeter edge, and the smallest PHY area,” said Patrick Soheili, chief strategy and business officer at Eliyan. “Those are the things that are really, really important to companies like NVIDIA, Broadcom, Intel, and AMD. And all of these guys are running at between 5 and 20 terabits per second per millimeter. If you don’t have that, then two GPUs that are connected are not going to act as if they are one chip. You’re going to miss on latency, on power, and on performance.”
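The bandwidth-per-millimeter figure Soheili cites is a shoreline budget: total die-to-die throughput is capped by the bandwidth density multiplied by the die edge length devoted to PHYs. A back-of-the-envelope sketch, with an assumed edge length:

```python
# Edge ("beachfront") bandwidth: aggregate die-to-die throughput is
# density (Tb/s per mm) times the edge length available for PHYs.
# The 20 mm edge budget below is an assumption for illustration.

def edge_bandwidth_tbps(density_tbps_per_mm, edge_mm):
    return density_tbps_per_mm * edge_mm

edge_mm = 20
for density in (5, 10, 20):  # the 5-20 Tb/s/mm range cited above
    total = edge_bandwidth_tbps(density, edge_mm)
    print(f"{density} Tb/s/mm x {edge_mm} mm = {total:.0f} Tb/s")
```

The die edge is a fixed resource, so a denser PHY directly buys more cross-die bandwidth without growing the die, which is what lets two connected GPUs behave as one chip.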
Data and power integrity
Mapping how data moves to and through all of these heterogeneous components is non-trivial, and it all needs to be considered very early in the design cycle.
“There are two fundamental approaches,” said Ashley Stevens, director of product management and marketing at Arteris. “It’s about whether you have a full picture of everything in a top-down view, or whether you look at the design bottom-up, where you do something and then connect it to something else. The top-down approach is much simpler, because you know what you’re going to talk to, and because you know how everything is partitioned within the system. For example, you know the memory map of the complete system. You know what’s there, versus if you have a system whereby you intend to connect to arbitrary chiplets, third-party, or whatever. Then it’s much more complicated for several reasons. One of them is verification, because when you take the top-down approach, you can verify the whole thing together. But if you take a bottom-up approach, if we don’t have the other part of the system, then you need very well-defined interfaces, both in hardware and software.”
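One concrete artifact of the top-down approach Stevens describes is a single system-wide memory map that can be sanity-checked before any chiplet is integrated. A minimal sketch is below; the region names and addresses are invented for illustration.

```python
# Top-down integration: the full system memory map is known up front,
# so address overlaps can be caught statically. Regions are hypothetical
# half-open [start, end) ranges.

regions = {
    "compute_chiplet_sram": (0x0000_0000, 0x0800_0000),
    "io_chiplet_regs":      (0x0800_0000, 0x0900_0000),
    "hbm_stack_0":          (0x1000_0000, 0x5000_0000),
}

def check_overlaps(regions):
    spans = sorted(regions.items(), key=lambda kv: kv[1][0])
    for (name_a, (_, end_a)), (name_b, (start_b, _)) in zip(spans, spans[1:]):
        if end_a > start_b:
            raise ValueError(f"{name_a} overlaps {name_b}")
    print("memory map is consistent")

check_overlaps(regions)
```

In a bottom-up flow, no such global check exists, which is why the interfaces between chiplets must be far more rigorously specified in both hardware and software.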
In addition to rapid data movement, that data needs to remain intact, and so does the power to process and move that data.
“In chiplets, because all the die are broken up, we have a lot of different die-to-die connections, which means that signal integrity becomes very important,” said Chun-Ting “Tim” Wang Lee, signal integrity application scientist and high-speed digital applications product manager at Keysight Technologies. “Then, of course, when you have different die, you have power that’s going to be on a different die. How are you going to distribute the power to all these other die? And that’s why power integrity also becomes a problem in chiplets. But also, once you have power integrity issues, you have thermal integrity issues. It adds on to itself.”
That view was echoed by numerous experts at the recent Chiplet Summit. “In the older design style of SoCs, you knew you had a package you could start designing with, assuming you were going to get a certain clean supply at the power pins of your design,” said Rajat Chaudhry, product management group director for Voltus at Cadence. “Now you have multiple chiplets, and you have to set up that early model for the whole system, for whatever package style you’re using. You have to do it for the power integrity, but it also could be done to explore what works better for your system. What kind of technology or multi-die packaging style works, which can satisfy the constraints of what you’re trying to do? That’s one of the biggest changes with chiplets. So what becomes paramount now is trying to make sure early on, are you way off, or are you in the ballpark? Can you really make this system work from a power integrity perspective?”
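Chaudhry’s “are you in the ballpark?” question can start with first-order IR-drop arithmetic long before detailed package extraction exists. A rough sketch, with assumed placeholder resistances and currents rather than values from any real package:

```python
# First-order supply-droop estimate for one chiplet's power delivery path.
# V_drop = I * R; all resistances and currents below are assumptions for
# illustration, not extracted values from any real package.

chiplet_current_a = 150          # assumed peak current of one compute chiplet
r_package_mohm = 0.20            # assumed package routing + bump resistance
r_interposer_mohm = 0.15         # assumed interposer power-mesh resistance
r_total_ohm = (r_package_mohm + r_interposer_mohm) * 1e-3

v_supply = 0.75
v_drop = chiplet_current_a * r_total_ohm
print(f"IR drop: {v_drop*1000:.1f} mV ({v_drop / v_supply:.1%} of a {v_supply} V rail)")
# ~52.5 mV, or 7.0% of the rail -- likely outside a typical ~5% budget,
# which would flag the power delivery network for rework early.
```

Even a crude estimate like this can rule out a packaging style in minutes, which is the point of building the early whole-system model.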
Fig. 2: Multi-chiplet aggregation and optimization using different materials. Source: Cadence
Thermal integrity adds yet another challenge. Stress from heat can cause warpage in a substrate, and the thinner the substrate the more susceptible it is to warpage. This is particularly problematic for organic interposers, which require special handling, but it can affect large silicon interposers, as well. The thinner the substrate, the shorter the interconnect through that substrate, which can be through-silicon/through-substrate vias, or microbumps. That shorter distance improves overall performance and reduces the amount of power that is needed to drive signals, but warpage becomes more problematic. It can cause misalignment in the vias, particularly with different coefficients of thermal expansion, and those in turn can negatively impact performance, power, and signal integrity.
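The mismatch can be ballparked with the basic thermal-expansion relation ΔL = α·L·ΔT. In the sketch below, the CTE coefficients are typical textbook values and the geometry is invented for illustration.

```python
# Differential thermal expansion between a silicon die and an organic
# substrate: delta_L = alpha * L * delta_T. Coefficients are typical
# textbook values; the span and temperature excursion are illustrative.

alpha_si = 2.6e-6        # silicon CTE, 1/K
alpha_organic = 17e-6    # organic substrate CTE, 1/K
span_um = 25_000         # 25 mm span across the package
delta_t = 80             # K temperature excursion, e.g., during operation

mismatch_um = (alpha_organic - alpha_si) * span_um * delta_t
print(f"edge displacement mismatch: {mismatch_um:.1f} um")
# ~28.8 um -- large relative to microbump pitches on the order of 40-50 um,
# which is why warpage and via misalignment become first-order concerns.
```

A displacement on the same order as the bump pitch explains why thinner substrates, despite their electrical advantages, demand such careful thermal and mechanical co-design.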
Bridges are another option, and increasingly they are being mixed in with interposers. In effect, those bridges and interposers are being carved up into smaller pieces to minimize the thermal effects, but that approach adds its own set of issues.
“It’s not just one bridge,” said Synopsys’ Posner. “You can have multiple bridges. They’re still subject to the same stress and strain, but the impact is less because it’s a cross-section. But as the overall size of the whole package increases, thermal expansion is still going to play a part. If you look at some of the architectures being deployed in the data center, you can see why a bridge fits in. These are tightly linked compute clusters, where the actual compute is scaled. There are very tight interposer-based links, but then peripherals go out to maybe an I/O chiplet, which could be on an organic substrate. And that fits into this kind of bridge architecture with mixing and matching of a very dense interconnect, and then broader, lower bandwidth-per-millimeter interconnect.”
Tradeoffs vary by application
Not all chiplets are created equal, and not all chiplets behave the same way under stress or in different package configurations.
“We see more and more differentiation for chiplets used in different applications,” said Andy Heinig, head of Efficient Electronics in Fraunhofer IIS’ Engineering of Adaptive Systems Division. “In industrial and automotive, chiplets must be much more robust. That means temperature cycles, mechanical robustness, vibration tests. This is totally different from what we see in data centers. In the early days of chiplets, it looked like you could use the same integration technology, the same IP, the same things for all applications. That’s not the case. You need very specific packaging solutions and IP for different applications.”
That also impacts the cost of the chiplet. “If you look at automotive, that may be as little as $20 per package,” said Heinig. “In data centers, it could be as much as $2,000 per package. There is a huge range, depending on the different types of packages. We need different package types in different price categories.”
Conclusion
Chiplets provide immense design freedom, and the potential for big improvements in both performance and power. In fact, there is widespread concern that there may not be sufficient power to run all the AI data centers being planned.
“We’re on a trajectory as far as the consumption of power to fuel all electronics, so it’s in our best interest to try to diminish what that trajectory looks like,” noted Mike Ellow, CEO of Siemens EDA. “The number of data centers that are going to be required across the world will increase. But within the power footprint of existing data centers, can you increase capacity three, four, or five times and recycle the resources associated with that? It’s an interesting problem.”
It’s also one that will require much more focus on real workloads, economics, and the laws of physics, all of which may put a damper on just how far architects can push this approach. The future of advanced chip design is certainly heterogeneous, but it’s also incredibly complicated. Getting comfortable with this approach and figuring out what can be automated best and how to do it will take time. There are a lot of knobs being turned, and at this point there are still a lot of questions about what works best where and why.
Related Reading
Chiplets Make Progress Using Interconnects As Glue
Industry learning expands as more SoCs are disaggregated at leading edge, opening door to more third-party chiplets.