Swapping Out Chiplets: I/Os Vs. Compute

Multi-die assemblies give chip architects the option to change some dies while keeping the rest of the system intact, but which is best to keep?


Key Takeaways:

  • Companies can save time and money by swapping out a compute, memory, or I/O chiplet to gain technology improvements, while keeping the other dies stable.
  • Chip architects may choose to keep their I/Os stable and swap out compute to move from a 5nm process node to 3nm to achieve performance and power improvements, or swap out memory from LPDDR5X to LPDDR6.
  • Swapping out I/Os makes sense if chip architects want to keep the die on a less advanced node than the compute, but upgrade to a protocol such as 224G SerDes.

Despite initial design and verification challenges, chiplet-based architectures are proving to be a cost-effective way of reusing large portions of a design while staying current with the latest I/O protocols and logic.

Early discussions about chiplets were largely targeted at developing different functional blocks and IP in whatever node made the most sense, especially analog components. While that is still the case, the bigger economic benefit may be the ability to selectively incorporate new IP, support protocol or memory transitions, and/or create product variants without a full multi-die respin. In effect, the architecture supports modular recomposition — preserving stable portions of the design while updating the blocks that deliver the greatest system-level benefit.

One element that can be swapped out — or kept — is the I/O chiplet. This is important because interface protocols, interconnect standards, physical input/output (I/O) connectors, processors, and memories are all evolving quickly in order to keep up with the demands of artificial intelligence (AI), machine learning (ML), and high-performance computing (HPC). In a chiplet-based system, the logic and memory can stay the same while the I/O chiplet is replaced with one that takes advantage of higher speeds or more optimized interconnect protocols for specific workloads. Alternatively, the I/Os can remain while logic or memory are changed out for manufacturing process improvements or scaling to increase logic density and reduce power. Chip architects are exploring both options.

Many AI data centers and HPC clusters will use a variety of Ethernet-based scale-up protocols, UALink, and others in the system. “If you’re building an SoC and you’ve got one shot at a tape-out — and it’s going to take you the entire cycle to get the design back, validate it, and get it qualified for systems — you don’t have much leeway to keep changing this plumbing,” said Arif Khan, vice president of product management and marketing for the Silicon Solutions Group at Cadence. “Chip architects are having to make multiple implementations. Sometimes they’re putting both solutions on the same die. We have customers who are looking at advanced solutions they want to leverage for multiple generations. We have customers that are building disaggregated designs with chiplets, where certain technology stays in, say, 6 or 7 nanometers, they’ve got a core that would then be at a different, more advanced technology, and some other I/O dies on yet a different technology. Then on the I/O die, maybe I can use one version to talk to one protocol, and another to talk to a different one. Those strategies are being looked at very closely by our customers.”

With all these technologies on different dies on different nodes, all kinds of swapping patterns are likely to emerge, from I/O to compute.

“Historically, compute dies were upgraded more frequently than I/O dies,” said Rob Kruger, director of product management for 3D IP and chiplets at Synopsys. “Chiplet I/O dies could often be reused across multiple compute generations, provided I/O was not a bottleneck with the new compute die and there were no platform shifts, such as transitions from PCIe 5.0 to PCIe 6.0 or 7.0.”

What’s different today is that AI-driven demands for compute and bandwidth are growing at unprecedented rates. “As a result, compute, I/O, and memory systems are increasingly being updated simultaneously to optimize overall efficiency and avoid I/O and memory bottlenecks in system architectures,” said Kruger. “In applications such as automotive or physical AI, however, there will likely continue to be greater reuse of I/O dies.”

The choice may come down to whether the chiplet is internally developed, or purchased commercially. “In a captive system, we see swapping out the compute chiplet, but not in the chiplet open market,” said Priyank Shukla, director of product management for high-speed SerDes for I/O at Synopsys. “In a captive system, if a manufacturer is manufacturing both, they have one year in tandem to leverage cost.”

Chip architects might have a main chiplet and then tailor different I/O chiplets for different markets. “That could be a possibility,” said Kruger. “HPC is one market. Then there’s the automotive chiplet market, which is looking at having a main base die and adding chiplets to address different models of cars, for example.”

Keep I/Os, swap out compute
There are a couple of reasons why companies may swap out the compute chiplet.

“One reason is that you are upgrading the CPU, or maybe your accelerator,” said Mick Posner, senior product marketing group director for chiplets and IP solutions at Cadence. “Let’s say you developed it on 5nm, and you’re going to spin it to 3nm. You get a performance improvement. You get a power improvement. Potentially, you may have scaled your architecture, but using the latest and greatest technology allows you to scale it back down. That is one scenario. At the same time, maybe the I/Os have not moved forward. Your PCI Express, SerDes 224 Gbps I/Os give you enough scalability.”

Also, the developer may want to keep I/Os stable while swapping out memory. “We had a scenario recently in which LPDDR5X was used as a chiplet, and they wanted to move to LPDDR6 because they had a memory bottleneck,” said Posner. “This meant their system was limited by the memory bandwidth, not by CPU performance, which is typical in an AR/AI space. You typically have a memory bottleneck, not a processor bottleneck.”

Economics plays a large role in decision-making. “In almost every case I can think of, the compute die is what rotates,” said Ashish Darbari, CEO of Axiomise. “It makes a lot of sense from an economic point of view. The compute die is where you’re paying the process-node premium — N3, N2, A16 — and where architectural change happens from generation to generation. I/O, PHYs, SerDes, memory controllers, and security blocks mature slowly and don’t benefit much from leading-edge nodes. Re-spinning a 224G SerDes on N2, when N5 or N6 meets spec at a fraction of the mask cost, is just a waste.”

A few things reinforce this. “Anything touching PCIe, CXL, Ethernet, automotive functional safety, or security carries certification and qualification costs that nobody wants to pay twice,” said Darbari. “I/O chiplets on mature nodes yield well. You’re not going to throw that away because your compute roadmap moved. And since UCIe 2.0 and BoW were explicitly designed to stabilize the die-to-die boundary, the I/O chiplet can present the same logical interface across multiple compute generations.”

Others agree. “In a chiplet marketplace, the most likely reuse pattern is to keep the I/O chiplet stable and swap the SoC or compute chiplet,” said Andy Nightingale, vice president of product management and marketing at Arteris. “The reason is unromantic. High-speed SerDes/PHY/analog and board-facing compliance are expensive to redo, don’t scale cleanly with leading-edge nodes, and benefit from maturity and yield. This means they’re great candidates for a reusable I/O die on a steadier process.”

Compute tiles, by contrast, are where the next node and the next microarchitecture are being chased, and they track each new NPU/GPU variant. “The counterexample is product-line tuning. Some vendors may keep a ‘base compute’ tile and swap I/O tiles to target different markets, such as cloud vs. edge vs. automotive, other memory attach points, or other external standards,” Nightingale noted.

That pattern is not universal, however. In systems where external standards, bandwidth targets, or market-specific interfaces shift faster than the compute roadmap, the I/O chiplet may be what changes first.

Keep compute, swap out I/Os
The speed at which interconnect standards evolve factors into the decision.

“In practice, I/O chiplets are swapped more frequently since system requirements and connectivity standards evolve faster than compute-logic architectures,” noted William Wang, CEO of ChipAgents.

Manufacturing nodes are one reason for this. “The chip architect is more likely to keep the compute die and swap out the I/O chiplet,” said Satish Radhakrishnan, head of GTM for semiconductor and electronics at Vinci. “Compute dies are usually manufactured on advanced nodes, and as they scale from N5 to N3 or N2, more compute can be added within the same space. Because these dies are expensive and complex to redesign, chip architects are less likely to replace them just to support a new protocol.”

In contrast to leading-edge compute, I/O chiplets are usually manufactured using older technologies that are cheaper and easier to update. “That makes them a better place to absorb new protocols or interface changes,” said Radhakrishnan. “The important caveat is that swapping an I/O chiplet still changes the physical system, including routing, power, thermals, and reliability, so it needs to be validated at the package and system level.”

Still, the core device with the compute tends to be the one that stays static. “For some product lines, customers are saying, ‘I’ve got a compute core, but I want to be able to sell it as different potential products,’” observed Kent Orthner, principal solutions architect at Baya Systems. “This may mean they have different chiplets for the I/O. Maybe one is all about memory expansion. Maybe one is all about communication over, say, PCIe. Then, they might have the same compute core, but more in a networking environment where, instead of PCIe, they want large numbers of 400 Gbps Ethernet. The idea of being able to have your compute core and then swap out the I/O that it uses to connect to the rest of the world tends to dominate.”
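
The product-line pattern Orthner describes, one compute core paired with different I/O chiplets, can be sketched as a small data model. This is a minimal illustration, not any vendor's flow. The chiplet names, node values, and the `d2d_interface` labels are hypothetical, and the only rule enforced is an assumed one: both dies must expose the same die-to-die interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chiplet:
    name: str
    role: str           # "compute" or "io"
    node_nm: int        # process node, illustrative values only
    d2d_interface: str  # hypothetical label for the die-to-die interface


@dataclass(frozen=True)
class Package:
    product: str
    compute: Chiplet
    io: Chiplet


def build_variant(product: str, compute: Chiplet, io: Chiplet) -> Package:
    """Pair a compute die with an I/O die if their die-to-die interfaces match."""
    if compute.role != "compute" or io.role != "io":
        raise ValueError("expected one compute chiplet and one I/O chiplet")
    if compute.d2d_interface != io.d2d_interface:
        raise ValueError(
            f"{compute.name} and {io.name} use different die-to-die interfaces"
        )
    return Package(product, compute, io)


# Hypothetical parts: one shared compute core, three market-specific I/O dies.
core = Chiplet("core-a", "compute", 3, "d2d-v1")
io_options = {
    "memory-expansion": Chiplet("io-mem", "io", 7, "d2d-v1"),
    "pcie-host": Chiplet("io-pcie", "io", 7, "d2d-v1"),
    "networking": Chiplet("io-eth", "io", 7, "d2d-v1"),
}

variants = [build_variant(market, core, io) for market, io in io_options.items()]
for v in variants:
    print(f"{v.product}: {v.compute.name} + {v.io.name}")
```

The same compute die appears in every variant, and only the I/O entry changes. A real flow would also have to re-validate routing, power, and thermals at the package level for each pairing, which this sketch does not model.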

The long development times for high-speed interconnects impact decision-making. “The compute wants to be super-fast and leading-edge,” Orthner noted. “But for the I/O stuff, like PCIe cores, it’s okay to do those on older technology nodes. You might have your bleeding-edge compute on 3nm TSMC while you’re doing your I/O on 7.”

Also, many I/O standards, such as PCIe, are heavily dependent on the PHYs and the SerDes. “SerDes is now running at 224 Gbps per lane, so they’re just crazy, crazy fast. But they take a long time to develop, and they’re much more tied to the technology node than the digital logic where the compute clusters are,” he explained. “By having the I/O chiplets on an older technology, you can reuse that investment, and you can have a fast SerDes available in the time it would take you to do your compute cluster design. There are some exceptions where people say, ‘My I/O is great. I want to be able to swap out different capabilities of processors, so I might have a larger compute cluster versus more compute clusters.’ But I tend to see swapping the I/O much more often.”

Viewed another way, the tradeoff often comes down to which part of the design is treated as the stable core and which part is expected to adapt around it.

A useful analogy is the brain versus the limbs. “If you put me on the spot and asked me to choose which one is more likely to be swapped out, I would say you are probably going to keep the main SoC chiplets and swap out I/Os,” said Hee Soo Lee, high-speed digital design segment lead at Keysight EDA. “As an analogy, it is like asking if you’d rather change the main brain or the arms and legs. Both are super important most of the time. The smart move is to keep the core brain the same and swap out the I/O parts, because it gives you more flexibility and makes more sense.”

Fig. 1: Multi-die chiplets from different vendors. Source: Keysight

Application and use case
Ultimately, the application determines whether a chiplet needs to be on a leading-edge process node or would benefit from a faster interconnect protocol.

“Figuring out whether to swap the main processor or SoC versus the interface and I/O in a chiplet really depends on factors such as what the use case is,” said Keysight’s Lee. “What are the components’ needs? How flexible does it need to be, and what are the costs associated? All of these factor into those decisions, so it’s going to be very, very difficult.”

Additionally, some sectors move faster than others. “Domains such as automotive and industrial, sensors, networking, and functional-safety islands can evolve faster than the compute requirement,” said Axiomise’s Darbari. “So you sometimes see the pattern where you can have a stable compute die with rotating I/O tiles.”

Combinations of interconnects must also be considered. “If you’ve got a storage device and it needs to go into different kinds of systems, you may use different protocols, such as CXL,” said Cadence’s Khan. “You may choose one I/O that is more geared toward a different PCIe-style implementation, or a UALink implementation, and then you would swap out the I/O die. But if you have an I/O that gives you enough bandwidth from your I/O subsystem, and you want to add additional compute, more storage capacities, or the like, then you would keep the I/O and use other chiplets to solve the problem. We’ve seen customers evaluate all of this.”

Swapping out chiplets offers flexibility for applications ranging from an AI data center to a cheap consumer device.

“If you are keeping the main SoC chiplet, the chip architect can optimize a whole setup for different uses,” said Lee. “If it’s going to be a giant server or a small consumer device, it’s going to be very straightforward. You need more power for the data center. Then you can stack HBM I/O chiplets. But if you want to make it a cheaper consumer product, you can swap it to a standard, cost-effective I/O chiplet instead. Manufacturing-wise, it is more simplified. You can adapt these changes much more quickly as a new standard comes out. Lastly, saving money is a key consideration. Building an SoC in a chiplet format still costs a lot, especially when using advanced nodes. Therefore, leveraging that common piece across a bunch of different interconnects or protocols makes good sense.”

FPGAs with programmable I/Os are another solution. “Everybody has a different protocol and different ways of implementing it,” said Venkat Yadavalli, head of the Business Management Group at Altera. “As an example, in industrial applications on a factory floor, there are many protocols. Some of them are running on EtherCAT (Ethernet for control automation technology). Some may be on Ethernet. Some may be on a different kind of bus architecture, but all of them — Internet of Things at the edge — need to be connected and translated into something that can run on a factory floor for decision making. The programmable I/O is there to enable customers to connect their data, whether data planes or control planes.”

Integration challenges
One of the central challenges is keeping all the components and chiplets optimally connected, before and after swapping.

“Whichever chiplet is swapped, engineers need to design I/O as a system — partitioning in terms of what stays on which die, traffic models, including AI bursts versus sustained streams, NoC-to-I/O coupling with reference to QoS, backpressure, and ordering, and test or observability,” Arteris’ Nightingale said. “If you want good I/O in AI-era silicon, assume the data will arrive in inconvenient surges, insist on end-to-end flow control, and make sure your interconnect fabric can enforce the rules when reality ignores your block diagram.”
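
Nightingale's advice to assume bursty traffic and insist on end-to-end flow control can be illustrated with a toy credit-based link. This is a sketch under stated assumptions, not a model of any real NoC or die-to-die protocol. The class and function names, the buffer depth, and the drain rate are all hypothetical. The sender may transmit only while it holds credits, and the receiver returns one credit for each packet it consumes.

```python
from collections import deque


class CreditLink:
    """Sender may transmit only while it holds credits granted by the receiver."""

    def __init__(self, buffer_slots: int):
        self.credits = buffer_slots  # one credit per free receiver slot
        self.rx_buffer = deque()

    def try_send(self, packet) -> bool:
        if self.credits == 0:
            return False             # backpressure: sender must hold the packet
        self.credits -= 1
        self.rx_buffer.append(packet)
        return True

    def drain(self, n: int) -> list:
        """Receiver consumes up to n packets and returns one credit for each."""
        out = []
        while n > 0 and self.rx_buffer:
            out.append(self.rx_buffer.popleft())
            self.credits += 1
            n -= 1
        return out


def simulate(burst, buffer_slots: int, drain_per_cycle: int):
    """Push a burst through the link; report delivery order, stalls, peak occupancy."""
    if buffer_slots <= 0 or drain_per_cycle <= 0:
        raise ValueError("buffer_slots and drain_per_cycle must be positive")
    link = CreditLink(buffer_slots)
    pending = deque(burst)
    delivered, stalled_cycles, max_occupancy = [], 0, 0
    while pending or link.rx_buffer:
        blocked = False
        while pending:               # send as much of the burst as credits allow
            if link.try_send(pending[0]):
                pending.popleft()
            else:
                blocked = True
                break
        if blocked:
            stalled_cycles += 1
        max_occupancy = max(max_occupancy, len(link.rx_buffer))
        delivered.extend(link.drain(drain_per_cycle))
    return delivered, stalled_cycles, max_occupancy


delivered, stalled, peak = simulate(list(range(10)), buffer_slots=4, drain_per_cycle=2)
print(delivered, stalled, peak)
```

In this toy model the burst is larger than the receiver buffer, so the sender stalls rather than overrunning the buffer, and packets still arrive complete and in order. Real fabrics add QoS classes and ordering rules on top of this basic mechanism.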

The key role of a system architect is to decide where to pull things out and what to put onto advanced nodes. “If you’re using the 2nm process for a data center compute node, it makes perfect sense,” said Lee. “That is where you need speed and power. However, if you’re putting very low-speed logic into it, you are wasting the real estate because it’s a very expensive process anyway. Chip architects also need to maintain some circuitry or functional blocks in medium-level nodes or technologies. Those are going to be very important.”

The same could apply to 3D-ICs, where chip architects have the same base die and then swap out I/O chiplets on top for different models or different applications. “With 3D, that’s an interesting concept, which people are starting to talk about,” said Synopsys’ Kruger. “I haven’t seen anyone do it, but that is a possibility. You can have a base die, maybe that base die lives on its own, and then you can add another die to add functionality, so you can maybe have more memory through your cache, or higher-end features. That is a concept that is being explored. It’s a little harder to do, but it’s manageable. You have to think about the different bumping, whether you’re going to tape out a chip by itself, or you’re going to add 3D, then you go to hybrid bond bumps, and you have to go through that process to redesign it a little bit, but it’s only a few layers’ difference.”

Conclusion
There is not always a clear match between development cycles for chip design and the pace at which processors and interconnects are evolving, which is why it makes sense for chip architects to swap out one chiplet sooner than another. But what stays and what goes depends on the use case.

“Everything is moving quickly,” said Keysight’s Lee. “The challenge is getting to market without delay. With a single-chip workflow, the development cycle is longer and more expensive. With chiplets, you can reuse existing parts and assemble new systems more efficiently.”

Using chiplets like building blocks is an optimized way to bring products to market. “It’s faster and less expensive because it reuses existing chiplets to reconfigure the system for different functions,” said Lee. “That is one reason so many companies are adopting chiplets instead of building everything as a single SoC.”

