Enabling Cost-Effective, High-Performance Die-to-Die Connectivity

When it comes to die-to-die PHY interfaces, the best solution is highly dependent on the end application.


Advances in accelerated computing platforms such as CPUs, GPUs and FPGAs, heterogeneous systems on chip (SoCs) for AI acceleration, and high-speed networking and interconnects have pushed chip integration to unprecedented levels. These systems demand more complex designs, higher levels of integration, larger die sizes and the fastest possible adoption of the most advanced process geometries. Facing the practical limits of Moore’s Law, the semiconductor industry is running into barriers to continued performance improvement: escalating costs at the most advanced process nodes and the hard limit of reticle size. The old monolithic design paradigm is no longer sufficient. Are we witnessing the end of monolithic design for advanced SoCs? For some time now, CPUs, GPUs and FPGAs have all been marching toward disaggregated multi-chip solutions.

Figure 1: A high-speed optical networking chip using die-to-die connected chiplets on a substrate.

Chiplet value proposition

One manufacturing reality is that very large chips don’t yield as well as the same design split into separate smaller die. If the design is a large multi-core processor, FPGA, networking chip or AI accelerator, it is attractive to split the design into multiple small die, which are commonly referred to as chiplets.

Chiplets offer a compelling value proposition, which includes:

  • Better yield due to smaller die size
  • Volume cost advantage when the same chiplet(s) are used in many designs (design reuse)
  • Flexibility in picking the best process node for the part—especially when SerDes I/O and analog do not need to be on the “core” process node
  • Shortened IC design cycle time and reduced integration complexity by using pre-existing chiplets
  • Lower manufacturing costs by purchasing known-good die (KGD), if available
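The yield advantage in the first bullet can be made concrete with a simple defect-density model. The sketch below uses the classic Poisson yield approximation Y = exp(−A·D0); the 800mm²/200mm² die areas and the 0.2 defects/cm² density are illustrative assumptions, not figures from the text.

```python
import math

def die_yield(area_mm2: float, defect_density_per_cm2: float) -> float:
    """Poisson yield approximation: Y = exp(-A * D0)."""
    area_cm2 = area_mm2 / 100.0
    return math.exp(-area_cm2 * defect_density_per_cm2)

D0 = 0.2  # defects/cm^2 -- illustrative assumption
monolithic = die_yield(800, D0)  # one large ~800 mm^2 die
chiplet = die_yield(200, D0)     # one 200 mm^2 chiplet

print(f"monolithic yield: {monolithic:.1%}")  # ~20.2%
print(f"chiplet yield:    {chiplet:.1%}")     # ~67.0%
# Four chiplets consume the same total silicon area as the large die, but
# defective chiplets are discarded individually (ideally before packaging,
# via KGD testing), so far less good silicon is thrown away per defect.
```

This is the same effect captured by the normalized cost-per-yielded-die trend at advanced nodes.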

Figure 2: SoC die sizes vs. maximum reticle size from 2006 to 2020. Source: AMD.

Figure 3: Trends of normalized cost per yielded die from 45nm to 5nm. Source: AMD.

Chiplets address two key challenges. The first is economic: the cost associated with a very large die. The second is a technology and manufacturing challenge: the maximum field size that lithography machines can expose. The most advanced 193nm immersion (193i) scanners have a reticle field limit of 26mm × 33mm (~858mm²). If a design approaches that limit, there is no alternative but to split the SoC into multiple chiplets and package them as a “chipset” using some kind of 2.5D or 3D packaging technology. There are quite a few packaging options here.

Partitioning a single, large die into smaller functional die and packaging the resulting chiplets in different combinations offers numerous benefits. At the same time, delivering high bandwidth over a small die edge is challenging from a design and packaging perspective. As a result, beachfront density, or the amount of data you can transfer within a certain die edge, has become a major consideration. Against this backdrop, Cadence recently released the UltraLink D2D PHY IP to provide an optimal solution for die-to-die connectivity requirements and challenges.

Different flavors of a die-to-die solution

There are many approaches to a die-to-die solution. As we traverse a diverse landscape of usage scenarios, the best solution is highly dependent on the end application. In a CPU/GPU environment, it is likely that partitioning will result in chiplets that are identical (such as splitting a multi-core design). In an FPGA environment, the solution will likely require the FPGA to be on a common interposer with high-performance memory such as HBM. In a data center networking environment, partitioning may include the core compute in a more advanced process and an I/O chiplet that is silicon proven in an earlier process technology. Photonics integration will require a die-to-optical engine (D2OE) solution. For applications more in the consumer space such as applications processors or the integration of 4G/5G modems, solutions will likely require logic chips in an advanced CMOS process, RF circuits in fully depleted silicon on insulator (FD-SOI) or gallium arsenide (GaAs), and also passive components to form a module. These applications will require an ecosystem of KGD suppliers and a new supply chain to manage the entire manufacturing process.

The long-term vision of multi-die integration is that the system-in-package (SiP) becomes the new SoC, and chiplets become the new IP. For this to be viable, however, standard/common communication interfaces between the chiplets must exist. The following diagram shows three types of interfaces. First, there is the parallel interface used by high-performance HBM, which typically runs in the 2 to 4GHz range. Because parallel interfaces require very high pin counts, they need a silicon interposer. If an interposer is not desired due to cost, yield and reliability concerns, and a multi-chip module (MCM) on an organic substrate is preferred, users can turn to SerDes interfaces and still achieve high bandwidth. For SerDes interfaces, users can choose between non-return-to-zero (NRZ) signaling and PAM4 signaling. NRZ coding is suitable for the 20 to 40Gbps range and has the benefits of lower latency and easier routing between chiplets. PAM4 requires forward error correction (FEC) to meet low bit-error rate (BER) requirements for error-free links, but has the advantage of an industry standard, OIF CEI-112G-XSR. Both options allow the use of organic substrates instead of more expensive silicon interposers.
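The wire-count tradeoff among the three interface types can be sketched with quick arithmetic. The per-wire rates below (roughly 4Gbps for parallel I/O, 40Gbps for NRZ, 112Gbps for PAM4) follow the ranges in the text; the 1Tbps target is an illustrative round number.

```python
import math

def wires_needed(target_gbps: float, per_wire_gbps: float) -> int:
    """Minimum data wires to carry the target bandwidth in one direction."""
    return math.ceil(target_gbps / per_wire_gbps)

target = 1000  # 1 Tbps unidirectional, illustrative
print(wires_needed(target, 4))    # parallel I/O at ~4 Gbps/pin  -> 250 wires
print(wires_needed(target, 40))   # NRZ SerDes at 40 Gbps        -> 25 wires
print(wires_needed(target, 112))  # PAM4 SerDes at 112 Gbps      -> 9 wires
# The parallel option's pin count is what pushes it onto a silicon
# interposer; both SerDes options fit comfortably on an organic substrate.
```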

Figure 4: Pros and cons of the 3 different types of die-to-die connectivity interfaces.

Die-to-die connectivity enables high-performance applications with cost-effective packaging

The Cadence UltraLink D2D PHY IP is a high-performance, low-latency PHY for die-to-die connectivity targeted at the AI/ML, 5G, cloud computing and networking market segments. It is an enabling technology for chiplet and SiP applications, which empower SoC providers to deliver more customized solutions that offer higher performance and yields while also shortening development cycles and reducing costs through greater IP reuse.

The UltraLink D2D PHY IP delivers up to 40Gbps wire speed over an NRZ serial interface, providing up to 1Tbps/mm of unidirectional bandwidth. The IP includes built-in de-skew and scrambling/de-scrambling logic to simplify system integration. Its low wire count of 28 data wires for 1Tbps of bandwidth enables easier routing and potentially reduces package cost, whereas alternative solutions can require 30% or more additional wires. While some existing lower-speed die-to-die solutions require a silicon interposer to achieve the same bandwidth, the UltraLink D2D PHY IP offers significant cost advantages by supporting multi-chip modules on organic substrates. The IP features latency as low as 5ns round trip from receiver to transmitter, uses standard NRZ coding, and achieves a BER better than 1e-15 without requiring FEC. The UltraLink D2D PHY IP is silicon proven in an advanced 7nm FinFET process at multiple foundries.
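As a sanity check on the quoted beachfront density, the numbers in the paragraph above (28 data wires at 40Gbps each) multiply out as follows; the overhead interpretation in the final comment is my reading, not a claim from the text.

```python
# Figures from the text: 28 data wires at 40 Gbps each, in ~1 mm of die edge.
data_wires = 28
gbps_per_wire = 40

raw_bw_gbps = data_wires * gbps_per_wire
print(f"{raw_bw_gbps} Gbps raw")  # 1120 Gbps, i.e. ~1.12 Tbps
# The margin above the advertised 1 Tbps/mm presumably absorbs
# scrambling/framing overhead on the link.
```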

Figure 5: Using the Cadence UltraLink D2D PHY IP as a die-to-die interconnect for chiplets on a 2.5D package substrate.

The UltraLink D2D PHY IP’s top-level design aim was to maximize bandwidth across the edge of the die (the beachfront) without requiring bump pitches so tight that an expensive silicon interposer becomes necessary, although an interposer can still be used when other requirements, such as HBM memory stacks, call for one.

Feature summary

  • Line-rate of 20-40Gbps
  • ~500Gbps bidirectional BW in 1mm of beachfront
  • Insertion loss of 8dB @ Nyquist (25-40mm)
  • Ultra-low power – a couple pJ/bit
  • Ultra-low latency – 5ns round trip
  • DC coupled
  • Forwarded clock raw BER of 1e-15, no FEC
  • Single-ended NRZ signaling with spatial encoding for signal and power integrity
  • Sideband for link management
  • Targets 130µm bump pitch for MCM applications
  • Also supports micro bump for silicon interposer

Figure 6: Eye diagram from a 7nm die-to-die test chip, operating at 40G chip-to-chip connectivity.

An example of the sort of design enabled by this technology is the 25.6Tbps switch shown below. It is built on an organic substrate, which is less expensive than a silicon interposer. Each chiplet provides 1.6Tbps of bandwidth, so 16 of them deliver an aggregate 25.6Tbps. The die-to-die interface connects the chiplets to the switch core.
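The aggregate bandwidth works out as follows, assuming (hypothetically) that each I/O chiplet runs 16 lanes of 112G PAM4 SerDes with roughly 100Gbps of usable payload per lane after FEC and encoding overhead.

```python
lanes_per_chiplet = 16       # assumption based on "16 lanes of 112G SerDes"
usable_gbps_per_lane = 100   # ~112G PAM4 line rate minus FEC/encoding overhead
chiplets = 16

per_chiplet_tbps = lanes_per_chiplet * usable_gbps_per_lane / 1000
total_tbps = chiplets * per_chiplet_tbps
print(per_chiplet_tbps, total_tbps)  # 1.6 Tbps per chiplet, 25.6 Tbps total
```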

Figure 7: An example of a 25.6Tbps switch using 16 lanes of 112G SerDes.


There are three types of die-to-die PHY interfaces: parallel I/O, NRZ SerDes and PAM4 SerDes. The best choice depends on the preferred packaging type, power and latency requirements, and the importance of standardization, so users should weigh these tradeoffs against their application requirements. The Cadence UltraLink D2D PHY IP is a high-bandwidth, low-power, low-latency NRZ SerDes-based PHY that supports high-performance applications on organic substrates, allowing users to avoid the high cost of silicon interposers.
