Die-To-Die Chiplet Communication

Deciding what type of high-speed interface to use when connecting chiplets.


At CadenceLIVE Americas 2020, one of the most viewed videos was by Samsung Foundry’s Kevin Yee and Cadence’s Tom Wong, titled “Let’s Talk About Chips (Chiplets), Baby…It’s All About D2D!”

They went for this title because it reminded them of the lyrics of an ’80s song…which they proceeded to sing.

Process and packaging trends

Tom led off with a look at the trends in semiconductor economics that are driving the transition to chiplets:

  • Silicon economics
    • Advanced nodes are costly
    • Advanced SoCs trend towards larger die size
    • Chip disaggregation allows us to stay on Moore’s Law
  • Reticle size limitations
    • Most advanced applications pushing die size limits
  • Process scaling continues but costs continue to increase
    • Increasing die sizes are increasingly problematic

The graph on the right shows the cost per square mm as we ride down the process node roadmap. Of course, costs have gone up; that is nothing new. The rule of thumb used to be that costs would go up by 15% from node to node (for the same area) but you would get twice as many transistors in that area, leaving a cost reduction per transistor of about 35%. In recent nodes, costs have gone up faster than that and area scaling has slowed, so the cost reduction per transistor is, depending on who you talk to, small or even negative (more expensive). This is the reality of Moore's Law today, and it has led chip designers to adopt a disaggregated approach to advanced SoC designs. By moving to a multi-die architecture and using advanced 2.5D packaging, you get a smaller die size and the corresponding benefit of better yield at these advanced process geometries. Many refer to this as More than Moore.
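The rule of thumb above is easy to sanity-check. A minimal sketch of the arithmetic (all figures from the paragraph above except where marked; nothing here is foundry data):

```python
# Back-of-the-envelope cost-per-transistor arithmetic for a node shrink.
# Inputs: fractional wafer-cost increase for the same area, and the
# transistor-density multiplier of the new node.

def cost_per_transistor_change(cost_increase: float, density_gain: float) -> float:
    """Fractional change in cost per transistor at the next node.

    cost_increase: e.g. 0.15 for costs going up 15% for the same area.
    density_gain:  e.g. 2.0 for twice as many transistors in that area.
    """
    return (1 + cost_increase) / density_gain - 1

# Classic rule of thumb: +15% cost, 2x density.
# Taken literally this gives roughly a 42% reduction per transistor;
# the ~35% figure quoted above corresponds to density scaling closer
# to 1.8x rather than a full 2x.
print(f"{cost_per_transistor_change(0.15, 2.0):+.1%}")
print(f"{cost_per_transistor_change(0.15, 1.8):+.1%}")

# Recent nodes: costs rising faster, density gains smaller (the numbers
# below are purely illustrative) -- the per-transistor saving can shrink
# to nothing or even go negative.
print(f"{cost_per_transistor_change(0.40, 1.35):+.1%}")
```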

The graph on the left shows how CPUs, GPUs, Ethernet switches, and so on are all getting larger. In fact, once you go multi-core, the limit on how many cores you get is pretty much the maximum number you can fit on a reticle. At least, it would be if yield were linear.

As I noticed myself at HOT CHIPS a year and a half ago, we seem to be witnessing the end of monolithic integration for these very advanced chips. See my post HOT CHIPS: Chipletifying Designs with examples from AMD, Intel, NVIDIA, HP Enterprise, and more.


The combination of reticle limitations and yield challenges makes the chiplet approach attractive. But how do these chiplets communicate with each other? These are all high-speed chips needing high-bandwidth communication. There are basically two approaches: a serial interface and a parallel interface. The state of the art for serial is 112G USR/XSR (ultra-short reach/extra-short reach); for parallel, it is HBI (an offshoot of HBM) or BoW (bunch of wires). You can compare them in the table.

Kevin then took over to look at how you decide what type of interface to choose. Of course, there are more details than just serial versus parallel, but that is the biggest decision. The big considerations are overall bandwidth, energy, latency, and shoreline bandwidth (bandwidth across the edge of the chiplet). Another big consideration is whether it is a closed system: are you building both die, both sides of the communication link? Or is another group, or even another company, building one of the die? In the latter case, you pretty much have to use a standardized interface, not something proprietary. This all has major implications for overall system parameters, such as the process node, the packaging, the type of interposer, and so on.
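The shoreline-bandwidth trade-off is the easiest of these considerations to put numbers on: a serial PHY pushes many more bits per lane, but each lane's driver and bump footprint along the die edge is larger, while a parallel interface packs many slow wires densely. A rough sketch, using made-up round figures purely for illustration (not Samsung or Cadence data):

```python
# Illustrative comparison of shoreline (die-edge) bandwidth density for a
# serial versus a parallel die-to-die interface. Lane rates and lane
# densities below are hypothetical round numbers, not real PHY specs.

def shoreline_bw_gbps_per_mm(lane_rate_gbps: float, lanes_per_mm: float) -> float:
    """Aggregate bandwidth crossing 1 mm of die edge."""
    return lane_rate_gbps * lanes_per_mm

# Hypothetical serial link: 112 Gbps per lane, ~4 lanes per mm of edge.
serial = shoreline_bw_gbps_per_mm(112, 4)

# Hypothetical parallel (BoW/HBI-style) link: 4 Gbps per wire, ~200 wires
# per mm (parallel interfaces rely on fine-pitch bumps and an interposer
# or bridge to achieve this wire density).
parallel = shoreline_bw_gbps_per_mm(4, 200)

print(f"serial:   {serial:.0f} Gbps per mm of shoreline")
print(f"parallel: {parallel:.0f} Gbps per mm of shoreline")
```

The point of the sketch is that neither approach wins automatically: the answer depends on achievable lane density, which in turn depends on the packaging and interposer choices mentioned above.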

Samsung and Cadence have worked together on a 40G D2D interface on Samsung's 5LPE process. On the left, you can see how everything is implemented. The right shows the eye diagram and a photo of the actual test chip on its evaluation board.


Cadence IP enablement on Samsung foundry processes is broader than just 40G UltraLink D2D communications in 5nm. Cadence provides advanced memory IP and high-speed SerDes IP in various nodes.

Kevin wrapped up with a final summary:

  • Better yield due to smaller die size
  • Volume cost advantage when the same chiplet(s) are used in many designs
    • Design reuse
    • Multi-core designs
  • Flexibility in picking the best process node for the end product
    • SerDes I/O and analog do not need to be on the “core” process node
  • Shortened IC design cycle time and reduced integration complexity by using pre-existing chiplets
  • Lower manufacturing costs by purchasing known-good die (KGD), if available
