New 3D packaging technologies provide the ability to combine die from different processes and suppliers.
System integration is increasingly being done using 3D packaging technologies rather than integrating everything onto a huge SoC. One motivation is the ability not just to split up a design within a single process, but to package together die from different processes.
Sometimes there are economic reasons. Several presentations at HOT CHIPS partitioned the design into the processor itself and a separate I/O die. The processor could be manufactured in the most advanced and expensive node, and the I/O in a less advanced and cheaper node (typically, it seemed, one generation behind). The image below is Intel’s Lakefield, with a base I/O die (in a non-leading-edge process, I think 14nm), the processor in 10nm, and in-package DRAM on top. This is all assembled using Intel’s 3D approach, which they call Foveros.
The reason for doing this is two-fold. The most obvious is that the I/O interfaces don’t benefit from the more advanced node. And in the modern era, advanced nodes are more expensive per transistor, so the economics push you to hold back rather than move to the most advanced node as aggressively as possible. But there is also a second, more subtle reason. All the I/O (and other routine blocks) have already seen silicon, either in production or at least in test chips. If the I/O die is also done in the most advanced process, then test chips for things like high-speed SerDes become part of the critical path to getting the whole system out.
RF and analog benefit even less from being in the advanced node. In fact, not only do they not benefit, it is a positive disadvantage. It is very difficult to design analog circuits in FinFET processes. The reason is that FinFETs are quantized: transistors have a uniform, fixed length, and the width is an integer number of fins. In planar processes, the analog circuit designer could pick the widths and lengths of the transistors freely. Often in an analog design, what matters most is the ratio between the sizes of critical transistors, but in FinFET you can’t have two transistors with an arbitrary ratio like that, so the designer loses that degree of freedom. It makes much more sense to keep analog design back in a planar process like 28nm, or perhaps even a less advanced node such as 65nm, where the design (an ADC, say) has already been well characterized and seen high-volume production.
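To make the quantization point concrete, here is a minimal sketch in Python (the 2.35:1 target ratio and the 16-fin limit are made up for illustration) showing how a planar designer's arbitrary device ratio can only be approximated by integer fin counts:

```python
# Illustrative only: in a FinFET process the effective width of a transistor is
# (number of fins) x (fixed fin width), so the ratio of two device widths is a
# ratio of two small integers -- you cannot dial in an arbitrary ratio.

def best_fin_ratio(target, max_fins=16):
    """Find the fin-count pair (n1, n2) whose ratio n1/n2 is closest to target."""
    best = None
    for n1 in range(1, max_fins + 1):
        for n2 in range(1, max_fins + 1):
            err = abs(n1 / n2 - target)
            if best is None or err < best[0]:
                best = (err, n1, n2)
    return best

# Hypothetical target: a planar designer might want a mirror ratio of 2.35:1.
err, n1, n2 = best_fin_ratio(2.35)
print(f"closest fin ratio to 2.35 is {n1}:{n2} = {n1/n2:.3f} (error {err:.3f})")
```

The residual error never goes to zero unless the target happens to be a ratio of small integers, which is exactly the constraint the analog designer is fighting.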
I’m not an RF expert, but I understand that it is next to impossible to design RF in FinFET processes due to the high capacitance of the FinFET transistors themselves. It’s possible that the high resistance of the interconnect is also an issue for RF.
Another area where it can be attractive to use separate die is photonics. Even if some of the photonics is on the main die, it is unlikely that the lasers themselves can be. Usually they are InP (indium phosphide). As it happens, the Intel keynote at Cadence’s recent Photonics Summit was on building two-die solutions and then attaching the two wafers face to face. (See my post The Photonics Summit 2019: Hybrid Lasers.)
At HOT CHIPS, Ayar Labs presented their TeraPhy, which is a small optical chip that can be added into the package for an SoC to provide optical connectivity. See the diagram alongside.
Chiplets
So far the assumption in all the discussion about 3D designs with multiple die in the package is that the die are all designed by the same team, or at least the same company, with the exception of DRAMs which always come from specialized DRAM manufacturers. DRAM has to be manufactured at scale to be competitive, and “at scale” means a whole fab at a time.
But there is another possibility, which is that in-package components become available commercially. These are known as chiplets. There are several challenges to this. There are some technical ones, but they are the same as for all the other in-package integration that I’ve already discussed. But there are two further challenges, standardization and market. In fact, Cadence is involved in a program addressing some of this. (See my post ERI: CHIPS and Chiplets.)
If the same team is designing two die that have to go in the same package, they can pretty much pick any communication scheme they like. But if the chiplets are standard in some sense, for example a high-speed SerDes chiplet or a WiFi chiplet, then the SoC has to use whatever interface the chiplet provides. To keep things simple, it is better if the interfaces are well-proven and standard. Inside a package, the distances are short, so it doesn’t make sense to use the same type of long-reach SerDes that would be appropriate to run across a backplane. Another advantage inside a package is that it is relatively cheap to have a lot of connections, compared to running through a package onto a board (for example, wide memory can have thousands of connections instead of trying to get all the data across in eight or nine lanes).
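A quick back-of-the-envelope sketch shows why that trade works; all the numbers below are illustrative assumptions, not figures from any particular product:

```python
# Back-of-the-envelope comparison (illustrative numbers only):
# a few fast off-package lanes vs. many slower in-package connections.

serdes_lanes = 8            # e.g. an 8-lane interface going through the package to a board
serdes_rate_gbps = 56       # assumed per-lane rate

parallel_wires = 1024       # wide in-package bus, HBM-style (assumed)
parallel_rate_gbps = 2      # much slower per wire (assumed)

print("off-package SerDes :", serdes_lanes * serdes_rate_gbps, "Gbps aggregate")
print("in-package parallel:", parallel_wires * parallel_rate_gbps, "Gbps aggregate")
```

Even at a far lower per-wire rate, the sheer number of cheap in-package connections wins on aggregate bandwidth, and at lower power per bit.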
As it happens, Cadence just announced the UltraLink D2D PHY IP and a test chip (or test chiplet) to demonstrate it in silicon. (See my post Die-to-Die Interconnect: The UltraLink D2D PHY IP.) It uses our 40Gbps SerDes. It has been designed to be very low power, and also to maximize connectivity across the edge of the chiplet (sometimes called the beachfront) without requiring expensive manufacturing processes for very tight bump pitches.
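To see why the beachfront matters, here is the rough arithmetic. Only the 40Gbps per-lane rate comes from the announcement; the lanes-per-millimeter and edge-length numbers below are placeholders just to show how edge bandwidth scales:

```python
# Rough beachfront-bandwidth arithmetic. Only the 40 Gbps lane rate is from the
# article; the lane density and edge length are assumed placeholders.

lane_rate_gbps = 40        # UltraLink D2D per-lane rate (from the article)
lanes_per_mm = 10          # lanes that fit per mm of die edge (assumed)
beachfront_mm = 5          # die edge devoted to the interface (assumed)

total_gbps = lane_rate_gbps * lanes_per_mm * beachfront_mm
print(f"{total_gbps} Gbps ({total_gbps / 1000:.1f} Tbps) across {beachfront_mm} mm of beachfront")
```

The more lanes you can pack per millimeter of edge without resorting to exotic packaging, the more total bandwidth a small chiplet can deliver to the main die.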
The dream of proponents of the chiplet approach is that a marketplace for known-good-die chiplets comes into existence, and so just like you can purchase HBM in the open market, you will be able to purchase a wide range of chiplets. Design becomes more like board-level system design, with purchased standard components, and perhaps a single SoC designed as the heart of the system.
I’m a bit skeptical that this will happen; the problems of inventory seem hard to deal with. When I was at VLSI Technology, we were always challenged by gate-array base inventory. The promise of a gate-array design is that the bases are all pre-diffused and held in a wafer bank. That worked fine for simple designs in very low volume, but it was a hard tradeoff. Any wafer sitting in the wafer bank is money tied up and depreciating (and, if a new process generation is coming up, perhaps becoming obsolete). On the other hand, the promise of gate-arrays was that the wafer bank would be available, so the turnaround time for an order would be short (in those days, just adding three layers of metal to the banked wafer). And that’s before you consider that we needed base wafers with various ratios of memory to gate fabric.
But the value proposition would be:
The first couple of bullets are the same for any system-in-package solution. The other three are highest if you can simply buy chiplets from a distributor, but they are also mostly true if the chiplets have to be manufactured especially for the particular system. The promise is that you can design systems like this: a 25.6Tbps switch built from 112G SerDes chiplets, as opposed to having to integrate all the SerDes interfaces onto the big core SoC itself.
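As a quick sanity check on the scale involved, here is the lane-count arithmetic, assuming the usual convention (not stated in the article) that a "112G" SerDes carries roughly 100Gbps of payload after encoding overhead:

```python
# Lane-count arithmetic for a 25.6 Tbps switch. The ~100 Gbps payload per
# "112G" lane is an assumed convention, not a figure from the article.

switch_tbps = 25.6
payload_per_lane_gbps = 100

lanes = switch_tbps * 1000 / payload_per_lane_gbps
print(f"{lanes:.0f} lanes of 112G SerDes")   # -> 256 lanes
```

That is 256 lanes of I/O that no longer have to crowd onto the edges of the core switch die.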
Thanks for the article. It reminded me of work we performed at SiliconPipe in 2003-2004. The solution we proposed at the time we called OTT (off the top, or over the top). High-speed signals were taken directly off the top of the package, originally in copper microstrip circuits, but we also suggested that photonic links should be possible. The lower-speed signals, power, and ground were handled by the substrate. There were many other innovations at SiliconPipe, and the IP was ultimately purchased by Samsung. It is a pleasure to see evidence that the ideas have finally been reduced to practice by others after all these years. Growing old has its benefits…