Chiplets: Where Are We Today?

The use of chiplet-based designs is expected to expand beyond high-performance parts to the broader market.

The 3rd annual Chiplet Summit was held January 21-23 at the Santa Clara Convention Center. The conference has continued to grow since its 1st year, when it was held at the San Jose Doubletree Hotel almost exactly two years ago. During his Chairman’s Welcome presentation, Chuck Sobey mentioned that there were 41 exhibitors at this year’s conference.

Chuck was also the moderator for the opening plenary session Chiplets: Where We Are Today. The speakers were Jim Handy of Objective Analysis and Jawad Nasrullah from Palo Alto Electron.

Jim Handy’s presentation was titled “The Chiplet Market Today and Where We’re Headed.” People are familiar with Moore’s Law, the observation about the scaling of semiconductor integrated circuits that Gordon Moore made in his 1965 paper, “Cramming More Components onto Integrated Circuits” [1]. A lesser-known statement in that very same paper claims, “It may prove to be more economical to build large systems out of smaller functions, which are separately packaged and interconnected.” So, the creator of Moore’s Law saw the possibility of a future with chiplet-based designs 60 years ago.

Gordon Moore revisited his paper in 1975 [2] and wrote that what allowed the number of transistors to continually increase was process, die size, and cleverness. Jim mentioned that one of Gordon Moore’s projections was wafer sizes going to 57 inches in diameter. In an interview with Moira Gunn at the 2007 Intel Developer Forum (IDF) [3], Gordon Moore stated that he used it as an example of how extrapolating exponentials can sometimes give crazy answers. He also said that he never thought at the time that we would even see 300 mm wafers. There was an industry push about a dozen years ago to move to 450 mm foundries, but with the already increasing costs of building state-of-the-art fabs, the projected time to recover the crossover costs, plus the shrinking number of companies participating in leading-edge fabrication, the incentive to move to 450 mm wafers evaporated. Jim also mentioned that EUV machines are now selling for over $100M each and that we are probably stuck with ~800 mm² (roughly 26 mm x 33 mm) die sizes for the foreseeable future. This is proving to be an important constraint when building large systems.
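
To put rough numbers on why that reticle ceiling matters, here is a minimal sketch. The 26 mm x 33 mm field is the standard single-exposure reticle size; the defect density in the yield model is an illustrative assumption, not a figure from the talk.

```python
import math

# The 26 mm x 33 mm reticle field sets the ~800 mm^2 ceiling Jim referenced
# (a full field is 858 mm^2).
reticle_w_mm, reticle_h_mm = 26, 33
print("Reticle-limited die area:", reticle_w_mm * reticle_h_mm, "mm^2")  # 858 mm^2

def poisson_yield(die_area_mm2, defects_per_cm2=0.1):
    """Simple Poisson yield model: yield falls off exponentially with die area."""
    return math.exp(-defects_per_cm2 * die_area_mm2 / 100.0)

# One near-reticle-limit die vs. the same silicon split into four chiplets.
print(f"Yield of one 800 mm^2 die:      {poisson_yield(800):.0%}")  # ~45%
print(f"Yield of each 200 mm^2 chiplet: {poisson_yield(200):.0%}")  # ~82%
```

The bigger the die, the worse it yields, which is one reason near-reticle-limit designs are being split into chiplets.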

Jim mentioned AMD/Xilinx Versal chiplets with HBM and noted that rectangular chiplets have been used for over a decade. I introduced Xilinx’s CTO, Ivo Bolsens, at Mentor Graphics U2U in April of 2011, where he gave a keynote address and presented Xilinx’s 2.5D FPGA implementation using an interposer to stitch together FPGA slices. This enabled Xilinx to get large FPGAs to market sooner and with better yields, and, as processes matured, to shift production toward large monolithic chips without the additional packaging requirements. Figure 1 shows a number of AMD/Xilinx parts using multiple die since then.

Fig. 1: AMD/Xilinx chiplet-based parts.

Economics really drives the adoption of any new technology. Jim said that the huge demand for computing for AI applications has HBM selling for ~6x the price of DDR5, and production is sold out through 2025 in a fast-growing market. He also noted that Micron’s Hybrid Memory Cube (HMC) was a predecessor to HBM and pointed to the use of HBM in many of today’s high-performance parts, such as Nvidia’s Grace Ampere, Intel’s Ponte Vecchio and Gaudi, Sambanova’s AI Processor, and Juniper’s Express Silicon, among others.

The high-end parts mentioned above all sell for $10K or more. So, what about the low end? Clearly, chiplets are here now. The question is whether an ecosystem can emerge that allows more players to participate in chiplet-based designs beyond only the inclusion of HBM. Jim presented an example of a very low-end application of heterogeneous integration in a Humane Society pen, leaving a really big gap in the application space to be filled over time.

Fig. 2: Low-cost heterogeneous integration in a pen.

Another economic factor favoring a chiplet approach is that all the circuitry on an SoC doesn’t scale at the same rate as technology nodes advance. SRAM and analog circuitry are not scaling with the logic, and therefore, the cost of these components greatly increases in more advanced nodes. Figure 3 below shows a representation of how SRAM isn’t scaling with the logic anymore. The red line represents SRAM and logic scaling in sync, and the black line is closer to reality.

Fig. 3: SRAM isn’t scaling with logic.
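
To make the SRAM-scaling economics concrete, here is an illustrative cost sketch. The wafer costs, shrink factors, and die areas below are made-up numbers for the example, not figures from Jim’s presentation.

```python
# Made-up wafer costs and shrink factors, purely to illustrate the trend.
WAFER_COST = {"N7": 10_000, "N5": 17_000}   # $/wafer (illustrative)
USABLE_MM2_PER_WAFER = 60_000               # rough usable silicon per 300 mm wafer

def cost_per_mm2(node):
    return WAFER_COST[node] / USABLE_MM2_PER_WAFER

# Hypothetical SoC at N7: 60 mm^2 of logic plus 40 mm^2 of SRAM.
logic_n7, sram_n7 = 60.0, 40.0

# Ported to N5: logic shrinks well (~0.55x), SRAM barely shrinks (~0.85x).
logic_n5, sram_n5 = logic_n7 * 0.55, sram_n7 * 0.85

monolithic_n5 = (logic_n5 + sram_n5) * cost_per_mm2("N5")
split_design  = logic_n5 * cost_per_mm2("N5") + sram_n7 * cost_per_mm2("N7")

print(f"Monolithic N5 die:          ${monolithic_n5:.2f} of silicon")
print(f"N5 logic + N7 SRAM chiplet: ${split_design:.2f} of silicon (plus packaging cost)")
```

Because the SRAM barely shrinks, paying advanced-node prices for it buys little, which is what pushes designers to leave it on an older node as a separate chiplet.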

With the introduction of hybrid bonding, there’s a lot of connectivity available between chips. Even if the SRAM was implemented in the same node as the logic, there are already connectivity benefits of going up instead of out. AMD has implemented processors in TSMC 5nm and then placed TSMC 7nm SRAM cache on top of the processor chips, giving them an advantage in both cost and connectivity.
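
As a rough sense of what “a lot of connectivity” means, the sketch below compares connection density at a typical microbump pitch versus a hybrid-bond pitch. Both pitch values are ballpark public figures, not numbers from the presentation.

```python
def pads_per_mm2(pitch_um):
    """Approximate face-to-face connections per mm^2 on a square grid."""
    return (1000.0 / pitch_um) ** 2

microbump_pitch_um = 45    # typical 2.5D microbump pitch (assumption)
hybrid_bond_pitch_um = 9   # copper hybrid-bonding pitch (assumption)

print(f"Microbumps:     ~{pads_per_mm2(microbump_pitch_um):,.0f} connections/mm^2")
print(f"Hybrid bonding: ~{pads_per_mm2(hybrid_bond_pitch_um):,.0f} connections/mm^2")
```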

Jim believes that the market and technology are coming together to support a forecast of strong growth in chiplet revenue over the next five years, as shown below in Figure 4.

Fig. 4: Jim Handy’s chiplet revenue forecast through 2030.

Jim expects hyperscalers to cut back on spending from recent highs of around 17% of revenue to roughly the historical 12% CapEx-to-revenue level, but chiplets will move into lower-end applications and chiplet revenue will continue to grow.

Jawad Nasrullah’s presentation titled “A Lot of Chip Designing to Do: Chiplets to Systems” covered the use of chiplets for AI and the progression for building a “chiplet chassis.”

Figure 5 below shows a diagram representative of Nvidia’s Grace Blackwell, a part that includes two reticle-limited die and eight 8-high HBM3e stacks of 24 GB each. The two reticle-limited die operate as one unified CUDA GPU. This is an example of chiplets enabling the creation of systems that are effectively larger than the limiting reticle size.

Fig. 5: Diagram representative of NVIDIA’s Grace Blackwell.
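
The memory configuration described above works out to a large pool of HBM per package. In the quick arithmetic below, only the stack count, stack height, and 24 GB per stack come from the description; the per-layer capacity is simply derived from them.

```python
# Quick arithmetic on the HBM3e configuration described above.
stacks = 8
gb_per_stack = 24
layers_per_stack = 8

total_hbm_gb = stacks * gb_per_stack             # 192 GB on the package
gb_per_layer = gb_per_stack / layers_per_stack   # 3 GB (24 Gb) per DRAM layer

print(f"Total HBM3e on package:  {total_hbm_gb} GB")
print(f"Capacity per DRAM layer: {gb_per_layer:.0f} GB")
```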

There’s also a push toward liquid-cooled systems, as we looked at in Liquid Cooling, Meeting The Demands Of AI Data Centers. Figure 6 shows the comparison between an air-cooled 32-GPU system that can handle 100B-parameter models and a water-cooled 72-GPU system that can handle trillion-parameter models.

Jawad said he never thought that he would see > 400 W chips in his lifetime. The increased cooling capability enables a higher density of computing, with power going from ~50 kW per rack to ~120 kW per rack and higher. Chiplets enable higher-performance parts, which in turn require cooling systems that can efficiently remove more heat.

Fig. 6: Air-cooled vs. liquid-cooled systems.

Presently, most chiplet designs are done in-house by large companies. Parts like Nvidia’s Grace Blackwell open the possibility of designing one chiplet and creating many variants. Grace Blackwell uses TSMC’s CoWoS-L to connect its chiplets. AMD is also shipping variations based on its chiplet designs.

Jawad said that a chiplet chassis could be offered as a backend service, creating chiplet sockets. Figure 7 below shows a flow diagram to aid in deciding on a best-fit technology for an application. The chiplet chassis is composed of a base die mounted to an interposer sitting on a substrate, with the interposer also connected to HBM memories. This setup would enable 3rd parties to design application-specific XPUs that plug into the chassis, maximizing reuse of the HBM connectivity and packaging and letting the 3rd-party chiplet designer focus their resources on the XPU portion of the design. Along these lines, Arm also announced its Chiplet System Architecture to help accelerate the creation of a chiplet ecosystem.

Fig. 7: Technology selection, chiplet chassis vs. all custom.
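
Figure 7’s actual decision flow isn’t reproduced here, but a hypothetical sketch of the trade-off it captures might look like the following. The criteria, thresholds, and field names are entirely illustrative and are not Jawad’s flow.

```python
from dataclasses import dataclass

@dataclass
class Project:
    needs_hbm: bool        # does the XPU need HBM-class bandwidth?
    volume_units: int      # expected production volume
    packaging_team: bool   # in-house 2.5D/3D packaging expertise?

def choose_approach(p: Project) -> str:
    """Hypothetical chassis-vs-custom decision; illustrative only."""
    if p.needs_hbm and not p.packaging_team:
        # Reuse the pre-validated base die / interposer / HBM chassis and
        # spend the design budget on the XPU chiplet itself.
        return "chiplet chassis"
    if p.volume_units > 1_000_000 and p.packaging_team:
        # At very high volume, a fully custom package may amortize its NRE.
        return "all custom"
    return "chiplet chassis"

print(choose_approach(Project(needs_hbm=True, volume_units=50_000, packaging_team=False)))
```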

Jawad claimed 30% savings from switching to liquid cooling over air cooling, and perhaps even more savings from using large wafer-scale interposers. Figure 8 shows wafer-scale interposers in light grey, populated by dark grey chiplets, and green substrate assemblies that could be stacked to implement a large computer system at ~100 kW per box.

Fig. 8: Even higher compute density in a box.

Jawad indicated that he believed there were huge benefits to be had in reducing supply voltage levels. He said that while some designs are happening around 0.6 V, by and large, high-performance supply voltages have been stuck at about 1 V. For 1 kW+ parts, the resulting 1,000 Amperes is not easy to handle. He encouraged people to go after the transistor and reduce Vdd.
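
The arithmetic behind that point can be sketched with a first-order dynamic-power model. The 1 kW and 0.6 V figures come from the discussion above; the assumption that frequency and switched capacitance stay constant is mine.

```python
def scale_dynamic_power(p_watts, v_old, v_new):
    """First-order CMOS dynamic power: P ~ C * V^2 * f, with C and f held constant."""
    return p_watts * (v_new / v_old) ** 2

p_1v0 = 1000.0                                # a 1 kW part at Vdd = 1.0 V
i_1v0 = p_1v0 / 1.0                           # ~1000 A into the package

p_0v6 = scale_dynamic_power(p_1v0, 1.0, 0.6)  # ~360 W for the same switching activity
i_0v6 = p_0v6 / 0.6                           # ~600 A

print(f"Vdd = 1.0 V: {p_1v0:.0f} W, {i_1v0:.0f} A")
print(f"Vdd = 0.6 V: {p_0v6:.0f} W, {i_0v6:.0f} A")
```

Under these simplifying assumptions, a lower Vdd cuts both the power to be cooled and the current the package must deliver.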

References

  1. Gordon E. Moore, “Cramming More Components onto Integrated Circuits,” Electronics, pp. 114–117, April 19, 1965.
  2. G. E. Moore, “Progress in Digital Integrated Electronics,” Technical Digest, 1975 International Electron Devices Meeting, IEEE, 1975, pp. 11–13.
  3. “Fireside Chat with Gordon Moore, Live at IDF: Part 2,” Connected Social Media, September 19, 2007. https://connectedsocialmedia.com/1017/fireside-chat-with-gordon-moore-live-at-idf-part-2/

