One Chip Vs. Many Chiplets

Challenges and options vary widely depending on markets, workloads, and economics.


Experts at the Table: Semiconductor Engineering sat down to discuss the growing list of challenges at advanced nodes and in advanced packages, with Jamie Schaeffer, vice president of product management at GlobalFoundries; Dechao Guo, director of advanced logic technology R&D at IBM; Dave Thompson, vice president at Intel; Mustafa Badaroglu, principal engineer at Qualcomm; and Thomas Ponnuswamy, managing director at Lam Research. This discussion was held in front of a live audience at SEMICON West. To view part one, click here.


L-R: GlobalFoundries’ Schaeffer; IBM’s Guo; Intel’s Thompson; Qualcomm’s Badaroglu; Lam Research’s Ponnuswamy. Photo credit: Heidi Hoffman/SEMI

SE: Roughly 45% of the design starts for advanced chips today are done by large systems companies. They have very deep pockets, allowing them to develop customized chips and chiplets to better manage heat, reduce area, and process specific data types faster than off-the-shelf processors. But the economics are very different for a Qualcomm chip going into a smartphone or a car. So how does that affect whether you go with planar SoCs or some type of advanced package?

Badaroglu: If you look at a data center, the cost of ownership formula is very different. If you introduce an advanced substrate, that can reduce your picojoules per bit. So at a system scale, you pay a little bit more for the unit component, but you get a significant cost of ownership reduction. On the consumer side, you need to sell one chip to one consumer. The economies of scale are based on how you commoditize memory talking to advanced logic, and then improve some packaging technologies. For instance, rather than reducing the amount of silicon, you may introduce better I/O bandwidth, and at the end you provide the same value to the consumer. In the mobile space, packaging and thermal are the most important. In the data center, it's mostly advanced packaging and improving the system memory bandwidth. There, new materials are becoming a compelling solution, particularly for handling thermally limited performance metrics.
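As a rough, back-of-the-envelope illustration of that cost-of-ownership argument (the numbers below are assumptions, not figures cited by the panel), lowering the picojoules per bit of chip-to-chip I/O translates directly into energy cost at the system level:

```python
# Illustrative only: rough annual energy-cost impact of lowering interconnect
# energy per bit in a data center. All numbers below are assumptions for the
# sake of the arithmetic, not figures cited by the panel.

pj_per_bit_baseline = 5.0      # pJ/bit over a conventional substrate (assumed)
pj_per_bit_advanced = 2.0      # pJ/bit with an advanced substrate (assumed)
bits_per_second = 100e12       # 100 Tb/s of sustained chip-to-chip traffic (assumed)
electricity_cost = 0.10        # $ per kWh (assumed)
seconds_per_year = 3600 * 24 * 365

def annual_io_energy_cost(pj_per_bit):
    joules_per_year = pj_per_bit * 1e-12 * bits_per_second * seconds_per_year
    kwh_per_year = joules_per_year / 3.6e6
    return kwh_per_year * electricity_cost

saving = (annual_io_energy_cost(pj_per_bit_baseline)
          - annual_io_energy_cost(pj_per_bit_advanced))
print(f"Annual I/O energy cost saving per system: ${saving:,.0f}")
```

Multiply that per-system figure by cooling overhead (PUE) and by the number of systems in a rack or hall, and a modest per-unit premium for the advanced substrate can pay for itself.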

Thompson: The economics have changed dramatically. If you look at data centers today, most of these are zoned in a finite power input envelope. The economics are fundamentally very different when we’re delivering these efficiency benefits, because we’re opening up revenue capacity for the hyperscalers.

SE: There are a lot of challenges and options to sort through. There are new mechanical stresses, and there are new issues such as dishing with CMP at advanced nodes. How significant are these?

Ponnuswamy: Warpage has become a huge issue. The root cause is that we have dissimilar materials in the packages. Different CTEs cause stress, which results in warpage. As an equipment supplier, we have to compensate for the stress in the package. We call it shape management. Essentially, we deposit a film on the backside that compensates for the stress experienced in the package. Another issue that we see involves thin films. For example, with backside power delivery, we're going down to really thin dimensions, less than 0.5µm. In that scenario, yield at the edge becomes critical. And how do you handle substrates that are very thin? There are technologies we're using to better position the wafer so the edge field is protected. And with TSVs, you have to engineer the copper in those TSVs to mitigate the stress.
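The shape-management idea, depositing a stressed backside film to cancel bow coming from the frontside stack, can be sketched with the classic Stoney approximation. The film stress, thickness, and bow values below are illustrative assumptions, not Lam process data:

```python
# Minimal sketch of backside "shape management": a stressed film on the wafer
# backside adds curvature of the opposite sign to cancel bow from the frontside
# stack. Uses the classic Stoney approximation with assumed values.

E_si = 130e9        # Si Young's modulus, Pa (approximate)
nu_si = 0.28        # Poisson's ratio of Si (approximate)
t_sub = 775e-6      # 300mm wafer thickness, m
diameter = 0.3      # wafer diameter, m

def bow_from_film(stress_pa, film_thickness_m):
    """Wafer bow (m) induced by a uniform stressed film, via Stoney's equation."""
    curvature = 6 * stress_pa * film_thickness_m * (1 - nu_si) / (E_si * t_sub**2)
    return curvature * diameter**2 / 8

# Suppose the frontside stack produces ~150 µm of bow (assumed). Solve for the
# stress a 1 µm backside film would need to cancel it:
frontside_bow = 150e-6
backside_film_thickness = 1e-6
required_stress = (frontside_bow * 8 * E_si * t_sub**2
                   / (6 * backside_film_thickness * (1 - nu_si) * diameter**2))
print(f"Backside film stress needed: {required_stress/1e6:.0f} MPa "
      f"(opposite sign to the frontside stress)")
```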

Thompson: It boils down to lots of hard work and lots of engineering. You can play a lot of tricks with backside stress management, treating the bevel of the wafer, but these are trials and tribulations that lead to many sleepless nights. There is a gamut of different technologies that are deployed to make something like this work. I will add one comment, though. When talking about layer transfers of, let's say, a (110) material, or doing something on the backside, whenever you're bringing wafers together the focus on defectivity becomes huge. A 1µm adder potentially will translate into square millimeters of bad bonding surface as you bring these together. A lot of these technologies require significantly more work to be ready for prime time.
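Thompson's point about a 1µm adder turning into square millimeters of bad bonding surface can be sanity-checked with the standard elastic estimate for the unbonded area around a particle trapped between two bonded wafers (after Tong and Gösele). All material and bond-energy values below are assumed, and the result should be read as an order-of-magnitude illustration only:

```python
# Rough sanity check of "a 1 µm adder translates into square millimeters of bad
# bonding surface," using the classic elastic estimate for the unbonded radius
# around a particle between two bonded wafers. Values are assumed.
import math

E = 130e9          # Si Young's modulus, Pa (approximate)
nu = 0.28          # Poisson's ratio (approximate)
E_prime = E / (1 - nu**2)
t_wafer = 725e-6   # thickness of each wafer, m (assumed; thinned stacks differ)
h = 1e-6           # particle height, i.e. the "1 µm adder"
gamma = 0.1        # bond adhesion energy, J/m^2 (typical pre-anneal value, assumed)

# Unbonded radius R around the particle, valid when R is much larger than t_wafer
R = (2 * E_prime * t_wafer**3 * h**2 / (3 * gamma)) ** 0.25
void_area_mm2 = math.pi * R**2 * 1e6
print(f"Unbonded radius ~ {R*1e3:.1f} mm, void area ~ {void_area_mm2:.0f} mm^2")
```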

Schaeffer: One area where we have mastered this concern is the CMOS energy space. With backside power delivery, they have to manage the warpage to get the bonding right in order to not degrade the overall yield of the system. When you’re processing the logic die, you have to be aware of what you’re doing on the backside of the wafer throughout the processing and what impact it has.

Guo: IBM used to manufacture DRAM. The compensation techniques developed there, like using dielectrics to compensate for warpage, probably could still be useful for backside power delivery to streamline work, or even for stacking transistors. That also could be an opportunity to avoid non-uniform structures or to compensate for them. Another option is die-to-die overlay control, which used to be considered unmanageable. So in addition to process technology, you have the capability to do corrections in dielectrics, which is actually being utilized today. There are a lot of opportunities out there.

Badaroglu: For the design process, we're seeing a convergence of PDKs and ADKs (assembly design kits) in order to make stress a deterministic problem rather than a random phenomenon. But to make it really useful, there needs to be more collaboration for that convergence. That has been applied in memory for fault-tolerant design. If you eliminate a lot of the shared layout structure and create independent structures, repair and redundancy lanes can recover multiple dppm in the design, but some discipline is necessary in the design process to accommodate that.
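A minimal sketch of why repair and redundancy pay off in that kind of fault-tolerant design: if the lanes of a wide parallel interface fail independently and rarely, a handful of spare lanes that can be mapped in after test moves yield from marginal to near-perfect. The lane counts and failure rate below are invented for illustration:

```python
# Minimal sketch of lane repair and redundancy: the interface yields as long as
# no more lanes fail than there are spares. Lane counts and per-lane failure
# probability are assumptions for illustration.

from math import comb

def yield_with_spares(n_active, n_spare, p_lane_fail):
    """Probability that at most n_spare of the (n_active + n_spare) lanes fail."""
    n_total = n_active + n_spare
    return sum(comb(n_total, k) * p_lane_fail**k * (1 - p_lane_fail)**(n_total - k)
               for k in range(n_spare + 1))

p_fail = 1e-3          # per-lane failure probability (assumed)
for spares in (0, 2, 4):
    y = yield_with_spares(1024, spares, p_fail)
    print(f"{spares} spare lanes -> interface yield {y:.4%}")
```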

SE: Is a full 3D-IC realistic, or is it just too hard to deal with the thermal issues?

Thompson: Those are solvable issues, but they require a recalibration in the types of materials we're bringing in. You can put a copper chimney into your interconnects to deliberately pull heat out of the layer. That's a common trick. But as we look at backside power, we need to engineer other materials to dissipate the heat and transfer it beyond the wiring, which could give you substantial benefits for heat dissipation. As you look at your ability to take heat out, it can help you de-throttle performance. It's not just a heat transfer issue. It also helps you unlock performance.
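The copper-chimney trick comes down to the large gap in thermal conductivity between copper and the surrounding dielectric. A one-dimensional conduction comparison, with assumed geometry and material values, shows the scale of the effect:

```python
# Rough illustration of the "copper chimney" point: a copper column through a
# dielectric stack conducts heat far better than the dielectric it replaces.
# One-dimensional conduction only; geometry and materials are assumed.

k_cu = 400.0        # W/(m*K), copper
k_ild = 1.4         # W/(m*K), SiO2-like inter-layer dielectric (low-k is lower)

def thermal_resistance(thickness_m, area_m2, k):
    """1-D conduction resistance, K/W."""
    return thickness_m / (k * area_m2)

stack_height = 5e-6              # interconnect stack thickness (assumed)
chimney_area = (2e-6) ** 2       # 2 um x 2 um copper column (assumed)

r_dielectric = thermal_resistance(stack_height, chimney_area, k_ild)
r_copper = thermal_resistance(stack_height, chimney_area, k_cu)
print(f"Dielectric path: {r_dielectric:,.0f} K/W, copper chimney: {r_copper:,.0f} K/W "
      f"(~{r_dielectric / r_copper:.0f}x lower resistance)")
```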

Badaroglu: They help each other. 3D technology brings in higher bandwidth, meaning it is highly parallel, which allows you to reduce VDD and enables low-frequency operation. The peak-to-average ratio of heat dissipation improves. You have more uniform power, and that creates an opportunity for 3D-IC integration with highly parallel structures, even though heat remains a constraint.
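The power side of that argument is the familiar CV²f relationship: if extra bandwidth lets you spread the same throughput across more units at lower frequency, you can also drop the supply voltage, and the quadratic dependence on VDD does most of the work. The operating points below are assumptions chosen only to show the shape of the trade-off:

```python
# Back-of-the-envelope version of the parallelism argument: two units at half
# the frequency and a lower Vdd deliver the same throughput at lower power,
# because dynamic power scales roughly as C * Vdd^2 * f. Values are assumed.

def dynamic_power(c_eff, vdd, freq):
    """Simplified switching power, P ~ C_eff * Vdd^2 * f."""
    return c_eff * vdd**2 * freq

c_unit, vdd_hi, f_hi = 1.0, 0.9, 3.0e9        # one unit running fast (assumed)
p_serial = dynamic_power(c_unit, vdd_hi, f_hi)

# Same throughput from two units at half the frequency, which tolerates a lower Vdd
vdd_lo, f_lo = 0.7, 1.5e9                      # assumed voltage/frequency pair
p_parallel = 2 * dynamic_power(c_unit, vdd_lo, f_lo)

print(f"Serial: {p_serial:.2e} (a.u.), parallel: {p_parallel:.2e} (a.u.), "
      f"ratio {p_parallel / p_serial:.2f}")
```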

Schaeffer: With FD-SOI substrates, thermal dissipation has been a manageable problem. At 20nm, the thermal conductivity has been sufficient for the applications that require it. But there are different transistors being used for different applications. FD-SOI is actually the lowest-power option for RF applications. So if you're optimizing the PAE (power-added efficiency) in an RF PA, an FD-SOI transistor is optimal for that. For digital applications, a finFET or gate-all-around FET offers the most efficient operation. And then there's the system level, where we can manage the power aspects of the chips.

SE: There’s also water cooling, which IBM has been using for about seven decades. But where are we now? Are these giant pipes effective enough, or are we getting to the point where we’re going to see things like micro-fluidics or immersion?

Guo: It depends on the application. For IBM mainframes, water cooling is the chief mode for a package, but you cannot use that in your pocket or on a mobile phone or iPad. In those cases, the logic on a chip benefits more from lower power than it does in high-performance applications. But for high performance, the operating voltage and the operating power are at a different level, so the heat map may be very different from a mobile application. However, the material and process innovation for the heat dissipation layer is applicable to both, and the innovations we're talking about are applicable to both. System technology co-optimization is not only for power/performance, but also for thermal distribution. So backside power in mobile is used for density and cost, while in a high-performance application you're really looking for performance, not so much the cost or density. But they have a common interest in reducing heat.

Badaroglu: We apply several techniques. For AI, there are a lot of thermal sensors distributed in the die. When you detect a thermal issue, you divert the workload somewhere else. So you have an area with processors, and you use DVFS (dynamic voltage and frequency scaling), which is a standard technique, versus going parallel and reducing frequency to spread the same workload across multiple processors. In addition, you define some thermal envelopes. You have lookup tables, and then based on the thermal envelope, you know that your GPU, for example, is consuming less. Rather than simply giving more of that power envelope to the CPU, everything is defined by the TDP (thermal design power), which determines the thermal budget available to do work, as well as the cooling feedback that allows the processing. So at a low level, you are aware of what your thermal surroundings are at the system level, and you are able to throttle the system down and up quite frequently and dynamically.
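A minimal sketch of that kind of thermal-aware control loop, with an invented DVFS lookup table, TDP budget, and temperature threshold (none of these values come from Qualcomm), might look like this:

```python
# Minimal sketch of thermal-aware DVFS: distributed thermal sensors feed a
# controller that splits a fixed TDP budget between CPU and GPU and picks an
# operating point from a lookup table. All values are invented for illustration.

TDP_WATTS = 8.0   # total package power budget (assumed)

# Hypothetical DVFS lookup table: (frequency GHz, voltage V, power W) per step
DVFS_TABLE = [
    (3.0, 0.90, 5.0),
    (2.4, 0.80, 3.2),
    (1.8, 0.72, 2.0),
    (1.2, 0.65, 1.2),
]

def pick_operating_point(power_budget_w):
    """Highest-frequency table entry that fits the allotted power budget."""
    for freq, vdd, power in DVFS_TABLE:
        if power <= power_budget_w:
            return freq, vdd, power
    return DVFS_TABLE[-1]

def rebalance(cpu_temp_c, gpu_power_w):
    """Give the CPU whatever headroom the GPU is not using, then throttle on temperature."""
    cpu_budget = TDP_WATTS - gpu_power_w
    if cpu_temp_c > 95:                      # hot spot seen by a die sensor (assumed threshold)
        cpu_budget = min(cpu_budget, 2.0)    # clamp until the hot spot cools
    return pick_operating_point(cpu_budget)

print(rebalance(cpu_temp_c=80, gpu_power_w=3.0))   # GPU busy -> CPU gets ~5 W
print(rebalance(cpu_temp_c=98, gpu_power_w=1.0))   # hot spot -> CPU clamped to the 2 W step
```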

SE: Regardless of whether you’re manufacturing monolithic chips or systems-in-package, the processes themselves add heat and can cause damage. How much of a problem is this?

Ponnuswamy: You’re dealing with a range of applications with different requirements, so optimizing your processes for each, whether it’s on the front end of line or the package, the front end of line is very high temperatures. And when you go to packaging, for instance with HBM, we’ve got to develop processes that happen at much, much lower temperatures, particularly for hybrid bonding for DRAM packages. This certainly is a challenge. We are constantly looking at engineering solutions. What is it that we can bring to the table, from a materials perspective, to address all of these? We have to work very closely with our customers. And this is the basic theme, right? Collaboration is critical. It will require a collaborative mindset to solve these issues.




Dr. Dev Gupta says:

Ponnuswamy from Lam said something intelligent about warpage control by selective deposition of films. This was used at Intel as far back as 2001 on organic substrates to build server (Xeon) MCMs with flip-chip-bonded bare dies of both the processor and the DRAM around it. Warpage was caused by dies of two different sizes (a large 20mm-square processor and smaller 8mm-square DRAM dies) imposing unequal rigidity on the substrate, which warped to relieve residual stress; that in turn affected co-planarity and assembly yields. Getting it right almost the first time required a lot of advanced math to co-optimize both the thermo-mechanical stresses (warpage) and the electrical performance.
