Novel Assembly Approaches For 3D Device Stacks

ECTC progress report on enabling technologies, including cooling chiplets, 1µm hybrid bonding, RDL buildups, and co-packaged optics

The next big leap in semiconductor packaging will require a slew of new technologies, processes, and materials, but collectively they will enable orders of magnitude improvement in performance that will be essential for the AI age.

While not all of these issues are fully solved, the recent Electronic Components and Technology Conference (ECTC) provided a glimpse into the huge leaps in progress made in the couple of years since the rollout of ChatGPT shocked the tech world into overdrive. AMD, TSMC, Samsung, Intel, and many equipment suppliers detailed improvements in hybrid bonding, glass core substrates, microchannel and direct liquid cooling, and heat removal in backside power schemes.

“The way AI is transforming the supercomputer/high-performance compute space is amazing,” said Sam Naffziger, senior vice president and corporate fellow at AMD, in a presentation on AI compute. “The ChatGPTs and Geminis have ingested the entire universe of internet data to train their models, but high-quality textual data has been completely consumed. The way AI is getting smarter is through approaches called post-training and test-time compute, or chain-of-thought reasoning, and this is where models check each other, generate synthetic data and iterate on a response, and produce a much more thoughtful outcome. And though every increment of intelligence is hugely valuable, it takes two or three orders of magnitude more compute for a linear return in intelligence. So the demand for compute is going to continue to escalate, and what that’s done is drive down cost, and this is what our industry is super great at. We refine our manufacturing processes. We generate higher and higher volumes, get the yields up, and costs go down. As this trend continues, innovation in chipmaking — and packaging in particular — will play central roles.”

Among the major developments detailed at ECTC:

  • Intel’s hybrid bonding down to 1µm pitch;
  • TSMC’s direct cooling of CoWoS, including 4 SoCs and 6 HBMs;
  • ITRI/Brewer Science’s 10-layer RDL with polymer/copper hybrid bonding;
  • Georgia Tech’s chiplet-as-coolant with liquid cooling through TSV/silicon columns;
  • Corning/Fraunhofer IZM’s glass waveguides for optical transceivers;
  • Samsung’s copper-based heat block for mobile processors and DRAM, and
  • Imec’s 3D multi-chip simulation of heat flux with hot spots.

Liquid cooling for hot chips
Liquid cooling at the chip level is beginning to take shape as forced air cooling reaches its limit. “We’re managing to cool up to 1,000-watt devices with high-speed fans, and fan power consumes approximately 20% of the server rack budgets while I²R losses account for 10% to 20%,” said Naffziger. “So now we’ve got 40% of our power just for delivering current and extracting heat. That’s obviously not the way to build an efficient compute system. That’s what’s driving the relentless move to direct liquid cooling, which has some power overhead from pumps and condensers, but dramatically less than high-speed fans with huge heat sinks.”
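
To make that budget arithmetic concrete, here is a minimal sketch using the fractions quoted above; the 100 kW rack envelope is a hypothetical value chosen only for illustration.

```python
# Illustrative rack power budget based on the fractions quoted above.
# The 100 kW rack envelope is a hypothetical value, not from the talk.

RACK_POWER_KW = 100.0
FAN_FRACTION = 0.20      # fan power, per the quote
I2R_FRACTION = 0.20      # upper end of the quoted 10%-20% delivery loss

overhead_kw = RACK_POWER_KW * (FAN_FRACTION + I2R_FRACTION)
compute_kw = RACK_POWER_KW - overhead_kw

print(f"Cooling + delivery overhead: {overhead_kw:.0f} kW")  # 40 kW
print(f"Left for compute:            {compute_kw:.0f} kW")   # 60 kW
```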

At the conference, TSMC’s Yu-Jen Lien described a liquid cooling architecture, called the silicon-integrated micro-cooler (IMEC-Si), which is being tested for reliability using a 1.6X reticle-sized test vehicle on an organic interposer (CoWoS-R). Designed to mimic a 4-SoC, 8-HBM package, the cooler can dissipate more than 3,000 watts of uniform power using 40°C water flowing at 10 liters/min. This liquid cooling approach provides superior cooling (up to 2.5 W/mm² power density) relative to indirect schemes that use a liquid cold plate with thermal interface materials, according to Lien.


Fig. 1: CoWoS with direct liquid cooling using 10 L/min water (below) dissipates more heat than CoWoS with TIMs, lid, and cold plate configuration. Source: IEEE ECTC [1]
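
As a rough sanity check on those numbers (not from the paper), the bulk temperature rise of water carrying away 3,000 W at 10 liters/min can be estimated from the heat capacity of water:

```python
# Bulk coolant temperature rise for ~3,000 W removed by 10 L/min of water.
# Steady-state estimate with constant water properties; it ignores pressure
# drop, film resistance, and local hot spots, so it only bounds the water side.

Q_WATTS = 3000.0
FLOW_L_PER_MIN = 10.0
RHO_KG_PER_L = 1.0        # water density, ~1 kg per liter
CP_J_PER_KG_K = 4180.0    # specific heat of water

m_dot_kg_per_s = FLOW_L_PER_MIN / 60.0 * RHO_KG_PER_L
delta_t_k = Q_WATTS / (m_dot_kg_per_s * CP_J_PER_KG_K)

print(f"Coolant mass flow:    {m_dot_kg_per_s:.3f} kg/s")
print(f"Bulk water temp rise: {delta_t_k:.1f} K")   # roughly 4 K
```

In other words, the water itself warms only a few degrees at this flow rate; the harder problem is the conduction and convection path between the hot spots and the coolant.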

TSMC’s assembly flow applies a protective layer to cover the copper pillar arrays on the SoCs’ backsides. The assembly is flipped onto a carrier wafer, followed by C4 bumping. After flipping and protective layer removal, the conventional CoWoS flow is followed by elastomer sealant dispensing around the perimeter of the SoCs. The sealant minimizes warpage and seals the chip-to-lid area. “After reflow, a manifold with a single inlet and outlet designed for uniform flow distribution across multiple cooling compartments was assembled onto the integrated system.” [1]

TSMC’s 3.3X reticle test vehicle with 4 SoCs and 6 HBM chips experienced a warpage range of 160-190µm, which caused changes in the flow rate and profile between the lid and the SoC chips. The package passed helium leak testing and early reliability tests.

The need for direct chip cooling is so pressing that Georgia Tech is proposing a novel concept, the chiplet as coolant. “Imagine we design chiplets that become part of the off-the-shelf, open-source community, that are designed with different cooling capabilities, let’s say with different diameters, pitches, but also with different TSV designs,” said Muhannad Bakir, director of the 3D Package Integration Center at Georgia Institute of Technology. “We can build unique TSV structures, unique cooling structures, as well as unique other functions into that structure to help both thermal and power delivery. And so it really just becomes a hybrid bonded solution in the stack.” Bakir’s group demonstrated microfin pin heat sinks made of silicon with 5nm TSVs (see figure 2) that can cool >300W/cm2. [2]


Fig. 2: Microfluidic cooling includes silicon heat sinks with through-silicon vias for chip-chip connectivity. Source: IEEE ECTC [2]

Another approach to cooling puts a heat path block atop the application processor in Samsung’s novel architecture for mobile applications (see figure 3). [3] Kyung Don Mun and colleagues explored an asymmetrical memory and processor structure that provides design flexibility for placement of the processor, memory, and a copper-based heat path block.


Fig. 3: Switching from a symmetric memory on logic structure (left) to an asymmetric structure with a copper heat path block over the processor (right) improves heat removal for 2nm gate-all-around logic devices with a backside power delivery network. Source: IEEE ECTC [3]

The 2nm gate-all-around transistor structure of the application processor with a backside power delivery network required a 20% improvement in thermal dissipation through the block. Samsung used Ansys’ finite element model to identify high-risk areas and simulate warpage. “RDL pattern design optimization is especially critical for such heterogeneous package design, as thin RDLs can be vulnerable to thermo-mechanical stress concentration and crack failures,” said Mun. Selecting redistribution layers with wider pattern width and longer pattern length reduced warpage. The molding material, two-sided RDL, and thermal interface materials were further improved for greater thermal conductivity and heat removal.

Hybrid bonding
Fine-pitch multi-layer organic redistribution layers (RDLs) are gaining traction as a viable alternative to silicon interposers and laminate substrates. This shift is driven by the RDL’s ability to offer high-speed interconnects at low cost. The Industrial Technology Research Institute (ITRI) and Brewer Science demonstrated five-layer stack build-ups of a polymer/copper RDL followed by copper-copper hybrid bonding, targeting high I/O, low return loss, and low insertion loss for high-speed digital applications. [4]


Fig. 4: Redistribution layers of polymer/copper build up were followed by Cu-Cu hybrid bonding with controlled warpage. Source: IEEE ECTC [4] 

Following line/space RDL buildup (4 to 10µm L/S) on a glass carrier wafer, the low-k polymer (k = 2.5) was patterned using negative-tone photoresist and i-line exposure, followed by etching, pad filling with a titanium barrier and copper, and then planarization by copper CMP. Hybrid bonding used thermocompression bonding at 300°C (1.06 MPa), followed by carrier wafer debonding with a UV laser. The polymer’s low modulus, high thermal stability, and low moisture absorption are designed to contribute to low warpage in multi-layer RDL stacks.

The pitch of hybrid bonding using traditional dielectrics (SiO2-based) and copper has been scaling in recent years from 10µm (in manufacturing) to 1µm (in R&D). Adel Elsherbini, senior principal engineer at Intel, and his colleagues discussed some of the capabilities needed to enable such scaling. [6]


Fig. 5: Intel’s research results using hybrid bonding. Source: IEEE ECTC [6]

Their paper states that system architecture generally determines whether wafer-to-wafer (W2W) or die-to-wafer (D2W) bonding is chosen. Wafer-to-wafer bonding is more mature and enables finer pitch, but its key limitation is that it requires same-size chiplets. Die-to-wafer bonding has no such size limits, and only known-good dies are used. “For C2W applications, as HB pitch continues to scale to 1μm and beyond, the placement accuracy requirement pushes the limit of the current generation chip bonders. To ensure electrical continuity, the entire chiplet area needs to reach the same level of accuracy down to tens of nanometers. Intra-die accuracy control, similar to runout and distortion control of W2W bonding, becomes increasingly important,” according to the authors.

“Traditional placement accuracy criteria, such as chiplet-center or worst-corner misalignment, is no longer sufficient. D2W process control is getting more sophisticated during bonding to focus more on warpage control, die shaping and bond wave propagation control at every chiplet level. On the other hand, to quantify the intra-die bonding accuracy, new alignment mark strategy and better post bond accuracy measurement metrology are needed to understand the chiplet level distortion behavior from die-prep all the way to bonding.” The authors noted that infrared debonding enables the reuse of silicon carrier wafers for lower cost of ownership.
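
A back-of-the-envelope overlay budget shows why intra-die distortion matters at these pitches. The sketch below uses illustrative assumptions (square pads at half the pitch, an 85% minimum pad overlap per axis, a hypothetical ~7 x 7mm chiplet), not Intel’s design rules:

```python
# Back-of-the-envelope overlay budget for ~1 um pitch hybrid bonding.
# Pad geometry, overlap requirement, and die size are illustrative assumptions.

PITCH_UM = 1.0
PAD_SIDE_UM = 0.5 * PITCH_UM          # assume square pads at half the pitch
MIN_OVERLAP_PER_AXIS = 0.85           # assumed minimum pad overlap per axis
DIE_HALF_DIAGONAL_UM = 5000.0         # hypothetical ~7 x 7 mm chiplet

# Allowed pad-center offset per axis before overlap drops below the target.
allowed_offset_nm = PAD_SIDE_UM * (1.0 - MIN_OVERLAP_PER_AXIS) * 1e3   # 75 nm

# A uniform scale (runout/distortion) error of k ppm shifts a corner pad by
# k * 1e-6 * half-diagonal, eating into whatever is left for bonder placement.
for scale_error_ppm in (1, 5, 10):
    corner_shift_nm = scale_error_ppm * 1e-6 * DIE_HALF_DIAGONAL_UM * 1e3
    placement_budget_nm = allowed_offset_nm - corner_shift_nm
    print(f"{scale_error_ppm:2d} ppm distortion: corner shift {corner_shift_nm:4.0f} nm, "
          f"placement budget left {placement_budget_nm:4.0f} nm")
```

Even a few parts per million of die distortion consumes a large share of a sub-100nm budget, which is why the authors emphasize die shaping, warpage control, and bond wave propagation control at the chiplet level.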

Removing heat with backside power
Backside power delivery is a novel interconnect scheme that builds a power delivery network on the wafer backside to dramatically reduce the voltage droop associated with power delivery to transistors. The interconnects on the topside are free to carry signals only, delivering a host of electrical benefits.

However, this new approach exacerbates hot spot problems relative to the standard interconnect stack (see figure 6). “If you look at it relative to the front side, all the heat that’s generated in the transistors just goes straight into the silicon to the heat sink or cold plate,” says Dureseti Chidambarrao, senior technical staff member at IBM Research. “But there’s this additional unfortunate scenario where you love the backside power — because you separated the power from the signal, so it’s a less complex way of making this — but we now have this challenge of trying to get the heat out of this kind of stack because the hot spikes and the hot circuits get trapped.”


Fig. 6: Backside power induces new heat flow patterns because the active devices are sandwiched between metal stacks. Source: Laura Peters/Semiconductor Engineering

IBM developed an anisotropic model to accurately calculate the heat transfer through the back-end-of-line stack that takes into account the material properties. This AI model ties the design to local power densities, workloads, and material properties in the interconnect stack. “You take the GDS file and it actually calculates together the average properties at multiple levels and multiple layers, such that you can get the right average properties [for heat transfer] at every given location. Now you have a way to calculate for every tile, and you can refine it further and further,” said Chidambarrao.
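
The general idea — deriving effective anisotropic thermal properties per layout tile from metal density — can be illustrated with a simple rule-of-mixtures sketch. This is not IBM’s formulation, and the conductivity values and fill fractions below are generic assumptions:

```python
# Tile-averaged anisotropic BEOL conductivity via a simple rule of mixtures:
# copper and dielectric conduct roughly in parallel in-plane and in series
# through-plane. Values are generic, not taken from the IBM paper.

K_CU = 400.0    # W/m-K, bulk copper (thin damascene lines conduct less)
K_ILD = 0.3     # W/m-K, typical low-k interlayer dielectric

def tile_conductivity(metal_fill):
    """Effective (in-plane, through-plane) conductivity for one layout tile."""
    k_in_plane = metal_fill * K_CU + (1.0 - metal_fill) * K_ILD
    k_through_plane = 1.0 / (metal_fill / K_CU + (1.0 - metal_fill) / K_ILD)
    return k_in_plane, k_through_plane

# In a real flow, the metal fill per tile would come from the GDS layout.
for fill in (0.10, 0.35, 0.60):
    kx, kz = tile_conductivity(fill)
    print(f"metal fill {fill:.0%}: in-plane {kx:6.1f} W/m-K, through-plane {kz:5.2f} W/m-K")
```

A fuller model would also account for vias, which provide vertical copper paths and raise through-plane conductivity well above this series-only estimate.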

The importance of including such thermal considerations at the design phase cannot be overstated. “The package and chip are interacting and have become very tightly coupled, so it is a complete system technology optimization problem where you have to worry about thermal in design,” he said. “It has to happen in particular for backside power, and I’m not even imagining the worst thing — to put a backside power on a 3D chip. If that’s what you want to do, then solutions are obviously much more stringent.”

Backside power delivery already is being designed into chips. “We expect to see backside power first implemented in products next year,” said Herman Oprins, principal member of technical staff and R&D team leader of thermal modeling and characterization at imec. “Though backside power started as a passive structure, further on this also will be used to include the signal clock and other functionality. There are different approaches, but the essential part is that you need to connect the front side with the back side through nanoTSVs.”

NanoTSVs require the silicon to be thinned to 300nm or less, and possibly to below 100nm. In addition, detailed modeling is needed to comprehend the cooling needs for such devices.

“If you have a localized hot spot and ultra-thin silicon, your temperature will actually go up because you have less electrical volume (silicon) to spread it out,” Oprins said. “On the other hand, you have the presence of the backside metal stack, so this dense metal array might help the heat spreading of a device.”

Imec previously showed that implementing backside power imposes a 10% to 30% thermal penalty (ECTC 2024). This year, Oprins’ group simulated the thermal effect of stacking logic on memory, or memory on logic, with BSPDN. Those simulations included both face-to-face and back-to-face hybrid bonding of chips, using a combination of the Boltzmann transport equation and Monte Carlo simulation. The simulations illustrate the difference in temperature increase between uniform chip heating and the impact of hot spots (see figure 7).


Fig. 7: Increase in temperature of devices with uniform device heating (left) versus with additional hot spots. Source: IEEE ECTC [7]

“The order of the logic and memory die in the stack presents a larger impact on the thermal performance,” said Oprins. “Logic-on-top results in a lower logic temperature due to its proximity to the cooling, but a higher memory temperature due to high thermal coupling in the stack.” Simulations of multiple tiers of memory-on-logic showed that the thermal impact of BSPDN is reduced with multi-die stacking. In this case, the logic-on-top configuration is thermally limited by the memory die temperature, while for the memory-on-top configuration, the logic temperature is the limiting factor. “More efficient cooling showed a significant improvement on the thermal performance of 3D SoC BSPDN, enabling the logic-on-memory configuration with calibrated power dissipation,” the paper summarized. [7]

Oprins emphasized how critical liquid cooling is becoming. “Looking at the 3D architecture, if you were to stack five chips, each dissipating, let’s say, 100 watts, and use conventional air cooling, then what you end up with is a maximum junction temperature much larger than 500°C,” said Oprins. “If you integrate a cold plate, the maximum junction temperatures are on the order of 250°C. However, if you can somehow develop interlayer cooling within the stack, then all of a sudden there’s an opportunity to really bring the temperatures down to 48°C,” according to imec’s 3D stack simulations.
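
The trend Oprins describes can be reproduced qualitatively with a one-dimensional thermal-resistance ladder. The sketch below is not imec’s simulation; the resistance values are placeholders chosen only to land in the same ballpark as the quoted figures:

```python
# 1-D thermal-resistance ladder for a 5-die stack, 100 W per die, 40°C coolant.
# Resistance values (K/W) are illustrative placeholders, not imec's data.

POWER_PER_DIE_W = 100.0
N_DIES = 5
T_COOLANT_C = 40.0

def top_cooled_stack(r_cooler, r_interlayer):
    """Junction temps (°C) when all heat exits through a cooler on the top die.

    Die 0 is at the bottom; heat from lower dies must cross every bonding
    interface above them, so the power crossing interface k is (k+1) * P.
    """
    total_power = N_DIES * POWER_PER_DIE_W
    temps = []
    for i in range(N_DIES):
        rise = total_power * r_cooler
        for k in range(i, N_DIES - 1):
            rise += (k + 1) * POWER_PER_DIE_W * r_interlayer
        temps.append(T_COOLANT_C + rise)
    return temps

def interlayer_cooled_stack(r_local):
    """Junction temps (°C) when every tier has its own microchannel path to coolant."""
    return [T_COOLANT_C + POWER_PER_DIE_W * r_local] * N_DIES

print("Air-cooled heat sink:", [round(t) for t in top_cooled_stack(0.85, 0.08)])
print("Cold plate:          ", [round(t) for t in top_cooled_stack(0.28, 0.08)])
print("Interlayer cooling:  ", [round(t) for t in interlayer_cooled_stack(0.08)])
```

With a single top-side cooler, every interface the heat must cross adds to the bottom die’s temperature rise; giving each tier its own path to the coolant removes that series penalty, which is the appeal of interlayer cooling.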

Co-packaged optics
The industry is undergoing a dramatic increase in demand for faster data network and device interface speeds. A key enabler inside data center racks brings the optical engine into the same package with GPUs and HBMs. “With co-packaged optics (CPO), we have the chance to integrate the electrical interconnections together with optical connections in a single package,” said CP Hung, an ASE fellow. “That is a new milestone for the industry. By moving the optical engine much closer to the processor, we are going from 200 Gb/s per fiber to 6.4Tb/s, a 32X increase in bandwidth.”

Despite CPO’s promise, unknowns remain. “CPO definitely will happen, and the momentum is definitely pushing it to happen sooner rather than later,” said Mark Gerber, senior director of engineering, marketing, and technical promotion at ASE. “With CPO there are sensitivities on the thermal side as well as the warpage side. Importantly, the industry would like to maintain the pluggable (i.e., replaceable) aspect of optical engines that exist today. But while pluggables are easy to switch out, they are not easy to master.”

At ECTC, ASE demonstrated its modular platforms for ASIC switch and Ethernet/HBM co-packaged optics applications.

Thermal simulation also is playing a pivotal role in selecting the architecture, processes, and materials of the heat dissipation stack in advanced packaging. “Historically, with monolithic chip integration, thermal simulation of the package design and heat sink was performed on a go/no-go basis,” said Tom Nordstog, staff engineer for thermal simulations at Amkor Technology. With multi-chiplet packaging, simulation is playing a more pronounced role earlier in the package design. “Thermal simulation is a risk/reward exercise performed to select the ultimate design. Ideally, thermal design of the package occurs before the chip design is set. We see the most aggressive customers incorporating thermal simulation at these early stages.”

Corning and Fraunhofer IZM proposed a scalable “planar 2D waveguide circuit that can reduce the required space, complexity, and cost for future generations of CPO solutions by reducing the need for fiber cable termination and manual assembly,” according to Lars Brusberg of Corning Research & Development. [8] Using a 460 x 303mm fusion-formed glass panel, the group fabricated single-mode board-level interconnects with a waveguide layout designed to meet the optical interconnect requirements of connecting 1,024 optical links from the faceplate to the CPO module for 102.4 Tb/s data center switch applications. Fraunhofer IZM engineers designed the process flow (see figure 8), which includes a thermal ion exchange process to integrate single-mode waveguides into the glass, matching the optical mode of a single-mode fiber at 1310nm wavelength. Once the mask was removed, a second reverse ion-exchange step was performed to bury the core of the waveguides below the glass surface to reduce propagation loss.

“For fiber connectivity to the glass waveguide panel, MPO-16 adapters were assembled, and the glass waveguide circuit was integrated into a 1U rack chassis to demonstrate the low-profile of only 0.7 mm,” Brusberg said. This novel approach could pave the way for PCB-based optical transceivers.


Fig. 8: Process flow includes metal deposition, photoresist coating, waveguide imaging, and ion exchange to diffuse silver into the pattern. Source: IEEE ECTC [8]

References

  1. Y-J. Lien, “Direct-to-Silicon Liquid Cooling Integrated on CoWoS Platform,” IEEE Electronic Components and Technology Conference, May 2025, in press.
  2. Yan et al., “Toward TSV-Compatible Microfluidic Cooling for 3D ICs,” IEEE Transactions on Components, Packaging and Manufacturing Technology, vol. 15, no. 1, pp. 104-112, Jan. 2025, doi: 10.1109/TCPMT.2024.3516653.
  3. K. D. Mun et al., “A Novel Architecture for On-Device AI in Mobile Application with Enhanced Heat Dissipation,” IEEE Electronic Components and Technology Conference, May 2025, in press.
  4. Lee et al., “Hierarchical Multi-layer and Stacking Vias with Novel Structure by Transferrable Cu/polymer Hybrid Bonding for High Speed Digital Applications,” IEEE Electronic Components and Technology Conference, May 2025, in press.
  5. Chang et al., “Thermal Modeling and Analysis of Equivalent Thermal Properties for Advanced BEOL Stacks,” IEEE Transactions on Components, Packaging and Manufacturing Technology, doi: 10.1109/TCPMT.2025.3564833.
  6. Elsherbini et al., “Mid-BEOL Heterogeneous Integration Through Sub 1um Pitch Hybrid Bonding & Advanced Silicon Carrier Technologies for AI & Compute Applications,” IEEE Electronic Components and Technology Conference, May 2025, in press.
  7. R. Chowdhury, “Fast and Accurate Machine Learning Prediction of Back-End-Of-Line Thermal Resistances in Backside Power Delivery and Chiplet Architectures,” IEEE Electronic Components and Technology Conference, May 2025, in press.
  8. Brusberg, “Large-scale Glass Waveguide Circuit for Board-level Optical Interconnects between Faceplate and Co-packaged Optical Transceivers,” IEEE Electronic Components and Technology Conference, May 2025, in press.

Related Stories

Co-Packaged Optics Reaches Power Efficiency Tipping Point

3.5D: The Great Compromise

Hybrid Bonding Makes Strides Toward Manufacturability


