How To Build Billions of Bumps

Hybrid bonding permits unprecedented connection density.

Key Takeaways:

  • Hybrid bonding can result in a package containing billions (and eventually trillions) of connections.
  • Building that many connections successfully requires extreme process uniformity across a wafer.
  • Inspection isn’t practical, and test benefits from internal test mechanisms.

Hybrid bonding allows unprecedented signal pitch. Fully populating dies and interposers at a 1µm pitch can put billions of connections into a single package. With that many, it’s no longer possible to inspect or test each connection individually.

Managing yield for such a process requires process uniformity and predictability, as well as architectures that provide modularity, testability, and redundancy.

“Hybrid bonding enables unprecedented interconnect density,” said Lakshmi Jain, director of product marketing for I/O library IP at Synopsys. “A full-size interposer tiled with chiplets and bonded at 1µm pitch can easily result in billions of internal connections. At this scale, manufacturing success is no longer about managing individual interconnects. It depends on architectural control that assumes some level of imperfection and is designed to tolerate it.”

Hybrid bonding joins 10µm- to 1µm-pitch bumpless interconnects. The 1µm pitch is extremely aggressive by today’s standards, but even that doesn’t reflect a physical limit. Thomas Pleschke, business development manager at EV Group, sees even tighter pitch coming. “Theoretically, we could bond two 300mm wafers with 200nm pad pitch with several trillions of connections,” he said.

Planar silicon processing has the benefit of building many connections in parallel, so whether it’s a million or a billion doesn’t really matter. What does matter is the uniformity of the process. Variations across a wafer could have a devastating impact. Meanwhile, adding infrastructure to the I/Os allows testing before and after bonding to ensure a reliable connection.

It’s surprisingly easy to get to a billion
The key to getting billions of connections is the 1µm pitch that hybrid bonding enables. Each millimeter can then host 1,000 “bumps.” (Calling them bumps is convenient, but the whole point of hybrid bonding is that you lose bumps and bond pad to pad.)

An example package illustrates how this happens. For the purposes of this exercise, we’ll assume that interposers can be hybrid bonded to substrates (and it doesn’t change the conclusion if we don’t).


Fig. 1: Processor package for counting bumps. If the HBM4 stacks are 16 high, bump count can exceed 26 billion. (Source: Bryon Moyer/Semiconductor Engineering)

We look to existing dies, where possible, just to give a die size. It doesn’t reflect the number of connections actually available on that die, but we’ll use the dimensions to figure out how many connections are possible with hybrid bonding for that size die.

The package shown contains eight Intel Nova Lake processor dies, each measuring 14.8 × 6.6 mm, which at 1µm pitch amounts to more than 97 million connections per die and a total of 781 million connections across the eight units. Twelve 16-high HBM4 stacks, at 11 × 11 mm per DRAM die, yield over 23 billion connections. A single I/O chiplet is modeled after one of AMD’s, also at 11 × 11 mm, for 121 million connections. The interposer itself, if sized at three reticles, would be roughly 3 × (26 × 33 mm), for a total of just over 2.5 billion connections.

Add these together and you get a grand total of 26.7 billion connections. Most of those come from HBM4, so even if the interposer can’t be hybrid bonded to the package substrate, you’re still in the tens of billions of connections.
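The tally can be reproduced in a few lines. This is a minimal sketch under the article's assumption that every die is fully populated at 1µm pitch. The interposer is taken as three 26 × 33 mm reticle fields, which is what the 2.5 billion figure implies. The names `COMPONENTS`, `connections`, and `tally` are illustrative only.

```python
# Connection tally for the example package, assuming every die is fully
# populated at 1 µm pitch (1,000 pads per mm, so 1,000,000 pads per mm²).
PADS_PER_MM2 = 1_000 ** 2

# (label, width in mm, height in mm, number of dies of that size)
COMPONENTS = [
    ("processor dies", 14.8, 6.6, 8),
    ("HBM4 DRAM dies", 11.0, 11.0, 12 * 16),  # 12 stacks, 16 dies per stack
    ("I/O chiplet", 11.0, 11.0, 1),
    ("interposer (3 reticles)", 26.0, 33.0, 3),  # assumed 26 x 33 mm field
]


def connections(width_mm: float, height_mm: float, count: int) -> int:
    """Pads available on `count` rectangles of the given size."""
    return round(width_mm * height_mm * PADS_PER_MM2 * count)


def tally(components=COMPONENTS) -> dict:
    """Per-component connection counts plus a grand total."""
    counts = {name: connections(w, h, n) for name, w, h, n in components}
    counts["total"] = sum(counts.values())
    return counts


if __name__ == "__main__":
    for name, n in tally().items():
        print(f"{name:>24}: {n / 1e9:7.3f} billion")
```

Changing the pitch or the stack height in `COMPONENTS` shows how quickly the total moves. The HBM4 term dominates because it multiplies die area by 192 dies.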

That’s cheating a little bit since the HBM4 stacks can be tested at the die level and then at the stack level before being bonded to the interposer, but that doesn’t change the fact that billions of connections will soon be possible within a single package.

With that many connections, how can engineers be confident those are all operational? Hybrid bonds are small and packed closely together, so inspection after bonding seems impractical. Testing each connection, even for opens and shorts, would take measurable time — if you have a way to access each connection individually, which is difficult.
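To get a feel for "measurable time," the sketch below multiplies the connection count by a per-connection test time. The 1µs figure is purely an assumption for illustration, not a number from any tester or from the sources quoted here, and the function names are hypothetical.

```python
def sequential_test_seconds(n_connections: int, seconds_per_connection: float) -> float:
    """Total time if connections are checked strictly one after another."""
    return n_connections * seconds_per_connection


def parallel_test_seconds(n_connections: int, seconds_per_connection: float,
                          n_parallel: int) -> float:
    """Total time if `n_parallel` independent test engines share the work."""
    # Ceiling division: the busiest engine handles ceil(n / n_parallel) connections.
    per_engine = -(-n_connections // n_parallel)
    return per_engine * seconds_per_connection


if __name__ == "__main__":
    n = 26_700_000_000      # roughly the package total from the example
    t = 1e-6                # ASSUMED 1 µs per open/short check
    print(f"sequential: {sequential_test_seconds(n, t) / 3600:.1f} hours")
    print(f"1,000 engines in parallel: {parallel_test_seconds(n, t, 1_000):.1f} s")
```

Under that assumption a purely sequential check runs for hours, while splitting the work across many on-die engines brings it down to seconds.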

Ensuring good connections requires two conditions — processing must be extremely consistent across the wafers to ensure uniform etching and filling of the bond pads, and built-in testing infrastructure is necessary to make testing tractable.

From the start, build it right
One of the biggest challenges to making advanced semiconductors is process variation. For an individual die to work, all the bond pads on that die must be built exactly the same. That means growing the oxide, creating holes, filling the holes, and backing off the metal to allow the oxide to bond first when forming the bond. Any single failed connection can ruin the die.
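If any single failed connection ruins the die, die yield is the probability that every connection works. The sketch below assumes independent failures with the same probability for each connection and no redundancy. Both are simplifications, and the function names are illustrative.

```python
import math


def die_yield(n_connections: int, p_fail: float) -> float:
    """Probability that all connections work: (1 - p_fail) ** n_connections."""
    # log1p keeps precision when p_fail is tiny compared with 1.
    return math.exp(n_connections * math.log1p(-p_fail))


def max_p_fail(n_connections: int, target_yield: float) -> float:
    """Largest per-connection failure probability that still meets the target."""
    # Invert the yield formula; expm1 avoids cancellation near zero.
    return -math.expm1(math.log(target_yield) / n_connections)


if __name__ == "__main__":
    n = 97_680_000  # one fully populated 14.8 x 6.6 mm die at 1 µm pitch
    print(f"p_fail allowed for 90% yield: {max_p_fail(n, 0.90):.2e}")
    print(f"yield at p_fail = 1e-8: {die_yield(n, 1e-8):.3f}")
```

For a die with about 98 million connections, a 90% yield target leaves room for roughly one failure per billion connections, which is why uniformity carries so much weight.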

Hybrid bonding is notorious for imposing tough conditions for success. Surfaces must be pristine on both oxide and metal in order for them to bond as if they were a single piece of oxide or copper. “Hybrid bonding has challenging requirements related to surface preparation,” noted Pleschke. “A surface roughness of less than 0.5nm is usually required. The plasma process gas, RF parameters, and time are critical process parameters.”

Uniformity across a large die is hard enough. But decent wafer yield demands extremely high cross-wafer uniformity. Such uniformity doesn’t guarantee the integrity of all connections, but it makes it far more likely, taking some pressure off testing.

“Good control of copper dishing, including height, shape, and uniformity during the CMP [chemical-mechanical polish] process, is essential,” said Pleschke. “[The copper] is typically recessed 3 to 5nm, with uniform size and distribution (Cu pad expansion ~1nm /1µm Cu thickness/50°C).”
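The expansion figure in the quote can be sanity-checked with the linear thermal expansion relation ΔL = α × L × ΔT. The sketch below assumes a copper expansion coefficient of about 17 ppm per °C, a typical room-temperature handbook value that does not come from the article. It also treats the pad as free to expand, ignoring the constraint from the surrounding dielectric. The function names are hypothetical.

```python
CU_CTE_PER_C = 17e-6  # ASSUMED linear expansion coefficient of copper, per °C


def cu_expansion_nm(thickness_um: float, delta_t_c: float,
                    cte_per_c: float = CU_CTE_PER_C) -> float:
    """Free (unconstrained) expansion of a copper pad, in nanometers."""
    thickness_nm = thickness_um * 1_000.0
    return thickness_nm * cte_per_c * delta_t_c


def delta_t_to_close_recess_c(recess_nm: float, thickness_um: float,
                              cte_per_c: float = CU_CTE_PER_C) -> float:
    """Temperature rise at which a pad's free expansion equals its own recess."""
    return recess_nm / (thickness_um * 1_000.0 * cte_per_c)


if __name__ == "__main__":
    print(f"1 µm of Cu over 50 °C: {cu_expansion_nm(1.0, 50.0):.2f} nm")
    for recess in (3.0, 5.0):
        rise = delta_t_to_close_recess_c(recess, 1.0)
        print(f"{recess:.0f} nm recess, 1 µm Cu: about {rise:.0f} °C rise")
```

With these assumptions, 1µm of copper grows by a little under 1nm per 50°C, consistent with the rule of thumb quoted above.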

Parallel processing
Planar processing is helpful in that all dies and pads are processed in parallel — as long as variation remains in check. “Manufacturing billions of interconnects is possible only because the entire semiconductor processing line, consisting of lithography, deposition, and etching, is designed to operate in parallel and at wafer scale,” said Chee Ping Lee, managing director for advanced packaging at Lam Research. “Dielectric deposition enables the initial binding between the wafers during hybrid bonding. Then we plasma drill holes in this dielectric material with highly repeatable and well-defined sidewall profiles that maintain the accuracy of the initial lithography patterning. Finally, we fill billions of holes with metal in parallel to create a single perfect interconnect.”

Lee further illustrated the scale of what’s being achieved. “A useful analogy is creating rainfall uniformly over the entirety of the landscape of the United States with such precision that buckets placed one meter apart fill at exactly the same rate,” he said.

Wafers are increasingly being ground thinner to reduce stack height — especially for HBM — and to shorten connections. Temporary bonding materials (TBMs) attach these wafers to carriers for stability.

“The roadmap to high-bandwidth memory using hybrid bonding requires ultra-thinning of wafers down to tens of microns for shorter signal path after stacking,” said Amit Kumar, applications engineer at Brewer Science. “This causes several material performance constraints, key among them being the need for mechanical and thermal stability for multiple stack bonding cycles, extremely low TTV (total thickness variation) for uniformity, and particle-level cleanability for the temporary bonding material.”

The dielectric matters
Adjacent hybrid-bonded pads are separated by a dielectric. Signal integrity can suffer once the signals come too close to each other. Using a dielectric with a lower dielectric constant can help.

“When the I/O density increases by an order of magnitude, the distance between metal conductors shrinks,” noted Kumar. “To maintain signal integrity, a very low dielectric constant at high frequencies is required of the dielectric material.”

Reduced pitch can also add stress to the dielectric. “The dielectric must sustain higher stress and offer stronger bonding energy than the dielectric used in larger‑pitch architectures,” said Pleschke. “It must withstand increased stress associated with reduced dielectric spacing between copper pads.”

In addition, copper can migrate through some dielectrics. “At smaller dimensions, copper diffusion becomes an additional reliability concern that must be addressed through appropriate dielectric selection, such as SixNy, SiON and SiCN,” Pleschke added.

Unfortunately, these dielectrics can exhibit a higher dielectric constant than SiO2 (3.9 to 4.2), with SiON coming in at 3.9 to 7.5, SiCN at 4.0 to 9.0, and Si3N4 at 6.0 to 7.5.
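For a fixed geometry, the capacitance between neighboring conductors scales linearly with the dielectric constant, so the ranges above translate directly into a coupling penalty. The sketch below is a first-order comparison only. It assumes identical pad geometry, a single uniform dielectric, and SiO2 at 3.9 as the baseline. The names are illustrative.

```python
# Dielectric-constant ranges as quoted in the text: (low, high).
K_RANGES = {
    "SiO2": (3.9, 4.2),
    "SiON": (3.9, 7.5),
    "SiCN": (4.0, 9.0),
    "Si3N4": (6.0, 7.5),
}

K_REFERENCE = 3.9  # ASSUMED baseline: low end of the SiO2 range


def relative_capacitance(k: float, k_ref: float = K_REFERENCE) -> float:
    """Capacitance relative to the baseline for the same geometry (C scales with k)."""
    return k / k_ref


def capacitance_penalty_range(material: str) -> tuple:
    """Best-case and worst-case capacitance multipliers for a material."""
    low, high = K_RANGES[material]
    return relative_capacitance(low), relative_capacitance(high)


if __name__ == "__main__":
    for material in K_RANGES:
        best, worst = capacitance_penalty_range(material)
        print(f"{material:>6}: {best:.2f}x to {worst:.2f}x the baseline capacitance")
```

Real bonding interfaces often stack several films, so the effective constant lies somewhere between the values of the individual layers.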

Inspection is impractical
“Both quality assurance efforts and tooling requirements rise as a result of shrinking hybrid bonding pads,” said Pleschke.

Given the size and density of these connections, optical inspection is no longer an option. “At the scale enabled by hybrid bonding, it is no longer technically possible to inspect every individual contact pad,” noted Lee. “This represents an enormous challenge for metrology equipment suppliers.”

Others agree. “Defects at these densities are typically electrical and localized, appearing as weak or open bonds, marginal behavior, or small clusters of failures that may not be visually detectable,” added Jain.

Since individually inspecting each connection isn’t practical, testing becomes the next best way to eliminate failing dies. The challenge is that each die can be tested on its own prior to bonding, but then it must also be tested after bonding to ensure a solid bond.

Must test instead of inspecting
Chips with this many connections will need their own built-in self-test (BiST) to validate the connections. This involves test engines, plus redundancy for repair in case a bad connection is found.

One way of dealing with that challenge is to define clusters of I/Os, each of which has the necessary infrastructure to be self-sufficient. These clusters can then be replicated to scale up the desired number of connections.

Synopsys’ 3DIO IP is an example of such an approach. Each cluster provides 16 lanes (with one pad per lane per direction). In addition, each cluster has its own clock tree, allowing data rates from 4 to 6 Gb/s using double-data-rate (DDR) clocking. It also comes with VDD and ground connections, as well as ESD protection.
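The per-cluster numbers scale in a straightforward way. The sketch below is back-of-the-envelope arithmetic from the figures quoted above (16 lanes, 4 to 6 Gb/s per lane, DDR clocking). It is not part of any Synopsys tool or API, it ignores any encoding or protocol overhead, and the function names are hypothetical.

```python
import math

LANES_PER_CLUSTER = 16  # from the cluster description above


def cluster_bandwidth_gbps(lane_rate_gbps: float,
                           lanes: int = LANES_PER_CLUSTER) -> float:
    """Raw bandwidth of one cluster in one direction, in Gb/s."""
    return lanes * lane_rate_gbps


def ddr_clock_ghz(lane_rate_gbps: float) -> float:
    """Clock frequency for DDR signaling: two bits per clock cycle per lane."""
    return lane_rate_gbps / 2.0


def clusters_needed(target_gbps: float, lane_rate_gbps: float) -> int:
    """Smallest number of clusters whose raw bandwidth meets the target."""
    return math.ceil(target_gbps / cluster_bandwidth_gbps(lane_rate_gbps))


if __name__ == "__main__":
    for rate in (4.0, 6.0):
        print(f"{rate:.0f} Gb/s per lane: {cluster_bandwidth_gbps(rate):.0f} Gb/s "
              f"per cluster per direction, {ddr_clock_ghz(rate):.0f} GHz clock")
    print(f"clusters for 1 Tb/s at 6 Gb/s: {clusters_needed(1_000.0, 6.0)}")
```

At these rates a single cluster carries 64 to 96 Gb/s in each direction, so a terabit-class link needs only on the order of ten clusters.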

“Instead of qualifying every individual interconnect, the 3DIO PHY groups interconnects into small, repeatable clusters, making each cluster independently testable. Embedded built-in self-test supports pre-bond and post-bond testing, enabling early, deterministic electrical detection of bonding issues,” said Jain. “Because the PHY is protocol-free and does not rely on link training, timing behavior can be directly observed during manufacturing test without protocol constraints.”

Most important for this topic, however, is that each cluster has a BiST engine, redundancy, and repair capabilities. This allows the die to test its own connections, and it can do so before and after bonding.

These clusters are configurable through a compiler, and the amount of redundancy available can be set according to the needs of the application. That redundancy allows dies with failed connections to be salvaged by swapping in a spare through the repair mechanism.

Redundancy and repair are important
Any defects identified during test will likely be random. That provides an opportunity for repairs that can rescue a failing die that would otherwise be discarded. Such a mechanism requires redundant pads that can be swapped in should any of the primary pads have a bad connection.

“In real manufacturing environments, most yield loss comes from localized, spatially sparse defects rather than systemic issues,” said Jain. “As a result, the amount of redundancy required is highly dependent on the specific process technology and foundry defect characteristics. There is no single correct redundancy ratio that applies across all implementations.”
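The effect of spares on yield can be sketched with a simple binomial model. It assumes independent lane failures with equal probability, and that a cluster can be repaired as long as the number of bad lanes does not exceed the number of spares. The failure probability and cluster count in the usage example are hypothetical, chosen only to show the trend, and the function names are illustrative.

```python
import math


def cluster_yield(n_primary: int, n_spare: int, p_fail: float) -> float:
    """Probability that a cluster has no more failed lanes than it has spares."""
    n_total = n_primary + n_spare
    return sum(
        math.comb(n_total, k) * p_fail ** k * (1.0 - p_fail) ** (n_total - k)
        for k in range(n_spare + 1)
    )


def die_yield_with_repair(n_clusters: int, n_primary: int, n_spare: int,
                          p_fail: float) -> float:
    """Probability that every cluster on the die is good or repairable."""
    return cluster_yield(n_primary, n_spare, p_fail) ** n_clusters


if __name__ == "__main__":
    clusters = 1_000_000   # HYPOTHETICAL cluster count
    p = 1e-6               # HYPOTHETICAL per-lane failure probability
    for spares in (0, 1, 2):
        y = die_yield_with_repair(clusters, 16, spares, p)
        print(f"{spares} spare(s) per 16-lane cluster: die yield {y:.6f}")
```

In this toy case, one spare per cluster moves the die from essentially zero yield to nearly perfect yield. The right ratio in practice depends on measured defect data, as Jain notes.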

Synopsys’ cluster approach means that such resources can be provided at the cluster level, easing scalability as more clusters are added. “At the PHY level, we support manufacturing tests that can identify defective lanes or clusters, allowing repair, remapping, or redundancy strategies to be applied based on actual silicon data,” Jain said. “By operating at the cluster level, the architecture provides the granularity to absorb realistic manufacturing defects while avoiding unnecessary over-design.”

Maintaining reliability
Reliability might appear to be a mixed bag. On the plus side, you’re making smaller components and connections, which should make them easier to build reliably. On the other hand, you’re trying to build billions of them, and it’s easy to imagine that some of them may be unreliable just from statistics.

But, in fact, the smaller, shorter connections win out, improving reliability beyond what’s available through microbumps. “Ultra short, hybrid-bonded copper-to-copper connections exhibit lower resistance and capacitance than traditional microbump interfaces, improving signal integrity,” said Lee. “The exact bit-error improvement margin may vary from system to system based upon a variety of factors. However, hybrid bonding provides more uniform interface and causes significantly less signal degradation when compared with earlier die integration schemes.”

Jain concurred. “When margins are properly designed and validated across process, voltage, temperature (PVT), and aging conditions, the probability of random bit errors becomes extremely small and limited to rare statistical events,” he said. “As a result, an extremely low BER [bit error rate] can be achieved directly at the PHY, without depending on higher-level protocols to mask errors through retries or ECC. This intrinsic PHY level reliability is critical for scaling dense hybrid bonded interconnect fabrics.”

It only gets harder from here
Our selection of a 1µm pitch reflected a convenient round number, not a physical limit. As noted above, companies theoretically have the capability to pattern pads at 200nm pitch. Eventually, pad pitch scaling may require a change in pad shape and placement.

“As pad sizes and pitches continue to shrink, the copper surface percentage increases and copper density may require optimized (hexagonal) pad placement and dummy pads for uniformity reasons,” said Pleschke.
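The copper surface percentage depends on both pad size and placement pattern. The sketch below compares a square grid with a hexagonal grid at the same center-to-center pitch. It assumes circular pads, which is a simplification, and the pad diameter in the usage example is hypothetical. The function names are illustrative.

```python
import math


def copper_fraction_square(pad_diameter_um: float, pitch_um: float) -> float:
    """Copper area fraction for circular pads on a square grid."""
    pad_area = math.pi * (pad_diameter_um / 2.0) ** 2
    cell_area = pitch_um ** 2  # one pad per square unit cell
    return pad_area / cell_area


def copper_fraction_hex(pad_diameter_um: float, pitch_um: float) -> float:
    """Copper area fraction for circular pads on a hexagonal grid."""
    pad_area = math.pi * (pad_diameter_um / 2.0) ** 2
    cell_area = (math.sqrt(3.0) / 2.0) * pitch_um ** 2  # one pad per rhombic cell
    return pad_area / cell_area


if __name__ == "__main__":
    diameter, pitch = 0.5, 1.0  # HYPOTHETICAL 0.5 µm pads at 1 µm pitch
    sq = copper_fraction_square(diameter, pitch)
    hx = copper_fraction_hex(diameter, pitch)
    print(f"square grid: {sq:.1%} copper")
    print(f"hex grid:    {hx:.1%} copper ({hx / sq:.3f}x)")
```

At equal pitch, hexagonal placement fits 2/√3 (about 15%) more pads into the same area, and every pad sits at the same distance from all six nearest neighbors.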

This is also complicated enough that collaboration is important. “Hybrid bonding is a multidimensional cross-skill challenge involving material, semiconductor, and mechatronics,” said Pleschke. “Close collaboration across the entire value chain is needed, from research and development through to production.”

As hybrid bonding becomes more prevalent and the number of connections increases, additional challenges will presumably emerge. Yet more new ideas will be necessary to ensure that billions and trillions of connections can be built predictably, reliably, and with high performance.
