Variation Making Trouble In Advanced Packages

Disaggregating SoCs into multiple chips introduces potential issues that may take years to show up.


Variation is becoming increasingly problematic as chip designs become more heterogeneous and targeted by application, making it difficult to identify the root cause of problems or predict what can go wrong and when.

Concerns about variation traditionally have been confined to the most advanced nodes, where transistor density is highest and where manufacturing processes are still being fine-tuned. This is why design rules are more restrictive as new nodes are introduced, then relax over time as those processes mature. But as new multi-chip/multi-chiplet architectures — including chips developed at more than one process node — replace or supplement feature shrinks as the best way to improve PPA, the number of sources and impacts of variation is growing.

Variation has many causes, and it comes in many forms. It can show up in everything from lithography to cleaning and polishing, or even in the gases used for etch or deposition. It also can manifest itself in different sources of noise that can affect signal integrity. And it can show up in the interconnects between chips in a package, or within the packaging itself.

“When it comes to heterogeneous chips in a package, the form factor of the package (x,y,z) becomes a major source of variation — usually due to the bigger size of the substrate — which causes a series of process challenges,” said Choon Lee, CTO of JCET. “Two of the biggest ones are warpage management and reliability management.”

There also is variation in the bonding/debonding and interconnects used in advanced packaging. “For example, there are mass reflow, thermo-compression, and laser-assisted bonding interconnection options in wirebond, or a combination of wirebond plus flip-chip with all kinds of passives,” Lee said. “For each process, there is much variation in terms of temperature history, stress residue, and possible invisible micro-cracking.”

In some cases, that variation can be additive. So while the stochastics in EUV may not cause a problem in a single chip, combined with other chips/chiplets in a package and other sources of variation, they can impact yield or affect long-term reliability of a device.
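How variation adds up can be made concrete with a back-of-the-envelope tolerance stack-up. The sketch below assumes a handful of independent, roughly Gaussian variation sources, each with a hypothetical 1-sigma contribution (the names and numbers are illustrative, not real process data); because independent variances add, the combined spread is the root-sum-square of the individual ones.

```python
import math

# Hypothetical 1-sigma contributions (in nm) to a critical dimension in a
# multi-chiplet package -- illustrative values, not real process data.
sources = {
    "euv_stochastics": 0.8,
    "etch": 0.5,
    "cmp": 0.6,
    "bonding_overlay": 1.2,
}

# For independent sources, variances add, so the combined 1-sigma
# is the root-sum-square of the individual contributions.
total_sigma = math.sqrt(sum(s**2 for s in sources.values()))

# Each source may sit inside its own budget, yet the combined spread
# can exceed a tolerance that none of them violates alone.
print(f"combined 1-sigma: {total_sigma:.2f} nm")  # ~1.64 nm
```

The point of the exercise: the largest single contributor here is 1.2 nm, but the combined 1-sigma is about 1.64 nm, so a budget set per-source can still be blown in aggregate.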

“For us, the problem is the variation in the assembly,” said Rosie Medina, vice president of marketing at QP Technologies. “Sometimes customers deal with issues like temperature drift, and we have to help them build multiple lots to characterize that device. Then they know what the optimum operating parameters are.”

To make matters worse, many of these designs are being targeted at specific domains. An AI chip that serves as the central brain in a vehicle is going to be a lot different than a design for a server or an AR headset, and it’s likely that neither will achieve the kinds of volumes that have driven chips developed at the most advanced nodes in the past.

“The ones that are being produced in the billions of units go through many cycles of test chips,” said Aki Fujimura, CEO of D2S. “You have to differentiate between the problems that are stochastic, meaning they’re just going to scale equally, and the ones that are systemic, meaning there might be something about this particular design that is vulnerable. You have to wade through all these differences and eliminate the ones that are systemic, and then control the stochastics.”

A long and varied history
Variation in packaging is hardly new. In fact, it’s been well understood for decades. But at advanced nodes and in advanced packaging, its impact is growing and widening. Mike Kelly, vice president of Advanced Packaging & Technology at Amkor, noted that as operating voltages are reduced, the impact of variation grows because the tolerances are tighter. “The older the node, the more tolerant it is of things like voltage variations,” said Kelly. “That’s why there’s so much interest in heterogeneous packaging. People are trying to segregate the issues and the opportunities at the silicon level. A long-reach SerDes I/O device doesn’t need 3nm transistors.”

Still, this is much more complicated than just putting together LEGO blocks. “As we go to more custom packages, with more I/Os, finer pitches, and less real estate to put those devices, that lends itself to more customization,” said QP’s Medina. “We still see a lot of ‘jelly bean’ packages, which are the old legacy-style packages. But anything new is almost always custom. You can’t just stick it in a standard configuration.”

Fig. 1: Advanced packaging example showing different layers of die stacking and various components. Source: Coventor, a Lam Research Co.


And heterogeneity comes with some tradeoffs. The key is how to minimize the impact. “We have a customer in production today in silicon photonics that is using a photonic IC wafer, and we’re bonding the entire wafer to the logic wafer and then singulating that,” said Gregg Bartlett, senior vice president of technology at GlobalFoundries. “Variation is super important — how much bow you have in the wafer if you’re doing a copper-to-copper bonding, making sure the planarity is such that when we do the bonding we don’t end up with voids in the electrical connections at the end. It’s not that it’s new sources of variation. Certainly, we have to deal with that, but normally it’s on a monolithic basis. It’s the fact that you now have two things combined, and you want to independently control them and optimize both of them. The solution is a combinatorial technology. We have to prove out the reliability of it. And there can be failure mechanisms associated with those sources of variation.”

What may go unnoticed at 90nm or 45nm may be problematic when that same chip or chiplet is put into an advanced package with other devices at either the same or different nodes.

“With through-silicon vias, you wind up with this silicon substrate with copper deep down into it, and then we reveal that TSV on the backside,” said Bartlett. “With the expansion of the copper, once you reveal that, you can get some mushrooming off the back of it. So there are all of those issues that you have to come to grips with from an integration standpoint, and those are fairly visible at 50-micron sized features. But the failure mechanisms and reliability issues are not discoverable until we actually put the entire system together — the photonic IC and the logic chip — and start running it through temp cycling and such. That’s where the latent reliability issues manifest themselves. What’s happened is it’s all at a much higher level of integration than a chip.”

This presents a challenge for chipmakers, particularly those working at the most advanced process nodes, because the other options are becoming untenable. Just cramming more transistors developed at the latest process node onto a single SoC isn’t necessarily the best way forward anymore. The power and performance benefits from scaling have been shrinking since 28nm, and the cost has been rising steadily at each new node after that.

“The trend that we have seen lately is that fewer and fewer companies are able to monetize the value of the most advanced-scale technologies,” said David Fried, vice president of computational products at Lam Research. “There are fewer customers at 5nm than there were at 7nm, and there were fewer at 7nm than at 10nm, because a smaller number of companies can extract value from the large capital investments needed to develop these new products. You are going to see that trend continue. If you cannot capitalize financially on the value of scaling, be it power, performance, area, or yield, then you shouldn’t scale. This decision has to be made at the product level. Certain products are going to be analyzed by their owners looking at fixed costs and recurring costs, and the owners will decide that the business side works better if you stay at 7nm and don’t jump to 5nm.”

What can go wrong
This is where advanced packaging fits in. Rather than developing a reticle-sized SoC at the most advanced nodes, an advanced package can use one or more smaller (and presumably less expensive) logic chips developed at those nodes, interconnected to chips or chiplets developed at other process nodes. But unlike with planar chips, there are so many possible permutations available that it’s difficult to develop design tools that can take all of them into account and to understand what’s going on inside the package at the necessary level of granularity.

“If you’ve got 3D processes, then inspecting the surface of the wafer doesn’t really tell you a lot, because most of the conductivity is perpendicular to the plane of your image,” said John Kibarian, president and CEO of PDF Solutions. “Electrical in-line inspection is super-important. And then when you scratch that a little deeper, you build on an eight-square-centimeter GPU a few hundred billion to a trillion contacts and vias. If you have stuff that’s failing once in a billion times, that sounds pretty good. But if you have 100 billion contacts on a wafer, 100 of those have failed. So you need very minute statistics. There were always things happening at that statistical level, but you never cared because it was only one chip in 100 million. But now it’s a much bigger deal.”
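The arithmetic behind that statement is worth spelling out. Using round illustrative numbers (not PDF Solutions data): with 100 billion contacts, each failing independently at a one-in-a-billion rate, the expected number of failing contacts is the product of the two, and rare independent failures follow an approximately Poisson distribution, so the chance of a fully clean wafer is vanishingly small.

```python
import math

# Illustrative round numbers, not real fab data.
n_contacts = 100e9   # contacts and vias on a wafer
p_fail = 1e-9        # per-contact failure probability ("one in a billion")

# Expected number of failing contacts on the wafer.
expected_fails = n_contacts * p_fail  # 100

# For rare independent failures the count is ~Poisson(lambda), so the
# probability that every single contact works is exp(-lambda) -- which is
# effectively zero here. "One in a billion" is nowhere near good enough.
p_all_good = math.exp(-expected_fails)

print(f"expected failing contacts: {expected_fails:.0f}")
print(f"probability of zero failures: {p_all_good:.1e}")
```

To drive the expected failure count below one per wafer at this contact count, the per-contact failure rate would have to be pushed below roughly one in 100 billion, which is why such minute statistics matter.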

Tooling is still playing catch-up. “It’s clear that we need these advanced packaging technologies,” said Andy Heinig, group leader for advanced systems integration in Fraunhofer IIS’ Engineering of Adaptive Systems Division. “But the system designer has all these different options for package types, and to integrate functionality into a package. Each of these packages has advantages and disadvantages, and the package prices differ. So it’s important for the system designers to decide which is the right type of package for a specific application, because if it’s the wrong decision it’s difficult to go back into the design process because a lot of detailed design parts are now in place.”

Putting chips together into an advanced package doesn’t necessarily make things easier or less expensive, and there are potentially multiple sources of variation that can creep into the chips/chiplets, including the way they are assembled in a package, and even the way they are used in the end device.

Variation can include die shifting, warping of individual die when they are bonded together at the wafer level, and variations in materials such as thin films and substrates. Some of these variations are so tiny that inspection doesn’t always spot them.

“Everybody’s got a different scheme,” said Rama Puligadda, CTO at Brewer Science. “Performance, power and area/cost are what people are after, so how can I get the best combination? There are many possible ways. We see reconfigured wafers in different mold compounds, and those mold compounds are not the same everywhere. They are very flexible in some cases, very rigid in others, highly stressed and warped in one case and not in another. We have to deal with all of that. There is different variability in the process schemes, and with our material and process technology.”

That variation can be compounded by other variation, as well. “If you start with warpage, then you deal with that with a certain limitation technology or bonding technology,” Puligadda said. “But they still didn’t get it to zero, and now they’re doing something else on top of that, adding another layer to that complexity.”

Uneven aging
One of the more challenging aspects of variation is uneven aging of components in a package. This can vary by end application, by individual use case or location, and by the process nodes at which different chips or chiplets were developed. But it becomes especially challenging for packages that are developed for applications such as automotive, where these devices are expected to work to spec for 10 to 20 years.

“Everything ages differently,” said Prashant Goteti, senior principal engineer at Intel. “If we take lifespan as an example, everything ages differently already. So you need to be able to manage that. And you need to bring in adaptive abilities to be able to do this. There’s no way to put that genie back in the bottle. We’re in a world where we’re going to have systems-in-packages with tens, if not hundreds, of chiplets in them. So we’ve got to figure this out. And the way to figure it out is to make it smart and adaptive, and to bring that into the lifecycle management system.”

Packaging, with all the different possible permutations, makes it much harder to design once and repeat those steps for all chips. And that has an impact on reliability, performance over time, and even decisions about how systems should be partitioned and prioritized.

“As with every question about reliability, the answer is, ‘It depends,’” said Rob Aitken, distinguished architect at Synopsys. “Thermal differential can be a problem from a packaging reliability standpoint. Does this thing behave as advertised because I didn’t sign it off on a corner where this part is cool at that time?”

Homogeneous computing is becoming cost-prohibitive at the most advanced process geometries. Just putting more transistors on a die no longer results in significant improvements in performance, power and area/cost. But building complex, customized packages with multiple chips or chiplets can be a complex and costly undertaking.

As the packaging industry matures and expands its focus, the industry may settle around certain architectures and platforms with many choices of well-characterized components. But variation will continue to be a concern because there is simply more going on within those packages. And with increasingly dense packages, variation will create some unintended effects that need to be recognized and dealt with at all stages of the design through manufacturing flow, and even into the field. This will require a combination of better EDA tooling, more uniform packaging approaches — essentially the new platform — and a lot more traceability and monitoring of the various pieces involved.

Advanced packaging is the future. But it’s going to require a massive and collective effort by the entire chip industry to make this work consistently, reliably, and fast enough to hit market windows in very specific domains.

Further Reading:
Strategies For Faster Yield Ramps On 5nm Chips
Smart software finds more EUV stochastic defects and missing vias, improving wafer yield.
Big Changes In Materials And Processes For IC Manufacturing
Broad set of changes in semiconductor manufacturing, packaging, and materials, and how that will affect reliability, processes, and equipment across the supply chain.


Dr. Dev Gupta says:

Reducing manufacturing variations requires investment in R&D, which is not happening anymore for Adv. Packaging

Dr. Appo van der Wiel says:

Heterogeneous assembly is saying goodbye to top-down design. The heterogeneous system design will set important requirements for the CMOS design. But because heterogeneous assembly is not standardized, it is difficult to specify these CMOS requirements up front. Where do you begin? It takes (too much) time to go through several design cycles.
