Chiplet Reliability Challenges Ahead

Determining how third-party chiplets will work in complex systems is still a problem.


Assembling chips using LEGO-like hard IP is finally beginning to take root, more than two decades after it was first proposed, holding the promise of faster time to market with predictable results and higher yield. But as these systems of chips begin showing up in mission-critical and safety-critical applications, ensuring reliability is proving to be stubbornly difficult.

The main driver for the chiplet approach is the drop-off of power, performance and area (PPA) benefits from scaling. It’s more expensive and more time-consuming to develop chips at each successive process node, and the reasons for doing so are fading. TSMC’s 5nm finFET process “offers 15% faster speed at same power or a 30% power reduction at same speed with 1.84X logic density of the 7nm node,” said Geoffrey Yeap, senior director of advanced technology at TSMC, in a paper at the recent IEDM conference.

Prior to the finFET process nodes, chipmakers scaled transistor dimensions by 0.7X at each new node, enabling a 40% performance boost at the same power and a 50% reduction in area. That formula no longer applies.
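
The arithmetic behind that formula is simple: shrinking each linear dimension by 0.7X cuts area roughly in half. A quick sanity check, set against the 1.84X logic density figure TSMC quotes for 5nm:

```latex
% Classic full-node shrink: each linear dimension scales by 0.7X
A_{\text{new}} = (0.7L)(0.7W) = 0.49\,A_{\text{old}}
\quad\Rightarrow\quad \text{density gain} = \tfrac{1}{0.49} \approx 2.04\text{X}
% versus the 1.84X logic density TSMC reports for 5nm over 7nm
```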

That reduction in PPA, coupled with a demand for more customized solutions in end markets, has pushed chipmakers and systems companies to examine alternative approaches. Advanced packaging is a key part of this strategy, offering faster time to market with modular customization, almost unlimited area, and new configuration possibilities for improved throughput, thermal dissipation, and better management of other physical effects.

Intel, Marvell and AMD all have working silicon using this approach, and the rest of the industry is gearing up to follow suit, particularly for edge applications where customization and time-to-market are essential. The expectation is that if companies can develop customized solutions fastest for increasingly narrow market segments, they will take leadership roles in those segments. More modular approaches that can utilize specialized accelerators and algorithms are viewed as key requirements for achieving this goal.

This has made chiplets the big buzzword in the design world, and foundries and OSATs are ramping a number of different modular architectures using proven third-party hardened IP.

“A lot of the chiplet-level work is being done by the big systems companies,” said Vivek Chikermane, distinguished engineer at Cadence. “If you’re a mid-tier company, though, you may only make one chip. To make that work, more standardization is required. That’s why TSMC is looking at this for its ecosystem. There is some standardization that comes with the packaging. This puts the onus on the package integrator.”

Will they work?
Still, ensuring that chiplets will work as expected throughout their projected lifetimes in increasingly heterogeneous, multi-vendor architectures is not trivial.

“If you look at what TSMC is doing, they get trusted IP and they have a scoring system from a reliability perspective, and in the next evolution that will probably be available in chiplet form,” said Matthew Hogan, product director at Mentor, a Siemens Business. “What you’ll probably see is chiplets will become a commodity, where you have chip ‘wings’ inside of packages using standard interfaces.”

These so-called wings would work like add-ons for a pre-designed module, using standard interconnects for attaching to the main module. And while they would not provide as large an improvement in performance and power as a ground-up, multi-chip design, this approach does offer custom acceleration of specific algorithms and significant time-to-market improvements.

What goes wrong
The number of new ideas pouring into advanced packaging has turned into a flood over the past year or so. Moving off a single planar die into a package has opened up all sorts of possibilities.

“We’re starting to hear more about on-board optics,” said Rita Horner, senior staff product marketing manager at Synopsys. “The idea is to package electrical and optical together to get beyond 100 gigabits per second (Gbps) because you cannot afford packet loss. On a single chip, you also have to deal with thermal issues and even more complexity. Rising temperatures mean a device will age much faster. To get to 200 Gbps, you will need on-board optics. But how do you maintain that?”

The answer isn’t entirely clear. More work needs to be done on the design, verification and modeling side, and testing needs to happen continually throughout a product’s lifecycle. Moreover, all of this needs to be revisited and updated on a regular basis as algorithms change, and because devices within a package may age and degrade at different rates.

Placing chips in multi-chip packages can help with physical effects, but only if the architecture takes into account all of the various components in the package. It also can help to deal with rising data density, particularly in AI/ML chips, where the required processing no longer fits on a single reticle-sized die. In fact, some of the chips being developed today are being “stitched” together.

Advanced packaging doesn’t eliminate physical effects, but it does provide some additional options on the z axis to separate different components, such as two processing elements, to minimize heat. The problem for chiplet makers is they don’t necessarily know what their chiplet will be next to, and that can affect how it should be characterized for such effects as various types of noise, as well as heat. All of those need to be understood better and standardized.

An IDM would have a working knowledge of the proximity effects around different chiplets they have developed. But for others, these may be black boxes and numbers on a spreadsheet.

Improving reliability
“With chiplets, you’re really looking at a small system,” said Evelyn Landman, CTO of proteanTecs. “When you do wafer sort, you can see how each chip performs. And when you do final test, we also can see what’s going on with packaging. But in the field, you need to look at everything because anything can happen. So you need to be looking not only at the interfaces, but also inside the chip.”

This becomes particularly important because with third-party chiplets, characterization may vary.

“Today, whenever you use chiplets, you need to put the same communication IP on both sides,” said Landman. “Otherwise it won’t work. We are mostly agnostic about the PHY. All we require is that our Agents are inside the interface. There are a lot of options today. Everyone is doing something similar, but none is exactly the same. They’re using different base platforms with different speeds, different widths, different chips and different processes. As long as the PHY is the same, it doesn’t matter.”
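
proteanTecs does not disclose its Agent internals here, but the general pattern of in-field monitoring can be sketched simply: capture parametric baselines at final test, then compare live readings against them over the device’s lifetime. A minimal, hypothetical illustration — the monitor names, baselines, and threshold below are invented, not any vendor’s API:

```python
# Hypothetical sketch of in-field monitoring for a die-to-die interface.
# Monitor names, baselines, and the drift threshold are illustrative only.

BASELINE_MARGIN_MV = {"die0_tx": 142.0, "die1_rx": 138.0}  # captured at final test
ALERT_DRIFT = 0.15  # flag if margin drops more than 15% from its baseline

def check_link_health(read_margin_mv):
    """Compare live eye-margin readings against final-test baselines.

    read_margin_mv: callable mapping a monitor name to a live reading in mV.
    Returns a list of (monitor, live_reading, fractional_drift) alerts.
    """
    alerts = []
    for monitor, baseline in BASELINE_MARGIN_MV.items():
        live = read_margin_mv(monitor)        # query the on-die monitor
        drift = (baseline - live) / baseline  # fractional margin loss
        if drift > ALERT_DRIFT:
            alerts.append((monitor, live, drift))
    return alerts
```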

The idea of chiplets isn’t new. Multi-chip modules have been in use since the 1980s. But hooking together specialized dielets/chiplets/tiles using a standard interconnect is different, and the market needs and the technology are both converging.

“This has been talked about for a long time, and now it’s turning into reality,” said Doug Lefever, president and CEO of Advantest America. “This puts a higher degree of importance on the backend operation. The backend needs to look a lot more like the frontend in terms of processes, and the analytics that go along with that, and making sure the equipment is matched. That has to happen because packaging and chiplets are now part of the differentiation. This is all going to happen quickly, too. As opposed to a new processor, which is a multi-year cycle, you may see a new package come out in a matter of months and be put together with an architecture enabled by chiplets. It will require more sophisticated manufacturing systems — which is why we signed a deal with PDF Solutions — as well as dealing with the sheer velocity of this.”

In the past, understanding what went wrong often was the result of a post-mortem on a field failure years after a device was released into the market. The challenge for chiplets is to identify problems earlier in the cycle, even before the devices leave the fab, and preferably before they are packaged together.

“This is exactly the kind of thing that is enabled by end-to-end analytics combined with test data,” said John Kibarian, CEO of PDF Solutions. “When chips come back, you need to understand what’s functioning, and what’s not functioning and why. Then you need to look at all the other chips that were processed on that set of tools, which is why you need traceability in the assembly flow. What other chips were on that wirebond on that day?”

This also requires storing large volumes of data for extended periods of time, and that can be multiplied by the number of chiplets that were manufactured separately, sometimes using entirely different process geometries.
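
Kibarian’s wirebond question maps naturally onto a join between failure data and assembly history. A minimal sketch of that kind of traceability query, with a hypothetical schema (this is not PDF Solutions’ actual data model):

```python
# Hypothetical traceability query: given a failing packaged unit, find every
# other unit that passed through the same assembly tool on the same day.
# Records and field names are illustrative only.

from datetime import date

assembly_log = [
    {"unit": "PKG-0001", "tool": "wirebond-07", "date": date(2020, 3, 2)},
    {"unit": "PKG-0002", "tool": "wirebond-07", "date": date(2020, 3, 2)},
    {"unit": "PKG-0003", "tool": "wirebond-09", "date": date(2020, 3, 2)},
]

def units_sharing_tool(failing_unit, log):
    """Return all units processed on the same tool on the same day."""
    fail = next(r for r in log if r["unit"] == failing_unit)
    return [r["unit"] for r in log
            if r["tool"] == fail["tool"]
            and r["date"] == fail["date"]
            and r["unit"] != failing_unit]

print(units_sharing_tool("PKG-0001", assembly_log))  # ['PKG-0002']
```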

But it’s not just about using different chiplets in a package. Even planar chips are being developed with dielets or tiles. Cerebras, which garnered attention by putting everything onto a planar wafer-sized die, uses side-by-side dielets connected over a 100 petabit per second interconnect, which allows for extremely efficient routing between the channels. If something fails, those dielets are turned off.

Test
Use cases vary from one market to the next, and from one device to the next. Nevertheless, there are some common problems that need to be addressed. Testability is one challenge, particularly once these mini-systems are packaged together, because there are only so many test leads that can extend out of a package. As a result, test has to happen in stages, from individual die to the fully assembled package.

HBM, one of the best-known package technologies, offers some insight into the kinds of problems facing chiplet makers.

“Much of what we do is either early characterization through our engineering systems business or the disposition in the fab, or just coming out of the fab,” said Mike Slessor, president and CEO of FormFactor. “That determines whether a die should move on and become part of the HBM stack, for example, or be packaged into a standard logic substrate.”

As with DRAM, the challenge will be figuring out what is good enough in some of these applications, a problem that is not entirely solved because two marginally good die may create one bad package.

“You’re testing to make sure that each of these component die are functionally good, or good enough to be repaired in the final package,” Slessor said. “And because they’re being fabricated on fairly advanced nodes (at least 1x or 1y nanometer DRAM nodes), the yields are not great. And so it’s a simple functional characterization of making sure that the die that go into the stack are as close to good as they can get. I’m reluctant to use the term ‘known good die’ because it conveys the notion of a perfect thing, and nothing in the semiconductor industry is perfect. There’s a balance of cost versus risk that people constantly play with, and for DRAM there is some level of repairability and redundancy. So you see all of those different knobs being exercised. But HBM for sure has impacted not just the volumes of our DRAM probe card business, but also the spec requirements as they continue to tighten them up.”
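
The arithmetic behind that cost-versus-risk balance is unforgiving. Ignoring assembly defects, the yield of a multi-die package is roughly the product of the incoming die yields, so even marginally imperfect die compound quickly:

```latex
Y_{\text{package}} \approx \prod_{i=1}^{n} Y_{\text{die},i}
\qquad \text{e.g., four die at } 95\%:\; 0.95^{4} \approx 81\%
```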

So far, most of the 3D stacking has been memory on top of logic, or with HBM, memory on memory. “For that, you can test the interconnect between the logic and the memory,” said Cadence’s Chikermane. “But we do anticipate logic on logic, and that changes everything. In 2.5D and fan-out, you can do probing. With a 3D stack, that’s impossible. You need to design in and build in test bus access.”


Fig. 1: Test access architecture for testing individual, assembled and packaged dies. Source: Synopsys

Design for testability and traceability
And this is where things get significantly more complicated from a reliability standpoint. With 3D, design for test (DFT) has to extend beyond the individual die. There has to be a way of testing after packaging, and of re-testing throughout the device’s lifetime. A big challenge with chiplets is understanding all of that and creating standards, particularly as more third-party chiplets are used.

“With a variety of vendors developing chiplets, this becomes a truly democratized market,” said Mentor’s Hogan. “But now you have to decide who defines the electrical interfaces, what is the standard of care, how much ESD protection is required, and during assembly, how you transport it to the OSATs and assembly houses. There are still a lot of questions about who owns what part. We saw that with HBM, where they were required to have an exact pin-out. Similar things have to happen with chiplets.”

There also needs to be much better characterization of chiplets.

“A lot of these are like black boxes,” said Simon Rance, vice president of marketing at ClioSoft. “The industry is going to have to collaborate on a standard way of defining all the necessary characteristics in metadata format, and how and where to capture and store it. I see this as a similar problem to the one we saw with hardware/software interface IP management, which spawned the SPIRIT Consortium and eventually became IP-XACT and IEEE 1685. ClioSoft needs something similar defined for characterization of third-party chiplets in order for us to provide the tracking solution to the industry.”

While the individual chiplets can be tracked today, the industry needs to agree on a standard way to tag all of this information, Rance said.
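
No such tagging standard exists yet. Purely as a thought experiment, the kind of machine-readable record Rance describes might carry fields like these, covering the electrical, thermal, ESD, and test-access characteristics discussed above (every field name here is hypothetical, not a proposed standard):

```python
# Hypothetical, illustrative chiplet characterization record. IP-XACT
# (IEEE 1685) played a similar role for SoC IP metadata; nothing equivalent
# has been standardized for chiplets yet.

from dataclasses import dataclass

@dataclass
class ChipletDescriptor:
    vendor: str
    process_node: str          # e.g. "N5"
    die_to_die_phy: str        # interface the chiplet exposes
    max_junction_temp_c: float # thermal limit for floor-planning
    esd_hbm_rating_v: int      # human-body-model ESD rating
    test_access: str           # how the die is reachable after assembly

phy_tile = ChipletDescriptor(
    vendor="ExampleCo",
    process_node="N5",
    die_to_die_phy="example-d2d",
    max_junction_temp_c=105.0,
    esd_hbm_rating_v=500,
    test_access="boundary scan",
)
```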

That will go a long way toward understanding how to floor-plan a device in three dimensions, and improve the overall reliability of the device.

“You really need to understand what is the correct technology for ‘this portion’ or ‘that portion’ of a package,” Hogan said. “The problem is you don’t know what other chiplets will be next to that chiplet. You need to understand the electrical and physical requirements to make this work.”

Security
Another element that enters into the advanced packaging picture is security. In a safety-critical or mission-critical application, security equals reliability. Chiplets inherently are not less secure than any other IP, but how they are attached, how the data flows through a device, and where the chiplets are sourced all need to be considered.

“With any type of secure data communication, there are always concerns about both confidentiality and authenticity of the data,” said Scott Best, technical director of anti-counterfeiting products at Rambus. “This is true if the two parties are communicating across the world, or if they’re communicating across a 2.5D interposer in a heterogeneous SoC. So the number of attack surfaces — non-invasive, semi-invasive, fully-invasive — is not ‘solved’ by chiplets, but it’s not made that much worse, either. In general, SoCs formed with chiplets are ‘easier to attack’ than a monolithic SoC, but more secure than traditional chip-to-chip interfaces running across a PCB. A security concern would be that data used to communicate on very fine pitch (and difficult to reach) on-chip buses is now being communicated through ‘top-layer’ chip-to-chip interfaces, so those new interfaces present new targets for, say, power-analysis or man-in-the-middle (MITM) attacks.”
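
A standard mitigation for the MITM exposure Best describes is to authenticate every frame that crosses the exposed interface. Below is a minimal software sketch using a keyed MAC; real die-to-die links would implement this in hardware, typically with encryption as well, and the shared-key provisioning between the two dies is simply assumed here:

```python
# Minimal sketch: authenticating frames across an exposed die-to-die link
# with a keyed MAC. Key provisioning into both dies is assumed; real
# implementations live in hardware and usually also encrypt the payload.

import hashlib
import hmac
import os

key = os.urandom(32)  # shared secret provisioned into both dies (assumed)

def send(payload: bytes) -> bytes:
    """Append a 32-byte SHA-256 HMAC tag: frame = payload || tag."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag

def receive(frame: bytes) -> bytes:
    """Verify the tag before accepting the payload."""
    payload, tag = frame[:-32], frame[-32:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("tampered frame: MAC check failed")
    return payload

assert receive(send(b"read 0x1000")) == b"read 0x1000"
```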

Because chiplets increasingly will be developed by third parties, they also open the door for hardware Trojans. “It’s nearly impossible for an adversary to ‘inject’ a malicious circuit onto a production graphic database system (GDS) mask without an insider attack by the GDS-owner (i.e., the chipmaker),” Best said. “But in a chiplet environment, it is much more difficult to confirm non-malicious behavior of a chiplet that’s part of the heterogeneous solution. For example, does that 100G Ethernet PHY chiplet have an embedded hardware Trojan that the SoC will never know about? It might, by design, with no ‘insider attack’ necessary.”

Conclusion
As the industry begins to address third-party chiplets, it also will need to address the reliability aspects of a complete system. This requires more modeling, more testing, and more simulations, and it will require historical data.

Most experts believe the chip industry has no choice but to head in this direction. But shifting gears from 50 years of shrinking everything onto one die, including entire PCBs, to stacking separate dies in an advanced package and ensuring they will work reliably for the projected lifetime of that packaged system is hardly a simple next step. It will require innovation, lots of documentation, and many more standards for how chips are characterized, tested, inspected, measured and handled throughout this process. So far, we have just scratched the surface.

Related Stories
The Good And Bad Of Chiplets
IDMs leverage chiplet models, others are still working on it.
Chiplet Momentum Rising
Companies and organizations racing to define interfaces and standards as SoC scaling costs continue to rise.
Smaller Nodes, Much Bigger Problems
Ansys’ chief technologist digs into looming issues with device scaling, advanced packaging and AI everywhere.
eFPGAs Vs. FPGA Chiplets
Which approach works best where.


