Making Heterogeneous Integration More Predictable

Engineering teams, methods, and modeling need to be rethought. One size doesn’t fit all, and defects are inevitable.


Experts at the Table: Semiconductor Engineering sat down to discuss problems and potential solutions in heterogeneous integration with Dick Otte, president and CEO of Promex Industries; Mike Kelly, vice president of chiplets/FCBGA integration at Amkor Technology; Shekhar Kapoor, senior director of product management at Synopsys; John Park, product management group director in Cadence's Custom IC & PCB Group; and Tony Mastroianni, advanced packaging solutions director at Siemens Digital Industries Software. What follows are excerpts of that conversation.

[L-R] Dick Otte, president and CEO of Promex Industries; Mike Kelly, vice president of advanced packaging development and integration at Amkor Technology; John Park, product management group director in Cadence's Custom IC & PCB Group; Shekhar Kapoor, senior director of product management at Synopsys; and Tony Mastroianni, advanced packaging solutions director at Siemens Digital Industries Software.

SE: Can you partition redundancy in heterogeneous designs so it doesn’t impact performance and power as much as it does in an advanced-node planar SoC?

Otte: It depends on the function being performed. If you’re transmitting data from one point to another, the addition of two or three more channels to the 32 you’re using for information will provide one or two error correction bits. It will not slow anything down other than the last cycle, where you’re looking at the information to output. But if it’s one of the sequential processes, then putting in redundancy can slow it down. The only way to add redundancy there is to duplicate the central processor and run three or four of them in parallel, and that starts to be pretty costly. So it depends on the kind of correction, and the kind of error you’re trying to fix. A lot more work needs to be done to evaluate the methodology for getting redundancy built into systems. The good news is that we’ve now got billions of transistors available, so we ought to be able to make use of them to provide this redundancy and correction capability.
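Otte's point about spare channels can be illustrated with a classic Hamming code. The sketch below is a toy example, not Promex's actual scheme: three extra "channels" (parity bits) protect four data bits, letting the receiver correct any single flipped bit. A real 32-channel link would use a wider SECDED-style code, but the principle is the same.

```python
# Toy Hamming(7,4) sketch: 3 parity bits protect 4 data bits, so the
# receiver can correct any single-bit error. Illustrative only.

def encode(d):
    """d: list of 4 data bits -> 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]."""
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]

def decode(c):
    """Correct up to one flipped bit, then return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # checks positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # checks positions 4, 5, 6, 7
    syndrome = s1 + 2 * s2 + 4 * s3  # 1-indexed position of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1         # flip it back
    return [c[2], c[4], c[5], c[6]]

word = [1, 0, 1, 1]
sent = encode(word)
sent[2] ^= 1                         # one channel delivers a bad bit
assert decode(sent) == word          # the receiver still recovers the data
```

Note that the correction work happens only at the decode step, which matches Otte's observation that nothing slows down except the last cycle, where the information is examined for output.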

Mastroianni: HBM has redundant channels built in, and there are test methods that can reallocate those on the fly via BIST (built-in self-test). And redundant memory cells have been around for some time, and they’re relatively low overhead. For the processor, there are architectures with lots of replicated structures. AI is a good example of that, where you have thousands of small processors. If you can build repeated structures into your architecture, so that when something fails it can fail gracefully, then you can disable it. There are tools that allow you to build that testability and repairability around some of those defects. These systems are getting larger, and there are going to be defects. It’s not ‘if,’ but ‘how many?’ For some applications you have to pay the overhead, but there are different techniques and tools, and that will have to be part of the process. There is no one-size-fits-all. There are lots of different techniques to build redundancy into software and processors, and those are critical for many applications.

Park: It depends on the application. Redundancy comes at a cost, which might impact how you cool it, because maybe you’re adding liquid cooling. Where I see a lot of this redundancy is in AI/ML, and stacking logic wafers onto memory wafers. They’re building in a bunch of redundancy. It sits in a mainframe where it can be liquid cooled, and it works in that environment. But if you try to do something like that for a smartphone, it will burn up and melt in your hand. The end product is going to determine how much redundancy you can bring in.

Kelly: A packaging approach for redundancy is to put redundant signaling between die. Let’s say one of your final wear-out modes is a broken trace in RDL. How can you add two or three so that what was a slightly less-than-perfect manufacturing outcome never becomes a product failure? That doesn’t come for free, though, because you need 20% more signals to do this on critical lines in high-risk areas. Maybe that’s another layer of material you need in your module. And all of that costs money.

Mastroianni: With 2.5D applications, most of those high-bandwidth connections are done with standard protocols, so they have redundancy built into it. When you get into 3D, that’s not the case. Chips are talking directly to other chips, so you have to build that into the design. You’re not talking about hundreds of thousands of connections. It’s hundreds of millions, and you have to deal with this because you’re not going to have 100% reliability.

Kelly: From what you’ve seen in 3D copper hybrid bonding, where you literally have hundreds of thousands of interconnections, are there still single electrical signals for a single bond? Or isn’t anyone going to do that?

Park: There are two different types of 3D. We have bump 3D, with micro-bumps that we’ve been doing in memory and CMOS for 10 years-plus, so there’s nothing new there. The new stuff is the hybrid bond, where people are talking about 2 micron pitches. But if you look at how people are actually using that hybrid bond today, it’s L3 and L4 cache. They’re taking SRAM that would be too far away, and putting it directly above the processing logic using hybrid bonds. That’s a big, wide bus, and there’s redundancy built into it.

Mastroianni: Most of that is really about power today. You don’t need 100% there. But when you get to true 3D, where those signals are talking to each other, it’s going to be an issue.

Kapoor: Redundancy is coming into the picture with all these hybrid bond 3D connections. But for a single signal, are you building in redundant bits? That is not unheard of. Even for testability reasons, you need to have extra lanes. And you might have to do more testing for stacked cache and logic. Luckily, you have many more bonds available.

Mastroianni: They’re so small that you probably can make a connection with a half-dozen hybrid bonds for each signal crossing. That’s partly built into the libraries and tools, and it’s still a work in progress. As we build these devices, we’re going to see new problems and we will have to figure them out.
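Mastroianni's half-dozen-bonds-per-signal idea is easy to quantify with a back-of-the-envelope model. The per-bond defect probability and signal count below are assumed for illustration, and the model assumes bonds fail independently, so treat it as a sketch rather than real process data.

```python
# Back-of-the-envelope: if each hybrid bond fails independently with
# probability p, a signal backed by k parallel bonds is dead only when
# all k bonds fail (probability p**k). Numbers are illustrative.

def expected_failed_signals(p, k, n_signals):
    """Expected number of dead signal crossings out of n_signals."""
    return n_signals * p ** k

p = 1e-4          # assumed per-bond defect probability
n = 100_000_000   # "hundreds of millions" of signal crossings

print(expected_failed_signals(p, 1, n))  # one bond per signal: ~10,000 dead signals
print(expected_failed_signals(p, 6, n))  # six bonds per signal: effectively zero
```

Even with generous defect rates, a handful of parallel bonds per crossing drives the expected failure count to a negligible level, which is why this style of redundancy can be built into the libraries and tools.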

SE: Where are the gaps today in putting heterogeneous designs together? Is it the EDA tools, the processes for test, measurement, and inspection, or is it that people need to interact differently to understand how the different pieces go together?

Mastroianni: It’s all of that.

Park: The basic problem is that the worlds of system design and ASIC design are converging. A monolithic ASIC designer didn’t have to worry too much about thermal in the past. Now, if you’re stacking things, that’s your number one concern. If you’re designing a monolithic chip, chiplet-to-chiplet signal integrity compliance is a foreign term. But now, when you break things into chiplets, you have a UCIe or BoW interface that requires signal integrity tools. On the board side, we’ve been doing that for 30 years, but now who does it? You can talk to design teams and ASIC designers, and they’ll tell you something is the package designer’s job, and the package designers will tell you it’s someone else’s job. So who does what?

Kapoor: Absolutely, and one of our customers said it very well. There are no 3D engineers being born. It’s all 2D engineers who overnight have to become 2.5D and 3D engineers. In the SoC world there are methodologies, reference flows, and PDKs that have been developed over the years, and we’ve become used to them for doing the design. It’s not necessarily about retooling everything. When you bring all these pieces together, you cannot look at it as a packaging guy’s problem or a silicon engineer’s problem. You have to start looking at it together. There are more components, more issues to handle with the methodologies and the standardization. In the last year and a half, there has been a lot of progress in the industry with UCIe, and TSMC and Samsung 3D code coming into the picture. But you still need more to address this many pieces. With 3D designs, it’s different. We used to do 2D designs, then assemble and integrate them into a package. Now we’re mostly talking about co-design. I need to know what package I’m targeting, and take that into consideration if I’m doing any optimization early on, whether that’s floor-planning or bump location. Is the IP I’m going to use optimized for face-to-face, or face-to-back? That brings in the IP design aspects earlier on. And then, moving toward this new paradigm of complete 3D co-optimization, 3D place-and-route comes into the picture. With cache on top of logic, how is your functionality split between the two dies? That will require some guardrails, because without them the scope is so huge that you won’t be able to do everything that you want. You’ll never get the chip out the door. Those methodologies are a huge area of focus, in addition to tooling enhancements and having more unified solutions.

Otte: One area that needs to have a little more attention than it’s gotten so far is design of the interposers and substrates that all of these complex I/O die go down on. There are various technologies for building them. There are silicon devices that will get you down to 1 micron lines and spaces. There are organic substrates that are more like 1 mil lines and spaces. And we’ve got to put more energy into learning how to design these things so you can assemble them and avoid the warpage and co-planarity issues that make assembly so hard. More attention needs to be paid to that in the future.

SE: A lot of this depends on much better flow of data from the front-end to the back-end, and then back to the front-end. But everyone is petrified of sharing data because they’re afraid their secret sauce recipe is going to be stolen. How do we manage this data more effectively?

Park: One of the first things we did was to create an aggregation platform that everyone can access — the package designer, the ASIC designer, the signal integrity person — because you need that collaboration/aggregation platform. And you don’t work down at the transistor or standard cell level. It’s a higher-level abstract representation. But that’s the difference here. With a monolithic die, I can take my RTL, synthesize it, and do my place-and-route. This is multiple things coming from multiple places, and all the EDA providers recognize the need for that base system-level planning solution. It becomes THE component that gets everyone talking: package designers to chip designers, to board designers, to the electromagnetics people and the signal integrity people, bringing them all together on that common platform. There’s been a lot of progress on the EDA side, and we’ve created some solutions.

Mastroianni: You’re talking about system designers or RTL guys working on their own databases and using their own tools. You have the package designers, the chip designers, and the thermal engineers, and they’re all using different tools. And they may be in different organizations all over the world. But all this data is required to do the design and the analysis, so you really need a very robust data management system. We call it IP lifecycle management. Those tools manage at the meta level, on top of the design management systems. So each of the design teams is going to manage its own data, but that data needs to be aggregated. You can do that with metadata, but you also need to make sure you’re tracking all the dependencies. If something changes in one piece, that information needs to be reflected throughout. Someone may make a minor change, and that may impact an interposer layout. Or you may have a new piece of software IP that got updated and needs to be put on a chiplet. It’s not just having all the data management. It’s also the dependencies and the digital threading. All of that data is critical. But it’s also an astronomical amount of data. And then, once you get into traditional product lifecycle management, that’s the next level, which sits on top of all this and manages the BOM once you get to manufacturing.
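The dependency tracking Mastroianni describes reduces, at its core, to a graph walk over artifact metadata. The sketch below is a toy illustration with made-up artifact names, not Siemens' IP lifecycle management tooling: when one piece changes, every artifact downstream of it is flagged as needing regeneration.

```python
from collections import defaultdict, deque

# depends_on: artifact -> the artifacts it is built from (toy example names).
depends_on = {
    "interposer_layout": ["chiplet_a_bumpmap", "chiplet_b_bumpmap"],
    "package_substrate": ["interposer_layout"],
    "thermal_analysis":  ["package_substrate", "chiplet_a_bumpmap"],
}

def stale_after(changed, depends_on):
    """Return every artifact that must be refreshed when `changed` changes."""
    dependents = defaultdict(list)          # reverse edges: input -> users
    for artifact, inputs in depends_on.items():
        for inp in inputs:
            dependents[inp].append(artifact)
    stale, queue = set(), deque([changed])
    while queue:                            # breadth-first walk downstream
        for user in dependents[queue.popleft()]:
            if user not in stale:
                stale.add(user)
                queue.append(user)
    return stale

# A minor change to one chiplet's bump map ripples through the whole stack.
print(stale_after("chiplet_a_bumpmap", depends_on))
```

The interesting part in practice is not the walk itself but keeping the edge list honest across teams and tools; that is the digital threading problem.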

Kapoor: From a design point of view, there are two broad areas in developing heterogeneous systems where you need to leverage uniformity for efficiency reasons. One is between the design teams and the ecosystem — whoever is providing the PDKs, the rules, and any models. That is an area where there’s clearly effort underway to add uniformity, because standardization and interoperability are key to getting the whole ecosystem buzzing and moving efficiently. The other area is within the design flow, where there are many disaggregated pieces. We would like to integrate them all together. We talked about package, design, and the system level. You can start with C-level models and RTL models, and then there are specific analysis-related models and simulation models. From an implementation point of view, the fundamental challenge is to do all of this in a much more integrated way. There can be one model used to transfer the data, broadly split into implementation models and electrical verification models. We have been offering an implementation platform, but now we need to look at it from a system analysis point of view and integrate all of that together. That’s an emerging area. We need interoperability, and then flow efficiency through grouping of these models.

Kelly: There’s been a modest effort over the last five years to come up with the rough equivalent of a PDK at the package level, which is really just concerned with clearances, part spacing, design rules, layer count limitations, and things like that. It helps the front-end guy doing the silicon architecting, especially if it’s multiple chiplets where they’re trying to figure out some sort of a coarse layout to estimate what kind of interconnectivity is required to get the bandwidth between the dies. That drives layer count, or line-and-space. And everybody’s interested in cost right at the front. Some of those PDK-like tools help the designers in the first cycle gain a rough view of the packaging. Comprehensive design rules can take into account things like line-and-space, helping them define the cost of the package and weigh it against other tradeoffs in the silicon world. It’s very rough, obviously, and being able to integrate that kind of intelligence into a cockpit where you do everything is still a ways off.
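The kind of coarse, first-cycle estimate Kelly describes can be as simple as dividing the signals that must escape a die edge by the routable tracks per layer. The formula and numbers below are an assumed toy model, not Amkor's design rules; it ignores vias, power/ground routing, and keep-outs.

```python
import math

def rdl_layers(signals, edge_mm, line_um, space_um):
    """Rough routing-layer estimate: how many layers are needed for
    `signals` escaping along one die edge at a given line/space.
    A toy model, ignoring vias, power/ground, and keep-outs."""
    tracks_per_mm = 1000.0 / (line_um + space_um)   # one track per pitch
    tracks_per_layer = edge_mm * tracks_per_mm
    return math.ceil(signals / tracks_per_layer)

# 10,000 die-to-die signals escaping a 10 mm edge at 2/2 micron line/space:
print(rdl_layers(10_000, 10, 2, 2))   # -> 4 layers
```

Coarsening to 4/4 micron line/space doubles the estimate to eight layers, which is exactly the layer-count-versus-line-and-space cost tradeoff Kelly describes driving early decisions.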

Otte: In our case, we do a lot of final testing, often using a tester engineered by our customers that understands the functionality in depth. They usually have remote access to it, so as we test devices on a daily basis, they can see the test results. The test failures are of two types. One is assembly errors. The second involves fundamental things that are bad within the die, or something else that’s marginal in the device. One of the challenges for that engineering organization, and for our engineering organization, is to figure out which is which. Who is responsible for each of these failures? And who has to take action to minimize them to raise overall yields? We feed that information back to our process guys, who fix it, and we’re all in the same building. On the other hand, we occasionally run into a situation where it’s a fundamental error in the firmware, and somebody has to go and modify the code that has been downloaded into the memory of a device to eliminate that class of failure. That’s the other extreme. And there’s everything in between, of course, that goes on with these highly complex devices.

Read part one of this discussion:
Heterogeneous Integration Finding Its Footing
Read part two of this discussion:
What Can Go Wrong In Heterogeneous Integration
