Why Chiplets Don’t Work For All Designs

Getting this wrong can increase power and cost, while reducing performance.


Experts at the Table: Semiconductor Engineering sat down to discuss use cases and challenges for commercial chiplets with Saif Alam, vice president of engineering at Movellus; Tony Mastroianni, advanced packaging solutions director at Siemens Digital Industries Software; Mark Kuemerle, vice president of technology at Marvell; and Craig Bishop, CTO at Deca Technologies. What follows are excerpts of that discussion, which was held in front of a live audience at the recent Design Automation Conference. Part one of this discussion is titled Preparing For Commercial Chiplets.


[L-R] Ed Sperling, moderator; Mark Kuemerle, Marvell; Craig Bishop, Deca Technologies; Tony Mastroianni, Siemens EDA; Saif Alam, Movellus.

SE: What kinds of problems do you encounter when working with chiplets?

Kuemerle: There are lots of challenges. You don’t build something with chiplets just because you want to use chiplets. You have to find the right application that really needs it, because it does drive a lot of complexity. We end up adding area to support interfaces. And we add complexity just to do signal routing on the package if you create multiple blockages in different places through the use of templates. It’s not always a good payoff for doing that, so you’ve got to make sure you’re solving a real problem.

Mastroianni: In a previous life, I’ve also done 2.5D designs. One of the things that surprised us is that normally when you do a chip design, you would characterize the package. And once you knew the power on the chip, the problem was pretty much solved. But when you’re putting multiple chiplets in a package, that profile is very different. We had a problem where we followed guidelines from the foundry on how close to place the HBM to the chip based on reliability work that had been done out there. But what we didn’t know was we got thermal coupling from that ASIC. That heated up the HBM and knocked it out of spec. So you really need to start simulating the specs, whereas with homogeneous design it’s not as critical. You don’t have to do warpage or thermal analyses, or deal with a whole other set of problems.

Kuemerle: We’re also pushing things closer together so we can reduce interface power as much as we can. And as soon as we do that, we create huge thermal challenges.

Mastroianni: And when you start stacking vertically, the thermal issues get even worse. Thermal is a big challenge.

SE: Is that any different with chiplets than it is with heterogeneous integration on a single die? Power and thermal are still major issues.

Kuemerle: You can create new thermal problems just by building a multi-chip system. Whenever you add interfaces, you’re adding power to communicate between two chips that would have been monolithic. If you’re using enough bandwidth, you can create hotspots just moving data in between parts.

Bishop: For one AI customer, we were evaluating a design with them that was split into 16 chiplets. And when you added up the interface power between all those chiplets running at a super-high bandwidth, and at not that much less than 2mm die-to-die spacing, with a UCIe protocol, it was still a good 10% or more of the total system power just moving data from chiplet-to-chiplet in an increasingly confined space.
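As a rough illustration of how interface power can reach that kind of share, the sketch below multiplies link bandwidth by energy per bit and compares the result with the rest of the system's power. Every number in it is an assumption chosen for illustration (a 4x4 grid of chiplets with nearest-neighbor links, 4 Tb/s per link, 0.5 pJ/bit, and 400 W for everything else). None of these values comes from the panel or from the UCIe specification.

```python
# Back-of-envelope die-to-die interface power. All numbers are hypothetical.

def link_power_watts(bandwidth_tbps: float, energy_pj_per_bit: float) -> float:
    """Power of one link: (bits per second) * (joules per bit)."""
    bits_per_second = bandwidth_tbps * 1e12
    joules_per_bit = energy_pj_per_bit * 1e-12
    return bits_per_second * joules_per_bit


def mesh_link_count(rows: int, cols: int) -> int:
    """Number of nearest-neighbor links in a rows x cols grid of chiplets."""
    return rows * (cols - 1) + cols * (rows - 1)


def interface_share(num_links: int, bandwidth_tbps: float,
                    energy_pj_per_bit: float, other_power_watts: float) -> float:
    """Fraction of total system power spent moving data between chiplets."""
    interface_watts = num_links * link_power_watts(bandwidth_tbps, energy_pj_per_bit)
    return interface_watts / (interface_watts + other_power_watts)


# Assumed example: 16 chiplets arranged as a 4x4 grid.
links = mesh_link_count(4, 4)
share = interface_share(links, bandwidth_tbps=4.0,
                        energy_pj_per_bit=0.5, other_power_watts=400.0)
print(f"{links} links, interface share of system power: {share:.1%}")
```

Under these assumed values the interface share lands a little above 10%, and it grows linearly with both the number of links and the bandwidth per link, which is why a finer partition can cost more power than it saves.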

Kuemerle: And your signal integrity and thermal integrity problems get worse.

Alam: There are some protocols — Bunch of Wires, UCIe, and there are non-standard interfaces. But if you have multiple interfaces, you’re burning a lot of power. You have to account for that thermal activity because of the interfaces for these chiplets.

Mastroianni: But the alternative is having those chips on a board.

Kuemerle: Yes, that’s a lot worse, but they are spread out, at least. So it is a real tradeoff from a thermal point of view.

Mastroianni: You have to find the right size. You can’t make the chip too big, or the yield will be worse. And you can’t make it too small or the overhead dominates. When you’re doing partitioning, you have to include a lot of different factors.
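The sizing tradeoff can be sketched with a simple Poisson yield model, Y = exp(-A * D0), plus a fixed interface area added to every chiplet. This is a minimal sketch under stated assumptions, not a real cost model. The 800 mm² system, the defect density of 0.001 defects per mm², and the 10 mm² of interface area per chiplet are all hypothetical, and packaging cost, assembly yield, and test cost are ignored.

```python
import math

# Hypothetical inputs, chosen only to illustrate the shape of the tradeoff.
SYSTEM_AREA_MM2 = 800.0      # logic area of the whole system
DEFECT_DENSITY = 0.001       # defects per mm^2 (assumed)
INTERFACE_AREA_MM2 = 10.0    # die-to-die interface area per chiplet (assumed)


def die_yield(area_mm2: float, defect_density: float = DEFECT_DENSITY) -> float:
    """Poisson yield model: probability that a die has zero defects."""
    return math.exp(-area_mm2 * defect_density)


def silicon_per_good_system(num_chiplets: int) -> float:
    """Silicon area (mm^2) consumed per good system, assuming bad dies
    are screened out before assembly."""
    # A monolithic die (num_chiplets == 1) needs no die-to-die interface.
    overhead = INTERFACE_AREA_MM2 if num_chiplets > 1 else 0.0
    chiplet_area = SYSTEM_AREA_MM2 / num_chiplets + overhead
    return num_chiplets * chiplet_area / die_yield(chiplet_area)


candidates = [1, 2, 4, 8, 16, 32]
for k in candidates:
    print(f"{k:2d} chiplets: {silicon_per_good_system(k):7.1f} mm^2 per good system")

best = min(candidates, key=silicon_per_good_system)
print("lowest silicon cost among candidates:", best, "chiplets")
```

With these assumed inputs the cost falls as the die shrinks and yield improves, then rises again once the fixed interface area dominates each small chiplet. That is the shape of the tradeoff described above. The location of the minimum depends entirely on the assumed numbers.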

Kuemerle: And there’s definitely a lot of applications where it totally makes sense. I’ve been a chiplet evangelist for a long time. I don’t want to come off as a naysayer. There are a lot of places where you need them. But we just have to be careful that we don’t turn everything into a chiplet. Just because you can doesn’t mean you should.

SE: Do chiplets need to be characterized in more ways than we typically do with other chips? There are factors like noise and variation and potentially uneven aging.

Mastroianni: That’s a big issue. Just looking at a chip, you do have process variation within the chip. But if you’re talking about different chips from different technologies, you have huge variance, and you have to be able to deal with it. The high-speed interfaces manage some of that. And there are techniques such as speed binning. What if you get a slow part and a fast part? So for chiplets, you may want to do different binning. And you may want to do dynamic voltage scaling.

Bishop: And that’s just margins on your chiplet. But now you’re buying this advanced packaging technology to put them together, and that has its own variation and its own corners. And when you’re talking about these very big packages that are integrating a lot of silicon, those processes can vary significantly within your large package size. So even if you have the same bin for all your chiplets, you still may have an issue.

Alam: Say you have heterogeneous chips. How are you going to target a timing margin? Are you going to target a slow-slow part versus a fast-fast part? Or are you going to bin it? How are you going to do that if you have 20 chiplets? It becomes more and more challenging.

Mastroianni: It is, but using standard interfaces is a way to deal with that.

Kuemerle: A critical piece of that is test. We’re putting more and more function together in a package. Now you’ve got lots and lots of dollars with the silicon and interposers and connectivity, and if we don’t have near-automotive-grade test on every one of those die, we’ll end up having a big challenge: 0.9^n, where n is the number of chiplets that you have to solve for. That definitely drives a huge focus on test so you can actually make some parts to bring home.
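The compounding behind that remark can be made concrete. The sketch below assumes each die in the package is good with the same probability, independently of the others, so the chance that all n are good is that probability raised to the n-th power. The 0.9 figure is the one quoted above. The function names and the 90% package target are illustrative assumptions.

```python
def package_yield(per_die_yield: float, num_chiplets: int) -> float:
    """Probability that every die in the package is good, assuming
    independent dies that share the same per-die yield."""
    return per_die_yield ** num_chiplets


def required_die_yield(target_package_yield: float, num_chiplets: int) -> float:
    """Per-die yield needed to hit a package-level target (inverse of the above)."""
    return target_package_yield ** (1.0 / num_chiplets)


for n in (1, 2, 4, 8, 16):
    print(f"n = {n:2d}: package yield at 0.9 per die = {package_yield(0.9, n):.3f}")

# Assumed target: 90% of assembled 16-chiplet packages should be good.
needed = required_die_yield(0.9, 16)
print(f"per-die yield needed for a 0.9 package yield with 16 chiplets: {needed:.4f}")
```

At 0.9 per die, a 16-chiplet package comes out good less than one time in five, and holding a 90% package yield requires each die to be good more than 99% of the time. That is the arithmetic behind the push for near-automotive-grade test on every die.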

Bishop: That’s not just known good die.

Kuemerle: Exactly.

Mastroianni: It’s chiplets in a chain.

SE: We’re also used to looking at variation in terms of the chips or chiplets, but it also varies by foundry, right? So now you have to take all of this into account early in the design phase.

Bishop: Having multiple foundries also influences whether you’re able to do that integration at all. If you have chiplets coming from competing foundries, and you’re relying on the packaging services from one of those foundries for your assembly, you may find yourself in trouble very quickly. ‘Oh, I can’t bring in a chiplet from this other foundry and integrate it into the packaging solution that I normally use.’

Mastroianni: There are a lot of fixes today in 2D to deal with this. But when you’re talking about 3D, even if it’s all in the same technology you’re going to have variation. And if you’re talking about different technologies, all bets are off. That’s going to be one of the biggest challenges — being able to do timing optimization and power optimization with different technologies. We’ll get there, but it’s not going to happen overnight.

SE: And most people assume when you’re getting a 5nm chip that it’s the same as another 5nm chip. But the reality is that you’ve got various processes even at 5nm with fairly significant variation, right?

Kuemerle: Yes, absolutely. It’s a massive problem to solve. For a lot of the standardization work, if we can align on bump patterns and bump pitches and metallurgy, that’s really going to help enable us to mix and match all these different technologies.

Mastroianni: The whole over-the-wall flow of system design, IC design, package design is a serial process these days. Everything has to be done concurrently — the tools, the disciplines, and the cooperation between the different design teams.

Bishop: The days of your packaging group that you never talked to being in the other building are probably coming to an end.

Kuemerle: We haven’t had those days in a while.

SE: We alluded to this, but how much overhead is there from using chiplets versus putting everything onto a single die?

Kuemerle: From my experience, it can be significant. And that goes into the cost side, as well. Die-to-die interfaces are getting smaller, but as they get smaller people use more of them. That’s a common design problem. If you give somebody an improvement, they’re going to want more bandwidth. You can see significant die area overhead, on the order of 10% to 15%, just by bringing in interfaces. And when you add the overhead of interposers or connectivity, from a packaging point of view, it starts adding up. If you’re not putting chips right next to each other, you could have gaps needed for underfill, and that drives package sizes up, as well. So they can be pretty significant.

Mastroianni: If you can fit your entire system on one die, you’re going to get the best power, performance and unit cost. But there are limits. You can only build a chip so high, and at reticle size you hit the wall. You have yield issues, as well.

Alam: This is a real problem for large systems, and the tradeoff at a certain point is that it either makes sense to go chiplets or it doesn’t. You don’t want the overhead of interfaces and power if you don’t need it. But at that point there’s also another consideration. If you have monolithic die, when you run holistic timing, you have to do that within a certain window. That means the number of iterations you can do goes down. So a lot of times you break it into partitions in the clock. But that’s another consideration. There are other sign-off requirements that slow down as you get larger and larger modules.

Bishop: In a previous life, I was a software guy, and in the last 10 years the software people have done this transition not for performance and overhead reasons, but for organizational reasons. They take what they call monolithic software, just like monolithic SoCs, and they split it. They call them micro-services. We call them chiplets. But it’s a very similar idea where you partition this big thing into smaller pieces, and then you integrate it back together. And they have a very similar problem where you pay an overhead for these remote procedure calls and for stitching these back together. But a big motivator behind it was when you have a huge organization all working on this together at the same time, it’s much more efficient to divide your organization, and divide and conquer, so that organization can ship software independent of my timeline. That chiplet organization can tape out independent of my chiplet timeline. And on a very large system, that can be a huge impact on your whole design organization.

Alam: When you have a large monolithic SoC, you have 20 IPs going in, but they’re not all coming in at the same time. So if you can isolate the IPs and work on the early ones first and have the late ones lag, that’s what you want. This is not the motivation, but it is key to the design.

Kuemerle: That’s why you see a lot of multi-technology integration happening, as well. If you’ve got IP that you’ve already been able to prove in an n minus one node, you can go ahead and use a more advanced node for your core functionality and be confident that your SerDes IP or your memory interface IP is going to be functional and well wrung out.

View part one of this discussion: Preparing For Commercial Chiplets.



