Multi-die assemblies offer more flexibility, but figuring out the right amount of customization can have a big impact on power, performance, and cost.
The semiconductor industry is buzzing with the benefits of chiplets, including faster time to market, better performance, and lower power, but finding the correct balance between customization and standardization is proving to be more difficult than initially thought.
For a commercial chiplet marketplace to really take off, it requires a much deeper understanding of how chiplets behave individually and together. There needs to be a consistent way to connect chiplets to each other and to various other components, to characterize them so they can be re-used across multiple designs, and to package and test them. On top of all of that, there needs to be a way to accomplish all of this more easily at the very outset of the design process. And while this has some similarities to the soft IP market, the shift to what is essentially a collection of hardened IP requires more structural and thermal analysis, more physics, and a deeper understanding of how everything will be packaged and ultimately used.
“Each chiplet is a standalone piece of silicon, but it’s also a subsystem inside that major system, and this is a unique creature because it is not like a subsystem you have in an SoC,” said Moshiko Emmer, distinguished engineer at Cadence. “It has to be independent to some extent. You’re going to tape it out separately. You’re going to have silicon back. You want to at least test and debug it thoroughly before you integrate it into the major system, which means that it must have some standalone capabilities, or all the control functions, so there is some architectural sophistication that is required here.”
Standards are sparse when it comes to multi-die assemblies, most of which today are being developed by large systems and HPC processor companies using mostly internally developed chiplets. That is expected to change in the next few years, but it will depend on the proliferation of more standardized chiplet integration schemes so that not everything needs to be developed from scratch.
“If you look at architectural standards, for example, the Arm chiplet system architecture (CSA) is an important factor for the architecture communication between two chiplets,” Emmer said. “UCIe is the physical interface that allows us to do this communication, and you can design a 2.5D and 3D chip with chiplets without UCIe. The problem with lack of standardization is that you can build custom solutions, like the big hyperscale companies are building, which gives them a lot of flexibility because they can do whatever they want as long as it physically connects and aligns to some architectural spec that they define. They can communicate between two different chiplets. They can do it 3D. They can do 2.5D. They can do different types of integrations if it’s multiple chiplets.”
Standards will help democratize this approach. “Standardization allows for economies of scale,” he said. “You can have many more players getting into the game. We have many companies playing the silicon game, especially compared to, say, 20 years ago, and you can see something similar happened with software. Software was driven by big companies, and then everybody had a computer, like two kids in a university setting who sat in a garage and invented Google. You don’t see that in silicon enough. It’s much harder because you need more money. On the other hand, chiplets with standardization allow smaller players to get into the game, as well as big players that are not doing silicon today.”
Chiplets also open the door to more industry partnerships. “In theory this is a great idea, because if I don’t need the cutting-edge process technology to do certain functions, then I could build a chiplet on an older process technology,” said Steven Woo, fellow and distinguished inventor at Rambus. “One example is with memory standards. DDR4 will be out in the market for 10 years, so the range of speeds is well-defined, and after a while it’s not going to get any faster. So, I don’t really need the cutting-edge process technology to build the memory controller and the interface and all that. Maybe I can put that on a chiplet and leave it at an older process node. And then, since the standard specs are not changing, why do I need to do anything?”
An ongoing challenge is how to connect everything together in a standardized way that is almost certainly going to work, but without excessive overhead. “It’s not like there are a lot of standards out there that the industry is adhering to widely,” Woo said. “Of course, there are BoW, UCIe, along with a lot of other proposals. But when the industry finally gets together and settles on one or two, that’s what will enable a more general kind of chiplet marketplace. If you’re a vertically integrated company like an Intel or an AMD, you can put in whatever makes sense for you. But if you’re talking about a chiplet marketplace, you’ve got to have these standards in place.”
Choosing which standards to use can make a difference as to how a device is architected and which tradeoffs are made. “Designs for 2D (organic substrates) and 2.5D (silicon interposers/bridges) horizontal chiplet connections use similar die-to-die interfaces, such as BoW and UCIe, and established thermal/mechanical analysis tools,” said Kevin Donnelly, vice president of strategic marketing at Eliyan. “However, the interfaces for vertical 3D connections are completely different, with vastly simpler electricals but much more rigid physical form factors, and hugely challenging thermal and mechanical constraints. For example, early designers of custom HBMs are frustrated by the lack of ability to include desired logic in base dies due to the thermal constraints of the DRAM stacks above.”
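The horizontal-versus-vertical distinction Donnelly describes can be made concrete with a back-of-the-envelope comparison: horizontal 2.5D links run fewer, faster lanes over millimeters of wire, while vertical 3D connections run many slow, electrically simple lanes over microns. The sketch below uses purely illustrative numbers, not figures from the UCIe or BoW specifications or from any real product.

```python
from dataclasses import dataclass

@dataclass
class DieToDieLink:
    """Rough, illustrative characteristics of a die-to-die connection."""
    name: str
    gbps_per_lane: float   # per-lane signaling rate
    lanes: int             # parallel lanes in the link
    reach_mm: float        # typical wire length between dies

    @property
    def bandwidth_gbps(self) -> float:
        # Aggregate raw bandwidth = per-lane rate x lane count.
        return self.gbps_per_lane * self.lanes

# Hypothetical operating points for illustration only.
links = [
    DieToDieLink("2.5D horizontal (UCIe-like)", 16.0, 64, 2.0),
    DieToDieLink("3D vertical (hybrid-bond-like)", 4.0, 1024, 0.01),
]

for link in links:
    print(f"{link.name}: {link.bandwidth_gbps:.0f} Gb/s aggregate "
          f"over ~{link.reach_mm} mm")
```

The takeaway matches the quote: the vertical link reaches higher aggregate bandwidth not by driving wires harder, but by trading electrical complexity for a far denser, more physically constrained connection.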
This adds a new twist to chip design. “In the past, it was important to think about those things, but it wouldn’t necessarily be on the drawing board from Day One,” Woo said. “Now these things are on the drawing board from Day One, and it affects things like your package and how many I/Os you have available, because I/Os have just become so much more important to consider. Also, what we see generation to generation now is the physical effects are becoming key drivers of architectures. So physical limitations — things like thermals, power delivery, I/O counts — are in many ways physical limitations on the placement and how you’re going to do things like cool it. This means you really have to think about those things ahead of time or you can get yourself into big trouble down the line. It’s not like the industry didn’t work together in the past, but this is causing the industry to work together even more to make sure the architects up front understand what’s going to be available in two years when it comes to market. Technology-wise, if advanced packaging is not the most important thing going forward, it’s definitely one of the most important things driving and enabling a lot of the positive change in the industry.”
That puts extra emphasis on early feasibility and exploration. “In the past, PCB designs could just go through the motions,” said Keith Lanier, director of product management at Synopsys. “People working on the system architecture level were doing it with spreadsheets. Maybe they had MATLAB models or something like that, and they had their own way of figuring out if the system was going to work or not from the architectural level. Those days are well behind us. We have much better tools to be able to look early on and have a physically aware, functional architecture design. The key is that even before you write a single line of RTL, you have to start looking at the workloads you need to apply to the system. You need to use your functional architecture to drive physical, then bring the physical data back to adjust the functional architecture early on.”
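The "physically aware" early exploration Lanier describes often starts as exactly this kind of calculation: does the workload's die-to-die traffic fit in the beachfront (die-edge length available for I/O) the floorplan allows? A minimal sketch, with all numbers invented for illustration:

```python
import math

# Pre-RTL feasibility check: can the planned die edge supply the
# die-to-die bandwidth the workload demands? All parameters below
# are illustrative assumptions, not figures from any real design.

def required_lanes(workload_gbps: float, gbps_per_lane: float) -> int:
    """Lanes needed to carry the workload's die-to-die traffic."""
    return math.ceil(workload_gbps / gbps_per_lane)

def beachfront_fits(lanes: int, lane_pitch_um: float, edge_mm: float) -> bool:
    """Do the lanes fit along the available die edge?"""
    return lanes * lane_pitch_um / 1000.0 <= edge_mm

demand_gbps = 8000.0  # assumed traffic between two chiplets
lanes = required_lanes(demand_gbps, gbps_per_lane=16.0)
ok = beachfront_fits(lanes, lane_pitch_um=50.0, edge_mm=30.0)
print(f"{lanes} lanes needed, fits available edge: {ok}")
```

If the check fails, the functional architecture changes (fewer crossings, wider lanes, a different partition) before a single line of RTL is written, which is precisely the feedback loop between physical and functional design described above.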
One of the advantages of chiplets is they can be used to adapt designs to specific use cases and workloads. But for mainstream applications that rely on standardized chiplets, that flexibility and customization will depend on how strictly chiplet standards are written.
“Data centers have different requirements compared to automotive or industrial implementations of chiplet systems,” said Andy Heinig, head of department for Efficient Electronics at Fraunhofer IIS/EAS. “Automotive and industrial applications don’t need to be the most energy efficient, while the data center needs to be very, very efficient. But protocols like BoW and UCIe are not efficient enough. So, if you do your own implementation, it can be much more efficient because you don’t have to support things you don’t need. This is a big problem from the data center perspective.”
Chiplets developed by large systems companies are designed for maximum performance or efficiency, not interoperability with devices outside their target application. But the rest of the market typically wants chiplets that are interoperable and cost-effective, which puts them at a power and performance disadvantage.
“Currently, it seems the development — especially on the UCIe side — ends up in very expensive IP,” Heinig said. “They have a lot of modes they have to support. With some of these higher communication layers, if you think that you can use PCIe over UCIe, then you need PCIe IP, and that’s very expensive. That makes the whole communication IP very expensive, and this is something we see currently. We were expecting in the beginning that the die-to-die interface would be on the low-cost side so everybody could use it, but now you get this very expensive IP, which makes it very difficult for industrial applications to use. This is also true in automotive, where they are very cost-driven and really look into whether the IP fits their requirements from a cost perspective.”
BoW is potentially lower cost, but it lacks the interoperability breadth of UCIe. “We see this currently as a chicken-and-egg problem,” Heinig said. “We need more demonstration of prototypes so we can figure out what is actually necessary, because certain of these developments were done in PowerPoint with some requirements that somebody was writing down about what they expected in the future, but not really coming from a clear applications perspective. This is what we see on other protocols. They were developed step by step, generation by generation, and only what was necessary was put in. And here our feeling is that a little bit of everything was put in the standard, and to sort it out afterward is very hard.”
Choosing protocols is an important decision when it comes to chiplets. “UCIe has protocols for the board, and even though the chiplets are close together — even if it’s just 4 or 5 millimeters away — that’s still a reasonably big wire to drive from chip-to-chip if you want high-speed communication,” said Marc Swinnen, product marketing director at Ansys. “What we’re seeing is UCIe seems to be the most commonly used one, but BoW is being used, as well, along with some others. EDA vendors are starting to roll out UCIe development/utility kits specifically for targeting those designs. But proprietary ones such as NVLink from NVIDIA are still being used, and they are very much an important part of the whole design of the chiplet ecosystem.”
A big challenge now is to weigh as many tradeoffs as possible at the outset, and then develop a plan that optimizes whatever is needed for a particular application.
“Customers and users who are working on designing these in the industry are constantly doing the tradeoffs,” said Suhail Saif, director of product management at Ansys. “It is a day-to-day thing. They want to hit a sweet spot where they don’t compromise too much on the performance and the capacity and over-optimize on power, but at the same time keep the power envelope in check so the rest of the system is not burdened by it. That is a constant challenge, where even the power optimization solutions in the industry are always maintaining that fine balance between not optimizing enough and over-optimizing. The EDA solutions have the capabilities of providing the design team with suggestions that would drastically improve power, but what they lack is being able to cleverly analyze the impact on performance or functionality of the design because they really focus on power as a singular metric. That is always a challenge.”
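The sweet spot Saif describes can be framed as a simple constrained selection: among characterized operating points, keep the fastest one that stays inside the system's power envelope, rather than minimizing power alone. A sketch with invented characterization data:

```python
# Balancing power against performance: pick the fastest operating
# point that does not blow the power budget. The (frequency, power)
# pairs below are hypothetical characterization data.

operating_points = [
    (1.0, 3.0),    # (frequency in GHz, power in W)
    (1.5, 5.5),
    (2.0, 9.0),
    (2.5, 14.0),
]

POWER_BUDGET_W = 10.0

def pick_sweet_spot(points, budget):
    """Fastest point whose power stays within the envelope."""
    feasible = [p for p in points if p[1] <= budget]
    return max(feasible, key=lambda p: p[0]) if feasible else None

best = pick_sweet_spot(operating_points, POWER_BUDGET_W)
print(best)  # the 2.5 GHz point exceeds the 10 W envelope
```

Optimizing power as a singular metric would pick the 1.0 GHz point; honoring the envelope as a constraint instead preserves performance, which is the distinction the quote draws.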
Flexibility in chiplets
One of the big attractions of chiplets is the flexibility they can provide for design teams. Being able to swap out one chiplet for another without redoing the entire multi-die design is a big win for time to market and targeting specific workloads and applications, but so far that capability has been limited to a few chipmakers.
“The microprocessor companies have definitely taken advantage of the flexibility that a reusable piece gives,” Ansys’ Swinnen said. “There are parts of the system they don’t have to redesign between different products because they do the same thing. They just plop down the same chiplet. That’s being used for multi-CPU versions. You can have an 8-, 12-, or 24-CPU version where you just add more chiplets. The reusability is an important part of it, and the flexibility it gives you in a product design is certainly something they take advantage of. So, there’s a lot of interest in making sure it’s as reusable as possible. There’s always the tradeoff around whether you build it completely custom from the start or you re-use it. Look at Apple. They have an Arm license, but they don’t take Arm’s pre-designed version of the Arm architecture. They design it themselves and optimize it to the max. But most people are better off just taking the one that’s been optimized by Arm as soft IP. It’s always a tradeoff. You can always push it to the edge and re-do it yourself for the ultimate in optimization speed, but is it worth the time and effort for the benefit it brings you in the cost? Or, are you better off just re-using the chip, even if there’s a certain cost to having a reusable piece?”
Other concerns, apart from performance and power, include reliability and security. “In terms of reliability, look at the USB interfaces,” Swinnen said. “Nobody’s going to design their own USB. It’s securely designed, and you know that you don’t want to take the risk in having to verify your own design through all the possible permutations. It’s safer to take an existing design because you know that it works. The chiplet market is not fundamentally different than the IP market in its concept. The details are more complicated and there are more issues to look at, but I don’t see a fundamental reason why we can’t overcome those, just as we did with IP. The reasons we went to IP still hold for chiplets.”
How many chiplet standards are needed is unclear, however. “That’s certainly in full discussion and development now,” Swinnen said. “The standard has to be more than it is today. There’s the signal interfacing standard. There is going to have to be a thermal standard, which was never the case with IP blocks until now. There’s going to have to be a mechanical standard. There’s also a thickness standard. You see that even with today’s 2.5D stack, where some of the chips are thicker than others, and they have to put in little dummy pieces of silicon on top of them just to make a smooth surface for the heat sink to connect to. So there are more issues that need to be standardized than under regular IP, but it’s just a continuation of the same principles, only with more physics.”
What comes next
In the short term, there is some low-hanging fruit involving security, test, power, and clocks that can be addressed to move the chiplet approach along. The longer-term issue is figuring out how different chiplets will interact with each other.
“For some of these, you have to make the chiplets more autonomous,” said Pratyush Kamal, director of central engineering solutions at Siemens EDA. “Eventually the boundary between the chiplet and the classic definition of chips will blur. We tend to trap ourselves by saying that a chiplet is an entity that requires advanced packaging. It needs to sit with another chiplet. But a chiplet can sit outside of a package, too. When I look at the chips of today, they are designed to work independently. Once all the standardization is in place for chassis, that’s how the chiplets of tomorrow will slowly start to look. I’ve just changed my definition of ‘chiplet’ to include two things. One, it needs to have a higher bandwidth interface with another chiplet — higher than what you typically would have. This is bandwidth that is comparable to its on-die performance, on-die buses. Two, either it’s dependent on, or it’s responsible for, managing resources of another chiplet and the chassis resources. That is what makes it a chiplet in the end, because even when we do the standardization tomorrow, the tasks will still be delegated and decided by a few chiplets, not all of them, and there will be hierarchy to it, to the architecture.”
This means in creating the microarchitecture, provisions may be needed for some things that may not exist today and that can be added in later. “This could be allocating additional register space, locating additional one-time programmable memory space for some of these applications that may come later,” Kamal said. “Once you move into the software layer, everything is flexible, and you can do so much. But when the chip is still being manufactured or being still tested, there is no software loading. Think very carefully when you’re designing your chiplet. How would you be talking to another chiplet in the absence of software in bare metal mode? How will you be doing it without any programming? That’s very critical.”
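One concrete way to follow Kamal's advice is to carve out explicitly reserved regions in the chiplet's memory-mapped control block at microarchitecture time, so features defined by later standards have somewhere to land, and to sanity-check that the map stays self-consistent. The register names, offsets, and sizes below are hypothetical:

```python
# Sketch of provisioning spare register and OTP space in a chiplet's
# control block for functions that may be defined later. All names
# and offsets are illustrative, not from any real chiplet.

REGISTER_MAP = {
    # name: (byte offset, size in bytes)
    "CHIPLET_ID":      (0x00, 4),
    "LINK_STATUS":     (0x04, 4),
    "LINK_CTRL":       (0x08, 4),
    "RESERVED_FUTURE": (0x0C, 64),   # provisioned for later standards
    "OTP_BASE":        (0x4C, 128),  # one-time-programmable scratch
}

def overlaps(reg_map):
    """Return pairs of registers whose byte ranges collide."""
    spans = sorted((off, off + size, name)
                   for name, (off, size) in reg_map.items())
    bad = []
    for (s0, e0, n0), (s1, e1, n1) in zip(spans, spans[1:]):
        if s1 < e0:  # next region starts before this one ends
            bad.append((n0, n1))
    return bad

assert overlaps(REGISTER_MAP) == []  # the map is self-consistent
```

Because these addresses must work in bare-metal mode, before any software is loaded, getting the reserved regions and their boundaries right at design time is exactly the kind of decision that cannot be patched later.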
Related Reading
Chiplets Add New Power Issues
Well-understood challenges become much more complicated when SoCs are disaggregated.
Signal Integrity Plays Increasingly Critical Role In Chiplet Design
Chiplet design engineers have complex new considerations compared to PCB concepts.