Multi-Die Design Pushes Complexity To The Max

Continued scaling using advanced packaging will require changes across the entire semiconductor ecosystem.


Multi-die/multi-chiplet design has thrown a wrench into the ability to manage design complexity, driving up costs per transistor, straining market windows, and sending the entire chip industry scrambling for new tools and methodologies.

For multiple decades, the entire semiconductor design ecosystem — from EDA and IP providers to foundries and equipment makers — has evolved with the assumption that more functionality can be added into chips and packages, while improving the power, performance, and area/cost equation. But as the ability to pack all of this functionality into a single die or package becomes more difficult, the complexity of developing these devices has skyrocketed.

With advanced packages estimated to pack in 100 trillion transistors in the not-too-distant future, keeping a tight rein on power, performance and area/cost (PPA/C) requires significant shifts at every turn in the design-through-manufacturing flow.

“Today the industry is not ready, but we are going toward that,” said Sutirtha Kabir, senior architect for R&D engineering at Synopsys. “What do we see as steps between today and that year, whether it’s 2030 or sooner? Assuming that you take an SoC and you fold it [a simple 3D-IC analogy], assuming that all you’ve done is put them into two die with the same functionality, but nothing else has changed, your transistor count hasn’t changed, but what you have done in this process is add the interface between these two chips, whether it’s bumps or HBIs (hybrid bond interconnects).”

Designs previously done on a single chip become more complicated just by virtue of the fact that functionality is now spread across multiple chips or chiplets. “It’s basically how much more difficult is it to do the task that was previously done,” said Ron Press, senior director of technology enablement for Tessent silicon lifecycle solutions at Siemens EDA. “Remember the famous quote from Bill Gates in 1981, ‘640Kb of memory ought to be enough for anybody.’ That was appropriate then. Complexity is what drove the advent of EDA. Once a task gets too difficult to perform with traditional methods, then some type of abstraction and automation is necessary. From early electronics, this drove programming languages compiled to designs in silicon and many EDA tools. So the definition of complexity is always relative to the current state-of-the-art capabilities.”

This is compounded by higher data rates, which constitute yet another metric for complexity. “If you look at data rates versus time, for 2G, 2.5G, 3G, 4G, 5G, the data rates they support scale about the same as Moore’s Law, which is another confirmation [of growing complexity],” noted Chris Mueth, director of new markets management at Keysight. “2G phones way back when were a collection of components: transistors, little modules, and discrete components. The phones were jam-packed with electronic components, and there wasn’t a lot of room for additional functionality. But now everything’s integrated. The modules are almost the size of an IC chip way back when, and that has everything inside of them. 3D-ICs are going to take it to the next level.”

That also raises the verification challenges significantly. “At 2.5G a phone might have 130 specs, whereas a 5G phone might have 1,500 specs you have to verify,” Mueth said. “There are now a lot of different bands, a lot of different operating modes, different voltages, digital controls, all kinds of stuff, and you have to verify every single thing before you ship it because the last thing you want to do is miss something when it’s already in a phone.”

This all adds up to massive increases in complexity, and it’s wreaking havoc on long-held chip design methodologies.

“Single die designers used to worry about these things, but it was more of a packaging issue,” said Synopsys’ Kabir. “Let the packaging folks worry about that. The die design team would just work up to the pins. RDL bump connections would somehow happen. But now, because the signal-to-signal connection is through these bumps between dies, the die designer has to worry about that. What we are seeing this year is that we started with millions of bumps, and very quickly the bumps now number around 10 million, with the expectation that in two to three years multi-die designs will contain 50 million connections with HBIs.”

Others agree. “In the many years that I’ve worked in this industry, I’ve always felt that we were solving the most complex problems at that time,” noted Arif Khan, senior product marketing group director for design IP at Cadence. “Moore’s Law held for monolithic systems, until it came up against the reticle limit and process limitations. Transistor density did not scale linearly as process technology advanced, while our appetite for increasingly complex designs continued unabated, pushing us against the physical limits of the image field in lithography (the reticle limit). NVIDIA’s GH100 design is estimated to have over 140 billion transistors, with a die size of 814mm² in 4nm.”

Fig. 1: A complex generic design flow. Source: Cadence


Shrinking in multiple dimensions
As advanced process technologies become more complex to fabricate, the wafer cost outpaces the historical norm. When coupled with the declining transistor scaling with each new generation, the cost per transistor at each successive leading-edge node is higher than the previous generation.

“This creates a conundrum for design, where it is much more expensive to design and manufacture in newer process nodes,” Khan said. “Larger designs naturally yield fewer dies per wafer. When random defects are factored in, the yield fallout is greater when the die size is larger: a greater fraction of a smaller denominator is unusable, unless those dies can be repaired. As process technology evolved beyond 5nm, extreme ultraviolet technology hit the limit with single patterning. High-numerical aperture EUV, which is coming into play now, doubles the magnification and allows smaller pitch sizes, but has the effect of shrinking the reticle size by half. Consequently, the ever-more complex and larger designs of today have no choice but to disaggregate, and chiplet technology is the holy grail.”
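The die-size argument above can be made concrete with a rough sketch. It assumes the simple textbook Poisson yield model, Y = exp(-A × D0), and a common dies-per-wafer approximation; neither comes from the article, and the 300mm wafer, the 0.1 defects/cm² density, and the 800mm² and 200mm² die areas are purely illustrative values. Packaging cost, known-good-die testing, and die repair are ignored.

```python
import math


def dies_per_wafer(die_area_mm2, wafer_diameter_mm=300.0):
    """Approximate gross dies per wafer (assumed formula, ignores scribe lines)."""
    radius = wafer_diameter_mm / 2.0
    gross = math.pi * radius ** 2 / die_area_mm2
    # Subtract partial dies lost around the wafer edge.
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2.0 * die_area_mm2)
    return max(0, int(gross - edge_loss))


def poisson_yield(die_area_mm2, defects_per_cm2):
    """Fraction of dies with zero random defects under a Poisson model."""
    area_cm2 = die_area_mm2 / 100.0
    return math.exp(-area_cm2 * defects_per_cm2)


def good_dies_per_wafer(die_area_mm2, defects_per_cm2):
    """Expected number of defect-free dies on one wafer."""
    return dies_per_wafer(die_area_mm2) * poisson_yield(die_area_mm2, defects_per_cm2)


if __name__ == "__main__":
    d0 = 0.1  # hypothetical defect density, defects per cm^2
    for area in (800.0, 200.0):  # hypothetical monolithic die vs. one chiplet
        good = good_dies_per_wafer(area, d0)
        print(f"{area:.0f} mm^2: yield={poisson_yield(area, d0):.2f}, "
              f"good silicon per wafer={good * area:.0f} mm^2")
```

Under these assumed numbers the smaller die delivers noticeably more good silicon per wafer than the large one, which is the economic pressure toward disaggregation that Khan describes. Real foundry yield models are more elaborate, so the output should be read as a trend and not as a forecast.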

At the same time, there is more focus on adding new features into designs that previously were limited chiefly by reticle size. This adds a whole new level of complexity.

“Everything was clock speed and performance in the good old days of IBM mainframes and Intel/AMD x86 servers,” observed Ashish Darbari, CEO of Axiomise. “Thanks to the Arm architecture, from the late ’90s onward, power became the dominant push in the industry, and with chips being squeezed into smaller form factors such as mobile phones and watches and miniaturized sensors, performance along with power and area (PPA) determined the quotient of design complexity. 72% of ASICs are reported to manage power actively, and power management verification is a growing challenge, as reported by the Wilson Research report from 2022. However, with the rapid adoption of silicon in automotive and IoT, functional safety and security dominate the design complexity. You cannot design a chip without thinking of PPA — and one or both, safety and security.”

According to Harry Foster’s Wilson Research report, 71% of FPGA and 75% of ASIC projects look at security and safety concurrently. With the advent of Meltdown and Spectre (2018), and a continued array of chip security flaws, including GoFetch in 2024, security issues are proving to be a direct outgrowth of design complexity. This is made worse by the fact that security vulnerabilities often originate from performance-enhancing optimizations such as speculative pre-fetch and branch prediction.

“To enable low-power optimizations, designers have used selective state retention, clock-gating, clock dividers, warm and cold resets, and power islands that pose verification challenges around the clock and reset verification,” Darbari said. “Multi-speed clocks introduce challenges around glitches, clock domain crossing, and reset domain crossing.”

While compute performance always has dominated the design landscape, it’s now just one of many factors, such as moving and accessing a growing amount of data generated by sensors and AI/ML. “HBMs are one of the cornerstone items of AI/ML chips, which is pretty much where our industry is heading,” Darbari said. “If you look at the broader spectrum of design complexity going beyond the PPA, safety and security, we should note that in the era of hundred-cores on a single die and AI/ML, we are revisiting design challenges of high-performance compute along with minimizing power footprint, as well as optimizing arithmetic (fixed point/floating point) data formats and correctness. Moving data around faster at low power, using high-performance NoCs, introduces deadlock and livelock challenges for designers. RISC-V architecture has opened up the floodgates for anyone to design a processor, and this has led to crafty designs that can work as CPUs as well as GPUs, but the fundamentals of design complexity in terms of PPA, safety and security, along with deadlock, livelock, and compute- and memory-intensive optimizations, are going to be as relevant to RISC-V as they were in the era before RISC-V. Plenty of work over the last six years has gone into establishing compliance of RISC-V micro-architectural implementations against the RISC-V instruction-set-architecture (ISA) using simulation for bring-up testing and formal methods to prove compliance mathematically. RISC-V verification, especially low-power, multi-core processor verification, will open up a Pandora’s box of verification challenges, as not many design houses have the same level of verification competency as the more established ones. The Wilson Research report suggests that for ASICs, 74% of the designs surveyed have one or more processor cores, 52% have two or more cores, and 15% have eight or more processor cores, something we see more of in our experience of deploying formal verification.”
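The NoC deadlock risk Darbari mentions can be illustrated with a small sketch. It assumes a deterministic routing function whose channel dependencies are already known, and it relies on the classic result that such routing is deadlock-free when the channel dependency graph has no cycle. The channel names and the four-channel ring are hypothetical, and this is a toy check, not how a production formal tool works.

```python
def has_cycle(deps):
    """Return True if the channel dependency graph contains a cycle.

    deps maps a channel name to the channels a packet holding it may
    request next (an assumed, precomputed routing relation).
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}

    def visit(node):
        color[node] = GRAY
        for nxt in deps.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GRAY:  # back edge: a cyclic wait is possible
                return True
            if state == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in list(deps))


# Hypothetical unidirectional ring: each channel waits on the next one.
ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}

# Same ring with the wrap-around dependency removed, for example by
# routing that hop over a separate virtual channel.
ring_broken = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": []}

if __name__ == "__main__":
    print("ring can deadlock:", has_cycle(ring))
    print("broken ring can deadlock:", has_cycle(ring_broken))
```

A graph check like this only covers routing-level deadlock. Protocol-level deadlocks and livelocks depend on message ordering and arbitration, which is where the formal methods described in the article come in.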

How to solve complexity challenges
Approaches to solving complexity have evolved through automation and abstraction that keep building on previous generations of capabilities.

“Over time, more tradeoffs and optimizations are embedded in EDA tools so the user can give less complex ‘intent’ commands and let the tools do the difficult and tedious work,” Siemens’ Press said. “Innovations were necessary to deal with some of the complexity, such as how to communicate between devices and sort data. In the test community, scan was a method to turn designs into shift registers and combinational logic. Scan enabled automatic test pattern generation so an EDA tool could make high-quality test patterns without someone needing to understand the functional design. As data and test time became too big, embedded compression was used to make it more efficient.”
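Press's description of scan, in which a design is treated as a shift register wrapped around combinational logic, can be sketched as a toy model. The three-flop circuit, the stuck-at-0 fault, and all function names below are hypothetical, and embedded compression is not modeled.

```python
def scan_test(comb_logic, pattern):
    """Shift a pattern into the scan chain, capture once, shift the result out."""
    n = len(pattern)
    chain = [0] * n

    # Shift in: one bit per clock enters at position 0, so the last
    # pattern bit is fed first and ends up at the far end of the chain.
    for bit in reversed(pattern):
        chain = [bit] + chain[:-1]

    # Capture: one functional clock loads the combinational outputs.
    chain = list(comb_logic(chain))

    # Shift out: the bit at the end of the chain is observed each clock.
    observed = []
    for _ in range(n):
        observed.append(chain[-1])
        chain = [0] + chain[:-1]
    observed.reverse()  # restore flop order
    return observed


def good_logic(state):
    """Fault-free next-state logic for a hypothetical three-flop block."""
    a, b, c = state
    return [a ^ b, b & c, a | c]


def faulty_logic(state):
    """Same block with the AND gate output stuck at 0."""
    a, b, c = state
    return [a ^ b, 0, a | c]


if __name__ == "__main__":
    for pattern in ([1, 1, 1], [0, 0, 0]):
        expected = scan_test(good_logic, pattern)
        observed = scan_test(faulty_logic, pattern)
        print(pattern, "detects fault:", expected != observed)
```

Only one of the two patterns exposes the fault, which is why automatic test pattern generation matters: the tool searches for patterns that expose each modeled fault without needing to know what the block is for.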

Darbari agreed. “Test and verification have evolved from the days of architectural verification suites of the ’70s and ’80s to constrained random, formal verification, and emulation. Each of the new verification technologies copes with different abstraction levels of designs and, if used correctly, can be complementary. Whereas emulation can reason about functionality and performance at the full-chip level, constrained random and formal are great technologies at the RTL level, with formal being the only technology to build proofs of bug absence. We see an increase in the use of formal verification for architectural verification, as well as in finding deadlocks, livelocks and logic-related bugs.”

Complexity comes in other flavors, too. “You can define complexity by application domains and where things happen in the flow,” said Frank Schirrmeister, vice president of solutions and business development at Arteris. “You can define complexity in terms of the system you’re going to build. Obviously, when you think about systems, you can go back to the old V diagram that gives you a sense of the complexity. Then, you can define complexity along the lines of technology nodes and process data. Also, there is the very traditional definition of complexity, which has been addressed by moving up in levels of abstraction. But what comes next?”

Fig. 2: Complexity growth in SoCs (left) and NoCs (right). Source: Arteris


The answer is chiplets, but as chiplets and other advanced packaging approaches pick up steam, there are a number of issues designers must contend with.

“Chiplets offer a modular solution to this problem of increasing complexity,” said Cadence’s Khan. “For instance, a complex SoC designed in process node ‘N’ has many subsystems — compute, memory, I/O, etc. Going to the next node (N+1) to add other performance/features will not necessarily provide significant benefits given the limited scaling improvements combined with the other factors (development time, cost, yield, etc.). If the original design is modular, only those subsystems that benefit from process scaling need to be migrated to advanced nodes, while other chiplets remain in older process nodes. Disaggregating the design to match each subsystem to its ideal process node addresses one key aspect of development complexity. In the first go around, there is the overhead of designing for a disaggregated architecture, but subsequent generations benefit significantly with reduced development cost and increased choice of SKU generation. Leading processor companies such as Intel (Ponte Vecchio) and AMD (MI300) have already taken this route.”

This ability to customize chiplets for the ideal power, performance, area/cost is especially important for managing cost and time to market. “New features can be added without redesigning entire chips, enabling designs to hit the market window while maintaining a product refresh cadence that would otherwise slow down with the development and productization time required in advanced nodes,” Khan said. “Nirvana is a chiplet marketplace envisaged by companies such as Arm, proposing a chiplet system architecture to standardize chiplet type and partitioning choices (within their ecosystem). SoC designers will still need to customize designs for their secret sauce, which provides the differentiation in their implementations. Automation will be a key driver in reducing complexity here. The complexity of communication between chiplets has been mitigated to a large degree by die-to-die standards such as UCIe over the past few years. But there are additional implementation complexities that designers must surmount as they move from a 2.5D IC flow to a 3D-IC flow. How is logic partitioned between the various chiplets to provide an optimal partition with direct die-to-die connections with stacked die? The next frontier is to take this complex problem from the user partitioning domain to automated, AI-driven design partitioning. One can envisage a future in which the AI processors of a given generation are the workhorses being used to design the chiplet based processors of the next generation.”

At the same time, chiplets have introduced a new dimension to verification — verifying chip-to-chip communication based on the UCIe protocol, while also understanding the complexities of latency and thermal issues.

Put another way, chiplets are another evolution in growing and scaling designs, Siemens’ Press said. “As with many previous technologies, standards that enable more plug-and-play approaches are important. Designers shouldn’t be dealing with escalating complexity, but rather methods that remove difficult tradeoffs. In the area of scan test, packetized scan delivery can remove an entire level of complexity, such that the chiplet designer only needs to optimize the chiplet design-for-test and patterns. There are plug-and-play interfaces and self-optimized patterns delivery, so the user doesn’t need to worry about the cores or chiplet embedding or I/O pins to get the scan data to the chiplet. The idea is to simplify the problem with plug-and-play methods and automatic optimization.”

How best to manage complexity
Given the number of considerations and challenges to multi-die design, it’s difficult to say complexity will be managed easily. There are, however, some approaches that help.

Axiomise’s Darbari noted that shifting left on verification with the intent of using more advanced technologies, such as formal verification, would make a huge difference. “The use of formal verification early in the DV flows ensures we catch bugs quicker, find corner case bugs, build proofs of bug absences, establish deadlock and livelock freedom, and use coverage to find unreachable code. Simulation using constraints and random stimulus must only be used when formal verification cannot be employed.”

But there’s another side to this. In many cases, complex problems can’t be solved for a whole package of chiplets. “You have to cut it down into pieces,” said Synopsys’ Kabir. “Solve the small pieces, but make sure that you’re solving the bigger problem. In multi-die, that’s the biggest challenge. We’re still looking at, ‘This is a thermal problem. No, this is a power problem.’ But it’s the same die you were designing yesterday. There are examples where, when the chip comes back to the lab, they’ve found the timing is off because the thermal or power effects on the timing were not taken into account correctly. The models and standard libraries are not predicting those, and it can cause a meaningful upset. As a result, designs are being done with so much margin. How can we squeeze this? That also means that multi-physics needs to be looked at, along with timing, along with construction.”

Breaking down complex problems into manageable pieces is something chip design engineers are still grappling with. “This is a new beast that I see a lot of people struggling with, and it’s just one of the challenges of complexity, even without going to atomic levels,” Kabir said. “What is the design flow for this? What’s the chicken? What’s the egg? Which problem do you solve first? And not only that, how do you make sure that it stays solved as you’re going through, and all your different chips are coming together? No single company knows how to do that, and we have to figure it out collectively. Everybody’s going to bring different solutions, and that’s where there’s a lot of opportunity for AI/ML, tools, all coming together.”

Keysight’s Mueth agreed. “It’s absolutely a multi-discipline challenge. Your digital designer has to talk to your RF designer, who has to talk to your analog designer; a chip guy to packaging guy; a thermal analysis, vibration analysis. It’s a multi-discipline world in hierarchy, because now you have your systems and systems of systems. You have components underneath. It’s really complex. There are like four different dimensions, and then you have to look at it across the engineering lifecycle. It’s amazing people can get anything done sometimes.”

That may be an understatement. While complexity is exponentially increasing, the workforce is not increasing commensurately. “Engineering tenure in the United States is 4.5 years, on average. In Silicon Valley, it’s 2.5 years,” Mueth added. “And when they exit, they exit with all that design knowledge, tribal knowledge, company knowledge, and you’re left with holes. So, you’d really like to have ways to digitize your processes to lock them in, and lock in that IP that you developed. You’ve got to find a way to scale or close the gap between the workforce and the complexity, which includes looking at new ways of automating processes. We already see a lot of scrambling to develop mega platforms. But we already know that mega platforms don’t encompass everything. They can’t. There are too many variations, too many applications. The solution is a combination of application-specific workflows, peripheral engineering management, and peripheral processes, because engineers do not spend 100% of their time simulating, nor even designing. They spend most of their time on peripheral processes, and those are woefully not automated.”


