More than Moore is off to a good start, but the next steps are a lot more difficult.
Building chips in three dimensions is drawing increased attention and investment, but so far there have been no announcements about commercial 3D-IC chips. There are some fundamental problems that must be overcome and new tools that need to be developed.
In contrast, the semiconductor industry is becoming fairly comfortable with 2.5D integration, where individual dies are assembled on some kind of substrate that interconnects them. Many new technologies are being developed, coming from multiple directions. EDA companies are creating tools and flows to help automate and optimize these assemblies, along with additional verification tools to handle the new physical effects that arise. Slowly, as problems get solved, the costs will come down and more people will adopt it.
But this is only chapter one of More than Moore. No longer is the industry solely focused on increased levels of integration. It now is tackling opportunities involving disaggregation within the package. To provide long-term gains at a similar rate to Moore’s Law, chips must go vertical. Heterogeneous 3D-ICs are the real goal, with 2.5D being a learning technology with training wheels.
There are good reasons why full 3D was not attempted first. “The top three problems are thermal, thermal, and thermal,” says John Park, product management group director in the Custom IC & PCB Group at Cadence. “We can stack these things all day long, and you see examples of L3 and L4 cache getting stacked on logic. That is only possible because cache doesn’t generate a lot of heat. We also see examples where they take full wafers and stack them, but those require exotic fluid-cooled packages. For the right type of environment, we are already seeing multiple levels of stacking, but power becomes a challenge. Power is tied with thermal. How do you get power to it? And how do you dissipate all the heat being generated as you start building this chimney stack?”
That’s made worse by the fact that the target market, at least today, is generative AI in data centers. “Especially when we’re looking at the data center infrastructure space, the power consumed by these processes is humongous,” said Sudhir Mallya, senior vice president of corporate marketing at Alphawave Semi. “Stacking the processor with other chiplets is a technology problem that has not been solved. That’s why we’re still seeing a lot of 2.5D. With a high-bandwidth memory (HBM) stack, all the memories are identical in size, and the power is identical. So from a thermal management and reliability point of view, that’s much easier to solve compared to 3D-ICs, where you have different sizes of chiplets and different power coefficients.”
Shekhar Kapoor, senior director, product line management at Synopsys, points to additional challenges with 3D stacking. "Complex as it is, 3D represents the future, and the ecosystem must evolve to enable it. The complexity can be reduced by two key factors: a common language and clear rules. Commonly accepted terms for the components of a 2.5D or 3D design bring uniformity to the proceedings and make it easier to construct a system with multiple partners."
Memory on logic has been the poster child for both 3D development and 2.5D integration. "HBM is DRAM stacks placed on top of a controller and connected to a processing system via a 2.5D interposer," says Marc Swinnen, director of product marketing at Ansys. "The power and performance of an HPC architecture are often gated by the time and energy required to transport data in and out of memory. Typically, these performance factors improve as the memory is placed physically closer to the computation unit. HBM brought the memory inside the package, but it can be brought even closer to the processors. Closer memory has typically meant smaller (less capacity) and more expensive. With 3D, you can place a large-capacity memory chip right above the logic and connect them through thousands of very short micro-bumps along the z axis. This seems like a very attractive solution, which is being explored by design teams."
Memory on logic may well be chapter two of More than Moore, but chapters three and beyond start with logic on logic. “True 3D is when you are turning it into a place-and-route problem,” says Tony Mastroianni, advanced packaging solutions director at Siemens Digital Industries Software. “Taking a large netlist and letting the tool do all the planning and implementation of each of the chiplets.”
Fig. 1: 3D-IC concept. Source: Siemens EDA
“Memory on logic is relatively straightforward; stacking logic on logic requires 3D awareness at the system level to achieve optimality,” said Synopsys’ Kapoor. “Starting from the system level, partitioning the design into different floors, and synthesizing to technology process nodes and materials bring new challenges as well as tremendous performance and power gain opportunities.”
HBM has been a learning experience. "Even after several iterations on this product, the manufacturing costs are extremely high," says Andy Heinig, head of department for efficient electronics at Fraunhofer IIS/EAS. "From a design perspective, the HBM is less complex because the placement of the TSVs is very homogeneous, and the positions are also very clear. In real 3D systems, the position of each TSV must be optimized. The routing resources within a chip are extremely high compared to those in the z-direction, meaning through the TSVs. This imbalance of resources requires partitioning strategies that are not available today, as there is a dependency on the system architecture. Only optimized system architectures can be partitioned by the tools in the right way. On the other hand, there are no standards in this area. This means that all parts of a real 3D system must be designed by a single team, and therefore only high-volume systems can justify the NRE costs."
This is more than just dividing logic between dies. "What if you have two dies that are face-to-face? One die has six metal layers, and the other has eight," says Cadence's Park. "Potentially you have 14 metal layers to share. Super advanced routers could consider utilizing all of those routing channels to connect two flops on the bottom die. If I run out of routing channels on the six layers, I may need to go up and use a routing channel on that other die and stitch that back in. There are many things that need to be automated to create higher-performing 3D-ICs."
That also elevates optimization to a very complex system-level problem. “What if you are optimizing for costs? Die size becomes variable,” says Siemens’ Mastroianni. “Although you can build a reticle-size die and stack those, if you’re optimizing the design for cost, you may want to use smaller die. So how do you decide how big it is, and how to partition that logic?”
Floor-planning needs to go to the next level. "We are in the early stages of tools that allow you to automatically optimize where the hotspots are," says Park. "These are test designs, where people are looking at the next generation of logic-on-logic stacking. The tools we are developing are looking at the heat map of each of those, and starting to optimize for that. We can't have these overlapping heat stacks that create chimneys. So we might place the hotspot on the bottom die to the northwest corner, and the other die to the southeast corner, and move those around."
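As a rough illustration of what such an optimizer is weighing, the Python sketch below scores how badly the hotspots of two stacked dies line up. The grid size, power-density values, and scoring function are hypothetical, not drawn from any vendor's floor-planning tool.

```python
import numpy as np

# Toy illustration of the "thermal chimney" problem: score how much the hot tiles
# of two stacked dies sit directly on top of each other. All values are made up.

def chimney_score(bottom: np.ndarray, top: np.ndarray) -> float:
    """Sum of element-wise products; high when hot tiles overlap vertically."""
    return float(np.sum(bottom * top))

blank = np.zeros((8, 8))                 # coarse per-tile power-density map (W/mm^2)

bottom_die = blank.copy()
bottom_die[0:2, 0:2] = 1.0               # hotspot in the northwest corner

top_aligned = blank.copy()
top_aligned[0:2, 0:2] = 1.0              # same corner -> stacked chimney

top_offset = blank.copy()
top_offset[6:8, 6:8] = 1.0               # hotspot moved to the southeast corner

print(chimney_score(bottom_die, top_aligned))   # 4.0 -> high vertical overlap
print(chimney_score(bottom_die, top_offset))    # 0.0 -> hotspots separated
```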
Thermal is a direct result of power, which is a result of activity. “The thermal energy released in the circuit is very dependent on the short- and long-term activity profiles,” says Ansys’ Swinnen. “For example, a short burst of intense computational activity may not raise the temperature enough to be of concern. But if that burst repeats every few milliseconds, then the overall temperature will sawtooth higher and higher until it reaches failure after many cycles. Typically, activity sets from logic simulation are too short to satisfy the needs of the much longer time constants that govern heat conduction. This is a difficult problem, compounded by the fact that there are usually many usage scenarios with very different activity patterns.”
Fig. 2: Thermal profiling in multi-die 3D-IC. Source: Ansys
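Swinnen's sawtooth effect can be reproduced with a single-node lumped thermal model. The thermal resistance, capacitance, and power profile in the sketch below are illustrative numbers rather than measured silicon data; the point is only that bursts repeating faster than the thermal time constant ratchet the temperature upward.

```python
# Minimal one-node lumped thermal model (illustrative values, not real silicon data).
# A 2 ms power burst every 5 ms looks harmless in isolation, but the average power
# ratchets the junction temperature upward over thousands of cycles.

R_TH = 2.0      # K/W, junction-to-ambient thermal resistance (assumed)
C_TH = 0.5      # J/K, lumped thermal capacitance (assumed)
DT = 1e-3       # s, simulation time step
BURST_P, IDLE_P = 40.0, 5.0      # W during a burst vs. idle
BURST_MS, PERIOD_MS = 2, 5       # 2 ms burst every 5 ms

temp_rise = 0.0                  # K above ambient
trace = []
for step in range(20_000):       # simulate 20 seconds
    t_ms = (step * DT * 1000.0) % PERIOD_MS
    power = BURST_P if t_ms < BURST_MS else IDLE_P
    # Forward-Euler update of dT/dt = (P - T/R_th) / C_th
    temp_rise += DT * (power - temp_rise / R_TH) / C_TH
    trace.append(temp_rise)

print(f"rise after 1 s: {trace[999]:.1f} K, after 20 s: {trace[-1]:.1f} K")
```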
New abstractions may be required. “One of the approaches that we’re talking about is predictive modeling,” says Siemens’ Mastroianni. “If you do detailed analysis it takes too long. You want to make those decisions up front. If you have simplistic models that run faster, that are close enough, you can start iterating and making a lot of early decisions before you start nailing down your architecture. That is outside of the place-and-route tools. We’re even looking at things like thermal and mechanical stress pre-layout, just having power estimates so we’re designing that up front. As long as we keep the whole power under critical levels, the place-and-route tool doesn’t really have to try to solve that part of the problem. You’re constraining it up front.”
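A minimal sketch of that kind of up-front constraint, using entirely hypothetical chiplet names and power estimates, could be as simple as a budget check run long before place-and-route:

```python
# Back-of-the-envelope feasibility check of the kind described above: rough
# per-chiplet power estimates against an assumed package thermal budget.
# All names and numbers are hypothetical.

PACKAGE_LIMIT_W = 120.0          # assumed budget for the target package/cooling

chiplet_power_w = {              # early, coarse power estimates
    "cpu_die":   65.0,
    "l3_cache":  12.0,
    "io_die":    18.0,
    "hbm_stack": 30.0,
}

total = sum(chiplet_power_w.values())
headroom = PACKAGE_LIMIT_W - total
print(f"total estimate {total:.0f} W, headroom {headroom:+.0f} W")
if headroom < 0:
    print("over budget: repartition, drop a tier, or move to a better-cooled package")
```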
Park agrees. "You can't wait until place-and-route is done, stick it all together, and find out it's going to burn up. Thermal tools have moved to the planning stage. Or we could time things in a certain way. Within the design, we could shut down part of the chip when something else is happening nearby in the 3D stack. We have thermal sensors. Will we ever get to the point where you can blindly do all this? No, but I think we're getting close to the point where using the tools, coupled with people with expertise, we can start to scale up to look at four or five die in a design."
There are some big challenges. “It is not just the size of the problem that changes, it is the nature of the problem that changes,” says Swinnen. “The challenge is that we have a chip team, a package team, a system team, where they deal with different scales, different tools, different languages, different formats. They all come crashing together with 3D-IC. They have a multi-scale problem, and the tools are not all ready for that. There are several orders of magnitude from the device level of the transistors up to the system level.”
So why push into 3D-ICs? “We made a huge leap by going from discrete packaging to 2.5D, where you’re transferring signals over an interposer,” said Alphawave Semi’s Mallya. “That significantly lowered impedances and resistances. But even then, things like UCIe and die-to-die create signal integrity challenges and limit the speed you can get out of these things and the number of parallel blocks you can put together. With 3D, the bandwidth will be tremendous and you get rid of the interposer.”
Packaging and stress
Exactly what 3D systems will look like remains uncertain. “If you look at a technology like an Intel EMIB, they do the die-to-die connections on a little embedded bridge,” says Park. “Then they do the die-to-the-outside-world on the laminate. You have to look at using micro-bumps for the die-to-die connection, and C4 bumps in the other areas. They are fanning out to a more solid connection and to have a more reliable product. This is why you often see tiers of packaging, because if we design a die and it’s at a C4 flip-chip pitch, we have a lot of flexibility. We can do that on a standard package. We can do that on a silicon interposer. But if we design a little chiplet and we put it at 45-micron pitch, that limits our flexibility in how we package it. We have to go to some sort of silicon bridge or a silicon interposer. In the early planning stages, when you are figuring out your die-to-die interfaces, it can work either way. You can get a PHY that’s for standard packaging at 130-micron pitch, and you can get it for advanced packaging at a 45-micron pitch.”
This ties together issues of reliability and thermal. "Heat is terrible for product reliability and longevity," says Swinnen. "Not only do materials degrade faster at high temperatures, but thermal cycling (and differential thermal expansion in the 3D-IC assembly stack) leads to mechanical stresses and warpage. These are recognized as contributing to both of the top-two killers of electronic systems in the field — thermal failure and failed electrical connections. Having hundreds of thousands of micro-bumps on a 10-micron pitch is wonderful for system density, but these are very delicate connections that cannot withstand shear stresses or carry much current. System reliability is a serious problem for complex 3D chip stacks. 2.5D integration has the advantage of limiting the mechanical interactions to just chip-to-interposer. 3D stacks have significantly more complex interdependencies."
But does that get worse for 3D stacking? "It's actually more challenging for 2.5D, because if you have a large silicon interposer sitting atop a large substrate, those are huge and you have different thermal coefficients of expansion," says Mastroianni. "That is why you have warpage issues. If it's a single die, or even a stacked die, you're limited by reticle size, so you're never going to have a chip bigger than a reticle. You don't have those extremes. And it's all silicon, which has the same thermal coefficient. Now you still have thermal expansion, and you are going to have different temperatures throughout the slice, so you have to do the analysis."
It may get worse for heterogeneous stacking. "The benefit of stacking, if these are all CMOS designs, is we do have a nice CTE match," says Park. "When you stick a die on an interposer, on a package, we don't have a nice clean CTE match. Even though we're going to denser and tighter pin densities as we build a stack, we have better CTE matches across those layers. But if you start mixing technologies and materials, where the CTE may not match as well, that adds additional problems. If we are just mixing nodes, I don't think that's going to be a big technical challenge."
This all ties in with the vast optimization space. "One way you deal with warpage-type issues is with your connectivity structures," says Mastroianni. "You can control your pitch, your spacing, and you want nice uniform things on the interface. Big gaps can cause things to warp, but this can be dealt with mechanically in how you design your bump structures."
Some of these problems cannot be avoided, especially as technologies such as photonics come into the package. "Photonics is largely a collection of point tools today, which means that much of it tends to be fairly manual," says Chris Mueth, business development, marketing, and technical specialist at Keysight. "Being very physics-based in the structures that they model and simulate, they are different from many of the things they integrate with. Getting that to play in an electronic optical system requires your electrical engineers to work with your optical engineers. They have to get integrated, and these problems have to be solved and well understood. That is not easy, and you will probably see a lot of work in this area to break down those silos. That has to happen before we can even think about integrating this into the system-level floor planning and optimization tools."
Additional demands create new challenges. "One of the goals of the DARPA three-dimensional heterogeneous integration program is integrating diverse technologies," says Mastroianni. "One application is putting 6G-type speeds, 100 gigahertz, right on top of logic. You can't treat that as a separate die. You're going to get electromagnetic coupling between them, so you cannot analyze them standalone. You have to analyze the composite die to do that analysis. That requires a different set of tools. Electromagnetic coupling is something that's going to be much more challenging."
Heat extraction
Only recently has the industry developed tools that can effectively analyze thermal behavior. "There are thermal analysis tools that work at the die level, so we could do the analysis," says Mastroianni. "However, they are not fast enough to put in the loop of a place-and-route program. So how you mitigate it is going to be the challenge. It is very context-dependent, and that heat is rising, so you can't just sell a standalone wafer that's going to stack up with other stuff, because they all have to play together."
There is a limit to how much heat a standard package can remove. "It is very difficult to cool a 3D stack without spreading it apart to make room for cooling fluids," says Swinnen. "But this reduces the benefits of the assembly. The solutions have been to adopt expensive cooling schemes, including liquid cooling, and to embed thermal sensors across the chip that throttle back the clock frequency if it gets too hot. The slower clock implies a downgrade of the performance characteristics. So using your chip heavily causes it to slow down to prevent thermal runaway. Overall, power management is the number one limiting factor for achievable 3D circuit density."
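The sensor-driven throttling Swinnen describes amounts to a simple control loop. The sketch below uses hypothetical temperature thresholds and frequency steps purely to show the mechanism, not any particular vendor's power-management scheme.

```python
# Sketch of sensor-driven clock throttling: step the clock down when a thermal
# sensor reads too hot, and recover once there is margin. All thresholds and
# frequency steps are illustrative assumptions.

F_MAX, F_MIN, F_STEP = 3000, 1200, 200   # MHz
T_HOT, T_SAFE = 95.0, 85.0               # degrees C

def next_frequency(freq_mhz: int, sensor_temp_c: float) -> int:
    if sensor_temp_c >= T_HOT:
        return max(F_MIN, freq_mhz - F_STEP)   # throttle back to shed power
    if sensor_temp_c <= T_SAFE:
        return min(F_MAX, freq_mhz + F_STEP)   # recover performance when cool
    return freq_mhz                            # hold inside the hysteresis band

freq = F_MAX
for temp in [80, 92, 97, 99, 96, 88, 83, 80]:  # sampled sensor readings
    freq = next_frequency(freq, temp)
    print(f"T={temp:>3}°C -> {freq} MHz")
```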
Others agree. “It’s a DARPA hard problem,” admits Mastroianni. “Thermal is probably the biggest challenge for automation and tools. DARPA understands that it is a big challenge, so there’s going to be a lot of money and research put into solving that problem.”
[Editor’s Note: A second part of this article will delve into other issues that have to be solved, including tool flows, connectivity, timing, and variation.]
Related Reading
3D-ICs May Be The Least-Cost Option
Advanced packaging has evolved from expensive custom solutions to those ready for more widespread adoption.
True 3D-IC Problems
Stacking logic requires solving some hidden issues; concerns about thermal dissipation may be the least of them.