Intel, TSMC, and Samsung are developing a broad set of technologies and relationships that will be required for the next generation of AI chips.
Intel Foundry, TSMC, and Samsung Foundry are scrambling to deliver all the foundational components of full 3D-ICs, which collectively promise orders-of-magnitude improvements in performance with minimal increases in power sometime within the next few years.
Much attention has been focused on process node advances, but a successful 3D-IC implementation is much more complex and comprehensive than just scaling digital logic. It requires new materials, along with different ways of handling thinner substrates and putting them together. It involves different backside power delivery schemes, various types of bridges, interface standards for multi-die communication, and new interconnect technologies and approaches. And it will require substantial changes in EDA tools and methodologies, digital twins, and multi-physics simulation, as well as reorganizations of engineering teams and the infusion of AI at multiple stages of the design-through-manufacturing flow.
3D-ICs have been on the foundries’ internal roadmaps for more than a decade, but it wasn’t until the rollout of ChatGPT two years ago and the subsequent build-out of AI data centers that full die-on-die stacking really gained momentum. Since then, the focus has been on big improvements in both power and performance, and the best way to get there is by disaggregating SoCs, parallelizing massive numbers of compute elements, and reducing the distance, resistance, and capacitance that signals encounter as they are shuttled back and forth between different processing elements and memories.
Vertical benefits
The goals here are well understood, but some of the technologies needed to get there are still under development. This explains why all of the foundries have announced plans to spend somewhere around $100 billion each in coming years to bring 3D-ICs into volume manufacturing. There are lots of problems to solve, and most of them need to be solved up front and proven in silicon in order to make this work. Just relying on the power, performance, and area/cost benefits of planar scaling is no longer sufficient from a technological or economic standpoint.
“The transistor technology and advanced packaging integration have to go hand-in-hand in order to provide our customers with a complete product-level solution,” said Kevin Zhang, senior vice president of business development and global sales at TSMC. “A 3D fabric technology portfolio has become really important for us.”
It’s well documented that signals travel faster in a planar system-on-chip than between different dies in some type of system-in-package. But while digital transistors still scale, SRAM and wires do not. And at the most advanced nodes, packing everything together onto a single reticle-sized die frequently results in low yield and a significant drop in first-time silicon success.
In response, systems companies and leading-edge processor vendors have begun decomposing SoCs and turning them into assemblies of chiplets in advanced packages. Yield is higher for a small, narrowly focused chiplet than a large SoC, and the design costs per chiplet are lower. And in theory, there is no limit to how many can be assembled into a customized package to improve performance.
However, the performance of those multi-die assemblies falls off sharply when data needs to be moved back and forth between memories and processing elements. This is the proverbial memory wall, and it’s a function of distance and the speed at which signals travel over wires. High-bandwidth memory (HBM) works well enough for L3 cache. It’s much faster than standard DRAM due to its wider interface (2,048 lanes with HBM4), which helps reduce resistance and capacitance. But SRAM is still faster, making it the memory of choice for L1 and L2 cache. SRAM typically is configured with six transistors per cell, which makes access dramatically faster than DRAM’s one-transistor, one-capacitor cell. That capacitor’s charge leaks away over time, and leakage accelerates as DRAM heats up, so it must be refreshed regularly.
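Some rough arithmetic shows why interface width matters so much here. The following is a minimal sketch, assuming an HBM4 stack running at 8Gb/s per pin and a standard 64-bit DDR5 channel at 6.4Gb/s per pin (both data rates are illustrative assumptions, not vendor specs):

```python
# Back-of-the-envelope peak bandwidth comparison.
# Assumptions (illustrative): HBM4 = 2,048 lanes at 8 Gb/s per pin;
# DDR5 = one 64-bit channel at 6.4 Gb/s per pin.

def peak_bandwidth_gb_s(lanes: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s: lanes * per-pin rate, divided by 8 bits/byte."""
    return lanes * gbps_per_pin / 8

hbm4 = peak_bandwidth_gb_s(2048, 8.0)   # ~2,048 GB/s per stack
ddr5 = peak_bandwidth_gb_s(64, 6.4)     # ~51 GB/s per channel

print(f"HBM4 stack: ~{hbm4:,.0f} GB/s")
print(f"DDR5 channel: ~{ddr5:,.0f} GB/s")
print(f"Ratio: ~{hbm4 / ddr5:.0f}x")    # roughly 40x, driven almost entirely by width
```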
Hybrid approaches help, as does stacking more layers of HBM. Samsung, SK hynix and Micron are the only companies manufacturing HBM. Samsung has used that as a springboard to begin customizing HBM for specific workloads. But the optimal solution is more of both HBM and SRAM, and the latest roadmaps from the foundries show a complex mix of different memories with very tight interconnect pitches to facilitate data movement.
Intel’s most recent architecture shows layers of 14A logic stacked directly above a layer of SRAM tiles.
Fig. 1: Intel’s 3D-IC concept with 14A chiplets packaged on top of SRAM with EMIB bridge technology connecting it to I/Os and surrounded by HBM for L3 caching. Source: Intel
“Everybody talks about the memory wall,” said Kevin O’Buckley, senior vice president and general manager of Intel Foundry. “As we’re scaling more and more cores, and driving compute performance higher and higher, keeping the beast fed is the priority. 3D is an example of the way we can use a substantial portion of the die area for SRAM without sacrificing all that area for the compute that’s still required.”
This approach requires a completely different way of putting chips together, though. So does logic-on-logic, which has been on the drawing board for years, but which was largely sidelined due to thermal issues. The goal here is to be able to double the transistor density by adding another level of processing elements and memories and have them behave as a single system.
“We start with face-to-back integration to bring two dies together,” said TSMC’s Zhang. “We also are developing face-to-face, allowing a customer to maximize the interconnect density between the two dies. If you look at the hybrid-bonding pitch when we stack the die together, it will continue to shrink from 9 microns to 6 microns, and all the way down to 5 microns and below. The integration will have face-to-back and face-to-face to address different applications.”
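Those pitch numbers translate directly into vertical interconnect density. Here is a minimal sketch, assuming bond pads laid out on a uniform square grid (the grid layout is an illustrative assumption):

```python
# Hybrid-bonding density vs. pitch, assuming pads on a square grid:
# density per mm^2 = (1,000 um / pitch_um)^2.

for pitch_um in (9, 6, 5):
    pads_per_mm2 = (1000 / pitch_um) ** 2
    print(f"{pitch_um} um pitch: ~{pads_per_mm2:,.0f} connections/mm^2")

# 9 um -> ~12,300/mm^2; 6 um -> ~27,800/mm^2; 5 um -> 40,000/mm^2.
# Density scales with the inverse square of pitch, so shrinking from
# 9 um to 5 um more than triples the die-to-die connection count.
```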
Fig. 2: TSMC’s 3D-IC roadmap showing different integration strategies. Source: TSMC
In a presentation last spring, Taejoong Song, Samsung Foundry’s vice president of foundry business development, showed a roadmap featuring logic-on-logic stacking, with a 2nm (SF2) die on top of a 4nm (SF4X) die and the stack mounted on a substrate. This is basically a 3D-IC on a 2.5D package, sometimes called 3.5D. Song said the foundry will begin stacking an SF1.4 die on top of an SF2P die starting in 2027.
Fig. 3: Samsung’s roadmap for 3D-ICs. Source: Samsung
Vertical limits
Regardless of the layout, thermal dissipation remains the biggest challenge, and it is the most-cited reason why progress on 3D-ICs has been so slow. Much has changed in recent years, though, and the performance and power demands of chipmakers at the leading edge now require a concerted effort to address this issue.
While exact delivery dates for this technology remain fuzzy, all three foundries now display 3D-ICs prominently on their roadmaps. At least part of the solution may be a combination of logic developed at the latest node and at N-1 or N-2. But the goal is much tighter integration, so the assembly behaves as one system, connected through high-speed interfaces to other key components that have been stripped out of the planar SoC.
Multiple solutions for removing trapped heat have emerged over the past few years, although not all of them are ready for mass production.
Designing for data
Increasing the number of transistors in a multi-die assembly also increases the wiring congestion. Advanced place-and-route tools have been able to automate much of that, but they don’t address the problem of getting power to all the transistors, which is essential to maintaining performance. This is why all three of the big foundries either have developed, or are developing, backside power delivery (BPD). Intel’s PowerVia debuts at 18A, TSMC’s Super Power Rail is slated for A16, and Samsung plans its backside power delivery network for SF2Z.
Moving the power delivery network to the backside of a chip shortens the distance that power needs to travel, and it simplifies signal routing through the various metal layers in a chip. Instead of convoluted wiring, routing can be much more straightforward, particularly between dies that are filled with through-silicon vias and connected with hybrid bonding.
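The benefit can be framed as simple IR-drop arithmetic. Below is a minimal sketch in which the current and both resistance values are assumptions chosen only to show the trend:

```python
# IR-drop comparison, frontside vs. backside power delivery.
# All values are illustrative assumptions; the point is that a shorter,
# lower-resistance backside path recovers voltage margin.

current_a = 1000.0      # assumed total die current (~1 kW at ~1 V)
r_frontside = 100e-6    # assumed 0.10 mOhm effective PDN resistance
r_backside = 40e-6      # assumed 0.04 mOhm with backside delivery

for label, r_ohm in (("Frontside PDN", r_frontside), ("Backside PDN", r_backside)):
    drop_mv = current_a * r_ohm * 1000
    print(f"{label}: ~{drop_mv:.0f} mV IR drop")   # ~100 mV vs. ~40 mV

# On a ~0.7 V rail, recovering tens of millivolts of margin translates
# directly into frequency headroom or a lower supply voltage.
```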
“You’ve got the capability to have thousands and thousands of TSVs between die,” said Mick Posner, senior group product director at Cadence. “That’s fantastic, but they all need 0.003 picojoules per bit, which is tiny. However, when you stuff them all into 1mm², it adds up. You need hotspot analysis, and managing that power envelope along with whatever else that compute-heavy die is doing will be a challenge. Power density is going to be high already, and we’re already seeing that thermal expansion will pop a stack of die apart. There are lots of challenges. But there’s also the ability to pack in performance. And because you can only go so wide, now you’ve got to go up. So why not build a skyscraper?”
That’s the general idea. Yet to reap the full benefits of die stacking, those layers need to be much thinner in order to reduce the distance that signals need to travel. Also, not all of the layers need to be stacked. For example, HBM may be designed to surround a 3D-IC logic stack, with high-speed connections to I/O and other memory.
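To put Posner’s per-bit figure in rough perspective, here is a minimal sketch, assuming 10,000 TSVs packed into 1mm², each carrying 8Gb/s (both the count and the data rate are illustrative assumptions):

```python
# Rough power-density estimate for a dense TSV field, using the
# 0.003 pJ/bit figure quoted above. TSV count and per-TSV data rate
# are illustrative assumptions, not vendor numbers.

ENERGY_PER_BIT_J = 0.003e-12   # 0.003 pJ/bit
tsv_count = 10_000             # assumed TSVs packed into 1 mm^2
rate_bps = 8e9                 # assumed 8 Gb/s per TSV

power_w = tsv_count * rate_bps * ENERGY_PER_BIT_J   # watts in 1 mm^2
print(f"~{power_w:.2f} W in 1 mm^2 -> ~{power_w * 100:.0f} W/cm^2")

# ~0.24 W/mm^2, or ~24 W/cm^2, from die-to-die I/O alone, before any
# compute power is added. That is why hotspot analysis becomes mandatory.
```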
To really speed this up, some of those connections are likely to be optical interfaces and co-packaged optics. All of the major foundries have co-packaged optics on their roadmaps because light is capable of moving data at blazing speeds with less power and heat build-up.
Fig. 4: TSMC plans to incorporate co-packaged optics with its 3D-IC model. Source: TSMC
Fig. 5: Intel’s optical roadmap. Source: Intel
“Optical interconnects offer significant advantages over traditional electrical I/O,” said Naga Chandrasekaran, chief technology and operations officer and general manager of Intel Foundry, in a recent presentation. “In terms of shoreline density improvements, it offers benefits with bandwidth, latency, and power efficiency. When we can take optical interconnects and bring them to a chip-to-chip level, along with Intel’s advanced packaging capabilities, this solution is going to provide significant benefits in how we can scale up and scale out AI-based solutions. It will provide denser and more advanced interconnect capabilities. Also, in the compute space, we can provide lower latency and higher throughput by having the co-packaged optics solution.”
Like most things in 3D-ICs, this is harder than it sounds. For one thing, light doesn’t go around corners, so waveguides cannot have any right angles. They also need to be smooth, because any roughness has the same effect as line-edge roughness in electrical interconnects. On top of that, optical devices are sensitive to heat, which can shift wavelengths further than expected under unpredictable workloads.
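That temperature sensitivity is quantifiable. The sketch below uses textbook-order values for silicon photonic devices rather than figures from the article; both the shift coefficient and the temperature swing are assumptions:

```python
# How heat moves optical wavelengths in silicon photonics.
# ~0.07-0.1 nm/K is a typical resonance shift for silicon ring
# resonators (textbook-order value, not from the article).

dlambda_nm_per_k = 0.08   # assumed thermo-optic resonance shift
temp_swing_k = 30         # assumed workload-driven temperature swing

shift_nm = dlambda_nm_per_k * temp_swing_k
print(f"~{shift_nm:.1f} nm wavelength drift over a {temp_swing_k} K swing")

# With DWDM channels spaced ~0.8 nm apart (100 GHz grid), a ~2.4 nm
# drift walks a signal across multiple channel slots unless devices are
# actively tuned, or the temperature-sensitive elements are placed away
# from hot compute, as O'Buckley describes below.
```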
“The reality of a compute system now is it’s not contained on a board,” said Intel’s O’Buckley. “In most cases, it’s not even contained in a rack. If you look at what some of the largest system companies on planet Earth today are doing, like the hyperscalers or NVIDIA developing their AI systems, connectivity is just as important as compute in allowing them to scale their performance metrics. Copper has been the backbone of our industry for generations and generations, while optics was the thing that connected towns. Now, optics allow terabits of bandwidth to move rack-to-rack coherently, which is critical. Where that connection happens used to be at the switch level. But because of the coherency and the latency these systems require, we’re now discussing driving optics directly to that compute cluster rather than having to go through switches. That’s where the industry is headed, without question.”
At least a partial solution to this is intelligent placement of optical components. “A lot of it comes down to where your laser source is,” said O’Buckley. “Some of the innovation in the optics space right now is that there are elements like MUXing that tend not to be particularly temperature-sensitive. You can put them pretty close to compute. And then for your laser source and some of your sensing devices, you can move those a little further away. Doing some of the optics in that way allows you to sort of disaggregate the laser, and that’s something that some companies are choosing to do.”
TSMC’s Zhang said photonics also can be used to reduce the heat in a chip. “In the near future we will see customers using integrated silicon photonics to bring the signal out to connect chip-to-chip. We all know the photon is far more efficient when you talk about signaling than the electron. The electron is wonderful for compute, but signal-wise, the photon is better.”
Zhang said another key option is an integrated voltage regulator, which will further improve power efficiency. “This is very important because the customer, or future AI product, wants to integrate multiple logic and multiple HBM together. Those consume power. If you look at the advanced AI accelerators today, we’re talking about easily 1,000 watts. In the future it will be a couple thousand watts. It’s very difficult to bring the power supply into such a package, so by having an integrated voltage regulator you can lower the current requirement, because the number of bumps is limited. You can’t just send in that much current.”
That in turn reduces the overall heat in the package.
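The arithmetic behind Zhang’s point is straightforward. Here is a minimal sketch, assuming a 1,000W package with a 0.75V core rail, a 12V input to an in-package regulator, and a per-bump current limit of 0.5A (all three values are illustrative assumptions):

```python
# Why integrated voltage regulators ease power delivery: power in
# through the bumps at high voltage means far less current.
# Illustrative assumptions: 1,000 W package, 0.75 V core rail,
# 12 V input when an in-package IVR does the final down-conversion.

package_power_w = 1000.0

current_at_core = package_power_w / 0.75   # ~1,333 A through the bumps
current_at_12v = package_power_w / 12.0    # ~83 A through the bumps

print(f"Delivering at 0.75 V: ~{current_at_core:,.0f} A")
print(f"Delivering at 12 V with an IVR: ~{current_at_12v:,.0f} A")

# At an assumed 0.5 A per bump, that's ~2,700 power bumps vs. ~170,
# which is the bump-count pressure Zhang describes.
print(f"Bumps at 0.5 A each: {current_at_core/0.5:,.0f} vs. {current_at_12v/0.5:,.0f}")
```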
Process scaling
It may seem a bit counterintuitive, but maximizing the performance benefits of 3D-ICs requires continued process scaling. The reason is less about the performance of the transistors — although chipmakers certainly can make good use of that — than it is about dynamic power density. Smaller transistors are more power-efficient, which helps reduce heat and lower energy costs in large data centers. In addition, the transition from finFETs to gate-all-around FETs reduces static leakage, which also generates heat that can become trapped in a package.
Consider TSMC’s forthcoming A14 node, the foundry’s next full node after 2nm. “The scaling benefit of A14 is very substantial compared to our prior generation,” Zhang said. “It’s up to a 15% speed enhancement, power reduction of 30%, and logic density of 1.23X. Overall chip density is at least 1.2X, so this is a very, very substantial technology. The technology also features NanoFlex Pro technology. This is really the result of design technology co-optimization, allowing designers to design their product in a very flexible fashion to achieve an optimum power and performance benefit. This technology is going into production by 2028.”
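Foundries typically quote such numbers as either/or trade-offs at iso-conditions, i.e. 15% more speed at the same power, or 30% less power at the same speed. That interpretation is an assumption here, since the quote doesn’t specify, but the sketch below shows what it would imply:

```python
# What "up to 15% speed, 30% power reduction, 1.23x logic density"
# implies, assuming the usual iso-power / iso-speed framing (an
# interpretation, not stated explicitly in the quote).

speed_gain = 1.15       # assumed: at iso-power
power_ratio = 0.70      # assumed: at iso-speed (30% reduction)
logic_density = 1.23

# Energy efficiency at iso-speed: the same work for 70% of the energy.
print(f"Perf/W at iso-speed: ~{1 / power_ratio:.2f}x")   # ~1.43x

# A same-sized die fits ~23% more logic. If all of it runs at the lower
# per-op energy, total power is 1.23 * 0.70 = ~0.86x for ~1.23x throughput.
print(f"Iso-area: ~{logic_density:.2f}x logic at ~{logic_density * power_ratio:.2f}x power")
```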
Zhang noted that the first version of that node will not include backside power delivery, which will be added with the second A14 version in 2029.
Fig. 6: TSMC’s process roadmap. Source: TSMC
Intel’s RibbonFET is that foundry’s name for a GAA FET, which includes some customization options for the “ribbon.”
Fig. 7: Intel’s process roadmap. Source: Intel
Samsung, meanwhile, introduced its GAA technology at the 2nm node.
Fig. 8: Samsung’s process roadmap. Source: Samsung
There are still the usual issues with scaling, of course. Thinner dielectrics can break down more quickly, causing cross-talk and other potential signal disruptions. The same is true for thinner dies in a 3D-IC stack, which lose the insulating properties of a thicker substrate and suffer accelerated time-dependent dielectric breakdown (TDDB). These kinds of issues will have a big impact on how the industry designs and assembles these devices, making routing even more complex and requiring significantly more simulation, emulation, verification, and debug effort.
“3D-IC is the only way you can scale to hundreds of billions and to trillions of transistors,” said Sassine Ghazi, president and CEO of Synopsys, during a recent presentation. “But the moment you start scaling to that level of complexity, the only way you can achieve your performance or power goals is by being efficient at the interconnect level, and architecting that multi-die system efficiently. Dies may be coming from different process technologies, or even different foundries. You have to verify and validate an architecture to deliver this advanced package.”
Future applications
The initial applications of 3D-ICs will be inside AI data centers, but once the processes are firmed up and the bugs worked out, this approach can be applied more broadly and with more targeted combinations of components. Whether everything requires a full 3D-IC, or only some of the core pieces of these technologies, is still to be determined. Nevertheless, the technology issues being addressed in stacked dies will have broad applications.
“We think there’s lots of room for mobile innovation,” said TSMC’s Zhang. “One device we think is a future opportunity for us to grow our business is augmented reality glasses. These glasses are transparent, have a small form factor, and they allow you to wear them all day long. In order to have a full day of battery life and all the compute power, you really need advanced silicon. You need a lot of sensing devices. You need connectivity, so lots of silicon content.”
The same is true for humanoid robots, he said. “The auto industry wants to go to autonomous driving. You could think about a car as merely a first step to build a robot. A car is a simple robot. It just takes you from place A to place B. But in the future, if you really want a robot to interact with a human, help with daily chores, and handle lots of things humans don’t want to do, you need to build these so-called humanoid robots. If you go inside these robots, you see lots of silicon. First of all, you need to have intelligence. You need to have good AI capability. You need advanced silicon to power the intelligence. You also need to have good sensing capability and good power delivery. And you need lots of integrated controllers to provide the capability to function under different conditions.”
Fig. 9: Silicon requirements for a humanoid robot. Source: TSMC
Conclusion
Different foundries are at different points in developing all of the necessary pieces required for 3D-ICs. No foundry can solve all of these issues at once, and the chip industry is somewhat more forgiving of that these days. With ongoing geopolitical disruptions in the supply chain, chipmakers are looking for multiple sources and multiple technology options.
“We’re faced with a challenge and an opportunity and a dilemma all at the same time,” said Mike Ellow, CEO of Siemens EDA. “How do we take early-career engineers and allow them to address the multitude of new designs that they have to deliver, and have silicon for it? The world is dependent on a resilient, robust, distributed advanced-node silicon supply chain. On top of that, we need a set of AI-infused technologies that connect together the broader ecosystem in order to allow all design content to be created.”