Intel Vs. Samsung Vs. TSMC

Foundry competition heats up in three dimensions and with novel technologies as planar scaling benefits diminish.


The three leading-edge foundries — Intel, Samsung, and TSMC — have started filling in some key pieces in their roadmaps, adding aggressive delivery dates for future generations of chip technology and setting the stage for significant improvements in performance with faster delivery time for custom designs.

Unlike in the past, when a single industry roadmap dictated how to get to the next process node, the three largest foundries increasingly are forging their own paths. They all are heading in the same general direction with 3D transistors and packages, a slew of enabling and expansive technologies, and much larger and more diverse ecosystems. But some key differences are emerging in their methodologies, architectures, and third-party enablement.

Roadmaps for all three show that transistor scaling will continue at least into the 18/16/14 angstrom range, with a possible move from nanosheets to forksheet FETs, followed by complementary FETs (CFETs) at some point in the future. The key drivers are AI/ML and the explosion of data that needs to be processed, and in most cases these designs will involve arrays of processing elements, usually with high levels of redundancy and homogeneity, in order to achieve higher yields.

In other cases, these designs may contain dozens or hundreds of chiplets, some engineered for specific data types and others for more general processing. Those chiplets can be mounted on a substrate in a 2.5D configuration, an approach that has gained traction in data centers because it simplifies the integration of high-bandwidth memory (HBM), as well as in mobile devices, which also include other features such as image sensors, power supplies, and additional digital logic used for non-critical functions. All three foundries are working on full 3D-ICs, as well. And there will be hybrid options available, where logic is stacked on logic and mounted on a substrate, but separated from other features in order to minimize physical effects such as heat — a heterogeneous configuration that has been called both 3.5D and 5.5D.

Rapid and mass customization
One of the biggest changes involves bringing domain-specific designs to market much more quickly than in the past. Mundane as this may sound, it’s a competitive necessity for many leading-edge chips, and it requires fundamental changes in the way chips are designed, manufactured, and packaged. Making this scheme work demands a combination of standards, innovative connectivity schemes, and a mix of engineering disciplines that in the past had limited interactions, if any.

Sometimes referred to as “mass customization,” it includes the usual power, performance, and area/cost (PPA/C) tradeoffs, as well as rapid assembly options. That is the promise of heterogeneous chiplet assemblies, and from a scaling perspective it marks the next phase of Moore’s Law. The entire semiconductor ecosystem has been laying the groundwork for this shift incrementally for more than a decade.

But getting heterogeneous chiplets — essentially hardened IP from multiple vendors and foundries — to work together is both a necessary and daunting engineering challenge. The first step is connecting the chiplets together in a consistent way to achieve predictable results, and this is where the foundries have spent much of their effort, particularly with the Universal Chiplet Interconnect Express (UCIe) and Bunch of Wires (BoW) standards. While that connectivity is a critical requirement for all three, it’s also one of the main areas of divergence.

Intel Foundry’s current solution, prior to fully integrated 3D-ICs, is to develop what industry sources describe as “sockets” for chiplets. Instead of characterizing each chiplet for a commercial marketplace, the company defines the specification and the interface so that chiplet vendors can develop these limited-function mini-chips to meet those specs. That addresses one of the big stumbling blocks for a commercial chiplet marketplace. All the pieces need to work together, from data speed to thermal and noise management.

Intel’s scheme relies heavily on its Embedded Multi-Die Interconnect Bridge (EMIB), first introduced in 2014. “The really cool thing about an EMIB base is you can add any amount of chiplets,” said Lalitha Immaneni, vice president of technology development at Intel. “We don’t have a limitation on the number of IPs that we can use in design, and it won’t increase the interposer size, so it’s cost-effective and it’s agnostic of the process. We have given out a package assembly design kit, which is like your traditional PDK for the assembly. We give them the design rules, the reference flows, and we tell them the allowable constructions. It will also give them any collaterals that we need to take it into our assembly.”

Depending upon the design, there can be multiple EMIBs in a package, complemented by thermal interface materials (TIMs), in order to dissipate heat that can become trapped inside a package. TIMs typically are pads that are engineered to conduct heat away from the source, and they are becoming more common as the amount of compute inside a package increases and as the substrates are thinned to shorten the distance signals need to travel.

But the thinner the substrate, the less effective it is at heat dissipation, which can result in thermal gradients that are workload-dependent and therefore difficult to anticipate. Eliminating that heat may require TIMs, additional heat sinks, and potentially even more exotic cooling approaches such as microfluidics.

Both TSMC and Samsung offer bridges, as well. Samsung has embedded bridges inside the RDL — an approach it calls 2.3D, or I-Cube E — and it is using them to connect sub-systems in order to speed time to working silicon. Instead of relying on a socket approach, some of the integration work will be pre-done in known-good modules.

“Putting together two, four, or eight CPUs into a system is something that very sophisticated customers know how to go out and do,” said Arm CEO Rene Haas, in a keynote speech at a recent Samsung Foundry event. “But if you want to build an SoC that has 128 CPUs attached to a neural network, memory structures, interrupt controllers that interface to an NPU, an off-chip bus to go to another chiplet, that is a lot of work. In the last year and a half, we’ve seen a rush of people building these complex SoCs wanting more from us.”

Samsung also has been building mini-consortia [1] of chiplet providers, targeted at specific markets. The initial concept is that one company builds an I/O die, another builds the interconnect, and a third builds the logic, and when that is proven to work, then others are added into the mix to provide more choices for customers.

TSMC has experimented with a number of different options, including both RDL and non-RDL bridges, fan-outs, 2.5D chip-on-wafer-on-substrate (CoWoS), and System On Integrated Chips (SoIC), a 3D-IC concept in which chiplets are packed and stacked inside a substrate using very short interconnects. In fact, TSMC has a process design kit for just about every application, and it has been active in creating assembly design kits for advanced packaging, including reference designs to go with them.

The challenge is that foundry customers willing to invest in these complex packages increasingly want very customized solutions. To facilitate that, TSMC rolled out a new language called 3Dblox, a top-down design scheme that fuses physical and connectivity constructs, allowing assertions to be applied across both. This sandbox approach allows customers to leverage any of its packaging approaches — InFO, CoWoS, and SoIC. It's also essential to TSMC's business model, because the company is the only pure-play foundry of the three [2] — although both Intel and Samsung have distanced their foundry operations from the rest of their businesses in recent months.

“We started from a concept of modularization,” said Jim Chang, vice president of advanced technology and mask engineering at TSMC, in a presentation when 3Dblox was first introduced in 2023. “We can build a full 3D-IC stacking with this kind of language syntax plus assertions.”

Chang said the genesis of this was a lack of consistency between the physical and connectivity design tools. But he added that once this approach was developed, it also enabled reuse of chiplets in different designs because much of the characterization was already well-defined and the designs are modular.


Fig. 1: TSMC’s 3Dblox approach. Source: TSMC

Samsung followed with its own system description language, 3DCODE, in December 2023. Both Samsung and TSMC claim their languages are standards, but they're more like new foundry rule decks because it's unlikely these languages will be used outside of their own ecosystems. Intel's 2.5D approach doesn't require a new language because the rules are dictated by the socket specification, trading off some customization for a shorter time to market and a simpler approach for chiplet developers.

The chiplet challenge
Chiplets have obvious benefits. They can be designed independently at whatever process node makes sense, which is particularly important for analog features. But figuring out how to put the pieces together with predictable results has been a major challenge. The initial LEGO-like architecture scheme floated by DARPA has proven much more complicated than first envisioned, and it has required a massive and ongoing effort by broad ecosystems to make it work.

Chiplets need to be precisely synchronized so that critical data is processed, stored, and retrieved without delay. Otherwise, timing issues can arise, in which one computation is delayed or out of sync with others, leading to stalls and potential deadlocks. In mission- or safety-critical applications, the loss of a fraction of a second can have serious consequences.

Simplifying the design process, particularly with domain-specific designs where one size does not fit all, is an incredibly complex endeavor. The goal for all three foundries is to provide more options for companies that will be developing high-performance, low-power chips. With an estimated 30% to 35% of all leading-edge design starts now in the hands of large systems companies such as Google, Meta, Microsoft, and Tesla, the economics of leading-edge chip and package design have changed significantly, and so have the PPA/C formulas and tradeoffs.

Chips developed for these systems companies probably will not be sold commercially. So if they can achieve higher performance per watt, then the design and manufacturing costs can be offset by lower cooling power and higher utilization rates — and potentially fewer servers. The reverse is true for chips sold into mobile devices and commodity servers, where high development costs can be amortized across huge volumes. The economics for customized designs in advanced packages work for both, but for very different reasons.

Scaling down, up, and out
It’s assumed that within these complex systems of chiplets there will be multiple types of processors, some highly specialized and others more general-purpose. At least some of these will likely be developed at the most advanced process nodes due to limited power budgets. Advanced nodes still provide higher density and energy efficiency, allowing more transistors to be packed into the same area to improve performance. This is critical for AI/ML applications, where processing more data faster requires more multiply/accumulate (MAC) operations in highly parallel configurations. Smaller transistors provide greater energy efficiency, allowing more processing per square millimeter of silicon, but the gate structure needs to change to prevent leakage, which is why forksheet FETs and CFETs are on the horizon.
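The link between energy per operation and throughput under a fixed power budget can be sketched with simple arithmetic. This is illustrative only: the energy-per-MAC figures below are hypothetical round numbers, not published values for any specific node.

```python
# Illustrative only: how a fixed power budget caps MAC throughput, and why
# lower energy per operation matters as much as raw transistor count.
# The per-MAC energies are assumed round numbers, not vendor data.

def macs_per_second(power_budget_w: float, energy_per_mac_j: float) -> float:
    """Peak multiply/accumulate rate sustainable at a given power budget."""
    return power_budget_w / energy_per_mac_j

POWER_W = 100.0        # fixed package power budget
OLDER_NODE_PJ = 1.0    # assumed energy per MAC on an older node
NEWER_NODE_PJ = 0.5    # assumed energy per MAC on a newer node

older = macs_per_second(POWER_W, OLDER_NODE_PJ * 1e-12)
newer = macs_per_second(POWER_W, NEWER_NODE_PJ * 1e-12)

print(f"older node: {older / 1e12:.0f} TMAC/s")   # 100 TMAC/s
print(f"newer node: {newer / 1e12:.0f} TMAC/s")   # 200 TMAC/s
```

Halving the energy per operation doubles the sustainable compute at the same power, which is why energy efficiency, not clock speed, dominates these designs.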

Put simply, process leadership still has value. Being first to market with a leading-edge process is good for business, but it’s only one piece of a much larger puzzle. All three foundries have announced plans to push well into the angstrom range. Intel plans to introduce its 18A this year, followed by 14A a couple of years later.


Fig. 2: Intel’s process roadmap. Source: Intel Foundry

TSMC, meanwhile, will add A16 in 2027. (See Fig. 3, below.)


Fig. 3: TSMC’s scaling roadmap into the angstrom era. Source: TSMC

And Samsung will push to 14 angstroms sometime in 2027 with its SF1.4, apparently skipping 18/16 angstroms. (See Fig. 4, below.)


Fig. 4: Samsung’s process scaling roadmap. Source: Samsung Foundry

From a process node standpoint, all three foundries are on the same track. But advances are no longer tied to the process node alone. The focus increasingly is on latency and performance per watt in a specific domain, and this is where stacking logic-on-logic in a true 3D-IC configuration will excel, using hybrid bonds to connect chiplets to a substrate and each other. Moving electrons through a wire on a planar die is still the fastest option (assuming a signal doesn’t have to travel from one end of the die to the other), but stacking transistors on top of other transistors is the next best thing, and in some cases even better than a planar SoC, because some vertical signal paths may be shorter.
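The advantage of shorter vertical paths can be seen with a rough order-of-magnitude comparison. The distances below are assumed, not measured values, and the simple distributed-RC model (delay growing with the square of wire length) ignores repeaters and bond resistance.

```python
# Illustrative comparison of signal path lengths: a long cross-die planar
# route versus a vertical hop through a hybrid-bonded stack. Distances are
# rough orders of magnitude, not measured values. For a simple distributed
# RC wire, delay grows with the square of the length.

planar_route_um = 10_000.0   # ~10 mm: corner-to-corner on a large die
vertical_hop_um = 10.0       # ~10 um: through a hybrid-bonded stack

# Relative RC delay, assuming the same wire cross-section for both paths
relative_delay = (planar_route_um / vertical_hop_um) ** 2
print(f"planar route is ~{relative_delay:,.0f}x slower in this RC model")
```

Real designs insert repeaters on long planar routes, so the gap is far smaller in practice, but the direction of the advantage is why vertical floor-planning can pay off.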

In a recent presentation, Taejoong Song, Samsung Foundry’s vice president of foundry business development, showed a roadmap featuring logic-on-logic mounted on a substrate, combining a 2nm (SF2) die on top of a 4nm (SF4X) die, both mounted on top of another substrate. This is basically a 3D-IC on a 2.5D package, which is the 3.5D or 5.5D concept mentioned earlier. Song said the foundry will begin stacking an SF1.4 on top of SF2P, starting in 2027. What’s particularly attractive about this approach are the thermal dissipation possibilities. With the logic separated from other functions, heat can be channeled away from the stacked dies through the substrate or any of the five exposed sides.


Fig. 5: Samsung’s 3D-IC architecture for AI. Source: Samsung

Intel, meanwhile, will leverage its Foveros Direct 3D to stack logic on logic, either face-to-face or face-to-back. The approach allows stacking of chips or wafers from different foundries, with the connection bandwidth determined by the copper via pitch, according to a new Intel white paper. The paper noted that the first version uses a copper pitch of 9µm, while the second generation will use a 3µm pitch.
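The pitch numbers translate directly into connection density. A quick back-of-the-envelope calculation, assuming a uniform square grid of vias (a simplification of any real bond layout), shows what the generational step buys:

```python
# Back-of-the-envelope connection density from the copper via pitches cited
# above: 9 um for the first Foveros Direct generation, 3 um for the second.
# Assumes a uniform square grid of vias, which is a simplification.

def vias_per_mm2(pitch_um: float) -> float:
    """Vias per square millimeter for a square grid at the given pitch."""
    return (1000.0 / pitch_um) ** 2

gen1 = vias_per_mm2(9.0)   # ~12,350 connections per mm^2
gen2 = vias_per_mm2(3.0)   # ~111,100 connections per mm^2
print(f"gen2/gen1 density ratio: {gen2 / gen1:.0f}x")   # 9x
```

Because density scales with the inverse square of the pitch, a 3x pitch shrink yields a 9x increase in potential die-to-die bandwidth per unit area.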


Fig. 6: Intel’s Foveros Direct 3D. Source: Intel

“The true 3D-IC comes with Foveros, and then also with hybrid bonds,” said Intel’s Immaneni. “You cannot go the traditional route of design where you put it together and run validation, and then find, ‘Oops, I have an issue.’ You cannot afford to do this anymore because you’re impacting your time to market. So you really want to provide a sandbox to make it predictable. But even before I step into this detailed design environment, I want to run my mechanical/electrical/thermal analysis. I want to look at the connectivity so I don’t have opens and shorts. The burden for 3D-IC resides more in the co-design than the execution.”

Foveros allows an active logic die to be stacked on either another active or passive die, with the base die used to connect all the die in a package at a 36-micron pitch. By leveraging advanced sort, Intel claims it can guarantee 99% known good die and 97% yield at post-assembly test.
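Why known-good-die rates matter so much becomes clear with a simple independent-failure yield model using the figures Intel cites (99% known good die, 97% post-assembly test yield). The chiplet counts below are hypothetical, and real defects are not fully independent, so treat this as a sketch:

```python
# Simple independent-failure model of package yield: every chiplet must be
# good AND the assembly must pass test. Uses the 99% known-good-die and 97%
# post-assembly figures cited above; the chiplet counts are hypothetical.

def package_yield(kgd: float, assembly_yield: float, n_chiplets: int) -> float:
    """Probability that all n chiplets are good and assembly passes test."""
    return (kgd ** n_chiplets) * assembly_yield

for n in (4, 10, 40):
    print(f"{n:>2} chiplets: {package_yield(0.99, 0.97, n):.1%}")
```

Even at 99% known good die, a 40-chiplet package in this model yields below 65%, which is why foundries invest so heavily in advanced sort and pre-assembly test.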

TSMC’s CoWoS, meanwhile, already is in use by NVIDIA and AMD for their advanced packaging for AI chips. CoWoS is essentially a 2.5D approach, using an interposer to connect SoCs and HBM using through-silicon vias. The company’s plans for SoIC are more ambitious, stacking memory on logic along with other elements, such as sensors, in a 3D-IC at the front end of the line. This can significantly reduce assembly time for stacks of multiple layers, sizes, and functions. TSMC contends that its bonding scheme enables faster and shorter connections than other 3D-IC approaches. One report said Apple will begin using TSMC’s SoIC technology starting next year, while AMD will expand its use of this approach.

Other innovations
Putting the process and packaging technology in place opens the door to a much broader set of competitive options. Unlike in the past, when big chipmakers, equipment vendors, and EDA companies defined the roadmap for chips, the chiplet world provides the tools for end customers to make those decisions. This is due, in no small part, to the number of features that can be put into a package versus those that can fit inside the reticle limits of an SoC. Packages can be expanded horizontally or vertically, as needed, and in some cases they can improve performance just through vertical floor-planning.

But given the vast opportunity in the cloud and the edge — particularly with the rollout of AI everywhere — the three big foundries, as well as their ecosystems, are racing to develop new capabilities and features. In some cases, this involves leveraging what they already have. In other cases, it requires brand new technologies.

For example, Samsung has started detailing plans for custom HBM, which includes 3D DRAM stacks with a configurable logic layer underneath. This is the second time around for this approach. Back in 2011, Samsung and Micron co-developed the Hybrid Memory Cube (HMC), packaging a DRAM stack on a layer of logic. HBM won the war after JEDEC turned it into a standard, and HMC largely disappeared. But there was nothing wrong with the HMC approach, other than perhaps bad timing.

In its new form, Samsung plans to offer customized HBM as an option. Memory is one of the key elements that determine performance, and the ability to read/write and move data back and forth more quickly between memory and processors can have a big impact on performance and power. And those numbers can be significantly better if the memory is right-sized to a specific workload or data type, and if some of the processing can be done inside the memory module so there is less data to move.
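The payoff from moving less data can be sketched with a rough energy budget. The per-byte and per-operation energies below are textbook-style orders of magnitude (off-package memory access versus an on-die arithmetic operation), not vendor figures, and the 50% data-movement reduction is a hypothetical scenario:

```python
# Illustrative energy budget for why moving less data matters. The per-byte
# and per-MAC energies are rough orders of magnitude (off-package DRAM
# access vs. an on-die multiply/accumulate), not vendor figures.

DRAM_ACCESS_PJ_PER_BYTE = 100.0   # assumed off-package access energy
MAC_PJ = 1.0                      # assumed on-die multiply/accumulate energy

def job_energy_pj(bytes_moved: float, macs: float) -> float:
    """Total energy (pJ) for a workload: data movement plus compute."""
    return bytes_moved * DRAM_ACCESS_PJ_PER_BYTE + macs * MAC_PJ

# Same compute, but in-memory processing halves the data that must move:
baseline = job_energy_pj(bytes_moved=1e6, macs=1e6)
reduced  = job_energy_pj(bytes_moved=5e5, macs=1e6)
print(f"energy saved: {1 - reduced / baseline:.0%}")
```

Because the assumed data-movement energy dwarfs the compute energy, halving the traffic cuts nearly half the total energy, which is the arithmetic behind processing-in-memory.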


Fig. 7: Samsung roadmap and innovations. Source: Semiconductor Engineering/MemCon 2024

Intel, meanwhile, has been working on a better way to deliver power to densely packed transistors, a persistent problem as the transistor density and number of metal layers increase. In the past, power was delivered from the top of the chip down, but two problems have emerged at the most advanced nodes. One is the challenge of actually delivering enough power to every transistor. The second is noise, which can come from power, substrates, or electromagnetic interference. Without proper shielding — something that is becoming more difficult at each new node due to thinner dielectrics and wires — that noise can impact signal integrity.

Delivering power through the backside of a chip minimizes those kinds of issues and reduces wiring congestion. But it also adds other challenges, such as how to drill holes through a thinner substrate without structural damage. Intel apparently has solved these issues, with plans to offer its PowerVia backside power scheme this year.

TSMC said it plans to introduce backside power delivery at A16 in 2026/2027. Samsung is roughly on the same schedule, delivering it in its SF2Z 2nm process.

Intel also has announced plans for glass substrates, which can provide better planarity and lower defectivity than organic substrates. This is especially important at advanced nodes, where even nano-sized pits can cause issues. As with backside power delivery, handling issues abound. The upside is that glass has a coefficient of thermal expansion close to that of silicon, so it is compatible with the expansion and contraction of silicon components, such as chiplets. After years of sitting on the sidelines, glass is suddenly very attractive. In fact, both TSMC and Samsung are working on glass substrates, as well, and the whole industry is learning how to design with glass, handle it without cracking, and inspect it.

TSMC, meanwhile, has focused heavily on building an ecosystem and expanding its process offerings. Numerous industry sources say TSMC’s real strength is its ability to deliver process design kits (PDKs) for just about any process or package. The foundry produces about 90% of the most advanced chips globally, according to Nikkei. It also has the most experience with advanced packaging of any foundry, and the largest and broadest ecosystem.

That ecosystem is critical. The chip industry is so complex and varied that no single company can do everything. The question going forward will be how complete those ecosystems truly are, particularly if the number of processes continues to grow. For example, EDA vendors are essential enablers, and for any process or packaging approach to be successful, design teams need automation. But the more processes and packaging options, the more difficult it will be for EDA vendors to support every incremental change or improvement, and potentially the greater the lag time between announcement and delivery.

Conclusion
The recent supply chain glitches and geopolitics have convinced the United States and Europe that they need to re-shore and “friend-shore” manufacturing. The investments in semiconductor fabs, equipment, tools, and research are unprecedented. How that affects the three largest foundries remains to be seen, but it certainly is providing some of the impetus behind new technologies such as co-packaged optics, a raft of new materials, and cryogenic computing.

The impact of all of these changes on market share is becoming harder to track. It’s no longer about which foundry is producing chips at the smallest process node, or even the number of chips being shipped. A single advanced package may have dozens of chiplets. The real key is the ability to deliver solutions that matter to customers, quickly and efficiently. In some cases the driver will be performance per watt, while in others it may be time to results with power as a secondary consideration. And in still others, it may be a combination of features that only one of the leading-edge foundries can provide in sufficient quantity. But what is clear is that the foundry race is significantly more complex than ever before, and becoming more so. In this highly complex world, simple metrics for comparison no longer apply.

References
1. Mini-Consortia Forming Around Chiplets, March 20, 2023; E. Sperling/Semiconductor Engineering
2. TSMC also is the largest shareholder (35%) in Global Unichip Corp., a design services company.


