Creating A Moore’s Law For AI Scaling

Scaling AI becomes the grand challenge of the Intelligence Era

Key Takeaways:

  • AI scalability will require full-stack co-optimization, not just bigger data centers.
  • AI workloads require a 10X gain in compute efficiency over the next 10 years, which makes collaboration across algorithms, architectures, devices, packaging, and communication fabrics essential.
  • Edge AI chips are moving to leading-edge nodes at a faster pace. Physical AI calls for an adaptive architecture, with a focus on low latency, open architectures, reusable platforms, and fault tolerance.

One of the greatest challenges of the AI era is its lack of scalability. Processing 10X the number of AI algorithms today essentially requires 10X the amount of data center hardware.

Solving that challenge has global implications.

“AI data centers are now literally measured in gigawatts, and the playbook has been consistent,” said Patrick Vandenameele, CEO of imec, at the firm’s recent International Technology Forum (ITF). “First, we scale compute to its maximum die size, then scale up many GPUs operating in parallel inside a rack, acting as one gigantic computer’s engine, then scale out, connecting many racks into a single training and inference factory. But at the same time, power consumption increases almost linearly. It does not really scale.”

As workloads shift from training to inferencing with the implementation of multi-agentic AI systems, researchers estimate these systems will require 150X the compute power of existing large language models, making scaling even tougher.

“The fundamental question becomes, ‘Can we scale AI without exceeding infrastructure and planetary limits?’ This becomes even more urgent as we enter the era of physical AI, robots, autonomous systems, and smart environments,” said Vandenameele.

He posits that only co-optimization across the stack and among ecosystem partners will meet the scaling needs of AI, which will require an astounding 10X gain in compute efficiency over just 10 years. “The good news is that the main contributors (algorithms, device scaling, communication fabric, and architectural advances) are multiplicative. When we co-optimize them, the product exceeds the performance needs of AI, and that’s the opportunity. This is the reason we need to collaborate across the full stack.”
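
To see why multiplicative contributions matter, consider a minimal sketch. The per-layer gains below are hypothetical placeholders chosen for illustration, not figures from imec. Only the 10X target comes from the talk.

```python
import math

# Hypothetical per-layer efficiency gains (illustrative only, not imec data).
layer_gains = {
    "algorithms": 2.0,
    "device_scaling": 1.5,
    "communication_fabric": 1.5,
    "architecture": 2.5,
}


def combined_gain(gains):
    """Return the product of independent per-layer gains."""
    return math.prod(gains.values())


def equal_share(target, n_layers):
    """Gain each layer must deliver if all layers contribute equally."""
    return target ** (1.0 / n_layers)


total = combined_gain(layer_gains)
per_layer = equal_share(10.0, len(layer_gains))

print(f"Combined gain: {total:.2f}X")
print(f"Equal per-layer gain needed for 10X: {per_layer:.2f}X")
```

Under these assumed values the product is 11.25X, and four equal contributors would each need roughly 1.78X to reach 10X. No single layer has to deliver the whole gain, which is the argument for co-optimization.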

Other industry leaders agree. “We are now living in a generative AI era using chatbots, image generators, video creators, and more. We will soon see an agentic AI that can take specific tasks and execute exactly what we want,” said Jaihyuk Song, corporate president and CTO of Samsung. “AI models will eventually shift towards physical AI systems, such as humanoid robots that have been widely displayed at CES lately, which can truly perceive the world and act appropriately. Samsung is delivering AI solutions by co-optimizing everything from design to packaging.”

“Fortunately, we are seeing convergence across all our product domains (DRAM, NAND, logic, advanced packaging),” said Song. He highlighted five core technologies: bonding, high-performance transistors, stress warpage control, fine patterning, and low-resistance interconnects. One recent material breakthrough is the development of IGZO vertical channel transistors for DRAM. “Switching to an oxide semiconductor channel can reduce the off-current by several orders of magnitude. In addition, we have confirmed that incorporating ferroelectric materials can reduce the operating voltage, as well.”

In HBM, hybrid bonding will enable higher die-count stacking (12H, 16H, 20H), while reducing thermal resistance. Song did not disclose a timeline for hybrid bonding introduction for HBM. But interestingly, shifting some of the core logic from the GPU to the HBM base die is enabling higher speed with improved power efficiency. The company is further developing a data-sharing platform with multiple equipment suppliers to enable better equipment health, predictive modeling, and yield projection capability. It also is collaborating with Nvidia on physics-informed AI models for plasma etching, thermal modeling, and mechanical modeling.

Trailing edge won’t be trailing anymore
AI isn’t just shaping high-performance computing. In edge devices, it’s accelerating the move to more advanced device nodes. This means that automotive and even some smartphone components are rapidly moving to the leading edge.

“To achieve ubiquitous intelligent AI, you need to drive computation to all the different forms of edge devices. The smartphone needs to be in constant communication with the data center to get the intelligence,” said Kevin Zhang at TSMC. He noted that in mobile phones, application processors are migrating to 2nm technology by the end of this year, RF devices are migrating to 6nm finFET technology, and the image signal processor in the phone’s camera will migrate to 12nm finFET transistors.

“We see that because of AI the technology at the edge is being accelerated,” Zhang said. “And nothing is more important than operating voltage. We are driving operating voltage down to 0.4V in order to achieve up to 70% switching power reduction.”
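
Zhang did not state the baseline voltage behind the 70% figure. As a rough sketch, the textbook approximation for dynamic switching power, P = C·V²·f, shows what the quoted numbers would imply if capacitance and frequency were held constant. That constant-C, constant-f condition is an assumption made here for illustration. Real designs also change capacitance, frequency, and leakage.

```python
import math


def switching_power_ratio(v_new, v_old):
    """Ratio of dynamic switching power after a voltage change.

    Uses P = C * V^2 * f with C and f held constant (an assumption).
    """
    return (v_new / v_old) ** 2


def implied_baseline_voltage(v_new, reduction):
    """Baseline voltage that yields the given fractional power reduction."""
    return v_new / math.sqrt(1.0 - reduction)


# 0.4V and 70% come from the quote; the baseline is derived, not sourced.
v_base = implied_baseline_voltage(0.4, 0.70)
ratio = switching_power_ratio(0.4, v_base)

print(f"Implied baseline voltage: {v_base:.2f}V")
print(f"Remaining switching power: {ratio:.0%}")
```

Under this approximation, a 70% reduction at 0.4V corresponds to a starting point of about 0.73V. The quadratic dependence on voltage is why small voltage reductions pay off disproportionately in switching power.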


Fig. 1: AI is accelerating the push toward advanced technology nodes in edge devices. Source: TSMC

Imec is both an innovation incubator and a booster of scalable, manufacturable solutions. “By collaborating with imec, we can evaluate new technologies within a proven ecosystem before transferring them into a production environment,” said Steven Hsu, vice president of technology development at UMC. This is proving especially fruitful for companies entering new technology domains, such as integrated photonics. “For example, we are leveraging imec’s iSiPP300 integrated silicon photonics process technology to accelerate development of UMC’s own 12-inch silicon photonics platform. UMC has prior experience manufacturing photonic ICs on 6-inch and 8-inch wafers, primarily for telecom applications. With AI infrastructure driving demand for higher-bandwidth optical interconnects, the industry is increasingly moving toward 12-inch manufacturing to improve scalability, cost efficiency, and performance.”

One of the more intriguing presentations at ITF was by Salil Raje, senior vice president and general manager of AMD’s adaptive and embedded computing group. “I’m going to paint a scenario for you. You’re at work or at a conference, you get a buzz on the phone, you have a text message saying one of your parents has fallen, and he or she cannot get up. You could call emergency services, wait 10 to 15 minutes, or deploy the humanoid robot that is already on-site. The real question is not whether the robot can get to your parent. It is whether this robot will act correctly, safely, and in time. Would you trust this robot today? I will not. The question I would like to get to is, ‘What does it take to change that answer?’”

Raje explained why the system requirements for physical AI are completely different from those of cloud computing. “Data centers’ training and inference is automated, latency tolerant, and centralized, and the cloud is very forgiving. You send in a prompt, you wait for a response. If it takes longer, you wait longer,” he said. “If it makes a mistake, you retry with a new prompt. The next chapter for AI is very different. It’s entering vehicles, factories, and hospitals. For hospitals and homes, there are no second tries.”

Raje noted that rapid response is essential. “[Physical AI] is a time-bounded response problem, meaning the control loop has to finish within microseconds, every cycle, continuously. That’s where adaptive architectures come into play, using FPGAs and adaptive SoCs, while system coordination is best suited to CPUs. They do the orchestration. The key insight is that physical AI systems are heterogeneous because the workloads are heterogeneous … The top 150 robotics OEMs and developers have three asks: deterministic, predictable timing; an open architecture; and a scalable platform. They would like the same platform to work with their industrial arm, or an AMR, or a humanoid robot.”

Raje emphasized that meeting the challenges of physical AI systems will depend on the entire semiconductor industry ecosystem. “These physical AI systems will work in real time, and continue to work even though parts of them could fail. These challenges are quite enormous, and no single company can really tackle all of these challenges. The problems are distributed, and the solutions need to be distributed, too, across companies, disciplines, and standards bodies.”

New roadmap pivots at A7, embedded memory

Fig. 2: Updated imec roadmap includes a new embedded RAM in silicon interposer architecture to help penetrate the memory wall. Source: imec

Imec’s latest roadmap projects the first production use of complementary field-effect transistors (CFETs), consisting of vertically stacked nFETs and pFETs in logic devices, somewhere in the 2033 time frame. Other highlights of the imec roadmap include:

  • The first implementation of CFETs in production will happen at the 7Å node;
  • CFETs will require both front- and backside metallization;
  • There will be parallel use of both frontside and backside power delivery networks until the 7Å node;
  • First use of semi-damascene ruthenium or molybdenum interconnects with airgap dielectric will happen at the 10Å node (not shown);
  • At the 7Å node, CMOS 2.0 will combine a functional backside with tier-based functions at multiple levels within SoCs; and
  • A new embedded memory interposer roadmap will bring memory closer to the XPU and reduce connectivity overhead.

AI will drive hardware differentiation
In the rapidly changing world of frontier AI algorithms, it is becoming clear that data center hardware — and hardware in general — will be a key differentiator for overall system performance. One way of looking at the new scaling paradigm at the system level is:

Slowed scaling + STCO boosters = On-target scaling

That equation reflects the industry’s ability to double down on 3D approaches, from the device level (3D DRAM, 3D NAND with metal gate and air gaps, CMOS 2.0 for logic) to the package level. System technology co-optimization (STCO) boosters include capabilities like wafer-to-wafer and chip-to-wafer hybrid bonding, backside power distribution networks, and TSV interconnects that boost the overall system performance.

In fact, rather than continuing to ramp substrate sizes from 5.5X the reticle size to 9X, 15X, and then to 300mm wafers, Eric Beyne, senior fellow at imec, proposes a “volumetric 3D” construct that wraps the interposer around several HBM modules. This enables closer positioning of HBM to SoCs/ASICs in data centers while avoiding the inevitable yield issues associated with such massive passive substrates.

“If you look at these big interposers, and you look at the value of the components you put on that interposer, >$1,000 for each HBM, are you worried about spending a little more for the substrate? It’s not about the cost of the substrate, but its yield and reliability, because if it’s not yielding you have to throw away everything,” said Beyne.
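
Beyne’s point can be illustrated with a simple cost model. The sketch below assumes that every mounted component is scrapped when the interposer fails, and all dollar figures and yields are hypothetical except the roughly $1,000 per HBM mentioned in the quote.

```python
def cost_per_good_assembly(substrate_cost, component_value, assembly_yield):
    """Expected spend per working assembly.

    Assumes a failing interposer scraps the substrate and every
    component mounted on it (a simplifying assumption).
    """
    if not 0.0 < assembly_yield <= 1.0:
        raise ValueError("assembly_yield must be in (0, 1]")
    return (substrate_cost + component_value) / assembly_yield


# Eight HBM stacks at about $1,000 each; the stack count is hypothetical.
hbm_value = 8 * 1000.0

# Hypothetical substrates: a cheaper one that yields worse, a pricier one that yields better.
cheap = cost_per_good_assembly(500.0, hbm_value, 0.85)
premium = cost_per_good_assembly(1000.0, hbm_value, 0.95)

print(f"Cheaper substrate, 85% yield:  ${cheap:,.0f} per good assembly")
print(f"Pricier substrate, 95% yield:  ${premium:,.0f} per good assembly")
```

With these assumed numbers, doubling the substrate cost still lowers the cost per good assembly, because the value lost to scrap dominates the price of the substrate itself.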


Fig. 3: Heterogeneous large-scale integration (HLSI) combines 3D logic (CFETs), 3D memory, RF, etc., with a 3D package that separates memory and power delivery on the active interposer’s backside and advanced logic stacks on the frontside in a novel “volumetric 3D” package. Source: imec

The volumetric 3D structure uses an active interposer, rather than today’s passive structures that are difficult to test and repair. “Making the interposer bigger and bigger makes it more difficult to yield, more expensive, and more difficult to repair,” added Beyne. “Suppose you are actually able to bend the interposer, and you put the HBM below the interposer in slots like a DIMM card, together with flash and power delivery, because these chips need a lot of current. That interposer immediately becomes smaller, and you can add microchannel cooling to the vertical slots. You can then put the SoC complex on top and add the main cooling to the top of the cube.” He noted that the new structure is in the early stages of development.

Volumetric 3D is just one of the ways imec’s researchers think outside of the conventional manufacturing box. The generalized shift to 3D devices and packages is also rewriting the rules of design. “If you say I do something in 2D, and I’m now going to do the same thing in 3D, you’re going to gain a little bit. But if you say I’m going to think in 3D, you may come up with solutions that are completely different,” said Zsolt Tokei, imec fellow and program director of nano-interconnects.

As the industry plunges deeper into the AI era, many of the foundational technologies are reaching PPA limits. Further scaling of SRAM is slowing, as is DRAM cost/bit reduction. At the same time, the memory wall is growing in terms of bandwidth, and power density is increasingly limited by the thermal properties of the system. While engineers are still finding technology solutions to meet the demands of specific applications, a roadmap is needed to reduce the energy associated with moving data, handle increased thermal design power (TDP, the heat generated by a processor under sustained load) with superior thermal management, improve the efficiency of power delivery, and increase compute scaling while enabling voltage and capacitance scaling.

“With xTCO, we try to create a dialog between the technology and the application by trying to understand how technology can improve the compute density, can improve the power delivery with better thermal designs, and improve the memory subsystem and connectivity in the systems through the optical I/O and electrical I/O fabric,” said Julien Ryckaert, vice president of R&D at imec.

Scaling is led by lithography developments at the leading edge. Imec recently installed its first high-NA (0.55) scanner from ASML in its cleanroom, which will be joined by 100 or so other leading fabrication tools from various manufacturers. With the new patterning system, imec recently demonstrated 16nm pitch lines and spaces (8nm lines), as well as features separated by 8.7nm tip-to-tip spacing.
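
The 8nm line result can be put in context with the Rayleigh resolution relation, half-pitch = k1 · λ / NA. The sketch below assumes the standard 13.5nm EUV wavelength, which the article does not state. The 0.33 NA used for comparison is the value for existing EUV scanners and is likewise not taken from the article.

```python
EUV_WAVELENGTH_NM = 13.5  # assumed: standard EUV wavelength, not stated in the article


def k1_factor(half_pitch_nm, numerical_aperture, wavelength_nm=EUV_WAVELENGTH_NM):
    """Process factor k1 implied by a printed half-pitch (Rayleigh relation)."""
    return half_pitch_nm * numerical_aperture / wavelength_nm


def half_pitch_nm(k1, numerical_aperture, wavelength_nm=EUV_WAVELENGTH_NM):
    """Half-pitch achievable at a given k1 and numerical aperture."""
    return k1 * wavelength_nm / numerical_aperture


# 16nm pitch means 8nm half-pitch, printed on the 0.55 NA scanner.
k1 = k1_factor(8.0, 0.55)

# Same k1 on a 0.33 NA scanner, for comparison.
hp_low_na = half_pitch_nm(k1, 0.33)

print(f"Implied k1 at 0.55 NA: {k1:.3f}")
print(f"Half-pitch at 0.33 NA with the same k1: {hp_low_na:.1f}nm")
```

Under these assumptions the demonstrated pattern corresponds to a k1 of about 0.33, and the same process factor on a 0.33 NA tool would resolve roughly a 13.3nm half-pitch in a single exposure.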

Cross-technology, cross-organization optimization
The collaborative model that imec has perfected over the last four decades involves companies from across the semiconductor ecosystem. “The imec partnership enables access to leading-edge infrastructure and talent through their state-of-the-art pilot lines, advanced metrology, and deep expertise in sub-nanometer process development that would be difficult to replicate internally,” said Douglas Guerrero, senior technologist at Brewer Science. “Early insight into future technology nodes allows us to align materials innovation with emerging device architectures (e.g., gate-all-around, 3D integration) earlier in the development cycle.” He added that collaborative R&D reduces the upfront investment and spreads technical risk among partners.

Others agree. “Imec brings deep expertise in advanced semiconductor research, along with access to leading-edge development environments, process assumptions, and silicon data that are difficult to obtain elsewhere,” said Germain Fenger, senior director of product management at Synopsys and a former imec assignee. “At imec, you’re surrounded by experts in many fields, whereas in a company you often are kind of siloed. On top of that, you have other companies with their own experts working at imec, so you gain a very broad viewpoint about where the technology is going, where the things are, and the ecosystem really helps a company develop meaningful products that solve problems that are 2, 3, even 10 years out. That helps us accelerate innovation, reduce risk, and validate emerging ideas earlier, allowing us to refine solutions faster and pursue advances that would be much more difficult — or much slower — to achieve independently.”

Fenger also sees AI driving even stronger partnerships than in the past. “The complexity of AI systems is pushing the ecosystem toward earlier and deeper collaboration, with companies working together from technology exploration through production, and across design, process development, manufacturing, and advanced packaging. We are seeing tighter alignment across the supply chain to optimize performance, power, cost, and time-to-market. In this environment, success will increasingly depend on shared innovation, faster feedback loops, and stronger ecosystem partnerships.”

Time-to-market is especially relevant for wafer-size transitions, because moving to larger wafers reduces per-chip manufacturing costs and delivers greater economies of scale. “Combined with our internal R&D efforts, imec’s iSiPP300’s validated process flows and device technologies help accelerate development and shorten time to market. We expect to make our 12-inch silicon photonics PDK available to general customers in 2027,” said UMC’s Hsu.

Conclusion
The semiconductor industry is rapidly innovating to meet the new scaling requirements of the AI era. Full-stack co-optimization means not just design, manufacturing, and packaging, but also system-level co-optimization using enablers like wafer thinning, hybrid bonding, backside power delivery, CFETs, and more.

Industry leaders contend that AI workloads — especially emerging agentic and physical AI systems — will demand dramatically more compute, making collaboration across algorithms, architectures, devices, packaging, and communication fabrics essential to deliver a 10X improvement in compute efficiency over the next decade.

Some of the innovations coming into manufacturing to enable further scaling include ferroelectric and IGZO memories for 3D DRAM. Imec’s roadmap includes a transformation in architecture to CMOS 2.0, in which a functional wafer backside will coexist with a tier-based frontside, with layers dedicated to specific device types (memory, logic, RF, etc.). This change coincides with the introduction of CFETs, after gate-all-around and forksheet FETs have been scaled to their limit.


