SPONSOR BLOG

AI Energy Gap And Chiplets: Why Data Movement Matters

Making AI chiplets truly work efficiently demands clarity at the architectural and semantic layers.


At the recent Chiplet Summit 2026 preconference tutorial, the panel session, “Best Way to Make Chiplets Work,” brought together leaders from across the semiconductor ecosystem to tackle one of the most pressing challenges in advanced system design: how do we make heterogeneous, multi-die systems operate as a cohesive, energy-efficient whole for AI?

While much discussion focused on standards such as UCIe and evolving interconnect specifications, a consistent theme emerged: connectivity at the physical layer is necessary, but insufficient. Making AI chiplets truly work efficiently demands clarity at the architectural and semantic layers, where data movement, coherency, and system behavior are defined.

From an Arteris perspective, this distinction is critical.

Beyond bits: Language matters

One of the central insights discussed was that UCIe provides a means to move bits from one die to another, but it does not define the meaning of those bits. As Ashley Stevens of Arteris emphasized, the physical link is only part of the solution. The real challenge lies in defining how chiplets communicate: what protocol is appropriate, what semantics are shared, and how capabilities are negotiated.

In some systems, particularly scale-up environments such as multi-core processing, a coherent interconnect spanning chiplets is required. In other scenarios, such as AI data-movement workloads, coherent protocols impose unnecessary energy overhead even when used for non-coherent communication, with packetization and protocol overhead consuming up to 50% of overall bandwidth. In such cases, die-to-die remote direct memory access (RDMA) techniques or long-burst transfers can be dramatically more efficient, because they maximize the data-to-overhead ratio and therefore minimize the energy spent per useful data bit moved.
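To make the data-to-overhead argument concrete, here is a minimal Python sketch of a link on which every transaction pays a fixed overhead on top of its payload. The 64-byte overhead and the 0.5 pJ/bit link energy are assumptions, chosen only so that a 64-byte transaction lands at the 50% figure mentioned above; they are not UCIe or CHI numbers.

```python
# Hypothetical model: each transaction carries a fixed overhead (headers,
# flow control, protocol fields) in addition to its payload. All byte and
# energy figures are illustrative assumptions, not values from any standard.


def link_efficiency(payload_bytes: int, overhead_bytes: int) -> float:
    """Fraction of transmitted bytes that carry useful data."""
    return payload_bytes / (payload_bytes + overhead_bytes)


def energy_per_useful_bit(pj_per_link_bit: float, payload_bytes: int,
                          overhead_bytes: int) -> float:
    """Energy (pJ) spent per useful data bit, given a per-transmitted-bit cost."""
    return pj_per_link_bit / link_efficiency(payload_bytes, overhead_bytes)


if __name__ == "__main__":
    OVERHEAD = 64      # assumed overhead bytes per transaction
    PJ_PER_BIT = 0.5   # assumed die-to-die energy per transmitted bit
    for payload in (64, 256, 4096):
        eff = link_efficiency(payload, OVERHEAD)
        energy = energy_per_useful_bit(PJ_PER_BIT, payload, OVERHEAD)
        print(f"{payload:5d}-byte payload: {eff:6.1%} efficient, "
              f"{energy:.3f} pJ per useful bit")
```

With these assumed numbers, efficiency rises from 50% for a 64-byte payload to about 98.5% for a 4,096-byte burst, and the energy per useful bit falls from 1.0 pJ to roughly 0.51 pJ, even though the cost per transmitted bit never changes.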

No single protocol is optimal for every chiplet use case. A 64-byte cache-coherent transaction may be appropriate for SMP-style scaling or for the control plane of a processor system enhanced with accelerators, but it is rarely the most energy-efficient mechanism for bulk AI tensor movement in the data plane. The responsibility falls to the system architect to select the right communication model based on workload semantics, traffic patterns, and power envelopes.
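The architect's choice can be caricatured as a small rule table. This Python sketch is purely illustrative: the class names, the three candidate models, and the 64 KiB cut-over threshold are hypothetical, and it encodes only workload semantics and transfer size, leaving power envelopes out entirely.

```python
from dataclasses import dataclass
from enum import Enum


class CommModel(Enum):
    COHERENT = "cache-coherent 64-byte transactions"
    NONCOHERENT_BURST = "non-coherent long-burst transfers"
    D2D_RDMA = "die-to-die RDMA"


@dataclass
class TrafficProfile:
    shared_cached_data: bool    # do agents on several dies cache the same data?
    control_plane: bool         # control/config traffic rather than bulk data
    typical_transfer_bytes: int


# Hypothetical cut-over point; a real value would come from link and power modeling.
RDMA_THRESHOLD_BYTES = 64 * 1024


def choose_model(p: TrafficProfile) -> CommModel:
    """Toy heuristic mapping workload semantics and traffic shape to a model."""
    if p.shared_cached_data or p.control_plane:
        # Coherency is needed (SMP-style scaling), or the traffic consists of
        # small control-plane accesses where 64-byte transactions fit naturally.
        return CommModel.COHERENT
    if p.typical_transfer_bytes >= RDMA_THRESHOLD_BYTES:
        # Bulk tensor movement: amortize setup cost over one large transfer.
        return CommModel.D2D_RDMA
    return CommModel.NONCOHERENT_BURST


if __name__ == "__main__":
    print(choose_model(TrafficProfile(True, False, 64)).value)
    print(choose_model(TrafficProfile(False, False, 8 * 1024 * 1024)).value)
```

A real decision would also weigh latency targets and the power envelope. The point of the sketch is only that the decision is driven by what the traffic means, not by what the physical link can carry.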

Data movement and the AI energy gap

Panelists repeatedly highlighted that AI compute scaling is dramatically outpacing traditional Moore’s Law transistor efficiency improvements. As compute density increases, energy consumed in data movement, whether on die, between dies, or across racks, becomes an increasingly dominant factor.

This is where system interconnect architecture plays a pivotal role. Efficient data movement is as much an energy issue as a performance issue, and architectural choices such as the following can have a measurable impact on overall system power consumption:

  • Selecting coherent versus non-coherent communication models
  • Matching burst sizes to workload characteristics
  • Minimizing unnecessary protocol overhead
  • Reducing redundant memory transactions

At Arteris, scalable cache-coherent and non-coherent interconnect IP is designed specifically to enable this architectural flexibility. The goal is not to enforce a one-size-fits-all model, but to allow system designers to align data movement behavior with application intent.
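Matching burst size to the workload is not simply a matter of "bigger is better." Here is a minimal Python sketch, assuming fixed-size bursts that are padded to full length and a fixed per-burst overhead (both simplifications). The burst lengths, transfer sizes, and 32-byte overhead are hypothetical.

```python
def wire_bytes(transfer_bytes: int, burst_bytes: int, overhead_bytes: int) -> int:
    """Bytes sent on the link, assuming every burst is padded to full size
    and carries a fixed per-burst overhead (both simplifying assumptions)."""
    bursts = (transfer_bytes + burst_bytes - 1) // burst_bytes  # ceiling division
    return bursts * (burst_bytes + overhead_bytes)


def best_burst(transfer_bytes: int, candidates: list[int], overhead_bytes: int) -> int:
    """Pick the candidate burst size that puts the fewest bytes on the wire."""
    return min(candidates, key=lambda b: wire_bytes(transfer_bytes, b, overhead_bytes))


if __name__ == "__main__":
    sizes = [256, 1024, 4096]       # hypothetical supported burst lengths
    for transfer in (5000, 8192):   # hypothetical tensor tile sizes in bytes
        print(transfer, "->", best_burst(transfer, sizes, overhead_bytes=32))
```

Under these assumptions, a 5,000-byte transfer is best served by 1,024-byte bursts (5,280 bytes on the wire, versus 8,256 bytes with 4,096-byte bursts, where most of the second burst is padding). An 8,192-byte transfer, by contrast, favors 4,096-byte bursts.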

Shift left sooner rather than later

Another recurring theme was the concept of “shifting left.” Verification, power analysis, thermal modeling, and interoperability validation must occur as early as possible in the design cycle.

In a chiplet-based system, combinatorial complexity rises dramatically. It is not enough to verify a single SoC; designers must also:

  • Validate partitioning decisions.
  • Simulate traffic across chiplet boundaries.
  • Ensure interoperability of partial spec implementations.
  • Model power and thermal interactions across packages.

Arteris approaches this challenge through configurable IP generation and automated integration solutions that allow architects to generate system-level specifications across multiple dies. Instead of producing isolated chiplet collateral, designers can create a consolidated view of the full device as a single logical system, which is critical for software bring-up and system validation.

This system-centric perspective is essential. Ultimately, system designers and software developers do not care which chiplet a function resides in. They want a single hardware and programmer’s model datasheet for the entire multi-chiplet packaged device, not a separate specification for each individual chiplet.

Because chiplets can be mixed and matched to create multiple device product SKUs quickly and at low incremental cost, automating the generation of full-device collateral for each assembly of chiplets is key. Within the Arteris multi-die tooling flow, the Arteris Magillem tool takes care of this task.
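One small piece of "full-device collateral," a device-level address map, can be sketched as flattening per-chiplet register blocks into a single view of the whole package. This Python example is an illustration under stated assumptions, not the Magillem flow or its data model; production flows commonly build on standardized IP descriptions such as IP-XACT (IEEE 1685). The chiplet types, block names, and base addresses here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegBlock:
    name: str
    offset: int  # offset inside the chiplet's local address space
    size: int


def device_memory_map(chiplet_types: dict, assembly: list) -> list:
    """Flatten per-chiplet register blocks into one device-level map.

    chiplet_types: {type_name: [RegBlock, ...]}
    assembly:      [(instance_name, type_name, base_address), ...]
    Returns a list of (address, size, "instance.block") sorted by address.
    """
    entries = []
    for instance, type_name, base in assembly:
        for blk in chiplet_types[type_name]:
            entries.append((base + blk.offset, blk.size, f"{instance}.{blk.name}"))
    entries.sort()
    # In a list sorted by start address, any overlap shows up between neighbours.
    for (start, size, name), (next_start, _, next_name) in zip(entries, entries[1:]):
        if start + size > next_start:
            raise ValueError(f"{name} overlaps {next_name}")
    return entries


if __name__ == "__main__":
    catalog = {
        "compute": [RegBlock("npu_regs", 0x0000, 0x1000), RegBlock("dma_regs", 0x1000, 0x1000)],
        "io": [RegBlock("pcie_regs", 0x0000, 0x4000)],
    }
    # One hypothetical SKU: two compute chiplets plus one I/O chiplet.
    sku = [("compute0", "compute", 0x1000_0000),
           ("compute1", "compute", 0x1001_0000),
           ("io0", "io", 0x2000_0000)]
    for addr, size, name in device_memory_map(catalog, sku):
        print(f"0x{addr:08x} +0x{size:x} {name}")
```

Because the chiplet catalog is described once and each SKU is only a list of instances and base addresses, generating the map for a new assembly costs nothing more than a new `assembly` list. This is the kind of reuse that makes automated collateral generation attractive.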

Interoperability is an ecosystem problem

The fastest path to a working heterogeneous chiplet system today often involves a small number of tightly collaborating partners. But the long-term vision echoed across the panel is for a broader ecosystem where chiplets from multiple vendors can be composed with confidence.

Achieving that vision requires:

  • Clear behavioral models, not just interface specifications
  • Verification IP (VIP) trusted by ecosystem participants
  • Capability negotiation mechanisms
  • Pre-silicon compliance flows
  • Standardized descriptions of system-level behavior

A critical future challenge discussed on the panel is dynamic capability discovery. In an open chiplet marketplace, devices may implement only subsets of protocol specifications, which today can extend to more than 1,000 pages. Multi-die systems must therefore support negotiation to determine the capabilities of each chiplet and agree on a common language. Without that layer, interoperability remains extremely challenging.
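Capability negotiation can be reduced to a toy model. Each chiplet advertises the highest protocol revision and the optional features it implements. The system settles on the lowest common revision and the intersection of features, and refuses to proceed if a required feature is not supported everywhere. This is a hypothetical Python sketch: the `Advert` type, the feature names, and the rule that any chiplet can fall back to a lower revision are assumptions, not mechanisms defined by UCIe or any other specification.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Advert:
    max_version: int           # highest protocol revision the chiplet supports
    features: frozenset[str]   # optional protocol features it implements


def negotiate(adverts: dict[str, Advert], required: set[str]) -> tuple[int, frozenset[str]]:
    """Agree on a common protocol revision and feature set, or fail loudly.

    Assumes every chiplet can fall back to any lower revision, so the agreed
    revision is the lowest advertised maximum; features are the intersection.
    """
    common = frozenset.intersection(*(a.features for a in adverts.values()))
    missing = sorted(required - common)
    if missing:
        # Report which chiplets lack each required feature.
        lacking = {f: sorted(n for n, a in adverts.items() if f not in a.features)
                   for f in missing}
        raise RuntimeError(f"required features not supported everywhere: {lacking}")
    version = min(a.max_version for a in adverts.values())
    return version, common


if __name__ == "__main__":
    adverts = {
        "npu_die": Advert(3, frozenset({"streaming", "long_burst", "d2d_rdma"})),
        "io_die": Advert(2, frozenset({"streaming", "long_burst"})),
    }
    print(negotiate(adverts, required={"streaming"}))
```

In this example the two dies agree on revision 2 with `streaming` and `long_burst`. Requiring `d2d_rdma` would fail and name `io_die` as the chiplet that lacks it, which is exactly the situation a partial-implementation marketplace has to surface early.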

Solutions must evolve to support heterogeneous interoperability not only at the PHY and protocol layers, but at the system abstraction layer. Arteris has decades of experience architecting cache-coherent, multi-cluster, and multi-die interconnect solutions. That expertise spans coherent and non-coherent fabrics and positions Arteris interconnect IP not as a connectivity block, but as the architectural control plane of a heterogeneous chiplet system.

The fastest path to first silicon success

When asked about the fastest path to “first flight” in heterogeneous chiplet systems, consensus pointed to holistic pre-silicon simulation and ecosystem collaboration. From an Arteris viewpoint, that means:

  • Architecting communication models intentionally.
  • Using configurable, production-proven interconnect IP.
  • Generating complete system specifications early.
  • Validating performance and thermal constraints pre-silicon.
  • Simulating (or even emulating, where feasible) the RTL of the entire multi-die device prior to tape-out.
  • Enabling debug visibility across die boundaries.

Chiplets fundamentally change system partitioning. But rather than eliminating the need for rigorous system architecture discipline, they amplify it.

Conclusion: Integration defines competitive advantage

The technical hurdles, including signal integrity, power integrity, interoperability, and verification scale, are significant, but the direction is clear.

AI and advanced computing demand heterogeneous integration. Heterogeneous integration demands scalable, energy-efficient data movement. And scalable data movement requires deliberate, system-level architecture.

The best way to make chiplets work is not merely to connect them but to architect how they work together. In that domain, interconnect intelligence is no longer just infrastructure but a strategic differentiator.

To learn more about Arteris, visit arteris.com.


