AI Accelerators Moving Out From Data Centers

Chiplets will be a key enabler for customizing designs at every level, from edge devices to the cloud. AI is a key driver, but it’s not the only one.

Experts At The Table: The explosion in AI data is driving chipmakers to look beyond a single planar SoC. Semiconductor Engineering sat down to discuss the need for more computing and the expanding role of chiplets with Marc Meunier, director of ecosystem development at Arm; Jason Lawley, director of product marketing for AI IP at Cadence; Paul Karazuba, vice president of marketing at Expedera; Alexander Petr, senior director at Keysight; Steve Roddy, chief marketing officer at Quadric; Russ Klein, program director for Siemens EDA’s High-Level Synthesis Division; and Frank Schirrmeister, executive director for strategic programs and system solutions at Synopsys. What follows are excerpts of that discussion.

L-R: Arm’s Meunier, Cadence’s Lawley, Expedera’s Karazuba, Keysight’s Petr, Quadric’s Roddy, Siemens’ Klein, Synopsys’ Schirrmeister.

SE: AI accelerator development is happening at a fast clip today. What are you seeing in terms of overall trends for AI accelerator applications?

Meunier: From an application trend perspective in AI, we're seeing a lot of growth in the cloud space with larger models, the introduction of multi-modal models, and the increasing context length of LLMs. That is driving the need for more compute and more powerful systems, but it is running into headwinds around limited power budgets. So we're seeing that play out in the large cloud space. We're also seeing the proliferation of AI into the enterprise and the edge. The trend we're seeing there is the need for diverse structures and diverse systems to address the various needs of different AI models in diverse applications.

Lawley: When we look at the application space, we see companies going to the extremes. Some companies are going down to the very-low-power, small, in-ear type devices where they want to add AI. There are also companies going to the other extreme with ADAS, where they've got huge video streams and they're trying to process all of them. The one thing that's common among all of them is that power is always important. So no matter what the application is, they all have a certain power budget. We call that physical AI — this opportunity for AI to go out where it's not connected to power. A lot of our focus is trying to address those spaces, both with our IP as well as with our chiplet strategy.

Karazuba: What we're seeing in terms of AI accelerator trends is a movement to larger and larger models. It's a movement to the unknown. Maybe there were prescribed models in the past, but now, with how rapidly LLMs, VLMs, and other such models are evolving, the expectation is that AI is going to be able to handle all of this. That means giant, highly functional software stacks and giant, highly functional hardware cores. At the edge, it's a little bit different. There's certainly the desire to go to LLMs, and in many cases there's a desire to run LLMs completely at the edge. But with that come the power problems that have been mentioned earlier, and we're seeing a lot of customers move mostly to multi-modal models. We're seeing a lot of customers explore small language models as an alternative to LLMs. We're also seeing our customers explore alternatives to transformers, things like Mamba, where you're going to get lower memory usage and faster inference as a way to get around the power and memory issues that you face with LLMs and VLMs.
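
To make the memory argument behind those transformer alternatives concrete, here is a rough, illustrative Python sketch comparing the growing key/value cache of a transformer decoder with the fixed recurrent state of a Mamba-style state-space model. The function names, layer counts, head dimensions, and state sizes are assumptions chosen for illustration, not figures cited by the panel.

    # Rough memory comparison: transformer KV cache vs. a Mamba-style fixed state.
    # All model dimensions below are illustrative assumptions.

    def transformer_kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
        # Each layer caches K and V for every token in the context.
        return n_layers * 2 * n_kv_heads * head_dim * seq_len * bytes_per_elem

    def ssm_state_bytes(n_layers, d_model, state_dim, bytes_per_elem=2):
        # A state-space layer keeps a fixed-size recurrent state, independent of sequence length.
        return n_layers * d_model * state_dim * bytes_per_elem

    for seq_len in (1_024, 8_192, 65_536):
        kv = transformer_kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=seq_len)
        ssm = ssm_state_bytes(n_layers=32, d_model=4_096, state_dim=16)
        print(f"seq_len={seq_len:>6}: KV cache ~{kv / 2**20:8.1f} MiB, SSM state ~{ssm / 2**20:.1f} MiB")

In this toy configuration the transformer's cache grows linearly with context length, reaching roughly 8 GiB at a 64K-token context, while the state-space model's working state stays fixed at a few MiB. That constant-memory behavior is the property edge teams are chasing when they look past transformers.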

Petr: For power management we see a lot of photonics coming in, just to have new technologies for stacking and communication so that power can be better utilized. We see efforts where reducing power by even 1% gives a significant boost to these systems. On the edge, a big trend is the deployment of neural networks — not just LLMs — in autonomous systems. We see this in cars, in drones, and in humanoids. And we see it in 6G, which is being designed around neural networks deployed in receivers and transmitters to make smart decisions on the fly. So we see the AI agentic space driving a lot of the hardware requirements at the edge.

Roddy: There are a couple of key things we see. One is a growing recognition by most silicon chip designers that they have to deal with the uncertainty in models. Models are changing — LLMs, SLMs, VLMs — it doesn't matter. Pretty much everyone has woken up to the idea that, 'It's 2025, great. I'm going to design new silicon. I have no idea what my customers are going to run on this in three years when I have it produced.' So people are putting a premium on flexibility to be able to run whatever comes. The other thing we're seeing is a premium on scalability, and this is where chiplets come in. There are a heck of a lot of systems and applications where companies want to build base-model silicon with the bare minimum amount of AI they need, and some ability to scale up, whether it be a second chip or a chiplet. We clearly see that in automotive, where you've got the $100,000 car, the $50,000 car, and the entry-level car, and people want to invest in a single platform and have some scalability. But you see it also in things like AI PCs and security camera-type applications. So scalability and flexibility are really the two things we're seeing repeated in multiple segments.

Klein: We have found a lot of resonance within our customer base around inferencing. People are trying to take inferencing and put it in embedded systems, where they've got constrained compute or power budgets. Being able to offload that from a processor into a bespoke accelerator significantly improves performance and reduces power as well, so it addresses both of those constraints. The increase in the complexity of models is driving a lot of our customers to look at this level of customization, where in the past they could use IP or an array of processing elements. Now, that higher degree of customization allows them to get more performance and more efficiency out of it.
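
As one hedged illustration of the kind of customization being described, the Python sketch below shows an int8 matrix multiply with int32 accumulation, the sort of narrowed datapath a bespoke inference accelerator can commit to in hardware. This is illustrative only, not Siemens tooling or a customer design; the function names, sizes, and quantization scheme are assumptions.

    # Illustrative only: an int8 matmul with int32 accumulation, the kind of
    # narrowed datapath a bespoke inference accelerator might implement in hardware.
    import numpy as np

    def quantize_int8(x):
        # Symmetric per-tensor quantization: map the max magnitude to 127.
        scale = float(np.max(np.abs(x))) / 127.0 or 1.0
        q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
        return q, scale

    def int8_matmul(a_q, a_scale, w_q, w_scale):
        # Accumulate in int32, as a hardware MAC array would, then rescale to float.
        acc = a_q.astype(np.int32) @ w_q.astype(np.int32)
        return acc.astype(np.float32) * (a_scale * w_scale)

    rng = np.random.default_rng(0)
    a = rng.standard_normal((4, 256)).astype(np.float32)
    w = rng.standard_normal((256, 64)).astype(np.float32)
    a_q, a_s = quantize_int8(a)
    w_q, w_s = quantize_int8(w)
    print("max abs error vs. float32:", float(np.max(np.abs(a @ w - int8_matmul(a_q, a_s, w_q, w_s)))))

Fixing the operand and accumulator widths in hardware is one of the customization decisions that trades generality for the performance-per-watt gains described above.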

Schirrmeister: There are two categories of trends we see. The first is the application trend. Our industry is a fun one with laws, so in the application domain, if you listen to people like Jensen Huang and so forth, they're talking about three laws — the scaling laws that drive everything from the top. First, pre-training scaling, the formal learning. Second, post-training scaling, which is the mentoring and coaching. Third, test-time scaling, or reasoning, which is the long thinking in the model. That's what is driving things on the application side. From the data center through to the edges, through the networks, you have all kinds of requirements basically trickling down from there. The second category is at the bottom, if you will, at the implementation technique level, and that's where you're looking at compute, memory, and interconnect trends and laws. If you just look at the interconnect, the scale-up/scale-out type of things, you look into data centers where memory gets in the way of moving data to the compute and the processing, and the laws that govern that. All of that drives this never-before-seen need for customization of the AI accelerators, and then from there, the need for verification throughout the flow, in all kinds of scopes, that is workload-driven. So you want to take your AI workload at the end and see how your IP performs in the subsystem context, the chiplet context, and then the multi-die integration context within the system. It's both constrained at the bottom and growing like crazy at the top from the application perspective, leading to this need for cleverness in architectures.
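
For readers less familiar with the first of those laws, one commonly cited parametric form of pre-training scaling (the Chinchilla fit from Hoffmann et al., offered here as general background rather than a formula given by the panel) relates model loss to parameter count N and training tokens D:

    \mathcal{L}(N, D) \approx E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

Here E, A, B, \alpha, and \beta are empirically fitted constants, and the exponents are well below 1, so each incremental gain in model quality demands a disproportionately larger increase in parameters, data, and therefore compute. That is the pressure that trickles down from the application side to the accelerator, memory, and interconnect decisions described above.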

SE: Will chiplets play a role in AI accelerators?

Meunier: The Chiplet Summit this year was an eye-opener for the evolution of that technology and the interest in the market for chiplets, and it does go hand in hand with AI. AI becomes a big accelerator for chiplets. There's still a lot to solve in that space in terms of how to package, and how we get to a point where there's interoperability between chiplets. But what it offers to AI is the ability to tightly couple accelerators with compute, and to do things that are otherwise limited in terms of efficiency or power. When you have an accelerator tightly coupled to a compute core, one of the areas that comes to mind is the ability to expand the memory footprint beyond what you would normally have in an accelerator with HBM. For example, you can extend the memory footprint of the AI accelerator to leverage what is attached to the CPU — either DDR memory or CXL. So these optimizations are not only about latency and speed. They include the ability to leverage what is found in the normal compute space, in addition to the accelerator, with minimal impact to latency and speed.
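
A rough way to picture that memory-footprint expansion is the toy Python sketch below, which spills model state that no longer fits in the accelerator's local HBM into CPU-attached DDR and then CXL-attached memory. The tier capacities, model sizes, and the simplistic fastest-first placement policy are assumptions for illustration, not an Arm design.

    # Illustrative capacity arithmetic only: spilling accelerator state from local
    # HBM into CPU-attached DDR, then CXL-attached memory, fastest tier first.
    TIERS = [("HBM", 96), ("DDR", 512), ("CXL", 1024)]   # capacities in GiB (assumed)

    def place(demands, tiers=TIERS):
        """Greedily place named memory demands (in GiB) into tiers, fastest first."""
        free = {name: cap for name, cap in tiers}
        placement = {}
        for item, need in demands.items():
            placement[item] = []
            for name, _ in tiers:
                take = min(need, free[name])
                if take > 0:
                    placement[item].append((name, take))
                    free[name] -= take
                    need -= take
                if need == 0:
                    break
            if need > 0:
                raise MemoryError(f"not enough total capacity for {item}")
        return placement

    # A model whose weights alone exceed the accelerator's local HBM.
    print(place({"activations": 20, "kv_cache": 200, "weights": 350}))

The point of the exercise matches the comment above: a tightly coupled compute die lets the accelerator treat DDR and CXL as overflow capacity at a bounded, rather than prohibitive, cost in latency and bandwidth.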

Schirrmeister: The chiplet is interesting, necessary, and unavoidable. And for us, since we are all prepared for it, it's a welcome outcome of the complexity crisis. But it does have all these different effects, so we are partnering with Arm on the ecosystem side to bring together the protocols, the coherent hub interfaces, the CHI implementations. Which versions, which features, do you support for these things? With things like the Server Base System Architecture (SBSA), you need to make sure you validate these results. I'm focused on verification, and on the need for very specific verification techniques like interconnect verification, where you need to run lots of soak data, as Arm calls it, through essentially the compute unit. That compute unit is connected to something coherent. And chip-to-chip, it's connected to the AI accelerator, driving everything into that. But the AI accelerator potentially is not using coherency internally, so then you have to share the memory with the compute unit. From a verification perspective, you've just added another quadrillion cycles to verify. The chiplet area adds a new level of verification challenges to something that already was challenging and daunting. There are latency variations, more thermal effects, multiple effects coming from the package, and so forth.

Lawley: We talk a lot about the four C's of chiplets. First, cost efficiency. That's the ability to select the right process node for what the application requires. Second, customization. That's being able to select for AI, especially the right size of AI that you want to put on a particular chiplet. Third, configurability. This allows you to pull in different chiplets. Say you have your I/O and your compute, and now you need a different size of AI. Well, for whatever application you're targeting, you can pull in that AI. You want that ability next year, or three years from now, when a new architecture may be required, so you can add in a new AI. Chiplets let you do that without having to disrupt your entire ecosystem of what you're putting together. We're looking at how to enable these things, not just from the AI point of view, but from the whole ecosystem point of view. We are working with partners to enable chiplets. It's a system play more than anything else. So that was three C's. I'll let you guess what the fourth C is, and it might have something to do with my company's name.

Roddy: Chiplets started in enormous, super-expensive, 800-watt conglomerations going into the data center, where you've got plenty of cooling, etc. But that is slowly going to work its way down. My phone still isn't built with a $50 apps processor that uses four or five chiplets in a lower-cost package, but it's coming. Today, it's data center, it's automotive, and maybe a few other esoteric markets. But we're rapidly approaching that, and we're going back to an interesting pivot in how we think of design starts in the world of SoCs. Who's going to plunk down $200 million and build an expensive monolithic SoC at 3nm? Design starts are going to become as numerous as they used to be. I worked at LSI Logic 30-something years ago, and we would do 1,500 ASICs a year in one company. In that era, design starts were numerous because they cost $50,000, and every systems company could do its own design start. We're going to get back to that. We'll get back to a level where most of the design is taking a half dozen or a dozen chiplets and putting them together in an inexpensive package. Maybe one piece of logic you do yourself, maybe your analog, and maybe it's in an older node that's less expensive, while the very advanced node pieces come in as, say, modular compute. Arm will have chiplets with 1, 2, 4, 8, and 16 cores, and I'll just pick whichever one I need. I don't need to go redesign that part. I'll just utilize that. That's going to be a sea change, and AI is a key element of that, because right now there is the desire for modularity. The same thing will come to others. A decade from now, this discussion will be very interesting. There will be a whole host of other issues having to do with all those design starts and how to do all that.

Petr: Chiplets are not necessarily new. Within domains, chiplets have been used for a long, long time. The main driver was basically integration challenges, where technologies did not align. We've seen that a lot. For instance, III-V was not able to go into CMOS. It was the same at the PCB level, where the whole industry was driving toward IP integration. Chiplets are the natural step in between, where you basically start stacking technologies on top of each other. One motivation is cost. Why do you need to go to the smallest node? It's a waste of money if a different node can do the job, and then you just package it up. So what is it you're trying to do? How much flexibility are you trying to stitch together? You have different integration paths, depending on cost and power efficiency nowadays, and you choose the right level of integration. That's what is driving chiplets today.

Related Reading
Mass Customization For AI Inference
The number of approaches to process AI inferencing is widening to deal with unique applications and larger and more complex models.
Chiplets Add New Power Issues
Well-understood challenges become much more complicated when SoCs are disaggregated.


