How to put the pieces together in a complex design with AI is an unsolved problem.
Experts At The Table: Semiconductor Engineering sat down to discuss the various ways that AI accelerators are being applied today with Marc Meunier, director of ecosystem development at Arm; Jason Lawley, director of product marketing for AI IP at Cadence; Paul Karazuba, vice president of marketing at Expedera; Alexander Petr, senior director at Keysight; Steve Roddy, chief marketing officer at Quadric; Russ Klein, program director for Siemens EDA’s High-Level Synthesis Division; and Frank Schirrmeister, executive director for strategic programs and system solutions at Synopsys. What follows are excerpts of that discussion.
L-R: Arm’s Meunier, Cadence’s Lawley, Expedera’s Karazuba, Keysight’s Petr, Quadric’s Roddy, Siemens’ Klein, Synopsys’ Schirrmeister.
SE: What is the current thinking on how to connect AI accelerators? What does that look like today?
Petr: It’s evolving. Tremendous effort is going into photonics, and all the big foundries are investing in photonic technologies. The question is, where do you want to use it? Do you want to use it to connect all the different elements at the chiplet level? Do you want to connect the NPUs to each other just to get more parallelization or scale? Do you want to connect the blades? Do you want to connect the servers? How far are you going to go? If you go to the networking trade shows, that’s one of the big topics. Photonics is an option to reduce the power. How do you combine those technologies? How do you manage the data traffic between all the different compute elements? There are questions around memory. We heard earlier that now you have different units competing for the same memory. Those are the challenges we’re seeing, and they also translate into chiplets. How can this be integrated further? You also could argue that some companies already are looking at IC integration to be even more efficient at this level.
Lawley: This is a tough problem that has yet to be figured out. When you asked how they are going to get connected, I was going to say, ‘carefully.’ You can connect accelerators with standards, either UCIe or AXI or something like that. There also are proprietary approaches, where maybe you have one connection via a standard interface, and then proprietary links between your NPUs to reduce the back and forth with system memory and DDR memory. The big challenge of our time is not necessarily going to be the compute. It’s going to be the connectivity, the data movement, and the storage.
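[Ed. note: A rough illustration of why data movement, not compute, dominates this trade-off. The sketch below compares tensor hand-offs between NPUs routed through shared DDR versus over a direct die-to-die link. All figures are hypothetical placeholders chosen for illustration, not measured values or numbers from the discussion.]

```python
# Back-of-the-envelope sketch (all numbers assumed): bytes moved and energy
# spent when NPUs exchange activation tensors through shared DDR versus
# over a direct NPU-to-NPU link, per the hybrid topology described above.

TENSOR_BYTES = 8 * 1024 * 1024      # 8 MiB activation tensor per hand-off (assumed)
EXCHANGES = 1000                    # tensor hand-offs per inference batch (assumed)
PJ_PER_BYTE_DDR = 150.0             # illustrative energy cost of a DDR round trip
PJ_PER_BYTE_D2D = 10.0              # illustrative energy cost of a die-to-die link

def joules(bytes_moved: float, pj_per_byte: float) -> float:
    """Convert a byte count and a per-byte cost in picojoules to joules."""
    return bytes_moved * pj_per_byte * 1e-12

# Through DDR, every hand-off is a write plus a read: 2x the tensor size.
ddr_bytes = 2 * TENSOR_BYTES * EXCHANGES
# Over a direct link, the tensor crosses the interface once.
d2d_bytes = TENSOR_BYTES * EXCHANGES

print(f"via DDR : {joules(ddr_bytes, PJ_PER_BYTE_DDR):.3f} J")   # ~2.5 J
print(f"via D2D : {joules(d2d_bytes, PJ_PER_BYTE_D2D):.3f} J")   # ~0.08 J
```

With these placeholder costs the direct link moves half the bytes at a fraction of the energy, which is the motivation for the proprietary NPU-to-NPU paths Lawley describes.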
SE: What does it take to move everything forward? We know how to do it technically. Isn’t it really just deciding amongst ourselves how to do it to enable the commercial chiplet market?
Meunier: At Arm we’re involved in a lot of the open standards, and we’re hoping to push them forward so there are proper specs and verification methodologies for these interfaces. At the same time, we’re working with partners to build them into products so we can get real-life experience and evolve this toward, hopefully, a coalescing of standards and interfaces.
Schirrmeister: That also depends on the cohorts in the industry. If you have an initiative around automotive, or around data centers, there is a good reason why you can’t just say, ‘I’m using this because…and it depends on the level where…’ You can have a discussion about UCIe versus Bunch of Wires and custom implementations, but the thing I always find interesting is that the stuff behind it matters just as much. Which version of CHI do you use? Can we all agree on that? I don’t know. Things like memory tagging extensions, which are heavy to implement, are very valuable. But they have a downstream effect on how you implement and how you deal with things like that. All of that depends on which domain you are in and what you connect to. I would be very surprised, if we repeat this roundtable three years from now, if all of you have agreed on and are implementing the same standards, because you have these domain-specific requirements that drive this.
Roddy: It’s very system- and domain-specific. A lot of the talk about accelerators is data-center-centric. But even in data centers, you’ve got something that fits in a building, or you’ve got something like the Stargate project they’re talking about building in Texas that’s going to be half the size of the island of Manhattan. The accelerator you’re trying to get to might be two miles away, as opposed to a car, which has accelerators. We’re speaking with auto OEMs who are talking about putting three petaOPS of compute in a car for real Level 5 autonomy. That used to be a supercomputer 5 or 10 years ago. They would light up a petaOP supercomputer and say, ‘Oh, we’re at the top of the Top 500 supercomputer list here over at Lawrence Livermore Lab.’ By 2035 you may be driving that thing. So it’s all about scale, system, and specialization. That means there are lots of different interconnect technologies that are going to come into play.
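[Ed. note: To put those numbers in perspective, a quick scale check. The 500 W power envelope below is a hypothetical assumption for illustration, not a figure from the discussion.]

```python
# Rough scale check (assumed power budget): what 3 petaOPS in a vehicle
# implies for silicon efficiency.

TARGET_OPS = 3e15          # 3 petaOPS, the in-car figure quoted above
POWER_BUDGET_W = 500.0     # hypothetical power envelope for in-car AI compute

tops_per_watt = (TARGET_OPS / 1e12) / POWER_BUDGET_W
print(f"Required efficiency: {tops_per_watt:.0f} TOPS/W")  # -> 6 TOPS/W
```

A petaOP-class machine of the late 2000s drew megawatts; hitting the same throughput inside a car’s power budget demands orders-of-magnitude better efficiency per watt, which is what drives the specialization Roddy mentions.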
Petr: What about humanoids? The amount of data they will have to process to interact with humans will be even greater than that. You see research going into that. Also, our smartphones will be supercomputers in the future, and we will have accelerators at every sensor, in IoT, and in other systems. You need to break the problem down. Either you route it all back to an even bigger system that acts as the brain and does the compute, or you try to democratize the problem and have individual systems make their own decisions.
SE: So how does this look two years from now?
Lawley: Niels Bohr once said about quantum mechanics, ‘If you think you understand quantum mechanics, then you don’t understand quantum mechanics.’ We might be able to substitute AI here.
Karazuba: This industry has been talking about how we’re going to interconnect everything with chiplets for half a decade now. UCIe 1.0 has existed for 2.5 years, but we’re still questioning how this is all going to plug together, not just in chiplets, but in general. As an IP maker, my job is to make sure I can connect into the host processor, or the host chip, in the way that’s best for them. Everyone has to figure that out for themselves. I’m not sure there’s a simple answer to this question, and I have no idea what the timeline would be for when we’ll know. You’re also talking about AI. Customers need to decide whether their product is going to be in the market for half a decade or a decade. So how in the world are they going to make decisions about which models they’ll need to support in five years? Because this is all going to be field-upgradeable. The interconnect decision is certainly one they have to make now, but it’s probably not among their top five toughest decisions right now. There’s lots of uncertainty. AI is based upon evolution in many ways, and what we’re going to see is an evolution. People are going to make bets, they’re going to make guesses, and the ones that succeed are going to move forward. The ones that don’t are going to go back and try something different.
Petr: Paul hinted at one problem. We’re developing standards, but why are we building the tooling and the hardware at the same time? In the past, it was a bit more serialized. Now it’s all parallelized, and I still see committees being formed to work on standards while we’re already running ahead and building everything around them. We threw security out the window when OpenAI released the first LLM. Customers believe that if they don’t keep up with the current pace of development, it could be an end-game scenario. So again, in the past we serialized this, and we had great verification and standards for it. Now we’re basically saying, ‘Quick, quick, it needs to get out,’ without having fully thought it through. The whole hallucination issue gets partly swept under the carpet in a lot of areas. The fact that neural networks can over- or under-fit is not really mentioned. What happens in those scenarios, especially if you put this in applications where they interact with humans and have real implications for human life?
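[Ed. note: Petr’s over-/under-fitting point is easy to demonstrate. The sketch below is illustrative only; the sine-wave data, noise level, and polynomial degrees are arbitrary choices, not taken from the discussion.]

```python
# Minimal illustration of over- and under-fitting: fit noisy samples of a
# sine wave with polynomials that are too simple, about right, and too complex.
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0.0, 1.0, 20))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.2, 20)  # noisy samples
x_test = np.linspace(0.0, 1.0, 200)
y_test = np.sin(2 * np.pi * x_test)                               # true function

for degree in (1, 4, 15):
    p = Polynomial.fit(x_train, y_train, degree)  # least-squares polynomial fit
    train_mse = np.mean((p(x_train) - y_train) ** 2)
    test_mse = np.mean((p(x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Degree 1 underfits (high error on both sets), degree 4 generalizes, and degree 15 overfits: near-zero training error but large error on unseen points. That is exactly the failure mode that matters when such models interact with humans.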
Read parts one and two of the discussion:
AI Accelerators Moving Out From Data Centers
Chiplets will be a key enabler for customizing designs at every level, from edge devices to the cloud. AI is a key driver, but it’s not the only one.
Future-proofing AI Models
The rate of change in AI algorithms complicates the decision-making process about what to put in software, and how flexible the hardware needs to be.