Future-proofing AI Models

The rapid rate of change in AI algorithms complicates decisions about what to implement in software and how flexible the hardware needs to be.


Experts At The Table: Making sure AI accelerators can be updated for future requirements is becoming essential due to the rapid introduction of new models. Semiconductor Engineering sat down to discuss the challenges of future-proofing these designs with Marc Meunier, director of ecosystem development at Arm; Jason Lawley, director of product marketing for AI IP at Cadence; Paul Karazuba, vice president of marketing at Expedera; Alexander Petr, senior director at Keysight; Steve Roddy, chief marketing officer at Quadric; Russ Klein, program director for Siemens EDA’s High-Level Synthesis Division; and Frank Schirrmeister, executive director for strategic programs and system solutions at Synopsys. What follows are excerpts of that discussion. Part one of this discussion is here.

L-R: Arm’s Meunier, Cadence’s Lawley, Expedera’s Karazuba, Keysight’s Petr, Quadric’s Roddy, Siemens’ Klein, Synopsys’ Schirrmeister.

SE: What portion of your customers want a menu of well-optimized models versus using their own custom models or the model of the month? Does that vary by application?

Roddy: We see it being very market-specific. For example, there are folks building microcontrollers, lower-end things, serving industrial markets, etc., where the end users seem to be happy with, ‘Just give me a menu of 10 or 12 different detectors, classifiers. I want to quickly grab something from the silicon or solution providers’ menu, be able to detect something in a manufacturing line, maybe do a little local retraining for a particular classification, and then I’m done. Let’s go.’ There also are the more vertically integrated folks, such as in automotive, where they only want proprietary models. Those become a distinguishing feature of the vehicle. ‘My car’s safer than the other guy’s car, and therefore I’m going to charge more for it.’ Or, ‘It’s my self-driving subscription,’ or whatever. Then there’s a whole different segment of people who just want some widgets they can use in an SDK. And there is a growing number of people, particularly at the higher performance end, who say, ‘I absolutely have to be able to do my own, roll my own, grab the latest LLM.’ So who builds the models and who trains them?

Petr: A lot of that really comes back to who owns the models and who drives them. In the LLM space, only the big players have the budgets and the breadth and depth to develop those models, and a lot of people just try to do fine-tuning or look at RAG systems to utilize those. Trust is a big issue with neural networks, so whoever builds it needs to build trust. Hallucination is a continuous issue. Over-fitting/under-fitting in neural networks is an issue. One of the drivers I see is on the neural network side. If you want to deploy it into a system, whether it’s autonomous or connected to the cloud, you constantly need to be able to update that neural network because it’s a living, breathing thing. As soon as you have some new information, you need to update the models and redeploy them. Tesla, for example, has deployed a neural network for self-driving, and every time it encounters a new scenario, they have to retrain and redeploy the model, so whatever hardware you provide needs to be able to do the same. The same is true for all the LLMs. Every day there’s a new LLM. You can see them on your mobile phone. There are NPUs that continuously need to get updated if you don’t want to have a connection to the cloud. And there are cloud connections, too. Some military/defense customers are now talking about neural networks and LLMs, but they are very concerned about connectivity, so edge deployment is certainly top of mind, and continuous evolution of those neural networks needs to be built into the system. So coming back to Steve’s points, whatever we build needs to be flexible enough to outlive at least the next two generations.
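To make the update loop Petr describes concrete, here is a minimal sketch of an edge device polling a model registry and hot-swapping weights. The registry URL, manifest format, and file paths are hypothetical placeholders; a production flow would add signing, rollback, and A/B slots.

```python
# Minimal over-the-air model update sketch for an edge device.
# The endpoint, manifest fields, and paths below are illustrative only.
import json, hashlib, urllib.request
from pathlib import Path

REGISTRY = "https://models.example.com/detector/manifest.json"  # hypothetical endpoint
LOCAL = Path("/var/lib/npu/detector.bin")                       # deployed weights

def current_version() -> str:
    meta = LOCAL.with_suffix(".json")
    return json.loads(meta.read_text())["version"] if meta.exists() else "none"

def check_and_update() -> bool:
    with urllib.request.urlopen(REGISTRY, timeout=10) as r:
        manifest = json.load(r)              # e.g. {"version": "...", "url": "...", "sha256": "..."}
    if manifest["version"] == current_version():
        return False                         # already up to date
    with urllib.request.urlopen(manifest["url"], timeout=60) as r:
        blob = r.read()
    if hashlib.sha256(blob).hexdigest() != manifest["sha256"]:
        raise ValueError("checksum mismatch; keeping the old model")
    tmp = LOCAL.with_suffix(".tmp")
    tmp.write_bytes(blob)
    tmp.replace(LOCAL)                       # atomic swap, so inference never sees a half-written file
    LOCAL.with_suffix(".json").write_text(json.dumps({"version": manifest["version"]}))
    return True                              # caller reloads the NPU with the new weights
```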

Karazuba: The answer to this question has changed over the last 24 months. Two years ago, our first couple of production customers had a very limited set of models. They didn’t see a whole lot of expansion beyond those models, which were largely based on public models. Since then, customers started bringing in their own public models, as well as their own completely customized models, which include things like custom operators that they have developed in-house. Those choices are really independent of the end market. We have a customer doing an in-ear design that developed their own custom model in the GOPS performance range, which doesn’t seem like it would be a thing. It seems like you’d have a custom accelerator just for that. At the same time, people are doing 8100 TOPS types of engines that are using a mix of custom, proprietary, and standard models. I don’t see much in the way of differentiation by end market with that. There is a certain percentage of customers who simply want me to show them a menu. ‘I want general-purpose edge type of stuff.’ Maybe you see that in long-tail markets like industrial. But when you talk consumer devices or automotive, it’s generally, ‘I have quite a list. It’s going to expand, and I’m not even going to be able to tell you what’s in that list. A lot of the time, you’re just going to have to simply support what I have. So your software had better be flexible, and your hardware, as well.’

Lawley: We are seeing this transition with customers wanting to do their own custom, proprietary models, to the point where they don’t even want to share them with us so we can help them. A lot of times we’ll get customers that send us representative models that are public. We see the public models filling in for them to do their evaluations in the space. And this is where the software becomes incredibly important. The rate of change of the software and the operators, and the different frameworks that people are using to do their development, is just huge. And then being able to take all of that and map it onto our IP is a big challenge. It’s a big investment for all of our companies to be able to do that, to keep up with the changing times.

Schirrmeister: We see a lot of specialization, but that often is not related to the application domain itself. It’s more about the characteristics in the application domain. Customization requires effort. Does that give you the return at the end? Or can you do something in between? When I hear ‘model,’ there’s always something else triggering it. With this notion of customizing IP, can I represent that in different ways and try it out without going to silicon? That’s the other side of those models, and just like the LLMs, it requires some type of specialization. There’s specialization of what’s going on at the edge versus in the data center, where you have to be much more specific with respect to accuracy and the number of bits used for compute. That translates into really fun verification, and it adds another couple billion cycles to verify all that. It also brings together the specialization by end market. This is a classic generalization/specialization problem, and the big question is whether you get sufficient ROI from it.
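The accuracy-versus-bit-width tradeoff Schirrmeister mentions can be sketched in a few lines. Symmetric per-tensor int8 quantization, shown below, is just one common scheme, not a claim about any particular NPU; shrinking the bit width further increases the reconstruction error that verification has to account for.

```python
# Symmetric per-tensor int8 quantization: one scale factor, values clipped to [-127, 127].
import numpy as np

def quantize_int8(w: np.ndarray):
    max_abs = np.abs(w).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize(q, s)).max())  # grows as bit width shrinks
```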

Klein: Our customers are almost exclusively bringing highly customized models into the mix, so if somebody is interested in a dozen off-the-shelf models, they’re usually going to go for a fairly generic accelerator. And there’s a large market for that. Where we’re interacting with people is the customers who have a very specific workload and they want a very specific solution. As other folks have said, there isn’t just one market where this fits in well. It’s really folks who are either looking for the highest level of performance or the highest level of efficiency. And that’s really where we’re seeing a lot of resonance and increased interest recently. So it seems to be more and more customization that folks are looking for in their models.

SE: Where’s the dividing line today in terms of where customers start to contemplate adding modular AI horsepower?

Roddy: With chiplets, there’s uncertainty about the complexity of the models people want to run. This is driving people to figure out how to build something scalable. Clearly, in data centers and automotive, along with some other segments, people are trying to do 20 TOPS in the base and modularly add on 100. How do I make that happen? If you have multiple chips and spread the execution of a model over two or three different chips that are modularly added on, they’re all going to memory. You’ve got to synchronize those memory accesses, and that means the systems become quite complex. This uncertainty about what people will need to run in five years, when systems are in production, is driving a ton of questions around how to be flexible and how to be scalable. Those questions underlie a lot of the choices and a lot of the challenges that companies are facing.
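A toy version of the partitioning problem Roddy raises: the sketch below splits a model's layers across two PyTorch devices, and the explicit activation copy between stages is the memory traffic that has to be synchronized. The device names are placeholders; a real chiplet system would move data over a coherent interconnect rather than with explicit copies.

```python
# Layer-wise split of a model across two devices; the handoff between stage0 and
# stage1 is the synchronization point a multi-chip system must manage or hide.
import torch
import torch.nn as nn

two_gpus = torch.cuda.device_count() > 1
dev0 = torch.device("cuda:0" if two_gpus else "cpu")   # placeholder for chip/chiplet 0
dev1 = torch.device("cuda:1" if two_gpus else "cpu")   # placeholder for chip/chiplet 1

stage0 = nn.Sequential(nn.Linear(512, 1024), nn.ReLU()).to(dev0)
stage1 = nn.Sequential(nn.Linear(1024, 256), nn.ReLU(), nn.Linear(256, 10)).to(dev1)

def forward(x: torch.Tensor) -> torch.Tensor:
    h = stage0(x.to(dev0))
    h = h.to(dev1)          # explicit cross-chip copy; this is the traffic to overlap with compute
    return stage1(h)

print(forward(torch.randn(32, 512)).shape)  # torch.Size([32, 10])
```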

Karazuba: There are two inflection points when you talk about doing your own processor or a co-processor. Performance is one of them. AI as a differentiating feature of a product is another reason you might do it. Certainly, if someone has a system with a 1 to 2 TOPS NPU built into it, and if that NPU is capable of solving their AI needs, it really becomes a cost issue. Are you really going to invest in something else when you’re already paying for an application processor, or a CPU with an onboard NPU, or an MCU that has AI processing capabilities? Doing something different, or buying another chip, often isn’t going to pass the bill-of-materials test. But if you want to differentiate your product with AI, a co-processor, a chiplet type of situation, something like that may make financial sense — especially if you’re looking at running on a smartphone, for example. The NPUs that are in smartphones today largely were designed 3, 4, or 5 years ago, just based on the design cycle of an application processor. Therefore, those NPUs may not have the ability to run today’s LLMs. They may not have the ability to run some of today’s most advanced networks. Whereas investing in a co-processor or in a bolt-on chiplet that does have those capabilities, because it’s built with a much newer generation of processing and software, allows you to potentially differentiate your product with AI. So it’s not only a question of the number of TOPS. It’s also a question of the intent you have with your AI. If your AI is how you’re going to differentiate your product to your customers, that is a huge reason for the decision.

Petr: So the question is, ‘When do you want to differentiate?’ One of the most important reasons is performance. If you want to build the most performant system in a car, for example, you want to have the best self-driving system. You want to protect the people in the car. For a phone, you want to have the best user experience. It’s the same for networking, communication, and data centers. In all of these areas we’re also starting to deploy AI to manage the traffic we see. With communication in general, all the standards work going on in 5G and 6G is baking neural networks and AI into the protocols, and that’s because of performance needs. If you look at the data center discussions, what’s driving it is power, and the need to reduce the power required to run all of this. So yes, there’s a whole domain where you can choose. But in other areas, it’s mandatory. If you don’t have it, you don’t have a product.

Lawley: In terms of customization, a lot of companies have a big decision to make as to whether they’re going to use IP that’s off-the-shelf versus doing their own NPU, where they’re building their own IP. As they look down the road, they have to determine whether doing their own is financially viable, because then they have to maintain not just that architecture and that RTL, but the huge software cost that goes along with being able to compile for that NPU and stay up to date with everything that’s changing. So there’s a real tradeoff that people have to think about. How important is it for them to differentiate at the architecture level versus how important, or easy, is it for them to differentiate at the model and the software level? We’re seeing more and more companies saying they’re going to work at the software level and leave the NPU architecture to the experts.

Schirrmeister: In this day and age, the ISA becomes much less of an issue for differentiation than the systemic effects. If I’m in the data center, the systemic differentiation of how to scale up, how to scale out, which interconnect to use, and how to distribute my workloads suddenly becomes the key differentiator. And on the verification side, with great freedom to differentiate comes great responsibility to verify, which includes a tradeoff in cost invested and time. If I’m making a change here in my chain — from ISA, IP, chiplet, multi-die assembly, multi-die in board, board in rack, rack in whatever that thing’s called that Nvidia has with the doors on the outside — it creates a huge challenge for bringing all that together and verifying it at the right level. You have to be very careful. A change here had better be differentiated enough to justify the cost to verify and have all the downstream effects sorted out.

Klein: While our customers are typically building things that are highly customized, one of the things we keep pointing out is that it’s still hardware design, and they can leave levels of programmability and scalability in the hardware they’re building. They might have a set of features that is inherent to the problem they’re solving, and that can be baked into hardware, but we don’t know what future models are going to look like. So they can soft-code some of those other things and augment with software to keep that scalability for the future. That is something that can be built into these custom accelerators that folks are building with high-level synthesis.
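One way to read Klein's point in software terms is to fix the operators you are sure about and leave a programmable escape hatch for the ones you are not. The dispatch pattern below is purely illustrative, not Siemens' HLS flow; the operator names and fallback are invented for the example.

```python
# Illustrative pattern: operators the accelerator implements natively take the fast
# fixed-function path; anything else falls back to a slower, soft-coded path.
from typing import Callable, Dict
import numpy as np

NATIVE_OPS: Dict[str, Callable] = {            # decided at hardware design time
    "matmul_relu": lambda x, w: np.maximum(x @ w, 0.0),
    "add":         lambda a, b: a + b,
}

def run_op(name: str, *args, fallback: Callable = None):
    if name in NATIVE_OPS:
        return NATIVE_OPS[name](*args)         # baked-in hardware operator
    if fallback is not None:
        return fallback(*args)                 # programmable path for operators invented later
    raise NotImplementedError(f"operator {name!r} needs a software fallback")

x, w = np.random.randn(4, 8), np.random.randn(8, 8)
y = run_op("matmul_relu", x, w)
# A new activation (tanh-approximated GELU) arrives after tape-out: run it on the fallback path.
z = run_op("gelu", y, fallback=lambda t: 0.5 * t * (1 + np.tanh(0.7978845608 * (t + 0.044715 * t**3))))
```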

Meunier: I agree we’re not differentiating at the ISA level. In fact, a lot of our discussions with customers now are at the system level. And what’s interesting is the need for an analysis of the total cost of ownership of the system. When you start playing with the levers, adding more flexibility or constraining your design to a specific application, you look at your power envelope and what you’re going to consume over the life of the server. You end up with an equation that really is a question of total cost. Can you fit this into your cost budget? And do you see enough runway, given the rapid change in models, to support your business? These are interesting analyses, and we’re doing them with partners and customers.
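The kind of total-cost equation Meunier describes can be written down in a few lines; every number below is a made-up placeholder, not vendor data, and a real analysis would add cooling, rack space, and depreciation.

```python
# Back-of-the-envelope TCO: acquisition cost plus energy over the service life.
capex_per_accelerator = 12_000      # USD purchase price (hypothetical)
board_power_kw        = 0.7         # average draw under load (hypothetical)
utilization           = 0.6         # fraction of time busy
electricity_usd_kwh   = 0.12
service_life_years    = 5

hours = service_life_years * 365 * 24
energy_cost = board_power_kw * utilization * hours * electricity_usd_kwh
tco = capex_per_accelerator + energy_cost
print(f"energy: ${energy_cost:,.0f}, TCO over {service_life_years} years: ${tco:,.0f}")
```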

Read part one of the discussion:
AI Accelerators Moving Out From Data Centers
Chiplets will be a key enabler for customizing designs at every level, from edge devices to the cloud. AI is a key driver, but it’s not the only one.


