There are challenges and solutions for processing AI workloads on-device to achieve consistent performance, reduce costs, and enhance data privacy and security.
Experts At The Table: Semiconductor Engineering gathered a group of experts to discuss why some AI workloads are better suited for on-device processing to achieve consistent performance, avoid network connectivity issues, reduce cloud computing costs, and ensure privacy. The panel included Frank Ferro, group director in the Silicon Solutions Group at Cadence; Eduardo Montanez, vice president and head of PSOC Edge Microcontrollers & Edge AI Solutions, IoT, Wireless and Compute Business at Infineon; Alexander Petr, senior director at Keysight; Raj Uppala, senior director of marketing and partnerships for Silicon IP at Rambus; Niranjan Sitapure, central AI product manager at Siemens EDA; and Gordon Cooper, principal product manager at Synopsys. What follows are excerpts of that discussion. To read part one, click here. Part two is here.

L-R: Cadence’s Ferro, Infineon’s Montanez, Keysight’s Petr, Rambus’ Uppala, Siemens’ Sitapure, and Synopsys’ Cooper.
SE: Securing data requires understanding where it’s processed and stored. Is it better to store data at the edge?
Sitapure: There are two buckets. One is the data that is being live-processed, which needs to stay on the device. Think about some of these robots that are going to come to your home. They’re scanning your home. It’s very sensitive data. Tesla FSD is an example. It is always tracking the driver, so it’s looking at you, and they’re processing it on the side. That data ideally should not be sent to the cloud unless there’s some sharing requirement. The other bucket exists because these AI models, and all these applications, are in millions of devices that are not trackable, or unable to be monitored in a central location. You need to have some component of the data that can be sent back to the mothership, or the central data warehouse, for fine-tuning the models or making them better. In the Tesla FSD example, when it’s doing full autonomous driving, sometimes the user does something wrong and moves the steering wheel. The incident is logged, and that data can be sent back to the data center in an anonymized format to say, ‘Hey, we need to improve the model.’
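The split Sitapure describes, raw data staying on the device and only a reduced record going back, might look something like the following minimal sketch. The field names, the allow-list, and the `pseudonymize_event` helper are all hypothetical and do not describe any vendor's actual pipeline. Note also that a keyed hash of a device ID is pseudonymization rather than full anonymization, so a real system would need more than this.

```python
import hashlib
import hmac

# Hypothetical allow-list of fields considered safe to upload.
FIELDS_SAFE_TO_UPLOAD = {"event_type", "model_version", "speed_kph"}


def pseudonymize_event(event: dict, device_key: bytes) -> dict:
    """Return a reduced copy of `event` intended for upload."""
    # Keep only allow-listed fields; raw camera frames, GPS traces,
    # and the device ID itself never leave the device.
    upload = {k: v for k, v in event.items() if k in FIELDS_SAFE_TO_UPLOAD}
    # Replace the device ID with a keyed hash so events from the same
    # device can be grouped without exposing the identifier.
    digest = hmac.new(device_key, event["device_id"].encode(), hashlib.sha256)
    upload["device_token"] = digest.hexdigest()[:16]
    # Coarsen the timestamp to the hour to make re-identification harder.
    upload["hour_bucket"] = event["timestamp_s"] // 3600
    return upload


# Hypothetical driver-override event recorded on the device.
example_event = {
    "device_id": "VEHICLE-0001",
    "timestamp_s": 1_700_000_000,
    "event_type": "driver_override",
    "model_version": "1.2.3",
    "speed_kph": 62,
    "camera_frame": b"\x00\x01\x02",  # stays on the device
    "gps": (37.0, -122.0),            # stays on the device
}

print(pseudonymize_event(example_event, b"on-device-secret"))
```

Using an allow-list rather than a deny-list is a deliberate choice in this sketch: a new sensitive field added to the event later is excluded by default.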
Montanez: Memory is a very important topic. It’s become one of the bigger challenges when it comes to bringing more to the edge. I think we have a memory problem, both in security and the need for even more memory. What Infineon is investing in is co-packaging DRAM to give the edge solutions more capability to grow with these models, and the amount of data required for these models, while keeping them co-packaged and being able to give an extra level of security when it comes to the model and the data that’s being injected into the edge device. We need to continue to address memory and provide more security for it.
Ferro: From the memory standpoint, we constantly look at how we create secure memory regions, even on the device itself, and how we create secure boots and things like that, including taking up some of the memory bandwidth specifically for security. It’s always the cost tradeoff. Unfortunately, security usually doesn’t get invested in until you’ve been hit by [an attack]. Everyone wants a secure piece of the memory, but that investment is balanced against the cost. Still, I do see it starting to increase more, where we create more secure portions of the memory regions.
Uppala: There are different types of data. You have data in motion, data at rest, and data in use. The first two could be addressed with encryption. Especially for the data in motion, you utilize encryption, and for data at rest, you store it in an encrypted fashion and have different roots of trust and things like that. But recently, there was the TEE.fail attack on Trusted Execution Environments (TEEs), which pretty much bypassed the security for all the Intel, Nvidia, and AMD confidential compute infrastructures. The way you avoid that is to encrypt and decrypt only when it’s necessary, and that goes all the way from when it’s being transmitted, when it’s in use, and in the memory itself. That was one of the recent ways that they got around that requirement in a confidential compute scenario with TEE.fail. But again, beyond that, you will have to look at authentication, attestation, and role-based access control. It doesn’t stop at just one point. It must be the whole surface that you need to protect, and that could be from a physical perspective, or a side-channel attack, or even the supply chain and things like that. It needs to be a holistic thing, because it’s the weakest link that gets you access to the system.
Petr: Encryption now introduces a problem that also drives the move to the edge, which is, ‘What happens if you lose a packet in transit?’ Most of the packets get lost in the network, not in the motherboard or the ICs or SiPs. There, the packet management is fairly okay. But with networks, as soon as you’re trying to send something from one device to another, there can be all kinds of external influences where you lose packets. Encrypted packets have a different challenge than unencrypted packets. By introducing encryption to solve one problem, you’re introducing another, which also explains why, when you want to do inferencing, you need to get as close to the device as possible to not have the additional headache of networking latencies or packet loss.
Cooper: Reflecting on what Frank was saying, this is an interesting business challenge. Because if you’re going to protect this, you’re going to have hardware latencies that you’re introducing, and you’re going to have the security costs. Either you’re doing this because you had a catastrophe, and now you’re recovering from it, or you’re adding it ahead of time, and you’re marketing it and trying to make it part of your business case, or you leave it out and try to save costs and hope you don’t have the risk. So it’s an interesting business case to decide what layer of security to put on. That’s a challenge that every vendor is going to have to face, because it is a little painful to add. All those great performance metrics you have now get a little bit tweaked since you have to add security.
Petr: There’s a third driving force, which is regulation.
SE: What are the other important issues for chip architects and designers moving AI applications from the data center to the edge?
Sitapure: One of the points we discussed, but didn’t put a name to, is the industry evolution. It’s gone from being hardware-defined 20 years ago, when Intel would come up with a chip and the software had to be adjusted for the hardware, to the software-defined paradigm, where the AI solutions, or the different aspects of autonomous driving, robotics, IoT, or whatever, are defining the chip. NPUs and TPUs are a great example of that. If a robot has to do a workload on a battery, the chip has to be designed very specifically down to the granular level to do those matrix calculations only. The GPUs are great, but GPUs are like an all-purpose workhorse. They may not be very specific to a robotics use case or a Tesla. To give another Tesla example, they used to use Nvidia RTX GPUs, which are great all-purpose, but now I believe they develop their own in-house NPUs for specific things. That paradigm shift of software-defined ICs, driving the new design of NPUs, TPUs, and chips in general, is becoming very apparent.
Ferro: That’s a good point, because the hardware always seems to be chasing the software now. We’re in that situation where we’re trying to get that balance of processor and memory, and it feels like we’re behind the curve all the time on the hardware side.
Cooper: The development tools are chasing the hardware, too. It’s easy to slap a bunch of multiply/accumulates down. It’s a lot harder to have a programmable solution. If the algorithms were fixed, you could put in a hard-wired ASIC, but it has to be programmable, and it has to be efficient for power and area. So, there’s this balance of how to make it programmable and yet as power and area efficient as possible, and I have to make it programmable in a sense that the algorithms change every month with some new LLM, or VLAs (vision language action models), VLMs, etc. It’s an interesting challenge to get that mix just right between power and area, efficiency, and flexibility.
Ferro: And cost. We’re fighting those boxes all the time.
Montanez: We need to be challenging ourselves to not just question [performance]. I get this from customers all the time. ‘What’s the amount of GOPS/TOPS that your product supports?’ We need to go one layer deeper here. The architecture has a huge contribution to this. The memory latencies, the bus architecture, all need to be optimized to get the most performance out of these highly capable GPUs, NPUs, etc. There’s a big aspect to the architecture and the type of performance you’re going to get at the edge, so we need to be challenging not only our customers, but ourselves, to really optimize these systems to get the most out of them.
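Montanez’s point that a GOPS/TOPS figure alone says little can be illustrated with the roofline model, a standard back-of-the-envelope bound that the panel did not name. The sketch below assumes a hypothetical device with 4 TOPS of peak compute and 8 GB/s of memory bandwidth. These numbers are made up for illustration and describe no real product.

```python
def attainable_tops(peak_tops: float, bandwidth_gbs: float, ops_per_byte: float) -> float:
    """Roofline bound: the lesser of peak compute and what memory can feed.

    bandwidth [GB/s] * arithmetic intensity [ops/byte] gives Gops/s,
    so divide by 1000 to express the memory-bound ceiling in Tops/s.
    """
    memory_bound_tops = bandwidth_gbs * ops_per_byte / 1000.0
    return min(peak_tops, memory_bound_tops)


# Hypothetical edge device (assumed numbers, not from the discussion).
PEAK_TOPS = 4.0
BANDWIDTH_GBS = 8.0

# Two hypothetical workloads that differ only in how many operations
# they perform per byte fetched from memory.
for name, intensity in [("low-reuse layer", 2.0), ("high-reuse layer", 1000.0)]:
    tops = attainable_tops(PEAK_TOPS, BANDWIDTH_GBS, intensity)
    print(f"{name}: {tops:.3f} TOPS attainable of {PEAK_TOPS} peak")
```

Under these assumptions, the low-reuse workload is limited by memory bandwidth to a small fraction of the headline figure, while the high-reuse workload reaches the compute peak. That gap is why memory latency and bus architecture matter as much as the TOPS number.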
Petr: I want to add complexity to the statement that was just raised. We also have multiple vendors. We have multiple vendors on the software side, and we have multiple vendors on the hardware side. Take mobile phones, for example. If you buy a Samsung or an Apple, those are two vendors that will have their own NPUs. Now we even want to have neural networks on the communication layer, which in the 6G protocol are going to handle the beamforming, for example. So now you have two vendors with NPUs who need to manage the beamforming on the mobile device. Then you have different vendors like Vodafone or AT&T on the cellular tower side. So now you have multiple vendors on both devices, and they still need to find a hardware NPU that can manage different software neural networks to combine the metrics of how those things have to all work together. Try to design a hardware device that can scale, not just for one specific use case, but across vendors. That puts a massive number of challenges on us. I totally agree that hardware is chasing behind software right now. You also see this with why Nvidia is so successful. It’s not because they have the best hardware. They have CUDA. The CUDA framework is driving a lot of workloads to Nvidia, and they’re doing the same with the humanoids. They are doing the same with the world simulation tools, where you can train self-driving cars and all that kind of stuff. So the software stack drives a lot of requirements right now.
Sitapure: The new term is extreme co-design. Jensen was talking about this in his keynote [at GTC in Washington, D.C. on October 28, 2025], where we are getting toward this point of hardware/software co-design. And what Alex was saying about working with multiple vendors is also extreme co-design. There, you can’t have silos to an extent. Now, you must work with the different vendors together and co-design the software, hardware, and all the other logistics of it. That’s why you have seen more partnerships, like Nvidia plus Nokia in 6G, and all those kinds of things coming up, where it’s exactly that hardware/software co-design.
Uppala: That brings us to standards. Because things are moving so fast, standards kind of lag. And without standards, you have everyone doing their own custom designs, and that limits the ecosystem compatibility. We need standards so that the ecosystem can participate in a much broader manner. Custom things always limit the number of people, where you have some big players that can do their own thing. But that leaves everyone out of the ecosystem. Standards help level that playing field.