Edge AI, GenAI, and next-gen communications are adding more workloads to phones that are already under pressure to deliver high performance and low power.
Leading smartphone vendors are struggling to keep pace with the rising compute and power demands of localized generative AI, standard phone functions, and the need to move more data back and forth between handsets and the cloud.
In addition to edge functions, such as facial recognition and other on-device apps, phones must accommodate a continuous stream of new communications protocols, along with system and application updates. And they need to do all of this on a single battery charge, while remaining cool in a user’s hands or next to their face.
Fig. 1: Mobile phone circuit board with the system on chip (SoC) shown top right, containing Arm CPUs and other components. Source: Arm
“If you look at any premium mobile phone configuration, you will see that all the SoCs have a heterogeneous architecture. They have different blocks doing different things, but also working together,” said Vitali Liouti, senior director of segment strategy, product management at Imagination Technologies. “From a system perspective, that’s what every mobile SoC maker does. They look at the system heterogeneously, and they look at it from a platform perspective, both hardware and software.”
Designing SoCs for the mobile market has become increasingly complex due to the rapid evolution of AI networks and the growing diversity of AI model requirements, said Amol Borkar, director of product management and marketing for Tensilica DSPs in the Silicon Solutions Group at Cadence. “Unlike traditional workloads, AI models — especially large language models (LLMs) and transformer variants — are constantly evolving in architecture, size, and computational demands. This creates a moving target for chip designers, who must hardwire support for future AI capabilities into silicon that cannot be changed once fabricated. The challenge is further intensified by the need to support both ends of the AI spectrum — massive cloud-based models, and compact, efficient models like TinyLlama that are optimized for on-device inference. These smaller LLMs are critical for enabling intelligent features on mobile and embedded devices, where power and memory constraints are tight.”
Beyond that system-level perspective, AI is driving changes to individual processors and the tasks they are assigned.
“The biggest changes going on right now follow two vectors,” said John Weil, vice president and general manager for IoT and edge AI processors at Synaptics. “There are enhancements in CPU architectures that are happening in the Arm ecosystem as well as in RISC-V. People are adding vector math unit blocks to accelerate various math functions that are needed for transformer-based models. The second path involves neural processor enhancements, which can be thought of like GPUs but focused specifically on edge AI model acceleration. These are mostly vector math units that are designed to accelerate various operators within the model. If you look at the Arm Tensor Operator Set Architecture (TOSA) specification, there are a variety of these AI operators in it, and people are writing acceleration routines just as you would for a GPU with OpenGL.”
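As a rough illustration of what those vector units are accelerating, the sketch below implements the attention math (matmul, softmax, matmul) in plain NumPy as a reference model. The shapes and values are arbitrary, and a real TOSA-backed runtime would map each step onto hardware operators rather than NumPy calls.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax, one of the transformer ops
    (alongside MATMUL, REDUCE_MAX, EXP) that vector units accelerate."""
    x_max = np.max(x, axis=axis, keepdims=True)
    e = np.exp(x - x_max)
    return e / np.sum(e, axis=axis, keepdims=True)

def scaled_dot_product_attention(q, k, v):
    """Reference attention kernel: the matmul/softmax/matmul chain
    that dominates transformer inference on edge devices."""
    d_k = q.shape[-1]
    scores = q @ k.swapaxes(-1, -2) / np.sqrt(d_k)
    return softmax(scores) @ v

# Toy example: one head, 8 tokens, 16-dim embeddings.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 16)) for _ in range(3))
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # (8, 16)
```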
Fig. 2: A mobile SoC design where the AI accelerator could be another GPU, an NPU, or a high-end ASIC. Source: Synopsys
Over the last few years, both GPU and NPU designs have been refreshed regularly to keep up with new use cases. GPUs typically occupy about 25% of the silicon area in premium phones, and the NPU also has grown in size to take on more of the workloads, noted Imagination’s Liouti. “Depending on the workload, you will have areas where the NPU will be king, or you would have to break the problem up, with some layers on the NPU and some layers on the GPU. The NPU has become integral for everything that requires very little power. For anything that is always-on, the NPU is the right choice. And you still have to have a good CPU, because it does a lot of the initial work, as well as the management. If the CPU is not performant, it doesn’t matter how big your GPU or NPU is.”
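A minimal sketch of the kind of layer-splitting Liouti describes, assuming a hypothetical op-support table. A real runtime would query the NPU driver for its supported operators rather than hardcoding them.

```python
# Hypothetical op-support table; real runtimes query the driver for this.
NPU_SUPPORTED = {"conv2d", "depthwise_conv2d", "matmul", "relu"}

def partition_layers(layers):
    """Greedy split: keep a layer on the NPU when it supports the op,
    otherwise fall back to the GPU. Returns (device, layer) pairs."""
    plan = []
    for layer in layers:
        device = "npu" if layer["op"] in NPU_SUPPORTED else "gpu"
        plan.append((device, layer["name"]))
    return plan

model = [
    {"name": "stem", "op": "conv2d"},
    {"name": "attn", "op": "softmax"},  # not in the NPU table -> GPU
    {"name": "ffn",  "op": "matmul"},
]
for device, name in partition_layers(model):
    print(f"{name:>5} -> {device}")
```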
The key focus is power efficiency for any form of parallel processing, whether it’s graphics, generic compute, or AI-specific. “We’ve been looking at our ALU engines and completely redesigning and retuning them for very power-efficient number crunching,” said Kristof Beets, vice president of technology insights at Imagination. “The next thing is to bring more techniques from the NPU space into the GPU — NPU-style data types and more dedicated processing pipelines to deliver enough performance. Also, we need to be scalable across our customer base. We can’t ignore the developer community, because how do we give people access to this? How can we make sure things run out-of-the-box? Then, how can you efficiently optimize and tune them?”
Overall, designing AI into chips has gotten easier. “Five-plus years ago it was, ‘Oh my gosh, I hear this AI thing is coming. I don’t know what to do. We don’t even have any data scientists. Do I need to hire a team of data scientists to figure this thing out?’ And maybe that was true a decade ago. That’s absolutely not true anymore,” said Steve Tateosian, senior vice president of IoT, consumer, and industrial MCUs at Infineon. “If we talk about the developer side of it, I’ve got a whole team of PhD DSP engineers tuning my audio front end. Now, maybe you have a handful of — I won’t even say AI engineers, because they’re just engineers — development engineers who know how to use the AI tooling to create these models. And the other thing that has improved drastically over the last 5 to 10 years is the tooling, which includes development workflow for engineers to take their data, label their data, create a model, test their model, and optimize their model to the end device. A lot of the most specialized knowledge has been built into these tools so that it becomes a lot more accessible to a wider range of developers to create these applications or these models.”
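The “optimize the model to the end device” step Tateosian mentions typically includes quantization. Below is a minimal sketch of symmetric int8 post-training quantization in plain NumPy; production tooling adds calibration data, per-channel scales, and accuracy checks.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric post-training quantization of one weight tensor to int8,
    the kind of device-targeted optimization step the tooling automates."""
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
w = rng.standard_normal((64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
err = np.max(np.abs(w - dequantize(q, scale)))
print(f"4x smaller than fp32, max error {err:.4f}")
```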
Everything visual, wireless, and touch
Along with more AI, there is a growing shift toward visual formats, which demand more processing power than traditional text.
“It used to be computers or text-based interfaces. Now everything is video, or full graphic interfaces, and they are much more computationally demanding,” said Marc Swinnen, director of product marketing, Ansys. “A lot of computation goes into managing the video, both in and out — input from the screens and output through 1080p and so on.”
In addition, everything is wireless, so the analog content in mobile phones has gone up. “A phone these days has about six antennas in it — it’s crazy,” said Swinnen. “All of these high-frequency telecommunications capabilities, ranging from Wi-Fi and 5G to Bluetooth and AirDrop, have their own frequencies, their own chips, their own antennas.”
The fact that communications standards are ever-evolving presents a further challenge for SoC designers.
“The big thing is to enable the AI use cases and get UFS off the ground, accelerating the spec,” said Hezi Saar, executive director of product management for mobile, automotive, and consumer IP at Synopsys, and chair of the MIPI Alliance. “MIPI Alliance was able to accelerate it by one year, so that’s really reducing the risk. People are defining it right now. SoCs and IP vendors need to develop their IP while the spec is being developed. They need to tape out and have silicon with partial spec and plan for the next one, plan for interoperability, and plan for building the ecosystem while we work. In the past, it wasn’t like that. There used to be a certain frequency of spec evolution. Every two years there was a spec. But everything has been condensed because AI is more of a software thing, and it impacts the hardware. Hardware is not software.”
Fig. 3: A smartphone use case in which the LLM or AI engine must be available in on-device storage. Source: Synopsys
“When you turn on a device, most of that model needs to find itself in the DRAM, and that means the read connectivity from the UFS device to the SoC needs to be very efficient,” said Saar. “That’s latency. You can’t push that button, ask whatever question, and wait two seconds. Of course, there are ways to mitigate it. You don’t need to read the whole thing. You can do partial reads. But all of these kinds of systems have the data here, and I need to push it fast to the DRAM. I have the LLM running on, let’s say, an accelerator on chip. But I need that to be connected to the DRAM to do the computation, and then get it back to the user, so they listen back to the audio. In mobile, it has to be very, very efficient. Power is extremely important, so designers will reduce the transfers and put the UFS device in sleep mode as much as they can. I expect both the storage connectivity and the DRAM connectivity to continue to evolve very, very fast — much faster than before.”
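One way to get the partial reads Saar describes is demand paging: map the weight file and fault in only the pages a layer actually touches. The sketch below uses Python’s mmap against a synthetic weight file standing in for UFS storage; the offsets and sizes are arbitrary.

```python
import mmap
import os
import tempfile

import numpy as np

# Synthetic weight file standing in for model data on UFS storage.
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
np.arange(1_000_000, dtype=np.float32).tofile(path)

# mmap gives partial, on-demand reads: only the pages a layer actually
# touches are moved from storage into DRAM, not the whole file up front.
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # Pull one layer's slice: 4,096 floats starting at element 128.
    layer = np.frombuffer(mm, dtype=np.float32,
                          count=4096, offset=128 * 4).copy()
    mm.close()  # safe to close: .copy() released the buffer view

print(layer[:4])  # [128. 129. 130. 131.]
```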
Adding to the complexity are the rise of multi-modal models and GenAI tools like Stable Diffusion, which combine text, image, and sometimes audio processing into unified architectures. “These models require a flexible and efficient compute fabric capable of handling diverse data types and execution patterns,” said Cadence’s Borkar. “To remain resilient in the face of uncertainty and rapid AI evolution, the AI subsystem must be designed with future-proofing in mind. This often involves integrating programmable IP blocks alongside the NPU, allowing SoCs to adapt to new model architectures and workloads post-silicon. Supporting such a wide range of AI use cases requires SoCs to be not only powerful and efficient, but also architecturally agile, making AI-centric chip design one of the most dynamic and challenging frontiers in mobile computing.”
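A toy illustration of that post-silicon agility: fixed-function NPU kernels are frozen at tape-out, while a programmable block can pick up operators that did not exist yet. The dispatch table and GELU example below are hypothetical, not any vendor’s API.

```python
import numpy as np

# Fixed-function kernels known at tape-out, plus a programmable
# fallback (DSP/GPU-style) that can gain new ops after silicon ships.
NPU_KERNELS = {
    "relu": lambda x: np.maximum(x, 0.0),
    "matmul": lambda a, b: a @ b,
}
PROGRAMMABLE_KERNELS = {}  # filled in post-silicon by software updates

def register_op(name):
    def deco(fn):
        PROGRAMMABLE_KERNELS[name] = fn
        return fn
    return deco

@register_op("gelu")  # an op the NPU never heard of at tape-out
def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(0.79788456 * (x + 0.044715 * x**3)))

def run_op(name, *args):
    """Prefer the hardwired NPU kernel; fall back to programmable IP."""
    fn = NPU_KERNELS.get(name) or PROGRAMMABLE_KERNELS.get(name)
    if fn is None:
        raise NotImplementedError(name)
    return fn(*args)

print(run_op("gelu", np.array([-1.0, 0.0, 1.0])))
```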
Another use case for algorithms in phones is determining what is and isn’t a meaningful touch on the screen, whether it’s a “candy bar” block phone or a foldable phone, which has extra challenges due to its very thin screen.
“When the display gets thin, the touch layer on top has to get much, much closer to the noisy display layer,” said Sam Toba, director of product marketing at Synaptics. “We have to deal with a lot of the display noise coming from the individual pixels. Because the stack is so thin, the background capacitance gets higher as the plates move closer together. It’s a big problem, because sensing touch means sensing a very, very small capacitance on top of so much background capacitance. Separating a valid finger signal from that large capacitive noise becomes even more difficult in thin panels.”
This very low-power chip needs to decide which signals are meaningful, and only then wake up the host SoC. “If the host had to detect the touch, it would spend a lot of power just looking for it, which means you’re always on,” Toba said. “Most of these touches have to be rejected.”
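A rough sketch of that filtering, assuming a simple IIR baseline tracker and a fixed threshold. Real touch controllers use far more elaborate algorithms, and the alpha and threshold values here are purely illustrative.

```python
import numpy as np

def detect_touch(frames, alpha=0.02, threshold=5.0):
    """Track slow-moving background capacitance with an IIR baseline and
    flag a touch only when the delta clears a noise threshold -- the kind
    of rejection a low-power touch chip runs before waking the host SoC."""
    baseline = frames[0].astype(np.float64)
    events = []
    for i, frame in enumerate(frames):
        delta = frame - baseline
        if np.max(delta) > threshold:    # plausible finger signal
            events.append((i, np.unravel_index(np.argmax(delta),
                                               delta.shape)))
        else:                            # noise: fold into the baseline
            baseline += alpha * delta
    return events

rng = np.random.default_rng(2)
frames = rng.normal(0, 1.0, size=(50, 8, 8))  # display-coupled noise
frames[30, 3, 4] += 12.0                      # finger at frame 30
print(detect_touch(frames))                   # roughly [(30, (3, 4))]
```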
Local processing for AI functions and models
Phones contain numerous AI applications, and the list keeps growing. Whenever possible, the processing should happen on the phone so only distilled amounts of information are sent to the cloud, noted Ansys’ Swinnen. For example, machine learning functions such as facial recognition or photo editing should be processed close to the camera.
Inference requests for GenAI models, such as ChatGPT or agentic AI assistants, also can be processed locally. AI models have become more effective and compact, so they can be stored on the device, whether they occupy a couple of kilobytes, megabytes, or gigabytes, depending on which model and device you’re talking about, noted Synopsys’ Saar.
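A back-of-the-envelope check on those sizes, using TinyLlama’s roughly 1.1 billion parameters and standard weight-storage precisions:

```python
# Rough model footprint at different weight precisions. TinyLlama has
# ~1.1B parameters; bytes-per-weight are standard fp32/fp16/int8/int4 sizes.
PARAMS = 1_100_000_000

for name, bytes_per_weight in [("fp32", 4), ("fp16", 2),
                               ("int8", 1), ("int4", 0.5)]:
    gb = PARAMS * bytes_per_weight / 1e9
    print(f"{name}: {gb:.2f} GB")  # int4 -> ~0.55 GB, fits on-device
```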
Local, on-device processing brings several advantages. “By putting the AI hardware into these mobile devices, they can do the large language model inferencing right there in the device itself,” said Ron Squiers, solution networking specialist at Siemens Digital Industries Software. “Instead of sending transactions back up to the cloud to do the heavy lifting for the inferencing there, you just do it at the edge. That has a double whammy benefit of lower latency, better real-time response, better closed-loop servo control, and it allows better privacy conditions for data that’s being generated locally at the edge.”
Others agree. “You’re not sending the data to the cloud, so there’s a reduced power aspect of that, and a reduced cost aspect,” said Infineon’s Tateosian. “Some edge AI applications could add intelligence without adding the cost of connectivity, or they could reduce the amount of connectivity. This means reducing cloud connectivity and reducing the amount of power in the end device.”
The era of hyper-optimization means designers need to go to the lowest levels of technical debt in order to get more oomph and performance out of the phone, said Imagination’s Liouti. “Data movement causes 78% of the power consumption. The main focus of ours is, ‘How do you reduce this data movement?’ That can be done at the GPU level, and that’s where we focus, but it also can be done at the platform level, at the SoC level. We have to generate technology that is extremely advanced in order to reduce data movement, and that gets even more complicated with neural networks — especially with the large ones, because those require a significant amount of data.”
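A first-order model makes the point. Using ballpark per-access energy figures often cited from Mark Horowitz’s ISSCC 2014 keynote (45nm-era numbers, so illustrative orders of magnitude only, not data for any current SoC), shifting traffic from DRAM to an on-chip buffer cuts a layer’s data-movement energy by roughly an order of magnitude:

```python
# Ballpark 45nm energy per 32-bit access, as widely cited from Horowitz
# (ISSCC 2014). Treat as illustrative orders of magnitude only.
PJ_PER_32B_WORD = {
    "alu_fp_mul": 3.7,    # the compute itself
    "sram_8kb":   5.0,    # on-chip buffer hit
    "dram":       640.0,  # off-chip access
}

def layer_energy_nj(words_moved, dram_fraction):
    """First-order energy for one layer's weight/activation traffic:
    shows why data movement dominates the power budget."""
    dram = words_moved * dram_fraction * PJ_PER_32B_WORD["dram"]
    sram = words_moved * (1 - dram_fraction) * PJ_PER_32B_WORD["sram_8kb"]
    return (dram + sram) / 1000.0  # pJ -> nJ

words = 1_000_000  # ~4 MB of fp32 traffic for one layer
for frac in (1.0, 0.1):
    print(f"{frac:>4.0%} from DRAM: {layer_energy_nj(words, frac):,.0f} nJ")
```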
While more on-device AI processing is happening, some things will still run on the cloud due to battery and power constraints. “You will always have to pick and choose,” said Liouti. “This is just the start of a big journey, and the situation will be extremely different in a few years. We’ve just scratched the surface. I see the transformer as a fundamental block for something much bigger. For now, we need to separate the hype from the reality. Let’s take the example of image generation models that run locally now on mobile. The reality is they’re way less performant than the ones you can find by using Midjourney [on your PC]. That is not going to be the case in a few years.”
Better GPUs will be part of the solution. “In mobile, we can turn that extra power saving into a higher clock frequency and higher performance because we can stay within the same power and thermal budget,” said Imagination’s Beets.
However, Infineon’s Tateosian observed that the phone user experience doesn’t change much between each new release. “Even though there’s more performance and more memory in these devices, the software is gobbling that up.”
Conclusion
There are several key trends driving changes in mobile SoC design.
“Rising analog, everything becoming video and AI, and the HPC demands of today’s applications are such that they need a lot of computational power on the chips,” said Ansys’ Swinnen. “Those are driving these SoC developments, but mobile phone makers are constrained by the fact that they need to keep it low power and small form factor, and then they are more constrained by economics than some of the other companies like NVIDIA with their GPUs. For them, it’s all about performance, and if it costs a bit more, so be it. But not so much for a cell phone chip. It has to be cheap to manufacture in the millions.”
Designers must approach the SoC from both the hardware and software perspectives. “Anyone who forgets about this loses,” said Imagination’s Liouti. “You have to look at it when you think about language models: the stack of layers and the operations they use. That sounds simple, but it is not. Essentially, you have to find the most optimal way to do the mathematics on the hardware to make sure your solution is at the top, because we’re competing against giants. You have to do hardware-software co-design, and one engineer won’t be able to do that alone. It takes a number of different disciplines, some of them completely unrelated.”
Related Reading
AI Drives IC Design Shifts At The Edge
Rollout of artificial intelligence has created a whole new set of challenges, along with a dizzying array of innovative options and tradeoffs.
AI Races To The Edge
Inferencing and some training are being pushed to smaller devices as AI spreads to new applications.
Preparing For An AI-Driven Future In Chips
Designs need to be flexible enough to handle an onslaught of continuous and rapid changes, but secure enough to protect data.