Small Language Models Create New Security Risks

Edge AI will improve performance, reduce power, and keep data local, but the risk equation changes.

The rollout of edge AI is creating new security risks due to a mix of small language models (SLMs), their integration into increasingly complex hardware, and the behavior and interactions of both over time.

AI data centers still garner the most attention due to massive investments and an ongoing flood of deals and acquisitions, but the edge is quietly starting to take shape for several reasons. It takes too much energy and too much time to send data to the cloud and back, and it’s easier to secure data if it remains on-premises. As a result, more localized compute capabilities are being implemented across a broad swath of vertical market segments, opening up new opportunities within and alongside those markets by scaling down language models for both training and inference.

“Small language models have started to become more capable,” said Niranjan Sitapure, group product manager for generative AI at Siemens Digital Industries Software. “The voice models on your phone, or even small image models, can make small changes. And a few months back, we saw the first few series of small language models that can do reasoning, which is required for math and coding.”

In general, the edge is considered anything outside of the cloud, although definitions and boundaries are fluid. Edge devices include everything from large, on-premises data centers to ADAS-enabled cars and smart doorbells, and SLMs and scaled-down LLMs are being developed that are finely tuned to these different segments.

“The SLM space is now getting as crowded as the LLM space, and the reason is that they are very task-specific,” said Sitapure. “One of the challenges is the orchestration, and at a broader level, how people think about which models should be used or fine-tuned for a specific task on their device. For example, SLMs have multi-lingual capabilities, and so do you prioritize a model that’s multi-lingual in five languages or 50 languages? Those decisions have started to become more complex as the model space gets more crowded, or as the wattage gets constrained.”

This has broad implications for both hardware and software security. SLMs can self-optimize, update, and interact in novel ways, and they can interact with other SLMs in the same or different devices. That, in turn, widens the attack surface in unpredictable ways. So while automakers have committed significant resources to ensuring the security of safety-critical systems, security is far less consistent for SLMs used in IoT devices and other emerging markets.

“When you move out of the data center, putting models and all of the compute on people’s devices, then for the most part you no longer have professional IT management providing perimeter defenses that you have with server deployments,” said Dana Neustadter, senior director of product management for Security IP Solutions at Synopsys. “Coming up with a specialized model is really expensive because you’re taking the data sets, customizing them, and in some cases retraining them or augmenting the training of the data sets.”

Moreover, because these markets are so different, there is little overlap in models today. “There isn’t a notion of economizing to build a shared model with augmented views, which allows you to economize on the data to create a fairly complete but general model and then the highly trained augmentation of that for particular purposes,” Neustadter said. “In order to keep them isolated and behaving, as they were intended by their developers, you need to have strong isolation in the platforms. There are some benefits to doing that in hardware, particularly as you get into things that are more safety-critical for the protection of data. It’s a bit like having a single key vault in a mobile device. You know the keys are for different domains, so you depend on the applications to keep them separate from each other.”

The competition to be first to market with new language models is no less intense than in AI data centers. As with LLMs, the speed of delivery for SLMs is far outpacing the ability to develop standards, regulations, or even best practices. But there is at least some general understanding of what needs to be done, where the potential security holes are, and how to address them.

“Security frameworks and AI systems are evolving,” said Eddie Ramirez, vice president of marketing in Arm’s Infrastructure Business. “We’ve spent a lot of time at Arm on confidential computing, where the goal is to ensure that you’re driving security at the virtualization level. Anything that’s virtualized has its own memory space that’s encrypted. It has its own keys. Another process won’t ever be able to see what this one process does in a virtualized, containerized world, which is preventing the leakage.”

Others agree. “In terms of product maturity, it’s going to go through a lot of stringent tightening up,” said Amol Borkar, automotive segment senior director for DSP product management and marketing at Cadence. “You’ll have a trusted island or domain that has a memory-mapped area that you can’t access, and the devices can’t talk to all the applications that are not secured. Or, if you don’t have a security bit enabled, then you can’t access certain points of memory. In the evolution of these models, there’s probably going to be some framework in place for that. If these devices are learning, can they co-learn? Can they share the learnings between each other? Security privileges will be assigned to these specific models. They might be controlled by different types of vendors — the guys who generate the models — or on the device itself, where they have certain levels of privileges that they can access and talk to each other. We are already seeing that now because we have access to Copilot within Office, Word, and Outlook. In some cases, they’re allowed to interact with other models. And every day, the responses are slightly different.”
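
The security-bit gating Borkar describes can be illustrated with a small sketch. This is a simplified software model, not how any particular chip or vendor implements it. The region names, address ranges, and the `Requester` and `can_access` names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """A memory-mapped range. All values here are hypothetical."""
    name: str
    start: int
    end: int      # exclusive upper bound
    secure: bool  # True if only secure requesters may touch it


@dataclass(frozen=True)
class Requester:
    """A model or application asking for access."""
    name: str
    secure_bit: bool


# Illustrative memory map: one open buffer, one trusted island.
REGIONS = [
    Region("shared_buffer", 0x0000, 0x1000, secure=False),
    Region("model_weights", 0x1000, 0x2000, secure=True),
]


def can_access(requester: Requester, address: int) -> bool:
    """Allow access only if the region is open or the security bit is set."""
    for region in REGIONS:
        if region.start <= address < region.end:
            return requester.secure_bit or not region.secure
    return False  # Unmapped addresses are denied by default.


trusted = Requester("vendor_slm", secure_bit=True)
untrusted = Requester("third_party_app", secure_bit=False)

print(can_access(trusted, 0x1800))    # secure requester, secure region
print(can_access(untrusted, 0x1800))  # non-secure requester, secure region
print(can_access(untrusted, 0x0800))  # non-secure requester, open region
```

In real devices this kind of check is typically enforced in hardware rather than in application code, which is the point of the trusted island.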

Protection against what?
A big problem for SLMs in particular, and edge AI in general, is the newness of localized AI. Much of the focus so far involves shrinking language models to several billion parameters or fewer in order to run them on battery-powered devices. In many cases, security is a secondary concern.

The guardrails that exist today, such as MITRE’s Common Weakness Enumeration list, are a good way to identify known vulnerabilities, but in many cases they don’t apply to small language models. Nor do they apply to the post-quantum encryption algorithms developed for edge devices, which are supposed to be resilient enough to withstand attacks by quantum computers. That resilience will be essential for devices that are expected to be in the field for a decade or more.

“The MITRE CWE list keeps growing,” said Scott Best, senior director of silicon security products at Rambus. “If you dig into some of those CWEs, to some degree, they end up becoming a description of the symptom and not the problem itself. So it’s difficult to navigate the CWE list and say, ‘Does this apply to the chip or product I’m delivering to my customer?’ Our customers want to know, and we definitely have some that ask, how our security IP performs against the MITRE CWE list. We tell them that 80% of the CWEs on the hardware side do not apply at the IP level. If I’m delivering a block that is going to accelerate Dilithium, there are very few CWEs that even apply at that level of granularity. And if you tell them 68% of the CWEs don’t apply, then they assume you’re self-scoring at 32%.”

Once a cyberattack breaches security, there are multiple options for attackers. They can insert malware into a system to shift its behavior, which is particularly concerning when it comes to autonomous vehicles, drones, or humanoid robots. They also can steal data, whether that’s data collected through sensors or the training data for an SLM.

“Security and safety are a big problem,” said Ram Natarajan, vice president of MLSoC platform architecture at SiMa. “We do a lot of encryption to make sure the weights in the DRAM cannot be explored or decrypted. You want to make sure somebody doesn’t steal your weights. But there’s also information coming in through the sensors. How do you protect those and make sure they don’t get tapped and diverted into your cloud? So there’s weight security, and there’s data security. And every time we look at this, there’s a penalty to the cost, the power, and the performance, so that has to be balanced against all of this.”

Moreover, the typical assumption of how people interact with machines is changing. The goal is for the machine to learn, not for a human to learn how to use it. One of the benefits of all language models — large, small, edge, micro — is to change the input from text/code to natural language and/or gestures.

“The second that I have to learn, as a human, how to use this thing because it’s a static model that maybe was trained on a data set that doesn’t operate or speak the way I speak, or move the way I move, then I have to change my behavior to interact with it,” said Steve Tateosian, senior vice president of IoT, consumer, and industrial MCUs at Infineon. “That’s human learning. We really want to get to the point where it is machine learning about what’s important to the human or how the human interacts with it. It makes that experience easier.”

But that also opens the door to very different kinds of attacks. Infineon, at its recent OktoberTech conference, showed off a humanoid robot that could, within minutes, exactly copy the voice patterns and intonations of a person it converses with.

Added to all of this is the relationship between the size of language models and the accuracy of results, which can create different kinds of security issues. As large language models (LLMs) are shrunk down to fit inside edge devices, or smaller models are developed for specific domains and workloads, the potential for errors increases. SLMs are generally less accurate than LLMs, and they often rely on lower-precision formats such as FP16 or FP8, versus FP32 for LLMs. That allows them to run on battery-powered mobile devices, but the quantization takes a toll on accuracy.
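
The precision penalty can be seen in a minimal sketch that stores the same values as FP32 and as FP16 and measures how far each drifts from the original. The weight values below are made up for illustration and are not taken from any real model. FP8 is omitted because Python's standard library has no FP8 format.

```python
import struct


def round_trip(value: float, fmt: str) -> float:
    """Store a value in the given binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


def max_abs_error(weights, fmt: str) -> float:
    """Largest absolute difference introduced by storing in `fmt`."""
    return max(abs(w - round_trip(w, fmt)) for w in weights)


# Hypothetical weights. Python floats are FP64, which serves as the reference.
weights = [0.1, -0.2503, 0.3337, 1.0001, -2.7183]

err_fp32 = max_abs_error(weights, "<f")  # "f" is IEEE 754 single precision
err_fp16 = max_abs_error(weights, "<e")  # "e" is IEEE 754 half precision

print(f"max error at FP32: {err_fp32:.3e}")
print(f"max error at FP16: {err_fp16:.3e}")
```

The per-weight error is small, but it compounds across millions or billions of parameters, which is one reason lower-precision models can give less accurate results.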

Taking responsibility
One of the big changes involves responsibility when there is a successful attack. The EU’s Cyber Resilience Act, for example, imposes huge fines for failure to comply with regulations involving hardware, software, or both. And OCP SAFE shifts the security burden to the makers of processors and peripherals with updatable software.

These efforts basically amount to a shift left for system security, which already is underway in the automotive sector.

“There are three key elements in automotive,” said Marc Serughetti, vice president of product management and applications engineering at Synopsys. “On the edge with cars and robots, you need the safety aspect with the safety island. The second piece is that you need to bring new AI capabilities into your product. And the third one is that you need to secure all of that. With these three areas, you go from the architecture to validation to things like over-the-air updates, which they are trying to do in the automotive market, or any market. That’s the purpose of software-defined products. You need to have that security, and that security has to impact the models that are being used in those edge devices.”

In addition, while there is a tendency to think that every edge device needs an SLM, that isn’t always the best approach.

“There are two elements here,” said Satish Ganesan, senior vice president and general manager for Intelligent Sensing, and chief strategy officer at Synaptics. “For sensing, many of the solutions like smartphones and PCs have an application processor in there. It may be from TI, Intel, Qualcomm, or somebody else, but for those particular items, it’s about how to make the human machine interface more intuitive, and not necessarily with an SLM. An SLM can run on the application processor, but that’s different than what we’re talking about with AI native processing, like with our Astra, where it’s the main processor of the design. In a robotic arm, for example, that’s where you’re going to have workload-specific controls, mechanisms in those small language models that may have 70 million parameters versus a trillion parameters in an LLM. And that’s where you train it on very specific tasks. So if you have a cobot that’s doing a pick-and-place in an automation line that has to send something else different, you give it instructions. That’s different than using automation in an industrial setting today. SLMs are very workload-specific.”

The future
The proliferation of SLMs is already underway, but the biggest changes are yet to come, and the industry needs to start planning for them.

“Once language models start training themselves and prioritizing information that they see, that’s the start of an autonomous AI thinking for itself and behaving according to its own needs,” Synopsys’ Neustadter said. “At that point we’re into the realm of a new kind of life form that is behaving in its own right. It has its own goal-setting capabilities, it’s able to do what it needs, and it has a kind of ruthless logic-based ethos. That’s several technology generations down the road. But in the interim, you really want to ensure that the data being used to train the thing is authentic. You don’t want it training on its own hallucinations, or whatever it’s being fed by somebody who is aggressively trying to train it in a particular direction. And you have very big concerns about the integrity of all that data, and the integrity of the paths to the training of the data back to the training engines.”
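
One common way to check that training data is authentic is to attach a keyed hash to each batch and verify it before the data is used. The sketch below is a generic illustration using HMAC-SHA256, not a method described by Synopsys. The key, the sample batch, and the function names are hypothetical.

```python
import hashlib
import hmac


def sign_batch(key: bytes, batch: bytes) -> str:
    """Produce an HMAC-SHA256 tag for a batch of training data."""
    return hmac.new(key, batch, hashlib.sha256).hexdigest()


def is_authentic(key: bytes, batch: bytes, tag: str) -> bool:
    """Return True only if the tag matches the batch under this key."""
    expected = sign_batch(key, batch)
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, tag)


# Hypothetical key and data. A real device would keep the key in
# protected storage rather than in source code.
key = b"example-shared-secret"
batch = b"sensor reading: 21.5C"
tag = sign_batch(key, batch)

tampered = b"sensor reading: 99.9C"

print(is_authentic(key, batch, tag))     # untouched batch
print(is_authentic(key, tampered, tag))  # modified batch
```

A check like this only covers integrity in transit and at rest. It does not address whether the original data was a hallucination or was deliberately skewed before it was signed.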

Today, software patches are frequently added onto other patches through a new over-the-air software release, such as the application updates on a smartphone. But as SLMs learn, they also change. So what is needed for a patch today may be different from what is needed in the future, and it may vary from one user to the next. SLMs effectively are speeding up the ability of a system of hardware and software to react to changes, but the flip side is that organizations need to be able to react with the necessary level of specificity. And while regulations attempt to home in on best practices for security, and standards seek to define them, this is likely to be a moving target for the foreseeable future.

Related Reading
Special Report: Small Vs. Large Language Models
SLMs targeted at specific workloads could change the relationship between edge devices and the cloud, creating new opportunities for chipmakers, EDA companies, and IP vendors.
Moving AI Workloads To The Edge
There are benefits and challenges of processing AI workloads on-device to enhance performance, reduce costs, and ensure data privacy.


