As multimodal AI fuses data from billions of devices, attackers can weaponize detailed digital twins of people or systems.
Key Takeaways:
As AI systems grow more powerful and pervasive, they are transforming data into increasingly precise maps of people’s lives. By fusing data from various sources, attackers can construct highly detailed digital twins of individuals and their environments, making it easier to attack specific targets. And once they infiltrate the home, through routers, PCs, phones, or wearables, they can continuously harvest location and behavioral data, eroding personal privacy and even physical safety.
At the intersection of AI and cybersecurity, the central battleground is the data itself. Trustworthy data fusion depends on authenticated, integrity‑checked inputs and AI outputs that are verifiable and attributable. Defending against AI‑enabled attacks on fused data hinges on strong roots of trust, robust encryption, secure key storage, and strict anonymization and minimization of any personally identifiable or location data that is shared.
To keep pace with evolving threats, chips and systems must themselves be secure by design. But they also need sufficient bandwidth, compute, isolation, and cryptographic infrastructure to perform continuous integrity checks and synchronization for worst‑case loads, and ensure that no data leaks out at any step along the way.
“AI data fusion combines data from multiple sources to give AI models a richer, more complete view, helping them generate deeper, more reliable insights,” noted David Maidment, senior director for market strategy at Arm. “But this data fusion also concentrates sensitive information and can expand the attack surfaces. The more data types, owners, and systems involved, the greater the risk that one weak point could expose or corrupt the broader intelligence pool.”
This threat has been on the horizon for some time. People routinely click through pop-ups that ask them to share data, often just to get the prompt out of the way of what they’re trying to see or do. But those seemingly insignificant assents, plus a slew of bad actors seeking ways to steal data, have enabled attackers to build detailed profiles. “There’s a lot of white market, gray market, and black-market data out there, and depending on your source and your willingness to take in data from untrusted sources, or the clandestine rogue data, you can fuse all that data into a profile or a digital twin of a person,” said Reed Hinkel, director, strategic programs, security, processor, wireless and NVM at Synopsys.
Once a location such as a home can be monitored, Trojans can locate a particular person. “They either deposit Trojans in the home routers, or on your computer, or gain access through phishing attacks even at home,” Hinkel said. “Even though you may be trained at work, there are still members of your household who are giving away lots of data. They just don’t realize it. Sometimes it seems innocuous. Sometimes it’s literally directed to a financial outcome that you won’t like. The big issue is once they’re in the home, they’re also in your devices. If they can get into your phone, that’s even worse, because then it’s all the devices connected to and controlled by your phone.”
The idea goes well beyond just creating a digital twin or digital model of a person. Hackers can create a digital model of all your surroundings. The personal wearable IoT is especially problematic. “When I was at TI, hackers were leveraging watch data — personal fitness data — through the phone, and they were tracking soldiers as they were marching around perimeters of buildings, or locating people before the data could be anonymized,” Hinkel said. “The problem is that once you have a GPS and GPS-attached data, and you can connect that to a person, then they are 100% guaranteed to lose any semblance of privacy or personal assurance that they can’t be tracked.”
Using those digital footprints and digital data streams, it’s relatively easy to leverage AI to build a detailed model of a person. The key to addressing this is not to control the individual data streams, but to block the AI.
“It’s how you lay the foundation to ensure that it’s relevant and not an attack vector, because AI itself is actually the attack vector, not the individual data streams,” said Mohit Arora, senior director, architecture at Synaptics. “If you can influence data fusion at that level, you effectively influence the output, and you will no longer need the raw data.”
AI has fundamentally changed data fusion. “Data fusion before AI was straightforward,” Arora said. “All you cared about was confidentiality, about what you were doing and how you were doing it. Data fusion with AI is creating the intelligence. That’s the biggest part. Essentially, it’s binding all these individual data streams together to create local intelligence. Now, you have to protect this intelligence in a very different way as it enters the overall AI data pipeline. This wasn’t the conventional problem in typical data fusion because you were already aware of the individual data sets, and federated AI is just a mathematical operation. But here, since this could affect how your model behaves, you no longer care whether someone will steal your data. They can, but if they can influence the output, they don’t need to steal the data. This changes the whole landscape, because now you’re dealing with all the runtime stuff, and as with the AI pipeline, you’re doing all the inference back and forth, going to the memory channel and reading it, and compute, and so on. It’s quite a bit of runtime data manipulation. And that actually has a much larger attack surface than the conventional compute side, where you just go to secure boot and call it a day.”
Data fusion in automotive
This becomes particularly worrisome in safety-critical applications such as automotive. With zonal architectures consolidating multiple sensor and software workloads onto shared compute, AI data fusion is rapidly becoming standard.
“It is a key component of next-generation vehicle autonomy, but it also increases the car’s attack surface,” said Rob Fisher, senior director of product management at Imagination Technologies. “To manage this, one of the key architectural trends that is emerging as workloads centralize is the isolation of safety-critical and non-safety workloads to prevent cross-domain access and deliver freedom from interference between workloads. This is a de-risking strategy, a means to make sure that any fault or vulnerability in one area is contained, and that the vehicle can continue to operate safely.”
Imagination uses hardware virtualization technology for such situations. “It assigns each VM a dedicated hardware interface and OSID-tagged memory transactions, enforced via the system MMU to prevent cross-domain access,” Fisher noted. “This is backed by built-in QoS, prioritization and deadline-based pre-emption, including protection against denial-of-service workloads, to help ensure safety-critical applications maintain deterministic performance and mitigate the impact of faulty or malicious workloads.”
The big issue with AI data fusion is that it amplifies cybersecurity risk by building an inference model from multiple, semi-independent data sources (e.g., environmental sensors) that were not necessarily designed to work together or to trust each other. “It is difficult to know a priori how the resulting model is going to respond if one of those data sources is compromised (e.g., starts injecting malicious data into a system far different from the nominal data ingested during model training),” explained Scott Best, senior technical director, Security IP at Rambus. “In a sense, you’re attempting to protect against previously unknown, black-box interactions between many disparate systems.”
The first time Sylvain Guilley, fellow and CTO at Secure-IC, a Cadence company, encountered AI data fusion was in the automotive industry, working for a chip company in Israel that was doing image analysis for multiple cameras. “My activity at the time was on the security of those kinds of chips and understanding what they were doing. It was an automotive project, and they were asking for a hardware security module (HSM). In automotive, it’s all about safety, so you want to ensure the chips cannot be initialized or provisioned by anyone. This is the first line of defense against attackers, so you want to own your chip. That’s the HSM, which means you cannot use the chip or set the firmware and configuration unless you are the legal owner. We delivered the HSM, and then they said, ‘We have a problem because now the traffic will be one gigabit per second.’ We were a little bit surprised because the HSM was configured to perform secure boots, take finite-size images, verify the signature, and now they were telling us, ‘Now we will have to manage some streaming data.’ Then they said they needed another mailbox because they were multiplying the number of cameras. It was incoming IPsec (Internet Protocol Security) traffic, and we were asked, because it was an ASIL B project, how to secure streaming data at speed. One of the initial concerns was losing a packet or a frame. Because I was managing the chip level, I couldn’t see the full application, but I could imagine that if you are a little bit desynchronized, and your car is lagging with where it is compared to what it thinks it is, that is scary. So, we worked to avoid any latency and any desynchronization, such as dropping packets, etc.”
What AI data fusion means to a chip architect
AI data fusion requires enforcing trust in real time and at scale, and that needs to be architected into the design. “The risk is not just data exposure, but operational integrity,” said Rambus’ Best. “If an attacker can compromise one input stream, even with subtle bias, they potentially can influence the entire model output. You need a chain of trust that starts in silicon and extends outward. Root of trust in hardware, system-level attestation, and authenticated components in the supply chain must all align. If any stage cannot prove its integrity, the whole system is suspect. The chip must enforce trust and never assume it.”
That starts with a hardware root of trust, isolated execution, and authenticated data movement between blocks. “If every data source is cryptographically verified (including hardware and software attestation), and authenticated before being allowed to source an input into an inference operation, you shrink the attack surface an attacker can exploit,” Best said. “Beyond the strong hardware root of trust, protected key storage, and protections for secure supply-chain concepts, an indelible, tamper-resistant cryptographic hardware is also needed that can extend its trusted operations to transform untrusted software into something worthy of consideration by a high-level inference engine. Without a structured, specific path to trusted hardware, everything executing in software is built on an insecure foundation.”
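As a rough illustration of authenticating every source before it feeds an inference operation, the Python sketch below tags each sensor reading with an HMAC and fuses only the readings whose tags verify. It is a minimal sketch under stated assumptions, not a description of any vendor's implementation: the source names, the demo keys, the record layout, and the averaging step that stands in for a fusion model are all hypothetical, and in a real design the keys would sit in protected hardware key storage rather than in software.

```python
import hashlib
import hmac
import struct

# Hypothetical per-source keys for illustration only; a real system would
# keep these in protected key storage behind a hardware root of trust.
SOURCE_KEYS = {
    "camera_front": b"demo-key-camera-front",
    "radar_left": b"demo-key-radar-left",
}


def encode(source_id: str, timestamp_ms: int, value: float) -> bytes:
    """Serialize a reading into the exact bytes covered by the tag."""
    return source_id.encode() + b"|" + struct.pack(">Qd", timestamp_ms, value)


def sign_reading(source_id: str, timestamp_ms: int, value: float) -> bytes:
    """Produce the tag a trusted source would attach to its reading."""
    message = encode(source_id, timestamp_ms, value)
    return hmac.new(SOURCE_KEYS[source_id], message, hashlib.sha256).digest()


def verify_reading(source_id: str, timestamp_ms: int, value: float, tag: bytes) -> bool:
    """Accept a reading only if the source is known and the tag matches."""
    key = SOURCE_KEYS.get(source_id)
    if key is None:
        return False
    expected = hmac.new(key, encode(source_id, timestamp_ms, value), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    return hmac.compare_digest(expected, tag)


def fuse(readings):
    """Average only authenticated readings (a stand-in for a real fusion model)."""
    accepted = [
        value
        for source_id, timestamp_ms, value, tag in readings
        if verify_reading(source_id, timestamp_ms, value, tag)
    ]
    if not accepted:
        raise ValueError("no authenticated inputs to fuse")
    return sum(accepted) / len(accepted)
```

In this sketch a forged reading is dropped before it can bias the result. The tag covers the source name and timestamp as well as the value, so a reading cannot be relabeled as coming from another source without failing verification.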
Databases and keys need to be encrypted, and any technologies that require facial or biometric activation need to be checked for potential compromise. “We’re going to see multiple stages, like multi-factor authentication for humans,” said Hinkel. “You’re going to start seeing it more for devices, and you’re going to have perhaps multiple ways to establish a root of trust. At the end of the day, there are technologies available for secure storage, including PUFs (physically unclonable functions), key wrapping, and storing keys in an encrypted, scrambled state. That adds another layer of protection that wasn’t available until we started enabling it. On top of that is the cryptography, making sure that your device is post-quantum ready, because quantum days are likely here much sooner than we think. The device industry is still at various levels of coming to grips with it, but they acknowledge that. Still, there are major problems with adding it late.”
Arm’s Maidment said that chip architects should consider what role hardware should play in establishing trust. That includes how to process sensitive, multi-source data without exposing it to unauthorized software, infrastructure operators, or other workloads sharing the same system. “Hardware-backed security provides an important foundation to this problem space. Roots of trust, memory protection, secure boot, attestation, and confidential computing can all help verify the platform, isolate sensitive workloads, and protect/attest data while it is being processed.”
Arm’s Confidential Compute Architecture (CCA), for example, enables workloads to run within a hardware-isolated area, restricting visibility of such workloads to privileged software and other tenants. Attestation, meanwhile, allows data owners to verify that their data is being processed by the intended software and to ensure the integrity of the data sets themselves.
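The core idea behind that kind of check can be sketched in a few lines of Python. The example folds the hash of each software component into a running measurement, then compares the reported value against the one a data owner expects. This is a simplified sketch, not Arm CCA's actual protocol: the component names and the function names `extend`, `measure`, and `matches_expected` are hypothetical, and a real attestation report would also be signed by a key held in hardware, which is omitted here.

```python
import hashlib
import hmac


def extend(register: bytes, component: bytes) -> bytes:
    """Fold one component's hash into the running measurement."""
    return hashlib.sha256(register + hashlib.sha256(component).digest()).digest()


def measure(components) -> bytes:
    """Measure an ordered list of components, starting from an all-zero register."""
    register = bytes(32)
    for component in components:
        register = extend(register, component)
    return register


def matches_expected(reported: bytes, expected_components) -> bool:
    """Check a reported measurement against the stack the data owner expects."""
    return hmac.compare_digest(reported, measure(expected_components))
```

Because each step hashes the previous register value, changing, removing, or reordering any component produces a different final measurement, so a mismatch at any stage makes the whole stack suspect.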
No single security measure is sufficient by itself. AI data fusion requires a combination of approaches. “Hardware isolation helps protect data while it is being processed, but organizations must also ensure the integrity and provenance of the data being combined, apply appropriate access controls, secure software supply chains, and protect communications between systems and devices,” Maidment said.
Regardless of data type or weakness, the same general rules apply, including:
“It’s not a question of chips,” Guilley said. “You’re talking about AI, and if you compare layer 1 to 3 to layer 7, when you understand that the data and everything will need to go to the processing unit, at the end of the day we go through some transistors. We must ensure that we have enough bandwidth, compute speed, and capability to perform all those checks, including signature verification, as well as the checks necessary to ensure the timestamps are well aligned. You must also make sure you have proper silos. You might have different streams that you don’t want to combine. For some of them, you want to perform correlation, but for other streams it would be a safety issue to mix or swap them. You know you will get data at a certain rate, and the worst case is 1 gigabit per second, with small packets and key rolling for every packet. You must maintain the ability to perform decapsulation and integrity verification without stalling or dropping packets. This means it’s a system-level, data-flow-oriented system. If the data is not flowing through this node, the beauty of the internet is that it will find a way. But at the end of the day, the quality of service might not be as good as you expect, whereas in automotive, when you design a chip, you must design it for the worst case, and security is on the critical path. Cybersecurity is all about risk analysis, and if you underestimate the risks, your analysis can be biased.”
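To put a number on that worst case, the Python sketch below estimates how many small packets per second a 1 gigabit per second link can deliver and how much time that leaves for decapsulation and integrity verification on each one. The packet size is an assumption, since none is given above: the sketch uses the 64-byte minimum Ethernet frame plus 20 bytes of preamble and inter-frame gap as an upper bound on packet rate, and the per-packet cost passed to `keeps_up` is a hypothetical figure.

```python
LINK_BPS = 1_000_000_000     # 1 gigabit per second, the worst case quoted above
MIN_FRAME_BYTES = 64         # assumption: minimum Ethernet frame size
WIRE_OVERHEAD_BYTES = 20     # assumption: 8-byte preamble/SFD + 12-byte inter-frame gap


def worst_case_pps(link_bps: float, frame_bytes: int, overhead_bytes: int) -> float:
    """Maximum packets per second when every packet is as small as possible."""
    return link_bps / ((frame_bytes + overhead_bytes) * 8)


def per_packet_budget_ns(pps: float) -> float:
    """Time available per packet, in nanoseconds, without falling behind the link."""
    return 1e9 / pps


def keeps_up(cost_ns: float, budget_ns: float) -> bool:
    """True if one packet's security processing fits inside its time budget."""
    return cost_ns <= budget_ns


pps = worst_case_pps(LINK_BPS, MIN_FRAME_BYTES, WIRE_OVERHEAD_BYTES)
budget = per_packet_budget_ns(pps)
print(f"{pps:,.0f} packets/s, {budget:.0f} ns per packet")
```

Under these assumptions the link can carry roughly 1.49 million packets per second, leaving about 672 ns for each one. With key rolling on every packet, any key setup and the integrity check both have to fit in that window or be pipelined in hardware, which is why security sits on the critical path.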
The challenge here is understanding the combinatorial risk. “The data classification problem is a big risk, and integrity attacks have become much more dangerous than simple data theft, because in a typical architecture you can think you can keep your data confidential,” Synaptics’ Arora noted. “I think, ‘I’m going to encrypt it,’ but if you can influence or glitch your fused data, then it’s not good. And how are you going to check the integrity of the fused data, since that’s all dealing with models and activation? You’ve got to make sure the data that’s being fused is integrity-protected. So, integrity becomes a bigger problem to deal with, and I don’t think the industry is paying much attention to it because it’s a harder problem on the memory side. The model becomes the target, because if the attacker can manipulate the fused input, they can influence the inference, in particular. Then the model is useless and more dangerous.”
Conclusion
Looking ahead, as agentic AI workloads increasingly combine data from multiple sources, building confidence in both the processing environment and the data itself will become essential, Arm’s Maidment noted. “Security features built into the chip provide the critical foundation for that trust, helping organizations process sensitive data while reducing exposure to infrastructure-level threats.”
But AI data fusion also raises broader questions about where AI and cybersecurity intersect. “How do we make AI and cybersecurity intersect, rather than superimposing them so they don’t touch? Where and what are the touch points? Even taking one step back, what is AI? AI is about dealing with data,” Guilley said. “You take data and turn it into a model, generate something, and infer. You might also have some generated data that reinforces, but it’s about what you work on, which is data, obviously, and multimodal. Then, what is cybersecurity? Cybersecurity is the science of protecting data. Cybersecurity is not protecting your car. It is not protecting you as a driver or anyone in the car. Cybersecurity is about protecting data. So, the intersection of cybersecurity and AI is data. And what we need to have in place is very simple — check data in and authenticate data out.”