Safeguarding data during computation using hardware-protected enclaves that isolate code and data from untrusted software.
Artificial Intelligence (AI), data analytics, and high-performance computing (HPC) are transforming industries such as healthcare, finance, and manufacturing. These workloads rely on distributed systems managing massive datasets with high reliability. As computational demand grows, so does the need for end-to-end data protection.
Traditional security addresses Data at Rest (DAR) and Data in Motion (DIM) through encryption and secure protocols. Yet Data in Use (DIU), data actively processed in memory, remains the weakest link.

Fig. 1: The three stages of data protection: DAR, DIM, and DIU.
Forecasts by Forbes and Gartner project sustained double-digit growth in AI and HPC investment through 2027 [1], emphasizing AI security and trusted execution as key enablers. As workloads expand across hybrid and multi-tenant environments, protection must extend beyond storage and network layers.
Confidential Computing addresses this by safeguarding data during computation. It employs Trusted Execution Environments (TEEs), hardware-protected enclaves that isolate code and data from untrusted software, including operating systems and hypervisors. Together with memory encryption, they ensure sensitive data remains secure throughout its lifecycle.
However, flawed TEE implementations can introduce vulnerabilities. Attacks such as TEE.fail [2] and Battering RAM [3] show that side-channel or bus-level exploits can extract secrets during execution. This highlights the need for TEEs to minimize off-die data exposure and define verifiable hardware boundaries.
This article examines how Confidential Computing principles are applied in heterogeneous architectures to secure AI workloads, covering architectural trends, DIU protection mechanisms, and implications for chip and system designers.
Modern compute systems are increasingly heterogeneous. While Central Processing Units (CPUs) remain the general-purpose backbone, AI and HPC workloads rely on specialized accelerators such as Graphics Processing Units (GPUs), Neural Processing Units (NPUs), Data Processing Units (DPUs), and Domain-Specific Accelerators (DSAs) to deliver massive parallelism and energy efficiency. This integration introduces complex data flows across CPU cores, high-bandwidth device memory, and peer-to-peer fabrics [4].

Fig. 2: Example of heterogeneous computing.
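Performance-driven designs minimize CPU involvement in data movement, using methodologies such as unified memory shared by CPUs and accelerators, GPUDirect transfers that move data directly between devices, and computational storage in which storage controllers process data in place.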
While these optimizations boost throughput, they expand the attack surface. Unified memory migrations must preserve confidentiality; GPUDirect transfers require authentication and encryption; and storage controllers performing compute become part of the trust perimeter. In short, the performance-optimized data path must also be the secure data path.
Conventional encryption protects DAR and DIM but leaves data exposed during computation, and AI workloads often keep training data and model parameters in plaintext in memory.
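Confidential Computing addresses this with TEEs, which provide hardware-enforced isolation of code and data, encryption of memory while it is in use, and remote attestation of the software running inside [8], [9].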
Early TEEs (Intel SGX, AMD SEV-SNP, Arm CCA [10], [11]) focused on CPUs, but as AI workloads moved to GPUs and NPUs, equivalent protections became essential.
AI workloads in regulated industries must meet compliance frameworks such as GDPR, HIPAA, and PCI-DSS, which require protecting data during processing, not just storage or transport [12].
Modern Confidential Computing therefore extends TEEs from CPUs to XPUs/accelerators. A CPU enclave establishes trust with an XPU/accelerator through remote attestation and encrypted command channels. Data is decrypted only inside XPU/accelerator memory, processed on-die, and immediately re-encrypted when leaving the XPU/accelerator.
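The trust-establishment step can be sketched as a nonce-based challenge and response. This is a minimal sketch, not any vendor's protocol. It assumes a hypothetical accelerator (`Device`) that reports a hash of its firmware plus a `report_data` field binding the report to a session key, and it uses HMAC-SHA256 with a verifier-known key in place of the certificate-backed signature a real device would produce. `verify_report`, `GOLDEN_MEASUREMENT`, and the firmware bytes are illustrative names.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass

# ASSUMPTION: HMAC-SHA256 with a verifier-known key stands in for the device's
# certificate-backed asymmetric signature used by real attestation schemes.
DEVICE_ATTESTATION_KEY = os.urandom(32)
FIRMWARE = b"xpu-firmware-v1"                       # hypothetical firmware image
GOLDEN_MEASUREMENT = hashlib.sha256(FIRMWARE).digest()


@dataclass(frozen=True)
class Report:
    measurement: bytes   # 32-byte hash of the loaded firmware
    nonce: bytes         # 32-byte verifier challenge (freshness)
    report_data: bytes   # 32-byte value binding the report to a session key
    mac: bytes           # authenticates the three fields above


def _body(measurement: bytes, nonce: bytes, report_data: bytes) -> bytes:
    # All fields are fixed-size, so plain concatenation is unambiguous.
    return measurement + nonce + report_data


class Device:
    """Hypothetical accelerator that measures its firmware and answers challenges."""

    def __init__(self, firmware: bytes) -> None:
        self.measurement = hashlib.sha256(firmware).digest()

    def attest(self, nonce: bytes, report_data: bytes) -> Report:
        body = _body(self.measurement, nonce, report_data)
        mac = hmac.new(DEVICE_ATTESTATION_KEY, body, hashlib.sha256).digest()
        return Report(self.measurement, nonce, report_data, mac)


def verify_report(report: Report, nonce: bytes, report_data: bytes) -> bool:
    """Accept only an authentic, fresh report from trusted firmware bound to this session."""
    body = _body(report.measurement, report.nonce, report.report_data)
    expected = hmac.new(DEVICE_ATTESTATION_KEY, body, hashlib.sha256).digest()
    return (
        hmac.compare_digest(report.mac, expected)                        # authentic
        and hmac.compare_digest(report.nonce, nonce)                     # fresh
        and hmac.compare_digest(report.measurement, GOLDEN_MEASUREMENT)  # trusted firmware
        and hmac.compare_digest(report.report_data, report_data)         # this session
    )


if __name__ == "__main__":
    challenge = os.urandom(32)
    binding = hashlib.sha256(b"enclave-session-public-key").digest()
    print(verify_report(Device(FIRMWARE).attest(challenge, binding), challenge, binding))    # True
    print(verify_report(Device(b"tampered").attest(challenge, binding), challenge, binding))  # False
```

Checking the nonce blocks replay of an old report, and binding `report_data` to the session key ties the later encrypted command channel to the device that was actually measured.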
Heterogeneous Isolated Execution (HIX) [13] and Confidential GPU Computing for Arm CCA [14] extended enclave principles to XPUs/accelerators via modified interconnects and drivers [13]. Commercial designs now incorporate these ideas. NVIDIA’s Hopper and Blackwell (GB10x, GB20x) GPU architectures, for instance, support secure boot, attestation, device certificates, and AES-GCM–protected CPU–GPU links [15].
The threat model mirrors that of CPU TEEs: adversaries may control drivers, OSs, or hypervisors and observe physical buses like PCIe. Attackers target DMA buffers and residual data. Although probing HBM directly is impractical, interconnects and DMA paths must be assumed observable and protected [15].
Securing AI workloads involves more than encryption. It requires balancing throughput, energy efficiency, and compliance while mitigating real-world attack vectors. The key considerations include:
Security must be a fundamental architectural property, not an add-on. The following principles guide data-in-use protection for heterogeneous AI systems:
Decrypt where you compute: Decrypt only within the XPU/accelerator’s trusted boundary and re-encrypt immediately afterward. This ensures the secure path equals the performance path.
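The decrypt-where-you-compute pattern can be sketched as an accelerator whose only plaintext lives inside its own `run` method. This assumes a toy encrypt-then-MAC construction (HMAC-SHA256 keystream plus an HMAC tag) standing in for AES-GCM, so that only the standard library is needed; `Accelerator`, `seal`, `unseal`, and the doubling "kernel" are hypothetical.

```python
import hashlib
import hmac
import os


# ASSUMPTION: toy encrypt-then-MAC scheme standing in for AES-GCM; not for production.
def _subkeys(key: bytes) -> tuple[bytes, bytes]:
    # Separate keys for encryption and authentication.
    return (hmac.new(key, b"enc", hashlib.sha256).digest(),
            hmac.new(key, b"mac", hashlib.sha256).digest())


def _xor_stream(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # HMAC-SHA256 in counter mode produces the keystream.
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def seal(key: bytes, plaintext: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    nonce = os.urandom(12)
    ct = _xor_stream(enc_key, nonce, plaintext)
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def unseal(key: bytes, blob: bytes) -> bytes:
    enc_key, mac_key = _subkeys(key)
    nonce, ct, tag = blob[:12], blob[12:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("integrity check failed")
    return _xor_stream(enc_key, nonce, ct)


class Accelerator:
    """Hypothetical XPU: plaintext exists only inside run(), never on the bus."""

    def __init__(self, session_key: bytes) -> None:
        self._key = session_key  # models a key held in on-die storage

    def run(self, sealed_input: bytes) -> bytes:
        data = unseal(self._key, sealed_input)        # decrypt inside the trusted boundary
        result = bytes((b * 2) & 0xFF for b in data)  # placeholder compute kernel
        return seal(self._key, result)                # re-encrypt before it leaves


if __name__ == "__main__":
    key = os.urandom(32)
    xpu = Accelerator(key)
    print(unseal(key, xpu.run(seal(key, bytes([1, 2, 3])))))  # b'\x02\x04\x06'
```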
Encrypt device memory per execution context: Treat XPU/accelerator memory as confidential. Per-tenant or per-context keys render residual data unreadable once the session ends and its key is destroyed, avoiding the performance penalty of scrubbing memory globally.
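Per-context encryption and key destruction can be sketched with a toy memory model. It assumes one random key per context held inside the engine, a per-write counter so the keystream is never reused, and an HMAC-SHA256 keystream standing in for a hardware memory cipher such as AES-XTS. There are no integrity tags, so reading with the wrong context's key returns garbage rather than an error. All class and method names are hypothetical.

```python
import hashlib
import hmac
import os


# ASSUMPTION: toy keystream over (address, write version, block index) standing in
# for an inline hardware memory cipher.
def _keystream(key: bytes, addr: int, version: int, n: int) -> bytes:
    out, block = bytearray(), 0
    while len(out) < n:
        msg = addr.to_bytes(8, "big") + version.to_bytes(8, "big") + block.to_bytes(4, "big")
        out += hmac.new(key, msg, hashlib.sha256).digest()
        block += 1
    return bytes(out[:n])


class EncryptedDeviceMemory:
    """Hypothetical XPU memory: one key per execution context, no integrity tags."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}             # held in the engine, never in DRAM
        self.dram: dict[int, tuple[int, bytes]] = {}  # addr -> (version, ciphertext); observable
        self._version = 0                             # unique per write: no keystream reuse

    def open_context(self, ctx: str) -> None:
        self._keys[ctx] = os.urandom(32)

    def close_context(self, ctx: str) -> None:
        # Crypto-erase: dropping the key makes residual ciphertext unreadable,
        # so the DRAM contents need no scrubbing.
        del self._keys[ctx]

    def write(self, ctx: str, addr: int, data: bytes) -> None:
        self._version += 1
        ks = _keystream(self._keys[ctx], addr, self._version, len(data))
        self.dram[addr] = (self._version, bytes(a ^ b for a, b in zip(data, ks)))

    def read(self, ctx: str, addr: int) -> bytes:
        version, ct = self.dram[addr]
        ks = _keystream(self._keys[ctx], addr, version, len(ct))
        return bytes(a ^ b for a, b in zip(ct, ks))


if __name__ == "__main__":
    mem = EncryptedDeviceMemory()
    mem.open_context("tenant-a")
    mem.open_context("tenant-b")
    mem.write("tenant-a", 0x1000, b"model weights")
    print(mem.read("tenant-a", 0x1000))  # b'model weights'
    print(mem.read("tenant-b", 0x1000))  # unrelated bytes: wrong context key
    mem.close_context("tenant-a")        # ciphertext remains in dram, key is gone
```

Closing a context deletes only a 32-byte key, while the ciphertext left in `dram` becomes meaningless, which is why per-context keys can replace bulk scrubbing.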
Implement context-aware key management: Associate encryption keys with specific execution contexts rather than static memory regions, maintaining isolation aligned with VM or process identity.
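Binding keys to execution contexts rather than address ranges can be sketched with HKDF (RFC 5869), placing the context identity in the `info` parameter. The device root key, the `vm_id`/`process_id` identity tuple, the monotonic epoch, and the `KeyManager` class are assumptions for illustration.

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869): extract a pseudorandom key, then expand it with context info."""
    prk = hmac.new(salt or bytes(32), ikm, hashlib.sha256).digest()  # empty salt -> zeros
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


class KeyManager:
    """Hypothetical context-aware key manager: keys follow VM/process identity,
    not physical address ranges."""

    def __init__(self, device_root_key: bytes) -> None:
        self._root = device_root_key  # models a fused, on-die root key
        self._epoch = 0               # monotonic, so a retired key is never re-derived
        self._active: dict[tuple[str, int], bytes] = {}

    def activate(self, vm_id: str, process_id: int) -> None:
        self._epoch += 1
        info = f"xpu-mem|vm={vm_id}|pid={process_id}|epoch={self._epoch}".encode()
        self._active[(vm_id, process_id)] = hkdf_sha256(self._root, b"", info)

    def key_for(self, vm_id: str, process_id: int) -> bytes:
        return self._active[(vm_id, process_id)]  # KeyError if the context is not active

    def retire(self, vm_id: str, process_id: int) -> None:
        del self._active[(vm_id, process_id)]


if __name__ == "__main__":
    km = KeyManager(bytes(range(32)))
    km.activate("vm-7", 1201)
    km.activate("vm-9", 1201)
    print(km.key_for("vm-7", 1201) != km.key_for("vm-9", 1201))  # True
```

The epoch ensures that a retired context cannot re-derive its old key; a restarted VM with the same identity receives a fresh one.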
Modularize the security engine: Inline encryption units should be modular and decoupled from memory controllers, enabling independent scaling of cryptographic throughput and algorithms as threat models evolve.
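A modular engine can be sketched as an interface that the memory controller depends on, with concrete engines plugged in behind it. The `CipherEngine` protocol, both toy engines (XOR keystreams built from HMAC-SHA256 and SHAKE256, standing in for hardware AES units), and `MemoryController` are hypothetical. Recording which engine wrote each line lets old and new algorithms coexist during a migration.

```python
import hashlib
import hmac
import os
from typing import Protocol


class CipherEngine(Protocol):
    """What the memory controller needs from any inline encryption engine."""
    name: str
    def encrypt(self, key: bytes, tweak: int, data: bytes) -> bytes: ...
    def decrypt(self, key: bytes, tweak: int, data: bytes) -> bytes: ...


# ASSUMPTION: toy XOR-keystream engines standing in for hardware AES units;
# a tweak must never repeat under the same key.
class HmacCtrEngine:
    name = "hmac-sha256-ctr"

    def encrypt(self, key: bytes, tweak: int, data: bytes) -> bytes:
        stream, block = bytearray(), 0
        while len(stream) < len(data):
            msg = tweak.to_bytes(16, "big") + block.to_bytes(4, "big")
            stream += hmac.new(key, msg, hashlib.sha256).digest()
            block += 1
        return bytes(a ^ b for a, b in zip(data, stream))

    decrypt = encrypt  # XOR keystream: the same operation in both directions


class Shake256Engine:
    name = "shake256-ctr"

    def encrypt(self, key: bytes, tweak: int, data: bytes) -> bytes:
        stream = hashlib.shake_256(key + tweak.to_bytes(16, "big")).digest(len(data))
        return bytes(a ^ b for a, b in zip(data, stream))

    decrypt = encrypt


class MemoryController:
    """Moves and tags data; all cryptography is delegated to pluggable engines."""

    def __init__(self, key: bytes, engines: list[CipherEngine], active: str) -> None:
        self._key = key
        self._engines = {e.name: e for e in engines}
        self.active = active  # engine used for new writes
        self._tweak = 0
        self.dram: dict[int, tuple[str, int, bytes]] = {}  # addr -> (engine, tweak, ciphertext)

    def write(self, addr: int, data: bytes) -> None:
        self._tweak += 1  # unique per write
        engine = self._engines[self.active]
        self.dram[addr] = (engine.name, self._tweak, engine.encrypt(self._key, self._tweak, data))

    def read(self, addr: int) -> bytes:
        name, tweak, ct = self.dram[addr]  # each line records which engine wrote it
        return self._engines[name].decrypt(self._key, tweak, ct)


if __name__ == "__main__":
    mc = MemoryController(os.urandom(32), [HmacCtrEngine(), Shake256Engine()], "hmac-sha256-ctr")
    mc.write(0x00, b"activations")
    mc.active = "shake256-ctr"  # swap algorithms without touching controller logic
    mc.write(0x40, b"gradients")
    print(mc.read(0x00), mc.read(0x40))  # b'activations' b'gradients'
```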
Secure interconnects and DMA paths: High-speed fabrics like PCIe and Compute Express Link (CXL) must fall within the TEE boundary. Use link-level integrity and encryption—e.g., PCIe Integrity and Data Encryption (IDE)—and bind DMA mappings to attested sessions.
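Link-level protection and DMA binding can be sketched together. This is loosely modeled on ideas found in link-encryption schemes such as PCIe IDE (per-stream keys, implicit counters, integrity tags) but is not the IDE wire format; the toy HMAC-based cipher, `LinkStream`, and `DmaTable` are assumptions. The receiver rejects any packet whose tag fails or whose sequence number is not the next one expected, and DMA mappings can only be created for sessions that passed attestation.

```python
import hashlib
import hmac
import os


# ASSUMPTION: toy keystream standing in for the link's real AEAD cipher.
def _xor_stream(key: bytes, seq: int, data: bytes) -> bytes:
    stream, block = bytearray(), 0
    while len(stream) < len(data):
        msg = seq.to_bytes(8, "big") + block.to_bytes(4, "big")
        stream += hmac.new(key, msg, hashlib.sha256).digest()
        block += 1
    return bytes(a ^ b for a, b in zip(data, stream))


class LinkStream:
    """One direction of an encrypted link; the sequence number doubles as the nonce."""

    def __init__(self, stream_key: bytes) -> None:
        self._enc = hmac.new(stream_key, b"enc", hashlib.sha256).digest()
        self._mac = hmac.new(stream_key, b"mac", hashlib.sha256).digest()
        self._tx_seq = 0
        self._rx_seq = 0

    def _tag(self, seq: int, ct: bytes) -> bytes:
        return hmac.new(self._mac, seq.to_bytes(8, "big") + ct, hashlib.sha256).digest()

    def send(self, payload: bytes) -> tuple[int, bytes, bytes]:
        self._tx_seq += 1
        ct = _xor_stream(self._enc, self._tx_seq, payload)
        return self._tx_seq, ct, self._tag(self._tx_seq, ct)

    def receive(self, seq: int, ct: bytes, tag: bytes) -> bytes:
        if not hmac.compare_digest(tag, self._tag(seq, ct)):
            raise ValueError("integrity failure on link")
        if seq != self._rx_seq + 1:
            raise ValueError("replayed or reordered packet")
        self._rx_seq = seq
        return _xor_stream(self._enc, seq, ct)


class DmaTable:
    """Hypothetical IOMMU-style table: DMA mappings exist only for attested sessions."""

    def __init__(self) -> None:
        self.attested: set[str] = set()
        self._owner: dict[int, str] = {}  # IOVA page -> owning session

    def map(self, session: str, iova_page: int) -> None:
        if session not in self.attested:
            raise PermissionError(f"session {session!r} has not passed attestation")
        self._owner[iova_page] = session

    def allowed(self, session: str, iova_page: int) -> bool:
        return self._owner.get(iova_page) == session


if __name__ == "__main__":
    key = os.urandom(32)
    cpu_side, gpu_side = LinkStream(key), LinkStream(key)
    pkt = cpu_side.send(b"kernel args")
    print(gpu_side.receive(*pkt))  # b'kernel args'
    try:
        gpu_side.receive(*pkt)     # replaying a captured packet
    except ValueError as err:
        print(err)                 # replayed or reordered packet
```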
Mitigate side channels systematically: Employ constant-time execution, randomized scheduling, and isolation of shared telemetry. Assume that shared resources are observable and mitigate leakage at design time.
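The constant-time principle can be illustrated by counting the work two byte comparisons perform. This is a teaching sketch: the step count stands in for execution time, and an interpreted Python loop gives no real timing guarantee, so production code would rely on `hmac.compare_digest` (which the Python documentation describes as designed to prevent timing analysis) or on constant-time hardware and compiler primitives. `leaky_equal` and `balanced_equal` are illustrative names.

```python
import hmac


def leaky_equal(a: bytes, b: bytes) -> tuple[bool, int]:
    """Early-exit comparison: the number of steps reveals the matching prefix length."""
    if len(a) != len(b):
        return False, 0
    steps = 0
    for x, y in zip(a, b):
        steps += 1
        if x != y:
            return False, steps
    return True, steps


def balanced_equal(a: bytes, b: bytes) -> tuple[bool, int]:
    """Accumulate differences over every byte; the step count depends only on length."""
    if len(a) != len(b):
        return False, 0
    diff, steps = 0, 0
    for x, y in zip(a, b):
        steps += 1
        diff |= x ^ y
    return diff == 0, steps


if __name__ == "__main__":
    secret = bytes.fromhex("a1b2c3d4e5f60718")
    for guess in (bytes.fromhex("00b2c3d4e5f60718"), bytes.fromhex("a1b2c3d4e5f60700")):
        print(leaky_equal(secret, guess)[1], balanced_equal(secret, guess)[1])
    # prints "1 8" then "8 8": only the leaky version's work depends on the data
    print(hmac.compare_digest(secret, secret))  # True
```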
Strengthen TEE implementation boundaries: Lessons from TEE.fail and Battering RAM underscore the need for strict physical isolation of sensitive state. Critical TEE data and keys should remain in on-die SRAM, inaccessible to external buses. Combined with inline encryption, this mirrors a hardware Root of Trust model.
The central principle is simple yet decisive: decrypt as close to the compute engine as possible. Applied consistently, this aligns high-performance AI design methodologies, multi-level memory hierarchies, and interconnects with the confidentiality demands of modern data ecosystems.
Per-context memory encryption prevents residual data leakage; attestation transforms trust into verifiable proof; link-level encryption unifies performance and protection; and modular cryptographic engines enable future evolution.
As threats evolve—through new side channels, fabrics, and chip packaging—the response must be proactive architectural implementation, not reactive patching. Compute elements must prove integrity; memory must enforce isolation by default; and interconnects must assume observation.
When implemented according to these principles, Confidential Computing turns trust from a software convention into a hardware guarantee, allowing AI systems to achieve both high performance and verifiable security—demonstrating that throughput and trust can advance together rather than compete.