Enhancing Compute Security Architecture For New-Age Applications

AI accelerators should have direct access to encrypted data and be able to decrypt it locally when needed for computations.


New-age AI-powered applications are becoming increasingly essential in our daily lives. Sustaining this growth requires that these applications and services meet three primary challenges:

  • Achieving high performance for complex compute tasks.
  • Ensuring cost-effectiveness and seamless integration with existing infrastructure.
  • Maintaining robust security and privacy measures.

Historically, compute hardware scaling and application performance were driven by Dennard scaling and Moore’s law. However, the rapid increase in algorithm complexity and the diminishing returns of technology scaling have led to a focus on heterogeneous computing. This approach combines domain-specific compute elements to achieve the required performance, moving away from traditional on-chip parallel processing.

These modern architectures must consider memory organization, interconnection, and security alongside raw computing efficiency. Optimizations such as GPUDirect RDMA, Unified Memory, and Computational Storage, complemented by high-performance interconnects like PCI Express (PCIe) and Ethernet, are crucial for building hardware that handles AI workloads. Minimizing the involvement of the host CPU as a bottleneck is a key theme.

GPUDirect RDMA technology establishes a direct communication path between GPUs and other devices, bypassing the CPU and system memory. This significantly reduces latency and improves bandwidth, resulting in faster data transfers and better system performance. The Unified Memory architecture makes a single pool of memory available to different computing elements, minimizing data swaps between dedicated memory units. Finally, Computational Storage moves processing to where the data resides, reducing the need for data to travel back and forth across a network. This saves time and energy by eliminating repeated transfers of large volumes of data.
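The payoff of Computational Storage can be illustrated with a toy Python sketch. The records, the predicate, and the `storage_filter` helper below are hypothetical stand-ins for a real device-side filter; the sketch only shows that pushing compute to the data shrinks the volume crossing the interconnect:

```python
import json

# Hypothetical dataset living on a storage device.
records = [{"id": i, "label": i % 10} for i in range(10_000)]

def bytes_of(rs):
    """Approximate transfer size as serialized JSON length."""
    return len(json.dumps(rs).encode())

# Host-side filtering: every record crosses the interconnect first.
moved_host = bytes_of(records)
wanted_host = [r for r in records if r["label"] == 0]

# Computational storage: the predicate runs next to the data,
# so only matching records are transferred.
def storage_filter(predicate):
    return [r for r in records if predicate(r)]

wanted_storage = storage_filter(lambda r: r["label"] == 0)
moved_storage = bytes_of(wanted_storage)

assert wanted_host == wanted_storage
print(f"host-side filter moved {moved_host} bytes; "
      f"computational storage moved {moved_storage} bytes")
```

Since only one record in ten matches the predicate here, roughly ninety percent of the transfer disappears when the filter runs where the data lives.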

AI compute refers to the computational resources required for artificial intelligence systems to perform tasks such as processing data, training machine learning models, and making predictions. Given the rapidly rising performance demands, optimizations such as those discussed above are used to provide greater computing capabilities. Concurrently, modern applications increasingly work with private data in regulated environments and rely on shared infrastructure such as private and public clouds. Traditional security measures must therefore be complemented with protection of data-in-use. Confidential computing addresses this by protecting data-in-use in addition to data-at-rest and data-in-motion.
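The three data states can be sketched with a toy authenticated XOR-keystream cipher. This is illustrative only, not real cryptography; production systems would use AES-GCM, TLS, and hardware memory encryption. The gap that confidential computing closes is the last step, where plaintext appears in ordinary host memory:

```python
import hashlib, hmac, os

def _keystream(key, nonce, n):
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def seal(key, plaintext):
    """Toy authenticated encryption: XOR keystream plus HMAC tag."""
    nonce = os.urandom(16)
    ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    return nonce + ct + hmac.new(key, nonce + ct, hashlib.sha256).digest()

def unseal(key, blob):
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("authentication failed")
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct))))

key = os.urandom(32)
record = b"patient-42,diagnosis=hypertension"  # hypothetical regulated data

at_rest = seal(key, record)     # data-at-rest: encrypted on disk
in_motion = seal(key, record)   # data-in-motion: encrypted on the wire
in_use = unseal(key, at_rest)   # data-in-use: plaintext in host RAM during compute
# Confidential computing confines this decryption to a TEE or
# hardware-encrypted memory region instead of general host memory.
assert in_use == record
```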

Further, the rapid evolution of AI is producing autonomous systems that can operate with minimal human oversight. This requires organizations to develop resilience by creating a culture of mutual trust and shared risk. Emerging technologies supporting human-centric security and privacy include AI Trust, Risk, Security Management (AI TRiSM), cybersecurity mesh architecture, digital immune system, disinformation security, federated machine learning, and homomorphic encryption.

Specifically, AI-centric security must:

  • Protect training data and data labels from unauthorized access.
  • Prevent data poisoning by ensuring training data and models are not tampered with.
  • Comply with privacy and copyright laws.
  • Ensure secure data sharing on data exchanges.

Addressing these security concerns necessitates enhancing the confidential computing architecture for AI. This involves extending the Trusted Execution Environment (TEE) from CPU to accelerators, allowing AI accelerators to have direct access to encrypted data and decrypt it locally when needed for computations.

When analyzing current computation approaches for secure AI, several limitations become apparent:

  • CPU-centric Bottleneck: Encrypted stored data must first travel to the host CPU, where it is decrypted and then passed to the accelerator. This consumes excessive CPU resources and network bandwidth, and increases latency and power consumption.
  • Privacy Among VMs on Shared Infrastructure: Accelerator resources allocated to individual VMs store data in unencrypted form. This means that other VMs sharing the accelerators during or after computations can access this data.

To address these limitations, future secure AI solutions should consider the following:

  • Direct Access for AI Accelerators: AI accelerators should have direct access to encrypted data and be able to decrypt it locally when needed for computations. They should also be able to encrypt data and results and store them directly into the system's Unified Memory.
  • Encrypted Isolation Between VMs: Data should remain in encrypted form in the local memory of AI accelerators and GPUs to ensure encrypted isolation between VMs.

The key takeaway is that data should remain encrypted and should be decrypted only as close as possible to the compute location. Likewise, it should be encrypted as close as possible to the origin of transmission. Applying these principles requires data-in-use protection, i.e., memory encryption.
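The decrypt-close-to-compute principle can be sketched by placing the key inside a hypothetical accelerator-side TEE, so the untrusted host path only ever handles ciphertext. `AcceleratorTEE`, the session-key hand-off, and the toy cipher are all illustrative assumptions; a real design would provision the key via attestation and decrypt inside hardware-encrypted device memory:

```python
import hashlib, hmac, os

def _keystream(key, nonce, n):
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + ctr.to_bytes(8, "big")).digest()
        ctr += 1
    return out[:n]

def seal(key, plaintext):
    nonce = os.urandom(16)
    ct = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    return nonce + ct + hmac.new(key, nonce + ct, hashlib.sha256).digest()

def unseal(key, blob):
    nonce, ct, tag = blob[:16], blob[16:-32], blob[-32:]
    if not hmac.compare_digest(tag, hmac.new(key, nonce + ct, hashlib.sha256).digest()):
        raise ValueError("authentication failed")
    return bytes(a ^ b for a, b in zip(ct, _keystream(key, nonce, len(ct))))

class AcceleratorTEE:
    """Hypothetical trusted region on the accelerator."""
    def __init__(self, session_key):
        self._key = session_key  # in reality provisioned via attestation

    def run(self, ciphertext):
        data = unseal(self._key, ciphertext)  # decrypted only here, at the compute
        total = sum(data)                     # toy "kernel": sum the input bytes
        return seal(self._key, bytes([total % 256]))  # re-encrypt before leaving

session_key = os.urandom(32)
tee = AcceleratorTEE(session_key)

payload = seal(session_key, bytes([1, 2, 3, 4]))  # encrypted near the origin
encrypted_result = tee.run(payload)               # host path sees only ciphertext
result = unseal(session_key, encrypted_result)
assert result == bytes([10])
```

Plaintext exists only inside `run`; everything the host CPU, interconnect, and Unified Memory observe is sealed, which is exactly the data-in-use boundary the article argues for.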
