Why security for these devices is different, and what to do about it.
Data centers handle enormous volumes of AI/ML training and inference workloads for their customers. Processing these workloads efficiently has driven many new solutions to market. One of these is the pluggable accelerator blade, often deployed in massively parallel arrays, implementing the latest state-of-the-art neural processing architectures. These blades handle valuable inference models, algorithms, and training data, and as such require a high level of protection.
Machine learning assets face many different threats. These include input attacks that attempt to maliciously influence an AI system's decisions, and theft of valuable assets such as inference models, algorithms, and training data. Attacks can target software, firmware, hardware, or all of these. They can be invasive or non-invasive. They can arrive over the network, come through an edge node, or directly target endpoints. With more and more AI-powered devices in our everyday lives, attacks on them can threaten privacy, property, and personal safety.
Accelerator blades contain several key components. At the heart of the system are powerful accelerator chips, ranging from a handful to a large array of dedicated AI/ML processing units, each with its own pool of attached memory. Their job is to process as many tasks, on as much data, with as little latency as possible. Often there is also a Gateway CPU, with its own dedicated flash and DDR memory, which manages models and assets, and programs and controls the accelerators. Finally, a high-speed network or PCI Express (PCIe) interface connects the blade to the fabric.
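To make that topology concrete, here is a minimal Python sketch of how one might model the components described above. All class names, field names, and capacities are illustrative assumptions, not part of any real blade's software stack.

```python
from dataclasses import dataclass

@dataclass
class Accelerator:
    """One dedicated AI/ML processing unit with its own attached memory pool."""
    unit_id: int
    attached_memory_gb: int

@dataclass
class GatewayCPU:
    """Manages models and assets; programs and controls the accelerators."""
    flash_gb: int
    ddr_gb: int

@dataclass
class AcceleratorBlade:
    """Pluggable blade: accelerator array + Gateway CPU + fabric interface."""
    accelerators: list
    gateway: GatewayCPU
    fabric: str = "PCIe"  # or a high-speed network interface

# Example blade: eight accelerators, each with 16 GB of attached memory.
blade = AcceleratorBlade(
    accelerators=[Accelerator(unit_id=i, attached_memory_gb=16) for i in range(8)],
    gateway=GatewayCPU(flash_gb=8, ddr_gb=32),
)
```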
Accelerator blades must meet security requirements that, at a minimum, authenticate and protect the blade itself. AI/ML acceleration, however, brings several additional requirements. As mentioned, protecting assets is a primary concern: assets must be protected from theft or replacement, and data privacy regulations, such as HIPAA in the USA and GDPR in Europe, must be adhered to. When accelerator blades are installed in public cloud servers, they typically serve multiple users or tenants, so the ability to switch between tenants securely is extremely important. Finally, security is also required to prevent misuse of the system, to ensure proper billing for services provided, and to prevent unethical use.
According to the seminal Microsoft white paper on the topic, there are seven properties of highly secure devices: a hardware root of trust, defense in depth, a small trusted computing base, dynamic compartments, password-less authentication, error reporting, and renewable security. Each of these merits a detailed examination of its own, but for the purposes of this blog, we’ll discuss some security use cases for the hardware root of trust (RoT) in the context of securing accelerator blades.
One of the major security use cases is ensuring the availability of the accelerators themselves. An adversary can tamper with accelerator hardware to deny or disrupt its use, or to bypass its security measures. A root of trust can monitor system status and memory contents and detect tampering independently of the applications and the CPUs or MCUs. The root of trust can also detect security attacks such as fault injection.
So, how exactly would this work? The root of trust and the Gateway CPU monitor the test and debug logic, hardware configuration, and other hardware status in the SoC. The root of trust in the accelerator monitors AI accelerator operation. The root of trust periodically hashes known embedded SRAM state to detect tampering, and it can also periodically hash invariant flash data. Internal logic inside the root of trust detects physical attacks on the system, and a security protocol engine can monitor network traffic. Security applications running inside the secure boundary of the root of trust then determine how to act on a detected anomaly.
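As an illustration of the periodic-hashing step, the following sketch records known-good digests of invariant memory regions and later re-hashes them to detect tampering. The simulated SRAM buffer and the region layout are assumptions made for this example; a real root of trust would perform these reads inside its secure hardware boundary.

```python
import hashlib

# Simulated embedded SRAM, for illustration only.
SRAM = bytearray(4096)

def hash_region(offset: int, length: int) -> bytes:
    """Digest one memory region (a real RoT reads physical memory here)."""
    return hashlib.sha256(bytes(SRAM[offset:offset + length])).digest()

# At boot/provisioning: record golden digests of regions that should not change.
INVARIANT_REGIONS = [(0, 1024), (2048, 512)]  # (offset, length) pairs
golden = {region: hash_region(*region) for region in INVARIANT_REGIONS}

def check_for_tampering() -> list:
    """Periodic check: re-hash each invariant region and report mismatches."""
    return [r for r in INVARIANT_REGIONS if hash_region(*r) != golden[r]]

# Simulate an attack that flips a byte in a protected region.
SRAM[100] ^= 0xFF
assert check_for_tampering() == [(0, 1024)]  # anomaly detected and reported
```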
Second, we identified earlier that inference and training models are valuable assets that require protection. While in use, or while being loaded into the AI accelerators, these models can be intercepted, replaced, or altered. Upon completion of training, the resulting inference models need to be stored in encrypted form and decrypted dynamically when used.
A root of trust implementation for this scenario would include the following steps. The signed, encrypted inference model is stored in an off-chip flash module. The root of trust reads the inference model from flash, decrypts it, and hashes the decrypted data. It then verifies the signature and compares hashes; only if the hashes match is the model loaded into the accelerator. Alternatively, if each accelerator has its own flash, a local root of trust can handle this. The root of trust provides the encryption, hashing, and digital signature verification capabilities.
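The verify-then-load flow could be sketched as follows. Note that `rot_decrypt` and `rot_verify_signature` are hypothetical stand-ins for the root of trust's hardware crypto engines: real hardware would use a proper block cipher and public-key signature verification against a provisioned key, not the toy placeholders used here for illustration.

```python
import hashlib
import hmac

def rot_decrypt(ciphertext: bytes, key: bytes) -> bytes:
    """Toy XOR cipher standing in for the RoT's hardware decryption engine."""
    return bytes(c ^ key[i % len(key)] for i, c in enumerate(ciphertext))

def rot_verify_signature(signed_digest: bytes, digest: bytes) -> bool:
    """Placeholder: a real RoT verifies a signature with a public key.
    Here we just compare digests in constant time."""
    return hmac.compare_digest(signed_digest, digest)

def load_model(flash_blob: bytes, signed_digest: bytes, key: bytes) -> bytes:
    """Read from flash, decrypt, hash, verify; load only if everything matches."""
    model = rot_decrypt(flash_blob, key)
    digest = hashlib.sha256(model).digest()
    if not rot_verify_signature(signed_digest, digest):
        raise RuntimeError("model integrity check failed; refusing to load")
    return model  # handed to the accelerator only after verification

# Illustrative usage: "provision" an encrypted, signed model, then load it.
key = b"device-unique-key"
plaintext_model = b"model weights..."
flash_blob = rot_decrypt(plaintext_model, key)           # XOR is symmetric
signed_digest = hashlib.sha256(plaintext_model).digest() # stand-in for signing
assert load_model(flash_blob, signed_digest, key) == plaintext_model
```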
Finally, when a complete AI ecosystem operates in inference mode, an adversary can target the inference process or tamper with the inference results. Inference results can be protected using a secure channel, like the one used to protect input data integrity. In this case, the host module communicates with the edge devices over the network, with mutual authentication based on pre-provisioned keys and identities. After a secure communication channel is established, the root of trust and the edge device manage the passing of inference results to the servers, and the results are integrity-checked before being committed.
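The integrity check on the results might look like the sketch below, which uses an HMAC tag to detect any alteration in transit. It assumes a shared session key has already been established during mutual authentication; the key agreement and channel setup themselves are out of scope here.

```python
import hashlib
import hmac

# Assumed output of mutual authentication with pre-provisioned identities.
SESSION_KEY = b"per-session key derived during mutual authentication"

def protect_results(results: bytes) -> tuple:
    """Edge side: tag the inference results before sending them to the server."""
    tag = hmac.new(SESSION_KEY, results, hashlib.sha256).digest()
    return results, tag

def commit_results(results: bytes, tag: bytes) -> bytes:
    """Server/RoT side: verify integrity before committing the results."""
    expected = hmac.new(SESSION_KEY, results, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise RuntimeError("inference results failed integrity check")
    return results

payload, tag = protect_results(b'{"label": "stop_sign", "score": 0.97}')
assert commit_results(payload, tag) == payload  # untampered results commit
```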
Rambus has three decades of security expertise and a broad portfolio of hardware security IP solutions designed to support the security needs of high-performance data centers handling valuable data. We offer root of trust solutions tailored to the needs of state-of-the-art accelerators for AI/ML training, as well as lightweight solutions appropriate for inference engines in IoT devices.