Securing Server Systems And Data At The Hardware Level

The impact of disaggregation on security.


Across the global internet, there’s a growing need to secure data, not only as it courses over the network but also within servers in data centers and at the edge. Interconnect technologies such as Compute Express Link (CXL) will enable future servers to be disaggregated into composable resources that can be finely matched to the requirements of varied workloads and support virtualized compute and storage. Disaggregation, however, increases the attack surface that adversaries will attempt to exploit, which only raises the urgency to safeguard data and systems.

Within the computing architecture, two types of interfaces exist. The first is the processor-to-peripheral interconnect, which you can think of as an SoC-to-SoC interconnect; PCI Express (PCIe) is used here. Examples include CPU-to-GPU, CPU-to-accelerator, and CPU-to-NIC (network interface card) links. PCIe operates in a similar fashion to network communications in that data is transferred in fixed-size packets. Processor-to-memory interconnects are the second class, and here the interconnect needs to support reading or writing partial words or partial data blocks.

Let’s consider as an example an AI processing blade plugged into a server. The AI blade is connected to the server motherboard via a PCIe interface. To prevent eavesdropping on data, traffic on the PCIe interface needs to be encrypted. PCIe 5.0, PCIe 6.0 and CXL all have defined encryption schemes, and the encryption may affect the speed of the link. A high-performance encryption engine is required to protect these interfaces at line rate. The most important factors are latency and throughput: the encryption algorithm should introduce no stalling and zero or limited delay.

The second type of interface in our server architecture is the memory interface. This can run from a processor (CPU, GPU, etc.) to main memory such as DDR, to workload-specific memory such as GDDR, or to storage-class memory using Flash. These interfaces share the objectives of the PCIe/CXL interconnects discussed above: security, high performance, and low, fixed latency. The difference is that the size of the data transfers is variable.

For PCIe and CXL encryption, the algorithm is already defined by the standards or extensions to the standards: AES-GCM. It’s a proven and trusted algorithm deployed widely in the industry. MACsec, which provides security between networked devices, also uses AES-GCM. The advantage of AES-GCM is that it offers both integrity and confidentiality: the payload data is encrypted, and the complete data stream is authenticated. To achieve this, a unique initialization vector (IV) is required per data block, and an authentication tag (TAG) is transmitted along with the data.
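The per-block uniqueness requirement can be illustrated with a minimal sketch of a deterministic IV generator. It follows the general "fixed field plus invocation counter" construction described for GCM in NIST SP 800-38D; the 4-byte/8-byte split, the class name `DeterministicIvGenerator`, and the example fixed field are assumptions chosen for illustration, not the layout mandated by the PCIe or CXL specifications.

```python
import struct


class DeterministicIvGenerator:
    """Builds 96-bit IVs as fixed_field (4 bytes) || counter (8 bytes).

    Hypothetical helper: the field split is an assumption for illustration.
    """

    def __init__(self, fixed_field: bytes):
        if len(fixed_field) != 4:
            raise ValueError("fixed_field must be exactly 4 bytes")
        self._fixed_field = fixed_field
        self._counter = 0

    def next_iv(self) -> bytes:
        # Refuse to wrap around: reusing an IV under the same key
        # undermines both confidentiality and integrity in AES-GCM.
        if self._counter >= 2**64:
            raise OverflowError("IV space exhausted; rotate the key")
        iv = self._fixed_field + struct.pack(">Q", self._counter)
        self._counter += 1
        return iv


gen = DeterministicIvGenerator(b"\x00\x00\x00\x01")  # example fixed field
ivs = [gen.next_iv() for _ in range(4)]
for iv in ivs:
    print(iv.hex())
```

A counter is attractive in hardware because uniqueness is guaranteed by construction rather than by probability, and the point at which the key must be rotated is known in advance.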

These same properties make AES-GCM unsuitable for memory encryption. AES-GCM adds data to each encrypted data block in the form of a TAG, and because the TAG covers the whole block, it prevents reading and writing of partial blocks. Therefore, another algorithm needs to be considered. The AES-XTS algorithm does allow encryption and decryption at the block level. Like AES-GCM, it is high performance, highly scalable and can be used with a limited and fixed latency.
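To get a feel for the extra data a TAG adds, the following back-of-the-envelope sketch computes the tag size as a fraction of the protected block. The 16-byte tag is an assumption (the maximum GCM tag length of 128 bits; specific standards may use shorter tags), and the block sizes are hypothetical examples, not values taken from this article.

```python
TAG_BYTES = 16  # assumption: full-length 128-bit authentication tag


def tag_overhead(block_bytes: int, tag_bytes: int = TAG_BYTES) -> float:
    """Return the tag size as a fraction of the protected block size."""
    if block_bytes <= 0:
        raise ValueError("block_bytes must be positive")
    return tag_bytes / block_bytes


# Hypothetical block sizes: a cache line, a larger burst, a 4 KiB page.
for block_bytes in (64, 256, 4096):
    print(f"{block_bytes:>5} B block: {tag_overhead(block_bytes):.2%} extra data")
```

The smaller the protected block, the larger the relative cost of the tag, which is one reason a tag per block sits awkwardly with fine-grained memory accesses.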

The AES-XTS algorithm does not use a unique IV; instead, it uses a tweak value derived from the address. Data is therefore encrypted for a fixed address and cannot be moved to, and correctly read from, any other address. The one disadvantage of this scheme is that it provides confidentiality but not integrity. This can be partially mitigated with error detection codes, although that doesn’t give full data integrity. Switching keys regularly helps prevent replay attacks that reuse old data.
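The address binding comes from how XTS derives a distinct tweak for every 16-byte block. In the IEEE 1619 definition of XTS, an initial tweak is obtained by encrypting the data unit number (here, the address) with a second key, and the tweak for each following block is the previous one multiplied by α in GF(2^128). The sketch below shows only that tweak-update step; the AES encryption itself is omitted, and `initial` is a hypothetical stand-in for the encrypted address.

```python
BLOCK_BYTES = 16  # AES block size in bytes


def xts_mul_alpha(tweak: bytes) -> bytes:
    """Multiply a 128-bit tweak by alpha in GF(2^128), little-endian as in XTS."""
    if len(tweak) != BLOCK_BYTES:
        raise ValueError("tweak must be 16 bytes")
    value = int.from_bytes(tweak, "little")
    carry = value >> 127
    value = (value << 1) & ((1 << 128) - 1)
    if carry:
        value ^= 0x87  # reduce modulo x^128 + x^7 + x^2 + x + 1
    return value.to_bytes(BLOCK_BYTES, "little")


def block_tweak(initial_tweak: bytes, block_index: int) -> bytes:
    """Return the tweak for the block at block_index within one data unit."""
    tweak = initial_tweak
    for _ in range(block_index):
        tweak = xts_mul_alpha(tweak)
    return tweak


# Hypothetical value standing in for AES_K2(address); not real cipher output.
initial = bytes(range(1, 17))
tweaks = [block_tweak(initial, j) for j in range(4)]
for j, t in enumerate(tweaks):
    print(j, t.hex())
```

Because each block's tweak depends only on the address and the block's position, any block can be encrypted or decrypted on its own, without touching its neighbors.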

For all the interfaces, the one generic requirement is a fixed and low latency. For PCIe and CXL encryption, both integrity and confidentiality are required and can be achieved with AES-GCM. Current performance requirements range from 16 gigabits per second (Gb/s) for a one-lane PCIe 4.0 link to over a terabit per second (Tb/s) of bandwidth for a 16-lane PCIe 6.0 link.
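The two ends of that range follow from simple arithmetic on the per-lane signaling rates: 16 GT/s for PCIe 4.0, 32 GT/s for PCIe 5.0 and 64 GT/s for PCIe 6.0. The sketch below treats 1 GT/s as roughly 1 Gb/s of raw bandwidth per lane in one direction and ignores encoding and protocol overhead, which is a simplifying assumption; the function name is illustrative.

```python
# Raw per-lane signaling rate in GT/s, one direction.
PCIE_GTS_PER_LANE = {4: 16, 5: 32, 6: 64}
VALID_LANE_COUNTS = (1, 2, 4, 8, 16)


def raw_link_rate_gbps(generation: int, lanes: int) -> int:
    """Approximate raw link rate in Gb/s, ignoring encoding overhead."""
    if generation not in PCIE_GTS_PER_LANE:
        raise ValueError(f"unsupported PCIe generation: {generation}")
    if lanes not in VALID_LANE_COUNTS:
        raise ValueError(f"unsupported lane count: {lanes}")
    return PCIE_GTS_PER_LANE[generation] * lanes


print(raw_link_rate_gbps(4, 1))    # one-lane PCIe 4.0
print(raw_link_rate_gbps(6, 16))   # 16-lane PCIe 6.0
```

An encryption engine that is meant to run at line rate has to sustain whatever figure applies to the widest, fastest link it protects.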

In addition, CXL targets zero added latency to support memory cache coherency, and within CXL there’s a requirement to aggregate the headers because the unencrypted data is spread over the various flits. Rambus offers an AES-GCM engine with PCIe and CXL support that can scale from 16 Gb/s up to 1.6 Tb/s. With this scalable protocol engine IP, we offer support for both current and next-generation PCIe and CXL encryption. For CXL, the encryption adds no latency.

For inline memory encryption in complex virtual environments, there is a need to use multiple keys and switch between them. This allows data for different memory locations to be encrypted and decrypted with different keys. Supporting parallel accesses requires switching keys continuously. Further, more advanced systems will need to read and write small and partial sectors concurrently without affecting other accesses. For this use case, Rambus has an inline AES-XTS engine that can service all these parallel accesses at any DDR speed available in the market. Interleaved accesses are natively supported, and memory accesses can be of any block size.
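The idea of a different key per memory location can be pictured as a lookup from address to key identifier that is consulted on every access. The sketch below is a software illustration only, under assumed names (`Region`, `KeyMap`, `key_id_for`) and made-up address ranges; it does not describe how any particular hardware engine selects keys.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: int   # first address in the region
    end: int     # one past the last address (exclusive)
    key_id: int  # identifier of the key protecting this region


class KeyMap:
    """Maps an address to the key ID of the region that contains it."""

    def __init__(self, regions):
        self._regions = sorted(regions, key=lambda r: r.start)
        for prev, cur in zip(self._regions, self._regions[1:]):
            if cur.start < prev.end:
                raise ValueError("regions must not overlap")
        self._starts = [r.start for r in self._regions]

    def key_id_for(self, address: int) -> int:
        # Find the last region whose start is <= address.
        idx = bisect.bisect_right(self._starts, address) - 1
        if idx < 0 or address >= self._regions[idx].end:
            raise KeyError(f"no key for address {address:#x}")
        return self._regions[idx].key_id


# Hypothetical layout: two tenants with separate keys.
key_map = KeyMap([Region(0x0000, 0x1000, 1), Region(0x1000, 0x3000, 2)])

# Interleaved accesses switch keys on every request.
accesses = [0x0010, 0x1800, 0x0FF8, 0x2FFF]
key_ids = [key_map.key_id_for(a) for a in accesses]
print(key_ids)
```

With interleaved traffic the selected key can change on every request, which is why the key switch itself must not cost extra cycles.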

An end-to-end security architecture requires securing data in motion from the macro level of links between networked devices down to the micro level of interfaces between computing components such as processors, accelerators and memory. In every case, anchoring this security in hardware with scalable protocol engines delivers protection of data at the line rates required by high-performance network and computing systems. Whether for network encryption or for securing server and virtual compute architectures, Rambus has tailored protocol and cipher engine IP solutions to meet the needs of your next-generation design.

