Addressing Modern Bottlenecks With Smart Data Acceleration

Old bottlenecks are shifting and new ones are forming.


Over the past 30 years, the relentless progression of Moore’s Law has driven dramatic improvements in transistor counts and, ultimately, in processor performance. CPU performance was often the primary factor in determining overall system performance, which led us to believe that better CPUs meant better systems. But as processors have become more powerful, other subsystems have not kept pace. Combined with the emergence of new usage models and applications, this has shifted old bottlenecks and created new ones, causing architects to rethink how future systems should be built.

Virtualized infrastructures, Big Data analytics, and the growth of connected devices are causing data set sizes to grow rapidly. In some cases they far exceed the amount of main memory in servers. When servers are forced to access data on disk or in remote nodes, where access times can be several orders of magnitude longer than for local memory, CPU utilization drops, either because cores stall on long-latency data accesses or because some cores are idled so that the active cores can have more memory capacity. Moving data over long distances also consumes more power, which increases operating costs. As the amount of data being processed continues to grow, data movement is fast becoming a critical bottleneck to improving performance and reducing power.
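To make the latency argument concrete, here is a minimal sketch of a weighted-average access time calculation. The latency figures and the function name are illustrative assumptions chosen to represent a three-orders-of-magnitude gap; they are not measurements from our platform.

```python
def average_access_time_ns(local_fraction, local_ns, remote_ns):
    """Weighted average latency when only a fraction of accesses hit local memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Illustrative assumptions only, not measured values.
LOCAL_DRAM_NS = 100      # assumed local memory access time
REMOTE_NS = 100_000      # assumed disk or remote-node access time (1000x slower)

for local_fraction in (1.0, 0.99, 0.90):
    avg = average_access_time_ns(local_fraction, LOCAL_DRAM_NS, REMOTE_NS)
    slowdown = avg / LOCAL_DRAM_NS
    print(f"{local_fraction:.0%} local: {avg:,.0f} ns average ({slowdown:.1f}x local latency)")
```

Under these assumed numbers, sending just 1% of accesses off-node raises the average access time to roughly 11 times the local latency, which is why a data set that only slightly exceeds main memory can still leave cores waiting.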

Architects have begun looking at ways to minimize data movement by turning the tables and moving the processing closer to the data. Virtualized infrastructures allow virtual machines to be deployed near where the data resides, helping to minimize data movement. On the hardware front, architects are exploring Near Data Processing as a way to couple compute engines more closely to the data, so that the data doesn’t need to be moved long distances to be processed. There is also a strong desire for flexible, reconfigurable processing capabilities near the data, and increasing attention is being paid to FPGAs as a way to provide them.

The SDA engine (Source: Rambus)

Recently, Rambus revealed details of our Smart Data Acceleration (SDA) Research Program, in which we’re investigating new architectures for servers and data centers to achieve significant improvements in performance and power efficiency. As part of this program, we’ve created the SDA research platform, which enables experimentation and the evaluation of new concepts. At the heart of the SDA research platform are multiple SDA engines, which combine FPGAs, bitfiles, software, firmware, and large amounts of memory in a flexible environment that we believe is well suited to Near Data Processing. The SDA research platform can take on different personalities, appearing to the rest of the system as, for example, an ultra-fast solid-state storage device, a Key-Value store, or a large pool of shared memory.


In some of our early work, we configured an SDA engine as an ultra-fast solid-state storage device to see how it would compare against a high-end enterprise NVMe SSD. In our Latency Under Load tests, which measure how Read and Write latency change as a function of workload, the SDA engine achieved much higher peak IOPS rates at significantly lower latency than a state-of-the-art NVMe SSD. In 4KB Random Read and Write tests, the SDA engine delivered over 1M IOPS at a latency under load in the 10 μs to 60 μs range, with additional headroom to achieve higher IOPS rates. The NVMe SSD achieved latencies of 100 μs to 200 μs at low IOPS levels, but past a certain workload point its latencies increased dramatically.
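One way to read these figures is through Little's Law, which relates throughput, latency, and the average number of requests in flight at steady state. The sketch below applies it to the numbers quoted above. The function names are hypothetical, the tests' actual queue depths are not stated here, and the results are implied averages rather than reported measurements.

```python
def outstanding_requests(iops, latency_us):
    """Little's Law: average requests in flight = throughput x latency."""
    return iops * latency_us / 1_000_000


def iops_at(queue_depth, latency_us):
    """Little's Law rearranged: throughput sustained at a given concurrency."""
    return queue_depth * 1_000_000 / latency_us


# 1M IOPS is used as a lower bound for "over 1M IOPS".
for latency_us in (10, 60):
    in_flight = outstanding_requests(1_000_000, latency_us)
    print(f"1M IOPS at {latency_us} us implies about {in_flight:.0f} requests in flight")

# Hypothetical comparison: one outstanding request at 100 us latency.
print(f"Queue depth 1 at 100 us sustains {iops_at(1, 100):,.0f} IOPS")
```

Read this way, sustaining 1M IOPS at 10 μs to 60 μs corresponds to an average of roughly 10 to 60 requests in flight. A device with higher per-request latency needs proportionally more concurrency to reach the same IOPS rate.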

In November, Rambus announced a partnership with Los Alamos National Laboratory (LANL) on the SDA Research Program. As part of this partnership, we have deployed our platform at LANL, where researchers are looking to improve the performance of in-memory databases, graph analytics, and other Big Data applications. Initial tests indicate that the performance of the SDA platform is well matched to HPC interconnects and that it can help reduce data movement by processing information close to memory.

We look forward to continuing our work on the Smart Data Acceleration Research Program, and to collaborating with our partners and customers in this and other programs as we develop cutting-edge technologies and solutions for future servers and data centers.