The Journey To Exascale Computing And Beyond

Optimizing power consumption in the world’s most powerful supercomputers.


High performance computing witnessed one of its most ambitious leaps forward with the development of the US supercomputer "Frontier." As Scott Atchley from Oak Ridge National Laboratory discussed at Supercomputing 23 (SC23) in Denver last month, Frontier had the goal of achieving performance levels 1000 times higher than the petascale systems that preceded it, while staying within a power budget of 20MW.

Challenges assessed by the Defense Advanced Research Projects Agency (DARPA) in planning for Frontier included improving power efficiency, supporting exaflop processing levels with enough memory and storage performance, supporting high levels of concurrency, and improving machine resiliency. Frontier successfully addressed these challenges, showcasing advancements in power efficiency, high memory bandwidth using HBM DRAMs, high concurrency through the use of GPUs, and resiliency above initial expectations. However, exascale performance came at a price: Frontier cost 4 to 6 times more than the previous generation of supercomputer. Components like High Bandwidth Memory (HBM) proved more expensive, and storage costs also failed to scale well enough to keep overall cost targets in line with the prior generation.

Fully deployed in 2022 at the Oak Ridge Leadership Computing Facility (OLCF) in Tennessee, Frontier (or OLCF-5) is the world's first and still fastest exascale supercomputer. Frontier is composed of 9,472 AMD 3rd Generation EPYC CPUs and 37,888 AMD Instinct MI250X GPUs, with the GPUs enabling exaflop performance via 500M threads that provide high levels of concurrency. The system is deployed in 74 19-inch racks with 64 blades per rack. Frontier uses liquid cooling to achieve a compute density five times that of air-cooled architectures. Frontier was, until last month (November 2023), the top-rated supercomputer on the Green500 thanks to its outstanding power efficiency, and yet it consumes a whopping 21 megawatts (MW) of power.
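The published figures above are internally consistent, as a quick back-of-the-envelope check shows. The per-node breakdown assumed here (one EPYC CPU plus four MI250X GPUs per node, two nodes per blade) is the commonly reported Frontier configuration:

```python
# Back-of-the-envelope check of Frontier's published composition.
# Assumes the commonly reported layout: 2 nodes per blade, and
# 1 EPYC CPU + 4 MI250X GPUs per node.

racks = 74
blades_per_rack = 64
nodes_per_blade = 2

nodes = racks * blades_per_rack * nodes_per_blade
print(nodes)      # 9472 nodes -> 9,472 EPYC CPUs (one per node)

gpus = nodes * 4  # four MI250X GPUs per node
print(gpus)       # 37888 GPUs
```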

Power and cooling are interesting inflection points for this story, as the successful launch of Frontier has spawned early conversations on what’s needed to achieve the next 1000x increase in performance without increasing the power budget. Stephen Pawlowski, Intel Senior Fellow, discussed the challenge of achieving 1000x energy efficiency within the next two decades in his keynote talk at the SC23 Workshop on Memory Technologies, Systems, and Applications. With exascale supercomputers now a reality, the focus has shifted to optimizing energy consumption to ensure the cost of powering the system over its lifetime does not exceed the cost of the system itself.
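The scale of that efficiency challenge is easy to quantify. Using the round numbers cited in this article (roughly one exaflop at roughly 21 MW), a sketch of the arithmetic:

```python
# Illustrative efficiency arithmetic using the article's round numbers:
# ~1 exaflop of performance at ~21 MW of power.

exaflops = 1.0e18      # 10^18 floating-point operations per second
power_watts = 21e6     # ~21 MW

gflops_per_watt = exaflops / power_watts / 1e9
print(round(gflops_per_watt, 1))  # ~47.6 GFLOPS/W today

# A 1000x performance increase at the same power budget requires a
# matching 1000x gain in energy efficiency:
tflops_per_watt_target = gflops_per_watt * 1000 / 1000
print(round(tflops_per_watt_target, 1))  # ~47.6 TFLOPS/W required
```

In other words, holding power flat while scaling performance 1000x pushes the required efficiency from tens of GFLOPS per watt into tens of TFLOPS per watt.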

Pawlowski highlighted the significant energy and time consumed by data movement, especially between processors and memory. To address this, he proposed a chiplet-based memory that stacks high-performance memory die on top of a System-on-Chip (SoC) to minimize the distance data travels, and to improve the connections between the memory and SoC. This approach promises a 5-6X improvement in power efficiency for data movement and a 10X boost in bandwidth. The potential benefits make it a compelling path forward for the industry.
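To see why shortening the data path matters, consider how energy per bit translates into power at a fixed bandwidth. The picojoule-per-bit values below are hypothetical placeholders, chosen only to illustrate how a roughly 5x reduction in data-movement energy plays out; they are not figures from Pawlowski's talk:

```python
# Illustrative (not measured) energy-per-bit arithmetic for data movement.
# The pJ/bit values are hypothetical, chosen only to show how a ~5x
# energy-per-bit reduction translates into power at a fixed bandwidth.

pj_per_bit_off_package = 5.0  # assumed: memory access over package traces
pj_per_bit_stacked = 1.0      # assumed: memory die stacked on the SoC

bandwidth_bytes_per_s = 1e12  # 1 TB/s of sustained memory traffic

def movement_watts(pj_per_bit: float, bytes_per_s: float) -> float:
    """Power (W) needed to move bytes_per_s at a given energy per bit."""
    return pj_per_bit * 1e-12 * bytes_per_s * 8

print(movement_watts(pj_per_bit_off_package, bandwidth_bytes_per_s))  # 40.0 W
print(movement_watts(pj_per_bit_stacked, bandwidth_bytes_per_s))      # 8.0 W
```

At data-center scale, with thousands of nodes each moving terabytes per second, savings of this magnitude compound into megawatts.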

However, stacking memory on processors introduces its own challenges, such as the need to standardize memory footprints that define interconnect locations, manage thermals, and address issues like Error-Correcting Codes (ECC) and post-package repair.

Additional gains in power efficiency can be achieved by tailoring architectures to the specific problems being solved to eliminate unnecessary overheads, as is done in many AI systems today. And Pawlowski noted that biological systems offer 5 orders of magnitude better power efficiency than processors achieve today. Future systems can apply concepts from biologically inspired computing to get further gains in power efficiency.

New architectures and technologies will be complemented by continued development of existing memory and interconnect technologies, including high-performance HBM and GDDR memories as well as critical PCI Express (PCIe) and Compute Express Link (CXL) interconnect technologies. With over 30 years of high-speed memory and interconnect industry leadership, Rambus is exploring novel memory architectures as well as advancing the roadmap of performance for HBM3, GDDR6, PCIe 6 and CXL 3 digital controller IP.

The Rambus HBM3 Memory Controller delivers a data rate of 9.6 Gigabits per second (Gb/s), supporting the continued evolution of HBM memory in the industry and meeting the demanding memory requirements of AI training in the heart of the data center. Rambus PCIe 6.0 and CXL 3.0 IP solutions provide high-performance interconnect solutions to address the speed and latency requirements of AI/ML, data center and edge applications.
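The per-device bandwidth implied by that 9.6 Gb/s per-pin data rate follows directly from the 1024-bit interface width defined in the JEDEC HBM3 standard:

```python
# Device bandwidth implied by the quoted 9.6 Gb/s per-pin data rate,
# assuming the standard 1024-bit HBM3 interface width defined by JEDEC.

data_rate_gbps = 9.6    # per-pin data rate, Gb/s
interface_bits = 1024   # HBM3 interface width, bits

bandwidth_gbytes = data_rate_gbps * interface_bits / 8
print(bandwidth_gbytes)  # 1228.8 GB/s, i.e. ~1.2 TB/s per HBM3 device
```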
