Chiplet-based exascale computers; new ‘Frontier’ in exascale systems; Exascale Day.
Chiplet-based exascale computers
At the recent IEEE International Electron Devices Meeting (IEDM), CEA-Leti presented a paper on a 3D chiplet technology that enables exascale-level computing systems.
The United States and other nations are working on exascale supercomputers. Supercomputer performance is measured in floating-point operations per second, or flops. The world’s fastest supercomputers today reach hundreds of petaflops, and by next year the fastest systems are expected to reach the exaflop, or exascale, level.
Exascale computers are capable of at least 10¹⁸, or one billion billion, floating-point operations per second, roughly two or more times the speed of today’s fastest supercomputers. Exascale systems are expected to enable new breakthroughs in medicine, physics, scientific discovery, weather prediction and other fields.
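To put those numbers in perspective, here is a minimal back-of-the-envelope sketch in Python. The petaflop and exaflop rates are the figures cited above; the workload size is purely illustrative:

```python
# Rough illustration of the petascale-to-exascale jump.
PETAFLOP = 1e15   # floating-point operations per second
EXAFLOP = 1e18    # one billion billion operations per second

# Hypothetical workload: 10^21 floating-point operations (illustrative only).
workload_ops = 1e21

petascale_system = 200 * PETAFLOP   # ~200 petaflops, i.e. "hundreds of petaflops"
exascale_system = 1.0 * EXAFLOP     # an exascale-class machine

print(f"Petascale runtime: {workload_ops / petascale_system / 3600:.2f} hours")
print(f"Exascale runtime:  {workload_ops / exascale_system / 3600:.2f} hours")
# The exascale machine finishes the same job in a small fraction of the time.
```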
Exascale supercomputers are based on classical computing methods, which makes them different from quantum computers. In classical computing, information is stored in bits, each of which is either a “0” or a “1”. In quantum computing, information is stored in quantum bits, or qubits, which can exist as a “0”, a “1” or a combination of both.
That superposition state enables a quantum computer to perform a vast number of calculations at once, allowing it to outperform a traditional system on certain problems. But quantum computing is still in its infancy and has a long way to go, so for now traditional supercomputers dominate the high-performance computing (HPC) landscape.
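As a toy illustration of the difference, the sketch below (assuming numpy is available) models a single qubit as a pair of complex amplitudes in an equal superposition and prints the measurement probabilities for “0” and “1”. It is a conceptual example, not a simulation of either kind of machine:

```python
import numpy as np

# A classical bit is simply 0 or 1.
classical_bit = 1

# A qubit's state is a pair of complex amplitudes (alpha, beta)
# with |alpha|^2 + |beta|^2 = 1. Here, an equal superposition of 0 and 1.
alpha, beta = 1 / np.sqrt(2), 1 / np.sqrt(2)
qubit = np.array([alpha, beta], dtype=complex)

# Measurement collapses the state: these are the probabilities of reading 0 or 1.
probs = np.abs(qubit) ** 2
print(f"P(0) = {probs[0]:.2f}, P(1) = {probs[1]:.2f}")  # 0.50 each

# Simulate one measurement.
outcome = np.random.choice([0, 1], p=probs)
print("Measured:", outcome)
```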
Meanwhile, to enable exascale systems, CEA-List and CEA-Leti, research institutes at CEA, presented a paper at IEDM on a 3D packaging technology called ExaNoDe, an architecture that paves the way toward exascale computing.
ExaNoDe is a multi-chip module (MCM) architecture that is integrated into a larger system. Still in R&D, the MCM combines two FPGAs and a separate chiplet-based stack on the same substrate. The FPGAs perform the pre-processing functions and serve as the interface to the shared double-data-rate (DDR) memory architecture.
On a separate part of the substrate, six or so chiplets or dies are stacked on an active interposer. The chiplets are based on a 28nm FD-SOI technology, and each integrates 16 cores. Each chiplet is stacked and bonded on the active interposer using 20μm-pitch microbumps. The interposer itself is based on a 65nm CMOS technology.
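For readers who prefer a structural view, here is a minimal sketch in Python of the module hierarchy described above. The class and field names are hypothetical; the process nodes, core counts and bump pitch come from the description above, and the count of six chiplets is taken at face value:

```python
from dataclasses import dataclass, field

@dataclass
class Chiplet:
    process: str = "28nm FD-SOI"
    cores: int = 16                # 16 cores per chiplet, organized into four clusters
    bump_pitch_um: float = 20.0    # microbump pitch to the active interposer

@dataclass
class ActiveInterposer:
    process: str = "65nm CMOS"
    chiplets: list = field(default_factory=lambda: [Chiplet() for _ in range(6)])

@dataclass
class MultiChipModule:
    fpgas: int = 2                 # pre-processing and interface to shared DDR memory
    interposer: ActiveInterposer = field(default_factory=ActiveInterposer)

mcm = MultiChipModule()
print(sum(c.cores for c in mcm.interposer.chiplets), "cores across the chiplet stack")
```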
“Each chiplet is built around a network on chip, ensuring a communication channel between four functional clusters. Its topology is a 2D mesh with four routers per chiplet, connected to clusters and allowing communication in each geographical direction,” said Denis Dutoit, a CEA-List scientist, in a paper at IEDM. “Each chiplet embeds two dedicated accelerators, a convolutional processor and a traffic generator with their own memory hierarchy. Chiplets share data with neighbors through an active interposer using short-reach fine-pitch parallel links. The interposer plays a central communication role with a flexible interconnect and an FPGA interface.”
ExaNoDe embeds 50,000 3D interconnects, with a 20μm TSV pitch and 20μm microbumps. “Simulations were performed and an embedded traffic generator was used to generate heavy traffic between chiplets: up to 1.2Gb/s transfer rate per 3D interconnect was achieved. Combined with the 20μm pitch, overall bandwidth density reaches 375GB/s/mm²,” Dutoit said. “Measurements, together with architecture extrapolation, show that combined integration of chiplets on an active interposer with bare dice in a MCM covers ultra-wide range of workloads for next generation scalable and high-performance compute nodes. This technology could mix energy efficient accelerators with generic CPUs on active interposers at one level, and then the 3D IC with bare dice for a second level within a multi-chip module.
“These R&D successes open a path towards heterogeneous processors that will enable exascale-level supercomputers,” said Dutoit. “We demonstrated that co-optimization of advanced architectures with 3D integration technologies achieves the level of computing performance and bandwidth required for HPC.”
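The quoted bandwidth-density figure can be checked with simple arithmetic, assuming a regular grid of 3D interconnects at the stated 20μm pitch, each sustaining 1.2Gb/s:

```python
# Verify the quoted 375GB/s/mm^2 bandwidth density.
pitch_um = 20.0            # microbump/TSV pitch
rate_gbps = 1.2            # transfer rate per 3D interconnect (Gb/s)

links_per_mm2 = (1000.0 / pitch_um) ** 2     # 50 x 50 = 2,500 links per mm^2
density_gbps = links_per_mm2 * rate_gbps     # 3,000 Gb/s per mm^2
density_gBps = density_gbps / 8              # convert bits to bytes

print(f"{density_gBps:.0f} GB/s/mm^2")       # 375 GB/s/mm^2, matching the paper
```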
New ‘Frontier’ in exascale systems
The U.S. Department of Energy is preparing to install a new exascale supercomputer at Oak Ridge National Laboratory.
Slated for installation in 2021, the Frontier exascale supercomputer is expected to deliver 1.5 exaflops, where an exaflop is one quintillion floating-point operations per second. Expected to perform calculations up to 50 times faster than today’s top supercomputers, Frontier is targeted at applications such as scientific discovery, energy assurance, economic competitiveness and national security. It will also provide new capabilities for deep learning, machine learning and data analytics in applications ranging from manufacturing to human health.
Frontier has been in the works for some time. In 2019, the U.S. Department of Energy announced a contract with Cray to build the Frontier supercomputer at Oak Ridge. Cray was recently acquired by Hewlett Packard Enterprise (HPE).
The system is based on Cray’s Shasta architecture and Slingshot interconnect, and it will feature AMD’s EPYC CPUs and Radeon Instinct GPU technology.
Frontier will reside in the former data center of the Oak Ridge Leadership Computing Facility’s Cray XK7 Titan supercomputer. Once one of the world’s most powerful supercomputers, Titan was decommissioned last year after seven years of service.
To accommodate Frontier, Oak Ridge is revamping the 20,000-square-foot room. The new raised floor consists of more than 4,500 tiles weighing 48 pounds each, or about 110 tons in total.
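A quick sanity check of those floor figures, assuming U.S. short tons of 2,000 pounds:

```python
# Total weight of the new raised-floor tiles.
tiles = 4500
pounds_per_tile = 48
total_tons = tiles * pounds_per_tile / 2000   # US short tons
print(f"{total_tons:.0f} tons")               # 108 tons, i.e. roughly the stated 110 tons
```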
The room will also house the systems that will cool Frontier. The new supercomputer’s cooling towers will have a system volume of 130,000 gallons of water, and the cooling system will include 350-horsepower pumps that can each move more than 5,000 gallons of high-temperature water per minute through the Frontier system.
“Titan at peak probably consumed about 10 megawatts of power. At peak, Frontier will consume about 30 megawatts. If you use more power, you have to get rid of additional heat, so we are adding the equivalent of 40 megawatts of cooling capacity, about 11,000 tons, for Frontier—much bigger pipes to distribute cool water to the computer,” said Justin Whitt, program director for the OLCF, a DOE Office of Science User Facility located at Oak Ridge. “Additionally, supercomputer systems have become denser and heavier with each new generation, and Frontier is no exception to that, so we upgraded the raised floor so it could support that weight.”
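The cooling numbers in the quote also line up. A ton of refrigeration is about 3.5 kilowatts of heat removal (12,000 BTU per hour), so 11,000 tons works out to roughly 40 megawatts:

```python
# Convert the quoted cooling capacity from tons of refrigeration to megawatts.
TON_OF_REFRIGERATION_KW = 3.517   # 12,000 BTU/h expressed in kilowatts
cooling_tons = 11_000
cooling_mw = cooling_tons * TON_OF_REFRIGERATION_KW / 1000
print(f"{cooling_mw:.1f} MW of cooling capacity")   # ~38.7 MW, i.e. roughly 40 MW
```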
Exascale Day
Not long ago, the computer industry celebrated Exascale Day, marking the advent of exascale computers. The day is observed on Oct. 18, a nod to the 10¹⁸ calculations per second that define exascale performance.
In a blog, Brandon Draeger, who leads the compute product marketing teams for HPE, presented more information about exascale computing.