Is RISC-V Ready For Supercomputing?

The industry seems to think it is a real goal for the open instruction set architecture.


RISC-V processors, which until several years ago were considered auxiliary processors for specific functions, appear to be garnering support for an entirely different type of role — high-performance computing.

This is still at the discussion stage. Questions remain about the software ecosystem, and about whether the chips, boards, and systems are reliable enough. There are both business and technical problems, with the business issues looking the more difficult. But the discussion itself speaks to the momentum of the RISC-V architecture, which has seen a massive surge in adoption and experimentation because its ISA is open. That, in turn, gives the industry the freedom to innovate with it.

“It’s not the ISA (instruction-set architecture) in itself that’s so attractive,” said Rupert Baines, chief marketing officer at Codasip. “It’s what you build around it. So the working groups around security, and the publishing of best practices, guidelines, and reference architectures, are important. OpenTitan, which is the open-source root of trust, is important because it’s a reference architecture that’s been done well. People can look at it rather than reinvent the wheel and maybe make a mistake.”

The big question now is just how far this architecture can be pushed in new directions. The chip industry has firmly entered the era of domain-specific computing, where processors can be highly tailored to specific tasks, and then outperform other fixed architectures for those tasks. But that also can make the porting of software more problematic if that software needs to be optimized for those custom cores.

For RISC-V, high-performance computing and supercomputing could represent a giant leap forward. A supercomputer is defined as a computer with a much higher level of performance than a general-purpose machine. These typically are floating-point machines with vector extensions, and the current leader, Frontier, operates at around 1.1 exaFLOPS on the LINPACK benchmark using 8,730,112 processing cores based on the x86 ISA.
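
To put that headline number in perspective, a back-of-the-envelope division of the quoted figures gives the average throughput per core. The sketch below only restates the numbers above; note that the TOP500 core count lumps CPU cores and GPU compute units together, so the result is an averaged figure rather than a per-CPU-core specification.

```c
/* Back-of-the-envelope arithmetic using the figures quoted above.  The
 * TOP500 core count for Frontier mixes CPU cores and GPU compute units,
 * so the result is an averaged, illustrative figure only. */
#include <stdio.h>

int main(void) {
    double rmax_flops = 1.1e18;     /* ~1.1 exaFLOPS on LINPACK (HPL) */
    double cores      = 8730112.0;  /* TOP500 core count for Frontier */
    printf("Average throughput: ~%.0f GFLOPS per core\n",
           rmax_flops / cores / 1e9);   /* roughly 126 GFLOPS per core */
    return 0;
}
```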

Still, the need for such behemoths has been evolving as alternatives become widely available. An HPC used to be a custom-designed, all-purpose computer. Today, very similar capability is available to anyone who deploys clusters of high-speed servers, hosted either on-premises or in the cloud.

Whether RISC-V has a possible play here needs to be examined from a number of different angles. Who might need a supercomputer based on the RISC-V architecture, and who is willing to pay for it? Do the RISC-V ISA and extensions have all of the necessary capabilities for a supercomputer to be created? Has anyone created a core with suitable performance? Is all of the necessary software in place?

Following in Arm’s footsteps
Until recently, most supercomputers were based on Intel’s x86 architecture. Arm wanted to improve its penetration into high-performance computing and had the basic hardware support ready around 2016.

“When the first Arm supercomputer programs were being initiated, Arm wasn’t ready in the sense that all of the ecosystem was there, or that all of the problems had been solved,” says Rob Aitken, a Synopsys fellow. “It was more that somebody, somewhere was saying it’s close enough that I’m willing to take the risk. I’m willing to try it. I would say that RISC-V is either at, or very close to that point where somebody’s going to be willing to take a gamble on it and build something for a supercomputer.”

On June 22, 2020, Japan’s Fugaku supercomputer, powered by Fujitsu’s 48-core A64FX SoC, became the first Arm-powered supercomputer to take the top spot, at least momentarily, as the world’s fastest computer. The most powerful high-performance computers are ranked on the TOP500 list.

Performance is not the only consideration. “To be a successful HPC-ready processor requires delivery of performance, efficiency, and security at the same time as the support of an ecosystem of applications and of important leading-edge server standards,” says David Lecomber, senior director of HPC and tools for Arm’s Infrastructure Line of Business. “When it comes to design flexibility to create this, it’s important to deliver this flexibility where it best serves the developer. For example, a stable and consistent ISA is critical to commercial HPC developers, but flexibility to design in your own choice of memory subsystem (DDR5, HBM, CXL-attached) or accelerators (on-die or PCIe/CXL-attached) is powerful.”

What does fastest mean?
Performance metrics have been changing for the industry over the past few years. While absolute performance still reigns supreme, systems are often constrained by power, and that results in architectures that are optimized for specific tasks. But it also raises questions about how to measure performance, because no machine is likely to be the fastest on every task.

For years, the industry has used the LINPACK benchmark, but that is becoming increasingly controversial, and there is no simple answer. One approach is to expand the benchmark into what is being referred to as the HPC Challenge benchmark suite. One of its originators, Jack Dongarra, professor of computer science at the University of Tennessee, has been commissioned by the U.S. government to work on this. But solving one problem creates another: the suite no longer produces a single number, which makes comparisons difficult.
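
For context on what that LINPACK number means, the HPL benchmark times the solution of a dense n x n linear system and divides a nominal operation count by the wall-clock time. The sketch below shows only that conversion; the problem size and run time are placeholders, and the solver itself (a distributed LU factorization in HPL) is omitted.

```c
/* Sketch of how a LINPACK-style figure is derived: time a dense LU
 * solve of an n-by-n system, then divide the nominal operation count
 * by the wall-clock time.  The solver is omitted; the inputs below are
 * placeholders, not measurements. */
#include <stdio.h>

static double linpack_gflops(double n, double seconds) {
    /* Dominant term of the LU operation count; HPL adds lower-order terms. */
    double flops = (2.0 / 3.0) * n * n * n;
    return flops / seconds / 1.0e9;
}

int main(void) {
    /* Hypothetical run: n = 100,000 unknowns solved in 400 seconds. */
    printf("%.1f GFLOPS\n", linpack_gflops(100000.0, 400.0));
    return 0;
}
```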

Performance is difficult to measure for other reasons, as well. Throughput and latency often are traded against each other, and that is not confined to supercomputers. One system may produce a single answer faster, while another produces a series of answers in less total time, even though you wait a little longer for the first one.
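
A pair of purely illustrative numbers makes the trade-off concrete: a latency-oriented system wins on the first answer, while a pipelined, throughput-oriented one wins on the batch.

```c
/* Illustrative latency-vs-throughput comparison.  System A is serial but
 * quick to answer; system B takes longer to produce its first result but
 * is pipelined.  All times are made up for the example. */
#include <stdio.h>

int main(void) {
    int batch = 100;                        /* number of answers needed     */
    double a_first = 1.0, a_per = 1.0;      /* A: 1.0 s per answer, serial  */
    double b_first = 2.0, b_per = 0.25;     /* B: slow start, fast pipeline */

    double a_total = a_first + (batch - 1) * a_per;   /* 100.00 s */
    double b_total = b_first + (batch - 1) * b_per;   /*  26.75 s */

    printf("First answer:   A %.2f s vs. B %.2f s\n", a_first, b_first);
    printf("All %d answers: A %.2f s vs. B %.2f s\n", batch, a_total, b_total);
    return 0;
}
```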

With applications now able to scale to more than 1 million cores on the commercial cloud, building an HPC for sheer size is no longer the issue. The issue is time to results, and that is especially true for tasks that require results as close to real-time as possible. That means HPC probably will continue to be used for tasks like financial trading, where beating your opponent by even the slightest margin means you win and they lose, sometimes involving vast sums of money.
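
The strong-scaling arithmetic behind that shift is Amdahl's law: even a small serial fraction caps how much additional cores can shorten a single job, which is why time to results, rather than raw machine size, is the hard part. The parallel fractions below are illustrative only.

```c
/* Amdahl's law: with parallel fraction p, N cores can speed up a single
 * job by at most 1 / ((1 - p) + p / N).  Values below are illustrative. */
#include <stdio.h>

static double amdahl(double p, double n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    /* Even a 99.9%-parallel code tops out near 1,000x on a million cores. */
    printf("p = 0.999,  N = 1e6: %.0fx speedup\n", amdahl(0.999, 1.0e6));
    printf("p = 0.9999, N = 1e6: %.0fx speedup\n", amdahl(0.9999, 1.0e6));
    return 0;
}
```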

Balancing the system
Building any computer requires that many factors are properly balanced. “When you look at HPC, it often focuses on things like clock speeds, number of cores, scalability of the cores with the associated interconnect,” says Frank Schirrmeister, vice president of solutions and business development at Arteris IP. “But memory bandwidth, power efficiency, the ability to add your own vector instructions are equally important.”

It has to be looked at as a dataflow problem. “The data starts someplace, it has to be loaded out of memory into a processor, worked on by a processor or accelerator, and then be put back into memory,” says Synopsys’ Aitken. “It is that whole pathway where the bottlenecks exist. The ‘uncore’ is a key part of that, the memory system is a key part of that. You have to identify where the bottlenecks are in the system architecture while solving a specific task. This is independent of the CPU. In the enterprise space, the world is investigating this for RISC-V and working on it, but it’s not necessarily there yet.”
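
A common way to see where that pathway chokes is a roofline-style check: compare a kernel's arithmetic intensity (floating-point operations per byte moved) against the machine's ratio of peak compute to memory bandwidth. The hardware figures in the sketch below are placeholders, not numbers for any particular RISC-V part.

```c
/* Roofline-style check: a kernel whose arithmetic intensity (flops per
 * byte moved to/from DRAM) falls below the machine's balance point is
 * limited by the memory system, not by the cores.  Hardware numbers are
 * placeholders for illustration. */
#include <stdio.h>

int main(void) {
    double peak_gflops = 2000.0;    /* hypothetical per-socket FP peak    */
    double mem_bw_gbs  = 200.0;     /* hypothetical DRAM bandwidth, GB/s  */
    double balance = peak_gflops / mem_bw_gbs;        /* 10 flops per byte */

    /* DAXPY (y = a*x + y): 2 flops for every 24 bytes moved. */
    double intensity = 2.0 / 24.0;                    /* ~0.083 flops/byte */

    double attainable = (intensity < balance)
                      ? intensity * mem_bw_gbs        /* memory-bound  */
                      : peak_gflops;                  /* compute-bound */
    printf("DAXPY attains ~%.0f of %.0f peak GFLOPS\n", attainable, peak_gflops);
    return 0;
}
```

With those placeholder numbers, a bandwidth-starved kernel reaches well under 1% of peak, which is exactly the kind of uncore and memory-system bottleneck being described.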

In many cases, it is in the uncore that the real innovation happens. “When looking at a cluster, you have many processors being interconnected,” says Arteris’ Schirrmeister. “This is where you have to consider the scalability of the cores, and that means co-optimizing the cores and the interconnect. RISC-V gives you the freedom to innovate at that level, probably a bit better than some of the standard licenses. But it’s a lot of work, and it’s certainly not trivial. And it’s part of the secret sauce on how that cluster will work when integrated.”

Many tasks these days, like AI/ML, are powered by custom accelerators, and the general-purpose cores may be doing little more than scheduling and coordination tasks. “You are going to have to do domain-specific acceleration or use various accelerators to handle the growing computation that’s needed in these data centers,” says Travis Lanier, vice president at Ventana. “You are not going to be able to do that with generic CPUs.”

Others agree. “Core performance is table stakes,” says Arm’s Lecomber. “An HPC-ready CPU needs good vector performance and memory bandwidth per core. Finally, but equally critical, HPC-ready CPUs need to deliver efficiency. Developers need programming efficiency to extract the most performance from available cores and accelerators. Rack-level and data center power efficiency is becoming a limiting design and operational factor.”
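
On the vector side, RISC-V's answer is the ratified RVV extension. As a flavor of what a vector-length-agnostic HPC kernel looks like, below is a minimal DAXPY (y = a*x + y) sketch using the RVV C intrinsics. It assumes the RVV 1.0 intrinsics naming supported by recent GCC and LLVM toolchains (older releases used unprefixed names) and is an illustration rather than tuned code.

```c
/* Vector-length-agnostic DAXPY using the RISC-V Vector (RVV) C
 * intrinsics.  Assumes RVV 1.0 intrinsics as shipped in recent GCC/LLVM;
 * compile with something like -march=rv64gcv. */
#include <riscv_vector.h>
#include <stddef.h>

void daxpy(size_t n, double a, const double *x, double *y) {
    for (size_t vl; n > 0; n -= vl, x += vl, y += vl) {
        vl = __riscv_vsetvl_e64m8(n);                    /* elements this pass */
        vfloat64m8_t vx = __riscv_vle64_v_f64m8(x, vl);  /* load x chunk       */
        vfloat64m8_t vy = __riscv_vle64_v_f64m8(y, vl);  /* load y chunk       */
        vy = __riscv_vfmacc_vf_f64m8(vy, a, vx, vl);     /* y += a * x         */
        __riscv_vse64_v_f64m8(y, vy, vl);                /* store y chunk      */
    }
}
```

Because the loop asks the hardware for a vector length on every pass, the same binary runs unchanged on cores with narrow or wide vector registers, one of the properties that makes the extension attractive for HPC-class designs.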

How silicon performs is not just about the ISA or even the RTL. “If you look at any IP, their success is often about the connections to the physical tools, the physical awareness of things,” says Schirrmeister. “Even for our part, the interconnect, which is a portion of the system, requires co-optimization of the IP with the implementation flow to get the right performance and power. The same thing is true for RISC-V to make it HPC-ready. It’s not that easy, but there have been announcements for processors that seem to be going directly against some of the other cores in the data center.”

Performance is not just dependent on hardware. It can take a long time to port and optimize software for a given piece of hardware, and that requires the right ecosystem. “Arm was very smart about how they prepared the ecosystem,” adds Schirrmeister. “Ecosystems are centered around the different architectures, like x86, Armv9, and now RISC-V. It always has taken a while for those ecosystems to be ready and have everything supported. It all takes time to develop and stabilize. I would say it’s probably early days for RISC-V. Yes, the momentum is big, and we are getting there potentially much faster than in the past. RISC-V benefits from what happened with Arm, because you can learn from what it took for them to get a foot in the door.”

Industry support
There is clearly work to be done to get RISC-V ready for HPC. To help facilitate the discussion and the necessary work, RISC-V International has created a Special Interest Group on High Performance Computing (SIG-HPC). The group’s goals are to address the requirements of the HPC community and to align the RISC-V ISA with them. According to its website, the group started by defining its scope, and its interests were rank-ordered to deliver high-impact results, from discovery and gap analysis through implementation. Accomplishing this requires two things: plotting a path to becoming competitive, and then extending that path to lead the community with new features and capabilities.

There are also many things happening in the industry that show where several companies are heading. Intel has invested heavily in the Barcelona Supercomputing Center, announcing a €400 million investment in a new lab dedicated to developing RISC-V processors and supercomputing. However, Jeff McVeigh, vice president and general manager of Intel’s Supercomputing Group, said in a related press release that “RISC-V for HPC is still many years away.”

The lab’s objective is to build zettascale-class systems, three orders of magnitude faster than today’s exascale machines, and to do it within five years.

Another developer of high-performance processors, MIPS, announced last year that it had switched to developing processors based on RISC-V. The company has since announced availability of its first core based on the RISC-V ISA, which currently is being licensed for applications such as automotive driver assistance and autonomous driving. But MIPS says that core also could be used for data centers, storage, and high-performance computing.

As in software development, being 90% there means the job is only half finished. As Tom Cargill of Bell Labs once famously said, “The first 90% of the code accounts for the first 90% of the development time. The remaining 10% of the code accounts for the other 90% of the development time.”
