Does the world need another CPU architecture when the CPU no longer reflects the typical workload? Perhaps not, but it may need a bridge to get to where it needs to be.
The industry is increasingly talking about benefits brought by the RISC-V architecture, but is it even the right starting point? While it may not be perfect, it may provide the flexibility necessary to move forward gradually.
Computer architectures and software have followed in the footsteps of processors developed 80 years ago. They aimed to solve sequential, scalar arithmetic problems using a foundational technology that could solve any finite problem, provided it had sufficient memory.
The chip industry has shown its reluctance to move away from that approach, particularly in industries that still run software developed 50 years ago. Software paradigms and backward compatibility have a huge influence. It took decades before programming migrated from single-processor to multi-processor architectures in any meaningful way, and it wasn’t until NVIDIA developed CUDA that applications targeting massively parallel processors started to appear outside of highly specialized niches.
So why does the industry need another CPU architecture when a typical workload for many applications is a dataflow problem, with a small amount of control? The answer lies in the many ways in which the RISC-V architecture can be evolved, and some of these may enable a slow migration rather than attempting to cross a wide chasm.
A couple of weeks ago, the RISC-V community held its annual summit in Europe, showcasing the expanding range of ways in which RISC-V is being used and its growing impact, particularly in application domains that do not have a lot of legacy software holding them back. “RISC-V is no longer just the tiny embedded microcontroller that you were not even aware of, that is in your product,” says Andrea Gallo, CEO of RISC-V International. “RISC-V has entered a different phase.”
This is backed up by others. “Compared to previous years in Europe, there was much more industrial attendance,” says Larry Lapides, executive director of business development at Synopsys. “The content was coming more from industry, and that has changed the tenor of the conference. What we’re seeing is that RISC-V is starting to intrude in places that we didn’t expect it to even five years ago.”
Gallo provided several examples of significant progress. “Infineon said they are adopting RISC-V for automotive,” he said. “The European Union is funding HPC projects and funding automotive projects. Meta is using it for AI in their accelerator cards. And NVIDIA estimated that last year, in 2024, they shipped 1 billion RISC-V cores in their GPUs.”
Others highlighted the significant advances of RISC-V in cutting-edge processors. “It is not only the low end, but processors serving as a co-processor or accelerator for GPUs in certain segments,” says Mohit Wani, principal product manager at Synopsys. “NVIDIA gave a presentation where they said they have more than 30 different functionalities across their portfolio where RISC-V-based cores are doing the job.”
Some business and technical hurdles still must be overcome. “Automotive is increasingly looking into RISC-V,” says Roland Jancke, design methodology head in Fraunhofer IIS’ Engineering of Adaptive Systems Division. “They have been hesitant because RISC-V is an open architecture. In the automotive industry, you always need someone to blame if something goes wrong. This is hard if you do not have a single provider, but a community. Today, they are increasingly looking into RISC-V due to the potential cost reduction, because in automotive you look for every cent to lower the price. However, you need to have an ecosystem. It is not enough to have the tools that are able to develop your processor, but also the software on top of that. RISC-V is gaining ground, but it still has a way to go until we have RISC-V processors as the main workhorse in the automotive area.”
Not everyone sees this as being highly noteworthy, however. “RISC-V is not a solution for AI — not for training, not for inference,” says Steve Roddy, chief marketing officer at Quadric. “RISC-V is just another control CPU in the same vein as the Arm, x86, MIPS, Xtensa, and ARC processors. The latter two also provide the designer with instruction-set customization ability similar to, and superior to, that of RISC-V. As such, RISC-V offers nothing of a technical nature that is leaps and bounds better than its predecessors.”
Reconciling these two extreme views requires a long-term understanding of the way the industry works. “RISC-V has the ability to enable AI evolution,” says Venki Narayanan, senior director for systems architecture and embedded solutions in the FPGA division of Microchip Technology. “It needs that. It’s evolving. The models are evolving, both at the learning and inferencing levels. It needs various data types, various memory elements, local memory and to be able to do more custom computing in a much faster way. If you look at that, there are various ways of doing it. RISC-V is enabling that with domain-specific architectures.”
Unique opportunity
It is rare in the chip industry that an application emerges for which there is no legacy software. Nevertheless, that is what happened with AI. Moreover, given the extreme rate at which the technology is evolving, no software is likely to become established before it is again disrupted. This creates the perfect environment for continuous evolution and adaptation.
“With no legacy software to support, you really can tailor your processor to the workload and leave out things you don’t need,” says Synopsys’ Lapides. “There’s still a lot of optimization that can be done at the architectural level, even at a micro-architectural level, designing pipelines and caches and memory. And that’s all before you get to anything else, before you get to the implementation.”
The openness that comes with RISC-V enables architectural freedom. “AI accelerator cards can have a huge number of RISC-V cores, even different cores within the same cluster,” says RISC-V International’s Gallo. “You can have hundreds of little blocks, each of them different RISC-V cores, some dedicated to moving data in and out, and some dedicated to the inference itself and the processing. Then you have the concept of adding custom instructions that allow you to have tensor units that are very efficient. This is how RISC-V is influencing the architecture of the chips.”
While RISC-V’s licensing model provides a cost advantage, that is not the only reason to adopt it. “It goes way beyond that,” says Synopsys’ Wani. “If you look at the way accelerators connect to a processor, it is through a fast interface, and typically in a memory-mapped fashion. From a developer perspective, if you want work done on the accelerator, you send the data and the task information to the accelerator through the interface, and then you’ll wait for the accelerator to send your results back.”
But that communication is expensive, and it means cores are sitting idle. “You are wasting 30% of your time sending data and getting the results back,” says Wani. “You can avoid all of that time if you can natively do those particular operations, sending data through your own vector pipeline, which then connects to the accelerator directly. This kind of flexibility only exists in the RISC-V world.”
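To make that overhead concrete, here is a minimal sketch of the conventional memory-mapped offload pattern Wani describes. The register map, addresses, and copy-and-poll protocol are hypothetical, invented purely for illustration; a real accelerator would typically use DMA and interrupts.

```c
#include <stdint.h>

/* Hypothetical register map for a memory-mapped accelerator.
 * All addresses and field layouts are illustrative only. */
#define ACCEL_BASE   0x40000000UL
#define ACCEL_DATA   ((volatile uint32_t *)(ACCEL_BASE + 0x000))
#define ACCEL_CMD    ((volatile uint32_t *)(ACCEL_BASE + 0x100))
#define ACCEL_STATUS ((volatile uint32_t *)(ACCEL_BASE + 0x104))
#define ACCEL_RESULT ((volatile uint32_t *)(ACCEL_BASE + 0x108))
#define CMD_START    1u
#define STATUS_DONE  1u

/* Offload one task: copy the inputs across the bus, ring the
 * doorbell, then spin until the accelerator reports completion. */
static uint32_t accel_offload(const uint32_t *in, int n)
{
    for (int i = 0; i < n; i++)
        ACCEL_DATA[i] = in[i];      /* data crosses the interface */

    *ACCEL_CMD = CMD_START;         /* doorbell: start the task */

    while ((*ACCEL_STATUS & STATUS_DONE) == 0)
        ;                           /* core sits idle, polling */

    return *ACCEL_RESULT;           /* results cross back */
}
```

Every input crosses the interface on the way out and every result on the way back, and the core contributes nothing while it polls. That is the time a tightly coupled vector pipeline or custom instruction can reclaim.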
Quadric’s Roddy is not convinced. “All control CPUs suffer from the same severe limitations when applied to AI applications, particularly inference applications,” he says. “CPUs are designed to chase pointers in random code. They are not matrix or tensor engines. At best, these CPUs offer vector * vector computation throughput. They are bound by conventional load/store bandwidth bottlenecks. Thus, every approach purporting to use RISC-V for AI must bundle in a separate matrix engine, which introduces graph partitioning problems that are the Achilles’ heel of CPU-based approaches. The appropriate innovation to tackle AI is the creation of architectures that are inherently (matrix * matrix) or (tensor * tensor) optimized, and that break the CPU-centric dependency on memory caching and speculative out-of-order pipelines.”
RISC-V enables pieces of this, wrapped in a conventional control processor. “Some of our members already have custom tensor instructions,” says Gallo. “That is the flexibility of RISC-V. You can develop your custom instructions for your specific workloads, and then you carry the complete cost of ownership for your custom specific application. There is also value in standardizing and extending the specifications so that we share the cost of maintaining the compilers and the tool chains and the libraries. We have vector, we’re working on matrix, and there will be different approaches to matrix depending on application use cases. If it is an accelerator card, if it is an AI IoT application or edge AI, there will be different ways of implementing matrix acceleration.”
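As a sketch of what such a custom extension looks like at the software level, the fragment below invokes a made-up instruction in the RISC-V custom-0 opcode space through the GNU assembler’s .insn directive, which emits an arbitrary R-type encoding without modifying the compiler. The funct field values and the dot-product-step semantics are assumptions for illustration, not part of any ratified extension.

```c
#include <stdint.h>

/* Hypothetical custom instruction in the RISC-V custom-0 opcode
 * space (major opcode 0x0B). The .insn directive lets us emit the
 * encoding from standard inline assembly; the funct3/funct7 values
 * and the "dot-product accumulate" behavior are invented here. */
static inline uint32_t tensor_dot_step(uint32_t acc, uint32_t packed)
{
    uint32_t rd;
    asm volatile(".insn r 0x0B, 0x0, 0x00, %0, %1, %2"
                 : "=r"(rd)
                 : "r"(acc), "r"(packed));
    return rd;
}
```

Only a core whose decoder implements this encoding gives it meaning; on any other RISC-V core it traps as an illegal instruction. That is why, as Gallo notes, the owner of a custom instruction carries its full cost of ownership unless the extension is eventually standardized.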
With no other candidates on the horizon, you have to make do with what is available. “Data flow is very important in AI, and in many compute elements,” says Microchip’s Narayanan. “The compute requirements have grown, and that needs to be met in a more power-efficient way. It’s not just the micro-architecture of the instruction fetch, execute, and write-back. It’s how you organize the micro-architecture and the data flow. How do you get data in and out — particularly large amounts of data going from one layer to another. You can’t keep going and writing back to DDR.”
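One common answer to that last point is layer fusion: each tile of one layer’s output is consumed by the next layer while it is still in local memory, so the intermediate values never make the round trip to external DRAM. The sketch below is illustrative only; the scratchpad, tile size, and the two toy layers are assumptions, and a real design would use tightly coupled memory and DMA.

```c
#include <stdint.h>

#define TILE 64

/* Hypothetical local scratchpad; in a real design this would be
 * tightly coupled SRAM next to the compute units, not a C array. */
static int32_t scratch[TILE];

/* Fused execution of two back-to-back layers. The layer-1 output
 * tile stays in the scratchpad and is consumed immediately by
 * layer 2, so only true inputs and outputs touch external memory. */
void fused_layers(const int32_t *in, int32_t *out, int n)
{
    for (int t = 0; t < n; t += TILE) {
        for (int i = 0; i < TILE; i++)   /* layer 1: toy scaling op */
            scratch[i] = in[t + i] * 3;
        for (int i = 0; i < TILE; i++)   /* layer 2: ReLU activation */
            out[t + i] = scratch[i] > 0 ? scratch[i] : 0;
    }
}
```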
With AI evolving so rapidly, there needs to be a lot of flexibility. “The types of models that one needs to implement in an efficient manner are growing,” says Nilam Ruparelia, segment leader for AI and communications at Microchip. “Transformers are what ChatGPT needs, and this is what made AI popular, but there is a whole set of models that are a lot less complex than transformers. Those also need higher performance. For example, classic CNNs, RNNs, and LSTMs need the math block architecture, the DSP block architecture, to be more amenable to those models in order to make the performance a lot better.”
AI is not just one problem, and flexibility remains important. “AI has multiple layers,” says Narayanan. “You have segmentation, object detection, classification, transformers, and all of those use different data types. How are these layers being implemented? How can you do that efficiently, and how do you get the data in and out between the layers? Those are the questions that you have, and this is how the architecture helps you build that.”
It includes not only the operations but the data. “RISC-V has an inherent advantage, where you can make custom hardware without violating the ISA specification,” says Microchip’s Ruparelia. “You can make processing for a specific data type better, be it at the instruction level or with the microarchitecture, so you are building a custom computing solution for that class of workload. This flexibility plays a big role in being agile toward new data types as they come along, and they will be coming along in the near future for different classes of applications.”
But the full requirements for a processor to optimize the tasks of tomorrow remain unknown. “If we are building a processor for edge inferencing, it may be doing classification, detection, segmentation, or even some sort of transformer,” says Narayanan. “These are fixed layers, and the compute elements are designed to be optimized for those workflows. We build a computer optimized for the needs of today, but that doesn’t mean that if you have a new layer, or new operator types, or something like that, we can’t do it. It’s just that it’s not going to be as efficient as if you designed it that way.”
Ecosystem
The ecosystem has been a major obstacle for adoption in some application areas, but steady progress is being made. “We upgraded our membership in the Yocto project to the platinum level,” says Gallo. “This is a very strong message to the ecosystem. Yocto is the most pervasive embedded Linux distribution. This is the first time in many years that the Yocto project added a new ISA, a new architecture, and being a platinum member means that RISC-V will be on par with the other architectures. Yocto is used not only for embedded Linux, edge AI, and IoT, but also for consumer set-top boxes and TVs, and for automotive on the infotainment side.”
Mobile is catching up. “Google has made RISC-V a first-class citizen as far as Android development is concerned,” says Wani. “As the software stack matures, we will also see entry level application cores in popular segments like mobile, laptops and so forth.”
When enough leaders step in, the pack follows. “Red Hat announced that they have a developer preview of RHEL for RISC-V,” says Gallo. “Fedora is available for RISC-V. Last January, at FOSDEM in Europe, Canonical disclosed that they are building plans to support the RVA23 profile with Ubuntu. The message is that RISC-V is ready for application processors and for standard operating systems.”
Significant investments are being made. “Another interesting project in Europe is the DARE project,” says Lapides. “That is injecting €260 million to €280 million over the next five years. The three key vendors are each building chiplets, not just processor IP. One is a general-purpose CPU, one is a vector accelerator, and one is an AI accelerator. Those chiplets will be integrated. RISC-V with a new chiplet architecture is interesting.”
Conclusions
RISC-V may not be the perfect solution for many applications, especially those related to AI, but it could provide an evolutionary path. In addition, there is nothing else on the horizon that can provide a better alternative.
Very few revolutions have succeeded in the semiconductor industry. But with an open and growing community that rapidly evolves its definition of what it needs today, what it thinks it might need tomorrow, and what lies on the horizon, RISC-V may be able to get to where it ultimately needs to be without taking on too much risk. By taking small steps, the ecosystem can keep up, and steady progress is made.
Related Reading
RISC-V Heralds New Era Of Cooperation
Collective risk taking and pooling of knowledge are only possible with a framework for collaboration. But will it continue?
When To Expect Domain-Specific AI Chips
With the intended application evolving faster than silicon can be developed, optimizing hardware becomes a delicate balance.