SPONSOR BLOG

No One-Size-Fits-All Approach To RISC-V Processor Optimization

Combining different levels of configuration and customization to meet PPA goals.


As the demand for high-performance processors continues to grow and semiconductor scaling laws continue to show their limits, processor optimization becomes inevitable. As I explained in a previous blog, RISC-V is designed to enable this. However, there is no one-size-fits-all approach to processor optimization. Because each workload and each application has its own requirements, there are different ways to optimize: you can modify a processor IP at different levels, each with its own benefits. In this blog post, let’s define and explore these levels of processor optimization. From configuration to customization, let’s see how you can use them to create optimized processors that meet specific requirements.

We can define three levels of processor optimization, each with different benefits and use cases. The three levels are not mutually exclusive: you can combine them to achieve your power, performance, and area (PPA) goals.

3 levels of processor customization. Source: Codasip

Configuration: setting the RTL parameters of a standard core to pre-defined values

Each processor IP comes with a set of adjustable, pre-defined parameters. They are delivered with default values that you can change to suit your particular use case. These parameters are typically set, and easily modified, at RTL level. This level of optimization is very common and widespread in the industry. Such parameters may include, for example, the number of interrupts, the presence or absence of a simple feature, or a cache size.
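To make the idea concrete, here is a minimal Python sketch of a configuration step, assuming a hypothetical core whose RTL exposes three parameters (`NUM_INTERRUPTS`, `HAS_DEBUG`, `ICACHE_KB`). The names, defaults, and allowed values are illustrative assumptions, not the parameters of any real Codasip core; the point is that each parameter has a default and only a pre-defined set of values is accepted.

```python
from dataclasses import dataclass

# Hypothetical parameter set for an illustrative core (not a real Codasip IP).
ALLOWED_ICACHE_KB = (0, 4, 8, 16, 32)


@dataclass(frozen=True)
class CoreConfig:
    num_interrupts: int = 16       # default value delivered with the IP
    has_debug_module: bool = True  # presence or absence of a simple feature
    icache_kb: int = 8             # cache size, restricted to listed values

    def __post_init__(self) -> None:
        # Configuration only accepts pre-defined values; reject anything else.
        if not 1 <= self.num_interrupts <= 64:
            raise ValueError("num_interrupts must be in 1..64")
        if self.icache_kb not in ALLOWED_ICACHE_KB:
            raise ValueError(f"icache_kb must be one of {ALLOWED_ICACHE_KB}")

    def to_rtl_defines(self) -> dict[str, int]:
        # Values that would override the corresponding RTL parameters.
        return {
            "NUM_INTERRUPTS": self.num_interrupts,
            "HAS_DEBUG": int(self.has_debug_module),
            "ICACHE_KB": self.icache_kb,
        }
```

For example, `CoreConfig(num_interrupts=32).to_rtl_defines()` keeps the other defaults and returns `{'NUM_INTERRUPTS': 32, 'HAS_DEBUG': 1, 'ICACHE_KB': 8}`, while `CoreConfig(icache_kb=12)` raises `ValueError` because 12 KB is not one of the offered sizes.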

This level of tuning is expected of any processor IP and is, of course, available with the standard Codasip RISC-V cores that we deliver as RTL. The IP is fully verified, which simplifies its integration. However, the range of parameters and possible values is limited, and so is the design exploration space.

Providing these parameters is necessary but insufficient to create a truly unique product tailored to your specific needs. Why? Because the set of options is limited, and because they are implemented at RTL level, which is notoriously difficult to parametrize. Configuration therefore gives you only limited control over the final design.

Advanced configuration: big structural changes to adapt your design

Let’s now look at the next level beyond configuration. At a high level, the concept looks similar, but the idea here is to enable larger, more complex parameters that result in significantly different RTL. Examples of such configuration options include:

  • The addition of caches and TCMs
  • The presence of a floating-point unit
  • The presence of a branch predictor

This flexibility is less common for processor IP. All Codasip RISC-V cores are designed in a high-level language called CodAL and can be configured with Codasip Studio. You simply select your advanced parameters from the configurator GUI, and the tool generates RTL that contains only your optimized configuration.

Under the hood, the CodAL source code of the processor features all the options presented to the user. The Codasip Studio tool then synthesizes CodAL into RTL.
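The principle can be sketched in a few lines of Python. This is a toy model, not Codasip Studio or CodAL: the hypothetical `FULL_SOURCE` table stands in for a source description that contains every optional unit, and the hypothetical `generate_units` function keeps only the units whose option is enabled, much as a generator emits RTL for the selected configuration only.

```python
# Toy model: every unit the "source" knows about, mapped to the option that
# controls it (None means the unit is always present). All names are hypothetical.
FULL_SOURCE = {
    "core_pipeline": None,
    "icache": "has_icache",
    "dcache": "has_dcache",
    "itcm": "has_tcm",
    "dtcm": "has_tcm",
    "fpu": "has_fpu",
    "branch_predictor": "has_bpred",
}


def generate_units(options: dict[str, bool]) -> list[str]:
    """Return the units that would appear in the generated RTL."""
    units = []
    for unit, option in FULL_SOURCE.items():
        # Mandatory units are always emitted; optional ones only when enabled.
        if option is None or options.get(option, False):
            units.append(unit)
    return units
```

With `{'has_fpu': True, 'has_tcm': True}`, the result is `['core_pipeline', 'itcm', 'dtcm', 'fpu']`: the caches and the branch predictor are simply absent from the output, rather than present and disabled.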

You can pick from a large list of CodAL configuration options, and you do not need any specific knowledge of CodAL (even though this C-like programming language is easy and straightforward). This is a great step toward a product tailored to your specific application. And when you are ready to go one level up and optimize both hardware and software, you can do that from the same source code.

Customization: deeper processor IP optimization

Here we enter another level of optimization. Designers actually modify the IP to reach a higher level of efficiency for targeted applications. This is the realm of custom compute.

Customization of a Codasip RISC-V core means fine-grained modification of the IP: you can modify anything you need in the source code, at both the architecture and microarchitecture levels. Instead of just modifying existing parameters, you can now add or remove instructions, change the register set, or add completely new capabilities or interfaces. Codasip Studio’s profiling function points to potential areas for improvement and gives you very fast feedback on how your application performs with these modifications, which is essential to iterate quickly and obtain optimal results.
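As an illustration of adding an instruction: the RISC-V ISA reserves major opcodes for custom extensions (custom-0 is `0b0001011`), and an R-type instruction packs `funct7`, `rs2`, `rs1`, `funct3`, `rd`, and `opcode` into fixed bit fields. The Python sketch below encodes such an instruction. The multiply-accumulate instruction it builds, with `funct3 = 0` and `funct7 = 0`, is a hypothetical example, not an instruction from a Codasip core.

```python
# Major opcode reserved by the RISC-V base ISA for custom extensions.
CUSTOM_0 = 0b0001011


def encode_r_type(opcode: int, rd: int, funct3: int, rs1: int, rs2: int, funct7: int) -> int:
    """Pack the fields of a 32-bit RISC-V R-type instruction."""
    fields = (("opcode", opcode, 7), ("rd", rd, 5), ("funct3", funct3, 3),
              ("rs1", rs1, 5), ("rs2", rs2, 5), ("funct7", funct7, 7))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    # Layout: funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


# Hypothetical custom MAC instruction: rd = x10, rs1 = x11, rs2 = x12.
mac_word = encode_r_type(CUSTOM_0, rd=10, funct3=0, rs1=11, rs2=12, funct7=0)
print(f"{mac_word:#010x}")  # prints 0x00c5850b
```

As a sanity check, the same function with the standard OP opcode (`0b0110011`) produces `0x00c58533`, the encoding of `add a0, a1, a2`; the hypothetical custom word differs only in its opcode field.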

Starting from a verified RISC-V core also makes a customization project faster and can significantly reduce the verification effort, which is typically the most time-consuming task in a design project. Full optimization of a Codasip RISC-V core, done in Codasip Studio with CodAL, is a practical way to get custom compute for your application. The entire design flow is automated, and the tool automatically generates an SDK and HDK that match your custom core. No need to create everything by hand!

Automated approach to custom compute. Source: Codasip

Example use case of processor optimization

Let’s imagine you want to optimize a processor for specific machine learning workloads, such as convolutional neural networks (CNNs).
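For reference, the computational core of a CNN convolution layer is a nested multiply-accumulate (MAC) loop. The Python sketch below is a naive single-channel version (stride 1, no padding) that also counts MACs; it is illustrative only and is not the benchmark code used in this use case. Like most CNN frameworks, it computes cross-correlation, so the kernel is not flipped.

```python
def conv2d_valid(image: list[list[int]], kernel: list[list[int]]) -> tuple[list[list[int]], int]:
    """Naive 2D convolution (stride 1, no padding); returns (output, MAC count)."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out: list[list[int]] = []
    macs = 0
    for y in range(ih - kh + 1):
        row = []
        for x in range(iw - kw + 1):
            acc = 0
            # Inner MAC loop: the hotspot an accelerator would target.
            for ky in range(kh):
                for kx in range(kw):
                    acc += image[y + ky][x + kx] * kernel[ky][kx]
                    macs += 1
            row.append(acc)
        out.append(row)
    return out, macs
```

For a 4×4 image and a 3×3 kernel, it produces a 2×2 output and performs 2 × 2 × 9 = 36 MACs. In a real layer the count also scales with the number of input and output channels, which is why this inner loop is a natural target for a tightly integrated accelerator.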

With the significant shift toward device-level AI processing, the ability to run AI/ML tasks has become a must-have when selecting an SoC or MCU for IoT applications. But embedded devices are typically resource-constrained, which makes it difficult to run AI algorithms on them.

Using the Codasip L31 RISC-V core and Codasip Studio, we can explore and customize the processor design to improve its efficiency when running ML algorithms. Profiling tools enable designers to compare the performance of a standard and an optimized core, highlighting the benefits of custom instructions for NNs.
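At its simplest, profiling attributes execution time (or cycles) to functions and ranks them. Here is a minimal Python sketch, assuming you already have per-function cycle counts from a profiler or simulator; the input format and the `rank_hotspots` name are hypothetical simplifications, not Codasip Studio’s output or API.

```python
def rank_hotspots(cycles_by_function: dict[str, int]) -> list[tuple[str, float]]:
    """Return (function, share of total cycles), largest share first."""
    total = sum(cycles_by_function.values())
    if total <= 0:
        raise ValueError("no cycles recorded")
    ranked = sorted(cycles_by_function.items(), key=lambda item: item[1], reverse=True)
    return [(name, cycles / total) for name, cycles in ranked]
```

The top entry is the candidate for acceleration, and its share of total cycles also bounds how much accelerating that function alone can help.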

Our approach was to tune the processor at different levels:

Our approach to processor optimization for ML workloads (use case). Source: Codasip

By profiling the benchmark applications on image recognition, we confirmed that image convolution is a major bottleneck, taking more than 89% of CPU time. Fewer than 200 lines of CodAL code were sufficient to implement a convolution accelerator tightly integrated into the Codasip L31 core. With less than 10% impact on maximum frequency, this modification provided a more than 5× performance increase and more than 3× lower energy consumption. Codasip Studio automatically generated an optimized compiler, enabling this efficiency improvement without changing the software!
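A quick Amdahl’s-law calculation puts these numbers in perspective. If a fraction p of run time is accelerated by a factor s and the rest is unchanged, the overall speedup is 1 / ((1 - p) + p / s). With p = 0.89, the ideal ceiling is 1 / 0.11 ≈ 9.1×, and reaching 5× overall requires the convolution itself to run roughly 9.9× faster. This is an idealized model (same clock frequency, remaining code untouched), sketched here for intuition; it is not how the measured results above were obtained.

```python
def amdahl_speedup(p: float, s: float) -> float:
    """Overall speedup when a fraction p of run time is sped up by factor s."""
    if not 0.0 <= p <= 1.0 or s <= 0.0:
        raise ValueError("need 0 <= p <= 1 and s > 0")
    return 1.0 / ((1.0 - p) + p / s)


def required_local_speedup(p: float, target: float) -> float:
    """Factor s needed on fraction p to reach an overall target speedup."""
    # Rearranged from target = 1 / ((1 - p) + p / s).
    slack = 1.0 / target - (1.0 - p)
    if slack <= 0.0:
        raise ValueError("target is at or above the Amdahl limit 1 / (1 - p)")
    return p / slack


p = 0.89  # share of CPU time spent in convolution (lower bound from profiling)
print(f"ceiling: {1.0 / (1.0 - p):.2f}x")
print(f"needed for 5x overall: {required_local_speedup(p, 5.0):.2f}x")
```

Because the profiled share is "more than" 89%, the real ceiling is somewhat higher and the required local speedup somewhat lower than these figures.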

We describe all the details of this use case in a dedicated technical paper.

Combining processor optimization approaches

As we said, there is no one-size-fits-all approach to processor optimization. Processor IP can be modified at different levels, each bringing different benefits. It is usually the combination of all of them that helps you achieve optimal PPA for the unique product you are developing.


