A Minimal RISC-V

Is there room for an even smaller version of a RISC-V processor that could replace 8-bit microcontrollers?


Microcontrollers exist in almost everything, but can RISC-V satisfy the needs of this market? Is it small enough to replace 8-bit processors? What might help people migrate to a more modern processor architecture?

RISC-V defines a 32-bit processor instruction set architecture (ISA) that is open source and free to be implemented in any number of ways. It is touted as a very small and efficient architecture, and at the same time it has been defined to be easily extensible. Many add-ons are already approved extensions, and a large number were unveiled at the RISC-V Summit in December 2021.

But questions remain. Is the base specification small enough? Instead of adding capabilities, is there a need to remove things? Is it useful as a microcontroller? The 8-bit microcontroller market was about $8B in 2020 and is expected to grow between 4% and 5% for the foreseeable future, according to multiple industry reports. In 2014, 8-bit was still the largest segment by volume, accounting for 39.7% of sales, while 32-bit was close behind at 38.5%.

Today, the 16-bit market has become the largest, with a 48.8% share. 8-bit is gradually losing market share, but that is going to 16-bit and not necessarily 32-bit. Most of these are discrete chips, and there clearly is a large and sustainable market for small processors.

Controllers everywhere
Complex devices may contain a number of controllers, but they rarely get any attention. “There are many chips that contain several large Arm cores, being used as the applications processor, but then you also find lots of much smaller processors in there as well,” says Simon Davidmann, CEO for Imperas Software. “They are being used to do all sorts of things, and many of those may well be niche processors based around RISC-V. Nobody really knows what they are, because they are hidden.”

And those processors do not have to abide by the same rules. “You will always have a few CPUs that do the general processing and orchestrate the movement of data around a system,” says Michael Frank, fellow and system architect at Arteris IP. “They do everything that needs to be Turing-complete and programmable. But then you have a few things that are specialized. And this is the best use of silicon area, because now you can leave out anything that you do not need in the algorithm.”

Other large markets for microcontrollers include automotive, HVAC, IoT, and medical. A RISC-V core may take as little as 20,000 gates, so why would anyone worry about trying to optimize it further, when it is likely that an entire chip may be millions of gates? In some cases, cost is the most critical element, which means the smallest area possible. For others, it is power. For devices that must last months or years on a single battery, any logic that sits around doing nothing is seen as waste that has to be removed.

A minimal RISC-V
The RISC-V base is small. It contains just 47 instructions that everyone has to implement. This compares to 1,503 for an x86 and about 500 for Arm. It uses the simplest load/store architecture, which means that all operations are performed on the internal registers, and there are dedicated instructions to transfer between registers and memory.
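The load/store idea can be illustrated with a toy model. This is only a sketch: the class name ToyLoadStoreMachine and its methods are made up for illustration, it models three operations (lw, sw, add) rather than the full base instruction set, and it assumes word-aligned addresses. Arithmetic touches only registers, and memory is reached only through the explicit load and store operations.

```python
class ToyLoadStoreMachine:
    """Hypothetical toy model of a load/store machine (not a real RISC-V simulator)."""

    def __init__(self, mem_words=16):
        self.regs = [0] * 32          # x0..x31; x0 always reads as zero
        self.mem = [0] * mem_words    # word-addressed backing store

    def _write(self, rd, value):
        # Writes to x0 are discarded; results wrap to 32 bits.
        if rd != 0:
            self.regs[rd] = value & 0xFFFFFFFF

    def lw(self, rd, base, offset):
        # Load word: the only way data moves from memory into a register.
        addr = self.regs[base] + offset   # assumed word-aligned
        self._write(rd, self.mem[addr // 4])

    def sw(self, rs2, base, offset):
        # Store word: the only way data moves from a register into memory.
        addr = self.regs[base] + offset   # assumed word-aligned
        self.mem[addr // 4] = self.regs[rs2]

    def add(self, rd, rs1, rs2):
        # Arithmetic reads and writes registers only, never memory.
        self._write(rd, self.regs[rs1] + self.regs[rs2])


def run_demo():
    m = ToyLoadStoreMachine()
    m.mem[0], m.mem[1] = 5, 7
    m.lw(1, 0, 0)     # x1 = mem[0]
    m.lw(2, 0, 4)     # x2 = mem[4]
    m.add(3, 1, 2)    # x3 = x1 + x2
    m.sw(3, 0, 8)     # mem[8] = x3
    return m.mem[2]
```

Adding two values that live in memory takes four steps here (two loads, one add, one store), which is the price a load/store design pays for keeping each individual instruction simple.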

“RISC-V starts with a simple integer instruction set, basically, the bare bones of a processor already,” says Arteris’ Frank. “There is not much you can strip out of it. The implementation of the simplest RISC-V processor has 32-bit integers. This is what a microcontroller used to be in the old days. I don’t see a reason why you want to strip this down further. The Berkeley team has created a nice layered and scalable architecture. They learned from all the things they had done before and by building a number of variants, and extensions integrated into the architecture, I would always look at this as a base layer rather than something that can be cut down.”

Within the instruction set, it is possible to define the size of the register file. “A very nice aspect of RISC-V is that you can cut it down,” says Imperas’ Davidmann. “What RISC-V is trying to do, as an organization, is help people do things like that. For example, there is the E version that cuts down on the number of registers. It is usually 32, but this version only defines 16.”
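A rough back-of-the-envelope calculation shows what halving the register count means for storage. This is a sketch under stated assumptions: it counts only storage bits, it relies on register x0 being hardwired to zero (so it needs no storage), and the function name regfile_storage_bits is made up for illustration. It says nothing about gate counts, which depend on the implementation.

```python
XLEN = 32  # register width in bits for the 32-bit base ISA


def regfile_storage_bits(num_regs, xlen=XLEN):
    """Storage bits for an integer register file where x0 is hardwired to zero."""
    return (num_regs - 1) * xlen


rv32i_bits = regfile_storage_bits(32)  # standard base: x0..x31
rv32e_bits = regfile_storage_bits(16)  # E variant: x0..x15
print(rv32i_bits, rv32e_bits)          # 992 480
```

Under these assumptions the E variant needs a little under half the register storage of the standard base.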

For example, SiFive recently discussed an implementation of the RV32E that can be configured to be implemented in just 13,500 gates. ZERO-RISCY, a core developed as part of the PULP platform for energy-efficient computing, has a two-stage pipeline implementation that consumes 11,600 gates.

The specification also allows simple controllers to be defined that do not require a lot of logic to sit around the core. “They have tried to make it so that you can even design it without the privilege mode capabilities and the control/status registers, so you can get it down to a really simple controller,” says Davidmann. “This still allows you to use the standard assembler, and it is still a RISC-V, but it’s not going to do your floating point very fast. It’s just a very small controller.”

Some of these reductions have come from early work with the standard. “There can be many reasons why stripping down a big design to obtain a smaller processor would be valuable,” says Ashish Darbari, founder and CEO for Axiomise. “There are already examples from publicly available processors. For example, ZERO-RISCY and Ibex, two RISC-V cores from the PULP Platform group, are trimmed down versions of RI5CY that later became cv32e40p. In this specific case, RI5CY had custom instructions that were not part of the standard RISC-V ISA.”

One reduction that does not seem to be under consideration involves word size. “If you reduce the word size, you lose the capability and beauty of the RISC-V processor, where you can do address calculations, integer number calculations, everything in the same registers, in the same ALUs,” says Frank. “It is very hard to keep something that is useful after cutting down RISC-V. The beauty of RISC-V is there is a tool chain, and if you start cutting things out you are on your own.”

One irony is that an extension may produce savings. “If you have a bunch of processors on a chip, each one of them can be optimized or customized for specific tasks,” says Zdeněk Přikryl, CTO at Codasip. “It can be AI, it could be security, it can be whatever. We enable processors to be designed in a high-level architecture description language, by which we capture the instruction set, which could be called the architecture view, as well as the microarchitecture view, which is basically the implementation of the ISA. Since we have this single description in a high-level architecture language, we are able to generate the compilers, assemblers, disassemblers, simulators, and in the end, the RTL.”

One such extension is the compressed instruction set, which makes the code space smaller. Compressed instructions allow you to place two instructions into a single 32-bit word. This reduces the amount of program memory required, even though it adds very slightly to the complexity of the processor. One claim is that it takes 400 gates to implement this. That is probably more than made up by the reduction in area for the memory. Other attempts to do this, such as the Arm Thumb format, are essentially a different instruction set.
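A short sketch shows how 16-bit and 32-bit instructions can share one instruction stream. In the RISC-V encoding, the two lowest bits of an instruction's first 16-bit parcel mark its length: anything other than 11 is a 16-bit compressed instruction. The function names below are made up for illustration, and the sketch deliberately ignores the longer-than-32-bit formats that the encoding reserves.

```python
def instruction_length(first_parcel):
    """Length in bytes of the instruction that starts with this 16-bit parcel."""
    if first_parcel & 0b11 != 0b11:
        return 2                      # compressed (16-bit) instruction
    if (first_parcel >> 2) & 0b111 != 0b111:
        return 4                      # standard 32-bit instruction
    raise ValueError("encodings longer than 32 bits are not handled in this sketch")


def split_stream(parcels):
    """Walk a list of 16-bit parcels and return each instruction's length in bytes."""
    lengths = []
    i = 0
    while i < len(parcels):
        n = instruction_length(parcels[i])
        lengths.append(n)
        i += n // 2                   # advance by one or two parcels
    return lengths


# c.nop (0x0001), then a 32-bit nop (0x00000013, low parcel first), then c.nop again.
stream = [0x0001, 0x0013, 0x0000, 0x0001]
print(split_stream(stream))           # [2, 4, 2]
```

The three instructions occupy 8 bytes instead of the 12 they would need if each were a full 32-bit word, which is where the program memory savings come from.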

So an optimized core may not always be the smallest one. “We had one customer who started investigating the ratified RISC-V extensions, trying to find the best tradeoff,” says Codasip’s Přikryl. “They started with the baseline, then added extensions and investigated combinations of different extensions. They looked not only at performance, but also at size and memory footprint. One of the key things is that the processor is part of a system, and you are trying to optimize that system. In this case, it was important to have efficient code, because when you’re implementing a subsystem in silicon, your instruction memory is one of the key consumers of energy and power. We managed to reduce the code size by about a factor of three during the course of that optimization.”

Making changes does bring some complexity into the process. “Anybody can take RISC-V and make changes to it,” says Davidmann. “They can add bits, take the bits they like, and throw away the bits they don’t like. So long as it’s for a deeply embedded system, nobody cares what they finish up with. No one will ever see it. And that’s why, in the beginning, nobody really cared about compatibility and compliance.”

The modified cores do have to be verified. “The greatest challenge going down this path is to determine what delta changes are doing to the overall functionality of the core, as well as determining that the delta does what it is intended to,” says Axiomise’s Darbari. “These are sweet spots for formal methods that can find discrepancies by checking the architectural compliance of the trimmed down core against a set of formally specified architectural properties in formalISA, an app to formally verify RISC-V cores.”

The RISC-V ISA as specified is a minimal but complete processor architecture that can be implemented in fewer than 20K gates. The standard takes into account that in some cases only a minimal core is required, and variations and extensions have been defined to make it adaptable to issues such as code size. Extensions may produce better-optimized cores that not only get the job done faster, but also may reduce size, power, or other aspects that are important in a particular application.

In the words of John Lydgate, later adapted by President Lincoln, “You can please some of the people all of the time, you can please all of the people some of the time, but you can’t please all of the people all of the time.”

RISC-V International has done a pretty good job of keeping most people happy.



TMS-EE says:

My first job in 1984 was at TI working on the 8 bit Micro TMS7000 with colleagues in Japan. In the same dept were ppl developing 16 & 32 bit LAN & DSP chips resp. My boss said, “Don’t sniff at this as it is the company’s bread and butter and one day even your lightswitch will have a micro in it”.
We may get excited about big Intel, Nvidia and AMD chips but neglect the importance of the ST, TI and other parts on the BoM. (I also worked in auto electronics).

Speculation about the future of semi fabs like Infineon, NXP etc, is growing and the supply of parts from fabless companies continues. Now, the development of more low-energy parts like this is vital. It can even be a good skill for the future architects before doing a small role on a big chip.
I tell young engineers that Jensen has said that it takes a team of 3000 people 3 years to develop a new GPU. Lex Fridman may interview some fascinating chip architects but the tech industry is notorious for failing to develop its talent. Even the TSMC CEO has said that the US cannot catch up without more people doing PhDs.
I currently see debate about RISC-V in HPC and the growth of the Arm server market but I think more about the opportunity for techies to respond to the opportunity in Emerging technologies across many sectors of industry. A great opportunity to develop skills will lie in the problems that have been overlooked for too long.
I’m as fascinated about the development of big silicon, quantum etc as anyone, but I want to see talent grown as attitudes to the importance of the semiconductor industry have changed.
Finally, more than 35 years later, my lightswitches are still not remotely controlled yet!

Will says:

“The RISC-V base is small. It contains just 47 instructions that everyone has to implement. This compares to 1,503 for an x86 and about 500 for Arm.”

That’s incorrect – since we’re talking about tiny microcontrollers, the comparison should be with Cortex-M0, which has 56 instructions and about 12K gates.

Brian Bailey says:

That is for the Thumb instructions only, but you are right that a minimal controller may not worry about those limitations.

S. Meirowsky says:

You are deceiving readers by falsely claiming that a minimum ARM requires implementing 500 instructions. Since you are comparing against a minimum RISC-V, you need to state the minimum required to implement an ARM Cortex-M0 or M0+. Please fix this article!

Brian Bailey says:

You are correct that a Thumb-only processor, such as the Cortex-M0, is smaller and may be suitable for many controller applications. However, it then comes with restrictions, such as limited address offsets and the lack of certain control registers, that prevent it from running many operating systems. Once you remove those restrictions you need the full Arm instruction set, which is around 500 instructions.

Andrew says:

The RISC-V SERV processor is quite small and a SoC version is supposed to be built soon via Google’s MPW project. Any discussion of limited size processors really should investigate SERV.

Karl Stevens says:

The underlying thing is that RISC-V is intended to be programmed in assembly.

However, in the real world compiled languages are used, and every little change has the potential to impact the compiler. Be careful.

Also remember that every load or store accesses memory — which takes time in addition to the processor execution time.

So what? For one thing, chips have embedded memory blocks that could be allocated instead of a block of shared memory. Check out the speed!

Ragnar says:

You’re missing the point, guys.

Using a generalized microprocessor is a way to gain time.

If you really want to reduce the cost of an application, you need to go for dedicated hardware: electronic circuits.

Another tradeoff to gain time is programmable logic, a PLA.
