RISC-V Micro-Architectural Verification

Verifying a processor involves much more than making sure the instructions work, yet the industry is building from a limited knowledge base and with few dedicated tools.


RISC-V processors are garnering a lot of attention due to their flexibility and extensibility, but without an efficient and effective verification strategy, buggy implementations may create problems across the industry.

Prior to RISC-V, processor verification had almost become a lost art for most semiconductor companies. Expertise became concentrated in the few commercial companies that provided processors or processor IP, which often developed their own in-house flows and tools. But the advent of the open-source RISC-V ISA, and a proliferation of open-source implementations, has spawned a lot of interest, along with the need for appropriate tools and expertise.

In the past year, several new extensions were announced by RISC-V International. In addition, users are encouraged to make their own extensions and modifications. On top of this, there are many ways in which a core can be implemented, with the specification often leaving details open. However, while it may be a relatively quick and easy process to develop these extensions, it is not so easy to verify them.

There are few standards or open-source tools that help with processor verification. “RISC-V is an open ISA,” says Pete Hardee, group director for product management at Cadence. “Anyone can take it and implement a processor. But the leaders in the RISC-V market know that just because they don’t need to pay license royalties, it doesn’t mean RISC-V is the cheap option. There can be no short-cuts for verification if you want to be successful with RISC-V.”

This goes way beyond testing instructions. “A lot of people are naive in thinking that verifying a processor is testing that the instructions work,” says Simon Davidmann, CEO of Imperas Software (now part of Synopsys). “They go and build a test generator that does this, or does that, or bring in a compliance suite, but the real issues are to do with micro-architecture, pipelines, asynchronous events, and all these sorts of things. There is no standard way, or even public discussion, about the complexities of verification of the micro-architecture.”

Others agree. “There is some level of underestimation of the verification required for a given CPU,” says Ravindra Aneja, director of applications engineering in Synopsys’ EDA Group. “RISC-V brings in a lot of control and data path complexity. It provides a huge spectrum of complexity, based on what application you’re targeting, what instruction set you’re looking at, whether you’re going from 32-bit to 64-bit, or which extension of the core you are considering.”

The control path certainly is getting a lot more complex. “We are just starting to see implementations include speculative execution, and out-of-order execution is coming,” says Cadence’s Hardee. “These are common techniques to provide greater performance for more challenging workloads, and will open up RISC-V as a serious challenger to established processor architectures in server-class devices. But these techniques also lead to security flaws that can be exploited, such as Spectre and Meltdown.”

Still, the chip industry can be a quick learner. “People working on RISC-V come from traditional CPU architecture backgrounds,” says Synopsys’ Aneja. “It could be Intel, AMD, or Arm, and they are now working on RISC-V. They are re-using whatever has been learned, and applying this to make sure you’re not reinventing the wheel. That’s why the community has brought up this ecosystem with relatively fast turnaround.”

But it’s not easy. “The verification of RISC-V micro-architectures stands at a crossroads, balancing the benefits of openness and flexibility against the challenges of diversity and complexity,” says Andy Nightingale, vice president of product marketing at Arteris. “As the ecosystem matures, addressing these challenges will be crucial for establishing RISC-V as a reliable and secure alternative to other micro-architectures. With a community-driven approach, continued innovation, and a focus on standardization, the verification landscape for RISC-V undoubtedly will evolve and rise to the challenge.”

Beyond random
Many techniques have been developed for ASIC verification. “Processor verification is different from regular ASIC verification,” says Hardee. “It’s harder. Remember, the AS in ASIC stands for application-specific. Fully verifying a chip for its intended application is finite and bounded. Processor verification is not. Every operation in the processor instruction set architecture (ISA) must be verified to provide the specified behavior in every eventuality — every combination of instructions. In general-purpose applications, that cannot be predicted at the time of verification of the processor IP.”

SystemVerilog and UVM are the ASIC verification workhorses. “UVM is a great way for doing random instruction generation,” says Charlie Hauck, CEO at Bluespec. “But it has limitations. For example, coverage. If you say you have 100% coverage on the ‘add instruction,’ you must have a different meaning of coverage than me, because I don’t believe you covered every possible combination. Plus, a typical way of feeding stimulus to a processor is to generate a bunch of random instructions. You may be able to shape those instructions based on particular areas that you want to test, but that’s a very inefficient way of trying to verify a processor. It will flush out some of the easy bugs, but you’ve got to get a lot more efficient.”
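
To make Hauck’s coverage point concrete, consider a minimal SystemVerilog sketch. The instruction fields and opcode list below are hypothetical simplifications of RV32I, not any particular testbench. The coverpoint can reach 100% once each opcode has been randomized a single time, which says nothing about operand corner cases or instruction sequences:

// Minimal sketch of a constrained-random instruction item. Closing
// cp_opcode only shows each opcode was generated at least once; it says
// nothing about operand values, hazards, or neighboring instructions.
class rv32i_item;
  typedef enum {ADD, SUB, AND, OR, XOR, LW, SW, BEQ} opcode_e;
  rand opcode_e opcode;
  rand bit [4:0]  rd, rs1, rs2;   // register indices
  rand bit [31:0] imm;            // immediate value

  covergroup cg;
    cp_opcode : coverpoint opcode;  // trivially easy to close
  endgroup

  function new();
    cg = new();
  endfunction
endclass

Calling cg.sample() after each successful randomize() records the opcode, but the “coverage” measured here is only decoder stimulus diversity.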

Several instruction generators do exist. “We have tried out five different test generators,” says Imperas’ Davidmann. “If you consider one with a bit of directed random, where you can give it guidance to test a 32-bit add instruction, with hazards, it will take hundreds of thousands of instructions. There are better approaches than simple random. You also will need streams of asynchronous events interacting with instructions. Test generators are evolving. Anybody can write a simple random test generator in an afternoon. There’s a whole host of technologies that are evolving in the test generation area of RISC-V, and they are getting more sophisticated.”
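
One way generators get more sophisticated than pure random is by biasing toward known stress cases. A hedged sketch, reusing the hypothetical rv32i_item above, weights generation toward back-to-back read-after-write dependencies:

// Hypothetical sketch: force the next instruction to read the register
// the previous one wrote, creating a read-after-write (RAW) hazard.
class raw_hazard_pair;
  rand rv32i_item prev, next;

  constraint raw_c {
    prev.rd inside {[1:31]};   // avoid x0, whose writes are discarded
    next.rs1 == prev.rd;       // consume the freshly written register
  }

  function new();
    prev = new();
    next = new();
  endfunction
endclass

Randomizing raw_hazard_pair repeatedly emits dependent instruction pairs far more often than unguided random would, which is the kind of guidance Davidmann describes.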

But the question remains: Is it enough? “A lot of customers are doing constrained random, and that provides a lot of value,” says Aneja. “Based on experience with traditional CPU providers, and what we see in RISC-V cores, my sense is that simulation-based verification is not going to be enough. And that’s why you have to look at alternative approaches like formal verification. It can target both the control path complexity, which brings in a lot of concurrency, as well as the data path complexity. I see user community people talking about verifying data paths with random simulation, and that gives me pause.”

Bottom up
Processors are verified in a bottom-up manner, similar to how systems are verified today. “Processor sub-units include branch prediction, parts of a pipeline, or any type of memory system like a cache,” says Bluespec’s Hauck. “When we can explain what a particular device under test is supposed to do, we can capture that as properties and define a vocabulary of commands. I can create a generator that creates a sequence of these commands. It will keep adding to this sequence until it finds some sequence that breaks with respect to a golden reference model. Then it starts to shrink the sequence by removing commands that do not affect its ability to reproduce the bug. This is a big benefit, not only for finding bugs, but for diagnosing, debugging, and fixing them as well. This tends to work very well for a large class of sub-units.”
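
The shrinking step Hauck describes is essentially delta debugging applied to command sequences. A minimal sketch, assuming a hypothetical cmd_t type and a run_against_golden() function that returns 1 when the DUT matches the golden model:

// Hypothetical sketch of the shrink loop: try dropping each command in
// turn; keep any shorter sequence that still diverges from the golden
// reference model, and repeat until no single removal helps.
task automatic shrink(ref cmd_t seq[$]);
  bit removed = 1;
  while (removed) begin
    removed = 0;
    for (int i = 0; i < seq.size(); i++) begin
      cmd_t trial[$];
      trial = seq;
      trial.delete(i);                      // drop one command
      if (!run_against_golden(trial)) begin // still mismatches
        seq = trial;                        // keep smaller failing case
        removed = 1;
        break;                              // rescan the new sequence
      end
    end
  end
endtask

The surviving sequence is close to minimal, which is what makes the failure easy to diagnose and fix.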

This also tends to be where formal methods shine. “If you’re doing a RISC-V, you will have a prefetch buffer, you have ALUs, you have your register models, you have multipliers, a load store unit,” says Aneja. “For control paths, simulation has its challenges. But when you go to data path, it’s a totally different scale, and you are not going to be anywhere close to exhaustive with simulation. Instead, you can write properties for those functions and then verify it with formal. Some people are doing it with constrained random, and they get good coverage. But if you don’t use formal you run the risk of leaving some corner case there, which can come and bite you.”
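
For a datapath block, “writing properties for those functions” often means a single end-to-end SystemVerilog assertion that a formal tool can prove exhaustively. A hedged sketch, assuming a hypothetical two-cycle pipelined multiplier with 32-bit operands and a 64-bit result:

// Hypothetical sketch: capture the operands as they enter the pipeline,
// then require the full 64-bit product two cycles later. A formal tool
// explores every operand pair; random simulation cannot come close.
property mul_correct;
  logic [31:0] a_q, b_q;
  @(posedge clk) disable iff (!rst_n)
  (valid_in, a_q = a, b_q = b) |-> ##2 (result == 64'(a_q) * 64'(b_q));
endproperty
assert_mul: assert property (mul_correct);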

Fig. 1: Using formal to verify processor sub-units. Source: Synopsys

With the sub-units verified, they can be integrated. The last thing you want to be doing is finding an ALU bug while booting Linux.

Now, a more mixed verification strategy is required. “Formal is useful since, fundamentally, the formal tools exercise every possible combination of inputs to break the ISA-specified behavior, which generally is captured as SystemVerilog assertions,” says Hardee. “The major processor vendors also have extensive verification suites, including UVM testbenches and test software. Emulation is necessary for complete verification of all the elements of a large processor, and to ensure correct behavior integrated into a SoC, while also allowing test software to be executed on the processor under test.”
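
At the processor level, the ISA-specified behavior Hardee mentions can be captured as assertions checked at retirement. A hedged sketch over an RVFI-style trace port (the rvfi_* signal names are assumptions):

// Hedged sketch: architectural check at retirement. Whenever an ADD
// (opcode OP, funct3=000, funct7=0000000) retires to a non-x0 register,
// the written value must equal the sum of the source operands.
wire retire_add = rvfi_valid
               && (rvfi_insn[6:0]   == 7'b0110011)  // OP
               && (rvfi_insn[14:12] == 3'b000)
               && (rvfi_insn[31:25] == 7'b0000000)
               && (rvfi_insn[11:7]  != 5'd0);       // rd != x0
assert_add: assert property (@(posedge clk)
  retire_add |-> (rvfi_rd_wdata == rvfi_rs1_rdata + rvfi_rs2_rdata));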

Properties developed earlier for sub-units can still be used. “Micro-architectural verification happens in two ways,” says Ashish Darbari, founder and CEO of Axiomise. “The first method picks up bugs automatically when architectural verification assertions and covers fail in formal verification. RTL implementation of architecture can cause architectural violations, which are easily picked up as functional bugs — or even as safety ones, or security issues via the confidentiality-integrity-availability triad. The second way of enforcing rigor with micro-architectural verification is to shower checks and covers all across the RTL interfaces and let formal verification tools pick up failures across different functional design components. This method has the additional value of increasing bug hunting, as well as increasing proof convergence via compositional reasoning and increasing overall coverage.”
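
In practice, showering checks and covers across RTL interfaces maps onto assume/assert pairs: a property proven as an assertion on the block driving an interface is reused as an assumption when verifying the block that consumes it. A minimal sketch with hypothetical handshake signals:

// Hypothetical sketch of compositional reasoning on an internal interface.
property req_stable_until_ack;
  @(posedge clk) disable iff (!rst_n)
  (req && !ack) |=> (req && $stable(req_data));
endproperty

// Proven once in the run where the producer (e.g., decode) is the DUT:
assert_req: assert property (req_stable_until_ack);

// Reused as a constraint in the run where the consumer (e.g., issue) is the DUT:
assume_req: assume property (req_stable_until_ack);

// Covers confirm the interface is actually exercised:
cover_handshake: cover property (@(posedge clk) req && ack);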

And eventually, someone will want to boot Linux. “It is amazing how many bugs you can have in a core and still boot Linux,” says Hauck. “You find all kinds of things that you don’t find in other verification, like lots of asynchronous effects. There are timers that go off. There are even things that can drive you crazy, like the difference in the timing bases between a simulation versus an FPGA-based emulation.”

Most people verify a processor by comparing what an implementation does against a golden model. “Some people think by just comparing instruction traces that they are doing a good job,” says Davidmann. “But this approach starts to have real issues when you bring in things like asynchronous events, multi-issue pipelines, or out-of-order execution. Plus, the specification is not precise in every aspect. It doesn’t say what happens when six interrupts of the same priority all happen at once. Which one does the micro-architecture choose to take, and at which stage in the pipeline? There are many implementation choices made in the RTL, in the pipeline, in the micro-architecture, which will be different from core to core.”

There are some things that can be done to help. “When you get timer interrupts, you may get out of sync between the reference model and the actual DUT,” says Hauck. “But if your DUT is good, and your golden reference model is indeed a golden reference model, you will always execute the same number of instructions if you take that timer out of the equation. So instead, I may have a timer interrupt every 5,000 retired instructions, not every 1 million clock cycles.”
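
A minimal sketch of that trick, assuming the DUT exposes a retirement strobe (signal names hypothetical): tie the interrupt to the retired-instruction count rather than the cycle count, so the DUT and the golden reference model take it at the same architectural point.

// Hypothetical sketch: pulse the timer interrupt every 5,000 retired
// instructions instead of every N clock cycles.
int unsigned retired_cnt;
logic        timer_irq;

always_ff @(posedge clk or negedge rst_n) begin
  if (!rst_n) begin
    retired_cnt <= 0;
    timer_irq   <= 1'b0;
  end else begin
    timer_irq <= 1'b0;                  // one-cycle pulse
    if (dut_instr_retired) begin
      if (retired_cnt == 4999) begin
        timer_irq   <= 1'b1;            // deliver to DUT and model alike
        retired_cnt <= 0;
      end else begin
        retired_cnt <= retired_cnt + 1;
      end
    end
  end
end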

One of the big attractions of RISC-V is that you are free, and almost encouraged, to modify the processor so it is better suited to your specific application. “The challenge is every single thing you put in doubles the verification, doubles the complexity,” says Davidmann. “It’s very easy to add things. It’s very hard to get them out the door in high quality. A lot of people don’t realize the complexity that custom instructions add. They have to do all of that verification themselves. For everything you add, you have to completely reverify everything, and much more. You have to take into account how it affects the rest of the design, especially if it changes things in the pipeline, conflicts in the ALU, issues with the caching system, or the load store stuff.”

When are you done?
Verification is never complete. The generally accepted view is that when it has been done adequately, the remaining risk is manageable.

“Coverage tells you that you have done something, and it gives you a certain level of confidence,” says Aneja. “But it doesn’t tell you everything. It is a problem if you don’t have it, but it doesn’t guarantee there will not be problems. If you are using simulation, you can generate a lot of coverage reports, and these will give you confidence to say you have stimulated a good part of your design, that coverage proves it, and that you have found a certain class of bugs. But with this complexity, that coverage is not going to be sufficient.”

But processor coverage is unique. “Everybody is talking a different language when they’re talking about coverage,” says Hauck. “It’s pretty easy to go through and touch every one of the 432 million instruction variants that exist. But at that point, it just says you tested your decoder. You didn’t really test anything else in terms of sequences of instructions, combinations that you see, and what might happen to a pipeline. Some people are happy to generate 4 billion instructions, but perhaps you can do better by sitting down with the designer and talking about the pipeline. Identify the things they are really worried about, and focus on the combinations of instructions that are more dangerous than others.”
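
One way to encode those more dangerous combinations is to cover consecutive opcode pairs at retirement, rather than single opcodes. A hedged sketch (opcode_e and the signal names are hypothetical):

// Hypothetical sketch: cross back-to-back opcodes as seen at retirement.
// Closing this says far more about the pipeline than per-opcode coverage.
opcode_e prev_op, curr_op;

always_ff @(posedge clk)
  if (instr_retired) begin
    prev_op <= curr_op;
    curr_op <= retired_opcode;
  end

covergroup cg_pairs @(posedge clk iff instr_retired);
  cp_prev : coverpoint prev_op;
  cp_curr : coverpoint curr_op;
  pair_x  : cross cp_prev, cp_curr;  // e.g., load followed by dependent op
endgroup
cg_pairs cov = new();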

It has to go beyond pure function, as well. “With new custom instructions and items such as vector extensions being introduced in RISC-V, it’s important to know how micro-architectural decisions affect the full SoC and the workloads running on them,” says Andy Meier, principal product marketing manager for Siemens EDA. “Hardware-assisted verification, such as virtual prototype capabilities, emulation, and hardware prototyping are critical components in the overall verification flow. These technologies help to ensure that RISC-V micro-architectural decisions don’t have negative impacts on power and performance tradeoffs.”

When safety is concerned, more rigor is required. “Depending on the level of certification required for the end product, there is a certain level of fault coverage they have to achieve,” says Aneja. “It requires you to inject well-defined faults, which are defined by ISO 26262 for functional safety. You do fault analysis, and then you generate diagnostic coverage. If you insert these faults for critical functions in your design, do you have the safety mechanism built in to take care of those faults, depending on the criticality of those faults?”
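
In simulation, such fault injection often amounts to forcing a node and checking that the safety mechanism reacts within its diagnostic window. A minimal hedged sketch, assuming a lockstep comparator and hypothetical signal and hierarchy names:

// Hypothetical sketch: inject a stuck-at-0 fault on one ALU result bit,
// then require the safety mechanism to flag it within 100 cycles.
initial begin
  wait (rst_n);
  repeat (1000) @(posedge clk);            // run fault-free first
  force dut.alu_result[7] = 1'b0;          // stuck-at-0 fault
  fork
    wait (lockstep_error);                 // detection event
    repeat (100) @(posedge clk);           // diagnostic window
  join_any
  disable fork;
  if (lockstep_error)
    $display("fault detected within window");
  else
    $error("fault NOT detected within diagnostic window");
  release dut.alu_result[7];
end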

When it is impossible to complete verification, heuristics tend to become important. “The overall lesson is you’ve got to keep running real software,” says Hauck. “After you run it for a certain amount of time, everything just seems to work, and it doesn’t seem to break anymore. We can’t quantify what that point is, or when it arrives.”

RISC-V’s open-source nature also exposes it to potential security risks. “While transparency allows for community-driven scrutiny, it also means that adversaries have access to the same information,” says Arteris’ Nightingale. “This necessitates robust security verification, ensuring the micro-architecture can withstand diverse attack vectors. The challenge is greater than for closed architectures, where proprietary designs can maintain secrecy around their security features.”

More dedicated tools, tuned for processor verification, are required. “When processors were designed by five different companies, there wasn’t a lot of need for test generators or formal tools around those processors, because they were all built internally,” says Davidmann. “But now, with RISC-V, there’s definitely a market for architecture analysis, verification, and formal tools around the RISC-V ISA, and many more. We built a RISC-V DV environment for verification. We’ll see people doing that for performance analysis tools and formal tools over time, but it’s early days.”

Fig. 2 (below) lists methodologies that could be considered when verifying a processor micro-architecture.

Fig. 2: Verification methodologies for processors. Source: Arteris


