Who Does Processor Validation?

Heterogeneous designs and AI/ML processing expose the limitations of existing methodologies and tools.


Defining what a processor is, and what it is supposed to do, is not always as easy as it sounds. In fact, companies are struggling with the implications of hundreds of heterogeneous processing elements crammed into a single chip or package.

Companies have extensive verification methodologies, but not for validation. Verification is the process of ensuring that an implementation matches a specification. Validation, by comparison, is making sure the specification itself is right and fit for purpose. An important aspect for processors is whether they correctly execute code that targets the specification and meets the requirements.

It takes Arm more than 11,000 pages of dense text to define its family of processors. And that only works because it controls everything about those processors, starting with the instruction set and the micro-architecture. Even the more recent forms of extensibility are very constrained, and architectural licensees only get to change the micro-architecture.

For an extensible processor architecture, such as RISC-V, the notion of validation is a herculean task because of the degrees of freedom that are central to its concept. “An ARC or Arm processor comes as a piece of RTL with a defined instruction set, and you know that software is going to work,” says Johannes Stahl, senior director of marketing at Synopsys. “The validation has been done for you by the IP provider. The second type of processor is where you have an instruction set, and an architecture license that allows you to implement your own RTL. That adds more degrees of freedom, and more ropes to hang yourself for verification. Another degree of freedom comes when you don’t have a fixed instruction set. You’re building a new processor, which is an application-specific processor. There are tools that help with that. And then maybe the fourth category involves processors with a completely new architecture. These are not instruction-set processors. They are parallel processors, processing AI workloads.”

Depending upon how many degrees of freedom you want, you take on additional responsibilities. “Processor validation confirms that it is a RISC-V or an Arm,” says Simon Davidmann, founder and CEO for Imperas Software. “Then there is verification. Does my RTL match the reference? And then the third thing that needs to be confirmed is, does my system function? There’s a blur between validation, conformance, and verification in that validation, or conformance to the standard, says it is a RISC-V, and verification shows that it has been implemented correctly. Everybody has to do both.”

Conformance is a subset of validation. “System readiness is almost like a certification,” says Frank Schirrmeister, senior group director for Solutions & Ecosystem at Cadence. “It defines a set of test cases that must run correctly. On top of that, for any modifications you made, you need to validate those yourself.”

There is no single tool that can perform processor validation. Instead, it requires a flow that often stretches from abstract virtual prototypes to physical prototypes and spans hardware, software, and the firmware in between. Unlike verification, which uses coverage to define closure, it is not clear what validation coverage looks like. This creates problems when a conformance test suite is being constructed.

In both cases it is difficult to define what performance, power, and other non-functional requirements mean unless they are provided in the context of a particular workload. “When something is really flexible and can be implemented in many ways, everybody has their own variation,” says Synopsys’ Stahl. “Everybody is acutely aware that you can write a C model, and you can run the workload through the C model. However, the key measures of success for an AI engine are power/performance figures of merit. The only way to do this effectively is to run workloads that are meaningful, and that means enough cycles. This can be done with emulation to validate the workload when it is passed through the compiler and executed on the RTL. Once you have done that, you have the activity profile of what this workload does on the architecture, and you can take that and calculate power — hopefully very fast — and do iterations on optimization.”
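As a rough illustration of that last step, the sketch below turns a per-block activity (toggle-count) profile from an emulation run into an average power number and an efficiency figure of merit. The file format, block names, energy-per-toggle coefficients, and clock frequency are all assumptions made for illustration, not output of any particular tool.

```python
# Minimal sketch (hypothetical file format and coefficients): turning an emulation
# activity profile into power/performance figures of merit for an AI workload.
import csv

CLOCK_HZ = 1.0e9              # assumed target clock frequency
ENERGY_PER_TOGGLE_J = {       # assumed per-toggle energy by block type
    "mac_array": 1.2e-13,
    "sram": 0.8e-13,
    "noc": 0.5e-13,
}

def figure_of_merit(profile_csv: str, inferences: int, cycles: int) -> dict:
    """profile_csv rows: block_type,toggle_count (no header), from an emulation run."""
    energy_j = 0.0
    with open(profile_csv) as f:
        for block, toggles in csv.reader(f):
            energy_j += int(toggles) * ENERGY_PER_TOGGLE_J[block]
    seconds = cycles / CLOCK_HZ
    return {
        "avg_power_w": energy_j / seconds,
        "inferences_per_joule": inferences / energy_j,
        "inferences_per_second": inferences / seconds,
    }

# Example: figure_of_merit("resnet50_profile.csv", inferences=1000, cycles=2_500_000_000)
```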

Validation goals
The chip industry has not had to deal with processor validation because when buying an IP core from a company like Arm, that already has been done for you. “Verification and validation are the same for Arm because they design and own the ISA, they design and write the RTL, and they build the reference model,” says Imperas’ Davidmann. “They validate the ISA to the reference model. And then they verify the RTL matches that. By definition, the RTL has been validated. Because Arm owns it and controls it, that’s a bounded problem. Just to make sure, before you tape out, you can confirm it is still an Arm, using validation technology provided by Arm.”

Levels of completion for validation may be different depending on whether you are creating a general-purpose processor or one that is deeply embedded. “You cannot believe how much verification is done on processor cores that are provided as IP,” says Stahl. “But in other cases, you might say it is enough that it runs this workload and works in my SoC. That’s all I need.”

Often, the reason for defining your own processor is to reach a highly optimized power or performance profile for your application. “Overall system performance needs to be evaluated and tuned, and that is not just about providing a processor to the software guys,” says Frederic Leens, CEO for Exostiv Labs. “It is all about going through hardware/software design optimization cycles to reach the best performance in your niche. Multi-core designs can make software performance difficult to evaluate, even if you can profile software properly. That would allow you to get execution ‘hot spots,’ and capture execution latencies and processor waiting/idle times. But optimization will require tuning the underlying platform together with software modifications so you can reach a real optimum.”
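A minimal sketch of the kind of profiling data that feeds those optimization cycles is shown below. It aggregates per-core busy and idle time and ranks execution hot spots from a multi-core trace; the trace format and field names are hypothetical.

```python
# Minimal sketch (hypothetical trace format): estimating per-core utilization and
# execution hot spots from a multi-core software profile.
from collections import defaultdict

def summarize(trace):
    """trace: iterable of (core_id, function, start_cycle, end_cycle, state),
       where state is 'busy' or 'idle'."""
    busy = defaultdict(int)   # cycles spent executing, per core
    idle = defaultdict(int)   # cycles spent waiting/idle, per core
    hot = defaultdict(int)    # cycles attributed to each function
    for core, func, start, end, state in trace:
        duration = end - start
        if state == "busy":
            busy[core] += duration
            hot[func] += duration
        else:
            idle[core] += duration
    utilization = {c: busy[c] / (busy[c] + idle[c]) for c in busy}
    hotspots = sorted(hot.items(), key=lambda kv: kv[1], reverse=True)[:10]
    return utilization, hotspots
```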

Few people have the ability to juggle so many things at the same time. “You have to verify the processor and the software together, especially in the context where you have AI involved, or very specific workloads,” says Cadence’s Schirrmeister. “You are balancing, ‘Can I do something? Can I extend the processor instruction set, and in the overall balance of hardware and software get a better end objective like better power or better performance?'”

But what makes it so difficult for RISC-V? “RISC-V has a small instruction set, but a very dangerous series of words appears throughout the documents: this can be implementation-defined, this is implementation-defined,” says Davidmann. “So it doesn’t define a lot of the choices that the users make. What happens when these two interrupts happen and debug mode comes in is implementation-defined. What this means is that it is very hard to prove or demonstrate conformance. So the challenge is, does it conform?”

New architectures
Many companies are defining novel non-von Neumann processors today. “This is very different than what the industry has done in the past,” says Jean-Marie Brunet, vice president of product management and product engineering at Siemens EDA. “Most designs are a derivative of the previous one. They are developed as an N+1 cycle, re-using a lot of the environment, and so the methodology doesn’t have to change much. But two things happened recently. The first is the insertion of new algorithms. AI inference, machine learning — they are a new architecture. They are more cluster-based, and for each cluster the memory and the computing are much closer together. They grow in function by increasing the number of clusters, rising to a very large scale. This architecture is very different when compared to a traditional CPU or GPU.”

A second difference is that they exercise different workloads, says Brunet. “These workloads are exercising new paths. They are exercising new functions in the device. If a methodology, like power and performance, is tailored toward traditional architectures, I can rely on monitoring what I know matters on silicon. Those types of techniques are absolutely failing right now, because the preconceived notion that I know what I need to look at for performance, and I know when I need to look at power, is failing.”

The newness of the architecture requires a different approach to validation. “If I want to investigate a new architecture with new processing elements, like an AI processor, I need to look at the performance of the entire system,” says Stahl. “For a single RISC-V core, you probably don’t get involved in that level of system performance. What is being investigated at an architectural level is the AI part of the neural network processor, not the RISC-V processor.”

You need to be careful about what can be done in a virtual environment. “Shift left means you want to run an actual workload on the processor as early as possible,” says Brunet. “You may be able to get away with a virtual model of the processing engine, but for the AI engine that will be at the RT level. What you verify is what is new, and the more derivative type of things can be in a virtual model. This reduces the amount of RTL, so you can run faster sooner, and then you can look at the workload.”

Several companies have created tool chains that start from a definition of an ISA and micro-architecture, and generate the models and the RTL. “You simulate at multiple abstraction levels,” says Gert Goossens, senior director for ASIP tools at Synopsys. “One abstraction level is the input of the tools, which is more like the algorithmic descriptions that you have in languages like C or C++. You execute them natively on the x86 host computer, and then you use those simulation or execution results as a reference. Then you use a tool to automatically provide you with the compiler so that you can compile those applications onto your ASIP architecture. The tool generates an instruction-set simulator. Now you simulate the behavior of the generated code on the processor, and it has to match the reference you got from the native execution. That is the kind of comparison that you do for verification. Finally, you have the Verilog model, which is automatically generated, and you can execute that in an RTL simulator, and you again check the results by comparing it against the reference.”
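The sketch below captures the shape of that flow: each application kernel is run natively to produce a golden reference, then compiled and run on the generated instruction-set simulator, and finally executed on the generated RTL, with both results compared against the reference. The tool command names (asip_cc, asip_iss, rtl_sim) and file names are placeholders, not real tool invocations.

```python
# Minimal sketch (hypothetical tool commands and file names): checking that the
# generated compiler + instruction-set simulator and the generated RTL reproduce
# the native reference results for each application kernel.
import subprocess
import filecmp

APPS = ["fir", "fft", "conv2d"]  # hypothetical C kernels

def run(cmd: str) -> None:
    subprocess.run(cmd, shell=True, check=True)

for app in APPS:
    # 1. Native x86 execution produces the golden reference output.
    run(f"gcc {app}.c -o {app}_host && ./{app}_host > {app}.ref")
    # 2. Generated cross-compiler and instruction-set simulator (names assumed).
    run(f"asip_cc {app}.c -o {app}.elf && asip_iss {app}.elf --out {app}.iss")
    # 3. Generated Verilog executed in an RTL simulator (command assumed).
    run(f"rtl_sim --image {app}.elf --out {app}.rtl")
    # Both must match the native reference bit-for-bit.
    assert filecmp.cmp(f"{app}.ref", f"{app}.iss", shallow=False), f"{app}: ISS mismatch"
    assert filecmp.cmp(f"{app}.ref", f"{app}.rtl", shallow=False), f"{app}: RTL mismatch"
```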

Depending upon the size of the processor, hardware-assisted verification may be necessary. “Emulators are used for the architectural exploration and verification, and a little bit of validation, but less so on validation,” says Brunet. “For validation you need workloads, and those workloads are an order of magnitude more complicated now than they were 5 to 10 years ago. How those workloads behave in silicon will dictate whether the silicon is going to be a successful commercial play or not. You need validation, you need cranking cycles. This requires a different category of hardware, probably FPGA prototyping or enterprise prototyping. On those you are cranking a lot of workload validation in parallel, where many of those workloads are new to this type of design. You are tracking whether the performance of the device matches what the workload is looking for. When you find an issue, you go back to an emulator to debug.”

This is not the same focus as verification. “Verification requires a clinical approach,” says Schirrmeister. “Can I access these memories? Have you covered all the code? Do I have the right bandwidth for all the data to be brought into the CNN, DNN? Then there is a question of the function that the network performs, the AI performance itself. Is everything connected? Does it compute? The functionality is almost like verifying that the processor works without running software on it. Real workloads cause a new set of verification challenges.”

Extending the focus
The huge cranking power is key. “FPGA prototypes that run at, or close to, the target speed of operation are the only way to run a meaningful number of cycles with the system prior to delivering it,” says Exostiv’s Leens. “Another reason for adopting FPGA prototyping as part of the flow is that you can hardly place an emulator into a realistic environment. A lot can be said and done about testbenches, models, and slowing down the environment to accommodate the slower speed of emulators and prototypes. We hear engineers wanting to interoperate with the ‘real’ environment at speed, because the hard truth is that models are flawed and that fundamentally a processing system is ‘emergent,’ which means that its operation as a whole can reveal unsuspected behaviors – especially when reaching realistic speeds of operation.”

To get the necessary level of visibility requires considering the whole system. “You really can’t look at the processor just by itself,” says Schirrmeister. “You have to include the interconnect to the environment, and the compiler in the middle matters, because that defines how the function is actually mapped to the core. You have to, in this particular case, elevate the testing beyond a specific compile of a very specific software routine, because the processor needs to be generic. You need to extract the characteristics of the workloads to then specifically verify the processor is good for that set of characteristics from an I/O perspective and so forth.”

The compiler is part of the system optimization. “We see this intersection of workload, RTL, and compiler in spades for the AI engines, or AI accelerators,” says Stahl. “It really matters, because when you do a new neural network engine you will have a new compiler for that engine. There’s a lot of work that is unique for the compiler. And that’s where people spend a lot of time also validating the compiler together with the workload.”

Another way people monitor processor performance is with on-chip monitors. Again, the approach differs depending on whether the design is incremental or fundamentally new. “The methodology associated with this has been around for 30 or so years,” says Brunet. “You are inserting logic, or monitoring functionality or capability, in your device. For most derivative architectures, they know historically what they want to monitor because they have done so much post-silicon characterization. That methodology is challenged when you have a new architecture, because you don’t have that post-silicon characterization. You don’t have years of knowledge. Large processor companies have perfected this over the years, but someone has to do the study at the architectural level and determine what to monitor.”
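One simple way such monitors are used, sketched below, is to compare counter readings sampled from silicon against the ranges predicted pre-silicon for the same workload, flagging where a new architecture behaves unexpectedly. The counter names and expected ranges are invented for illustration.

```python
# Minimal sketch (hypothetical counter names and bounds): comparing on-chip monitor
# readings from silicon against ranges predicted pre-silicon by emulation runs
# of the same workload.
EXPECTED = {                      # assumed ranges from pre-silicon runs
    "noc_stall_cycles":  (0, 2_000_000),
    "dram_bw_bytes":     (1_000_000_000, 4_000_000_000),
    "cluster0_mac_util": (0.60, 0.95),
}

def check_monitors(readings: dict) -> list[str]:
    """readings: counter name -> value sampled from on-chip monitors."""
    flags = []
    for name, value in readings.items():
        lo, hi = EXPECTED.get(name, (float("-inf"), float("inf")))
        if not (lo <= value <= hi):
            flags.append(f"{name}={value} outside expected [{lo}, {hi}]")
    return flags
```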

RISC-V provides an interesting opportunity because some of these validation problems are being addressed in a public forum. “RISC-V needed a golden model, which everybody can read and see and understand,” says Davidmann. “The committee chose a language out of Cambridge University called Sail, which is being used to describe a formal model. This is the golden reference. If anybody needs to know how something is meant to work, they should be able to look at that model, push it and prod it, and see what it does. But it is hard, and they are relying on volunteers to do it. This work is years away from being complete.”
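One common use of such a golden model is lock-step comparison: the device under test and the reference model each retire an instruction, and the architectural results are compared at every step. The sketch below shows that comparison in outline; the event format and field names are assumptions rather than any standard interface.

```python
# Minimal sketch (hypothetical event format): lock-step comparison of a DUT's
# retired-instruction trace against a golden reference model such as the
# Sail-based RISC-V model.
def compare_step(dut_event: dict, ref_event: dict) -> list[str]:
    """Each event describes one retired instruction: program counter, instruction
       word, destination register, and the value written to it (all integers)."""
    mismatches = []
    for field in ("pc", "insn", "rd", "rd_value"):
        dut_v, ref_v = dut_event[field], ref_event[field]
        if dut_v != ref_v:
            mismatches.append(f"{field}: dut={dut_v:#x} ref={ref_v:#x}")
    return mismatches

# Usage: step the RTL simulation and the reference model together and call
# compare_step() after every retirement; the first non-empty result pinpoints
# where the implementation diverges from the specification.
```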

Conclusion
Validation is essential, and in a world now dominated by domain-specific processors and novel architectures, it is becoming just as important as verification. Today, the industry is applying tools and techniques that were developed for verification to the validation problem, but that only gets it so far. Established processor IP companies have spent years developing formal, or formalized, definitions of their processors, and tool chains then attempt to generate implementations from those definitions, or to verify that implementations match them.

If more custom architectures are required in the future to solve specific workload problems, then better approaches will be necessary.



