Cracking The Tough Nut Using Formal Methods

The EDA industry has a message for researchers in formal methods: There’s a right way and a wrong way, and if you don’t pick the right method, formal can be hard to crack.


Pranav Ashar, CTO of Real Intent, assured a packed room of researchers and practitioners of formal methods at the recent FMCAD conference: “Static verification is being used in the verification of designs. Every major chip out there is using static methods for sign-off today.” He used an analogy of cracking a nut. “There’s a right way and a wrong way, and if you don’t pick the right method, formal, just like a nut, can be hard to crack. By combining the separation of concerns and using a solution-driven approach, SoC verification becomes a simpler nut to crack.”

Ashar talked about how the design paradigm has changed over the past few years. No longer is there a single application driving the high end of design. Instead, there are many domains, including mobile, servers and communications. Even though the recently announced Apple A7 chip contains about 1 billion transistors, complexity is more than just transistor count. Systems are composed of multiple sub-systems. “There are 100 or more clock domains and high-speed asynchronous interfaces, there are voltage islands and dynamic clock and power gating, and so much more, such as timing constraints, making sure that the design comes up in a deterministic state, design for test (DFT), hundreds of IP cores and analog, all integrated into a single system on chip (SoC).”

While this is a formidable challenge, it is not insurmountable according to Ashar. “There is a system to it. This provides a clarity and separation of verification concerns. Each of these is well-defined, is a substantial sign-off requirement, and some of them are fertile ground for the application of static methods. It is the narrowing of scope that makes formal methods viable.”

To make an application viable you need three things: a spec, powerful analysis tools, and a quality user experience. “Taking a narrow focus makes the formal engines shine. As an example, if you are verifying the asynchronous transfer of data across a clock domain, the amount of logic that you need to bring in to verify the correctness of the transfer is not the complexity of the chip. It is much smaller, it is very locally focused.”

When a formal analysis does not complete, the engine has computed a lot of information internally that can be articulated to the user in a manner customized to the verification problem the user is trying to solve. Ashar believes that not enough is being done with that data today and that it puts formal at a disadvantage. He described the divide between static and dynamic methods very succinctly: “Simulation is used when you don’t understand the problem. You look at the output to make sense of things. When you understand the problem you can find better methods.”

Asynchronous clock domains
There has been an explosion in the number of interfaces in an SoC, and it is no longer possible to send the same clock to all parts of the design because the delay would exceed the clock period. With parts of the system using variable frequency, the proliferation of clock control and the desire to run each interface at its own frequency, there are typically about 100 clock domains in a chip, and that represents 100,000 signal crossings, each of which has to be verified.

This is where functionality and timing are intertwined, and verification requires that the time dimension be explored due to metastability. This has caused a breakdown in the existing flow of RTL synthesis followed by gate-level simulation or equivalence checking.

Fatal errors caused by clock domain crossings. Figure courtesy of Real Intent.
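The metastability problem described above can be modeled minimally. The sketch below, in Python purely for illustration (it is not any tool's actual algorithm), shows why a single simulation trace can miss a crossing failure: a signal sampled in its metastable window can resolve to either value, so two destination flops sampling the same crossing can disagree, while a fixed-value simulation sees only one resolution.

```python
# Toy model of metastable resolution at a clock-domain crossing.
# A sample taken while the signal is changing may resolve to 0 or 1;
# a fixed-value simulation explores only one of those resolutions.
from itertools import product

def downstream_states(resolutions):
    """Two destination flops sample the same crossing signal; each may
    resolve a metastable sample independently. Return reachable states."""
    states = set()
    for r1, r2 in product(resolutions, repeat=2):
        states.add((r1, r2))
    return states

# A fixed-value simulation sees only one resolution, so the flops agree.
print(downstream_states([1]))      # {(1, 1)}

# With metastability, either resolution is possible per flop, so the
# flops can disagree, e.g. (0, 1), which a one-hot state machine
# downstream would treat as a fatal, intermittent error.
print(downstream_states([0, 1]))   # includes (0, 1)
```

Exhaustively exploring both resolutions, as in the second call, is exactly the kind of narrow, bounded problem that static methods handle and that a single simulation trace does not.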

Clock domain issues are troublesome because the failures are intermittent. Synthesis can also insert problems when it performs retiming, so even if the RTL was good, it does not mean that all of the issues have been identified. “You can’t go after each of the problems individually, but instead you need a complete strategy for dealing with classes of problems,” Ashar told the audience. “This starts with some anchor that can be identified, and in this case it is clocks that cross a boundary.”

This is usually the location of a synchronizer. “Once the anchor has been found, identify the protocol being used and check to see that it is being used correctly. Much of this must be done automatically. As with all formal methods: when a violation is found, a compact example is created to demonstrate the problem.” Ashar made a point of noting that how the results of an analysis are presented is important. “Traces need to be minimal and make it clear to the user what is going on.”
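The anchor-then-protocol strategy can be sketched structurally. The following Python fragment, a simplified illustration and not Real Intent's implementation, finds crossings between clock domains in a toy netlist (the representation is hypothetical) and checks each against one recognized protocol, the two-flop synchronizer:

```python
# Sketch of the "anchor" idea in CDC checking: find signals that cross
# clock domains, then check each crossing against a known synchronizer
# structure. The netlist encoding here is invented for illustration.

def find_unsynchronized_crossings(flops):
    """flops: dict name -> {'clk': domain, 'fanin': [driver names]}.
    Returns (src, dst) pairs that cross domains without a sync stage."""
    violations = []
    for dst, info in flops.items():
        for src in info['fanin']:
            if src not in flops:
                continue  # primary input; out of scope for this sketch
            if flops[src]['clk'] == info['clk']:
                continue  # same domain: not a crossing
            # Crossing found (the "anchor"). Check the protocol: the
            # destination flop should feed exactly one more flop in its
            # own domain (a two-flop synchronizer).
            sinks = [f for f, i in flops.items() if dst in i['fanin']]
            ok = (len(sinks) == 1 and
                  flops[sinks[0]]['clk'] == info['clk'])
            if not ok:
                violations.append((src, dst))
    return violations

netlist = {
    'a_ff':   {'clk': 'clkA', 'fanin': []},
    'sync1':  {'clk': 'clkB', 'fanin': ['a_ff']},   # first sync stage
    'sync2':  {'clk': 'clkB', 'fanin': ['sync1']},  # second sync stage
    'bad_ff': {'clk': 'clkB', 'fanin': ['a_ff']},   # raw crossing
}
print(find_unsynchronized_crossings(netlist))  # [('a_ff', 'bad_ff')]
```

Note how the check never looks at the whole design: only the crossing and its immediate neighborhood matter, which is the narrowing of scope that makes the analysis tractable.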

Other targets for formal
Another problem that is bubbling up, according to Ashar, is timing constraints management. “In the past, getting this wrong was not critical. It was a quality-of-results issue. Today it is becoming a critical problem that can result in catastrophic failure. This requires that the constraints be ready earlier in the flow (before RTL rather than in layout), but this is often dependent on disparate groups within a company.” An example of this relates to multi-cycle paths, where setup and hold timing can be problematic between cycles. This again involves multiple clocks that could be drifting. Ashar reiterated that in this situation the spec is implicit and the scope of the problem is bounded, making it ripe for exploitation using formal methods.

“One of the areas that we focus on,” explained Ashar, “is where we can bound the problem. Block-level verification involves a lot of time developing a testbench but generally has simple debug, but when moving to the system level, the debug difficulty escalates.” The implication of this is that bugs should be found as early as possible, and formal should not be saved until all of the other verification techniques have been exploited. By using formal earlier in the process, costly simulation can be avoided because issues have already been resolved using automated methods.

Another area where formal can shine is where simulation has limited ability to handle the semantics. X propagation is one of these areas, where simulation can be both optimistic and pessimistic. This is especially a problem during initialization, when a lot of Xs exist in the circuit. Gate-level simulation is both slow and pessimistic. Static methods can do better.
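The optimism/pessimism divide can be made concrete with a 2-to-1 mux whose select is X. The Python sketch below (an illustrative model, not any simulator's code) contrasts typical RTL if/else behavior, gate-level 3-valued logic, and an exact case-split of the kind a static analysis can perform:

```python
# X semantics on a 2-to-1 mux with an unknown select.
# Values are True, False, or the string 'X' (unknown).

def mux_rtl(sel, a, b):
    # RTL simulators typically treat an X condition as false in if/else,
    # so the select silently falls through to one branch: X-optimism.
    return a if sel is True else b

def mux_gate(sel, a, b):
    # Gate-level 3-valued AND/OR/NOT. An X select makes the output X
    # even when both data inputs agree: X-pessimism.
    def and3(x, y):
        if x is False or y is False:
            return False
        return True if (x is True and y is True) else 'X'
    def or3(x, y):
        if x is True or y is True:
            return True
        return False if (x is False and y is False) else 'X'
    def not3(x):
        return 'X' if x == 'X' else (not x)
    return or3(and3(sel, a), and3(not3(sel), b))

def mux_exact(sel, a, b):
    # A static method can case-split on the X: if both settings of sel
    # give the same result, the output is actually known.
    if sel == 'X':
        r0, r1 = mux_exact(False, a, b), mux_exact(True, a, b)
        return r0 if r0 == r1 else 'X'
    return a if sel else b

# Both data inputs are 1, so the true output is 1 regardless of sel.
print(mux_rtl('X', True, True))    # True  (right here, but hides the X)
print(mux_gate('X', True, True))   # 'X'   (pessimistic)
print(mux_exact('X', True, True))  # True  (case split resolves the X)
```

The exact analysis is more expensive per gate than either simulation style, but because initialization checking is a bounded, well-specified problem, that cost stays manageable.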

X-Hazards can produce optimism and pessimism. Figure courtesy of Real Intent.

Ashar concluded by talking about how important formal methods have become in the SoC flow, and he had a message for the audience: “The FMCAD community has contributed to this outcome, but could also learn from how it came about. EDA companies are the consumers of the technology that this community has created, and yet they are just engines. If you realize the types of problems we are trying to solve, it will create an anchor for your work.”

The audience reacts
Moshe Vardi, Professor in Computational Engineering at Rice University, asked, “Of the total verification effort, which some claim to be 70% of the design, how much is dynamic and how much is static?” While Ashar said he did not know, Vardi claimed that without this kind of information it is difficult to know the impact of their efforts. Vigyan Singhal, CEO of Oski, said he believes about 10% of that is static. Ashar added that it is related to the amount of setup required. “Formal is required for signoff. So the percentage is irrelevant.”

Vardi countered “When problems are found, we work out a way to solve the problem. This is the same as the Microsoft buffer overflow problem. But the discipline only comes after the problem has been found.” Ashar responded, “You can define discipline but you can’t force engineers to accept it. The time that is spent in formal is small because it works. These tasks would take a lot of time in simulation.”

Another member of the audience asked what improvements in engines are needed by EDA. Ashar asked them to look at how the debug experience can be improved using the information collected from an analysis. “Even if the analysis is incomplete, there is still actionable information that can be given back to the user. Simulation has the advantage that it always provides something to look at.”

Ashar concluded with a sobering message for attendees. He told them that they should be looking at solving small problems in the SoC design flow rather than assuming they can become the sole tool used for verification. By looking at focused problems, it is possible to achieve tangible results that are better than simulation.

See also:
Do Students Need More Formal Education?
Coming up with the Wrong Formal Answer