Can Formal Replace Simulation?

Exclusive: Formal leaders discuss the ways in which they are stretching formal tools to the limit to solve an ever-increasing array of tasks.


A year ago, Oski Technology did something that had never been done before. It brought together 15 of the top minds in formal verification deployment and sat them down in a room to discuss the problems they face and the ways in which they are attempting to solve them. Semiconductor Engineering was there to record that event. A year later, Oski held its second formal summit, and again Semiconductor Engineering was asked to be the fly on the wall and capture the essence of the discussion.

In order to promote free conversation, the participants, who are listed below, asked not to be quoted directly, so all comments have been paraphrased and made anonymous.


Pictured from left to right are Dave Parry, Oski; Parimal Gaikwad, Arteris; Mandar Munishwar, Qualcomm; Prashant Aggarwal, Oski; Swapnajit Mitra, Broadcom; Syed Suhaib, NVIDIA; Jia Zhu, AMD; Vigyan Singhal, Oski; Anurag Agrawal, Barefoot Networks; Vikram Khosa, ARM; Erik Seligman, Intel; Kaowen Liu, MediaTek; Roger Sabbagh, Oski; Viresh Paruthi, IBM; Naveed Zaman, Qualcomm; and Kamal Sekhon, Oski.

Vigyan Singhal, the CEO and founder of Oski Technology, started off with a brief history. He said formal verification has evolved from equivalence checking, which basically compared one logic cone with another for each flop in the design. Then formal migrated to assertions, which analyzed cones of logic spanning a few flops.

“Recently, we are seeing companies be successful with block-level verification, where it replaces simulation and can achieve exhaustive verification,” Singhal said. “Today, we are beginning to see architectural-level formal verification, where it can help with problems such as cache coherence, absence of deadlocks and the interconnect of blocks. In the future it will not be possible to verify the entirety of the system at the RT level, and we will have to solve the problem smartly by decomposing it.”
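The equivalence-checking stage of that history can be sketched in miniature: two cones of logic feeding the same flop are equivalent exactly when they agree on every input assignment. The cone functions below are invented for illustration; production tools use BDD- or SAT-based reasoning rather than enumeration.

```python
from itertools import product

def cone_a(a, b, c):
    # Original logic: (a AND b) OR (NOT a AND c)
    return (a and b) or ((not a) and c)

def cone_b(a, b, c):
    # Synthesized logic: a 2-to-1 mux, which should be equivalent
    return b if a else c

def equivalent(f, g, n_inputs):
    """Exhaustively compare f and g over all 2**n input assignments."""
    return all(f(*bits) == g(*bits)
               for bits in product([False, True], repeat=n_inputs))

print(equivalent(cone_a, cone_b, 3))  # True: the cones match
```

Swapping in a non-equivalent cone, such as plain `a and b`, makes the check fail on the assignment it disagrees on, which is exactly the counterexample an equivalence checker would report.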


Fig. 1: From flops to systems. Source: Oski

Singhal pointed out that we have come a long way and that today it costs $100M, and a lot of time, to get a chip out of the door. “We have become lazy, which may or may not be a good thing, because we have relied on simulation for far too long. We sometimes forget to apply the right tool for the job. I have seen people try to solve a high-level system verification problem, which is better solved with formal, using a hardware emulation box. The opposite happens as well, when formal is applied to problems that it cannot readily solve.”

The evolution from flops to cones to blocks to systems creates different problems, each of which has its own sweet spot. “Because we have relied on simulation for so long, we have forgotten that as designs become bigger, formal has reached the sweet spot for block-level and IP verification. Formal verification allows all of the leaders in the industry to put the responsibility back where it belongs and apply the right tools to the right problem.”

Roger Sabbagh, vice president of applications engineering at Oski, was a participant last year when he was with Huawei and has since joined Oski. He noted that architectural verification is one area on the cutting edge of formal today, and asked, “Does anyone have experience with architectural verification, and what is your opinion of it?”

“In the current projects we are developing IP for architectural verification of cache coherence. You cannot do this at the top level because the problem is too complex for formal,” said one participant. “One day we may get there. So you have to decompose the problem. For each component you create an architectural model that concentrates on what you want to verify. You have multiple such components and you stitch them together so that you can prove your IP or the protocol. Then you prove the architectural models against the actual RTL. The final step is to enable the models in simulation. This works and has enabled us to find many architectural and RTL issues. We have also been applying it to other pieces of the system where there is a lot of interaction between different modules. The bugs found were difficult to recreate in simulation.”
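The decomposition strategy that participant describes can be illustrated with a toy exhaustive search. This sketch explores every reachable state of a made-up abstract request/ack model and proves the absence of deadlock (every reachable state has at least one enabled transition). The protocol, state encoding, and function names are all assumptions for illustration, not anyone's actual flow.

```python
from collections import deque

# State: (requester, responder, req_in_flight, ack_in_flight)
INIT = ("idle", "idle", False, False)

def successors(state):
    req, resp, req_ch, ack_ch = state
    nxt = []
    if req == "idle":                 # requester issues a request
        nxt.append(("waiting", resp, True, ack_ch))
    if resp == "idle" and req_ch:     # responder consumes it and acks
        nxt.append((req, "idle", False, True))
    if req == "waiting" and ack_ch:   # requester consumes the ack
        nxt.append(("idle", resp, req_ch, False))
    return nxt

def deadlock_free(init):
    """Breadth-first reachability: fail on any state with no moves."""
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        succs = successors(s)
        if not succs:
            return False, s           # reachable deadlocked state
        for t in succs:
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return True, None

ok, witness = deadlock_free(INIT)
print("deadlock-free:", ok)
```

The same skeleton scales conceptually to the stitched-together architectural models described above: each component contributes its transitions, and the property is checked over the composed state space.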

Another participant saw architectural verification equating to protocol verification. “You have to get protocols right from the very beginning. It becomes baked into the process, and any new protocol that comes along has to go through the rigor of being verified. The tieback to RTL is not clear to us. The checkers get created at the architectural level on the abstract models, but to what extent can those be validated against the RTL?”

There was general consensus that this is one of the areas in which there are difficulties because not everything can get validated against RTL. “The majority of the pieces were written in such a way that we could prove them against the RTL.”

Part of the difficulty is associated with abstraction. The closer the abstract models are to RTL, the easier it becomes. “We push hard with the architects to not think about RTL when they are modeling. If you do, you lose the benefits of abstraction. If you constantly have to worry about RTL you tend to create more RTL-like models and you lose some of the benefits. It is a balancing act.”

The participants had several ways in which they saw solutions, including this one. “We had to chop up the architectural model keeping in mind that we would want to verify against RTL. We made sure that there were clean, well-defined interfaces, that the model was broken into blocks that would be tractable for formal tools, and that there were reasonable stimulus constraints and checkers at the periphery of those blocks. In the end we were quite successful. For the portions of logic that were verified, I am not aware of any bugs found in the architecture or in the RTL, so it was highly effective.”

Not everyone was convinced by this or its general applicability. One person concluded that “we cannot push everyone to turn architectural verification into RTL verification. We leverage it when we can, but not make RTL be the ultimate end goal. We are trying to verify at the architecture level to head off design problems later.”

Other participants pushed back. “Closing the loop between the architectural level and RTL is important. The whole goal is to verify that what is produced in silicon meets your system-level requirements. How is that done?”

It appeared that terminology was getting in the way of consensus. “It is confusing to me why we call it architectural verification. We are talking about a system where you are defining a new protocol and you are questioning the validity of the protocol, and you don’t have RTL. Does this protocol satisfy my deadlock conditions? If that is the goal, I don’t see where architectural comes into it. If I am developing the next generation of a protocol, why do I care how it gets implemented?”

Some of the disagreement appeared to come back to the distinctions of what architecture means and the abstractions used. The person focused on protocol verification attempted to clarify. “When you are doing architectural verification, you may be trying to work out the right sequence of messages in terms of guaranteeing that properties of the protocol hold. When it comes down to RTL, you can define some of those checkers, but that is more of an implementation check. There is no automatic application of what you verified on the abstract model that gets directly applied. So essentially you carry them forward, but not in an automated refinement manner.”

As the conversation progressed, it seemed as if there were two camps. In one camp were people who were both defining the architecture and implementing it, and in the other camp there was a separation between those who defined the architecture and those who implemented it. This changes how information can flow between the two groups and the models that can be provided.

The first camp posits that “the abstract models, used for architectural verification, model some behavior or feature that must be true in order to meet the specification. You may have made assumptions and then when you implement it, you have to prove the assumption is valid. That closes the loop. If you can prove the RTL meets that, then the overall system-level feature would still be valid.”
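The closing-the-loop step the first camp describes can be sketched as a bounded refinement check: suppose the abstract model assumed in-order (FIFO) behavior, so the implementation must be exhaustively compared against an abstract queue over every legal operation sequence up to a small bound. The queue, its depth, and the bound are all invented for illustration.

```python
from itertools import product

DEPTH = 4  # capacity of the illustrative circular-buffer implementation

def run_abstract(ops):
    """Abstract model: an unbounded in-order queue."""
    q, popped = [], []
    for op, val in ops:
        if op == "push":
            q.append(val)
        else:
            popped.append(q.pop(0))
    return popped

def run_impl(ops):
    """'RTL-like' implementation: a DEPTH-entry circular buffer."""
    buf, head, tail, popped = [None] * DEPTH, 0, 0, []
    for op, val in ops:
        if op == "push":
            buf[tail] = val
            tail = (tail + 1) % DEPTH
        else:
            popped.append(buf[head])
            head = (head + 1) % DEPTH
    return popped

def legal_sequences(n):
    """All op sequences of length n that never underflow or overflow."""
    for pattern in product(["push", "pop"], repeat=n):
        occ, ops, nxt = 0, [], 0
        for op in pattern:
            if op == "push":
                if occ == DEPTH:
                    break
                ops.append(("push", nxt))
                nxt, occ = nxt + 1, occ + 1
            else:
                if occ == 0:
                    break
                ops.append(("pop", None))
                occ -= 1
        else:
            yield ops

assert all(run_abstract(s) == run_impl(s) for s in legal_sequences(6))
print("implementation refines the abstract queue up to bound 6")
```

The environmental assumption (never more than DEPTH outstanding entries) is exactly the kind of assumption the quote says must itself be proven on the surrounding RTL for the loop to truly close.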

The second camp struggled with the gap between the two groups. “It is not that straightforward. You have to refine it. We talked about messages being passed at the abstract level, but at the implementation level you are talking about the queues that hold the messages. There is a huge amount of refinement that goes into them. You can do some refinement manually but it is not an automatic process.”

Sabbagh then steered the conversation to a new issue. “Should we use formal for bug hunting or for signoff? What are the pros and cons?”

The first response was “we started to invest in some bug hunting techniques. That has proven to be quite successful. The downside is that there is no clear metric so it is difficult to measure progress.”

The participants had some problems with the definition of what signoff means. One attempted to provide a definition. “It means I am taking responsibility for finding every bug. If you find a bug in sub-system verification or emulation or even silicon, then I expect to be in trouble. It could be a human mistake. It could be that I forget to include a checker, but there had better be an answer.”

They were also divided on the roles of formal and simulation. “In our world, blocks are parts of a larger system and those need to be verified with simulation. So even though you may carve out a piece that can be verified with formal, the fact is that end-to-end simulation still needs to exist. So we would not deprioritize the simulation effort.”

They discussed how to achieve the right balance. “It means there is a place where simulation may be the wrong technique. We still have to do system-level simulation. When holes are found, they need to be addressed. As an example, consider some arbitration logic that sits in the fabric. We may verify this with formal and prove it to be correct. End-to-end simulation needs to exist even though we may have verified the starvation correctness of it. The only part that is taken away from simulation is maybe a few checkers.”
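A starvation proof of the kind mentioned for arbitration logic can be sketched by brute force on a toy arbiter: enumerate every rival request pattern up to a bound and every internal arbiter state, and compute the worst-case wait of a continuously requesting client. The two-client round-robin model and the bound are assumptions for illustration, not a real fabric design.

```python
from itertools import product

def arbiter_step(ptr, req):
    """Round-robin grant: prefer client `ptr`; pointer moves past the winner."""
    for i in range(2):
        client = (ptr + i) % 2
        if req[client]:
            return client, (client + 1) % 2   # grant, next priority
    return None, ptr                          # no requests, hold pointer

def max_wait(client, depth=6):
    """Worst-case cycles `client` waits while continuously requesting."""
    worst = 0
    for rival_reqs in product([0, 1], repeat=depth):   # all rival behaviors
        for ptr0 in range(2):                          # all arbiter states
            ptr, wait = ptr0, 0
            for rival in rival_reqs:
                req = [0, 0]
                req[client] = 1                        # client always requests
                req[1 - client] = rival
                grant, ptr = arbiter_step(ptr, req)
                if grant == client:
                    wait = 0
                else:
                    wait += 1
                    worst = max(worst, wait)
    return worst

print(max_wait(0))  # a requesting client waits at most 1 cycle
```

Because the check covers every rival behavior and every arbiter state, the bound of one cycle is a proof for this toy design, which is the sense in which formal "takes away" the starvation checkers from simulation while end-to-end simulation continues to exist.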

There appeared to be little agreement on this. “Whatever level of hierarchy you draw the line at, be it block- or sub-system- level, you are trying to eliminate bugs. If you find bugs at a higher level, after doing weeks of pseudo- and constrained-random simulation, you should be asking what you can do to improve verification at the lower level. Would you be better off doing block-level simulation or block-level formal verification to find those bugs and get a greater sense of confidence that all bugs had been found?”

“From a project management standpoint, when you say signoff, what can I assume? Wherever formal does a good job, it should be taken advantage of to fully verify that piece completely.”

One participant tried to relate it to an IP block. “Think about it in terms of coverage. You have a piece of IP that is silicon proven. Now you are incorporating that into a system and you will verify that the system works, but you do not need to measure coverage inside the IP. You can black box it and you don’t have to hit all of the corner cases.”

Some participants disagreed with that. “When you put it into the system, you are introducing dependencies which are not visible at the block level.”

“But that means you have a problem with your constraints.”

A fair portion of the discussion revolved around coverage. “The reason one has coverage is to measure the strength of the verification environment. You cannot throw away coverage from a simulation environment because then you wouldn’t know the strength of that environment. Those same coverage events need to stay active in the simulation environment and in the formal environment.”

“To me when we say if it is bug hunting versus signoff, that is a management decision rather than a technical decision. What is the harm for any block if both simulation and formal coexist? Does it have to be one or the other?”

Most participants were sitting on the fence. “When the applicability of formal is questionable, you may swing over to simulation. But when formal is more applicable… In general, if you do formal and find bugs, are you bug hunting or going towards signoff? Probably both.”

There were several objections to this because it implied replication of effort. “You have to work out how to do things with fixed resources. There are cases where formal works very well and yes there is an overlap between simulation and formal, but you don’t have to retarget what formal has covered with simulation. That is overkill.”

One participant attempted to clarify the flow. “It is really unit-level versus system-level verification. We have a general principle that you want to find every bug at the lowest possible level and as early as possible in the project. What we are saying is that formal is strong at unit level. Nobody is arguing that formal at unit-level excuses you from doing system-level validation.”

“You depend on the higher levels of simulation to find problems with the unit level verification!”

The conversation turned full circle when it came back to the issue of signoff. “You are doing bug hunting until there are no bugs. What are you doing at that time? You still have to ask: is it complete enough to be called signoff? So why does it have to be one or the other?”

One response was: “It is a continuum. You start off with the goal of signoff and you write some properties and find some bugs and add to that until you have a full property set. But you may run into a complexity issue and…”

“It is a project timeline issue. You don’t want to reach a stage where you stop finding bugs before you have completed the investment in the abstractions. You can push the proof bound further and keep finding bugs deeper in the system.”

The conclusion appeared to be that one emerges into the other. Perhaps the only real difference is the original stated goal and the quality of tracking that goal for completeness.

Related Stories
Whatever Happened To High-Level Synthesis?
What progress has been made in High Level Synthesis and what can we expect in the near future?
Hybrid Simulation Picks Up Steam
Using a combination of simulation and emulation can be beneficial to an SoC design project, but it isn’t always easy.
Formal’s Roadmap (Part 1)
Panelists look at the progress made by formal in the past five years and where we can expect to see new capabilities emerge.
Fault Simulation Reborn
A once indispensable tool is making a comeback for different applications, but problems remain to be solved.
Gaps In The Verification Flow (Part 3)
Panelists discuss software verification, SystemC and future technologies that will help verification keep up.



  • Kev

    As someone who builds simulators, I’m surprised anyone still uses standard Verilog 0/1/X/Z style simulation anymore; it really doesn’t tell you much that you shouldn’t be able to get with formal techniques as far as implementation goes. Simulation time is better spent verifying all the analog effects (power, thermal, EM) that formal can’t tackle.

    Unfortunately, the simulators aren’t very good at mixed-level abstractions, but the EDA companies would rather sell you a lot of simulation licenses than one formal license.