So Many Waivers Hiding Issues

Experts at the Table, part 2: Domain crossings can produce thousands of waivers. How does a team put in place a methodology for dealing with them?


Semiconductor Engineering sat down to discuss problems associated with domain crossings with Alex Gnusin, design verification technologist for Aldec; Pete Hardee, director, product management for Cadence; Joe Hupcey, product manager and verification product technologist for Mentor, a Siemens Business; Sven Beyer, product manager design verification for OneSpin; and Godwin Maben, applications engineering scientist for Synopsys. Part one can be found here. What follows are excerpts of that discussion.

SemiEngineering roundtable: (Left to Right) Joe Hupcey, product manager and verification product technologist for Mentor, a Siemens Business; Godwin Maben, applications engineering scientist for Synopsys; Sven Beyer, product manager design verification for OneSpin; Alex Gnusin, design verification technologist for Aldec; and Pete Hardee, director, product management for Cadence.

SE: If development teams are forced to do both structural and functional checks of domain crossings, how should they be coordinated?

Maben: There is information that structural checkers can pass on to the functional checker, such as assertions. There are certain things that can be done formally, but others that cannot. For example, assumptions are the big thing in CDC or RDC checks. Now the question becomes, ‘Can I verify those assumptions formally?’ No. So we have to see how much can be done with static, formal, and pure functional. The less pure functional the better, because coverage is difficult and you cannot guarantee vectors to cover everything. So you want to use structural checks as much as possible, then formal checks, and then leave the smallest possible number of checks to functional simulation. Another component that complicates this is power. The problem with power being added on top is the number of tools in the flow. From RTL to silicon there are about 15 different tools. How do you ensure that when you add UPF, every tool in this flow looks at it in the same way? You need some isolation, and the simulator says I need 10 isolation cells, but the synthesis tool sees the need for only 5.

Hupcey: I agree with the phases that you mentioned. We start with structural, go to the control layer, and then the data path and actual crossing layer. That can all be statically analyzed. We also have the ability to export metastability models to simulation. There are ways to benefit from this that you may not expect, such as a waiver validation flow. If you get too aggressive and say that this is the CDC path and I know it is okay for whatever reason, this provides a double check when you inject these models into the regular regression runs to see if something fires. You may not always have chosen the right synchronizer. You may have needed three flip-flops, or a gray code, or something. Lastly, we are heavily investing in signoff CDC — the post-synthesis, post-route CDC. Things can be perfect at the RT level — statically analyzed, waiver validation done, extra validation done with simulation — but synthesis comes along and throws a curveball every now and then. Now there are buffers that weren’t there before. Now there is a test domain. It is usually out of the way, but there are times when it does get in the way.
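As a rough illustration of the gray code point Hupcey raises, the short Python sketch below (hypothetical, not part of any vendor flow) checks that consecutive gray-code values differ in exactly one bit. That property is what makes a gray-coded counter safe to cross between domains: a receive clock that samples mid-transition can only resolve to the old or the new count, never to an inconsistent mix of bits.

```python
def to_gray(n: int) -> int:
    """Convert a binary count to its Gray-code equivalent."""
    return n ^ (n >> 1)

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two values differ."""
    return bin(a ^ b).count("1")

# Consecutive binary counts can flip many bits at once (7 -> 8 flips four),
# so a receive clock that catches the transition may capture a mix of old
# and new bits. Consecutive Gray codes flip exactly one bit, so a sample
# taken mid-transition can only resolve to the old or the new value.
for i in range(15):
    assert hamming(to_gray(i), to_gray(i + 1)) == 1
    print(f"{i:2d} -> {i + 1:2d}: binary flips {hamming(i, i + 1)} bit(s), "
          f"gray flips {hamming(to_gray(i), to_gray(i + 1))} bit(s)")
```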

Hardee: Test logic is among the biggest vulnerabilities for security leaks, as well.

Gnusin: There are different techniques to find CDC issues, but the question is, ‘Can you provide a complete CDC solution?’ I still believe that from a metastability perspective we have a near-complete solution, but non-determinism is a tricky issue. Even if you generate assertions, assertions and checks are only as good as your test stimulus. Your assumptions may not be ideal, so you cannot guarantee ideal assumptions or complete stimulus – especially if you go to gate-level simulation. So, can we provide a complete CDC solution? Maybe we can think about a solution combining simulation, formal, and also hardware verification. I call this CDC amplifiers. It is basically random injection of delay into CDC paths to mimic what happens in hardware. This gets as close as possible, because we need to prove good results for mission-critical companies. CDC is a statistical problem.
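Gnusin’s ‘CDC amplifier’ idea can be sketched in a few lines of Python. The model below is purely illustrative, with made-up signal traces and an assumed one-to-two-cycle settling window rather than anything taken from a tool: each change on the transmitting domain’s signal becomes visible in the receiving domain one or two cycles later, chosen at random, which is the non-determinism a real synchronizer introduces and that downstream logic has to tolerate.

```python
import random

def cdc_amplifier(tx_samples, seed=0):
    """Model a crossing whose settling time is uncertain.

    Each change on the transmit-domain signal becomes visible in the
    receive domain after one or two receive-clock cycles, chosen at
    random. Running checkers against many such randomized traces exposes
    logic that silently assumed a fixed crossing latency.
    """
    rng = random.Random(seed)
    rx_trace = []
    pending = tx_samples[0]   # value currently propagating through the crossing
    settled = tx_samples[0]   # value visible to the receive domain
    delay = 0                 # cycles left before 'pending' settles
    for value in tx_samples:
        if value != pending:
            pending = value
            delay = rng.choice([1, 2])   # settle after one or two rx cycles
        rx_trace.append(settled)         # receive domain samples first...
        if delay > 0:                    # ...then the crossing settles
            delay -= 1
            if delay == 0:
                settled = pending
    return rx_trace

# Two seeds give differently skewed receive-side traces; downstream logic
# should behave correctly for both.
tx = [0, 0, 1, 1, 1, 0, 0, 1, 1, 1]
print(cdc_amplifier(tx, seed=1))
print(cdc_amplifier(tx, seed=2))
```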

Maben: Another thing that is critical is that for most designs — where the CDC and RDC checks are all being done — the number of violations users see, and the number of waivers they end up writing, is far too high. The challenge is, if you look at CDC and RDC themselves, solid solutions exist that can get you where you want to be. But to get there, I have seen cases where 300,000 waivers are being used. That needs to be avoided. Typically, in any CDC, RDC, or even power flow, we do not recommend waivers.

SE: Are you suggesting that there is something that can be done about it?

Maben: If an architect or designer can provide more information, such as qualifiers, rather than just having an SDC and UPF, there is a lot more information that an architect can provide. This tells you that these crossings should not be worried about, based on the architect’s experience. The guy who architects the design is not the same guy who is running these checks, but more information about the architecture can be passed down in terms of qualifiers. Then the number of waivers can be reduced. Waivers are the killer. Many times, in those hundreds of thousands of waivers, there will be one that could get you. Waivers need to be minimized, and with power domains on top of it, the number of waivers will only increase. Today we have more than 10 to 15 reset domains. Then, each power domain has its own reset mechanism that is used after you power down and then back up. If you have zero retention flops, then you will clamp reset and clock lines. Clock lines are okay because they are synchronous, but reset cannot be clamped. That leads to other problems. We have to look to see if we are adding the necessary technology. Did it impact my RDC? Do I need to do something? We are looking at a platform where we have different engines, all used to solve the problem.

Hardee: I agree with the vast majority of what you said, but not all. When you talk about waivers — absolutely. Too much noise and too many violations, which too easily get waived, is definitely a big problem on designs of any size and complexity. The way the architect defines which crossings are not a problem still has to be a waiver. That waiver has to be recognized as distinctly different from the waivers that may come from the implementation people, and it has to be persistent and stay with the IP block to which it applies, so that it does not resurface again and again as a new problem. The mechanism inside CDC tools is waivers. Waivers are inevitable. The key, where we can use formal technology, is not just in the functional checks after the structural checks, but in sorting out, filtering, and grouping the violations, and actually presenting them in a way that makes the analysis more meaningful. We are also creating assertions to see if certain kinds of violations are a real problem or not. We then have auto-waiving technology for some things. For example, in CDC, if I have a quasi-static signal, is that signal static or does it change? We can create an assertion for that, and if the assertion passes, then fine. You can waive it. If it doesn’t, then you have a violation. That is how we are now combining formal technology to improve the structural analysis.
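The quasi-static auto-waive check Hardee describes can be approximated with a simple trace checker. The Python sketch below is a hypothetical stand-in for the formal assertion a CDC tool would actually generate: a crossing is waived only if the supposedly static signal never changes after its configuration window closes.

```python
def can_auto_waive(trace, config_end):
    """Check whether a supposedly quasi-static signal really is static.

    'trace' holds the sampled values of the signal over a run, and
    'config_end' is the cycle after which the architect claims it never
    changes. If the claim holds, the CDC violation on this crossing can
    be waived automatically; if not, the violation stands and the
    crossing needs a real synchronizer.
    """
    steady = trace[config_end:]
    return all(value == steady[0] for value in steady)

# A mode register written once during configuration passes the check...
mode_pin = [0, 0, 3, 3, 3, 3, 3, 3]
print(can_auto_waive(mode_pin, config_end=2))   # True  -> safe to waive

# ...while a signal that keeps toggling is a genuine crossing to fix.
busy_flag = [0, 0, 1, 1, 0, 1, 0, 1]
print(can_auto_waive(busy_flag, config_end=2))  # False -> keep the violation
```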

SE: It seems as if something is missing here in the methodology—the notion of expressing intent all the way through the design flow. We have that with UPF coming from the bottom up, but it appears as if these waivers are being used for documentation and would appear to be the wrong mechanism.


Hupcey: There is a little bit of the tail wagging the dog, but it is different for each customer. Some customers want a specification-based flow where, in addition to constraints files, you specify the number of clocks and you can provide more information up front. The objective is to reduce the noise on the back end. But some customers do not trust themselves, and they want the tool to automatically report that back and say there are only N clock domains. And when the tool discovers N+5, if it is true, then it is a bug where there was some kind of split. So some customers want to specify everything up front, and others want to do a back-end waiver process and audit. We have mechanisms where one individual can waive a violation, but that goes into a queue and someone else has to sign off on it. Some customers that are super risk-averse prefer that, even though it creates a lot of review. There is an automotive customer that does both. There are some customers who are still in denial. We see CDC coming into the high-end FPGA world, and they are realizing that it is not a static timing issue, it is CDC, and we need to find this in 20 minutes rather than spend 3 weeks in the lab chasing a CDC bug. Customers need to think about this and their preferred approach. Should they be more proactive and specify things up front, or should they do both? It highlights the fact that you do need to consider these domains — especially with low power, which can bring you surprises. So, I agree, waivers are no fun for people, but some prefer that flow and some prefer a specification.

Maben: One customer says they want zero waivers. That is what they want for signoff. To get there, we have to change the intent. Is it possible? Yes. But you need to get to the point where the intent becomes more complex. I provide more and more information so that when I get the results at the end, I have no waivers and everything is clean. Clean means the results are categorized into information, warning, error, etc. The goal is no warnings and no errors, but information is okay. So, is it possible to get there? Yes, but it means the intent becomes huge. And if the chip moves on from one person to the next, that intent is essentially unreadable to a third person, so its portability suffers.

SE: Is that particularly a problem when you bring in something like test? How does that affect everything that came before? 


Hupcey: The only customers I have seen be successful with something that strict are Mil/Aero customers, who tend to have longer project lifecycles, less turnover, and are able to enforce that kind of design discipline to get there. But there are so many other factors that chip away at that. There are very few customers that pull it off. As the IP gets re-used 5 or 10 years into the future, the implementation changes, the built-in self-test (BIST) that gets inserted – I don’t know how that works out.

Maben: The notion of zero waivers was okay while there was no power domain. Once UPF came into the picture, they could not live with zero errors. There is no way, and that is a big change. Handling power, reset, and clock concurrently is getting much more complex, and you cannot express everything, including the interaction between all of them. That is why it is much more complex to solve this.

Gnusin: Another question is what we can do in terms of the whole industry. How can we help customers be more productive with CDC issues? I can think of two proposals. The first is a CDC library—a library of cells together with checkers. We are talking about simple synchronizers, pulse transmission and data transmission, etc. We can think about such a library. It would also facilitate the task of lint or static checking, because whenever we find one of these cells across domains, we can say that we know it is a good design. We talked about hierarchical CDC, so what we do now is basically CDC formal. Synopsys has its constraints format, which is not bad for a static clock tree, but we need to look at the dynamic case. For example, transceivers from Xilinx have up to 16 variations in how you can connect the clock trees, so you have to provide SDC files depending upon your static mode. The second proposal is a new CDC format for the industry, to be able to encapsulate IP for CDC analysis.

Hardee: We are seeing libraries of synchronizers available in certain ASIC libraries and also in FPGA vendor libraries, and I have multiple customers who create their own. They have very carefully laid out a trusted set of synchronizers, and we have to ensure that the tool recognizes them and does not mess with them.

Hupcey: That is probably the fastest-growing requirement as more and more customers are going to custom synchronizers that have been blessed, and may even use a flip-flop mechanism that has different characteristics. It is a double-edged sword. It helps us as a vendor to be able to integrate it into our flows, but if there is an issue, then that can be challenging to communicate back to the customer.