Experts at the table, final of three parts: CDC verification setup issues; debug; use models; methodologies; correlating with silicon.
Semiconductor Engineering sat down to discuss where the industry stands on clock domain crossing with Charlie Janac, CEO of Arteris; Shaker Sarwary, VP of Formal Verification Products at Atrenta; Pranav Ashar, CTO at Real Intent; and Namit Gupta, CAE, Verification Group at Synopsys. What follows are excerpts of that conversation.
SE: What is the impact of time-to-market pressure on CDC issues?
Gupta: We have to see this from a customer perspective: they have less time to market, and the designer has very little time. I’ll keep on reinforcing this point because it was given to us as a requirement, and it is why we chose to develop a CDC solution. [The customer] said, ‘We have all the setup, why can’t we just run a CDC verification from you? Why do we have to create another setup? Why do we have to go through all the methodology limitations for the IP handoff or the hierarchical flows, and all of that? Why can’t we just do the complete verification?’ The third very important piece is the debug. When the results are there, how can I make decisions based on the results? Debug has to be seamless… All of the native commands of the implementation tools are available… Then, can someone tell the designer, when the violation comes out, where the root cause is? They need the tool to be able to say, here is your problem and here is how you can debug it.
Janac: Basically, the semiconductor industry is not uniform, so you have things that are mission critical: you have automotive and avionics, where the verification has to be done at multiple levels. There are all kinds of design safety features in these kinds of systems. Then you have things like mobility, where you have to do very careful verification not because it’s mission critical but because you’re dealing with so much volume that the cost is incredible. Then you have things like the DVD chips, which basically have become consumables. There are some customers that don’t verify their chips at all. One of our customers just uses the FlexVerifier tool. The way the FlexVerifier tool works, they just ship and they don’t test, and it’s basically a very low-cost methodology of getting a chip out. They view the chip as a disposable thing because it goes into a system that’s so cheap that, if it doesn’t work, the consumer sends in a warranty card and picks up another one. So you have to really be careful about saying that the designer has no time to market, no time to do verification, because in automotive, he has plenty of time.
Gupta: The designer has time but he has negative time.
Janac: He has tons of time. Basically, you have mobility where you have design cycles shrinking. The poorest performers of time to market are about 24 months. The best performers are about 12 months and people can now get derivatives out in less than that. For something of that complexity, that’s extremely good performance. On the other hand, you’re dealing with a car, which is a four year design cycle where the chip is going to be sitting there for a year before it ever goes into a car — there is plenty of time.
Ashar: With a car you’ll have a variety of chips. Now, if it’s a DVD chip, yes, I can forgive a bug — maybe some pixels won’t be correct and I can go until the next rev — but if the chip is a cruise control chip or an anti-lock braking system chip then you don’t want it to have any mistakes in it the first time.
Sarwary: It goes again to the fact that you cannot dictate. There are various designs, various projects where it’s used, various companies. They have their methodologies and they know them better than us; they have been doing this for years. Sometimes they haven’t used a CDC tool, and the key is to have an adaptable methodology that can satisfy one or the other use model.
SE: Is it possible to have just one use model for CDC verification?
Janac: Everybody’s got different situations, different economics, different cultures and other stuff. Back to the car: verification is not enough in a car, particularly because you go from infotainment, which is basically an application processor repurposed, all the way to the engine control and the brake system, which are completely mission critical. There you need functional safety features so that if the verification isn’t perfect, there are fault tolerances inside the chip that keep it functional. You have this whole range … the reality of the engine control/anti-lock brake verification engineer is very different from that of the one who does the automotive infotainment or the one who is doing a disposable DVD chip. Their reality and their economics and their life are very different.
Ashar: The fault tolerance is something that you desire in the system but it’s not always the case that it’s implemented…The recent situation with GM cars where the ignition turns off and as a result the airbag doesn’t deploy, and stuff like that, is a situation of feature interaction and the fault tolerance not being thought through. Yes, maybe there are some fault tolerances, maybe there are some backups and so on, but you still don’t want the brake system chip to fail. You want to spend all the time you can to make sure [it works].
Janac: I agree that the verification is critical, most critical in this segment, but it’s not enough. So, if it’s mission critical there are some design features that have to be put in. But the verification: imagine you’re the verification engineer on that ignition problem; that can ruin people’s lives because somebody says, ‘Hey, why didn’t you check that domain crossing on that ignition switch?’ So verification is very, very critical, but it’s only a segment of the automotive industry.
Ashar: That’s the automotive industry. Then let’s say you talk about SoCs that go into phones: the volumes are huge and the stakes are high. If you make a mistake the first time around and miss your Christmas market, it can bring down a company.
SE: Beyond automotive, what is another example of this?
Sarwary: One example is a military application. We always see people putting in a double-flop synchronizer, and rarely a three-flop synchronizer. But listen to this. In this military application, they were putting seven flops back to back. I couldn’t even imagine. Three, OK. Four? Five? I never heard anybody else talking about four, five, six, but this one guy was talking about seven of them that he put back to back. They wanted to guarantee an MTBF of a trillion trillion trillion years. Latency wasn’t an issue, and neither was anything else. What is important is that you must have a sign-off criterion that can range from something as strict as this seven-flop requirement to somebody who is putting an SoC in place, has a time-to-market deadline and has IPs that are not ready. We have to enable them to start their verification with a half-cooked design, if needed, and still get value out of it. To me, it’s absolutely clear that we need various methodologies, not a single one.
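Editor's sketch: to see why extra synchronizer stages buy so much MTBF, the commonly cited first-order model is MTBF = exp(t_r / tau) / (T_w * f_clk * f_data), where t_r is the time available for metastability to resolve. The Python below is a minimal illustration, not anything described by the panelists. It assumes each stage beyond the first adds roughly one clock period of resolution time, and the device constants (tau, T_w) and frequencies are hypothetical placeholders rather than data for any real process.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def log10_mtbf_years(n_flops, f_clk_hz, f_data_hz, tau_s, t_w_s):
    """Return log10 of the synchronizer MTBF in years (first-order model).

    Assumption: an n-flop synchronizer leaves about (n - 1) clock periods
    for a metastable value to resolve. Setup time and clock-to-Q delay are
    ignored for simplicity.
    """
    if n_flops < 2:
        raise ValueError("a synchronizer needs at least two flops")
    t_clk = 1.0 / f_clk_hz
    t_resolve = (n_flops - 1) * t_clk
    # Work in the log domain so deep synchronizers do not overflow a float.
    ln_mtbf_seconds = t_resolve / tau_s - math.log(t_w_s * f_clk_hz * f_data_hz)
    return (ln_mtbf_seconds - math.log(SECONDS_PER_YEAR)) / math.log(10)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the trend.
    for depth in (2, 3, 7):
        exponent = log10_mtbf_years(
            depth, f_clk_hz=1e9, f_data_hz=1e8, tau_s=50e-12, t_w_s=20e-12
        )
        print(f"{depth} flops: MTBF is about 10^{exponent:.1f} years")
```

Under this model every added stage multiplies MTBF by exp(T_clk / tau), so depth trades latency for an exponential gain in reliability, which is the trade-off the seven-flop design accepts.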
Gupta: The key thing again is that when we say the designer has enough time, believe me, I have two faces: one is the R&D side, and on the other I sit with customers (I’m a CAE); they literally do not have time. The reason is that CDC is one small piece of the whole puzzle. They have to deal with power, they have to deal with formal verification, CDC, so many things on simulation, this and that, on top of any problems downstream on the netlist, layout, post layout. Everything comes back to the designer. And their primary job is design, so this is on top of all of this. CDC gets a very small window, and if I tell them that there is a tool where you don’t have to do a setup, you don’t need to learn the tool, and the debugger will pinpoint the problem, they will jump on it just for the sake of time, the critical piece. But the key thing… are the multiple CDC checkers [on the market] checking silicon? When we think about it, what is silicon? Silicon is what is produced by the implementation tool. Are we checking the same RTL? Can we guarantee that we are not missing a single asynchronous path? I challenge this table, because we do correlate with Design Compiler that we are 100% matching in terms of crossings, and that is extremely important for a new tool like ours. People ask us about sign-off with that same mindset. The critical piece is what is sign-off? What is silicon? We can prove it.
Ashar: The Real Intent tool can be run at the RT level, as well as the netlist level. Tools like Design Compiler can introduce problems in the netlist even when the bugs tend to exist in the RTL, and things like glitches and so on can happen as a result of the synthesis optimizations. Real Intent is designed to handle those kinds of things at the netlist level. We believe in progressive refinement of the sign-off confidence. We start with a structural sign-off, which basically makes sure that the data is controlled by a properly synchronized control signal, that there is a semblance of protocol there that is implementing this crossing. The second level is to apply formal analysis techniques to make sure whatever protocol seems to be implemented is working correctly, functionally. The checks get fed into a formal analysis engine that then makes sure sequentially everything is working correctly. The third step would be to glue this same structural formal analysis at the netlist level post synthesis.
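Editor's sketch: the first, structural level described above can be pictured with a toy example. The Python below is not Real Intent's algorithm or any vendor's API. It is a minimal illustration over a hypothetical netlist model (`Flop`, `check_crossings` and all signal names are invented here) that finds flop-to-flop paths crossing clock domains and flags those whose receiving flop is not the first stage of a simple back-to-back synchronizer. It does not attempt the formal or post-synthesis steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flop:
    """Hypothetical netlist element: one flop and what drives its D input."""
    name: str
    clock: str
    d_source: str             # name of the flop (or port) driving D
    comb_logic: bool = False  # True if combinational logic sits on the D path


def check_crossings(flops):
    """Classify each clock domain crossing as synchronized or not.

    A crossing counts as synchronized here only if the receiving flop is fed
    directly (no combinational logic) and every flop it drives sits in the
    same domain and is also fed directly, i.e. a back-to-back synchronizer.
    """
    by_name = {f.name: f for f in flops}
    fanout = {}
    for f in flops:
        fanout.setdefault(f.d_source, []).append(f)

    report = {}
    for dst in flops:
        src = by_name.get(dst.d_source)
        if src is None or src.clock == dst.clock:
            continue  # driven by a port, or no domain change
        loads = fanout.get(dst.name, [])
        ok = (
            not dst.comb_logic
            and len(loads) > 0
            and all(l.clock == dst.clock and not l.comb_logic for l in loads)
        )
        report[(src.name, dst.name)] = "synchronized" if ok else "unsynchronized"
    return report


if __name__ == "__main__":
    design = [
        Flop("req_a", "clk_a", "in_port"),
        Flop("sync1", "clk_b", "req_a"),
        Flop("sync2", "clk_b", "sync1"),
        Flop("data_a", "clk_a", "in_port"),
        Flop("data_b", "clk_b", "data_a", comb_logic=True),
    ]
    for crossing, status in check_crossings(design).items():
        print(crossing, status)
```

A purely structural pass like this can only say that something resembling a synchronizer is present. Whether the protocol around it actually works is what the second, formal level is meant to establish.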
Gupta: How do you correlate that you’re not missing anything in the silicon? That is the key piece.
Sarwary: We are talking about a niche area: this crossing isn’t synchronized. What about convergence problems? What are you looking into in convergence? Who is reviewing and verifying that? In our view, complete CDC verification covers even glitches on clock trees and reset trees. You name it, a variety of problems cause CDC issues, glitch problems or metastability. It’s a flow.
Janac: Correlation to Design Compiler is just fine, but the ultimate verification is that phone. Because basically if you’ve got your domain crossings right and that phone works, it’s got no problems, it passes all the tests at the system level, that is the true validation.