Experts at the table, part 1: Today’s approach to clock domain crossing; the importance of architecture; evolving CDC methodologies; the designer’s perspective.
Semiconductor Engineering sat down to discuss where the industry stands on clock domain crossing with Charlie Janac, CEO of Arteris; Shaker Sarwary, VP of Formal Verification Products at Atrenta; Pranav Ashar, CTO at Real Intent; and Namit Gupta, CAE, Verification Group at Synopsys. What follows are excerpts of that conversation.
SE: While not a new aspect of design, clock domain crossing is a big deal today and is one of the biggest reasons for respins. It seems to have reached a point of critical mass. Where are we at with CDC today?
Ashar: It hasn’t happened suddenly. It’s taken about a decade for it to reach the point where it has become a signoff criterion. The reality is that it is now at critical mass and is a signoff requirement. You cannot take shortcuts.
Sarwary: The problem has existed; it’s not new. As soon as we leave some path untimed and an event propagates through that path, you can potentially violate timing and end up with testability or glitch issues and so forth. It has been gradually becoming more important, number one, because of the size of designs, the growing number of clocks and so forth, but also because of the explosion in the number of power domains, and verification of it has grown gradually along with that. Both Real Intent and Atrenta have been in this space for probably 10 years. I think the original way people tried to verify it was simply to take a PrimeTime report of all the paths that are not timed, look through it and say, ‘This is OK, this is OK…’ but over time, people realized that this is not possible, and solutions have evolved. I do not think this is a solved problem, so challenges remain.
Gupta: I have a slightly different view on this because we announced a CDC product (http://news.synopsys.com/2014-06-03-Synopsys-Bridges-Design-and-Verification-with-Next-Generation-Static-and-Formal-Technology-for-Verification-Compiler). In the last couple of years this problem has really been burning, I must say. The reason is that as technology nodes shrink below 20nm, people are packing more and more functionality into SoCs, and they don’t own most of that functionality; they are using third-party IP. So, with the sum of all that, and I do agree with Shaker about the explosion of clocks and the number of clock domains, they do not have a very clear idea of whether they are good from a CDC perspective or not. Now, if you’re not good, your silicon is at risk, and the competitive situation today is such that if a phone comes out two months early, it can tap billions of dollars. So shrinking time to market has made the situation very competitive. That’s why people want to see which tool, mechanism or methodology can help them achieve CDC closure very fast, because it’s a burning issue and they cannot afford a long verification cycle to close CDC.
Janac: We’ve used clock domain crossings for a long time in the interconnect business; those things have been in use since 2006. You have to be very, very practical in how you implement them because the testing infrastructure has a hard time dealing with timing uncertainty. There are all kinds of technologies you can deploy: clock forwarding, fully asynchronous operation, etc. If you aren’t careful you can cause problems, so you have to verify it at the unit level. But if you have a very practical architectural approach to clock domain crossing, you keep the customer from getting into problems, particularly at the point when they have to close timing or test the resulting device.
Ashar: We would agree that the three big reasons this problem has become important are: 1) The presence of GALS (globally asynchronous, locally synchronous) on the chip. You cannot send a clock or signal from one corner of the chip to the other in a single clock cycle, so you need buffering and handshaking in between. That’s where the on-chip interconnect comes in. 2) Diverse IP. 3) Power optimization and the large number of power domains that come with their own clock frequency adjustments, clock gating and so on. The combination of all these has created a situation where the clock domain problem has crossed the threshold to become a signoff issue.
Janac: If you have the right clock domain crossing architecture, you verify it at the unit level, and you document it so that the customer doesn’t exceed the parameters of the clock domain crossings, you should be okay. Yes, there are a lot more power domains because you can’t afford to keep these high-power subsystems running for very long, particularly a CPU or a GPU subsystem, but we haven’t seen that many problems.
Sarwary: To your point, the visibility has definitely risen a lot lately, to the point where, in contrast to what Namit was saying earlier about getting to closure fast, it’s no longer a question of getting to closure; it’s a question of getting to a correct design fast. The reason is that with the traditional view of clock domain crossing verification, through simple review and maybe some waiving of issues, there is always an overlooked problem that eventually falls into the silicon. If you take shortcuts, you will get hit by it, sometimes downstream. I think people now realize that and are much more serious about really going overboard to sign it off with criteria much stricter than the ones they were using two or five years ago.
SE: How does that methodology look today?
Sarwary: Various companies, various design groups, even various projects have their flows set up to go through certain steps from verification to implementation, and you cannot come in, cut completely across that and say, ‘Now change your flow; this is the way you should do it.’
Gupta: It’s a very important point about time to design correctness, because time is the key. If you look at designers out there, they have very little time to spend on this. They have to do multiple tasks, and CDC is one such task. If a tool asks them to become experts in the tool, that’s not what they want. They want to become more expert on the design side. One of the key challenges is the setup. How do you drive CDC? They don’t want to learn a tool. They just want to push a button and see the CDC results. Then they want to focus on their design to fix those problems. For example, the way we see it, if your design is up and running in the Synopsys tool, you can just push the button to get the CDC results. Why should you have to redo the setup in a different format? You should reuse it and get to the results quickly. The time to fix the violations will be designer time. That is the key thing that no one else can do. Only the designer can do it.
SE: Do all designers want to be able to customize the tools for themselves? Should they be able to?
Gupta: Currently, the noise problem in CDC is mainly that the tools are oriented toward certain designs and certain assumptions. A company will say they care about their own design style; they don’t care about other people’s design styles. Can they change or tweak the tool so that it understands their design style exactly? This is one of the key features that we have introduced and which we believe will change the game. Rather than giving designers a lot of options to play with, we let them configure the tool one time based on their design style, and whenever the results come out, they will match that standard.
Ashar: Sometimes it makes sense to have some degree of control in terms of how to drive the tool. Real Intent allows that to a small extent in the context of the design that is being checked. You have to be careful there that you don’t deviate from the basic principles that the tool is based on. One of the reasons why I think Real Intent has been successful is that the tool is based on an understanding of the basic failure mode involved in CDC, and whatever analysis it does of the design in the CDC context is driven by that. Basically, the transfer of data in a clock domain crossing is, at some level, a simple protocol. What all these tools are trying to do is to understand what that protocol is. The tool needs a little bit of seed to grow an understanding of the implementation details of the protocol that has been implemented in the design. You have to be careful that you don’t give so much choice and so much control that the tool oscillates. My research advisor at Berkeley used to say that sometimes it is good to give the user freedom from choice rather than freedom of choice.
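To make the "simple protocol" remark concrete, here is a minimal Python sketch of one common CDC transfer scheme, a four-phase request/acknowledge handshake with two-flop synchronizers on the control signals. It is an illustration only, not a description of how any tool mentioned here models a design. The names `TwoFlopSync` and `transfer` are hypothetical, both domains are ticked in lockstep for simplicity, and metastability itself is not modeled.

```python
class TwoFlopSync:
    """Behavioral model of a two-flop synchronizer (no metastability)."""

    def __init__(self):
        self.q1 = 0
        self.q2 = 0

    def clock(self, d):
        # On each destination clock edge the value moves one stage along.
        self.q2, self.q1 = self.q1, d
        return self.q2


def transfer(words, max_cycles=1000):
    """Send words across the crossing with a four-phase req/ack handshake."""
    req = ack = 0
    data_bus = None                      # held stable by the sender while req is high
    req_sync, ack_sync = TwoFlopSync(), TwoFlopSync()
    pending = list(words)
    received = []

    for _ in range(max_cycles):
        # Assumption: both domains tick together; real clocks are unrelated.
        req_seen = req_sync.clock(req)   # req as seen in the receiving domain
        ack_seen = ack_sync.clock(ack)   # ack as seen in the sending domain

        # Receiver: sample data only after the synchronized request arrives.
        if req_seen and not ack:
            received.append(data_bus)
            ack = 1
        elif not req_seen and ack:
            ack = 0

        # Sender: start a new transfer only once the previous ack has cleared.
        if not req and not ack_seen and pending:
            data_bus = pending.pop(0)
            req = 1
        elif req and ack_seen:
            req = 0

        idle = not (req or ack or req_seen or ack_seen)
        if idle and not pending:
            break

    return received


print(transfer([0xA, 0xB, 0xC]))
```

The data bus is never synchronized directly. It is safe to sample only because the protocol guarantees it is stable by the time the synchronized request is seen, which is the kind of implementation detail a CDC tool has to recognize.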
Gupta: It’s a choice between waivers and configuration. I always recommend that all of my customers choose configuration, because one waiver can kill the whole chip if it is not done correctly. With configuration, they always spend a lot of time, a lot of cycles and a lot of thought, and then they configure the tool the right way.
SE: What is the designer’s perspective here?
Sarwary: Taking a designer’s perspective on this, we see three types of users among the people who deal with verifying and signing off CDC on a daily basis.
The first category is traditional CDC verification users, who use a variety of tools to extract crossings and do some rudimentary checks on those crossings, whether by tweaking an option to recognize something or based on a global synchronization scheme. But at the end of the day, you’re still tweaking. You’re still dependent on what you set up, and you will still end up with some failures slipping through to silicon.
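As a rough illustration of what extracting crossings and running a rudimentary structural check can mean, the following Python sketch flags flop-to-flop paths whose endpoints sit in different clock domains. It is a toy under stated assumptions, not the algorithm of any commercial tool: the function `find_crossings`, the flop names and the `sync_flops` set (the user-declared synchronizer entry flops, i.e. the setup) are all hypothetical.

```python
def find_crossings(flop_domain, paths, sync_flops=frozenset()):
    """Return (src, dst, status) for every path that changes clock domain.

    flop_domain: dict mapping flop name -> clock domain name
    paths:       iterable of (src_flop, dst_flop) combinational paths
    sync_flops:  flops the user has declared as synchronizer first stages
    """
    crossings = []
    for src, dst in paths:
        if flop_domain[src] != flop_domain[dst]:
            status = "synchronized" if dst in sync_flops else "unsynchronized"
            crossings.append((src, dst, status))
    return crossings


# Hypothetical four-flop design with two clock domains.
flop_domain = {"tx_reg": "clk_a", "sync0": "clk_b", "sync1": "clk_b", "fb_reg": "clk_a"}
paths = [("tx_reg", "sync0"), ("sync0", "sync1"), ("sync1", "fb_reg")]

for crossing in find_crossings(flop_domain, paths, sync_flops={"sync0"}):
    print(crossing)
```

The result depends entirely on what was declared in `sync_flops`: a wrong or missing declaration changes the report without changing the design, which is the dependence on setup described above.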
There are two new categories of users emerging that we’re seeing among the 150 or so companies that are using SpyGlass CDC. These two categories are looking at the problem from two completely different angles. They are looking because they have gone through this process of tweaking and made a mistake, either in their setup, in their tweaking or in setting up a waiver.
There is one company that is saying, you know what, I have my synchronizers. I verified them formally. I put an expert on it for months to completely sign it off, and it is working perfectly well. I do not want you to come back and tell me this is bad and that I should tweak this option so that I can do this. At the same time, I do not want to take any risk. I want to say that this FIFO or handshake protocol will work specifically under these conditions. If you honor them, it is 100% guaranteed this design will work. So they adopted this methodology, which gradually makes the design correct by verifying very early at a very small granularity and then raising that to a scheme that is validated and working. Obviously there is some effort involved, but this is their choice; they have done it. They are actually saying through their testimony that they get better quality.
The third category is users of tools who say, even after doing all the tweaking and setting all the knobs possible, I’m still having issues slip into my silicon. Their point is that the intent you are looking into is not a structural intent. There is functionality behind the protocols that Pranav was referring to, and that functionality must be verified and must be correct in order for the whole synchronizer to be correct. So what they are going after is not only tweaking the checks or the products, but also capturing any assumption that they are making about the design while they are verifying it globally. Those assumptions must be validated functionally, either through formal tools or through simulation, that is, through static or dynamic verification.
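A captured assumption of this kind can be pictured as a small check run over simulation data. The Python sketch below validates one hypothetical assumption, "data stays stable for as long as the request is asserted", against a recorded trace. The signal names, the trace format and the function `check_data_stable` are assumptions made for illustration; in practice such a property would more likely be written as an assertion for a formal tool or simulator.

```python
def check_data_stable(trace):
    """Return the cycle numbers where data changed while req was held high.

    trace: list of (req, data) samples, one per source-clock cycle.
    """
    violations = []
    prev_req = 0
    held = None
    for cycle, (req, data) in enumerate(trace):
        if req and not prev_req:
            held = data                  # value launched with the request
        elif req and data != held:
            violations.append(cycle)     # assumption broken in this cycle
        prev_req = req
    return violations


good_trace = [(0, 0), (1, 5), (1, 5), (0, 7), (1, 7), (0, 7)]
bad_trace = [(0, 0), (1, 5), (1, 6), (1, 6), (0, 6)]

print(check_data_stable(good_trace))
print(check_data_stable(bad_trace))
```

A structural check would pass both traces, since the synchronizer is present in either case. Only the functional check separates them.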