Experts At The Table: The Growing Signoff Headache

First of three parts: Defining signoff and its problems; the impact of multiple power islands and millions of transistors; static timing analysis; the pros and cons of margin; how EDA companies choose their R&D targets.


By Ed Sperling
Low-Power/High-Performance Engineering sat down to discuss signoff issues with Rob Aitken, an ARM fellow; Sumbal Rafiq, director of engineering at Applied Micro; Ruben Molina, product marketing director for timing signoff at Cadence; Carey Robertson, director of product marketing for Calibre extraction at Mentor Graphics; and Robert Hoogenstryd, senior director of marketing for signoff at Synopsys. What follows are excerpts of that conversation.

LPHP: What exactly is signoff, and where are the problems?
Aitken: Signoff is when the design team has committed to themselves that their design will yield at whatever PPA (power/performance/area) target that they have—and that’s mostly a power and performance target these days. So when do you say you’re done? And what other assumptions lurk in there, not just in the tool flow? What else are you thinking when you say, ‘This chip has passed and now it’s time to sign it off?’
Rafiq: Signoff means completing all checks with respect to targets set forth before submitting the chip database to the foundry. This involves meeting timing targets with corresponding on-chip variation and flat margins; meeting and verifying foundry design rules and recommendations; voltage island implementation and verification; static and dynamic IR drop and EM rule targets; and full-chip IP integration checks. Signoff also means your physical checks meet your criteria, and your power, speed and area meet your goals.
Molina: Signoff is the last time you touch a design before you can’t do anything more to it. You have to make sure it meets all the timing, power and manufacturing requirements. Whatever you’re doing has to be accurate and reliable, because you don’t get a second chance. And if you have to fix something after tapeout, it’s going to cost you a lot of money.
Robertson: For me, signoff usually involves verification. You have foundries, IP, and the actual design itself. The definition is pretty clear. It passes DRC and reliability checks. Extending that definition into the area of electrical is whether you’re meeting the specs you set out for yourself on timing, signal integrity and power. There is an agreement that this is the last time you’ll get to address it.
Hoogenstryd: There’s functional signoff, as well. In some industries there are things like EMI signoff, because it’s important to their business. But the bottom line is it’s whatever it takes you to be successful in the marketplace with products that yield and meet the cost objectives.

LPHP: We’re seeing chips with multiple power islands, voltages, processors, modes of operation and hundreds of millions of transistors. How does this affect signoff?
Aitken: The complexity is highlighting a couple of things that were hidden before. Signoff has always included a number of assumptions about how the silicon behaves relative to the models. Static timing analysis makes approximations that simplify the tools’ understanding of what’s going on. As we add more stuff, a couple of things happen. One is that the errors inherent in these assumptions are becoming better known and more obvious. Second, the general pressure to improve power or performance means we have to push harder on margins, and that means understanding those margins. There’s a need to find out where the margins are, and push them a little bit. There’s also the observation that when you start switching device types, such as changing bulk devices to finFETs, you realize you don’t fully understand those margins.
Rafiq: As we move down in technology nodes from 65 to 40 to 28nm, and then to 20/16nm finFETs, it’s all about whether you have a product you can sell that is built at the right node for the required application. We are certainly challenged with design complexities such as more modes of operation; variations of low-power implementation techniques such as multi-Vt, clock gating, multi-supply voltage, power shut-off, and dynamic voltage and frequency scaling; multi-GHz speed targets requiring innovative clock tree implementation techniques; and process variations that are not fully understood. All of this ultimately is tied to yield, and therefore added signoff margins are needed to overcome the uncertainties or unknowns. Yield is how many good chips you can get out of a given wafer. Margin is relative and can be somewhat tied to binning with regard to speed and power. You put in more margin, you spend more time. You put in less margin, you spend less time. It’s the implementation time and time to market that become major factors in engineering tradeoffs. And the more sigmas you target, the more time is spent. You can somewhat compare it to the capacitive charging curve. You’re charging the first 63% or so with 1 RC constant. With 3 RC constants you get to 95%, which is essentially 2 sigma. And then you spend a significant amount of time, i.e., 5 RC, to get from 95% to 99+%. It’s important to keep time to market in perspective. You can decide how much yield you can afford weighed against time to market, and that is the engineering and marketing tradeoff with regard to signoff margins. The objective of the tradeoff is to get rid of overly pessimistic margin at the cost of extra implementation effort. There is extra time in characterizing the libraries for extreme corners. We also know how much pain there is in modeling. So the bottom line is it’s important to figure out what your goal is, how much time you have, and how much you want to spend on that last +0.25% of the yield with respect to time to market.
Aitken: But you don’t really know what the sigma is, especially for 28nm and below.
Molina: This is all true, but every customer I talk with has their own secret sauce for how they come up with OCV margin values, or how they account for process variation, temperature, tool error, and other things. What we can do as a vendor is try to address the time-to-market aspect of this. If you had an infinite amount of time, you could design anything the way you want and make it perfect. But customers are under a lot of market pressure to get their designs out as quickly as possible. Because of that, they have to make compromises and they have to be pessimistic in order to speed things up. With hundreds of millions of transistors, everyone is just trying to get through design closure in a reasonable amount of time.
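
Editor's note: Rafiq's capacitor-charging analogy in this exchange can be checked with a few lines of arithmetic. The sketch below is only an illustration of the analogy, not a yield model. It assumes an ideal RC charging curve and a normal distribution, and the function names are made up for this note. It shows that one time constant reaches roughly 63% of the final value, three reach about 95% (close to the fraction of a normal distribution within plus or minus 2 sigma), and five reach a little over 99%.

```python
import math


def charged_fraction(n_tau: float) -> float:
    """Fraction of the final voltage reached after n_tau RC time constants
    on an ideal charging curve: 1 - exp(-t / RC)."""
    return 1.0 - math.exp(-n_tau)


def within_sigma(k: float) -> float:
    """Fraction of a normal distribution lying within +/- k standard deviations."""
    return math.erf(k / math.sqrt(2.0))


if __name__ == "__main__":
    # Each extra time constant buys less: most of the gain comes early.
    for n in (1, 3, 5):
        print(f"{n} RC constant(s): {charged_fraction(n):.1%} charged")
    # For comparison, coverage of a normal distribution by sigma count.
    for k in (1, 2, 3):
        print(f"+/-{k} sigma: {within_sigma(k):.2%} of the population")
```

The point of the analogy survives the arithmetic: going from 3 to 5 time constants costs two thirds more "time" for only about four more percentage points.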

LPHP: Have the tools kept up?
Hoogenstryd: From an EDA vendor perspective, of course we’d always say we’re keeping up. Our customers would say we’re not keeping up fast enough. It’s a continual race on their side. They’re expected to do more stuff in less time, so we focus on the fundamentals of performance and capacity and take advantage of the latest hardware, as well as coming up with other approaches to solve additional problems. The other challenge we face, particularly with advanced nodes, is we’re being asked by the foundries to experiment along with them. The challenge there is always trying to figure out where the right place to make the investment is, because you need to have a return. What looks like a first-order effect may turn into a third- or fourth-order effect as the process matures. As EDA vendors, we have to figure out how we work with the foundries on the right things that will pay off. An example of this is DPT (double patterning technology) with regard to STA (static timing analysis). There was a thought we were going to have to model this inside of STA with a multi-valued extraction. As 20nm matured, it became clear that a more practical solution was to margin that behavior. As a result, you didn’t impact the throughput in the tool chain with more sophisticated models.
Robertson: You talked about complexity from a design and a manufacturing perspective. There are two issues here. First, can we address that complexity? And second, can we address that complexity with existing paradigms? If you look at the last 10 years of electrical signoff, that’s certainly much more than just timing. Timing alone wasn’t enough, so we did signal integrity and power analysis, and now there’s discussion about electromigration. In certain cases, we’re trying to address double patterning and smart corners with existing paradigms. With others, current density and electromigration issues will not be supported by timing, signal integrity and existing power solutions. We’ll have to create new ones. We are reacting to it.
Molina: What we’re hearing is that customers don’t want the extra margin for some of these effects because they’re already getting killed with OCV margin values. They want a solution that represents what’s happening in the silicon, not something that makes it easy for us vendors to provide a result. That doesn’t give them the performance they need. The challenge for the EDA industry is to provide accurate results without compromising performance.
Rafiq: At the tail end of the process, when we are spending a significant amount of time optimizing to close the last picoseconds of the margin portion of speeds, we are limited by time. You can leave yourself more room for that if you address certain issues upstream. One of the main pain points is good clock tree development, which is tied to a good SDC. Most of the time is spent on the front end and the back end, and there is a limited amount of time in between where constraints are being developed. If you can accelerate the constraint-writing portion while putting extra effort into quality, you can leave yourself more time toward the end for timing closure and for lowering power through recovery on positive-slack paths. That’s an area that needs more attention from EDA: how you write those constraints, how you automate the process and how you streamline it. Constraint writing is a boring and mostly manual effort, and it’s something we need to automate and address through the tools. The guys writing the constraints are not the same guys developing the clock tree during physical implementation. We need to take the human element out of this as much as possible. Faster implementation of the clock tree with lower power is also something we’d like to be addressed.
Aitken: The better your validation/verification strategy is, the more likely it is that you’ll have a good design when you go into the whole timing closure aspect.
Rafiq: And more understanding about how efficiently you can implement it.
Aitken: Where you wind up with challenges is when you get close to the edge. ARM is in an odd position with this. We internally make sure we can close designs for our processors, but we have to do that in a way that is believable by our customer base. So if we say that we’ve designed it at 2GHz, we have to be able to demonstrate exactly what that means. And simultaneously we have to support a number of customer verification methodologies, all of which contain a secret sauce. A lot of the secret sauce is really, either explicitly or not, a means of dealing with the statistics or the variability involved in manufacturing. So if we look at the history of this, we went from plain vanilla timing to SSTA. Everyone said SSTA was wonderful. But it wasn’t wonderful enough. It wasn’t worth all the work it required to produce all the data it needed. There have been a variety of things that have tried to approximate it. OCV is one. AOCV is another. Now there’s POCV and SOCV. From a tools standpoint, it’s good the tools and methodologies are trying to get closer to dealing with all of these statistical weeds in a manageable way. Our perspective, as an IP vendor, is we want to constrain this so we don’t have three vendors and five customers inventing a solution.
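
Editor's note: to make the contrast between flat OCV margins and the more statistical approaches mentioned above concrete, here is a minimal sketch. It is not how any vendor's OCV, AOCV, POCV or SOCV implementation works. All numbers and function names are hypothetical, and it assumes stage-to-stage variation is independent and normally distributed, so that standard deviations add in quadrature. Under that assumption, a flat per-stage derate grows linearly with path depth, while the statistical margin grows only with the square root of the depth.

```python
import math


def flat_ocv_delay(stage_delays, derate):
    """Late path delay with one flat derate applied to every stage (hypothetical model)."""
    return sum(stage_delays) * (1.0 + derate)


def statistical_delay(stage_delays, rel_sigma, n_sigma):
    """Path delay at n_sigma, assuming independent per-stage variation.

    Each stage has standard deviation rel_sigma * delay; independent
    variances add, so the path sigma is the root-sum-square of the stage sigmas.
    """
    mean = sum(stage_delays)
    path_sigma = math.sqrt(sum((rel_sigma * d) ** 2 for d in stage_delays))
    return mean + n_sigma * path_sigma


if __name__ == "__main__":
    # Hypothetical path: 16 identical stages of 50 ps each.
    stages = [50.0] * 16
    rel_sigma = 0.05   # assumed 5% per-stage sigma
    n_sigma = 3.0      # assumed signoff target
    # Flat derate that puts every stage at its own 3-sigma value.
    flat = flat_ocv_delay(stages, derate=n_sigma * rel_sigma)
    stat = statistical_delay(stages, rel_sigma, n_sigma)
    print(f"nominal:     {sum(stages):.1f} ps")
    print(f"flat OCV:    {flat:.1f} ps")
    print(f"statistical: {stat:.1f} ps")
```

In this toy case the flat derate adds four times as much margin as the statistical estimate, which is the kind of pessimism the depth-aware and statistical variants try to recover. Real variation is only partly independent, so actual gains are smaller.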