Experts At The Table: The Trouble With Corners

First of three parts: Shades of variation, limits of existing approaches in design, stop-gap techniques, optimize now vs. fix later.

By Ed Sperling
Low-Power Engineering sat down to discuss corners with PV Srinivas, senior director of engineering at Mentor Graphics; Dipesh Patel, vice president of engineering for physical IP at ARM; Lisa Minwell, director of technical marketing at Virage Logic; and Jim McCanny, CEO of Altos Design Automation. What follows are excerpts of that conversation.

LPE: Corners appear to be getting more challenging as we move to advanced nodes. What are the main problems?
Srinivas: When you talk about corners you need to talk about variation. There can be three kinds of variation. One is operational variation, which includes different modes of operation. Your iPhone may be taking a picture, it could be in standby mode, or you could be talking on the phone. To tackle these issues you don’t need corners. You can deal with different timing constraints in different modes. The second is global variation. This is variation from fab to fab or chip to chip, but it’s identical variation. This is typically handled by corners. They could be slow-slow corners, fast-fast corners, or a mix and match. The third is on-chip variation (OCV), or local variation. These are variations in transistor length or Vt on the chip. These are the most difficult to model and the least understood. One progression is advanced OCV, where you try to model the random variation component by means of a state-dependent OCV. That’s where we see statistical modeling might help.
McCanny: The key thing driving the need for corners is process variability. Managing that is very difficult as you go down into these new process technologies. Global variation and local variation are problems. There’s also the need to address power and voltage scaling. Some of the corners we create are a way to bring accuracy to the analysis. What is my timing if my voltage is at this level? And what if I scale my voltage? People are creating additional corners just to deal with the voltage scaling issue, and maybe that’s a problem. We just keep using the same technology to scale. We’ve seen people looking at 60-plus corners. That’s already unmanageable. And then the analysis tools have to deal with all of this. How do you optimize a design when you have so many choices? If we keep using the same technology at 28nm and below, it’s going to break.
Minwell: From an IP perspective there are customers who require a lot of PVT (process, voltage and temperature) corners for all of those reasons. But early in the technology node there’s also the accuracy of the model that you’re using. For us, we have to use other methods to predict where the process will land and set our architecture that way, and to be able to provide the proper estimates for our customers and their designs. There are other challenges that come from the technology that lead up to statistical timing analysis and Monte Carlo analysis. We’re also seeing an increase in PVTs.
Patel: There are two sides to corners. One is application-driven, which is the end user need. Low-power drives a lot of decisions where people end up partitioning their chip into multiple voltage domains. That requires a different set of corners for the voltage domains, and it’s the first set of partitioning that happens. What does the SoC partition look like and how many voltage domains do you have, and do you need to sign off the voltage domains in a different manner? That’s the first level. The second level is what happens in the voltage domains. That’s where the variability effects come into play. In the past, we used to sign off with three or four corners per voltage domain. As the variability has increased and the mechanics of dealing with that have become more sophisticated, the number of corners per voltage domain is going up. People are now requiring five to seven corners per domain. And early in the process, you don’t know where that process will end up so you need more margin in the design.
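To put rough numbers on the corner growth the panelists describe, the following Python sketch enumerates a PVT grid and shows how extra voltage levels or more corners per domain inflate the count. The process names, voltages, temperatures and the four-domain chip are hypothetical values chosen for illustration; none of them come from the panel.

```python
from itertools import product

# Hypothetical values for illustration only; not taken from the discussion.
PROCESS = ["slow-slow", "typical", "fast-fast"]
VOLTAGES = [0.9, 1.0, 1.1]       # volts
TEMPERATURES = [-40, 25, 125]    # degrees C


def pvt_corners(process, voltages, temperatures):
    """Return every (process, voltage, temperature) combination."""
    return list(product(process, voltages, temperatures))


def total_corners(corners_per_domain, domains):
    """Corner count when each voltage domain is signed off separately."""
    return corners_per_domain * domains


base = pvt_corners(PROCESS, VOLTAGES, TEMPERATURES)
# Adding two more supply levels for voltage scaling grows the full grid.
scaled = pvt_corners(PROCESS, VOLTAGES + [0.7, 0.8], TEMPERATURES)

print(len(base), len(scaled))
# Going from four to seven corners per domain on an assumed four-domain chip.
print(total_corners(4, 4), total_corners(7, 4))
```

In practice teams prune the full grid rather than analyze every combination, but the multiplication is why each added voltage level or domain is costly.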

LPE: Is it necessary to deal with every corner, particularly in a consumer electronics device where a failure is not life-threatening?
Patel: Consumer devices have less stringent requirements than a missile guidance system. But the issue then becomes the places that a device can go into. The same device can be sold in Finland where the winter gets cold or on the equator, so the extremes in temperature have to be taken into account. There is a basic set of sign-off criteria that everyone has to comply with.
Srinivas: The problem is that models are so simple that if the temperature goes up by 10 degrees your delay will increase by 10% according to the model. But devices don’t behave linearly like that. There are temperature inversion problems when the mobility of the electrons changes, for example. This kind of non-linear behavior complicates the modeling. Without understanding how the different parameters behave under different operating conditions it’s very difficult to create a generic model.
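The gap between a linear temperature model and real device behavior can be sketched with a toy calculation. The example below compares a linear rule (10% more delay per 10 degrees, as in the remark above) with a simplified alpha-power-law delay in which mobility falls and threshold voltage drops as temperature rises. Every constant here is an assumption picked to make the effect visible, not a value for any real process.

```python
# All constants are assumed, illustrative values, not process data.
T_REF_K = 300.0      # reference temperature, kelvin
VT_REF = 0.45        # threshold voltage at T_REF_K, volts
VT_SLOPE = 0.001     # threshold voltage drop per kelvin, V/K
MOBILITY_EXP = 1.5   # mobility modeled as proportional to T**-1.5
ALPHA = 1.3          # velocity-saturation index in the alpha-power law


def linear_delay(base_delay, delta_t_c, pct_per_10c=0.10):
    """Linear model: delay grows by a fixed percentage per 10 degrees."""
    return base_delay * (1.0 + pct_per_10c * delta_t_c / 10.0)


def toy_delay(vdd, temp_k):
    """Relative gate delay from a simplified alpha-power-law model."""
    vt = VT_REF - VT_SLOPE * (temp_k - T_REF_K)
    if vdd <= vt:
        raise ValueError("supply must exceed threshold in this toy model")
    mobility = (temp_k / T_REF_K) ** -MOBILITY_EXP
    return vdd / (mobility * (vdd - vt) ** ALPHA)


for vdd in (1.1, 0.6):
    cold, hot = toy_delay(vdd, 250.0), toy_delay(vdd, 400.0)
    trend = "slower when hot" if hot > cold else "slower when cold"
    print(f"Vdd={vdd} V: {trend}")
```

With these assumed constants the high-voltage case slows down as it heats up, while the low-voltage case slows down as it cools, which is the temperature inversion a single linear derate cannot capture.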

LPE: So the model needs to be more granular to deal with this?
Srinivas: That’s right.
McCanny: You have to deal with the extremes, but when you’re trying to optimize your design across very extreme conditions you’re going to have a design that isn’t competitive. We’re seeing people creating corners for optimization and a different set of corners for signoff so they can optimize around the sweet spot of where their product is going to be sold. They’re probably going to sell more phones in America than in Finland. But when it comes to signoff they have a different set of criteria. That adds to the massive amount of data and work that’s needed to get the design done. From the time a new process becomes mature enough and they need libraries, we have technology to create those libraries quickly. But there’s a point where you can’t keep up. We need to look at this differently as an industry. Can we be more sophisticated in the tools we use to look at things like voltage variation in a more accurate way without requiring different corners for voltage domains?
Patel: What we have also seen is a move to do the least amount of optimization and deal with corners in signoff. There is a downside to that. People spend a lot more time doing ECOs (engineering change orders) at the back end. If I simplify my design with optimization I have to spend eight weeks in the ECO process. It’s a chicken and egg problem. If you don’t see all these things in the beginning, then you keep adding work on the back end.
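The tradeoff between a small optimization corner set and a larger signoff set can be illustrated with a short Python sketch. The corner names and slack values below are invented for the example; a real flow would take them from a timing tool's reports.

```python
# Hypothetical slack (ns) of one timing path at each corner; negative = violation.
slack_ns = {
    "ss_0.9V_125C": 0.05,
    "tt_1.0V_25C": 0.30,
    "ff_1.1V_-40C": 0.12,
    "ss_0.9V_-40C": -0.04,
}

# Assumed split: optimize around the "sweet spot", sign off on everything.
OPTIMIZATION_CORNERS = ["ss_0.9V_125C", "tt_1.0V_25C"]
SIGNOFF_CORNERS = list(slack_ns)


def worst_slack(slacks, corners):
    """Smallest slack across the chosen corners."""
    return min(slacks[c] for c in corners)


def violations(slacks, corners):
    """Corners where the path fails timing, sorted by name."""
    return sorted(c for c in corners if slacks[c] < 0)


print(worst_slack(slack_ns, OPTIMIZATION_CORNERS))
print(violations(slack_ns, OPTIMIZATION_CORNERS))
print(violations(slack_ns, SIGNOFF_CORNERS))
```

In this made-up data the path looks clean on the optimization corners, and the failing corner only shows up at signoff, which is the kind of late surprise that turns into ECO work.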

LPE: There’s a whole separate movement toward design for variability. How does that affect corners?
Minwell: There are various design techniques that may be used to adjust sensitive circuitry to the variability within the die. Those design techniques are being incorporated so that as the chips are tested and the variability is understood, they’re able to recover process variation. Design techniques also play a role in variability and the loss of predictability in the model. There are things the EDA world can do. There are things the foundries can do. And from a design perspective, there are techniques that can be incorporated, as well. But it definitely is more complex for all of us. Being able to provide reliable IP is really important.
Patel: Going to design for variability, that’s a key issue we will have to resolve in the next few years, especially at 22nm and 20nm. At 28nm, we have managed to use some of our existing techniques and gotten away with it. The variability is there but we know how to deal with it. For 22nm, where the variability will be even worse, the solutions you can provide are many-fold. It’s what can you do at the physical IP level. What sort of sophisticated strategies can you use to design physical IP? There also are techniques we can use at the architectural level. With Razor, the CPU is designed with failure and recovery in mind. That’s all about recovering some of the margins all of us are designing into silicon. It’s when you finally get to silicon that you know how much it’s going to weigh because you don’t know quite where in the bell curve it’s going to fit. Those techniques will become more important.
Srinivas: One other trend we have seen along with the low-power trend is the growing complexity of signal integrity. At 40nm and 28nm, signal integrity closure is a very tough problem. When you couple that with variability and low power, it gets even more complex. All of this makes timing closure more difficult. It’s now a question of whether you make the runtime faster by merging stuff or deal with ECOs at the end.