Second of three parts: Statistical modeling, voltage variation and where future advances will come from.
By Ed Sperling
Low-Power Engineering sat down to discuss corners with PV Srinivas, senior director of engineering at Mentor Graphics; Dipesh Patel, vice president of engineering for physical IP at ARM; Lisa Minwell, director of technical marketing at Virage Logic; and Jim McCanny, CEO of Altos Design Automation. What follows are excerpts of that conversation.
LPE: How does software affect all of this?
McCanny: Software has the ability to regulate the voltage. That creates an almost infinite number of scenarios for testing the chip and doing timing before you manufacture it. The only solution we have right now is to build more and more corners. There is no worse problem for systems integration: if you don't pick the right corner, it can create a nightmare. We should be looking beyond corners to the model itself. The voltage keeps changing, or the model isn't correct. Maybe we need better models that can deal accurately with voltage variation without having to build tons and tons of corners. We have seen better models for systems integration. Then there are the models for statistical modeling, which have their own set of issues. Statistical modeling does cut down the number of corners you need, but it doesn't cut down the amount of work you need to do to get to those corners. As an industry, we need to address this. Doubling and tripling the number of corners at every node doesn't work. We all fail when the industry hits a wall.
Minwell: I agree.
Srinivas: There's one more issue I want to mention. Another complication comes from combinations. When you have 50 blocks and various permutations (this block is on, another is off, another is in sleep mode), you have to verify your chip for each permutation. Your block will look different with Block A on, Block B on, and Block C on. As the number of blocks increases, so does the number of permutations. The interplay between the various voltage domains becomes critical.
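The permutation explosion Srinivas describes is straightforward combinatorics. The short Python sketch below counts distinct power-state scenarios for a handful of blocks; the block names and the three-state on/off/sleep model are illustrative assumptions, not drawn from any real flow.

```python
from itertools import product

# Illustrative only: the block names and the three power states are assumptions.
states = ["on", "off", "sleep"]
blocks = ["A", "B", "C", "D"]

# Each combination of per-block power states is a distinct scenario that may
# need its own timing and verification view.
scenarios = list(product(states, repeat=len(blocks)))
print(len(scenarios))   # 3**4 = 81 scenarios for just four blocks
print(3 ** 50)          # roughly 7.2e23 scenarios for the 50-block case above
```

In practice nobody verifies every permutation; the point is only that the scenario count grows exponentially with the number of independently controlled blocks.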
Patel: The interplay is important, and a lot of people are using asynchronous communications between those domains. We can’t expect a synchronous behavior to exist all the time between the domains. This is asynchronous in that the communication between the domains doesn’t have a relationship in the timing sense.
LPE: Are there tools for this?
Patel: There are clocks, but it has to be dealt with at the system level.
Srinivas: The low-power methodology using UPF or CPF is very well known in terms of how you do level shifters and things like that. The challenge is in the verification.
LPE: Then it’s on both sides of the design process, the architectural model and the verification?
Srinivas: Absolutely.
LPE: Who’s responsible for creating the model?
Minwell: It starts at the foundry level. It's a combination of that and the simulation world of EDA. But being able to understand and extract the correct parameters to support a model is extremely important. The models really haven't changed much in recent years. We've enhanced the model and added some elements to it for statistical purposes, but the time has come where the foundry needs some help and some sort of new methodology to comprehend the variations we are seeing. We're seeing a significant falloff with temperature, for example, when we're looking at timing corners. How are you going to meet any kind of timing with that? There has to be something in the model, or in the data collection, that captures what happens between 0 degrees Celsius and -40 degrees Celsius. You need to be able to look at the variation by the various poly layers. That contribution is really important. We really haven't seen innovation there.
Srinivas: On the interconnect side the problem is very well understood. There are much better models for the interconnect—how the resistance changes, how the capacitance changes, various local layout effects. The problem is on the device side.
LPE: But at 22nm and beyond, there are issues like electromigration in the interconnect, right?
Srinivas: There are models available for that. But on the device side the information is lacking because it is much more complicated. Lgate (gate-length) variation has been well studied, but the design community has never elected to adopt this model. They don't like statistical models. They want to tape out the chip with a real number.
Patel: The problem is more complicated. We can model as many variations in the SPICE model as we want, but if you can’t find a way of representing them at the cell level it’s not going to be useful. That’s the issue we’re facing. Right now if I was told to generate models for something, the amount of effort it takes is enormous. We’re looking at a 5x increase in cost of characterization, validation, disk space, compute requirements, and it mushrooms out of control. If no one is willing to make that investment it doesn’t happen.
LPE: Some of the large IDMs are doing that independently, though.
Patel: Yes, because they can afford to do it. But the mass market is not. They’re not sure if something that is statistically significant will fail when it comes back. As soon as you get into that discussion, they go back to the old approach of margins and certainty.
Minwell: We definitely see a reluctance by companies to move in this direction.
Srinivas: Companies like TSMC are taking a lead in that. They are looking at Lgate variation. They give you tables, but it is not everything. They are looking at the dominant effects. But there is no comprehensive effort to make sure everything is covered, so we still need some margins.
McCanny: There needs to be work done on the process model side, but there’s a need for better modeling on the EDA side. We do use cell-level extraction, which is what all the EDA design tools use today. That can be improved. There are some people who believe you can do the modeling more accurately if you do transistor-level extraction. SSTA (statistical static timing analysis) is one innovation on the EDA side. It hasn’t been done on the IP side. We know the effects are real. Some of the consortiums in Japan have studied this. But there is still a roadblock among the IP guys. And I think 5x is an underestimation.
Srinivas: About five years back Synopsys introduced a CCS (composite current source) model that handled voltage variations much better than before. Things are progressing from that side.
Patel: But CCS is still limited in the voltages it can handle.
Srinivas: Yes. And there is no one cure for everything. There may be corners and nearby corners that can be grouped together. A good way of predictably modeling corners allows you to go from 100 corners down to 15. That's a practical approach.
Patel: From an IP perspective, some of the solutions we need to come up with will have to be architectural. I don't think we'll ever get the number of corners down to a point where people will be happy. We need to be able to tune to whatever the operating conditions happen to be. If the temperature is up a little bit, the device is slowed down, so you can deal with it that way. But it has to come from that level.
LPE: Do you bring process variations into that, as well? At 22nm you may not be depositing the same amount of metal in a transistor.
Patel: Yes, and that’s going to show up as a difference in behavior for that transistor from a timing perspective or a leakage perspective, and the impact of that on the whole design. That could be treated as extra corners, or you can say that you expect this to happen and you can treat your circuit accordingly.
Srinivas: I disagree. It’s not a corner. It’s a random variation. It’s an on-chip variation.
LPE: But temperature can be a variation, too.
Srinivas: There are two components to temperature. One is the global temperature. That is global variation. There is also an on-chip variation in temperature because of the activity. You may have one part of the chip that is highly active, so the temperature gradient will be higher there and lower in other areas. That is a local problem. When we think about these problems it makes sense to consider whether it's a global problem or a local problem. Another example of random variation is dopant fluctuation. These are ideal candidates for statistical modeling. When the variation is close to a normal distribution, you can model it that way.
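As a rough illustration of why statistical treatment of random variation is less pessimistic than stacking fixed corners, the Python sketch below models two path stages with independent, normally distributed delays and compares the Monte Carlo 3-sigma path delay against per-stage 3-sigma corners added linearly. All numbers (nominal delays, sigmas, sample count) are invented for the example.

```python
import random

# Invented numbers: (nominal delay in ps, sigma in ps) for two path stages
# whose random variation is assumed independent and normally distributed.
STAGES = [(100.0, 5.0), (80.0, 4.0)]
N = 100_000

random.seed(0)
path = sorted(sum(random.gauss(mu, s) for mu, s in STAGES) for _ in range(N))
stat_3sigma = path[int(0.99865 * N)]                 # ~+3-sigma point of the path distribution

corner_3sigma = sum(mu + 3 * s for mu, s in STAGES)  # each stage pushed to its own 3-sigma corner

print(f"statistical path delay: {stat_3sigma:.1f} ps")   # about 199 ps
print(f"corner-stacked delay:   {corner_3sigma:.1f} ps") # 207 ps; fixed corners over-margin
```

Independent random variations add in quadrature, so the statistical bound sits below the sum of per-stage corners; that gap is the margin statistical modeling recovers.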
McCanny: We can certainly reduce some of the corners with some of the variables. Temperature can be measured in situ, and a feedback mechanism can be used to slow down the clock of the circuit.
Patel: As long as the user experience doesn’t get impacted. If you’re playing back a video and it stops momentarily, it’s okay. If it happens all the time, that’s not okay.
McCanny: Temperature and voltage are areas where we haven’t done enough in the modeling. Statistical timing reduces the number of corners but you still need corners.
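As a closing illustration of the in-situ feedback McCanny mentions, the Python sketch below polls a stand-in temperature sensor and steps the clock frequency down or up across a hysteresis band. The sensor readout, the frequency table, and the thresholds are all hypothetical; real silicon would use an on-die thermal sensor and the platform's power-management interface.

```python
import random
import time

FREQ_STEPS_MHZ = [1600, 1200, 800, 400]      # available clock frequencies (illustrative)
THROTTLE_UP_C, THROTTLE_DOWN_C = 85.0, 70.0  # hysteresis thresholds (illustrative)

def read_die_temperature_c() -> float:
    """Stand-in for an on-die thermal sensor readout (hypothetical)."""
    return random.uniform(60.0, 95.0)

def next_frequency_index(idx: int, temp_c: float) -> int:
    # Slow the clock when the die runs hot; speed it back up once it cools.
    if temp_c > THROTTLE_UP_C and idx < len(FREQ_STEPS_MHZ) - 1:
        return idx + 1
    if temp_c < THROTTLE_DOWN_C and idx > 0:
        return idx - 1
    return idx

idx = 0
for _ in range(5):
    temp = read_die_temperature_c()
    idx = next_frequency_index(idx, temp)
    print(f"{temp:5.1f} C -> {FREQ_STEPS_MHZ[idx]} MHz")
    time.sleep(0.1)
```

The hysteresis band is the design choice that matters here: throttling only above one threshold and recovering only below a lower one avoids the constant frequency flapping that Patel warns would be visible to the user.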