Last of three parts: Self-adapting circuits; the growing burden of timing closure and constraints in multimode designs; and redefining what constitutes a failure.
By Ed Sperling
Low-Power Engineering sat down to discuss timing constraints with ARM Fellow David Flynn; Robert Hoogenstryd, director of marketing for design analysis and signoff at Synopsys; Michael Carrell, product marketing for front end design at Cadence; Ron Craig, senior marketing manager at Atrenta; and Himanshu Bhatnagar, executive director of VLSI design at Mindspeed Technologies. What follows are excerpts of that conversation.
LPE: Will it be economically feasible to continue working on tools in this area for fewer and fewer customers?
Flynn: One thing we’ve been working on, which may end up in a product, is based on the idea that if you start something and it doesn’t work, you can recover. We have customers who are experts in yield and other customers who are much less expert and would love help in that area. They don’t have the tools to analyze and build. Maybe we can get to the stage where you can work in a more conventional design flow and still recover.
LPE: That’s a lot of the momentum behind 3D, right?
Flynn: Yes, but this would be more in the conventional flow.
Hoogenstryd: You’re talking about self-adapting circuits?
Flynn: Yes. If you’re not failing you may have overdesigned it. But you do have to adapt it.
LPE: Are we redefining failure, as well? Is it a chip that doesn’t work according to previous models or one that doesn’t work according to a different set of standards?
Flynn: This is parametric failure, where you’re not quite where you hoped to be. Dealing with failure means coping with the very first errors creeping in. I think the terminology is error-resilient structures.
Bhatnagar: For everything, you’re building in redundancy.
LPE: But you can’t do that anymore at advanced nodes, right? You kill the power and performance with that approach.
Bhatnagar: That’s a big problem. We’re using an ARM A9 and trying to get to 1GHz. Performance is one problem. Power is another. The other thing is variability. Filler adds more complexity to your timing closure. That comes at your end stage and then you have to extract it. The volume of data becomes so large there’s no way to deal with it. If it’s a small chip it’s okay. But with a 13mm x 13mm die at 40nm, there’s no way to deal with it.
LPE: Is there an acceptable failure rate?
Bhatnagar: We budget by using uncertainty and OCV (on-chip variation) factors. The chip grows, power grows, everything grows.
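As a rough back-of-the-envelope illustration of the budgeting Bhatnagar describes, the sketch below shows how a clock uncertainty margin and a late OCV derate eat into a 1GHz setup budget. All numbers are hypothetical and chosen only for illustration; they are not from any real library, tool, or the designs discussed here.

```python
# Rough sketch of how uncertainty and OCV derates consume a setup budget.
# All numbers are hypothetical illustrations, not real library or tool data.

clock_period_ns   = 1.0    # 1GHz target
clk_to_q_ns       = 0.10
data_path_ns      = 0.70   # nominal longest combinational path
setup_time_ns     = 0.05
clock_uncertainty = 0.10   # jitter + skew margin budgeted up front
late_derate       = 1.08   # OCV: assume the launch/data path runs 8% slow
# (a capture-clock early derate would tighten this further)

worst_arrival = (clk_to_q_ns + data_path_ns) * late_derate
required      = clock_period_ns - setup_time_ns - clock_uncertainty
slack         = required - worst_arrival

print(f"arrival {worst_arrival:.3f} ns, required {required:.3f} ns, "
      f"slack {slack:+.3f} ns")
```

With these made-up numbers the slack goes slightly negative, which is the point: a path that closes comfortably at nominal conditions can fail once the uncertainty and derate margins are stacked on top.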
Craig: Or you run multiple back-end tools until you get the answer you’re looking for.
Bhatnagar: Today there is no real solution for it. If you ask what margin you should use at a particular node, there’s no quantifiable number.
LPE: So what is good enough? That seems to be changing.
Bhatnagar: Every transistor is behaving differently. OCV has changed to AOCV—advanced on-chip variation. People have taped out using this and it seems to be working. The penalty is you have to create thousands of tables.
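A minimal sketch of why AOCV multiplies the data volume: instead of one flat derate, each cell or net class gets a table of derates indexed by path depth, and the tables are typically split by early/late, rise/fall, and corner. The table values and lookup rule below are hypothetical, intended only to show the shape of the approach.

```python
# Hypothetical AOCV-style derate lookup: derates shrink as logic depth grows,
# because random variation partially averages out along longer paths.
# A real flow has one such table per cell/net class, per corner, per
# early/late (and often per rise/fall), which is where the "thousands
# of tables" come from.

aocv_late_derate = {1: 1.12, 2: 1.10, 4: 1.08, 8: 1.06, 16: 1.04}

def lookup_derate(depth: int) -> float:
    """Return the late derate for a path of the given logic depth."""
    # Take the nearest characterized depth at or above the requested one;
    # clamp to the deepest entry for anything beyond it.
    known = sorted(aocv_late_derate)
    for d in known:
        if depth <= d:
            return aocv_late_derate[d]
    return aocv_late_derate[known[-1]]

for depth in (1, 3, 10, 40):
    print(depth, lookup_derate(depth))
```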
Hoogenstryd: That’s better than creating the library to run them.
LPE: Who gains control out of all of this?
Craig: The foundries are the only ones who understand it well enough to make it work. It’s like a vendor putting their engineers on-site at the customer location to make sure they do the right job. That’s a trend I see continuing. The vendor does 90% of the flow.
Bhatnagar: Timing-wise, the foundry drives all these numbers. This is what you must have.
Craig: But that said, they can’t do it alone.
Bhatnagar: They do the test chips.
Flynn: Unless you can build IP that can adapt itself. That’s what we’re trying to do with less-expert customers.
Carrell: That’s very valuable to people. If you can get help integrating IP, that’s more valuable to a lot of customers than flexible IP.
Flynn: It is attractive, but the delivery is hard.
Bhatnagar: Ideally I want to have silicon-proven IP right up front. But I don’t want to be the first one because then I have to compete with TI. So I go with unproven IP and take the risk and my yield is terrible. So at 40nm you’ve burnt $3 million just in mask costs.
Flynn: And it gets harder and harder to get timing closure with a revised model because everything is getting so complicated.
Bhatnagar: That’s absolutely right. At 40nm version 1.0 of my chip ran pretty well. When I got back version 1.2 it ran 20% slower.
Flynn: But it might yield.
LPE: As a percentage of the overall chip design, how much is timing closure?
Hoogenstryd: It’s getting smaller as a percentage, but that’s only because everything else is growing. Timing closure has always been the leading problem for the biggest customers working on high performance. But everyone now has to wrestle with issues they didn’t have to deal with in the past. Power is part of the equation, whereas it wasn’t in the past. If the power piece is growing, timing’s share will decrease. But in absolute time it’s increasing.
Flynn: And all of those techniques impact timing.
Bhatnagar: I think timing closure is getting bigger. You have all these massive chips, with DVFS (dynamic voltage and frequency scaling), ACS, all the UPF-CPF headaches, and then your CMP timing closure. When you add it up, it’s about 9 months of work. Your timing closure starts when you’re doing your architecture. Some of the IP, like ARM’s, is already there. You start with that. Then you move to the next level.
LPE: Are we getting to the point where we can’t push market windows any further?
Bhatnagar: Absolutely.
LPE: So it’s no longer just the cost factor of getting to the next node. Now it’s also the time it takes to get there.
Carrell: The function, the cost and the schedule have always been balanced.
Bhatnagar: Time to market is the most important thing. You may sacrifice something, which is why the number of revs goes up.
Craig: That’s what Apple does.
Carrell: They design it from the application down. The hardware guys are working hard on the silicon, but it’s almost like they’re moving differently there.
Flynn: It’s also helpful if you don’t compete on megahertz. Those headline marketing numbers are not relevant anymore.
Carrell: That’s right, which is why the box of the iPhone doesn’t say how many megahertz the processor inside it runs at.
LPE: What else affects time to market from the process side?
Flynn: Things like double patterning are having an effect.
LPE: But even at 180nm there are changes to the process involving power. Will we see timing issues there?
Craig: Handling the modes is tricky. How do you merge all your constraints together?
Bhatnagar: It’s not the timing closure. It’s the number of constraints from the various modes that goes through the roof.
Craig: Do you have a pessimistic set of timing constraints in the end? If you do, how do you come up with that? Is it too costly to try to merge everything together?
Bhatnagar: You can have different performance with different voltages.
Carrell: And do you really need to put that in a different voltage? You need to figure that out at the beginning.
Craig: We had one customer tell us that the only way we could solve their timing constraints was to work with them. They were the only ones who knew their process well enough.
Carrell: You can’t just work up a magic formula to say how 17 different modes are going to work together. It depends on the design.
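To make Craig’s question about a merged, pessimistic constraint set concrete, the toy sketch below collapses per-mode clock constraints by keeping the worst case of each. The mode names and frequencies are invented for the example; a real multimode design has far more than clock periods to reconcile, which is why merging quickly becomes impractical.

```python
# Toy illustration of merging per-mode constraints into one pessimistic set:
# for each clock, keep the tightest (smallest) period seen in any mode.
# Mode names and periods are invented; real modes also differ in timing
# exceptions, case analysis, and operating voltages.

modes = {
    "functional": {"cpu_clk": 1.0, "bus_clk": 2.5},    # periods in ns
    "low_power":  {"cpu_clk": 2.0, "bus_clk": 5.0},
    "scan_test":  {"cpu_clk": 10.0, "bus_clk": 10.0},
}

merged = {}
for mode, clocks in modes.items():
    for clk, period in clocks.items():
        merged[clk] = min(period, merged.get(clk, float("inf")))

print(merged)   # {'cpu_clk': 1.0, 'bus_clk': 2.5}
```

The merged set is safe but pessimistic: every mode is now timed against the fastest clock any mode ever uses, which is exactly the overdesign trade-off the panel is debating.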
Flynn: For years we’ve been talking about globally asynchronous, locally synchronous (GALS) design. You get timing closure in complex windows. We’d love to see more work done in this area.
Bhatnagar: Timing closure starts at the micro-architecture level. How you design your clocking scheme is the most important.
Flynn: And how you provide the overrides.