Margin Of Error

Guard-banding makes integration easier and faster, but too much margin is a bad thing. Why it’s becoming difficult to keep it under control.


By Ed Sperling
Adding extra circuits and silicon area to a chip has always been frowned upon by chipmakers. Extra silicon means extra money, and for most chips the least expensive is always the better choice. But at advanced process nodes, margin also can slow performance, increase power consumption, and make it harder to achieve timing closure.

The obvious solution is to reduce margin throughout the design, but the reality is that margin budgets for a complex SoC will never go down. The best that design teams can hope for, in fact, is to keep margin constant from node to node and across stacked configurations. While this will require constant vigilance on the part of architects, it also will increase challenges from the conceptual stages of the design all the way to achieving acceptable yields in manufacturing.

What can’t be fixed
In some cases excess margin is out of reach of design teams. With more and more third-party IP now included in designs—and as much as 90% of the design now a combination of third-party and re-used IP—it’s difficult to even get a firm handle on the amount of guard-banding being done. So far, this hasn’t been a problem because most of the industry still isn’t producing 28nm chips in volume.

“Right now it’s only really a worry for the ‘star-IP,’ because if my USB controller is a bit bigger and more power-hungry than it might be, it is still peanuts compared with the overall platform figures,” said one architect at a large chip company, who spoke on condition that he not be named. “Even the sum of the power of all the little things doesn’t approach the star-IP. And here’s a thing about the star-IP: It may be big and power-hungry, but there’s still a case for it. Some IP has a well-defined job to do and has to get that job done as efficiently as possible. But with star-IP, it’s mainly ‘faster is better.’ So sure your Web browser would be more power- and area-efficient on a Cortex-A8 than a Cortex-A9, but I bet you’d rather buy the A9-based tablet.”

Those kinds of choices, as well as time-to-market pressures where IP can be re-used quickly, make guard-banding almost inevitable. What’s surprising is not that it still exists, but that it has remained relatively constant given the explosion in the number of components on an SoC.

Where margin matters most
But margin still causes signal propagation issues because there is more silicon, and more wires through which signals need to be driven. That, in turn, leads to the need for wider buses.

“When you guard band you need to ratchet up the intended operating frequencies and increase the clock frequency,” said Neil Hand, group marketing director for Cadence’s SoC Realization Group. “All challenges are made worse. In some parts of the design there is no impact. If you have a low-speed peripheral you probably don’t need to worry about it. But with something like high-performance PCI Express, gen 3, you have fast protocols and huge pipes and margin becomes a critical issue. You have a hard time meeting closure even with no margin. Margin makes it worse.”

He said the key is not so much reducing the percentage of guard-banding, which has remained relatively constant: about 20% margin at 65nm and 90nm, and at least 15% at 28nm and 20nm.
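To make those percentages concrete, a quick sketch of the arithmetic behind Hand's point: a timing margin forces the design to close at a shorter clock period than the product spec calls for, which is why guard-banding effectively "ratchets up" the frequency the design team must hit. The function name and the 1 GHz example figure below are illustrative assumptions, not numbers from the article.

```python
def guardbanded_period_ns(target_freq_mhz, margin_pct):
    """Clock period (ns) the design must actually close timing at
    once a percentage margin is carved out of the spec period."""
    nominal_period = 1000.0 / target_freq_mhz        # spec period in ns
    return nominal_period * (1.0 - margin_pct / 100.0)

# A hypothetical 1 GHz spec with the ~20% margin cited for 65nm:
# the design must close at 0.8 ns, i.e. as if it ran at 1.25 GHz.
print(guardbanded_period_ns(1000.0, 20.0))  # 0.8
```

The same arithmetic explains why a low-speed peripheral is unaffected (plenty of slack either way) while a fast protocol with a nearly full timing budget can fail closure once the margin is subtracted.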

“With that number there’s a lot more slack,” he noted. “You need to know where the slack is and where it’s going to impact the design. Where you do have room to move it may drive different IP use. There may be better IP externally.”

He’s not alone in that view. In fact, all of the Big Three EDA vendors are counting on the need to trim margin to boost their IP sales over internally developed IP blocks.

“There are a lot of challenges working with 28/20nm because of the variability in processes,” said Navraj Nandra, senior director of marketing in Synopsys’ Analog and Mixed Signal IP Solutions Group. “Reducing margin makes a difference for getting performance out of analog. You also want to be competitive in price-performance-area. The question is how much margin you can accept in IP to meet those goals but not compromise on yield or variability.”

This becomes a difficult engineering tradeoff, however. Do you design IP for a specific chip, or do you add enough margin to allow it to easily plug into other designs? For commercial IP, the answer is clearly versatility, but there is a cost to that flexibility.

“You can’t be competitive and have slop in the design, but you can’t build something so competitive that it will only work for one design,” Nandra said. “It’s like a drag car where you run it for a half mile and then you have to replace the engine, the tires, and add more nitrous oxide. You can do the same for super high-performance chips for one temperature range and one process, but it’s useless for anything else. The goal is to build in enough circuit techniques with just enough margin not to risk performance problems if there is variability in the process.”

Process variability has become particularly troublesome at advanced nodes. Coupled with double patterning at 20nm, and the likelihood of triple patterning at 14nm, margin takes on entirely new dimensions.

“We’re trying to characterize process corners and design around a nominal target,” said Jean-Marie Brunet, director of product marketing for model-based DFM and place and route integration at Mentor Graphics. “Third-party integration is a real challenge. Fill used to be a simple process where you insert it at every layer. But you don’t know what is in the IP these days, so fill has to be re-done. That doesn’t help with the integrity of the IP.”

He said that for most IP, there usually is guard-banding on the periphery of the IP to deal with fill. That impacts timing, area and performance.

“This is really an issue for the big chip companies that do 300 to 400 tapeouts a year, not for the microprocessor houses that can take their time to eliminate margin. The problem is there is no magic bullet for everyone else. And when we get into double patterning, this is really going to be an issue because you’re overlaying two masks, and any shift of the overlay will have a dramatic impact on the chip.”

The future
While pressure to reduce guard banding will continue, there is at least some hope for dealing with the problem more effectively. One approach involves new materials, such as graphene and silicon on insulator, which help reduce power, and new structures such as finFETs and carbon nanotube FETs, which minimize the effects of leakage and thereby make up for some of the power drawn by the extra margin.

A second approach is better tools. Knowing what the variability is in a process allows engineers to design in a minimum amount of margin. Building more accurate models can help, particularly in conjunction with analysis tools for exploring one IP block versus another.
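The idea behind the tooling approach can be sketched with a simple Monte Carlo experiment: if you can characterize how much a path's delay varies with process, you can size the margin to just cover a target yield rather than applying a blanket figure. The Gaussian delay model, the 5% sigma, and the function name below are all illustrative assumptions.

```python
import random

def min_margin_pct(nominal_delay_ns, sigma_pct, yield_target=0.999,
                   trials=100_000, seed=1):
    """Estimate the smallest timing margin (% of nominal delay) that
    covers the slowest yield_target fraction of sampled variation."""
    random.seed(seed)
    # Assume path delay varies as a Gaussian around nominal.
    delays = [random.gauss(nominal_delay_ns,
                           nominal_delay_ns * sigma_pct / 100.0)
              for _ in range(trials)]
    delays.sort()
    worst = delays[int(yield_target * trials)]  # delay at the yield percentile
    return max(0.0, (worst - nominal_delay_ns) / nominal_delay_ns * 100.0)

# With 5% delay sigma, covering 99.9% of samples needs roughly
# 3 sigma of margin, i.e. on the order of 15%.
print(round(min_margin_pct(1.0, 5.0), 1))
```

A better-characterized process (smaller sigma) directly shrinks the margin needed, which is the engineering case for more accurate variability models.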

And finally, stacked die will alleviate at least some concerns because portions such as analog can be developed at older nodes where they make more sense, rather than trying to fit everything into the latest process node.
