The Easy Stuff Is Over

Everything now has to be rethought, including methodologies and technologies that were taken for granted; Moore’s Law is becoming less relevant for more companies.


By Ed Sperling
Doomsayers have been predicting the end of Moore’s Law for the better part of a decade. While it appears the law will remain viable for some companies—Intel and IBM already are looking into single-digit nanometers, and researchers are speculating about picometer designs—for most companies the race is over.

Progress will still be made in moving SoCs from one node to the next, but at 20nm and beyond all of this will happen more slowly than in the past for most chipmakers. Rather than change process nodes every two years, that time will be stretched out, at least for most of the chip. Among the factors driving this shift:

  1. While it’s possible to create chips with hundreds of processing cores, it’s almost impossible to write most non-commercial software so that it takes advantage of more than two. The concept of dark silicon is well recognized, but there is debate about just how efficient an approach it is.
  2. It’s impossible to get more performance out of a single core without turning up the clock frequency to the point where it cooks the chip. That means more cores, which adds cache coherency issues. Keeping all data in sync costs both power and performance, so pushing to the next node doesn’t necessarily improve either.
  3. The cost of a performance increase/power reduction is too large to rely simply on shrinking features. It’s too expensive to put everything on one piece of silicon, particularly analog IP—one of the big attractions of stacking die—and at 20nm the cost of double patterning can send the total cost of designing and manufacturing a chip into the stratosphere. That cost may rise significantly higher still if extreme ultraviolet lithography is not ready by 14nm, even for chips that will remain on the Moore’s Law road map.
  4. Routing congestion around memory has become particularly troublesome after 28nm because wires don’t scale like transistors. There is no simple solution to this problem, despite massive improvements in routing tools, in part because of resulting physical effects such as electromigration, electromagnetic interference, heat and electrostatic discharge. Packing wires more tightly only exacerbates these effects.
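The diminishing returns from adding cores (point 1 above) follow directly from Amdahl’s law: if only a fraction p of a workload parallelizes, speedup is capped at 1/(1 − p) no matter how many cores are added. A minimal sketch:

```python
def amdahl_speedup(p, n):
    """Speedup on n cores when a fraction p of the work parallelizes.

    Amdahl's law: speedup = 1 / ((1 - p) + p / n).
    """
    return 1.0 / ((1.0 - p) + p / n)

# Even with 90% of the code parallel, hundreds of cores barely help:
for n in (2, 4, 64, 256):
    print(n, round(amdahl_speedup(0.9, n), 2))
# 2 cores -> ~1.82x, but 256 cores -> under 10x, against a hard cap of 10x.
```

The arithmetic makes the article’s point concrete: the serial 10% dominates long before core counts reach the hundreds.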

Thinking different
While all of these problems together are raising questions about continuing to do things the way they were done, they also are opening up new opportunities in multiple directions.

“We’re definitely seeing an end to Moore’s Law in terms of silicon process technology,” said Mark Throndson, director of product marketing at MIPS Technologies. “It’s a challenge to even hit the same specs in power and frequency under the same conditions at the next process node. There is a challenge on the manufacturing side. And when it comes to multi-core, part of the motivation there is to provide scale and performance. It’s not just a simple matter of moving to the next node anymore and getting two times or even 1.5 times the frequency or less power. The scaling still works on the area side. The other aspects are not scaling.”

Throndson sees a number of options for continuing down this path, such as adding more aggressive libraries and memories, or accepting lower yields and selling lower-performing chips in niche markets. Another option is to target implementation specs or operating conditions that are outside what is considered an acceptable range. “You can design for typical silicon corners, not worst cases.”

The push to reduce margin by not designing to worst-case scenarios can have a big effect on performance and power. But it is taking on some new wrinkles at 28nm and 20nm as routing congestion and physical effects increase. One of the unusual new twists is temperature inversion, which began showing up at 28nm: chips run faster at higher temperatures and slower at lower ones.

“We’ve always had to deal with process, temperature and voltage corners,” said Steve Roddy, vice president of marketing at Tensilica. “Standard operating procedure is you use your worst-case signoff condition. In the past, hot meant slow and cold was fast. At 28nm, your normal signoff conditions are 70 degrees Celsius for commercial chips, 100 degrees for industrial, and 125 degrees for extreme mil/aero, and minus 40 degrees on the cold side. But with a temperature inversion there is a question about whether it makes rational sense to do an ASIC signoff for the worst-case condition.”

He said that even at minus 40 degrees, the device will heat up quickly enough. At worst, the result is a slower power-up cycle. And high temperature no longer kills performance.

“The problem is that we’ve been blindly following one thing,” said Roddy. “This affects all logic and it impacts anyone doing an IC design.”
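The inversion Roddy describes can be sketched with a toy first-order delay model. Everything here is illustrative—the coefficients, function name and parameters are invented for the example, not drawn from any real PDK: delay scales roughly as 1/(Vdd − Vth)², threshold voltage falls as temperature rises, and carrier mobility also falls. At low supply voltages the shrinking (Vdd − Vth) term dominates, so hot silicon is faster.

```python
def delay(vdd, temp_c, vth0=0.45, k_vth=0.0007, mob_slope=0.002):
    """Toy relative gate delay vs. supply voltage and temperature.

    Illustrative model only: Vth drops ~0.7 mV/degree C when hot,
    mobility degrades ~0.2%/degree C. Real libraries characterize this
    per corner; the point is the crossover, not the numbers.
    """
    vth = vth0 - k_vth * (temp_c - 25)            # threshold falls when hot
    mobility = 1.0 - mob_slope * (temp_c - 25)    # mobility falls when hot
    return 1.0 / (mobility * (vdd - vth) ** 2)

# At a low 0.7 V supply, hot is faster than cold -- temperature inversion:
print(delay(0.7, 125) < delay(0.7, -40))   # True
# At a higher 1.2 V supply, the classic behavior returns -- hot is slower:
print(delay(1.2, 125) > delay(1.2, -40))   # True
```

This is why the old rule of signing off at the hot corner stops being automatically the worst case at 28nm and below.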

Architectural focus
But there also are questions about whether to stay on this course at all. One of the most striking changes at 20nm is that general-purpose processors with many cores no longer are viewed as the only approach for improving power and performance, and in some cases they may actually lower performance and energy efficiency.

“At smaller geometries you may not see a reduction in per-transistor cost anymore,” said Barry Pangrle, a solutions architect for low power design and verification at Mentor Graphics. “Instead, what some companies are finding is that they get better results by more closely matching the applications and the chips. So you may see hundreds of cores, but they may be very specific. In the past, we’ve always balanced compute power with bandwidth and memory. But for different applications, if they’re just doing computation on data, there may not be as much opportunity for caching.”

That’s a big change, and one with enormous ramifications for SoC designs. In a data center, the limit on the number of cores is how much data needs to move on and off a chip. In an SoC, the alternative may be through-silicon vias or interposers, which offer better bandwidth to memory and power, coupled with very specialized transistors.

Another possibility is lowering the accuracy of an individual core or cores for applications that don’t require it. That can significantly speed up performance and lower power.
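The accuracy-for-efficiency trade can be illustrated with a stdlib-only Python sketch. The function and its parameters are hypothetical, invented for this example: quantizing 32-bit samples down to 8 bits quarters the storage and bandwidth they consume, at the cost of a bounded, predictable error that many applications (audio, imaging) can tolerate.

```python
def quantize(x, bits=8, lo=-1.0, hi=1.0):
    """Map x in [lo, hi] onto a (2**bits - 1)-level grid and back.

    Illustrative only: models trading numeric accuracy for reduced
    storage/bandwidth, as in reduced-precision hardware.
    """
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return lo + round((x - lo) / step) * step

samples = [0.1234567, -0.7654321, 0.5]
approx = [quantize(s) for s in samples]
max_err = max(abs(a - s) for a, s in zip(approx, samples))
print(max_err)  # bounded by half a quantization step (~0.0039 for 8 bits)
```

The same trade appears in hardware as narrower datapaths or approximate arithmetic units: less precision per operation, but more operations per joule.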

“People get comfortable with what seems to work and they do things because they’ve been done before and have been shown to work,” said Pangrle. “With single-threaded processors we were doubling clock speed every year or two. With multiple cores, it’s hard to see a decent payoff. Even with GPUs not all applications map well, and in FPGAs, some applications do better because the architecture is more flexible. All of this opens the door to more innovation in the way people use transistors.”

Design changes
All of this puts more pressure on tools developers to do more analysis so they can understand tradeoffs earlier in the design cycle.

“What we’re looking at is a system that is much more complex and trying to integrate multiple cores on a single piece of silicon,” said Mike Gianfagna, vice president of marketing at Atrenta. “If you have to go off chip it costs you, so you want to integrate as much as possible on a single die. But there’s also no homogeneous architecture for performance and power, so you need to profile or analyze the different building blocks for power, performance and area.”

He said that’s particularly difficult in mobile devices because you don’t know how it’s going to be used from one minute to the next. A phone call can interrupt a video, which can be interrupted by text.

ARM fellow Jem Davies agrees, but he also said the power budget of a phone won’t change because there is only so much heat that can be dissipated.

“What we see is a massive case for a general-purpose CPU that runs a big operating system surrounded increasingly by specialized parts—the right task in the right place,” Davies said. “The winners in this game will be people who do the best analysis of the problems. This is like the old system analysts whose job it was to understand the problem before writing code. Is it a general-purpose processor problem or a massively threaded problem or a floating-point problem or a latency or throughput problem?”

Moore’s Law always has been viewed as both an economic equation and a technology direction. From a technology standpoint, it will always be possible to add more transistors to a piece of silicon, although at a certain point the laws of physics create many more complications. And just shrinking features is no longer a guarantee of better performance or lower power because of that increased complexity.

From an economic standpoint, though, the question is whether it really pays to continue down the same path. The answer is yes, for some companies, no for others, and a combination of both for still others. At the very least, the path seems to point to more re-use, better integration of hardware and software, and more rationalized use of resources with a better understanding up front about what designs are trying to accomplish and how to get there. It also seems to point to stacking of die, an emphasis on flexibility, and better synchronization of hardware and software to save energy and improve performance.

What is clear is that no single factor will provide all the benefits the way Moore’s Law did in the past. The easy stuff is done. From here on, everything will have to be rethought and analyzed rather than just accepted as the way things are done.