Established Nodes Getting New Attention

Work is under way to improve energy efficiency and boost performance without relying on multi-patterning or finFETs.


As the cost of shrinking features climbs below 28nm, there has been a corresponding push to create new designs at established nodes, using everything from near-threshold computing to back biasing and semi-accurate analog sensors.

The goals of power, performance and cost haven’t changed, but there is a growing realization among many chipmakers that the formula can be improved upon without the need to venture into multi-patterning and high-mobility III-V materials, not to mention increasing process variability and shrinking yields. There are tradeoffs no matter which path is taken. Still, the ability to maintain or reduce the cost of designs—while still eking out performance and power benefits using mature process technology—is gaining traction as a future direction that could persist throughout the decade and even beyond.

“From 1994 to 2004, CPUs were doubling in speed every two years,” said Barry Pangrle, senior power methodology engineer at Nvidia. “It was easier to jump to the next node because basically it was free. As a result, people have not spent as much time squeezing more out of each node.”

And there is plenty left to leverage. For one thing, 28nm is still a new node for most companies. So far, only Intel is in commercial production with finFETs. Production nodes for many companies are still at 40/45nm and 55/65nm, with some reaching back to 90nm and 130nm. Moving to the next node, or eking more out of the current one, will require the same advanced tools that have been developed at the leading edge, but the processes are mature, yields are much higher, and costs are significantly lower.

Companies such as Qualcomm, Intel, Xilinx, Altera and IBM will still push forward to the most advanced nodes, because mobile form factors demand more density and raw performance. But the vast majority of companies aren’t rushing forward. In fact, many may not migrate to smaller geometries until EUV is commercially available and proven, and until the cost comes down enough to allow comparable designs using single patterning. And given that EUV already has missed four or five process nodes, depending on who’s doing the math, there is a chance that may never happen at all.

But even some of the larger companies are beginning to pull back on the reins for shrinking die. “More advanced customers have been pushing new node, new node, but we’re starting to see them doing the same design again, especially at 28nm,” said Mark Milligan, vice president of marketing at Calypto. “There are multiple product revisions and new products coming out at that node. We’ve got a few people looking at finFETs, but there is a lot more discussion at 28nm about which way to go next.”

The good news is there are plenty of options for staying at established nodes without having to learn multi-patterning and finFETs. The not-so-good news is that some of those are difficult to master, too.

Near-threshold computing
Near-threshold computing is hardly a new concept. It has been discussed publicly for at least the past half-decade, and two years ago Intel introduced its Claremont processor, which employed near-threshold voltage techniques. The technique works like this: Rather than running a CPU or GPU at its full nominal supply voltage, it is operated just above the threshold voltage of the transistors—generally in the 300 millivolt to 400 millivolt range for many processors. There are no fixed numbers here—it all depends upon a chip’s requirements. The higher the voltage, the greater the performance and the lower the energy savings.
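To make that tradeoff concrete, here is a minimal Python sketch using a textbook alpha-power delay model: energy per operation falls roughly with the square of the supply voltage, while gate delay balloons as the supply approaches the threshold. The threshold voltage, exponent, and operating points below are illustrative assumptions, not figures from any particular process or vendor.

```python
# Illustrative near-threshold tradeoff: dynamic energy per operation
# scales roughly with Vdd^2, while delay grows sharply as Vdd nears the
# transistor threshold (alpha-power law). All constants are assumed.

VTH = 0.35      # assumed threshold voltage (V)
ALPHA = 1.3     # assumed velocity-saturation exponent

def energy_per_op(vdd):
    """Dynamic switching energy, proportional to Vdd^2 (normalized)."""
    return vdd ** 2

def relative_delay(vdd):
    """Gate delay ~ Vdd / (Vdd - Vth)^alpha, normalized to Vdd = 1.0 V."""
    nominal = 1.0 / (1.0 - VTH) ** ALPHA
    return (vdd / (vdd - VTH) ** ALPHA) / nominal

for vdd in (1.0, 0.8, 0.6, 0.45):
    e = energy_per_op(vdd) / energy_per_op(1.0)
    d = relative_delay(vdd)
    print(f"Vdd={vdd:.2f} V  energy/op={e:.2f}x  delay={d:.1f}x")
```

With these made-up constants, running at 0.45 V costs roughly 5X in delay but spends only about a fifth of the energy per operation, which lines up with the 4X to 6X range quoted below.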

ARM Fellow Rob Aitken said the energy savings from running the exact same processors this way are significant. Moreover, the approach is most effective at established process geometries, where there are no finFETs. But it isn’t a simple process.

“We can get 4X to 6X improvements in energy,” said Aitken. “That doesn’t come for free, because we have to make some design changes to allow the design to operate down at this low energy point. But if we do this wrong, the overhead we have to add to get these savings is more than you get in terms of a benefit.”

An almost polar opposite approach, which has been widely employed because it is simpler, is to cram as much computing as possible into a short burst when a processor is fully on—or multiple cores are working in sync—and then power it down quickly. That approach maximizes performance, but energy savings typically are based upon the processor remaining most of the time in some version of sleep mode and therefore are nowhere near as significant as the NTC approach.
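The same back-of-the-envelope arithmetic shows why the two approaches suit different workloads. The sketch below compares average power for a hypothetical core that either races through its work at full voltage and then sleeps, or stretches the work across the whole interval near threshold. The power numbers, energy ratio, and 5X slowdown are all assumed for illustration.

```python
# Compare "race to sleep" against near-threshold operation for a fixed
# amount of work inside a fixed time window. All numbers are illustrative.

ACTIVE_POWER = 1.0   # full-speed active power (normalized)
SLEEP_POWER = 0.05   # residual sleep-mode power (normalized)

def race_to_sleep(busy_fraction):
    """Run flat-out for part of the window, then sleep for the rest."""
    return ACTIVE_POWER * busy_fraction + SLEEP_POWER * (1.0 - busy_fraction)

def near_threshold(busy_fraction, energy_ratio=0.2, slowdown=5.0):
    """Do the same work ~5x slower near threshold at ~0.2x energy per op
    (the 4X-6X range quoted above). Returns None if the stretched runtime
    no longer fits in the window, i.e. performance is critical."""
    if busy_fraction * slowdown > 1.0:
        return None
    return ACTIVE_POWER * energy_ratio * busy_fraction

for busy in (0.05, 0.15, 0.30):
    ntc = near_threshold(busy)
    ntc_str = f"{ntc:.3f}" if ntc is not None else "misses deadline"
    print(f"busy {busy:.0%}: race-to-sleep {race_to_sleep(busy):.3f}, "
          f"NTC {ntc_str}")
```

At low utilization the near-threshold point wins handily; once the workload no longer fits in the window when slowed down, only the burst-and-sleep approach meets the deadline, which is the point Murphy makes next.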

“Where NTC works is in an area where performance is non-critical,” said Bernard Murphy, chief technology officer at Atrenta. “But that’s only one of the reasons why it hasn’t caught on. The second is that the foundries don’t characterize well in that region. They only characterize in the linear region. As a result, who knows how many corners there are?”

Murphy noted, however, that NTC might be very useful for Internet of Things applications, where performance is less of a concern than having to plug in a device to recharge it. But there is another upside, as well. “The problems we’re dealing with today are in the linear region, and that’s where we’re seeing on-chip variability.” The downside, he said, is that it will take less charge to cause single-event upsets at low voltages, increasing susceptibility to a greater part of the energy spectrum of neutrons.

Body biasing, FD-SOI, and other options
Body biasing is another technique that is being recycled. It entered the picture at 90nm, when design teams found they could reduce leakage by as much as 30%. Even at 45/40nm, savings were in the 20% to 25% range. But at 28nm, the benefit for bulk CMOS dropped to as low as 2%, said Mary Ann White, director of product marketing for the Galaxy Design Platform at Synopsys.

“When you use biasing, you are using extra tracks and extra resources, which may include one or two extra rails,” said White. “But the benefit was falling off at 28nm using planar CMOS. Interest is picking up again with FD-SOI. What’s interesting about this is we used to insert biasing at the back end, where you require a bulk n-well or p-well pin. But recently we’ve had a request to add that capability into UPF. We rolled that out with skepticism on our part, but we’ve been getting a lot of interest. If you’re adding 5% area for better power, it’s worth it.”
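For a feel of why reverse body bias cuts leakage, the following sketch applies the standard body-effect equation: added source-to-body bias raises the threshold voltage, and subthreshold leakage falls roughly a decade for every subthreshold swing of added threshold. The coefficients are generic textbook-style assumptions, not foundry data, and FD-SOI in practice permits a much wider bias range than bulk.

```python
import math

# Illustrative body-effect model: reverse body bias raises Vth, and
# subthreshold leakage drops ~10x per subthreshold swing (~90 mV) of
# added Vth. All constants are assumed, not from any foundry PDK.

GAMMA = 0.4      # body-effect coefficient (V^0.5), assumed
PHI_F2 = 0.7     # 2 * Fermi potential (V), assumed
SWING = 0.090    # subthreshold swing (V/decade), assumed

def vth_shift(v_sb):
    """Threshold shift from a source-to-body reverse bias v_sb (volts)."""
    return GAMMA * (math.sqrt(PHI_F2 + v_sb) - math.sqrt(PHI_F2))

def leakage_ratio(v_sb):
    """Subthreshold leakage relative to zero body bias."""
    return 10 ** (-vth_shift(v_sb) / SWING)

for v_sb in (0.0, 0.3, 0.6, 0.9):
    print(f"reverse bias {v_sb:.1f} V: dVth={vth_shift(v_sb)*1000:.0f} mV, "
          f"leakage {leakage_ratio(v_sb):.2f}x")
```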

STMicroelectronics experimented with finFETs, stacked die and FD-SOI before opting to throw its weight behind FD-SOI with body biasing, starting two years ago. Since then, FD-SOI has been under evaluation by a number of companies. It was given a big boost in May when Samsung joined forces with Soitec and ST on the technology.

Even existing bulk processes are getting a boost, though. TSMC introduced its 28HPC process earlier this year, which the foundry says delivers benefits comparable to FD-SOI, along with a 10% reduction in die area.

And there are different techniques within established processes to increase efficiency. “We’re seeing a lot of customers trying dual flops or quad flops sharing a clock to reduce power, which is something you do at synthesis and place and route because you’re replacing a group of flops all over the chip,” said Krishna Balachandran, product marketing director at Cadence. “Another option is to reduce the power of the clock tree with lower Vdd, which is one of the big sources of power on a chip.”
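A rough model shows why the clock tree is such an attractive target. The sketch below estimates clock-pin switching power with the usual C·V²·f relation, then applies an assumed capacitance reduction from banking flops into multi-bit cells and a lower clock-tree supply, along the lines Balachandran describes. Every constant here is hypothetical.

```python
# Rough clock-network power estimate, and the effect of banking flops
# into multi-bit cells that share a clock driver. Constants are made up.

F_CLK = 1.0e9           # clock frequency (Hz)
VDD = 0.9               # nominal supply (V)
N_FLOPS = 200_000       # flip-flops on the chip
CLK_PIN_CAP = 1.0e-15   # clock pin cap per single-bit flop (F), assumed
SHARED_FACTOR = 0.6     # assumed clock-cap reduction from 4-bit banking

def clock_power(total_cap, vdd=VDD, f=F_CLK):
    """Dynamic power of the always-toggling clock load: ~C * Vdd^2 * f."""
    return total_cap * vdd ** 2 * f

single = clock_power(N_FLOPS * CLK_PIN_CAP)
banked = clock_power(N_FLOPS * CLK_PIN_CAP * SHARED_FACTOR)
low_vdd = clock_power(N_FLOPS * CLK_PIN_CAP * SHARED_FACTOR, vdd=0.7)

print(f"single-bit flops:          {single * 1e3:.1f} mW")
print(f"4-bit banked:              {banked * 1e3:.1f} mW")
print(f"banked + 0.7 V clock tree: {low_vdd * 1e3:.1f} mW")
```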

The bigger challenge, though, is adding enough granularity to match resources to what a design actually needs. For example, it is possible to decrease the supply voltage for a non-critical path while keeping the voltage the same for the critical path. “The problem is that a lot of this is not completely automated,” Balachandran said. “There is some automation, some flow restructuring required, and we need specialized IP in libraries.”

Architectural considerations
Perhaps one of the biggest sources of confusion is the addition of more low-power cores in processors, particularly at the most advanced nodes. While the cores are more efficient, adding more of them increases the dynamic power density.

“If you’re using finFETs, the supply voltage scales down and there is lower leakage, but with six or eight cores the overall power does not go down,” said Aveek Sarkar, vice president of engineering and product support at ANSYS/Apache. “Moving to finFETs, you have power gating and RTL analysis, and you’re looking at the netlist to possibly squeeze out even more power. That’s a given. But what we’re seeing is that even companies not looking to move to finFETs are looking at all of these things now—40nm is still a workhorse technology and you do need power savings. We had a visit from a large auto parts company and they said there are two things they care about: 1) Cost, because 50% of the cost of a car within two years will be electronics, and 2) Power, particularly how the power is consumed and radiated. The less power, the less EMI it generates, which is important when you already have miles of cabling.”

Sarkar noted that one customer in Brazil designed an RFID chip that consumes 20 microwatts, but it didn’t win the contract because the cutoff was 10 microwatts. Power has not historically been a deciding factor at that kind of node, he said, but it is becoming a competitive issue even at older process geometries.

A second consideration is whether to build a particular function in hardware or software.

“Migrating a function from software to hardware can have a dramatic effect,” said Russ Klein, director of engineering at Mentor Graphics. “If your software is a power hog, moving it into hardware can have a huge impact on reducing power consumption. In some cases, it also may allow you to reduce the size of the processor needed to run it, which further saves power.”

Klein stressed that this kind of decision needs to be made very early in the design process, while it can still affect the overall power budget. “You can’t do meaningful tuning of the design. You really have to start baking it in, and the best way to deal with that is to bring software to the mix and model it.”
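As a minimal example of the early modeling Klein describes, the sketch below compares the power of a function executed in software against an assumed fixed-function hardware block. The call rate, instruction count, per-instruction energy, 20X efficiency gain, and leakage figure are all invented for illustration.

```python
# Toy energy comparison for one function implemented in software on a
# CPU versus a dedicated hardware block. Every number is an assumption;
# a real decision needs early system-level modeling, as Klein notes.

CALLS_PER_SEC = 10_000

# Software: instructions executed per call x CPU energy per instruction.
SW_INSTRUCTIONS = 50_000
CPU_ENERGY_PER_INSTR = 100e-12   # joules per instruction, assumed

# Hardware: assumed 20x fewer joules per call, plus always-on leakage
# for the extra gates the block adds to the die.
HW_ENERGY_PER_CALL = SW_INSTRUCTIONS * CPU_ENERGY_PER_INSTR / 20
HW_LEAKAGE = 0.5e-3              # watts, assumed

sw_power = CALLS_PER_SEC * SW_INSTRUCTIONS * CPU_ENERGY_PER_INSTR
hw_power = CALLS_PER_SEC * HW_ENERGY_PER_CALL + HW_LEAKAGE

print(f"software: {sw_power * 1e3:.1f} mW, hardware: {hw_power * 1e3:.1f} mW")
```

Under these assumptions the hardware version also frees up CPU cycles, which is what can allow a smaller processor, compounding the savings.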

There are other ways to approach this problem, as well. One involves rethinking how memory is accessed, what kind of memory is used in the first place, and how to make better use of that memory.

“Moore’s Law allowed a lot of software and hardware engineers to be lazy, because you knew that memory density would increase and bandwidth would increase and you could get by with general-purpose capacity,” said Craig Hampel, co-chief scientist at Rambus. “What’s changed now is that there is more awareness about the structures for storing data and how you access it.”

One of the big opportunities in this area is more application-aware memory. “The vector for scaling memory is reliability,” Hampel said. “With DRAM, you maintain redundancy but you don’t typically use that outside of manufacturing. There is no notion of adaptive redundancy in DRAM like there is in flash, where you have dynamic redundancy. But because we page in a lot of data and really don’t use a lot of it, there is an opportunity to adaptively reorganize that data to be much more efficient.”
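A crude model illustrates the opportunity Hampel points to: DRAM activates an entire row even when the processor touches only a sliver of it, so packing hot data into fewer rows saves activation energy. The row size, activation energy, and utilization figures below are generic assumptions, not Rambus numbers.

```python
# Back-of-the-envelope DRAM paging cost: a whole row is activated even
# if only a fraction of it is useful. Packing hot data into fewer rows
# cuts activation energy. All figures are rough, generic assumptions.

ROW_BYTES = 8192        # bytes fetched per row activation, assumed
ACT_ENERGY = 20e-9      # joules per activate/precharge cycle, assumed

def activation_energy(bytes_needed, useful_fraction):
    """Energy to activate enough rows to deliver bytes_needed useful bytes."""
    rows = bytes_needed / (ROW_BYTES * useful_fraction)
    return rows * ACT_ENERGY

scattered = activation_energy(1_000_000, 0.10)   # 10% of each row used
reorganized = activation_energy(1_000_000, 0.90) # hot data packed together

print(f"scattered layout:   {scattered * 1e6:.0f} uJ")
print(f"reorganized layout: {reorganized * 1e6:.0f} uJ")
```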

Other options
There are many more options, too. Semi-accurate computing has gained attention lately as a way to save power and cost. Rather than making sure every result is 100% accurate, there is a sliding scale for what is considered good enough for a particular application. The less accurate, the less expensive.
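As a toy example of that sliding scale, the sketch below quantizes a signal to progressively narrower datapaths, a crude stand-in for cheaper, less accurate hardware, and reports the error each step buys. The signal and the bit widths are arbitrary choices for illustration.

```python
# Sketch of the accuracy-vs-cost dial in semi-accurate computing:
# quantize a signal to fewer bits (a proxy for a cheaper datapath or
# ADC) and measure the resulting error. Illustrative only.

import math

signal = [math.sin(2 * math.pi * i / 64) for i in range(256)]

def quantize(x, bits):
    """Round x in [-1, 1] to a uniform grid with 2**bits levels."""
    levels = 2 ** (bits - 1)
    return round(x * levels) / levels

for bits in (12, 8, 6, 4):
    err = max(abs(s - quantize(s, bits)) for s in signal)
    # Treat datapath width as a crude stand-in for area/energy cost.
    print(f"{bits:2d}-bit datapath: worst-case error {err:.4f}")
```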

Jan Rabaey, professor at UC Berkeley, talked about this approach in a 2012 video interview: http://semiengineering.com/tech-talk-faster-accurate/

“Some chips only have to do a few things well,” said Nvidia’s Pangrle. “Every time you want to make it do something else, you have to give up functionality or performance or power.”

Finally, much has been written about stacked die—both 2.5D and 3D-ICs, which are essentially hybrid approaches that can combine both old and new, and anything in between. But as more work gets under way to improve performance and reduce power at older nodes, distinguishing what’s old from what’s new may become a lot harder.


