Low Power-High Performance

Power Moves Up To First Place

At 28nm and beyond, the main technical hurdle to contend with is power. And no matter what tools or methodologies are thrown at it, it’s getting more difficult to manage.

April 10th, 2014 - By: Ed Sperling

Virtually every presentation delivered about semiconductor design or manufacturing these days, and every end-product specification that uses advanced technology, incorporates some reference to power and/or energy. Power has emerged as the most persistent, most problematic, and certainly the most talked-about issue from conception to marketplace adoption.

And the conversation only grows louder as line widths and feature sizes continue to shrink or new manufacturing processes are introduced. The push into 2.5D and 3D stacked die is a discussion about performance plus energy efficiency. And the move to 28nm FD-SOI and 16/14nm finFETs is all about current leakage, power density, and the energy efficiency of processing data.

“The priority used to be performance, cost and then power,” said Wally Rhines, chairman and CEO of Mentor Graphics. “Today it is power, cost, and performance, in that order. There are still people squeezing the last ounce of performance out of designs, but the vast majority of chips is more limited by power than performance.”

That has a profound effect on how chips are designed, what IP and process technology they use, and how the use cases are modeled.

“Energy is about distance when it comes to data,” said Chris Rowen, a Cadence fellow. “But different types of data have different characteristics. Choosing computing architectures to correspond to data we care about is a big lever. Dedicated logic is only more efficient with very little data, and the more you can do in software, the less dedicated logic you need.”

Put another way, a 32-way core requires less effort and energy than 32 one-way cores, provided the data can be parallelized, because a single wide core spends less energy on instruction handling and on feeding data into the processor, Rowen said. The problem is that most data still cannot be parceled out effectively across different cores or processors, so processors will continue to proliferate around an SoC.
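Rowen's argument about a wide core versus many narrow ones can be illustrated with a toy energy model. The sketch below assumes, purely for illustration, that every instruction carries a fixed overhead (fetch, decode, issue, control) and that each data element adds a smaller per-operation cost; the picojoule figures and the function names are hypothetical placeholders, not measured data or anyone's published model. The `parallel_fraction` parameter captures the "provided the data can be parallelized" caveat.

```python
from dataclasses import dataclass


@dataclass
class EnergyModel:
    # Hypothetical per-event energies in picojoules (illustrative only).
    instr_overhead_pj: float = 50.0  # fetch, decode, issue, control per instruction
    per_op_pj: float = 5.0           # arithmetic plus operand delivery per data element


def energy_narrow_cores(n_ops: int, m: EnergyModel) -> float:
    """Energy when every operation is its own instruction on a one-way core.

    Spreading the ops over 32 such cores changes runtime, not per-op energy,
    in this model (static/leakage power is ignored).
    """
    return n_ops * (m.instr_overhead_pj + m.per_op_pj)


def energy_wide_core(n_ops: int, width: int, parallel_fraction: float,
                     m: EnergyModel) -> float:
    """Energy on a single `width`-way core.

    The parallelizable share is packed `width` ops per instruction, so the
    instruction overhead is shared across lanes; the serial remainder still
    pays the full overhead for every op.
    """
    vector_ops = n_ops * parallel_fraction
    serial_ops = n_ops - vector_ops
    vector_instrs = vector_ops / width
    return (vector_instrs * m.instr_overhead_pj
            + vector_ops * m.per_op_pj
            + serial_ops * (m.instr_overhead_pj + m.per_op_pj))


if __name__ == "__main__":
    m = EnergyModel()
    n = 1_000_000
    narrow = energy_narrow_cores(n, m)
    for p in (1.0, 0.5, 0.0):
        wide = energy_wide_core(n, 32, p, m)
        print(f"parallel fraction {p:.1f}: wide/narrow energy = {wide / narrow:.2f}")
```

With these made-up numbers, a fully parallelizable workload on the 32-way core uses about 12% of the energy of 32 one-way cores, while a purely serial workload sees no benefit at all. The real crossover depends entirely on a design's actual overhead-to-operation energy ratio.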

This is business as usual at the bleeding edge of SoC design, and even in multicore processor design, where cores can be turned on and off as needed. But in all cases, energy has moved to the forefront of architectural decisions. In fact, power frequently is the first problem to analyze, followed by a discussion about how to address it in a cost-effective and timely manner. For some companies, that analysis means not moving to the next node as quickly, not acquiring certain types of IP, or not configuring a chip in a certain way because doing so would risk an increase in leakage current and dynamic power. Both of those factors produce heat, which in turn can reduce the long-term reliability of a chip, lessen battery life in the short term, and greatly accelerate electromigration, which can kill the chip outright.
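Why heat "greatly accelerates" electromigration is commonly explained with Black's equation, an empirical model in which median time to failure is MTTF = A·J^(-n)·exp(Ea/kT), so lifetime falls exponentially as temperature rises. The sketch below assumes an activation energy Ea of 0.9 eV, a value within the range often quoted for copper interconnect but not a figure from this article; real values are process-specific and come from foundry reliability data. The function name is illustrative.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant, eV/K


def em_lifetime_ratio(t_cool_c: float, t_hot_c: float,
                      activation_energy_ev: float = 0.9) -> float:
    """MTTF(t_cool) / MTTF(t_hot) from Black's equation at equal current density.

    Black's equation: MTTF = A * J**(-n) * exp(Ea / (k * T)).
    Holding J fixed, A and J**(-n) cancel, leaving only the Arrhenius term.
    The 0.9 eV default is an assumed, illustrative activation energy.
    """
    t_cool_k = t_cool_c + 273.15
    t_hot_k = t_hot_c + 273.15
    exponent = activation_energy_ev / BOLTZMANN_EV_PER_K * (1.0 / t_cool_k - 1.0 / t_hot_k)
    return math.exp(exponent)


if __name__ == "__main__":
    # Hypothetical hot spot running 20 C above an 85 C baseline.
    ratio = em_lifetime_ratio(85.0, 105.0)
    print(f"Projected EM lifetime at 105 C is about 1/{ratio:.1f} of that at 85 C")
```

Under these assumptions, a 20 C local temperature rise shortens projected EM lifetime by a factor of roughly four to five, which is why a localized hot spot can dominate reliability even when the average die temperature looks acceptable.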

Krishna Yarlagadda, president of Imagination Inc., phrased it rather succinctly: “Power and thermal will be dominant at 28nm, but it will define sub-20nm SoCs.”

This problem can be addressed using three different approaches, each with its own set of challenges, as discussed below.

FinFETs and other 3D transistors
FinFETs have, by far, received the most attention because they are a continuation of the feature-shrinking approach that has been the mainstay of Moore’s Law since 1965. FinFETs control current leakage by gating the channel from two or three sides (double-gate or tri-gate structures), rather than from the single side used in planar transistors.
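In the usual first-order picture, off-state leakage drops by one decade for every subthreshold swing S of gate voltage below threshold, and for conventional transistors S cannot fall below (kT/q)·ln 10, about 60 mV/decade at room temperature. Gating the channel from more sides improves electrostatic control and pushes S closer to that limit. The sketch below evaluates both relations; the 0.3 V threshold and the 90 and 70 mV/decade swings are assumed, illustrative values rather than data for any particular process, and the function names are hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23
ELECTRON_CHARGE_C = 1.602176634e-19


def thermal_limit_mv_per_decade(temp_k: float = 300.0) -> float:
    """Ideal subthreshold swing (kT/q) * ln(10) for a conventional MOSFET, in mV/decade."""
    return BOLTZMANN_J_PER_K * temp_k / ELECTRON_CHARGE_C * math.log(10) * 1e3


def relative_off_current(vth_v: float, swing_mv_per_dec: float) -> float:
    """Off-current relative to the current at V_gs = V_th, i.e. 10**(-V_th / S)."""
    return 10.0 ** (-vth_v * 1e3 / swing_mv_per_dec)


if __name__ == "__main__":
    print(f"thermal limit at 300 K: {thermal_limit_mv_per_decade():.1f} mV/dec")
    vth = 0.3  # hypothetical threshold voltage, volts
    planar = relative_off_current(vth, 90.0)  # assumed swing, short-channel planar device
    finfet = relative_off_current(vth, 70.0)  # assumed swing with multi-sided gate control
    print(f"off-current ratio planar/finFET: {planar / finfet:.1f}x")
```

The same exponential dependence explains why threshold-voltage choices matter so much for leakage, and why device options that could beat the 60 mV/decade limit attract attention.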

But while finFETs solve one problem, they create others, notably higher power density, greater heat, and materials that are less effective at conducting that heat away.

“With 16nm finFETs, there is about a 20% higher current drive capability, but there also is an increase in local power density and current density,” said Norman Chang, vice president and senior product strategist at Ansys-Apache. “But finFETs use a different substrate material. Bulk CMOS has better thermal conductivity than the finFET, so heat is easily trapped, and there is not an easy way to force it through the substrate and vias to metal 2 from metal 1. As a result, the thermal problem is getting worse. We’ve been working very closely with the foundries, which have more measurement data on thermal distribution than anyone else, because you cannot solve this using traditional approaches.”

While high-mobility materials from groups III and V of the periodic table will help, thermal distribution problems will still grow worse at 10nm. At 7nm, there are questions about whether finFETs will work at all.

“I don’t think the FinFET is going to cut it for 7nm (foundry node),” said Greg Yeric, senior principal engineer at ARM. “Magic contact resistance reduction, for instance, absolutely has to happen for 7nm because the transistor resistance is quickly getting swamped by the contact resistance. This is one of my favorite areas of technology development, because it is so wonderfully counter-intuitive. You put an insulating layer into your contact in order to reduce the contact resistance. I’m glad I’m not the contact reliability engineer. This is a bigger deal for the high-performance crowd because it primarily translates into power reduction for performance-challenged designs.”

That raises a big question about what comes after finFETs. One of the leading candidates is tunnel FETs, or TFETs, but there is scant data available for assessing their variability. “They have a long way to go on the drive current side,” said Yeric. “My gut feeling is that when you use band-to-band tunneling for your device operation, your variability is going to make a quantum leap upwards. I am also not a huge fan of the alternate materials universe. Once you put Ge or InGaAs or whatever into realistic 7nm dimensions, most of their fancy mobility evaporates. Quantum Well versions of these might work, but that’s two levels of complexity: the material and the structure.”

He noted that the best alternative might be gate-all-around (GAA) silicon nanowires, particularly the horizontal version, because it causes no physical design disruption and offers the option of stacking nanowires to improve performance.

“The good news on the power side is that the parasitic capacitance is already so bad that the penalty going from FinFET to GAA at 7nm will be a whole lot less than the penalty we paid going from planar to FinFET,” Yeric said. “Maybe then that will give the TFET people time to conjure up sufficient drive current for the 5nm node—or some kind of C/B/Mo/S/P 2D wonder-material.”

FD-SOI and LP processes
For companies that don’t need to migrate down to finFETs, there is plenty of investment at 28nm these days. In fact, there is even investment under way at more mainstream nodes such as 40nm and 55nm, where some of the same low-power techniques and processes are being applied.

The question now is how long 28nm will actually suffice, and opinions on that are mixed. The answer depends partly on factors normally well outside the design engineering world. One is the availability of extreme ultraviolet or multi-beam lithography, which would eliminate the need for multi-patterning all the way down to 10nm. Another is better information on high-mobility materials that can offset some of the quantum effects that make the movement of electrons slower and sometimes unpredictable.

Those quantum effects are particularly important when it comes to memory, which is now scattered around SoCs to improve the performance of multiple processors or processor cores. Design has always been about solving bottlenecks, and the so-called memory wall beyond DDR4 has ramifications for both power and performance.

“It looks as if 28nm is going to have a long lifespan,” said Frank Ferro, senior director of product management at Rambus. “But we’re also seeing a lot of attention being paid on the memory side to the Hybrid Memory Cube and high-bandwidth memory to solve the memory bandwidth problem. We still need to figure out how to solve this, and the answer—at least in the short term—appears to be serial SerDes because it can use everything from HP to LP processes.”

There are a number of things that can be done at all nodes to improve energy efficiency, as well. Mark Baker, director of product marketing at Atrenta, points to seven such areas even before jumping into more exotic approaches—and most of them can be done using existing technology nodes.

“VT optimization is generally solved, and the FD-SOI process will further address leakage,” said Baker. “But there also is DVFS, which is commonly used in core processors but still not mainstream across all design applications; clock gating, which is a significant focus today for RTL-based power optimization; shutoff and sleep, which is commonly used, but there is further ongoing investigation into light sleep and other sleep modes; power-aware verification, which can impact domain verification across multiple power states; emulation and software optimization, which provides a tighter integration for RTL analysis and signoff tools; and design partitioning for power and thermal, which is an underutilized area today.”
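Two of the techniques Baker lists, DVFS and clock gating, act on different terms of the standard first-order CMOS dynamic-power expression P = α·C·V²·f (activity factor, switched capacitance, supply voltage, clock frequency). The sketch below applies that textbook formula with made-up capacitance, voltage, and frequency values to show why scaling voltage along with frequency pays off more than cutting frequency alone, and how clock gating lowers the effective activity factor. It deliberately ignores leakage, and the names and operating points are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    voltage_v: float
    freq_hz: float


def dynamic_power_w(activity: float, cap_f: float, op: OperatingPoint) -> float:
    """First-order CMOS switching power: alpha * C * V^2 * f (leakage ignored)."""
    return activity * cap_f * op.voltage_v ** 2 * op.freq_hz


def gated_activity(activity: float, gated_fraction: float) -> float:
    """Effective activity when clock gating idles `gated_fraction` of the switched logic."""
    return activity * (1.0 - gated_fraction)


if __name__ == "__main__":
    # Hypothetical block: 1 nF of switched capacitance, 15% average activity.
    cap = 1e-9
    alpha = 0.15
    nominal = OperatingPoint(voltage_v=1.0, freq_hz=2.0e9)
    freq_only = OperatingPoint(voltage_v=1.0, freq_hz=1.0e9)  # halve f, keep V
    dvfs = OperatingPoint(voltage_v=0.8, freq_hz=1.0e9)       # halve f, assume V can drop to 0.8 V

    print(f"nominal:        {dynamic_power_w(alpha, cap, nominal):.3f} W")
    print(f"frequency only: {dynamic_power_w(alpha, cap, freq_only):.3f} W")
    print(f"DVFS:           {dynamic_power_w(alpha, cap, dvfs):.3f} W")
    # Clock gating 40% of the logic at the nominal operating point.
    print(f"clock-gated:    {dynamic_power_w(gated_activity(alpha, 0.4), cap, nominal):.3f} W")
```

Halving frequency alone halves dynamic power but leaves the energy per operation unchanged; lowering the voltage as well cuts both, because energy per switching event scales with V². Whether a block can actually run at the lower voltage is set by the process and by timing, which is exactly what the sketch assumes away.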

ARM’s Yeric said FD-SOI is very useful for low power, especially in light of variability, Vmin, and SRAM. That, in turn, could allow chipmakers to wait until some of the bugs are worked out of finFETs at 10nm, and ideally until new lithography options are available and commercially viable, so that designers can avoid multipatterning.

As Synopsys chairman and co-CEO Aart de Geus put it, “finFET is absolutely unstoppable at this point.” What isn’t clear, though, is at which node and by which companies it will be adopted. It all depends on when and where they want to tackle some of the big technical problems. “The big issues are performance, power, area and yield. Of those, power is the most restrictive.”

Stacked die
Inevitably, every discussion about power boils down to the architecture, and every architectural discussion eventually includes stacked die. Bigger pipes and shorter wire distances, either with through-silicon vias in 3D-ICs or interposers in 2.5D, make this a huge knob to turn.
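The "shorter wire distances" argument rests on the fact that the energy to move a bit over a full-swing wire grows with the wire's capacitance, and therefore roughly with its length. The sketch below uses the first-order relation E ≈ α·C·V² per bit, with α the probability of a 0-to-1 transition (0.25 for random data). All lengths, capacitance-per-millimeter values, and voltages are hypothetical placeholders, not data for any real package, interposer, or TSV, and driver, receiver, and termination energy are ignored.

```python
def wire_energy_per_bit_j(length_mm: float, cap_per_mm_f: float, swing_v: float,
                          toggle_prob: float = 0.25) -> float:
    """Average supply energy per transmitted bit on a full-swing wire.

    Uses E = alpha * C * V^2, where alpha is the probability of a 0->1
    transition (0.25 for random data) and C = length * capacitance per mm.
    Driver, receiver and termination energy are ignored.
    """
    cap_f = length_mm * cap_per_mm_f
    return toggle_prob * cap_f * swing_v ** 2


if __name__ == "__main__":
    # Hypothetical (length in mm, capacitance in F/mm, swing in V) per link type.
    links = {
        "package trace, 20 mm": (20.0, 0.15e-12, 1.0),
        "interposer wire, 5 mm": (5.0, 0.2e-12, 1.0),
        "TSV hop, 0.05 mm": (0.05, 1.0e-12, 1.0),
    }
    for name, (length, cap_per_mm, swing) in links.items():
        energy = wire_energy_per_bit_j(length, cap_per_mm, swing)
        print(f"{name:22s} {energy * 1e15:8.2f} fJ/bit")
```

Even with a deliberately higher capacitance per millimeter assumed for the TSV, its short length dominates the result, which is the intuition behind stacking as a power lever.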

“TSV manufacturing is not that straightforward, but it will certainly have to happen because people will stop paying $10 million for ASICs,” said Taher Madraswala, COO of Open-Silicon. “There is a huge incentive for fabs to invest in TSVs. The ability to use off-the-shelf parts to create something new and still add value on the die is huge.”

There are some problems to solve before this approach really takes off, though.

“With 3D-IC and finFETs, there are two issues that have to be dealt with,” said Ansys-Apache’s Chang. “One is the thermal runaway issue caused by power density. You have a new thermal profile, and leakage power is dependent on temperature. If you increase the temperature, you increase the leakage. The second problem is thermally induced electromigration. If you increase the temperature, you decrease the electromigration limit, so you have to find the hot spot on the chip and fix it for EM violations. But you also have to know where that hot spot is to put the thermal sensor.”
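Chang's thermal-runaway point is a feedback loop: leakage raises temperature, and temperature raises leakage. A minimal way to see it is to iterate a lumped model, T = T_amb + R_th·(P_dyn + P_leak(T)), with leakage assumed to double every 20 C. The thermal resistances, ambient temperature, power figures, doubling interval, and the 150 C cutoff used to flag runaway are all hypothetical; real leakage-versus-temperature behavior comes from characterized cell libraries and foundry thermal data.

```python
def leakage_w(temp_c: float, leak_at_ref_w: float, ref_c: float = 25.0,
              doubling_c: float = 20.0) -> float:
    """Leakage modeled as doubling every `doubling_c` degrees (assumed, illustrative)."""
    return leak_at_ref_w * 2.0 ** ((temp_c - ref_c) / doubling_c)


def settle_temperature(p_dyn_w: float, leak_at_ref_w: float, r_th_c_per_w: float,
                       t_amb_c: float = 45.0, t_limit_c: float = 150.0,
                       max_iter: int = 1000, tol_c: float = 1e-6):
    """Fixed-point iteration of T = T_amb + R_th * (P_dyn + P_leak(T)).

    Returns the converged temperature in C, or None if the temperature passes
    `t_limit_c` (treated here as thermal runaway) or does not converge.
    """
    temp = t_amb_c
    for _ in range(max_iter):
        new_temp = t_amb_c + r_th_c_per_w * (p_dyn_w + leakage_w(temp, leak_at_ref_w))
        if new_temp > t_limit_c:
            return None
        if abs(new_temp - temp) < tol_c:
            return new_temp
        temp = new_temp
    return None


if __name__ == "__main__":
    # Hypothetical block: 2 W dynamic power, 0.5 W leakage at 25 C, 45 C ambient.
    for r_th in (5.0, 10.0):
        t = settle_temperature(p_dyn_w=2.0, leak_at_ref_w=0.5, r_th_c_per_w=r_th)
        status = f"settles at {t:.1f} C" if t is not None else "thermal runaway"
        print(f"R_th = {r_th:.0f} C/W: {status}")
```

In the stable case the loop settles about 10 C above where dynamic power alone would put it. Doubling the thermal resistance, a stand-in for heat being trapped in a die stack, removes the stable operating point entirely and the temperature runs away.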

As ARM’s Yeric put it, “2.5D/3D definitely appears to be a key way forward on the power problem. Paul Franzon [professor of electrical and computer engineering at North Carolina State University] had a nice tutorial and paper at IEDM studying the memory power of 2D versus 3D, and it looked fairly positive. Like many things, though, the more you think about it the harder it gets. Should memory be split across cores? And depending on the ‘right’ answer to that, how do you do ECC? Is hybrid memory cube itself a sufficient first step? I am not sure about ‘reasonable time and cost’ working out for mainstream 3D design at 10nm, but surely it will happen by 7nm, because really there aren’t a lot of other options.”

Ed Sperling

Ed Sperling is the editor in chief of Semiconductor Engineering.

2 comments

Richard Trauben says:

April 10, 2014 at 12:44 pm

A significant chunk of chip power is in the embedded memories. Even after specifying a foundry process (28nm), a memory type (SP = 1RW), depth (8K), width (144 bits), and read/write/idle activity rates (40%/40%/20%), most compilers generate a power characterization in their .liberty output that isn't a single number. That makes understanding and estimating how array instances integrate into full-chip array power dependent on running additional tools.

Richard Trauben says:

April 10, 2014 at 12:53 pm

Moving ahead, contact resistance is a more difficult challenge than the basic transistor. At 10nm, metallurgy and material science will need to pull a rabbit out of the hat to reduce RC to acceptable levels. The lifetime of that rabbit (the number of process nodes before it becomes a limiter again in a following generation, and the fab equipment investment needed to make it happen in the current generation) will get a lot of scrutiny in the next 18 months.
