Seven Ways To Improve PPA Before Moving To FinFETs

Most chipmakers are rethinking how to boost performance while reducing power and area.

Henry Ford wrote in his autobiography, “Any customer can have a car painted any color that he wants so long as it is black.” And for decades, the semiconductor industry has marched to a similar theme set by Moore’s Law. But with the transition to finFETs proving harder than it first appeared, questions are beginning to pop up that are fueling a new level of confusion.

While the growing list of problems at 16/14nm can be solved with advanced tools and better understanding of what needs to change, it’s the sheer volume of problems and the tweaks in existing approaches that are prompting chipmakers to rethink their next steps. For example, well-tested multi-Vt approaches work well at 28nm, but they don’t work with finFETs. And while metallization is manageable but difficult at 16/14nm, it’s looking much more difficult at 10nm—particularly in conjunction with multi-patterning of more layers.

What’s surprising, though, is just how many knobs are still left to turn even at older process nodes to improve power and performance. And as more work goes into mature process technology nodes, there may be even more improvements possible. Here’s a list of seven of the most prominent approaches to surface.

1. Better place and route. It seems obvious that place and route has a significant role in improving performance and lowering power. Longer wires, leakage and the power needed to drive signals are all affected by placement, but the primary focus of most P&R teams is ensuring signal integrity and fitting a growing number of functional blocks on a chip.

“There is still plenty of room for improvement, even with known techniques,” said Koorosh Nazifi, engineering group director for the analog and mixed signal initiative at Cadence. “The placement of registers has implications from a clock standpoint that could adversely affect power. If they’re in the same region, you can get some power savings. So even known techniques can be applied to register alignment, biasing and the clock network.”

Clock gating, in particular, needs to be physically aware, Nazifi said.
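To make the register-placement point concrete, here is a rough sketch based on the standard switching-power relation P = alpha * C * Vdd^2 * f. The two capacitance figures, the supply voltage and the clock frequency are illustrative assumptions, not numbers from Cadence or from any real design. The sketch only shows that if grouping registers shortens the clock wiring, clock power falls in proportion to the capacitance removed.

```python
def dynamic_power(alpha, cap_farads, vdd, freq_hz):
    """Switching power in watts: P = alpha * C * Vdd^2 * f."""
    return alpha * cap_farads * vdd ** 2 * freq_hz


# Hypothetical clock-net capacitance for two placements of the same registers.
SPREAD_CAP = 40e-12     # registers scattered, long clock wires (assumed value)
CLUSTERED_CAP = 25e-12  # registers grouped in one region (assumed value)


def clock_power_saving(vdd=0.9, freq_hz=1e9):
    """Return (spread watts, clustered watts, fractional saving)."""
    # A clock net charges its load once per cycle, so alpha is taken as 1.0.
    spread = dynamic_power(1.0, SPREAD_CAP, vdd, freq_hz)
    clustered = dynamic_power(1.0, CLUSTERED_CAP, vdd, freq_hz)
    return spread, clustered, 1 - clustered / spread


if __name__ == "__main__":
    spread, clustered, saving = clock_power_saving()
    print(f"spread: {spread * 1e3:.1f} mW, clustered: {clustered * 1e3:.2f} mW")
    print(f"saving: {saving:.1%}")
```

Because voltage and frequency are the same in both cases, the saving depends only on the ratio of the two capacitances.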

2. Optimize power and area together. The usual route for optimization is to focus on timing, then power and area, or leakage and area. Dealing with multiple factors together is more of a system-level power approach, but it can offer huge improvements.

“Optimizing power and area is much more critical in complex designs, and cannot be done one at a time,” said Arvind Narayanan, Olympus SoC marketing manager at Mentor Graphics. “The alternative is finFETs, and there are a lot of restrictions there about what you can and cannot do.”

There are a couple of techniques that are gaining attention in this area. The first is sequential clock gating. Established solutions only consider the combinatorial logic between clock stages, but multi-clock analysis can shut down larger portions of the design. Adoption has been slowed by the lack of verification tools needed to automate this tradeoff, and automation is what many design teams want.
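The idea behind sequential clock gating can be sketched with a toy two-stage pipeline. This is a simplified behavioral model written for illustration, not the output or algorithm of any commercial tool. The function name, the "+ 1" logic between stages and the reset assumption are all hypothetical. Stage B only needs a clock pulse in the cycle after stage A loaded new data, because otherwise it would recapture the value it already holds.

```python
def simulate(inputs, enable_a, sequential_gating):
    """Simulate a two-register pipeline and count clock pulses sent to stage B.

    Stage A loads `data` when its enable is high. Stage B captures A + 1.
    With sequential gating, B is clocked only if A changed in the previous cycle.
    """
    a = b = 0
    b_clocks = 0
    a_changed = True  # assumption: treat A as "changed" coming out of reset
    trace = []
    for data, en in zip(inputs, enable_a):
        clock_b = (not sequential_gating) or a_changed
        # Both registers update on the same edge, so B sees the old value of A.
        next_b = a + 1 if clock_b else b
        next_a = data if en else a
        b_clocks += int(clock_b)
        a_changed = en
        a, b = next_a, next_b
        trace.append(b)
    return trace, b_clocks


if __name__ == "__main__":
    inputs = [5, 6, 7, 8, 9, 10, 11, 12]
    enable_a = [True, False, False, True, True, False, False, False]
    plain_trace, plain_clocks = simulate(inputs, enable_a, False)
    gated_trace, gated_clocks = simulate(inputs, enable_a, True)
    print("outputs match:", plain_trace == gated_trace)
    print("stage B clock pulses:", plain_clocks, "vs", gated_clocks)
```

The check that both runs produce the same output trace is a small stand-in for the verification problem described above: the gated design is only acceptable if it can be shown to behave identically to the ungated one.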

For those who want to explore alternatives, high-level synthesis could add even more savings by making architectural and micro-architectural modifications, optimizing register and memory accesses, and even ensuring the best algorithms have been selected. These tools require manual intervention, and the analysis requires a considerable methodology change, starting with an untimed description written in SystemC rather than the RTL languages that designers are more familiar with.

3. Choose IP carefully. Slashing the power budget of IP in SoCs is possible, and collectively it can be a significant amount of power given the amount of third-party IP in a design. Witness the most recent non-volatile memory IP specs from Synopsys for ultra-low power. The IP reduces power by up to 90%. The main tradeoff here was performance, but power, endurance and sensitivity were all factors in the design, according to Angela Raucher, NVM product line manager at Synopsys.

“There are some system-level tradeoffs that need to be made with this,” Raucher said. “One is low peak power. If you do a read, can it be a single-bit read? And how often do you have to read data? This isn’t going to replace flash in an MCU, but it will save power in a small area. The more you understand the application, the more you can make these kinds of system-level power decisions.”

4. Forward/reverse biasing and voltage reduction. While this technique has been around for some time—articles began appearing as early as 1996—it can be used to lower leakage, and particularly to lower leakage after a device or block is turned off. The key here is controlling the flow of electrons across the P-N junction.

“The usage of these kinds of techniques varies significantly from one design to the next,” said Cadence’s Nazifi. “The problem is that leakage is a function of the process node and the cells being used. Even in the off state you could still have leakage. One way to deal with that is to minimize the threshold voltage.”
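For a rough sense of why biasing matters, the textbook subthreshold model says leakage changes by a factor of ten for every subthreshold swing's worth of threshold-voltage shift. Reverse body bias raises the effective threshold and cuts leakage, while forward bias lowers the threshold and trades leakage for speed. The 85 mV/decade swing and the 85 mV shifts below are assumed round numbers chosen for illustration, not data for any particular process.

```python
def relative_leakage(vt_shift_volts, swing_volts_per_decade=0.085):
    """Subthreshold leakage relative to the unbiased device.

    Uses the first-order model I_leak ~ 10 ** (-Vt / S), so a shift of
    dVt scales leakage by 10 ** (-dVt / S). The default swing S is assumed.
    """
    return 10 ** (-vt_shift_volts / swing_volts_per_decade)


if __name__ == "__main__":
    # Reverse body bias: effective Vt goes up (positive shift), leakage drops.
    print("reverse bias, +85 mV:", relative_leakage(0.085))
    # Forward body bias: effective Vt goes down (negative shift), leakage rises.
    print("forward bias, -85 mV:", relative_leakage(-0.085))
```

The model ignores junction and gate leakage, which is one reason real results vary from one design and process to the next.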

5. Hardware/software co-design. The perfect marriage of hardware and software development has been promised for decades, and ignored by most design teams for just as long. But the potential for what can be gained from improved hardware-software understanding is huge.

What makes this increasingly important is that more and more of the software is now in the hands of chipmakers. In years past, some of the best power-management hardware was ignored by device makers because there was no software to take advantage of it. That problem persists today. ARM’s big.LITTLE processor, for example, was designed to dynamically assign tasks to the appropriately sized processor core through intelligent software prioritization. However, the primary application of this technology—at least for the moment—is two separate processors for different functions.

“The software has to be smart enough to use both cores, but you can also decide up front which processor is used for what,” said Ron Moore, vice president of marketing for ARM’s physical IP division. “Over time you’ll see more functionality taken up with software. This is what’s happening in other areas, too. DVFS, power shutoff and voltage islands are all hardware-driven now, but that functionality will move to software.”
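As a sketch of what software-driven DVFS could look like, the snippet below picks the lowest-voltage operating point that still finishes a task before its deadline. The operating-point table, the function names and the workload numbers are hypothetical and are not taken from ARM documentation. The energy estimate relies only on the fact that switching energy per cycle scales with the square of the supply voltage.

```python
# (frequency in Hz, supply voltage in V); all values are assumed for illustration
OPERATING_POINTS = [
    (400e6, 0.70),
    (800e6, 0.85),
    (1200e6, 1.00),
]


def pick_operating_point(cycles, deadline_s):
    """Return the lowest-voltage (freq, vdd) that runs `cycles` within the deadline."""
    for freq, vdd in sorted(OPERATING_POINTS, key=lambda p: p[1]):
        if cycles / freq <= deadline_s:
            return freq, vdd
    return None  # no operating point is fast enough


def relative_energy(vdd, vdd_max=1.00):
    """Switching energy per cycle relative to the highest-voltage point."""
    return (vdd / vdd_max) ** 2


if __name__ == "__main__":
    point = pick_operating_point(cycles=8e6, deadline_s=0.015)
    if point is not None:
        freq, vdd = point
        print(f"run at {freq / 1e6:.0f} MHz, {vdd:.2f} V")
        print(f"energy vs. top point: {relative_energy(vdd):.2%}")
```

A real governor would also weigh leakage during the longer run time and the cost of switching between points, which this sketch leaves out.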

6. New materials. What used to be considered exotic or too expensive requires a second look at 28nm. That’s particularly true for FD-SOI. STMicroelectronics evaluated a number of options. It designed 2.5D and 3D-IC test chips and took a deep look at finFETs before deciding to change the substrate material at 28nm, where double patterning is not required and the process is already mature.

“FD-SOI is the best solution for now,” said Giorgio Cesana, ST’s marketing director. “14nm is more expensive, and 2.5D and 3D remain interesting because you can use the best technology for each part of a system. The downside is that you have complex interfaces that do not shrink, and it takes a huge amount of time to characterize. The clear advantage with 2.5D is that you can take complex technology at 55nm and put it with digital technology at 28, 20 or 14nm.”

The sticking point for most companies is the price of the interposer on 2.5D, which is being set by the foundries, and the extensive trailblazing design work needed on a full 3D-IC. He noted that for one of ST’s customers 2.5D was the best option due to huge memory requirements and the amount of power needed for I/O, but the customer decided that DDR4 was good enough.

“There are many different ways to get there,” Cesana said. “2.5D is one way, but right now market constraints won’t support it. A 20nm process with 14nm transistors is another, but you need a finer metal pitch and double patterning. 14nm is still gate first, so there are a huge number of masks. FD-SOI is the best choice for now. Today, the 28nm FD-SOI process is only as complex as the low-power process, and we’re finding excellent ultra-low voltage in a DSP implementation. This makes it one of the best technologies in cost effectiveness for the Internet of Things.”

7. Die stacking. For memory makers in particular, stacking die is the only choice available. The general consensus is that there will not be a DDR5, so big gains require a different approach. But there also is widespread experimentation with stacked die (both 2.5D and 3D-IC) among high-volume chipmakers and vertically integrated companies, because they can bury development costs in the price of an end-user device and reap the benefits of shorter distances and bigger pipes.

The push to the next process node and the continuity of Moore’s Law do not appear to be in jeopardy. What is changing is the timing of who moves forward, when they move and, in the case of stacked die, exactly what they move, because not everything has to be done in the latest process. But what’s also obvious is that the big gains of the past won’t come from any single approach anymore, no matter how much money is poured into it.

Even Ford began offering other colors in 1926—18 years after the Model T was first introduced—as a way of staying competitive with other carmakers. There are more ways than one to move a business forward, and more chipmakers are beginning to examine and re-examine options they never would have considered in the past.

—Brian Bailey contributed to this report.
