Partitioning For Power

Emphasis shifts to better control of power when it’s being used, not just turning off blocks.


Examine any smartphone design today and most of the electronic circuitry is “off” most of the time. And regardless of how many processor cores are available, it’s rare to use more than a couple of those cores at any point in time.

The emphasis is shifting, though, as the mobility market flattens and other markets such as driver-assisted vehicles and IoT begin gaining traction. In a car, turning sensors off reduces the effectiveness of an active sensor network. That approach doesn’t work with a gesture-controlled gaming platform or television, either. And it doesn’t work in industrial control or healthcare. In those cases, as well as many others, the trend is toward real-time data processing.

But keeping devices in a state of perpetual readiness while maintaining energy efficiency is far more difficult than just powering down most of the device. Thermal and power-related issues, such as electromigration, electrostatic discharge and dielectric breakdown, can impact reliability and even functionality at the most advanced nodes. And as mainstream IoT designs move to 55nm and 40nm, those kinds of issues are starting to creep in there, as well.

“As dimensions get smaller, more things interact with each other,” said Norman Chang, vice president and senior strategist at Ansys. “At 10/7nm, you have more and more wires that can interact in terms of thermal migration. If you look at the channel bus, you have a lot of wires in a small area. With the thermal migration effect, the temperature can be higher than expected. One customer measured the thermal gradient on a chip at more than 45 degrees, and that was at 45nm. That can affect resistance, which is dependent on temperature, and push a design past the electromigration limit.”
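Chang's point about temperature-dependent resistance can be put into numbers with two textbook relations: the linear temperature coefficient of resistance for a metal line, and Black's equation for electromigration lifetime. The Python sketch below assumes the bulk-copper coefficient (0.0039 per kelvin) and illustrative Black's-equation parameters (activation energy 0.9 eV, current exponent n = 2). None of these values come from the article, and real interconnect characterization data would replace them.

```python
import math

BOLTZMANN_EV_PER_K = 8.617e-5  # Boltzmann constant in eV/K


def resistance_at(r0_ohm: float, t_c: float, t0_c: float = 25.0,
                  alpha_per_k: float = 0.0039) -> float:
    """Linear model R(T) = R0 * (1 + alpha * (T - T0)).

    alpha = 0.0039 /K is the bulk-copper value, used here as an
    assumption; narrow interconnect lines deviate from it."""
    return r0_ohm * (1.0 + alpha_per_k * (t_c - t0_c))


def em_lifetime_ratio(t_hot_c: float, t_ref_c: float, ea_ev: float = 0.9,
                      j_ratio: float = 1.0, n: float = 2.0) -> float:
    """MTTF(hot) / MTTF(ref) from Black's equation
    MTTF = A * J**-n * exp(Ea / (k * T)); the prefactor A cancels.

    j_ratio is J_hot / J_ref. Ea = 0.9 eV and n = 2 are illustrative
    assumptions, not values taken from the article."""
    t_hot_k = t_hot_c + 273.15
    t_ref_k = t_ref_c + 273.15
    thermal = math.exp(ea_ev / BOLTZMANN_EV_PER_K * (1.0 / t_hot_k - 1.0 / t_ref_k))
    return (j_ratio ** -n) * thermal


if __name__ == "__main__":
    # Two segments of the same 100-ohm wire, 45 degrees C apart.
    r_cool = resistance_at(100.0, 60.0)
    r_hot = resistance_at(100.0, 105.0)
    print(f"resistance increase on the hot segment: {100 * (r_hot / r_cool - 1):.1f}%")
    print(f"relative EM lifetime of the hot segment: {em_lifetime_ratio(105.0, 60.0):.3f}")
```

Under these assumptions, a segment running 45 degrees hotter than its neighbor shows roughly 15% more resistance, and its modeled EM lifetime drops to a small fraction of the cooler segment's, which is how a hotspot can push an otherwise compliant wire past its limit.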

So how exactly do you lower the power in a device while keeping it in a state of active readiness? The answer increasingly relies on better partitioning of design elements for power. There are a number of options to enable that. Some are new. Some have been talked about for years but never widely implemented because there wasn’t a compelling enough reason to justify the added cost. But everything is now on the table and under serious consideration.

Software vs. hardware
One of the fundamental partitioning decisions that needs to be made in any device is whether a particular function gets developed in hardware or software.

“If you can offload to hardware, it helps performance and reduces energy consumption,” said Markus Levy, president of EEMBC, an industry group that benchmarks embedded systems. “You add overhead because you’re adding gates, so this is definitely not free. And you have to determine how you run something, how often you run it, and whether it has an impact on performance, latency and joules consumed. So a real-world profile is critical. If you just run something over and over it draws too much power.”
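Levy's point that offload is "definitely not free" can be framed as an energy comparison over a usage profile. The sketch below is a hypothetical model, not EEMBC methodology: each option has an energy per invocation, a latency, and any always-on leakage it adds, and the code picks the lowest-energy option that still meets a latency deadline. All numbers in the example are made up for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Implementation:
    name: str
    energy_per_call_j: float  # active energy for one invocation
    latency_s: float          # time to complete one invocation
    static_power_w: float     # extra always-on leakage this option adds


def energy_j(impl: Implementation, calls_per_s: float, window_s: float) -> float:
    """Joules over a window: per-call energy for every call, plus leakage
    paid for the whole window whether or not the block is busy."""
    return impl.energy_per_call_j * calls_per_s * window_s + impl.static_power_w * window_s


def breakeven_rate(sw: Implementation, hw: Implementation) -> float:
    """Call rate (calls/s) above which hw uses less energy than sw.
    Solves e_sw * r + p_sw = e_hw * r + p_hw for r."""
    saved_per_call = sw.energy_per_call_j - hw.energy_per_call_j
    extra_leakage = hw.static_power_w - sw.static_power_w
    if saved_per_call <= 0:
        return float("inf")  # hardware never wins on energy
    return max(0.0, extra_leakage / saved_per_call)


def choose(options: List[Implementation], calls_per_s: float,
           window_s: float, deadline_s: float) -> Implementation:
    """Lowest-energy option that still meets the latency deadline."""
    feasible = [o for o in options if o.latency_s <= deadline_s]
    if not feasible:
        raise ValueError("no option meets the latency deadline")
    return min(feasible, key=lambda o: energy_j(o, calls_per_s, window_s))


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    sw = Implementation("software", 50e-6, 2e-3, 0.0)
    hw = Implementation("accelerator", 2e-6, 1e-4, 0.5e-3)
    print(f"break-even rate: {breakeven_rate(sw, hw):.1f} calls/s")
    for rate in (1.0, 100.0):
        print(rate, choose([sw, hw], rate, 3600.0, 1e-2).name)
```

The break-even rate makes the "real-world profile" argument explicit: below it, the accelerator's leakage costs more than it saves per call, unless the block itself can be power-gated while idle.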

Software has some clear benefits, including easier bug fixes and updates, and consequently more flexibility in time-to-market scheduling and dealing with engineering change orders. But it is almost always more efficient to perform a function in hardware than software.

But which hardware? That choice is becoming more complex, particularly with the increased availability of advanced packaging. While the clear choice in high-performance, low-power systems historically has been to put everything on one die, that’s no longer an option for some companies because there isn’t enough volume to sustain the rising cost of design and manufacturing. It’s also not the only way to save power.

“With partitioning you start at the highest level and break it down into hardware and software, and then figure out what kind of packaging you want and what you want to put on a board or on a chip,” said Dave Wiens, business development manager for Mentor Graphics’ Systems Design Division. “We still have designs with hundreds of boards. When you partition the board, you create a hierarchy. So you may have single or multiple chips, and you may want to optimize the process from the chip level up.”

System vs. block
All of this has an impact on another set of decisions, which is how to do the architectural analysis in the first place. In the past, partitioning has been done using pre-defined IP blocks, which in many cases include a set of black boxes. Some of these blocks are well-characterized, while others are not. But as power budgets tighten due to a mix of always-on, sometimes-on and mostly off features, partitioning becomes significantly more complex. Rather than working at the block level, partitioning decisions now have to move up to the system level.

This is particularly difficult with sensors and analog components, because they don’t always power up or down the way digital components do. As a result, some vendors are including both analog and digital circuitry in a block, with an understanding that each has its strengths and limitations.

“One topic we’ve been discussing a lot lately at Synopsys is how to partition more IP,” said Navraj Nandra, senior director of marketing for Synopsys’ DesignWare Analog and MSIP Solutions Group IP. “For digital, there are a lot of configuration options. You can program the current up and down, and you can do that at smaller geometries. If you look at a high-speed SerDes, that is already partitioned. There is an analog block, and then the equalization is done in the digital domain. With digital, you get more flexibility and programmability.”

Consider a high-speed interface between DRAM and the CPU, for example. By adding digital circuitry into the physical layer (PHY), functions such as deskew and process/voltage/temperature (PVT) compensation can be handled at the RTL level, Nandra said. “So you can use RTL to connect a memory controller to a system.”
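As a rough illustration of the kind of digital deskew logic Nandra describes, the sketch below computes delay-line settings that line every data lane up with the slowest one, using per-lane arrival times measured during training. The function name, tap size and lane data are hypothetical. An actual PHY does this in hardware and firmware, and because delays drift with voltage and temperature, it would rerun the measurement and recompute the taps periodically.

```python
from typing import List, Tuple


def deskew_taps(arrival_ps: List[float], tap_ps: float,
                max_taps: int) -> Tuple[List[int], float]:
    """Compute per-lane delay-line settings that align every lane with the
    latest-arriving one.

    arrival_ps: measured arrival offset of each lane, in picoseconds.
    tap_ps: delay added by one tap of the (hypothetical) delay line.
    Returns the tap count per lane and the residual skew left after
    quantizing to whole taps."""
    latest = max(arrival_ps)
    taps = []
    for t in arrival_ps:
        n = round((latest - t) / tap_ps)
        if n > max_taps:
            raise ValueError(f"skew of {latest - t} ps exceeds the delay-line range")
        taps.append(n)
    aligned = [t + n * tap_ps for t, n in zip(arrival_ps, taps)]
    return taps, max(aligned) - min(aligned)


if __name__ == "__main__":
    # Hypothetical training result for four data lanes.
    taps, residual = deskew_taps([0, 12, 30, 7], tap_ps=5, max_taps=31)
    print(taps, residual)  # [6, 4, 0, 5] 2
```

The residual skew is bounded by half a tap, so the tap size sets how tightly the lanes can be aligned.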

The same type of scenario plays out on the power side. In smartphones, for example, one of the big problems is how to switch an application on and off quickly enough without draining the battery. “Power and wake-up time are system decisions,” Nandra noted. “So you have to speak to latency in a way that it is not sucking down power all at once. There is a spectrum of usage on a smartphone—you have a phone, text, and maybe augmented reality. All of them are different applications, and if you do it wrong, on and off can quickly turn out to be a problem.”

Partitioning also opens up a new set of problems at the system level, though. “You’ve got complex power distribution, multiple voltages, power ICs,” said George Zafiropoulos, vice president of solutions marketing at National Instruments. “How do you know it all came up at the right time? You can add ADCs and monitor the voltage rails, but that adds cost.”
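One way to answer Zafiropoulos's question, given ADC samples from each rail, is to find when each rail first reaches regulation and then check the order and spacing of those events. In this sketch, the 90%-of-nominal threshold, the rail names and the allowed gap are assumptions chosen for illustration, not requirements of any particular power IC.

```python
from typing import Dict, List, Optional, Tuple

Samples = List[Tuple[float, float]]  # (time in ms, voltage in V)


def rise_time(samples: Samples, nominal_v: float,
              fraction: float = 0.9) -> Optional[float]:
    """First timestamp at which the rail reaches fraction * nominal (assumed threshold)."""
    threshold = fraction * nominal_v
    for t_ms, v in samples:
        if v >= threshold:
            return t_ms
    return None


def check_sequence(rails: Dict[str, Samples], nominal: Dict[str, float],
                   order: List[str], max_gap_ms: float) -> List[str]:
    """Return a list of violations; an empty list means the sequence looks good."""
    problems = []
    times = {}
    for name in order:
        t = rise_time(rails[name], nominal[name])
        if t is None:
            problems.append(f"{name} never reached regulation")
        times[name] = t
    # Each rail must come up after the previous one, within max_gap_ms.
    for earlier, later in zip(order, order[1:]):
        t0, t1 = times[earlier], times[later]
        if t0 is None or t1 is None:
            continue
        if t1 < t0:
            problems.append(f"{later} came up before {earlier}")
        elif t1 - t0 > max_gap_ms:
            problems.append(f"{later} lagged {earlier} by {t1 - t0:.1f} ms")
    return problems


if __name__ == "__main__":
    # Hypothetical ADC captures for two rails.
    rails = {
        "core_0v8": [(0.0, 0.0), (1.0, 0.5), (2.0, 0.78)],
        "io_1v8": [(0.0, 0.0), (2.0, 1.0), (3.0, 1.7)],
    }
    nominal = {"core_0v8": 0.8, "io_1v8": 1.8}
    print(check_sequence(rails, nominal, ["core_0v8", "io_1v8"], max_gap_ms=5.0))
```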

Thinking differently
Looking at partitioning from a system level requires a different mindset, as well. Complexity is rising in many devices, regardless of whether it is a smartphone or what used to be a simple industrial meter. That complexity requires a deeper understanding of how power is used in a device, which often is represented as a distribution plot rather than a fixed number because it can vary greatly from one user to the next.
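Treating power as a distribution can be sketched by computing a time-weighted average for each user's state profile and then summarizing across users. The per-state power values and user profiles below are hypothetical placeholders; in practice they would come from measurement or power analysis.

```python
import math
import statistics
from typing import Dict, List

# Hypothetical per-state power draw in milliwatts (illustrative, not measured).
STATE_POWER_MW = {"deep_sleep": 0.5, "idle": 20.0, "active": 800.0}


def average_power_mw(profile: Dict[str, float]) -> float:
    """Time-weighted average power for one user.
    profile maps state name -> fraction of time spent in that state."""
    if not math.isclose(sum(profile.values()), 1.0, rel_tol=1e-9):
        raise ValueError("time fractions must sum to 1")
    return sum(STATE_POWER_MW[state] * frac for state, frac in profile.items())


def percentile(values: List[float], p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100."""
    ranked = sorted(values)
    k = math.ceil(p * len(ranked) / 100)
    return ranked[k - 1]


if __name__ == "__main__":
    users = [
        {"deep_sleep": 0.95, "idle": 0.04, "active": 0.01},  # light user
        {"deep_sleep": 0.80, "idle": 0.15, "active": 0.05},
        {"deep_sleep": 0.60, "idle": 0.25, "active": 0.15},  # heavy user
    ]
    powers = [average_power_mw(u) for u in users]
    print(f"median {statistics.median(powers):.1f} mW, "
          f"p90 {percentile(powers, 90):.1f} mW")
```

Reporting a high percentile next to the median shows how far heavy users sit from the typical case, which a single fixed number hides.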

But this isn’t a linear progression like shrinking features. The growing interest in machine learning and neural networks has shaken up the entire electronics industry, creating new models for where processing is done, including what data needs to be moved and what can be processed at edge nodes versus in more powerful processing locations such as the cloud—or even an on-board centralized computer in an autonomous vehicle.

“Partitioning of computational elements is suddenly more important,” said Chris Rowen, a Cadence consultant. “The question becomes how you hook together multiple blocks. Synchronous is easier. There is more investment there and it is more predictable. There is some GALS—globally asynchronous, locally synchronous. But that becomes tactical. There is always some timing gap. The goal is ultra-scale computing across different boxes, and we expect that to be asynchronous.”

That opens the door for other options, as well. There has been much discussion about whether designing for worst-case scenarios is an optimal use of resources because it requires extra logic circuitry, which can impact both power and performance. Yet there has been comparatively little discussion about how to squeeze much more energy efficiency out of individual functions and circuits.

“The big question is how to get operations done at acceptable power figures,” said Drew Wingard, CTO of Sonics. “There are a lot of idle operations, and you need to make sure those do not consume a lot of energy. That requires active power management, where you take advantage of very short idle moments.”

Sonics is backing an approach called an energy processing unit, a subsystem that maximizes idle states in an SoC based upon power gating, clock gating, and voltage and frequency management within a device.
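The core trade-off behind exploiting short idle moments is that each deeper low-power state saves more power but costs extra energy and time to enter and leave. The sketch below is a generic break-even policy, similar in spirit to the target-residency and exit-latency checks in Linux's cpuidle framework. It is not Sonics' EPU design, and the state table is hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class IdleState:
    name: str
    power_mw: float         # power while resident in the state
    transition_uj: float    # extra energy to enter and leave the state
    wake_latency_us: float  # time needed to return to full operation


def idle_energy_uj(state: IdleState, idle_us: float) -> float:
    """Energy over one idle period; mW * us = nJ, so divide by 1000 for uJ."""
    return state.power_mw * idle_us / 1000.0 + state.transition_uj


def break_even_us(shallow: IdleState, deep: IdleState) -> float:
    """Idle length beyond which 'deep' saves energy versus 'shallow'."""
    extra_transition_nj = (deep.transition_uj - shallow.transition_uj) * 1000.0
    power_saved_mw = shallow.power_mw - deep.power_mw
    return extra_transition_nj / power_saved_mw  # nJ / mW = us


def pick_state(states: List[IdleState], predicted_idle_us: float,
               latency_budget_us: float) -> IdleState:
    """Lowest-energy state for the predicted idle period that can still
    wake up within the latency budget."""
    feasible = [s for s in states if s.wake_latency_us <= latency_budget_us]
    if not feasible:
        raise ValueError("no state can wake up within the latency budget")
    return min(feasible, key=lambda s: idle_energy_uj(s, predicted_idle_us))


# Hypothetical state table, ordered from shallow to deep.
STATES = [
    IdleState("clock_gated", power_mw=5.0, transition_uj=0.0, wake_latency_us=1.0),
    IdleState("retention", power_mw=1.0, transition_uj=2.0, wake_latency_us=20.0),
    IdleState("power_gated", power_mw=0.05, transition_uj=10.0, wake_latency_us=200.0),
]

if __name__ == "__main__":
    for idle_us in (100.0, 2_000.0, 100_000.0):
        print(idle_us, pick_state(STATES, idle_us, latency_budget_us=1_000.0).name)
```

With this table, a 100 µs gap is best spent clock-gated, a 2 ms gap in retention, and a 100 ms gap fully power-gated; the break-even between clock gating and retention works out to 500 µs. Tightening the wake-up budget rules out the deepest state, which is the latency-versus-power trade Nandra describes.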

Physical separation
The increasing use of system partitioning into multiple chips connected by high-speed buses rather than putting everything on a single chip raises some other possibilities for managing power.

“System architects are looking at the problem in a different way rather than just relying on silicon technology,” said Kelvin Low, senior director of foundry marketing at Samsung. “You can partition a system to achieve system-level performance scaling. So if you use a 2.5D approach with HBM2 (second-generation High-Bandwidth Memory), the system-level performance increases. It becomes a partition problem, but the distributed processing approach is an important enabler.”

This has a bearing on power, as well, because it takes less power to drive signals through an interposer than through increasingly narrow wires on a single die at advanced nodes. As a result, there are significant power savings in addition to performance increases.

“One of the advantages of HBM2 is that you can move it closer to the processing, and you have 2 gig (gigatransfers/second per pin) rates,” said Frank Ferro, senior director of product management for memory and interface IP at Rambus. “The power of HBM2 is lower, too, and you can re-use quite a bit of technology. But it does require a new PHY design.”
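Ferro's per-pin rate turns into a large per-stack figure because HBM2 uses a very wide interface, 1,024 data bits per stack organized as eight 128-bit channels, which is practical only over an interposer. The arithmetic below compares that with a single 64-bit DDR4-3200 channel; the comparison is added here for scale and covers only peak bandwidth, since power would also require an energy-per-bit figure for each interface that the article does not give.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, rate_gt_s: float) -> float:
    """Peak bandwidth in GB/s = data pins * transfers/s per pin / 8 bits per byte."""
    return bus_width_bits * rate_gt_s / 8.0


HBM2_STACK_WIDTH_BITS = 1024  # 8 channels x 128 bits per HBM2 stack
DDR4_CHANNEL_WIDTH_BITS = 64  # one conventional DDR4 channel, for comparison

if __name__ == "__main__":
    print(peak_bandwidth_gb_s(HBM2_STACK_WIDTH_BITS, 2.0))    # 256.0
    print(peak_bandwidth_gb_s(DDR4_CHANNEL_WIDTH_BITS, 3.2))  # about 25.6 (DDR4-3200)
```

The wide, moderate-speed interface is the point: it reaches roughly ten times the bandwidth of the DDR4 channel without pushing the per-pin rate higher.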

Samsung introduced its version of HBM2 in January. SK Hynix announced availability of HBM2 in August.

Greg Yeric, an ARM fellow, believes that using multiple die in a package opens up a lot of opportunity to optimize power, performance and cost.

“Machines already can bond the wafers, and wafers with standard-cell-level partitioning are coming,” said Yeric. “If you are doing standard-cell-level partitioning, that can help pay for the technology. You certainly get a performance increase. But the question is whether this is a one-time trick. Ultimately, what we are looking at is a 3D IC, where you have memory over logic, such as what Leti is doing with CoolCube. You also can do nMOS over pMOS, or CMOS over fast CMOS.”

Power always has been a global concern in design because it affects every part of a chip, the package, the board, and increasingly the entire system that may contain many of these components. While most chipmakers have gotten the message that power needs to be dealt with early, partitioning for power rather than for functionality or performance has not been seriously considered. That is beginning to change, and it reflects a growing acceptance that power is now a first-order problem in all designs.

But partitioning for power at the system level is no simple task. It is every bit as complex as system-level verification and analysis, with a few quirks of its own, such as accounting for thermal migration, thermal inversions, chip-package-board interactions and RC delay, and then simulating all of this at a multi-physics level quickly enough to still get a device out the door on time.

Still, the future in semiconductor design has always been about efficiency, and partitioning for power is the next big challenge that needs to be confronted and effectively managed. At this point power partitioning is just getting going. It will likely take some time before enough engineers are trained on this approach to push it into the mainstream.

