Thinking Differently About Power

Making a significant dent in energy efficiency may require a much broader approach to design and processing.


By Ed Sperling
Battery life and lower electricity bills are now marketing tools for makers of SoCs, the mobile devices they go into, and servers that power data centers. A smart phone battery that lasts through the day without a charge, even when the user is playing high-action games, is a lot more attractive than one lasting only a few hours. And a data center electricity bill that shows a sharp reduction in operating expenses will gain the attention of both the CIO and the CFO.

But achieving these kinds of gains in efficiency requires a different way of thinking about the architectures of these systems, whether they’re SoCs or rack-mounted servers. In all cases, performance has to remain the same or improve, while power is reduced. And with battery technology improving a mere 3% to 8% per year, depending on whose numbers you believe, better batteries won’t be the answer.

New technology such as finFETs certainly will help. FinFETs reduce static current leakage by wrapping the gate around a raised fin, which increases the gate’s contact area with the channel. That gives better control in the “off” state and lets the transistor switch on and off more quickly. Intel, which began using this approach with its 22nm “Tri-Gate” transistors, boasts a 50% power reduction or a 37% performance improvement compared with its 32nm planar transistors. Equally important, leakage translates into heat, which can affect everything from signal integrity to overall system performance and user experience. No one wants to hold a hot phone.

FinFETs are only part of the picture. Accompanying them are new techniques such as dynamic voltage and frequency scaling, which adjusts both supply voltage and clock speed as needed, and near-threshold computing, which operates transistors at supply voltages just above the point where they begin to switch on. Silicon-on-insulator technology and other new substrate materials also can help lower power and reduce physical effects. But even more improvements will be necessary. So where will they come from?
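As a rough illustration of why dynamic voltage and frequency scaling helps, a common first-order model treats switching power as proportional to capacitance times voltage squared times clock frequency. The sketch below applies that textbook relation. The capacitance, voltages, frequencies and cycle count are made-up values chosen for illustration, not figures for any real chip, and leakage power is ignored.

```python
def dynamic_power(c_eff_farads, voltage, freq_hz):
    """First-order switching power: P = C * V^2 * f (leakage ignored)."""
    return c_eff_farads * voltage ** 2 * freq_hz


def energy_for_task(c_eff_farads, voltage, freq_hz, cycles):
    """Energy to run a fixed number of clock cycles at one operating point."""
    runtime_s = cycles / freq_hz
    return dynamic_power(c_eff_farads, voltage, freq_hz) * runtime_s


# Hypothetical operating points (assumed values, for illustration only).
C_EFF = 1e-9             # effective switched capacitance, in farads
FAST = (1.0, 2.0e9)      # (volts, hertz)
SLOW = (0.8, 1.0e9)      # lower voltage and lower clock
CYCLES = 2_000_000_000   # size of the workload, in clock cycles

p_fast = dynamic_power(C_EFF, *FAST)
p_slow = dynamic_power(C_EFF, *SLOW)
e_fast = energy_for_task(C_EFF, *FAST, CYCLES)
e_slow = energy_for_task(C_EFF, *SLOW, CYCLES)

print(f"power:  fast {p_fast:.2f} W, slow {p_slow:.2f} W")
print(f"energy: fast {e_fast:.2f} J, slow {e_slow:.2f} J")
```

In this model the energy for a fixed workload depends on voltage but not on frequency, so the saving comes from the lower voltage that a slower clock permits. The price is a longer run time.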

One answer is software.

“This isn’t just about the SoC anymore,” said William Ruby, senior director of RTL power product engineering at Apache Design. “When you look at chips and other chips on the board and the display, that’s still not the whole system. It’s a combination of software and hardware, and the tradeoffs between the software and the hardware. How does the software control the hardware, and can it make the hardware smart enough?”

Software is a huge factor in designing more efficient systems, but the software has to be looked at from two angles. One is what the software is actually doing to save power, because it’s software that turns the various power islands in an SoC on and off and effectively manages dark silicon. For example, it’s the software that makes a cell phone screen go dark when it’s held next to your head and light up when it’s moved away. A second way of looking at software, though, involves the efficiency of the software itself, which includes everything from embedded code and firmware to the operating system and middleware, and the applications that run on them.
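The first angle, software switching power islands on and off, can be sketched as a toy power manager. Everything here is hypothetical: the island names, the milliwatt figures and the `on_proximity` hook are assumptions made for illustration and do not describe any real SoC or operating system API.

```python
from dataclasses import dataclass, field


@dataclass
class PowerIsland:
    name: str
    active_mw: float      # assumed draw when powered, in milliwatts
    powered: bool = True


@dataclass
class PowerManager:
    islands: dict = field(default_factory=dict)

    def add(self, island):
        self.islands[island.name] = island

    def set_power(self, name, on):
        """Gate a single island on or off."""
        self.islands[name].powered = on

    def total_mw(self):
        """Sum the draw of every island that is currently powered."""
        return sum(i.active_mw for i in self.islands.values() if i.powered)

    def on_proximity(self, near_head):
        # Mirrors the phone-call example: display off when held to the head.
        self.set_power("display", not near_head)


pm = PowerManager()
pm.add(PowerIsland("display", 400.0))  # hypothetical figures
pm.add(PowerIsland("modem", 300.0))
pm.add(PowerIsland("gpu", 500.0))

pm.set_power("gpu", False)             # no graphics work during a call
before = pm.total_mw()
pm.on_proximity(near_head=True)        # phone raised to the ear
during = pm.total_mw()
pm.on_proximity(near_head=False)       # phone moved away again
after = pm.total_mw()
print(before, during, after)
```

A real power manager also has to account for the time and energy it takes to wake an island, which this sketch leaves out.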

There is no single solution for all software. Each piece has to be dealt with separately, and that’s a large part of the problem. There are millions upon millions of lines of code. Some of it is bloated, some of it is efficient. But software engineers don’t necessarily think about power consumption when they’re developing code, and the higher up the software stack they work, the more removed they are from any kind of energy efficiency decisions. The closer the software is to the hardware, and the more closely it’s developed with the hardware, the more likely it is to be written efficiently.

Application developers, in contrast, write to application programming interfaces in the OS. Frequently they have no interaction with the hardware at all. Most don’t even take advantage of the multicore hooks in OSes. To some extent this isn’t their fault. While relational databases and the large commercial applications that utilize them scale quite well across multiple cores, most applications that run on smart phones, tablets or PCs do not. The exceptions are highly repetitive, easily parallelized applications, such as video rendering and graphics editing, as well as some games.

“What we need are energy-aware compilers that understand performance vs. power,” said Ruby. “With higher-level languages there is less opportunity to minimize power. You could always do that with assembly code because it’s not at the high level of abstraction.”

But assembly code is also slow to write and difficult to modify, which is why it was replaced with higher levels of abstraction in the first place.

Reaching outside the box
Another approach is to rethink processing. Cloud architectures already do this for storage. A next step is to figure out what processing has to be done locally and what processing can be offloaded into the cloud.

“One way to make batteries last longer is to offload computing to the server,” said Barry Pangrle, senior power methodology engineer at Nvidia. “You can put the graphics in the server room and play a game on a tablet by streaming the data. You can’t put a high-end graphics card in a mobile device, anyway, because it generates too much heat. And you can’t add that much processing power. This enables you to have capabilities you can’t get otherwise, and it uses less energy.”

The tradeoff is that it requires more I/O, but that still requires far less energy than a high-end processor and graphics card. It also requires better connectivity. Anyone with an LTE-enabled mobile device will notice a huge improvement in connectivity for Internet searches and streaming when they’re in an LTE reception area, but they also are painfully aware that LTE isn’t everywhere.

That’s likely to improve over the next couple of years, along with the speed of the LTE connection. Eric Dewannain, vice president and general manager of the baseband business unit at Tensilica, predicts LTE will become ubiquitous over the next few years, and in the next five years that will improve further with the widespread introduction of LTE Advanced. LTE has a theoretical peak download speed of about 300 Mbps, while LTE Advanced is expected to achieve more than 1 Gbps.

“The goal is to have micro base stations that can provide service for everything with different bandwidths,” Dewannain said. “Right now you get plenty of bandwidth, but it’s divided between 100 users. In five years, those base stations will be on your home, which will make it far easier for people to enter this market.”

It also will make it more difficult to define the boundaries of a device. What gets built into an SoC, into the device that houses it, and into the cloud will vary, depending upon connectivity, how latency-sensitive an application is, and security issues involving the location of data.

“There are more things you can do in shrinking devices, but eventually you have to deal with the tradeoffs,” said Cary Chin, director of marketing for low-power solutions at Synopsys. “A lot of these decisions need to be made at the system level, and then outside the device at the product level. How much energy does it take to transmit a byte? That’s improving, but it’s not perfect yet. The communications path is the bottleneck right now.”
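Chin’s question about the energy needed to transmit a byte can be framed as a simple comparison between computing locally and shipping the data to a server. The sketch below is one way to set that up. The per-cycle and per-byte energy costs, the idle power and the workload sizes are all assumed numbers for illustration, not measurements from any device or network.

```python
def local_energy_j(cycles, joules_per_cycle):
    """Energy to run the whole job on the device."""
    return cycles * joules_per_cycle


def offload_energy_j(bytes_sent, bytes_received, joules_per_byte,
                     wait_s, idle_power_w):
    """Radio energy for the transfer plus idle energy while waiting."""
    radio = (bytes_sent + bytes_received) * joules_per_byte
    waiting = wait_s * idle_power_w
    return radio + waiting


def should_offload(local_j, offload_j):
    """Offload only when it costs the device less energy."""
    return offload_j < local_j


# Hypothetical workload and costs (assumed values, for illustration only).
local = local_energy_j(cycles=5e9, joules_per_cycle=1e-9)

good_link = offload_energy_j(bytes_sent=5e5, bytes_received=1.5e6,
                             joules_per_byte=1e-6,
                             wait_s=0.5, idle_power_w=0.2)

# Same job over a weaker connection, where each byte costs more energy.
poor_link = offload_energy_j(bytes_sent=5e5, bytes_received=1.5e6,
                             joules_per_byte=5e-6,
                             wait_s=0.5, idle_power_w=0.2)

print(f"local {local:.1f} J, good link {good_link:.1f} J, "
      f"poor link {poor_link:.1f} J")
print(should_offload(local, good_link), should_offload(local, poor_link))
```

With these assumed numbers, offloading wins on the good connection and loses on the poor one. That is the sense in which the communications path decides where the processing should happen.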

Chin said more pieces will need to be developed together to really make this approach work, such as how large the cache needs to be, how big the pipe is between the memory and the processor, and how the software takes advantage of all of this.

“You have to be able to get data back and forth quickly,” he said. “That will require a change in the algorithms that are used between cache, memory, and disk storage and at a macro level in communications with the cloud.”

Gene Matter, senior applications manager at Docea Power, said there are two categories of devices that clearly lend themselves to this kind of partitioning. One is latency tolerant, and may involve a mediocre, low-power sensor or a series of sensors. The second is sensor technology, such as gesture recognition, oil exploration, data mining and video processing, where precise data is part of the download but the processing is done afterward.