Writing Software For Low-Power Systems

Multiprocessing vs. multicore, scaling, up-front planning vs. back-end fixes, and other things you absolutely need to know to make systems work.


By Ed Sperling

Almost any discussion of software in low power systems these days involves some sort of multicore approach.

That is particularly true at 90nm and below. At 65nm, unless there is a very distinct purpose for a low-power single-core device, a chip probably utilizes at least two cores, and at 45nm the number can continue to rise, depending upon how many functions the chip serves and how important processing power is.

For developers used to working in the symmetric or asymmetric multiprocessing world, where single-core processors are arranged in arrays within the same device and tied together by middleware and very fast interconnects, moving everything inside a chip actually makes low-power design simpler. In the SMP or AMP world, it was impossible to turn processors on and off. That’s already standard practice in multicore chips, which offer a more controlled environment for running software than the multiprocessing world.

But to really save power, designing software for multicore devices requires a lot of up-front planning rather than back-end work-arounds.

First of all, it’s important to note that not all applications can be parallelized to take advantage of multicore, and of those that can, very few can be compiled once and scale to more cores as they become available. It’s a great concept, and multicore chip companies like Intel and IBM say great progress is being made, but there’s a whole other group that will counter with, “Don’t count on it.”
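The article doesn’t name it, but the standard way to quantify this limit is Amdahl’s law: the serial fraction of a program caps the speedup no matter how many cores are available. A minimal sketch (the function name is ours):

```python
def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup when only part of a program parallelizes.

    parallel_fraction: share of runtime that can use all cores (0.0 to 1.0)
    cores: number of cores applied to the parallel share
    """
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)
```

A program that is 90% parallelizable tops out near 3.1x on four cores and can never exceed 10x, however many cores the chip offers — which is why compiling once and scaling indefinitely is so hard.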

Moreover, multiprocessing was optional for applications. At 65nm and below, multicore chips are the norm. If software can’t utilize more than one core, the other cores are useless.

Second, multicore can mean many things. In a system on chip, it typically involves heterogeneous cores. In a processor, the cores are generally homogeneous. Writing software that takes advantage of many cores requires a multiprocessing operating system and applications that can be run in parallel. In an SoC, the software can be divided up by function using everything from a multiprocessing operating system like Linux to real-time operating systems that are written for a very specific function.
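Dividing software up by function, as described above for an SoC, can be sketched as two cooperating tasks. Threads stand in for cores here, and the task names are invented for illustration; on a real SoC each role would run on its own core under an RTOS or a multiprocessing OS like Linux:

```python
import queue
import threading

def protocol_task(q):
    # Stand-in for one core's dedicated job (e.g. packet handling).
    for i in range(5):
        q.put(i)
    q.put(None)  # sentinel: no more work

def application_task(q, results):
    # Stand-in for a second core consuming the first core's output.
    total = 0
    while True:
        item = q.get()
        if item is None:
            break
        total += item
    results.append(total)

def run_pipeline():
    q = queue.Queue()
    results = []
    t1 = threading.Thread(target=protocol_task, args=(q,))
    t2 = threading.Thread(target=application_task, args=(q, results))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return results[0]
```

The queue is the software analog of the middleware and interconnects that tied separate processors together in the multiprocessing world.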

“The real trick is that if you break up an application, you have to do it at the modeling level,” says Irv Badr, Rational senior product marketing manager at IBM. “Breaking it at the source-code level is very difficult. If you break it at the modeling level, it’s as simple as pushing a button. You want the ability to move things around by asking ‘What if?’ That is very important. You also need to make sure when you’re modeling that the software isn’t coupled to the hardware. Some hardware can be used by a lot of software.”

More problems, more tools

A number of tools have been created to help migrate existing software to multicore architectures. The most recent is Prism, made by CriticalBlue. It allows developers to analyze code and explore changes to take advantage of multicore hardware, doing everything from dependency analysis to re-evaluating how work is scheduled across multiple cores.
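The core question such dependency analysis answers is whether each loop iteration stands alone. This hand-written sketch (not Prism’s output or API) shows the distinction:

```python
def elementwise_square(a):
    # Iteration i touches only a[i]: no loop-carried dependency,
    # so iterations could safely run on different cores.
    return [x * x for x in a]

def prefix_sum(a):
    # Iteration i reads the result of iteration i-1: a loop-carried
    # dependency, so a naive split across cores would be incorrect.
    out = []
    total = 0
    for x in a:
        total += x
        out.append(total)
    return out
```

Analysis tools flag the second pattern so developers know which loops need restructuring before they can exploit extra cores.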

“The software guys didn’t ask for multicore,” said David Stewart, CriticalBlue’s CEO. “But the only way we’re going to get more performance is if the software guys react.”

Intel, meanwhile, has created its own programming language to migrate applications to multicore architectures. Known as Ct, the language helps developers parallelize the portions of an application that can run concurrently. The key to working in this type of environment is understanding the application well enough to know what can be split off and run on multiple cores and what cannot, and how much overhead there is in pulling the pieces back together for the user.
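Ct’s own syntax isn’t reproduced here, but the underlying split/compute/merge pattern — and the recombination overhead just mentioned — can be sketched in plain Python (function names are illustrative):

```python
from concurrent.futures import ProcessPoolExecutor

def split(data, workers):
    """Divide the input into at most one chunk per worker."""
    size = max(1, (len(data) + workers - 1) // workers)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def scale(chunk):
    # Stand-in for the per-chunk work each core performs.
    return [x * 2 for x in chunk]

def parallel_scale(data, workers=4):
    chunks = split(data, workers)
    with ProcessPoolExecutor(max_workers=workers) as ex:
        pieces = list(ex.map(scale, chunks))   # fan out across cores
    # Merging the pieces back into one result is the overhead the
    # text warns about; for small workloads it can erase the gain.
    return [x for piece in pieces for x in piece]
```

The split and merge steps are pure bookkeeping, which is exactly why languages like Ct try to generate them automatically rather than leave them to the developer.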

Ct isn’t the first language to attempt to ease the shift from sequential software development to parallelization. Software engineers who have been working in the multiprocessing world for a while say it probably won’t be the last, either.

In Europe, a consortium known as eMuCo (embedded Multi-Core processing for Mobile Communication) is taking a different approach by developing a standard platform for future mobile devices based on multicore architectures. The stated goal is to develop the controller, operating system and application layers. Members include ARM, Infineon, Telelogic and GWT-TUD, as well as four universities.

Promises, promises

If all of this can be made to work, there is enormous upside from both a performance and a low-power perspective. In a device such as a smartphone, for example, cores regularly are put into sleep mode. That can extend battery life from hours to days, and in some cases even weeks.

Already, work is underway that teams up some unlikely partners. ARM’s Cortex controller is being combined with IBM’s Cell processor, for example, in a 60-core deployment on multiple chips, said IBM’s Badr. He said that in the enterprise, multicore can reduce power consumption by a factor of three, which allows blade servers to run three times as long because they run cooler.

But there’s a catch, too. While there’s money attached to making it work right this time, the problem has been studied for decades without major breakthroughs. The jury is still out on just how many cores are enough and how many are too many, and which software will work in what configuration. But given the realities of physics on a piece of silicon, there will be at least some multicore headaches in every programmer’s future.