Taming The Multicore Beast

Work is under way to solve some of the thorniest problems in the history of software, but it won’t happen overnight.


By Ed Sperling

Multicore chips are here to stay. Now what?

That question is echoing up and down the ranks of tools vendors, design engineers, software developers and even among people who measure the performance and efficiency of semiconductors. There is now a Multicore Expo and a Multicore Association that includes a who’s who of electronics. And there are lots of working groups developing different strategies to tackle this Hydra-like creature that has befuddled the best software minds in the world for four decades.

Why multicore?

Multicore was firmly on the horizon for chipmakers when they hit the 130nm process node. By the next process node, they realized, it would be impossible to turn up clock speeds without cooking the chip. For all intents and purposes, classical scaling—gaining performance at each new process node—ended at 90nm. The solution was to add more processing cores at lower speeds, and hand off the burden to software developers to fix the problem. After all, it’s hard to argue with the laws of physics.

This explains why most major university computer science departments now are dedicating a significant portion of their research to solving the conundrum of how to program multiple cores. The problem is interesting and the payoff can be huge to anyone who solves it.

It also helps explain why Intel invested $218.5 million in VMware in 2007: virtualization is a safety net for utilizing more cores on a chip. If software can’t be developed to run on multiple cores, at least multiple instances of an operating system, or multiple operating systems, can run on the chip using virtual machines. However, Intel is also adding a “turbo mode” to its upcoming chips, which allows more of a chip’s total horsepower to be used in bursts on a single core if the application demands it.

Designing multicore chips

What becomes painfully obvious as you descend from 60,000 feet on the multicore world is that one core is not necessarily the same as the next, although it can be. There are homogeneous cores in chips made by companies such as Intel and Freescale, heterogeneous cores in systems on chip (SoCs), and sometimes both kinds in a single SoC.

While it’s easier to design a chip with homogeneous cores—you simply develop it once and then figure out the best way to share memory and bus traffic patterns—that approach isn’t nearly as efficient for a multifunction device such as a smart phone. The reason is that every application requires a different amount of processing power, and assigning the maximum to each one isn’t an ideal strategy.

In the embedded world, ARM has taken a first stab at this problem with its ARM11 MPCore multicore processor, which can be configured for one to four cores.

And to simplify building the chips, all of the major EDA tools vendors either have added multicore support to their flows or are working on it. Mentor Graphics has been working in multicore debugging with its Seamless products, Synopsys has added multicore for verification, implementation and manufacturing, and Cadence has added multicore support for Virtuoso. Expect to see more announcements from these and other vendors over the next few months, as well as virtual prototyping solutions and faster simulation.

Where’s the application software?

So now that the tools to make the chips and follow them through the verification and manufacturing are being prepared, what’s next?

The next piece is application software, and most existing code was written using a serial approach. There is no easy way to compile that code onto multiple cores, although there are tools to help.

CriticalBlue just introduced its Prism tool to help parallelize legacy code. While you still can’t push a button to make it all work, and you can’t rework applications for two cores and have them fully take advantage of 32 cores, this kind of tool is a step in the right direction.
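One way to see why a gain on two cores does not automatically carry over to 32 is Amdahl’s law, a standard textbook model that none of the vendors quoted here invoke. The sketch below is only illustrative: the function name and the parallel fractions are assumptions chosen for the example, not measurements of any real application.

```python
"""Amdahl's law: an upper bound on speedup when only part of a program is parallel."""


def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Return the ideal speedup on `cores` cores.

    parallel_fraction is the share of single-core run time that can be
    spread across cores; the remainder stays serial.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only for illustration.
    for fraction in (0.50, 0.90, 0.99):
        row = ", ".join(
            f"{cores} cores: {amdahl_speedup(fraction, cores):.2f}x"
            for cores in (2, 4, 32)
        )
        print(f"parallel fraction {fraction:.2f} -> {row}")
```

Under this model, a program whose run time is half serial gains about 1.33x on two cores and still less than 2x on 32, which is why code reworked just enough to benefit from two cores can leave most of a larger chip idle.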

Another important piece of the puzzle is mapping the software to the interconnect. PolyCore has developed a middleware layer and tools to do that, distributing functions to different cores. That distribution is vital in multicore topologies, where shared buses and memory create problems that never existed in single-core chips.

Finally, Virtutech has developed a simulated environment for multicore applications with its Simics tool, creating what-if scenarios for applications.

But all of these tools still don’t produce the volume of new applications that can be scaled across many cores. Sven Brehmer, president of PolyCore, said the gap between hardware and software is larger than it has ever been, and it will take years to close.

“There is a broader group of developers using multicore but they don’t know how to develop software yet or they don’t want to spend money on this problem,” Brehmer said. “There is no magic bullet here, but the open source community sees a need to simplify multicore. We’ve solved a portion of the problem but there’s a lot of work to be done and it has to be done at a pace that works for software developers. You can’t go from two to six cores overnight.”

But at least there is an incentive. “With all the potential monetary rewards, something will come out of this,” said Markus Levy, president of the Embedded Microprocessor Benchmark Consortium (EEMBC).

Hype vs. reality

When multicore programming gains critical mass is another matter. For all the talk about multicore initiatives, the reality is that there has never been a consistent industry effort to make multicore approaches work. And in the past, the problems of parallelizing software have been confined to a small circle of computer science researchers at universities and at companies like IBM and AT&T. There has never been a massive effort to solve the problem because for the most part it didn’t have to be solved.

Making matters even more confusing, it’s hard to get a straight answer about what’s real in multicore and what isn’t. Just because software can run on a multicore machine doesn’t mean it runs faster on four cores than on one. In fact, some software may not take advantage of more than one core even though it will work on a four-core processor. “There are a lot of companies taking existing stuff and putting a new label on it and saying it’s multicore compatible,” said Levy.


Figure not reproduced. Source: EEMBC

What has worked exceptionally well in the multicore world are applications that can be partitioned into discrete pieces. Graphics and video rendering work particularly well, for example. Imagination Technologies, a U.K.-based IP vendor, builds scalable multicore graphics engines that parallelize the computing below the application level, said Tony King-Smith, vice president of marketing for the company’s technology division, during a keynote at the recent Multicore Expo.

“We can parallelize from 1 to 4 pipes and beyond, and we can multicore 1 pipe to 64 cores,” he said. “But to do this, you have to get the architecture right. If you get it wrong, you’ll spend too much effort on overhead.”

Freescale has taken a similar approach with its multimedia DSP technology. Kent Fisher, chief systems engineer for Freescale’s networking and multimedia group, said the big decision for his division is whether to use a larger number of small cores or a few large cores. “It depends on your application,” he said. “And until the software tools catch up to the hardware, frequency and infrastructure per clock will continue to matter.”

He noted that multicore power specifications are a problem as well, because not everyone specifies power the same way.

Splitting the atom

From a software application perspective, there are several challenges that need to be considered. First, there needs to be a proper balance between splitting up different functions and splitting those functions into too many parts.
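The balance between splitting too little and splitting too much can be illustrated with a toy cost model. This is a sketch under stated assumptions, not a measurement: it assumes the compute time divides evenly across parts and that coordination overhead grows linearly with the number of parts. The function names and the numbers are hypothetical.

```python
def estimated_time(work: float, overhead_per_part: float, parts: int) -> float:
    """Toy model: compute time shrinks as parts are added, coordination cost grows."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    return work / parts + overhead_per_part * parts


def best_split(work: float, overhead_per_part: float, max_parts: int) -> int:
    """Return the part count in 1..max_parts with the lowest estimated time."""
    return min(
        range(1, max_parts + 1),
        key=lambda parts: estimated_time(work, overhead_per_part, parts),
    )


if __name__ == "__main__":
    # Hypothetical workload: 100 time units of work, 1 unit of overhead per part.
    work, overhead = 100.0, 1.0
    for parts in (1, 4, 10, 32, 64):
        print(f"{parts:2d} parts -> {estimated_time(work, overhead, parts):6.2f} units")
    print("best split:", best_split(work, overhead, 64))
```

With these made-up numbers the estimate bottoms out at 10 parts and climbs again beyond that, so splitting the same work 64 ways takes more than three times as long as splitting it 10 ways. Real overhead curves depend on the interconnect, memory sharing and synchronization, and would have to be measured.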

“What you really need to do is find the relative load of each function of an application,” said PolyCore’s Brehmer. “If you have a computation, you may be able to duplicate that on multiple cores. But you also have to look at data dependencies, because you can’t break a function out if it depends on data from some other place. Otherwise you’ll just be waiting for that data.”
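Brehmer’s distinction between work that can be duplicated across cores and work that is held up by data dependencies can be sketched in a few lines. The example below is an illustration only: the function names are hypothetical, and SHA-256 hashing from Python’s standard library simply stands in for any computation. The first function treats every block independently, so it can be farmed out to a pool of worker processes; the second feeds each result into the next step, so it has to run in order.

```python
import hashlib
from concurrent.futures import Executor, ProcessPoolExecutor
from typing import Iterable, List, Optional


def block_digest(block: bytes) -> bytes:
    """Independent work: the result depends only on this one block."""
    return hashlib.sha256(block).digest()


def digest_all(blocks: Iterable[bytes], executor: Optional[Executor] = None) -> List[bytes]:
    """Hash every block; with an executor, blocks may run on different cores.

    With executor=None the built-in map is used, giving a serial baseline.
    Results come back in input order either way.
    """
    mapper = executor.map if executor is not None else map
    return list(mapper(block_digest, blocks))


def chained_digest(blocks: Iterable[bytes]) -> bytes:
    """Dependent work: each step needs the previous digest, so it stays serial."""
    state = b""
    for block in blocks:
        state = hashlib.sha256(state + block).digest()
    return state


if __name__ == "__main__":
    blocks = [bytes([i]) * 100_000 for i in range(8)]
    with ProcessPoolExecutor() as pool:
        parallel = digest_all(blocks, pool)
    # The parallel and serial runs produce identical results.
    print(parallel == digest_all(blocks))
    print(chained_digest(blocks).hex())
```

Adding cores can shorten `digest_all`, but no number of cores shortens `chained_digest`, because every iteration waits for the data produced by the one before it.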

That kind of analysis is at least a major step toward understanding the resources that will be needed on a chip, and it works well with homogeneous multicore systems. The next step will be better utilization of heterogeneous cores, which will require an understanding of application functionality all the way down at the chip architecture level. It doesn’t make sense to provide the same level of processing power for all parts of an application if those pieces are not identical in importance or in the amount of processing required.

And finally, some software development may be done with much thinner layers of an operating system, or even with code that executes directly on the metal, as multicore SoCs become more integrated into system-level design.

The promise is better performance and ultimately lower power consumption, but it’s going to take time, a committed effort by engineers and scientists, and collaboration among groups that in the past have never spoken the same language. Multicore is also multidisciplinary, and that’s a whole different problem to solve.