Multicore Programming: The Next Frontier?

If no one can figure out how to scale applications, there may be serious ramifications for the entire electronics industry.


By Ed Sperling

From a distance it looks like a game of hot potato. But this version is played by hardware and software engineers, who normally don’t have much to do with each other.

The hardware engineers say you can’t get any more performance out of a single core on a chip without cooking it, so they’ve added more cores and tossed the problem over the wall to the software engineers. But the software engineers say that while they can thread functions across cores, very few applications will actually scale to use more cores unless they are completely rewritten at each new process node.

Companies such as Intel and IBM and most of the computer science departments at major universities are feverishly working on this problem. Unfortunately, they still haven’t come up with a solution, and the reason isn’t that this is a new problem. It’s been festering for four decades, and so far there hasn’t been a breakthrough. Programmers think serially, not in parallel, and there is no magic bullet to automate the programming.

David Patterson, the Pardee Professor of Computer Science at UC Berkeley and head of the parallelization effort there, calls multicore programming “the El Dorado of computer science” and refers to parallel computing as “an open research project.”

That may prove to be a polite assessment of the problem. More to the point, if there’s no breakthrough in software there will be no compelling reasons to upgrade computers or even handheld devices such as cell phones. Without performance upgrades, sales cycles will slip and the tech boom of the past 60 years will either begin slowing at an alarming pace or give way to massive shifts in how technology is sold and used.

“There is no killer multiprocessor,” Patterson says. “But programmers needing more performance have no choice except parallel processing.”

Where it works, where it doesn’t

That doesn’t mean parallel processing doesn’t work. Some applications adapt exceptionally well to multiple cores. In the commercial enterprise, databases and search functionality, for example, are showcases for what can be done with multiple cores. The individual tasks can be distributed across as many cores or processors as are available. Often referred to as embarrassingly parallel tasks, these kinds of applications can scale to very large numbers of cores with minimal tweaking of the application.
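
To make the idea concrete, here is a minimal Python sketch of an embarrassingly parallel search. It is an illustration only, not code from any product mentioned in this article: the function names `count_hits` and `search` and the sample documents are hypothetical. Each document is scanned without reference to any other, which is why the work can be handed to however many workers are available. The sketch uses a thread pool to stay self-contained; for CPU-bound work in CPython, a process pool (`ProcessPoolExecutor`, which offers the same `map` method) would be the usual substitution.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def count_hits(document: str, term: str) -> int:
    """Count occurrences of term in one document.

    The result depends only on this document, so calls are independent.
    """
    return document.lower().count(term.lower())


def search(documents, term, workers=None):
    """Return per-document hit counts, in the same order as the input."""
    # Assumption: default to one worker per reported core, or 1 if unknown.
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each document is an independent task; map preserves input order.
        return list(pool.map(lambda doc: count_hits(doc, term), documents))


# Hypothetical sample data for illustration.
docs = [
    "Multicore chips add cores",
    "One core, one thread",
    "Cores, cores, cores",
]
print(search(docs, "core"))  # prints [2, 1, 3]
```

Changing `workers` alters only how the tasks are scheduled, not the answer, which is the property that lets this kind of workload follow the core count without a rewrite.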

The same is true in the simulation world. Mentor Graphics last week introduced a parallel version of its Olympus SoC timing analysis and optimization engine that shows very little performance reduction when spread across different cores. The result is that two cores offer almost double the performance of a single core, and four cores roughly quadruple it.
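
One standard way to reason about scaling figures like these is Amdahl's law, which none of the sources quoted here cite but which relates speedup to the share of the work that can run in parallel. The sketch below is a back-of-the-envelope calculation in Python. The parallel fractions of 0.99 and 0.50 are assumptions chosen for illustration, not Mentor's measurements, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup under Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Assumed workloads: one almost fully parallel, one only half parallel.
for fraction in (0.99, 0.50):
    for cores in (2, 4, 8):
        speedup = amdahl_speedup(fraction, cores)
        print(f"parallel={fraction:.2f} cores={cores} speedup={speedup:.2f}x")
```

Under these assumptions, a workload that is 99 percent parallel reaches about 1.98x on two cores and 3.88x on four, close to the near-linear scaling described above. A workload that is only half parallel tops out at 1.6x on four cores, no matter how well the parallel half is threaded.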

“The problem is parsing into independent tasks and then bringing it back together again,” said Sudhakar Jilla, director of marketing in Mentor’s place and route group. To no small extent, that means understanding the application and its interaction with the processor so well that it can be broken down into distinct processes.
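
The split-and-recombine pattern Jilla describes can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not Mentor's implementation: the workload (a sum of squares) and the helper names `split`, `sum_of_squares`, and `parallel_sum_of_squares` are hypothetical. A thread pool stands in for the cores; in CPython a process pool would be the usual choice for real CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Break a list into at most `parts` contiguous chunks of similar size."""
    size = max(1, -(-len(items) // parts))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def sum_of_squares(chunk):
    """Independent task: needs nothing from any other chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, parts=4):
    """Split the input, process the chunks concurrently, then merge."""
    chunks = split(list(items), parts)
    with ThreadPoolExecutor(max_workers=parts) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    # "Bringing it back together": combine the partial results.
    return sum(partials)


print(parallel_sum_of_squares(range(1, 11), parts=3))  # prints 385
```

The merge step is trivial here because addition does not care about order. In real applications, deciding where the data can be cut and how the partial results recombine is usually the hard part.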

The same will never be true for most personal productivity applications. While you might be able to split some functions off of an Excel spreadsheet or Microsoft Word to take advantage of two cores, the same process would have to be repeated at four cores, eight cores, and so on.

UC Berkeley’s Patterson said people have been trying to achieve automatic parallelization for years. “We see hundreds of cores on a chip seven years out. Today, there is very little software taking advantage of the cores. Cores are idle almost all the time, and there’s plenty of reason for pessimism.”

Back to the drawing board

One solution may be a new language or languages to run on multicore chips. That ultimately may prove to be the best choice, but many people remain skeptical.

Intel has taken a first stab at the problem with a language called Ct. Until now, Ct has worked largely on a shared memory system, but the company is considering whether to use a distributed computing environment approach so that an application can scale to every node on the system.

All of this will take time, of course. The first step is for libraries and frameworks to be parallel-enabled, which Intel believes will happen in the next one to two years. After that, it could take 5 to 10 years for the development language to become mainstream, which will require a great deal of work by Intel and its partners, along with the research under way at universities around the globe.

IBM and Microsoft also are working on their own versions of parallel programming. So far the companies have not released details of their efforts. But the goal in all cases is to “divide and conquer” by breaking down the pieces that can be run in parallel.

Add to that an inherent incompatibility between the future chip strategies of IBM and Intel. IBM has opted for heterogeneous cores in its future chips. Intel is focusing its efforts on homogeneous cores. It’s likely that the two worlds will merge with a mix of homogeneous and heterogeneous cores, but that raises some programming issues that are not yet resolved.

Security issues

There are other challenges in the multicore world that don’t exist in the single core chip. Security, in particular, is much more of a concern because of the flow of data between cores.

“With multicore, there are new challenges to utilize the individual cores,” says Andrew Sloss, ARM’s liaison to Microsoft. Sloss says the difficulty is controlling communication across cores and avoiding “excessive broadcasting.”

“We define security as hardware protection that makes it too expensive to break into the system,” Sloss says, adding that in all systems important data needs to be isolated.

Business issues

No matter how big this challenge looks, or how much pessimism accompanies it, most people involved believe the electronics industry has no choice but to solve it or radically change its focus.

While corporate IT will continue to buy servers, the vast majority of electronics these days are sold into the consumer world. Typically, what sells new products is dramatically lower power consumption together with equal or improved performance.

If no one can figure out how to scale programs on multicore chips, or the uptake is limited to the current scientific and highly mathematical applications, then the road map for future chips shifts. Moore’s Law is still feasible, but it may no longer be relevant.