More Cores, Different Approaches

Tradeoffs become far more interesting as advanced process nodes open up additional real estate on chips.


By Ed Sperling

The general consensus among software developers is that some applications will never be able to take advantage of multiple cores, but that certainly doesn’t mean system designers can’t figure out ways to use more cores.

Nor does it mean that all cores are created equal. The picture that is emerging from multiple chipmakers shows the following trends:

  1. Adding cores yields limited performance gains for many individual applications, but more cores are better at running multiple applications simultaneously;
  2. Hardware accelerators can be added to some cores to boost performance of applications that are difficult to write in parallel;
  3. Advanced process nodes provide more on-chip resources to reduce some of the bottlenecks that existed in early iterations of multicore chips; and
  4. Scaling of cores to specific applications or functions can save huge amounts of power and boost overall performance of a system.
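The first trend can be illustrated with Amdahl's law, a standard model the article itself does not name. The sketch below is only illustrative: the function name and the parallel fractions (50% and 90%) are assumptions chosen for the example, not figures from any chipmaker.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup of ONE application on `cores` cores under Amdahl's law.

    parallel_fraction is the share of the work that can run in parallel
    (an assumed, illustrative input, not a measured value).
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Compare a half-parallel application with a mostly parallel one.
    for fraction in (0.5, 0.9):
        for cores in (2, 4, 8, 16):
            speedup = amdahl_speedup(fraction, cores)
            print(f"parallel={fraction:.0%} cores={cores:2d} speedup={speedup:.2f}x")
```

Under this model an application that is only half parallel can never run more than twice as fast, no matter how many cores are added. That is why the extra cores tend to pay off by running separate applications side by side.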

One of the most telling signs of what’s changing is evident in IBM’s Power7 architecture, which includes embedded DRAM (eDRAM) on the same chip as the processor cores rather than on a separate chip or somewhere else on the printed circuit board. For the portion of IBM’s customer base that includes datacenters, this is a significant shift in processor design because it improves overall performance by dramatically reducing the distance between the core and the memory.

Even without adjusting clock speeds, there are still performance gains to be had by moving more functionality onto the chip and reducing the distance between components. The speeds of electricity and light are limiting factors in a processor, and the more that can be loaded onto a chip, the better the performance. Memory is merely the first step. IBM is also looking at moving I/O functions onto the main processor.
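A rough back-of-the-envelope calculation shows why distance matters. The sketch below computes how far a signal could travel in one clock cycle. It is an upper bound only: the clock rates are illustrative, the `velocity_factor` parameter is a hypothetical knob for slower-than-light signals, and real on-chip wires are considerably slower than this bound because of their resistance and capacitance.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the meter


def distance_per_cycle_cm(clock_hz: float, velocity_factor: float = 1.0) -> float:
    """Upper bound on the distance (in cm) a signal covers in one clock cycle.

    velocity_factor is the assumed fraction of the speed of light at which
    the signal travels; 1.0 gives the vacuum limit.
    """
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    if not 0.0 < velocity_factor <= 1.0:
        raise ValueError("velocity_factor must be in (0, 1]")
    meters = SPEED_OF_LIGHT_M_PER_S * velocity_factor / clock_hz
    return meters * 100.0


if __name__ == "__main__":
    for clock_hz in (1.0e9, 2.6e9):  # illustrative clock rates
        limit = distance_per_cycle_cm(clock_hz)
        print(f"{clock_hz / 1e9:.1f} GHz: at most {limit:.1f} cm per cycle")
```

Even at the vacuum limit, a signal covers only about 30 cm in one cycle at 1GHz. A round trip to memory elsewhere on the board can therefore cost whole cycles in travel time alone, before any access latency is counted.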

Percy Gilbert, vice president of silicon technology for IBM Semiconductor’s R&D center in Fishkill, N.Y., said that the addition of high k/metal gate technology at 32nm provided 2.8 times the performance and a 2x gain in performance in a dual-core CPU vs. a single-core chip built at 45nm. He also noted that high k/metal gate will allow mobile processors, including multicore mobile processors, to run at clock speeds greater than 1GHz.

“High k/metal gate is a game changer,” Gilbert said. “It can reduce gate leakage by more than 100 times and improve performance. We’ve seen a 70% improvement on PMOS (PFET) and a 47% improvement on NMOS (NFET). And by not putting in complex elements, you also increase overall reliability.”

Doing all of those things while also boosting clock speeds provides both energy savings and performance gains. Intel, for example, has been steadily raising clock speeds since it first introduced multicore designs. In the past couple of years, speeds of cores in multicore designs have risen from less than 1GHz to as high as 2.6GHz.

Sidestepping bottlenecks

Nevertheless, each process node brings new tradeoffs in design. When there was only one application using one core at a time, the bottlenecks were manageable—at least within the chip. Running multiple applications on multiple cores, using shared resources on the chip, adds a whole new level of complexity.

“The problem is that you have to increase bandwidth overall,” said Markus Levy, president of EEMBC, an independent benchmarking organization. “If you have two data intensive applications running at the same time, it can choke up the chip. You might have to time slice the application so the data intensive parts aren’t running at the same time. You may have one part that’s data intensive and another part that’s computationally intensive.”

He said the solution may look like load balancing on a chip or a series of distributed chips. Unfortunately, there is no standard for doing that kind of work and no public discussion at the moment about how it should even happen.
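The time-slicing idea Levy describes can be sketched as a toy scheduler. Everything here is hypothetical: the `stagger` function, the "data" and "compute" phase labels, and the rule that the second application always waits are assumptions made for illustration, not an existing standard or any vendor's method.

```python
from collections import deque


def stagger(app_a, app_b):
    """Pair up two applications' phases, one pair per time slot, so that
    two data-intensive phases never share a slot.

    Each application is a list of phase labels, "data" or "compute".
    Returns a list of (a_phase, b_phase) tuples; None means that
    application idles for the slot.
    """
    a, b = deque(app_a), deque(app_b)
    slots = []
    while a or b:
        next_a = a[0] if a else None
        next_b = b[0] if b else None
        if next_a == "data" and next_b == "data":
            # Both want memory bandwidth: run A now and hold B back a slot.
            slots.append((a.popleft(), None))
        else:
            slots.append((a.popleft() if a else None,
                          b.popleft() if b else None))
    return slots


if __name__ == "__main__":
    schedule = stagger(["data", "compute", "data"],
                       ["data", "data", "compute"])
    for slot, (phase_a, phase_b) in enumerate(schedule):
        print(f"slot {slot}: A={phase_a} B={phase_b}")
```

With these inputs the schedule stretches from three slots to five. That is the tradeoff in miniature: contention for bandwidth is avoided, but only by leaving a core idle part of the time.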

At least part of the issue is also that many attempts to solve this problem from a software standpoint involve legacy applications. In a relatively mature market for software, the number of new applications hitting the market and winning major market share is small.

It’s possible to rethink applications from the ground up, but that requires dedicated resources and a clear business case. In many cases, it means risking market share with uncertain rewards at a time when other alternatives such as dedicated cores and better resource sharing offer significant gains. As with energy-efficient cars, change didn’t come quickly until the price of gas more than tripled and there was a clear business case for making those changes.