Rethinking Big Iron

Are low-power processors suitable for high-performance computing? In some cases, yes.


By Ann Steffora Mutschler

One size does not fit all when it comes to the server market, and that may be the best opening for low-power processor makers to gain a toehold in a world that until now has focused almost exclusively on performance.

Even higher-performance versions of low-power processing architectures are starting to show up inside datacenters. Many are application-specific deployments with a focus on efficiency, but with electricity and cooling costs continuing to escalate, other options are beginning to cross the CIO’s desk at many corporations. And that’s only part of the picture of what’s changing.

“The first thing to observe is that there isn’t one server market,” said John Heinlein, vice president of marketing for the physical IP division at ARM. “There is a family of server markets. I liken it to an onion. At the core of the onion, you may have super high-performance databases, transaction processing—the needs are very specific. It’s a market that’s dominated by Intel and it probably will be for a while. But there is a wide range of markets, as you get outside of the onion, with different needs. And like the outer layers of the onion they tend to be bigger—end-point servers, datacenter file serving, vertical applications (Facebook, Twitter), all those kinds of vertical applications that are quite specific. They tend to run on a specific operating system. Those kinds of applications today ship in many, many more servers because they have so many end-point customers, so they are where the power matters. If you think about where to fix the power problem, you fix the power problem where the numerator is high.”

If the datacenter manager can increase density and throughput-per-unit-power or throughput-per-square-foot, or both, there can be measurable cost benefits. This shift is driving efforts at Applied Micro and Calxeda, HP’s Project Moonshot, and a collaboration between Marvell and Baidu, all with the goal of radically increasing density and reducing power in the server domain.
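The density and efficiency metrics above can be made concrete with a back-of-the-envelope comparison. All of the figures below are invented purely to illustrate the arithmetic; they do not represent any vendor’s actual hardware.

```python
# Hypothetical comparison of two rack configurations.
# All numbers are illustrative, not measured vendor data.

def rack_metrics(servers, watts_per_server, reqs_per_sec_per_server, sq_ft):
    """Return (throughput per watt, throughput per square foot)."""
    throughput = servers * reqs_per_sec_per_server
    power = servers * watts_per_server
    return throughput / power, throughput / sq_ft

# Conventional rack: fewer, hotter, faster servers.
conv = rack_metrics(servers=40, watts_per_server=400,
                    reqs_per_sec_per_server=10_000, sq_ft=10)

# Dense low-power rack: many small nodes, each slower but more efficient.
dense = rack_metrics(servers=400, watts_per_server=20,
                     reqs_per_sec_per_server=1_500, sq_ft=10)

print(f"conventional: {conv[0]:.1f} req/s per watt, {conv[1]:,.0f} req/s per sq ft")
print(f"dense:        {dense[0]:.1f} req/s per watt, {dense[1]:,.0f} req/s per sq ft")
```

Under these made-up numbers the dense rack delivers three times the throughput-per-watt in the same floor space, which is the kind of gap that gets a CIO’s attention.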

Chris Rowen, a Cadence fellow, agreed. “It’s absolutely clear, because of the nature of scalability, that it is so much a function of how much computing power you can pack into a given volume of space. That is one of the metrics of how effective a total machine you can build. How tightly you can pack it really affects how you’re going to get power into it, how you’re going to get heat out of it.”

Of course, this is highly dependent on the nature of the system-level architecture and on how well the application can be scaled, he explained, because all of these high-performance computing and high-end server applications rely heavily on multicore configurations. That assumes nearly linear performance benefits from additional cores. “If you have one of those problems that is inherently parallel at the core level, then you want to find the best MIPS-per-watt or FLOPS-per-watt with appropriate communication that you possibly can. And what we have thought of historically as low-end cores are extremely interesting as the basic building blocks for that kind of a system.”
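The “nearly linear” assumption can be checked with Amdahl’s law: if a fraction p of the work parallelizes, the speedup on n cores is 1 / ((1 − p) + p / n). A short sketch, with illustrative values of p:

```python
def amdahl_speedup(p, n):
    """Amdahl's law: speedup on n cores when fraction p of the work is parallel."""
    return 1.0 / ((1.0 - p) + p / n)

# Even a 1% serial fraction caps the benefit of piling on small cores.
for p in (0.99, 0.90, 0.50):
    print(f"p = {p:4.2f}:", [round(amdahl_speedup(p, n), 1) for n in (16, 64, 256)])
```

A workload that is 99% parallel gets a 72x speedup from 256 cores; at 90% parallel the same 256 cores yield under 10x, which is why the many-small-cores approach only pays off for problems that are parallel almost all the way down.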

He believes it is inevitable that for highly scalable problems, people will migrate to architectures that drive power significantly lower even than where we are today because it will be possible to do so.

Dipesh Patel, executive vice president and general manager for the physical IP division at ARM, said this approach is in stark contrast to server discussions in the past that were centered around super high performance, which translated to super high-power requirements in the datacenter. “That was where the industry came from because it was all about up and to the right, but the total number of servers in those days was not that many. Nowadays, with the cloud and these hosted applications, things have changed.”


Along with the benefits of applying low-power processors to high-performance computing come challenges. Drew Wingard, chief technology officer at Sonics, said there are two challenges in particular that need to be considered. “One is that there are some applications, which the computer scientists like to say are ‘embarrassingly parallel,’ and when you’ve got an application that is embarrassingly parallel you can continue to find new things to do with as many processors as you throw at the problem. In those domains all that matters really then is the amount of energy it takes you to do an operation, and a lower-power processor uses less energy to do the same operation than a higher-power processor. Now of course it can’t do it at the same frequency, it can’t get the same amount of work done per unit of time, but if your application parallelizes nicely so that you can use as many processors as you throw at it, then it becomes attractive. That’s what I would call the high-performance computing angle on it.”
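Wingard’s trade-off can be sketched numerically: a low-power core takes longer per operation but spends less energy on it, so for an embarrassingly parallel job an array of small cores can match aggregate throughput while drawing far less power. The per-core figures below are invented for illustration, not real silicon data.

```python
import math

# Illustrative, invented per-core figures (not real silicon data).
BIG_CORE   = {"ops_per_sec": 4e9, "watts": 20.0}  # fast, power-hungry
SMALL_CORE = {"ops_per_sec": 1e9, "watts": 1.0}   # slower, far more efficient

def energy_per_op(core):
    """Joules spent per operation."""
    return core["watts"] / core["ops_per_sec"]

def cores_to_match(target_ops_per_sec, core):
    """How many cores are needed to hit a target aggregate throughput."""
    return math.ceil(target_ops_per_sec / core["ops_per_sec"])

target = 64e9  # desired aggregate ops/sec for a fully parallel workload
big_n = cores_to_match(target, BIG_CORE)
small_n = cores_to_match(target, SMALL_CORE)
print(f"big cores:   {big_n:3d} cores, {big_n * BIG_CORE['watts']:.0f} W")
print(f"small cores: {small_n:3d} cores, {small_n * SMALL_CORE['watts']:.0f} W")
```

With these hypothetical numbers, 64 small cores hit the same aggregate throughput as 16 big ones at one-fifth the power, because each small core spends a fifth of the energy per operation. The catch, as Wingard notes, is that this only works when the workload really does soak up all those extra cores.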

Interestingly, he noted, the biggest competition here isn’t the higher-powered processors. It’s graphics processing units, because GPUs have lots of little processors in them. “You already have little processors that often are missing some of the capabilities of the general-purpose processors, but for some of these embarrassingly parallel applications, you don’t care because you’re doing a lot of math.”

Another part of the challenge is related to the things that go into Internet server rooms, and those are not high-performance computers, Wingard pointed out. “Those are things that are focused on different kinds of transaction processing, web serving, and database operations. Both Google and Facebook have been presenting lots of papers at conferences about what kinds of machines they want to buy. But those machines spend a lot of time waiting, and what they care about almost more than anything else is the total power bill for the server room. A big chunk of that thing doesn’t go into electronics, it goes into cooling.”

In addition, there are the initiatives at a wide variety of semiconductor companies, some noted above, looking at building ARM-based servers. Those normally are not focused on high-performance computing. Instead, they’re about trying to manage the thermal budgets inside the server rooms.

“It has a lot more to do with trying to get the right mix of capabilities at the right thermal budget,” said Wingard. “So low-power processors have a role to play in there. Are low-power processors great at high-performance transaction processing? No they are not. If your throughput is limited by your disk drives, like in a transaction processing environment, then you probably want higher-performance processors so you can get as much done, so you can keep that disk drive as busy as possible. Those things aren’t so embarrassingly parallel that an array of smaller processors is a better thing.”
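Wingard’s disk-bound case can also be sketched: when the drive is the bottleneck, the question is whether a core can prepare work fast enough to keep the drive saturated, which favors a faster core. The figures below are invented for illustration.

```python
# Illustrative, invented figures for a disk-bound transaction workload.
DISK_IOPS = 500.0            # I/O operations per second the drive sustains
CPU_MS_PER_TXN_FAST = 1.0    # CPU time per transaction on a fast core (ms)
CPU_MS_PER_TXN_SLOW = 4.0    # ...on a low-power core (ms)

def max_txn_rate(cpu_ms_per_txn, cores=1):
    """Transactions/sec the CPUs can feed the drive, capped by the disk."""
    cpu_rate = cores * 1000.0 / cpu_ms_per_txn
    return min(cpu_rate, DISK_IOPS)

print("fast core, 1 core:", max_txn_rate(CPU_MS_PER_TXN_FAST))  # disk-limited
print("slow core, 1 core:", max_txn_rate(CPU_MS_PER_TXN_SLOW))  # CPU-limited
```

One fast core saturates the hypothetical drive; one slow core leaves it half idle. In principle four slow cores could close the gap, but as Wingard says, transaction processing isn’t embarrassingly parallel, so simply adding small cores doesn’t recover the lost throughput.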

Looking ahead, Rowen expects to see extraordinary gains in efficiency for these large-scale systems, as well as evolution and specialization, because both commercial high-end cloud serving and high-performance computing share the ability to exploit many, many processors. “There’s enough independence of those threads that those will evolve separately because one of them—the commercial server—needs almost no floating point performance, for example. The HPC, in contrast, needs huge floating point performance, and so we will see diversity of these ultra-low power, ultra high density, ultra large scale kinds of applications over time.”

This represents a significant departure from the past. “It’s a very exciting area and there’s lots of activity there as people move down through these stacks,” he noted. “There are huge gains in throughput-per-watt available compared to where we are today, up to the limit where the processors will barely show up. Then it shifts the problem to: Where do I get the memory bandwidth at lower energy, or how do I do the core-to-core, chip-to-chip, board-to-board, rack-to-rack communication at significantly lower energy? Those are all going to be the next round of questions.”