Better performance is a relative term, depending upon the type of chips, what algorithms they’re using, and how much it costs to run them.
Data centers are beginning to adjust their definition of what makes one server better than another. Rather than comparing benchmarked performance of general-purpose servers, they are adding a new level of granularity based upon what kind of chips work best for certain operations or applications. Those decisions increasingly include everything from the level of redundancy in compute operations, such as search, to highly individualized and prioritized architectures that can be reprogrammed on the fly.
What’s driving this shift is the rising cost of powering and cooling hundreds or thousands of racks of servers, coupled with the need to improve performance of critical operations using some or all of the cores of a multi-core processor without sacrificing performance of other less-critical operations. And none of this can impact the overall power budget.
The first wave of efficiency improvements in data centers was based on virtualization, which provided linear improvements in efficiency due to increased utilization of servers. New approaches that are being developed take those efficiency improvements several notches further, adding a level of optimization that has never been seen before in the commercial big iron world.
The changes roughly parallel developments that have become almost standard for SoCs in high-end smartphones, where most of the chip is dark most of the time and hardware accelerators are used to improve performance of individual functions. Those accelerators—ASICs, DSPs, or some other custom chip—typically are co-developed in hardware and software to improve performance using less energy and fewer clock cycles. But maximizing those capabilities in conjunction with the resources of a vast and ever-changing data center is an ambitious new concept, and big chip companies ranging from Intel and AMD to newcomer ARM, as well as a number of big server companies, are hard at work on bringing this concept to market.
“Just throwing more MIPS at problems is not working,” said Kevin Krewell, principal analyst at Tirias Research. “We’ve already seen some of this in the high-performance computing space where FPGAs and GPUs are being added to get more compute power. But some algorithms don’t scale, and the efficiency of the core is not going up. We’ve thrown almost everything we can at these algorithms, and now we need to do things differently.”
The changes involve a number of different kinds of chips: a mix of FPGAs, GPUs, heterogeneous CPUs, and hardware accelerators co-designed with software, which AMD has termed Accelerated Processing Units. They also potentially involve new architectures for chips, including 2.5D and 3D-IC stacked die configurations where there is more room to add specialized processors into a stack, along with shorter signal paths and higher throughput by way of interposers and through-silicon vias.
“There’s a new level of experimentation to get more performance in the same amount of space,” said Krewell. “One approach is to extract more out of the data faster. Baidu, Microsoft, Google and Facebook are all doing this. Facebook is working on better pattern recognition so that when you look at images it will identify your friends faster. There are still a lot of applications that are standard, but if you look at the algorithms for computer vision, pattern recognition and other forms of artificial intelligence, those are all new.”
Data centers always have been somewhat heterogeneous. There is frequently a mix of hardware from different vendors ranging from commodity PC servers to mainframes and specialized hardware. But most of that hardware has been used interchangeably, depending upon what’s available at any point, in some cases with priority given to a particular operation. The emerging strategy is to add hardware for specific applications and purposes, rather than just shuffling processing jobs from one server to the next, or stacking them up using virtual machines. And the prioritization of individual operations will be much easier to change, depending upon the needs of the organization.
What’s driving these changes is money. Until power costs became an issue, the priority of many data centers was consistent power and almost uninterrupted uptime. That explains why a large number of them are still located in Arizona, where there is almost no seismic activity and an abundance of nuclear-generated power. Over the past decade the focus has shifted to cutting cooling costs with ambient air flow rather than with powered chillers.
A second way to cut energy costs, and one that is now beginning to gain attention, is to locate data centers closer to hydroelectric generators, where the loss from AC transmission over long distances can be sharply reduced and the overall cost is lower. The bottom line is it’s cheaper to move data than electricity. And it’s cheaper to operate data centers using new chip architectures and configurations, different server rack configurations, and more flexibility in which banks of servers are utilized, when they’re utilized, and for how long.
“The new metric is computation per watt,” said Wally Rhines, chairman and CEO of Mentor Graphics. “The aggregate cost of building a data center is equal to about 2.5 years of power for that data center. These big data centers can save hundreds of millions of dollars in power costs and air conditioning. And the cost of computing has decreased over the years, so power as a percentage of the IT budget is increasing. That’s why a lot of people are developing new server architectures these days. These are special-purpose computers for special-purpose functions.”
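The arithmetic behind that metric can be sketched in a few lines of Python. This is a minimal illustration, not a model of any real facility: the function names are hypothetical, every dollar and throughput figure in the usage note is invented, and the only number taken from the quote is the ratio of roughly 2.5 years of power cost to build cost.

```python
def ops_per_watt(ops_per_second: float, watts: float) -> float:
    """Computation per watt: sustained operations per second per watt drawn."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    return ops_per_second / watts


def build_cost_from_power(annual_power_cost: float,
                          years_equivalent: float = 2.5) -> float:
    """Rough build cost, expressed as a number of years of power spend.

    The 2.5-year default comes from the quote above; treat it as a rule
    of thumb, not a measured constant.
    """
    return annual_power_cost * years_equivalent


def annual_power_savings(annual_power_cost: float,
                         efficiency_gain: float) -> float:
    """Yearly power cost saved when the same work is done at a higher
    computation-per-watt figure.

    efficiency_gain is fractional: 0.25 means 25% more operations per watt.
    Assumes power cost scales linearly with energy consumed.
    """
    if efficiency_gain <= -1:
        raise ValueError("efficiency_gain must be greater than -1")
    return annual_power_cost * (1 - 1 / (1 + efficiency_gain))
```

With an assumed power bill of $40 million a year, `build_cost_from_power(40e6)` gives $100 million, and a 25% gain in computation per watt would save about $8 million a year under the linear-cost assumption. That is the sense in which modest efficiency gains add up at data center scale.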
One of the challenges here is to understand where there are massively redundant operations and where those processing resource requests are unique. It’s the highly redundant ones that can be optimized. That makes customized chip architectures an obvious choice for big search and social media companies, and most are designing their own chips to optimize search rather than floating point, or visualization rather than search.
For everything else, there are incremental benefits, but as with everything in a data center humming with row after row of server racks, it all adds up. “A good way to look at this is standardized customization,” said Anand Iyer, director of marketing for the low power platform at Calypto. “They’re not completely getting away from what is being built, but they are offering customization. This whole industry started as a vertical industry with processors. Then everything became standard. Now we’re going back to verticals.”
Re-verticalization allows for other kinds of optimization, as well. “Several companies have their own C compiler teams,” said Steve Roddy, senior group director for the Tensilica business unit at Cadence. “If they can get 5% to 8% better code performance, that boosts the overall efficiency. They’re working off of search time or energy efficiency per square foot, so building their own compilers helps.”
Specialization is critical to improving those performance metrics, and in many cases that includes the server architecture, and often multi-server architectures. How fast does the network processor need to be? What is the throughput between different chips in a server or between different servers? And what kind of memory should be placed where? In many respects, this is place and route on a grand scale.
“It’s a per-unit calculation,” said Roddy. “It’s not whether the CPU is running at 3GHz. It’s whether you can run it at lower power and have greater efficiency per gigawatt. That requires load balancing, pairing, and specialty compute engines where it makes sense. The software investment in all of this can be massive, depending upon the class of company. This is where we’re seeing software-defined networking being used.” (For background on software-defined networking, see https://en.wikipedia.org/wiki/Software-defined_networking.)
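One way to picture the per-unit calculation Roddy describes is to rank candidate compute engines by work done per joule rather than by clock speed. The sketch below is purely illustrative: the `Engine` class, the `most_efficient` helper, and all of the throughput, clock and power numbers are made-up assumptions, not measurements of any real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Engine:
    """A hypothetical compute engine for one class of workload."""
    name: str
    clock_ghz: float
    requests_per_second: float
    watts: float

    @property
    def requests_per_joule(self) -> float:
        # (requests / second) / (joules / second) = requests / joule
        return self.requests_per_second / self.watts


def most_efficient(engines, min_requests_per_second: float = 0.0) -> Engine:
    """Pick the engine with the best requests per joule among those that
    still meet a throughput floor."""
    eligible = [e for e in engines
                if e.requests_per_second >= min_requests_per_second]
    if not eligible:
        raise ValueError("no engine meets the throughput floor")
    return max(eligible, key=lambda e: e.requests_per_joule)


# Invented figures for illustration only.
CANDIDATES = [
    Engine("general-purpose CPU", clock_ghz=3.0,
           requests_per_second=12_000, watts=150),
    Engine("low-power CPU", clock_ghz=1.8,
           requests_per_second=9_000, watts=60),
    Engine("accelerator card", clock_ghz=0.9,
           requests_per_second=30_000, watts=120),
]
```

With these invented numbers, `most_efficient(CANDIDATES)` selects the accelerator card even though it has the lowest clock, while ranking by `clock_ghz` alone would pick the general-purpose CPU. A real evaluation would also have to account for load balancing, networking, and software cost, which this sketch leaves out.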
In a data center, how data is moved within a chip, between chips, across the data center and out to the rest of the world can affect performance as much as the individual compute processors, which is why there is an increasing blurring of the lines between networking and compute hardware in data center design.
“It’s not just about the processor, it’s about the connectivity fabric and the networking fabric,” said Arvind Shanmugvel, director of applications engineering at Ansys. “This is why we’re starting to hear more about optical for high-bandwidth Ethernet, where the optical signal is transduced and transmitted over Ethernet. It’s still a nascent market, but people are working on it.”
These kinds of metrics also help explain why Intel is buying Altera [see reference 1] and why Huawei has licensed the ARM core to build its own chips in-house.
“There has been a lot of experimentation in Microsoft Labs with FPGAs, and with other algorithms on GPUs,” said Tirias’ Krewell. “ARM is designing chips with varying performance levels, too. Each is trying to define its own place in the data center.”
ARM certainly has generated some ecosystem momentum with its microserver architecture. AMD now offers x86 and ARM-based chips with its Opteron X and A series processors. Hewlett-Packard likewise offers ARM-based Moonshot servers as well as powerful ProLiant x86 servers. And other companies, including Lenovo and Dell, have thrown their hats into the ARM ring as well, with announcements about ARM-based server development.
But it’s also easy to lose sight of the fact that there are other very significant players in this market, such as IBM’s POWER architecture, which is very prominently used in its Watson supercomputer and in its Power Systems servers, and Oracle’s Sparc-based all-in-one systems.
As with an SoC, this is no longer just a simple place and route of known components. It’s a tradeoff of where racks are positioned and how they’re constructed, the air flow within and between racks, the overall power consumption, estimated usage within performance ranges, as well as how it will run a stack of software. There are no metrics for that other than the bottom line cost of power and customer satisfaction (internal and external) on response time.
“The amount of simulation required and the thermal constraints on all of this are what keep us awake at night,” said Ansys’ Shanmugvel. “We’re always pushing boundaries.”
But when it comes to the modern data center, even the boundaries are increasingly movable and measurable—and infinitely more complex.
Reference 1: One of the discussion points across the semiconductor industry is whether Intel’s acquisition of Altera was due largely to a concern that Altera would move its business out of Intel Custom Foundry, thereby impacting Intel’s ability to offset some of the costs. However, numerous sources indicate that Altera’s current volume is likely in the single or low double digits as a percentage of the overall foundry’s chip output. While still significant, it’s not enough to seriously impact the economics of Intel’s foundry business. More important is the possibility that combining Altera’s FPGAs with the Intel Architecture could drive future growth in existing and new markets.