Improving Energy And Power Efficiency In The Data Center

Optimizing energy and power efficiency in server chips and software is a multifaceted challenge with many moving parts.


Energy costs in data centers are soaring as the amount of data being generated explodes. The problem is compounded by increasingly dense processing elements, which produce more heat, and by uneven server utilization, which requires more machines to be powered up and cooled.

The challenge is to maximize utilization without sacrificing performance, and in the past that has been achieved largely by adding capacity. But with an estimated 3% of the energy production in the United States consumed by data centers, and nearly 30% of that number wasted due to low utilization, even the most efficient data centers are looking at where they can reduce margin. So as the world’s insatiable appetite for data increases, and as more data centers are built out on the edge as well as the cloud, there is a concurrent effort to push beyond previous acceptable numbers with better monitoring and control and more efficient architectures.

A significant portion of this falls on the shoulders of the semiconductor ecosystem, which plays an increasingly important role in improving energy and power efficiency.

Fig. 1: Where energy is being wasted in data centers. Source: Cadence

“Most people don’t think of cooling,” said Arif Khan, product marketing group director for PCIe, CXL, and Interface IP at Cadence. “They just think the data centers are out there running. To the average person, it’s something amorphous in the cloud. They say, ‘Hey, Siri,’ but nobody thinks what’s happening at the edge or in the cloud where all of the processing happens. It’s like fast fashion. We get used to making numerous redundant queries, but everything costs something in terms of energy.”

A number of different approaches are emerging to improve energy efficiency. Google, for example, has been using machine learning since 2014 to optimize cooling in its data centers in order to maximize power usage effectiveness (PUE). The company also uses smart temperature, lighting, and cooling controls to further reduce the energy used at its data centers. Google reports these efforts have yielded promising results. On average, a Google data center is twice as energy efficient as a typical enterprise data center, and compared with six years ago, Google said it now delivers around seven times as much compute power for the same amount of electrical energy.
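For reference, PUE is simply total facility energy divided by the energy delivered to IT equipment. A minimal sketch of the calculation, using illustrative numbers rather than figures from any specific data center:

```python
# Power usage effectiveness (PUE): total facility energy divided by the
# energy delivered to IT equipment. A PUE of 1.0 would mean zero overhead
# for cooling, power distribution, and lighting. Numbers below are
# illustrative, not measured values from any particular data center.

def pue(total_facility_kwh: float, it_equipment_kwh: float) -> float:
    return total_facility_kwh / it_equipment_kwh

typical_enterprise = pue(total_facility_kwh=1_800_000, it_equipment_kwh=1_000_000)
efficient_hyperscale = pue(total_facility_kwh=1_100_000, it_equipment_kwh=1_000_000)

print(f"Enterprise PUE:  {typical_enterprise:.2f}")   # 1.80
print(f"Hyperscale PUE:  {efficient_hyperscale:.2f}") # 1.10
```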

“They’ve been eating their own dog food,” Khan observed. “They throw machine learning at the problem to figure this out, because done in a formal way, it becomes an intractable problem. Can they do things to make that aspect better? How do they consume less power and make this a little greener? Google has a list of 19 factors going into their neural network, and they try to optimize parameters for that. If you’re trying to build a formal model for this, it would be insane. It’s difficult to say what the algorithm will do, and to precisely manage what comes out of these machine learning models. But if it gives you a better outcome, then we can learn to live with AI/ML models.”
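Google has not published the model in a reproducible form, and the 19 inputs are described only in general terms. As a rough illustration of the approach, the sketch below fits a surrogate model to synthetic facility telemetry and then sweeps one cooling setpoint for the lowest predicted PUE; all variable names, inputs, and coefficients are hypothetical.

```python
import numpy as np

# Hypothetical sketch of ML-driven cooling optimization: fit a model that
# predicts PUE from facility telemetry, then search a controllable setpoint
# for the lowest predicted PUE. Google's production system reportedly uses a
# neural network with ~19 inputs; here a linear model on synthetic data
# stands in for it.

rng = np.random.default_rng(0)

# Synthetic history: [IT load (MW), outside air temp (C), chiller setpoint (C)]
X = rng.uniform([2.0, 5.0, 16.0], [8.0, 35.0, 24.0], size=(500, 3))
true_pue = 1.1 + 0.02 * X[:, 1] - 0.015 * X[:, 2] + 0.01 * X[:, 0]
y = true_pue + rng.normal(0, 0.01, size=500)

# Fit a linear surrogate model (least squares with a bias term).
A = np.hstack([X, np.ones((len(X), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

def predicted_pue(it_load_mw, outside_temp_c, setpoint_c):
    return np.array([it_load_mw, outside_temp_c, setpoint_c, 1.0]) @ coef

# For current conditions, sweep the one knob we control: the chiller setpoint.
candidates = np.linspace(16.0, 24.0, 17)
scores = [predicted_pue(5.0, 28.0, s) for s in candidates]
best = candidates[int(np.argmin(scores))]
print(f"Recommended setpoint: {best:.1f} C, predicted PUE {min(scores):.3f}")
```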

That’s just one of many energy-related improvements. “On one side, more and more application-specific processors are needed,” said Andy Heinig, department head for efficient electronics at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “There are two ways to implement them. For larger volumes, it is possible to design and manufacture specific SoCs. For smaller volumes, the chiplet approach is a good way to get such systems. In that case, different pre-designed and manufactured components can be combined into a new system, and only a small part, the new accelerator, must be developed from scratch. With this approach, less design effort is necessary than designing and manufacturing a whole SoC. On the other side, the memory interface should be optimized, because currently a lot of energy is wasted on the SoC memory interface. Different new package concepts are available on the research side for better implementation of the chip memory interface.”

“From the semiconductor point of view, the focus is on squeezing as much as possible from every possible place that you can, even if it’s just 3%,” Khan said. “It’s a challenge on every front. Design teams want to figure out the last bit in terms of microwatts that can be extracted out of their design. PPA is squeezed for both active power and standby power. Why are you putting it in standby? Why can’t you actually turn the power supply off? Those are the kind of tradeoffs that have to be made in all of the different power modes.”

Tony Chen, product marketing director for PCIe and CXL IP at Cadence, said more users are looking into the standby power state as well as the deep sleep power state, which wasn’t the case 10 years ago. “Now, in every single conversation we have, at least with customers in the high-performance computing space and data center space, they are asking how fast they can go into the standby state and come out of it. When something goes idle, they want to go into standby immediately. But when the workload presents itself, they want to come out and recover quickly, because they still have to keep that response time in check.”

As with a mobile phone, energy consumption is increasingly important. “If it takes forever to wake up, then we get tired of the phone,” Khan said. “You want to save battery, but you also want to respond quickly from the black screen to coming back up active. There’s a tradeoff between the sleep state and the resumption time. If you have to resume, that means all the data must be retained somewhere. It can go up into some faraway memory that is in backup and then come back close to the CPU, where it can really light up the screen. If you have to keep it close somewhere, you can’t really turn it off. It has to be in some retention state somewhere, which, again, is eating away at your batteries. The same thing goes for computer workloads.”
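The tradeoff between sleep-state power and resumption cost can be framed as a break-even calculation: entering a low-power state only pays off if the energy saved during the idle interval exceeds the energy spent saving and restoring state. A small sketch with made-up numbers:

```python
# Break-even analysis for entering a low-power state. Entering standby only
# saves energy if the idle interval is long enough to amortize the cost of
# the entry/exit transitions. All figures below are illustrative.

def min_idle_time_s(active_power_w, sleep_power_w, transition_energy_j):
    """Shortest idle interval for which sleeping saves energy."""
    return transition_energy_j / (active_power_w - sleep_power_w)

threshold = min_idle_time_s(active_power_w=12.0,      # link/block fully on
                            sleep_power_w=0.5,        # deep standby
                            transition_energy_j=2.3)  # save/restore state
print(f"Sleep pays off for idle periods longer than {threshold*1000:.0f} ms")
```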

This helps explain the push to develop new computer architectures. “The large databases that use persistent memory exist because of that. Database workloads are being moved to persistent memory so that power can go off. They can keep the working sets in persistent memory, and they can switch context easily, and so on,” he said.

One result is increasing pressure on the x86 architecture and more widespread adoption of the Arm architecture. “The x86 was always more powerful, but more power hungry,” said Marc Swinnen, director of product marketing at Ansys. “Arm was power-lean, but not quite as strong as the x86 performance. But Arm has been coming up in performance, and now they fight at the boundary. Today’s power concerns have tilted the playing field a little bit in their favor, and Arm deployments are being seen in the data center space, including the AWS Graviton, for example, along with Fujitsu’s Arm-based supercomputer, Fugaku. There, Arm leverages one of its strengths, namely power efficiency.”

But more compute elements and more granular compute architectures also add to the complexity of designs, as well as increasing the amount of work required to develop new server chips.

“It’s not just multiple physics,” said Rich Goldman, director at Ansys. “It is multiphysics interacting with each other. You need to run your simulations considering all these different physics together.”

With specific models, the power output of a chip can be modeled. “But that depends on the temperature, and the temperature depends on the power output, so it’s a bit of a chicken-and-egg situation,” Swinnen said. “If you want to simulate that in the board and system, there must be a model for the chip ‘at this temperature, with this power,’ along with all the other components radiating heat on their own. The CFD fan cools it, you simulate the entire thing, and you come to a stable convergence point where the chip is at its operating point for that power output. Again, it all has to be simulated together. It’s not as if the chip’s power output is independent of temperature, which is the backwards way most people look at it. The temperature depends on the chip’s power output, but it’s also the other way around.”
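The coupling Swinnen describes is typically resolved by iterating the electrical and thermal solutions until they agree. A toy fixed-point illustration, with invented coefficients standing in for real power and CFD/thermal models:

```python
# Toy illustration of power/temperature co-simulation converging to a fixed
# point. Leakage power rises with temperature, and temperature rises with
# total power; iterate the pair until neither changes. Coefficients are
# invented and stand in for full electrical and CFD/thermal models.

AMBIENT_C = 35.0
THERMAL_RESISTANCE_C_PER_W = 0.4   # chip + heatsink + airflow, lumped
DYNAMIC_POWER_W = 80.0             # roughly temperature-independent
LEAKAGE_AT_25C_W = 10.0
LEAKAGE_GROWTH_PER_C = 0.02        # ~2% more leakage per degree

def leakage_w(temp_c):
    return LEAKAGE_AT_25C_W * (1.0 + LEAKAGE_GROWTH_PER_C * (temp_c - 25.0))

temp_c = AMBIENT_C
for iteration in range(100):
    power_w = DYNAMIC_POWER_W + leakage_w(temp_c)
    new_temp_c = AMBIENT_C + THERMAL_RESISTANCE_C_PER_W * power_w
    if abs(new_temp_c - temp_c) < 1e-3:
        break
    temp_c = new_temp_c

print(f"Converged after {iteration} iterations: "
      f"{power_w:.1f} W at {temp_c:.1f} C")
```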

There is definitely a role for CFD in data centers to simulate the movement of air, Goldman said. “That’s really key. If we can simulate that before we build data centers, and build an optimal data center for air movement, that will go a long way toward cooling those things.”

That makes improved utilization both possible and economically viable. “A traditional CPU in the good old days was doing one job at a time,” said Cadence’s Khan. “Those days have been gone for 10 or 15 years now. There was multi-threaded programming, but there was still a dedicated task, a dedicated owner of one machine. We then moved to hypervisors that allow different programs to run on the same machine. And if you look at what happens in the cloud, even with companies that have their websites running on Amazon, for example, a small business doesn’t actually get the entire server. What you’re getting is a little slice of a server, a shared server. You’re running in context on some machine. Many different web servers are actually running on the same server as virtualized servers.”

This is one way to increase utilization, he noted, by allowing different jobs to run on the same system. “Docker and other such microservices take that to a further level, where you’ve got microservices that are run and jobs that are written far more efficiently to communicate across some orchestration software like Kubernetes, or the like. Also, hypervisor technology has improved quite a bit, which allows you to extract as much of the CPU that’s available on a server. All the capacity, once it’s on, it’s on. So you’ve turned on the CPU, the power supplies are on, compute capacity is available, and you can run additional jobs. The marginal cost of running an additional job is low, so you can migrate more processes onto it. That’s really the essence of getting the incremental utilization out of it.”
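To make the utilization argument concrete, the sketch below shows a toy consolidation heuristic: because a powered-on server's marginal cost for one more job is small, packing jobs onto as few servers as possible lets the remainder be powered down. This is a simplified first-fit-decreasing example, not how any particular hypervisor or orchestrator actually schedules.

```python
# Toy consolidation example: pack jobs (by CPU demand) onto as few servers
# as possible using first-fit decreasing, so idle servers can be powered
# down. Real orchestrators such as Kubernetes weigh memory, affinity,
# latency, and failure domains as well; this only illustrates the idea.

SERVER_CAPACITY = 1.0  # normalized CPU capacity per server

def consolidate(job_demands):
    servers = []  # each entry is the remaining capacity on that server
    for demand in sorted(job_demands, reverse=True):
        for i, free in enumerate(servers):
            if demand <= free:
                servers[i] = free - demand
                break
        else:
            servers.append(SERVER_CAPACITY - demand)
    return servers

jobs = [0.6, 0.5, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1]
servers = consolidate(jobs)
used = sum(SERVER_CAPACITY - free for free in servers)
print(f"{len(jobs)} jobs fit on {len(servers)} servers, "
      f"average utilization {used / len(servers):.0%}")
```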

Energy in the data center
Qazi Ahmed, principal product manager at Siemens EDA, observed that while much of the focus has been on power, there is a growing focus on energy and energy management. “There has been a lot of chat about energy for some time, primarily derived from the fact that the overall goal is to achieve energy-efficient designs. The lifecycle energy requirements of, let’s say, a server or maybe a mobile phone, have to be low. What’s happening is that people typically tend to focus on power at the IP level, and they might try to reduce and optimize power by a certain percentage. Let’s say they save 20% dynamic power. How much does that actually contribute to energy efficiency at the system level? Sometimes it might not do much. You might see overall energy efficiency gains of even less than 1%.”

Because energy is power multiplied by time, time is an important factor. It tells something about the way a block functions. “In the overall scheme of things, at an SoC level, some of the blocks might not have that high toggle activity, but they might be active for quite some time,” Ahmed said. “Other blocks may have bursts of information coming in, or may have high toggle activity, and then they just remain silent. If we look at the amount of work done, a block that does not toggle so much but dominates the functionality most of the time might actually end up consuming more energy. And that might be a good place to start for optimization.”
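Ahmed's point follows from energy being power integrated over time: a block with modest power but a long duty cycle can dominate the energy budget. A small comparison with illustrative numbers:

```python
# Energy is power multiplied by active time, so a low-power block that is
# almost always on can out-consume a high-power block that only bursts.
# Numbers are illustrative.

FRAME_MS = 100.0  # one workload period

blocks = {
    # name: (average power while active in mW, active time per frame in ms)
    "video_codec":   (250.0, 8.0),    # bursty, high toggle activity
    "always_on_dsp": (30.0,  95.0),   # low power, active nearly all the time
}

for name, (power_mw, active_ms) in blocks.items():
    energy_uj = power_mw * active_ms  # mW * ms = microjoules
    print(f"{name:14s} {power_mw:6.1f} mW for {active_ms:5.1f} ms "
          f"-> {energy_uj:7.1f} uJ per frame")
```

With these numbers the always-on DSP consumes more energy per frame than the codec, despite drawing roughly an eighth of the power.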

Scott Durrant, DesignWare IP solutions marketing manager at Synopsys, likewise pointed to energy efficiency as a rising concern. “There’s a big drive right now toward a net zero carbon footprint for data centers. This is a huge challenge because data centers are large consumers of power today, and every element in the data center is going to come into play around that. SoC designers are putting together products for deployment in data centers, especially server products, because you multiply those by tens of thousands in a typical hyperscale data center. The same goes for the switching infrastructure. There’s at least one switch in every rack, and there’s a lot of silicon in that switch. So the network infrastructure, the compute infrastructure, and the storage infrastructure, which is growing hugely, all come into play. In all of these devices, we are going to have to increase the energy efficiency of each of them.”

This is just one of the drivers for alternative architectures. The data center is shifting toward what has traditionally been a mobile device architecture in terms of optimizing for power.

“Mobile devices for years have been trying to maximize battery life, and doing that by minimizing power consumption of devices — being able to shut down certain pieces of the device that aren’t in use at a given point in time,” Durrant said. “And we’re seeing similar implementations in the data center today in order to maximize power efficiency. Also, new processor architectures that historically have been used in mobile devices, like Arm processors, are now targeting the data center infrastructure. Arm and RISC-V are sufficiently open that you can optimize them for a particular workload.”

Cadence’s Chen agreed. “Engineering teams are measuring energy efficiency right now. Ten years ago you could get away with 10 picojoules per bit. Today when the customer comes to us, they give us a hard requirement that the high-speed I/O should only take about 5 picojoules per bit. That will be their requirement. So compared to 10 years ago, that’s half. We have to be more efficient in terms of energy usage per bit.”
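An energy-per-bit budget translates directly into link power: energy per bit multiplied by line rate gives watts. A quick back-of-the-envelope sketch with representative, not product-specific, numbers:

```python
# Energy-per-bit budgets translate directly into SerDes power:
# power (W) = energy per bit (J) * line rate (bits/s). Figures are
# representative, not a specific product's specification.

def lane_power_w(pj_per_bit: float, gbps: float) -> float:
    return pj_per_bit * 1e-12 * gbps * 1e9

for pj in (10.0, 5.0):
    # e.g. a 32 Gb/s lane, 16 lanes per link
    per_lane = lane_power_w(pj, 32.0)
    print(f"{pj:4.1f} pJ/bit -> {per_lane:.2f} W per lane, "
          f"{16 * per_lane:.1f} W for a x16 link")
```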

There are various aspects to this. “Different kinds of architectural decisions are taken at the system level and the IC level,” said Ahmed. “Typically at the IC level, users might have one, two, or maybe four different kinds of scenarios they run. From those they compute average power, and they’ll see an idle power case and a peak power case. They will then try to see, for instance, in the idle power case whether there is leakage power even though work isn’t being done. The peak power case, meaning maximum average power, is where I’m doing the most work and consuming a lot of power. In between, in different normal functional modes, I might be spending some power. Does that scale linearly with how much work I am doing versus how much power I am consuming? That often turns out to be untrue, because more power is consumed while not much work is being done, and that happens potentially due to wasted toggles in the design. To eliminate that, there are many strategies for optimizing power for both registers and memories. There may be data that is written into a memory quite often, but it’s just a portion of the configuration. Maybe you can just write that to a flop and keep the memory working only when it’s needed — when you have large sets of data coming in. Or you might want to do fine-grain clock gating or some micro-architectural changes to reduce the number of toggles and achieve some power reduction.”

Still, this doesn’t always translate into energy savings at the system level.

“When you look at the system level, it is entirely possible that some of the blocks you were focusing on don’t contribute much in a real use-case scenario,” Ahmed said. “Let’s say you have an emulator that runs out a really long trace of a mobile phone, like when someone is playing a game. In that case some of the blocks might be active only some of the time, but a lot of blocks might be active most of the time. Considering that kind of information, you want to architect the power in a way where you can optimize the workload to maximize the energy efficiency. To do that, there are standard techniques at the system level for hardware, including assigning voltage islands, so some of the blocks that are not performance-oriented might actually work at a lower voltage, and that saves a lot of power. The SoC can be divided into multiple power domains, and some of the blocks that are idle and not doing work can actually be power gated to save the leakage power as well. For interconnects and memories, you can gate wasted toggles and put the memories into light sleep. There are many methods to deal with that at the hardware level.”
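The voltage-island savings mentioned above come from the roughly quadratic dependence of dynamic power on supply voltage (P ≈ α·C·V²·f). A back-of-the-envelope comparison with illustrative values:

```python
# Dynamic power scales roughly with alpha * C * V^2 * f, so moving a block
# that does not need peak performance onto a lower-voltage (and usually
# lower-frequency) island cuts its power disproportionately. Values are
# illustrative.

def dynamic_power_w(activity, cap_farads, v_volts, freq_hz):
    return activity * cap_farads * v_volts**2 * freq_hz

high_perf = dynamic_power_w(activity=0.15, cap_farads=2e-9,
                            v_volts=0.9, freq_hz=2.0e9)
low_volt  = dynamic_power_w(activity=0.15, cap_farads=2e-9,
                            v_volts=0.7, freq_hz=1.2e9)

print(f"0.9 V / 2.0 GHz island: {high_perf:.3f} W")
print(f"0.7 V / 1.2 GHz island: {low_volt:.3f} W "
      f"({(1 - low_volt / high_perf):.0%} lower)")
```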

If energy is taken as the primary metric, design teams should set a target for energy efficiency (e.g., 10%) in the SoC. “This could happen on the software side, and some of it could happen on the hardware side,” Ahmed said. “You need to have all the kinds of workloads in which you expect the device to actually operate. The better you can make that happen, the more likely you are to understand which blocks in hardware are good for optimization, where you need to focus, and how much power you need to reduce in order to achieve that kind of energy efficiency. People look at power reports today. In the near future, people will be looking at power/energy reports. That will become really important.”

For power, engineering teams look at reports at a summary level for the design, or at a hierarchical level, broken into leakage power, switching power, internal power, and total power. The same breakdown is needed for energy, such as switching energy, total energy, or total memory energy. That energy number will be just as important as the power number.

The challenge is that any single decision about energy only makes sense when viewed at a higher level, such as the SoC level. This means 50 or 100 workloads may need to be run at the same time to make better decisions.

On top of that, analysis is needed, Ahmed said. “The behavior of the SoC must be captured or, at the IP level, how energy actually changes when a particular microarchitectural choice is made, or when a different algorithm is chosen. These are the kinds of analyses people should do when the goal is energy efficiency. It’s also important that downstream tools are aware of energy and power optimization. Current implementation tools are basically made for performance, so whether you do synthesis or place-and-route, performance is the primary criterion. If you want to have energy-optimized designs, then the downstream tools have to start taking power or energy as the primary criterion. Any optimization, or any downstream technique to choose the cell, for example, has to be driven from the point of view of energy, not just performance, depending upon how it is set up. For example, glitch power is a major contributor to dynamic power in a lot of designs. One way to avoid glitch power is to reduce the number of toggles, or re-architect the design a bit to reduce those toggles. At the same time, at the physical level, high-speed cells can be used to keep glitch power under control. In that case you’d choose a high-speed cell not just because you want performance, but because you want to reduce glitch. This is one small example of how downstream tools need to become aware of energy/power optimization.”

Other applications, such as AI training, are pushing the boundaries with performance tradeoffs and power. “We’re still in the early stages of the AI training movement,” said Frank Ferro, senior director of product management at Rambus. “The model sizes are growing exponentially, so the processing just can’t keep up with the size of those models and the training times are taking longer and longer. We’re still scratching the surface on that problem.”

From an application level, as some of these AI algorithms mature, the networks will get pushed out to the edge, resulting in trained networks there. “You push that network out to the edge, and now you’re starting to see a little bit of uptick in AI inference,” Ferro said. “Where you’ve got the models trained and they’re running a specific application, those models are where you now have to deal with lower power and lower cost. You want to save money in the data center infrastructure, where cost is always a concern, but relatively speaking you have more flexibility on cost there. Power is always a big concern, but at the edge of the network you’ve got to process faster, cheaper, and at lower power.”

Conclusion
Optimizing energy and power efficiency in the data center is a multifaceted challenge with many moving parts. New physical techniques for cooling, such as liquid immersion, are beginning to surface more frequently, along with advanced AI/ML algorithms to manage the data center. For hyperscale providers building their own data centers – including everything from chips to servers – it’s likely that every energy and power measurement, analysis, and optimization technique will be utilized. That, in turn, will help the rest of the ecosystem understand what’s needed to meet current and future energy, power, and sustainability goals.



