Unpredictable ambient temperatures are impacting data center architectures.
Data centers are hot, and they may get even hotter. As climate change impacts temperatures around the world, designers are changing the computing hubs that are tied to nearly every aspect of modern life to make them more efficient, more customized, and potentially more disaggregated.
These shifts are taking on new urgency as the tech industry grapples with months of sweltering temperatures on multiple continents. In July, record-breaking hot weather in London led to outages at a Google data center and another one used by Oracle. That same month, European and Japanese heatwaves triggered consumer warnings from Valve and Nintendo to prevent overheating in gaming devices. And earlier this month, authorities shut down factories in the Chinese chip manufacturing hub of Sichuan to ease heat-related strain on the power grid.
Experts contend this is just a preview of things to come. Since 1981, the global average surface temperature of the earth has increased at a rate of 0.32°F (0.18°C) per decade. That small change on a global scale can cause extreme weather events and myriad other problems. Scientists say the warming trend will stabilize if — and only if — countries around the world follow strict emissions targets set forth by the United Nations. However, a recent report from a U.N. panel shows those goals are still a long way from being achieved, and many regions can expect to see more heat waves in the future.
Concerns about the temperature inside data centers may seem trivial in comparison to other climate change consequences — think heat-related illnesses, crop failures, and mass migrations — but in the age of cloud computing everything from health care to warfare requires data centers to process and store huge amounts of data. Dramatic swings in ambient temperature can affect how much power is required to keep hardware cool enough to work, and extra electricity often is not available during heat waves. For data centers located in typically cool places, such as Oregon’s Columbia River Gorge or Iceland, global warming may push data centers beyond safe operating temperatures.
“Climate change is having an impact on data centers broadly in the sense that it is limiting the potential for providing free cooling opportunities,” said Dustin Demetriou, senior technical staff member of Hardware Sustainability and Data Center Innovation at IBM. “Many of the energy efficiency gains that were made by limiting or eliminating mechanical refrigeration will be diminished. Regarding increasing chip power, we’ve reached a point where system architecture including layout, components, and configurations needs to consider thermal management up front. Staying with air cooling means we’re going to see changes in form factor, the need to limit components, and/or airflow rates that exceed the ability to deliver in the typical data center. Because of this, we’re likely to see an acceleration in the need for liquid cooling.”
Limited range
Data centers are highly controlled thermal environments, and for good reason. In addition to risk of damage to electronic components, heat increases electrical resistance and restricts the movement of electrons in complex systems. Chips run slower, they age quicker, and they are less reliable in general. Applications may become unresponsive. Hot data centers are also less secure. In 2015, researchers used heat emissions to demonstrate how electronics could be hacked, even if they are air-gapped.
Designers across the semiconductor industry already are fighting a battle against rising temperatures even without factoring in climate change. As digital logic becomes smaller, dynamic power density increases. In many of these advanced designs, the whole purpose of migrating to the next process node is to be able to process more data faster using the same amount of power. But with finFETs and GAA FETs, heat can get trapped between vertical structures, making it harder to dissipate and requiring more active cooling.
Regardless of whether it is cooled with air or liquid, there is more heat to eliminate, which in turn requires more power. Data center power usage estimates currently range between 1% and 2% of all global energy usage, a large portion of which is related to cooling operations.
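The standard yardstick for that cooling overhead is Power Usage Effectiveness (PUE), the ratio of total facility power to the power delivered to IT equipment. A minimal sketch, with purely illustrative numbers:

```python
# Illustrative sketch: Power Usage Effectiveness (PUE), the common
# ratio for expressing how much of a data center's power goes to
# overhead such as cooling. All figures below are hypothetical.

def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """PUE = total facility power / IT equipment power (1.0 is ideal)."""
    return total_facility_kw / it_equipment_kw

# Example: a 10 MW facility where servers draw 6.5 MW and the rest
# goes to cooling, power conversion, and lighting.
print(pue(10_000, 6_500))  # ~1.54: every watt of compute costs ~0.54 W of overhead
```

A PUE near 1.0 means almost all power reaches the servers; hotter ambient air pushes the cooling term, and therefore the ratio, upward.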
There is no single solution to this problem, but there are multiple steps being taken to improve energy efficiency, from processing to memory/storage. In most cases, the key metric for new chip and system architectures is performance per watt. The whole edge buildout, and the ensuing disaggregation by hyperscalers such as Amazon, Google, and Microsoft, is aimed at reducing power consumption by shrinking the distance that data needs to travel.
There is growing interest in performance and power monitoring systems in data centers that would provide server-specific intelligence to improve performance and cooling efficiencies. This is particularly important in data centers that utilize load balancing to reduce thermal density. “There are monitoring sensors that can be integrated in IC devices that can alert the system to dynamically improve the cooling by changing the fan speed, for example, when the temperature of the parts is elevated,” said Rita Horner, director of marketing for the system solution group at Synopsys. “It’s not just about using monitoring sensors, or making lower-power devices. The full system needs to be evaluated holistically, looking at how the different parts work together, so it can be further optimized.”
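The closed loop Horner describes, an on-die temperature sensor driving fan speed, can be sketched as a simple ramp controller. The thresholds and gains here are hypothetical, not from any vendor's design:

```python
# Hedged sketch of sensor-driven fan control: map a die-temperature
# reading to a fan PWM duty cycle. Thresholds are invented for
# illustration, not taken from any real platform.

def fan_duty(temp_c: float,
             target_c: float = 70.0,
             max_c: float = 95.0,
             min_duty: float = 0.25) -> float:
    """Return a fan duty cycle in [min_duty, 1.0].

    Below target: idle at min_duty. Between target and max: ramp
    linearly. At or above max: run the fans flat out.
    """
    if temp_c <= target_c:
        return min_duty
    if temp_c >= max_c:
        return 1.0
    span = (temp_c - target_c) / (max_c - target_c)
    return min_duty + span * (1.0 - min_duty)

for t in (55, 75, 90, 100):
    print(t, round(fan_duty(t), 2))
```

Real baseboard management controllers typically layer hysteresis and PID control on top of a curve like this so fans don't oscillate around the target temperature.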
And more domain-specific approaches, such as custom accelerators for specific data types, and hardware-software co-design can further improve the energy efficiency of systems. “A time-proven method of reducing system power is through chip integration,” said Andy Jaros, vice president of sales at Flex Logix. “Data centers are a huge user of field-programmable gate arrays, so we see FPGA integration into future data center ASICs. It’s a natural progression as it removes redundant power-hungry circuits found in FPGAs, such as SerDes, and instead leverages those circuits that are already on the ASIC. A smartly designed ASIC eFPGA also can reduce the amount of FPGA fabric required, reducing power even further.”
Global warming also is pushing other technology to the forefront that has been largely sitting on the sidelines. Case in point: There is much more focus on digital twins to help identify anomalies in performance caused by thermal variation. This was a key driver behind Cadence’s recent acquisition of Future Facilities, which performs data center cooling analysis and energy performance optimization using 3D digital twins. Such technology allows companies to model out various new conditions before they happen — like a previously uncommon weather event or a new range of seasonal temperatures — and change the data center architecture accordingly.
Fig. 1: A digital twin simulation of cooling effectiveness inside a data center. Source: Cadence.
“Assessing the seasonal impact is one of the items where digital twins of the data center help,” said Frank Schirrmeister, senior group director of Solutions & Ecosystem at Cadence. “You can add a higher heat level in the summer and see that you need to move your loads around. As everything becomes more programmable, the networking and even the organization of data can impact energy consumption. We’re predicting that early when we can look at the software running on the virtual platforms. That’s where digital twins become more and more critical, because you can impact the change early on. You can change the networking architecture to adjust where you process which part of the software. That helps optimize the overall thermal load. We can optimize the airflow based on where the heat is on the board.”
The fidelity of the model becomes crucial in this scheme. How accurate is the digital twin to the actual data center? Artificial intelligence is another critical component, and there is growing demand for digital twin applications aimed at optimizing the network, and how data is stored and transmitted, Schirrmeister said.
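The seasonal "what if" experiment Schirrmeister describes can be sketched with a toy model: raise the ambient temperature and rebalance load away from racks that would exceed a thermal limit. The linear thermal model and every number below are assumptions for illustration, far simpler than a real CFD-based digital twin:

```python
# Toy sketch of a digital-twin experiment: model rack inlet temperature
# as ambient plus a linear rise per kW of load, then migrate load off
# racks that would exceed a safe limit. All constants are hypothetical.

THERMAL_RES = 0.5   # assumed °C of temperature rise per kW of rack load
LIMIT_C = 45.0      # assumed maximum safe rack inlet temperature

def rack_temp(ambient_c: float, load_kw: float) -> float:
    return ambient_c + THERMAL_RES * load_kw

def rebalance(ambient_c: float, loads_kw: list) -> list:
    """Shed just enough load from over-limit racks onto the coolest rack."""
    loads = list(loads_kw)
    for i, load in enumerate(loads):
        excess_c = rack_temp(ambient_c, load) - LIMIT_C
        if excess_c > 0:
            move_kw = excess_c / THERMAL_RES
            loads[i] -= move_kw
            coolest = min(range(len(loads)), key=lambda j: loads[j])
            loads[coolest] += move_kw
    return loads

# Winter ambient (18 °C): all racks are fine. A summer spike to 30 °C
# would push the 35 kW rack to 47.5 °C, so 5 kW migrates elsewhere.
print(rebalance(30.0, [35.0, 10.0, 12.0]))  # [30.0, 15.0, 12.0]
```

The value of the real digital twin is running exactly this kind of scenario, at full CFD fidelity, before the heat wave arrives.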
Techniques that have been used in other markets are now being deployed in data centers, as well. Piyush Sancheti, vice president of system architects at Synopsys, says dynamic voltage and frequency scaling (DVFS), is now being deployed at a more holistic level in data center chips. “Engineering teams are also using adaptive voltage frequency scaling, not just reacting to it, but predicting what optimal voltages and frequencies should be. AI algorithms are being explored here as a way to look into the workload and be able to predict the optimal operating point. What you’re dealing with is at a mega scale, and so little bits of savings at the chip level multiplies by thousands or hundreds of thousands.”
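The predictive DVFS idea Sancheti outlines can be sketched as picking the lowest voltage/frequency pair that still meets a predicted deadline, since dynamic power scales roughly with C·V²·f. The operating-point table and capacitance below are invented for illustration:

```python
# Hedged sketch of predictive DVFS: given a predicted workload (cycles
# to retire within a deadline), choose the slowest, lowest-voltage
# operating point that still finishes in time. The P-state table and
# effective switched capacitance are hypothetical.

OPERATING_POINTS = [(1.2, 0.70), (2.0, 0.85), (2.8, 1.00), (3.5, 1.15)]
CAP_NF = 2.0  # assumed effective switched capacitance, nanofarads

def pick_point(cycles: float, deadline_s: float):
    """Return the most efficient (f_ghz, volts) that meets the deadline."""
    for f_ghz, v in OPERATING_POINTS:  # sorted slowest/most efficient first
        if cycles / (f_ghz * 1e9) <= deadline_s:
            return f_ghz, v
    return OPERATING_POINTS[-1]       # deadline unreachable; run flat out

def dynamic_power_w(f_ghz: float, v: float) -> float:
    """Dynamic power ~ C * V^2 * f."""
    return CAP_NF * 1e-9 * v**2 * f_ghz * 1e9

# 1.5 billion cycles due in 1 second: 1.2 GHz is too slow, 2.0 GHz fits.
f, v = pick_point(cycles=1.5e9, deadline_s=1.0)
print(f, v, round(dynamic_power_w(f, v), 2))
```

Because power grows with the square of voltage, even a modest step down the table saves disproportionately, and at data-center scale those per-chip savings multiply across hundreds of thousands of servers, which is the point Sancheti makes.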
Using resources more intelligently
Energy consumption and water usage are often top of mind in discussions about climate change, as those issues affect hardware on macro and micro levels. The public’s perception of how water- and energy-hungry data centers contribute to climate change is impacting architecture and design all the way down to the chip level.
“With climate change on everyone’s mind, especially given another drought-plagued summer, the impact has not been lost on data centers,” said Brig Asay, director of strategic planning at Keysight. “At odds is the unrelenting need for more data versus the difficulties that climate change is presenting. Data center capacity must continue to expand to meet consumer needs. However, data centers no longer have the luxury of infinite power as rivers begin to dry and water becomes more scarce. Accordingly, data center designers today are contending with more difficult power requirements that are being pushed all the way through the chips themselves. The drive for lower power is enabling a resurgence in chiplet and co-packaged optic technologies. It is also pushing more disaggregation into the data center to maximize efficiency. Data centers must be able to stay connected.”
Conclusion
Sustainability is a hot topic among tech companies these days, and energy usage inside of data centers is an important piece of that discussion, whether the data center is a hyperscale cloud, an on-premise collection of servers, or some near- or far-edge facility. In all cases, climate is a factor, even if it isn’t always obvious.
“The climate-related adaptations data centers make have become part of the competitive advantage for the company behind it,” said Synopsys’ Sancheti. “Sustainability is no longer something that is being talked about as a cost or burden. This has now become a competitive differentiator because everyone is looking at the efficiency metrics. As a result, it’s not an after-the-fact consideration. It’s a headline conversation in any type of business engagement with a data center.”
And those concerns will only increase as the impact of climate change becomes more obvious. More data needs to be processed somewhere, and that needs to happen using the same or less energy, regardless of the ambient temperature. That poses a significant challenge to the entire electronics supply chain, and particularly to the semiconductor designs that will make it all work.