Every aspect of data center energy use must be optimized to reduce power consumption and improve sustainability, from chips to transformers to edge compute.
Data centers and high-performance computing (HPC) are the primary enablers of today’s power-hungry AI-driven technology, but chip designers, EDA vendors, and the data centers themselves have a long list of options available to them to help curb AI’s power consumption.
Chip designers play a critical role in ensuring energy-efficient processing from the bottom up, whether through hardware-software co-design techniques, optimized hardware and software architectures, better AI models and data management, or low-power techniques. The fundamental problem is that power is a finite resource, and AI data centers are not considered essential users. Case in point: The state of Texas recently enacted a law allowing grid operators to cut power to data centers when consumer demand is high.
But while power itself may be in limited supply, there is no shortage of options for reducing the amount of power needed to run AI. Some of those are recycled ideas, such as re-using the heat in district energy systems, which are underground networks of insulated pipes that can deliver hot water, steam, and chilled water to nearby buildings, or re-using old EV batteries. Other solutions are new, such as moving more compute to the edge or adding more granularity into power management at the chip level.
No single option is sufficient by itself, but there is a long list of partial solutions that together can have a significant impact on power consumption.
“Performance is growing, so efficiency of data movement, efficiency of computation, and then efficiency of the infrastructure, logistics, and cooling — all of that factors into every piece, as well as doing software optimizations to use hardware less for the same compute,” said Mo Faisal, CEO at Movellus. “Are you actually computing with less? Are you actually doing optical and other approaches that work at the right granularity and are more power-efficient? Power efficiency in every aspect of the stack is necessary for a data center, especially right now.”
Efficiencies must be gained at every level, but the low-hanging fruit starts with chip design. “The most significant gains in data center efficiency can be realized through the tight integration and co-optimization of hardware and software,” said Frank Schirrmeister, executive director, strategic programs, system solutions in Synopsys’ System Design Group. “This results in tailored system architectures, in which the hardware and software are optimized for each other and for specific applications. It can involve offloading specific software tasks to specialized hardware accelerators for maximum efficiency, and early analysis can be done using transaction-level, model-based early architecture analysis tools that also directly connect to the hardware/software implementations.”
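To make that offload tradeoff concrete, here is a minimal Python sketch of the kind of first-order energy comparison an early architecture analysis might perform. All of the constants and the function name are illustrative assumptions, not tied to any particular CPU, accelerator, or tool.

```python
# Rough model of the offload decision: a task is worth moving to a specialized
# accelerator only if the energy saved on the computation outweighs the cost of
# shipping data to and from the engine.  All numbers below are assumptions.
def offload_wins(ops, bytes_moved,
                 cpu_nj_per_op=1.0, accel_nj_per_op=0.05, xfer_nj_per_byte=0.5):
    cpu_energy = ops * cpu_nj_per_op
    accel_energy = ops * accel_nj_per_op + bytes_moved * xfer_nj_per_byte
    return accel_energy < cpu_energy, cpu_energy, accel_energy

for ops, moved in [(1_000_000, 100_000), (5_000, 500_000)]:
    wins, cpu_nj, acc_nj = offload_wins(ops, moved)
    print(f"ops={ops:>9} bytes={moved:>7}: offload "
          f"{'saves' if wins else 'costs'} energy ({cpu_nj/1e6:.2f} vs {acc_nj/1e6:.2f} mJ)")
```

The second case shows why co-optimization matters: a small kernel with heavy data movement is cheaper to leave on the host.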
Co-design is an obvious target, and the chip industry has been talking about it since the 1990s. But adoption has been spotty. “The problem is the software guys don’t really understand hardware, and the hardware guys don’t really understand software,” said Andy Heinig, head of the Department for Efficient Electronics at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “Also, with the tools currently available for hardware-software co-design, it’s not that easy for a software guy to really get an improvement in power efficiency. The tools are too complex. They are too hardware-related. They are not hiding the hardware or doing a good enough abstraction of the hardware, so you really can focus only on the software. You always need a lot of hardware knowledge to do the software part. Besides that, we also see that on the software side we have problems finding embedded programmers, because people don’t stay in embedded software — they work one level higher, in high-level languages like Python. From that perspective, it’s hard to get people here.”
Software development practices and architectural choices can have a profound impact on the overall energy consumption of a data center.
“The efficiency of the software itself is a primary factor,” Schirrmeister said. “Code that is optimized for performance and low power can be assessed early using early low power analysis on hardware emulation and simulation, and can directly reduce the number of CPU cycles required for a given task. At the system-level, sophisticated scheduling and orchestration systems are key to dynamic resource management. Workload-aware and power-aware schedulers can intelligently place and consolidate workloads on the most efficient servers. In that context, AI-driven optimization can be used for real-time monitoring, predictive analytics, and dynamic allocation of resources based on demand.”
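As a rough illustration of workload- and power-aware placement, the following sketch consolidates jobs onto the most energy-efficient machines first so lightly used, less-efficient ones can be idled. The servers, jobs, and efficiency figures are hypothetical, not any vendor's scheduler.

```python
from dataclasses import dataclass, field

@dataclass
class Server:
    name: str
    perf_per_watt: float      # useful work per watt (higher is better)
    capacity: float           # total compute units available
    load: float = 0.0         # compute units already placed
    jobs: list = field(default_factory=list)

def place_jobs(servers, jobs):
    """Greedy power-aware placement: fill the most efficient servers first,
    so lightly loaded, inefficient machines can be idled or powered down."""
    ranked = sorted(servers, key=lambda s: s.perf_per_watt, reverse=True)
    for job_name, demand in sorted(jobs, key=lambda j: -j[1]):
        for srv in ranked:
            if srv.capacity - srv.load >= demand:
                srv.load += demand
                srv.jobs.append(job_name)
                break
        else:
            raise RuntimeError(f"no capacity for {job_name}")
    return ranked

servers = [Server("old-rack", 5.0, 100), Server("new-rack", 12.0, 100)]
jobs = [("train-a", 60), ("infer-b", 30), ("etl-c", 25)]
for srv in place_jobs(servers, jobs):
    print(srv.name, srv.jobs, f"{srv.load}/{srv.capacity}")
```

A production scheduler would also factor in thermal headroom, SLAs, and migration cost, but the greedy consolidation above captures the basic energy argument.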
At the data center level, there eventually will be opportunities to look at how workloads are deployed on clusters. “How those things are scheduled allows you to manage power across your entire data center, and potentially look at, ‘If I know that I have alternate energy supplies that are more effective during certain times of the day, I might even be able to move some of these workloads around to operate in the most efficient location,’” said Vikram Karvat, chief operating officer at Movellus. “There’s a challenge there, because you have a lot of data associated with the workloads. It’s a data gravity problem. It’s really heavy. You can’t really move it. Moving data around has a significant cost. Are you going to recoup that cost by shuffling workloads around? There’s going to be some balance. I don’t know where that balance will be, but people will be looking at a lot of those techniques to try to optimize this.”
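A back-of-the-envelope version of that data-gravity calculation might look like the sketch below. The dataset size, network energy per gigabyte, and electricity prices are all illustrative assumptions.

```python
def migration_worth_it(dataset_gb, net_j_per_gb,
                       compute_kwh, local_kwh_cost, remote_kwh_cost):
    """Rough check of the 'data gravity' tradeoff: does the cost saved by
    running compute at a cheaper (or greener) site exceed the cost of
    shipping the data there?  All figures are illustrative assumptions."""
    move_kwh = dataset_gb * net_j_per_gb / 3.6e6   # joules -> kWh
    saving = compute_kwh * (local_kwh_cost - remote_kwh_cost)
    cost_to_move = move_kwh * local_kwh_cost
    return saving - cost_to_move, move_kwh

net_benefit, move_kwh = migration_worth_it(
    dataset_gb=50_000,        # 50 TB of training data
    net_j_per_gb=250.0,       # assumed network + storage energy per GB moved
    compute_kwh=8_000,        # energy of the training run itself
    local_kwh_cost=0.12,      # $/kWh at the current site
    remote_kwh_cost=0.07)     # $/kWh at the off-peak / renewable site
print(f"moving the data costs ~{move_kwh:.1f} kWh, net benefit ${net_benefit:.2f}")
```

The balance Karvat describes shows up directly in the inputs: larger datasets or frequent re-migration quickly erode the benefit of cheaper remote power.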
Heterogeneous integration with domain-specific accelerators that are optimized for different workloads can help, as well. “It could be CPUs, GPUs [or] ASICs,” said Ankireddy Nalamalpu, vice president and senior general manager for data center and artificial intelligence at MediaTek, during a recent panel at the Design Automation Conference. “All this needs to co-exist at the data center level, and software needs to efficiently manage it at the data center level. That’s the only way to really deliver better power efficiency, and by doing so it will be more sustainable. Sustainability will be the natural byproduct of being power-efficient. It’s not the other way around, because there is no way to go after sustainability without really going after the efficiency.”
Fig. 1: Areas for improvement. Source: DAC panel, “Performance versus Sustainability: The Challenge of Energy Efficient Computing.”
Another way to look at data center power use is to try to get more out of the power being used. “Data center power use is described at the overall system level and looked at top-down,” explained Nandan Nayampally, chief commercial officer at Baya Systems. “When I was in the Arm physics team, we were solving data center efficiency. The initial point was, ‘Don’t think about it as cutting my power down, because I’m already powering the rack at X.’ It becomes, ‘How much more density of performance can you give me in this thing?’ You can think of it turned on its head. You’re going to build all the infrastructure for all of this. If I can give you five times the performance at the same power, that’s as good as saying I’m going to take one-fifth of the power, because nobody is asking for five times less performance.”
Memory plays a big role in this, as well. “Memory is central to data center efficiency,” said Steven Woo, fellow and distinguished inventor at Rambus. “By adopting high-bandwidth, low-power memory technologies and optimizing memory hierarchies for AI and data-intensive workloads, we can significantly reduce energy consumption. Innovations like CXL and tiered memory systems are newer technologies that can also help to unlock both performance and sustainability.”
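As a simplified illustration of the tiered memory systems Woo mentions, the sketch below keeps the hottest pages in local DRAM and demotes cold pages to a hypothetical CXL-attached far tier. The per-byte energy numbers and capacities are placeholders, not measured values for any device.

```python
# Illustrative tiering policy: keep frequently touched pages in local DRAM
# and demote cold pages to a (hypothetical) CXL-attached tier.
TIERS = {
    "dram": {"pj_per_byte": 20, "capacity_pages": 4},
    "cxl":  {"pj_per_byte": 60, "capacity_pages": 16},
}

def place_pages(access_counts):
    """Sort pages by access frequency and fill the fastest tier first."""
    placement, dram_left = {}, TIERS["dram"]["capacity_pages"]
    for page, hits in sorted(access_counts.items(), key=lambda kv: -kv[1]):
        if dram_left > 0:
            placement[page], dram_left = "dram", dram_left - 1
        else:
            placement[page] = "cxl"
    return placement

def access_energy_pj(access_counts, placement, page_bytes=4096):
    return sum(hits * page_bytes * TIERS[placement[p]]["pj_per_byte"]
               for p, hits in access_counts.items())

counts = {"p0": 900, "p1": 850, "p2": 40, "p3": 30, "p4": 700, "p5": 650, "p6": 10}
tiered = place_pages(counts)
flat = {p: "cxl" for p in counts}                      # everything in far memory
print(f"tiered: {access_energy_pj(counts, tiered)/1e6:.1f} uJ, "
      f"all-CXL: {access_energy_pj(counts, flat)/1e6:.1f} uJ")
```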
Sustainability
Sustainability goals can align with total cost of ownership and operational costs, to everyone's benefit.
“Whether or not people are making things more efficient because they really want to save the environment, in reality you have to solve the same problems because the power consumption is so off the charts,” observed Michal Siwinski, chief marketing officer at Arteris. “People will have to build a nuclear power plant once a month to keep up, and that’s not plausible, so reducing the power profile, providing more energy efficiency, and looking at architectures that indirectly solve the sustainability challenge is happening. That’s happening for cost reasons and just pure pragmatic implementation reasons. You cannot build something that is going to melt. You have to build differently.”
Minimizing power consumption goes a long way toward improving sustainability. “We can help optimize to limit the amount of power used, for example, using strategies like clock gating and signal gating,” said Daniel Rose, founding engineer at ChipAgents. “Ironically, we do use some AI farms, but we think it’s a good investment. If we spend a little power to talk to our AI models, they can help us make chips that are much more efficient. I’m personally very optimistic about using nuclear to run these AI systems and AI farms. It’s a self-perpetuating loop of more sustainability.”
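Clock gating itself happens in RTL, but its first-order effect can be approximated with the standard dynamic-power relation P ≈ α·C·V²·f. The sketch below uses invented block capacitances and activity factors purely to show why gating idle logic pays off.

```python
def dynamic_power_w(c_farads, v_volts, f_hz, activity):
    """First-order CMOS dynamic power: P ≈ α · C · V² · f."""
    return activity * c_farads * v_volts**2 * f_hz

# Hypothetical blocks on a die: (switched capacitance in farads, fraction of
# time the block actually has useful work to do).
blocks = {"tensor_core": (2.0e-9, 0.80), "dma": (0.5e-9, 0.20), "debug": (0.3e-9, 0.02)}
V, F = 0.85, 2.0e9

ungated = sum(dynamic_power_w(c, V, F, 1.0) for c, _ in blocks.values())
gated   = sum(dynamic_power_w(c, V, F, util) for c, util in blocks.values())
print(f"clocks always toggling: {ungated:.2f} W, gated to activity: {gated:.2f} W")
```

The savings come almost entirely from the blocks with low duty cycles, which is why gating rarely used logic is such a cheap win.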
With low-power models like DeepSeek, what can run in a data center may perform just as well on an edge network, preventing data from being needlessly sent back and forth via the cloud. However, edge compute is more cost-sensitive in the marketplace. “The tradeoffs include whether it needs different silicon,” said Baya Systems’ Nayampally. “Does it use the same silicon? At every phase, you’re going to need to make that more efficient. Those are all innovations that will help you do the same kind of inference capability on the edge. If you start adding the issue of security and privacy, that’s when you want more and more of your data to be closer to the endpoint than to the data center.”
AI and data solutions
As high-performance compute problems get solved, parallel computation engines are going to be necessary. But the resulting gains will increasingly be bottlenecked by data movement, said Rick Crotty, chief FPGA architect at Lattice Semiconductor, during the DAC panel. “We need to think even more about how we optimize data locality for parallel engines using the memories and the compute, leveraging reconfigurable fabrics, making use of scheduling. Energy efficiency isn’t just defined by TOPS per watt, but by bytes moved per watt, and getting that computation close to where the sensors and data are.”
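A minimal energy-accounting sketch along those lines, with assumed per-byte and per-MAC energies (illustrative orders of magnitude only), shows how locality dominates the bill.

```python
# Illustrative energy accounting for a kernel: compute energy plus data-movement
# energy at different levels of the hierarchy.  Per-byte costs are assumptions
# in picojoules, not measured numbers for any specific device.
PJ_PER_BYTE = {"local_sram": 1.0, "on_chip_noc": 5.0, "off_chip_dram": 100.0}
PJ_PER_MAC = 0.5

def kernel_energy_uj(macs, bytes_by_source):
    move = sum(PJ_PER_BYTE[src] * n for src, n in bytes_by_source.items())
    return (macs * PJ_PER_MAC + move) / 1e6

macs = 50_000_000
poor_locality = {"off_chip_dram": 20_000_000, "local_sram": 5_000_000}
good_locality = {"off_chip_dram": 1_000_000, "on_chip_noc": 4_000_000, "local_sram": 20_000_000}
print(f"poor locality: {kernel_energy_uj(macs, poor_locality):.0f} uJ, "
      f"good locality: {kernel_energy_uj(macs, good_locality):.0f} uJ")
```

With these assumed costs, the same arithmetic work differs by more than 10x in energy depending on where the bytes come from — the "bytes moved per watt" point in concrete terms.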
Fig. 2: Small improvements add up. Source: DAC Panel “Performance versus Sustainability: The Challenge of Energy Efficient Computing.”
More is needed than just better GPUs. “The training energy, the training compute for these long-context, quadratic-complexity algorithms is crazy, but it’s easy, and the market and the economy support it now,” said Peter Beerel, professor at the University of Southern California. “Yet DeepSeek and others are coming around and saying, ‘Hey, let’s think a little bit differently about how we train these things.’ There are people, including myself, who have looked at linear approximations for transformers. These may not work as well. We need to think about hybrids, but in the end we need to think about a more efficient algorithm that still can compete with the transformer-based algorithms that are dominating the world.”
AI scientists could come up with an efficient linear approximation to the transformer, or something better than stochastic gradient descent (SGD), or build in more effective reinforcement learning during training, said Beerel. “There are certain spiking neural networks. There’s a whole field where it’s simply yes or no. That’s something of interest, particularly attractive for event cameras (also known as neuromorphic cameras or dynamic vision sensors). There is a space for it, but again, less research, because it’s easier to jump on the transformer bandwagon and get 1.1% accuracy on that data set. Until it settles down, there won’t be an economic push to focus on other things.”
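One published example of the linear approximations Beerel mentions is kernelized attention (e.g., Katharopoulos et al., "Transformers are RNNs"), which replaces the softmax with a feature map so the cost grows linearly with sequence length instead of quadratically. The sketch below compares the two forms on random data; it is an illustration of the idea, not a drop-in replacement for production attention.

```python
import numpy as np

def softmax_attention(Q, K, V):
    """Standard attention: the n x n score matrix makes it quadratic in sequence length."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V, eps=1e-6):
    """Kernel-feature approximation with phi(x) = elu(x) + 1: associativity lets
    us form (K^T V) once, so cost grows linearly with sequence length."""
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))     # elu(x) + 1
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                                           # d x d, independent of n
    norm = Qf @ Kf.sum(axis=0) + eps                        # length-n normalizer
    return (Qf @ kv) / norm[:, None]

rng = np.random.default_rng(0)
n, d = 512, 64
Q, K, V = (rng.standard_normal((n, d)) * 0.1 for _ in range(3))
out_sm, out_lin = softmax_attention(Q, K, V), linear_attention(Q, K, V)
print("outputs correlate:", np.corrcoef(out_sm.ravel(), out_lin.ravel())[0, 1])
```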
Ilya Ganusov, fellow and director of programmable architecture at Altera, said that while transformer architectures were created to make better use of GPUs, overall energy consumption got worse because of the Jevons Paradox — as efficiency improved, usage grew even faster.
For this reason, MediaTek’s Nalamalpu believes a systems approach is needed. “The hardware and software need to come together on one side. The second side is really creating the larger systems, looking at not the component level, but the system level, so that you can really talk about better, efficient designs.”
In addition to better AI models, chips, and systems, the DAC panel agreed that more processing has to happen at the edge. Lattice’s Crotty cited cameras, lidar, and radar as examples. “If there’s nothing meaningful going on, that doesn’t need to be computed in a data center,” he said. “At the system level, think about data proximity in architectural optimization. Local data buffering and storage reduces off-chip power. In addition, designers need to think about sensor integration. Bring the semiconductor, the logic content, closer to the sensor so that we don’t have to send data back.”
USC’s Beerel took it a step further. “Because you’re dealing with video, often you can predict which regions of the camera don’t need to even be read out and go to the ADCs, which are the power-hungry components. A lot of this logic can be done 3D right on top of the camera, or near the camera. People are even talking about within pixels. In-pixel computing is a thing now. Combine some analog computing, and some really efficient things can be done in that space. There are a lot of us who focus on the edge, and we like to live in that world where every movement matters. This is why IoT devices are getting more powerful, and why cameras are getting more powerful.”
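To illustrate the idea of skipping readout and compute for static regions, here is a toy frame-differencing sketch. The tile size and threshold are arbitrary illustration values, and real in- or near-pixel logic would be analog or hard-wired rather than Python.

```python
import numpy as np

def changed_tiles(prev, curr, tile=16, threshold=8.0):
    """Flag only those tiles whose mean absolute change exceeds a threshold,
    mimicking near-sensor logic that skips readout/compute for static regions."""
    h, w = curr.shape
    flags = []
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            diff = np.abs(curr[y:y+tile, x:x+tile].astype(float) -
                          prev[y:y+tile, x:x+tile].astype(float)).mean()
            if diff > threshold:
                flags.append((y, x))
    return flags

rng = np.random.default_rng(1)
prev = rng.integers(0, 255, (128, 128), dtype=np.uint8)
curr = prev.copy()
curr[32:64, 32:64] = rng.integers(0, 255, (32, 32), dtype=np.uint8)   # one moving region
active = changed_tiles(prev, curr)
total = (128 // 16) ** 2
print(f"{len(active)}/{total} tiles need further processing")
```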
Panel moderator Rob Aitken, program manager for the National Advanced Packaging Manufacturing Program within the U.S. Commerce Department, drew a comparison between static regions in camera data and their equivalent in LLMs. “There are times when you are not speaking to your device,” he said. “There are aspects of what’s important and what’s not important, even in your speech. So token-dropping techniques, different forms of compression, different forms of KV (key-value) cache eviction — even routers that ask, ‘Is this a hard task that you’re asking an LLM to do or is it an easy task?’ — there are a lot of algorithmic things we can do to use only the right-size LLM for the task at hand.”
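A toy version of that routing idea might look like the sketch below. The difficulty heuristic and the energy-per-token figures are invented for illustration; a production router would use a learned classifier and measured costs.

```python
# Toy "right-size the model" router.  Model names and joules-per-token values
# are hypothetical placeholders.
MODELS = {
    "small": {"joules_per_token": 0.05},
    "large": {"joules_per_token": 1.50},
}

def estimate_difficulty(prompt: str) -> float:
    hard_markers = ("prove", "derive", "multi-step", "analyze", "optimize")
    score = 0.2 * sum(m in prompt.lower() for m in hard_markers)
    return min(1.0, score + len(prompt) / 2000)

def route(prompt: str, expected_tokens: int = 200):
    model = "large" if estimate_difficulty(prompt) > 0.4 else "small"
    energy = expected_tokens * MODELS[model]["joules_per_token"]
    return model, energy

for p in ("What time zone is Tokyo in?",
          "Prove the convergence of this multi-step optimization and analyze it."):
    m, j = route(p)
    print(f"{m:>5} model, ~{j:.0f} J  <- {p[:50]}")
```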
Hardware still key
In her DAC keynote, Michaela Blott, senior fellow at AMD Research, doubled down on the fact that optimizing AI models is important, but by itself that’s not enough. “You might say, ‘Hey, we have all these software improvements. What do we need to do on the hardware side anymore? We might as well not bother.’ It’s not true. We are not going to get five orders of magnitude [improvement] out of the software. We have to turn over every single stone. We cannot afford to leave any efficiency gains behind. We have to do it on all levels. Condensation and sparsity are massive levers. Runtime schedulers and mappers introduce inefficiency. DeepSeek even customized their distributed file system to get more efficiency. We have to do all of this.”
In essence, compute can’t simply be scaled by adding more commodity servers or more of the same old racks. “That doesn’t work, certainly not with any of the sort of typical constraints of what it takes to build out a power infrastructure,” said Mohamed Awad, senior vice president and general manager for Arm’s infrastructure line of business, in the opening address at DAC. “The goal is to cram as much compute into a given rack, into a given warehouse, into a given data center as possible. The simple truth is, if we don’t push the per-watt boundaries as far as we can, AI will not scale and won’t realize the level of our aspirations.”
Saving power on chip
While experts agree that efficiency needs to happen at every pain point, Movellus’ Faisal said it is important to start at the source of power use. “I don’t think only one solution is going to win here,” he noted. “We need all of it. But I would put the weight on wherever the power goes. Where does it start? Where is it consumed? This is at the transistor. If I don’t solve the problem there, everything else is just an after effect.”
Karvat agreed. “There’s going to be more use of advanced power management techniques — whether you’re talking about things like DVS, DVFS, DFS — essentially being able to turn knobs on frequency and voltage based on the workload. You’re not just sitting there running the chip at 0.85 or 0.9 volts at 2 gigahertz all the time. You’re now going to see, in granular fashion, that this particular workload doesn’t need this. I have the opportunity to dial back the voltage to 0.5, and dial back my frequency simultaneously. It becomes a cubic function of power reduction. The finer-grained you can make that from a time domain perspective, the more efficient your power management story. You can think of peak power versus average power, where peak power could be 20% more than average power. That’s free energy to be recouped. Not to mention, if you put a watt in, it takes another quarter of a watt or so to get it cooled. So there’s a multiplicative effect of saving on power and cooling. It’s a virtuous cycle.”
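Plugging illustrative numbers into the first-order relation P_dyn ≈ C_eff·V²·f shows the roughly cubic payoff Karvat describes when voltage and frequency are scaled together. The capacitance and operating points below are assumptions, not figures for any real chip.

```python
def dynamic_power_w(c_eff, v, f):
    """First-order dynamic power: P ≈ C_eff · V² · f."""
    return c_eff * v * v * f

C_EFF = 2.4e-9                      # assumed effective switched capacitance (F)
nominal = dynamic_power_w(C_EFF, 0.85, 2.0e9)

# Scale voltage and frequency together for a light workload.  Because frequency
# tracks voltage, power falls roughly with the cube of the scaling factor.
scaled = dynamic_power_w(C_EFF, 0.60, 1.4e9)
print(f"nominal {nominal:.2f} W -> scaled {scaled:.2f} W "
      f"({100 * (1 - scaled / nominal):.0f}% lower)")
```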
On the other hand, over-designing IP leads to excessive power use. “Every single company that delivers IP over-designs its IP. They over-program their IP,” Faisal said. “The main reason is risk. No one wants to have the IP that fails. Also, it has to go into multiple systems, so it ends up overly generalized. That comes at the expense of inefficiency. There are a lot of opportunities here where maybe AI can solve this problem. AI can break these barriers — maybe the data just needs to flow a bit more with a higher bandwidth. Once that happens, maybe we can stop over-programming and over-designing and over-protecting, because at the end of the day, when it comes to power, it’s a power optimization problem. It’s a tradeoff between mean time to failure and efficiency. I can drop the voltage on a chip really low, but then if I’m going to fail once a week, maybe that’s too expensive for me. I can dial it up and say, ‘I fail now once a month. That works for my workloads.’ There’s a failure-versus-efficiency tradeoff, and it takes a lot of low-level software. DeepSeek, for example, gained a lot of efficiency by going in and doing very low-level reprogramming of every single compute unit. We’ll start seeing more of that.”
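The failure-versus-efficiency tradeoff can be sketched the same way. In the toy model below, the error-rate curve and every constant are invented; the point is only that a workload's tolerance for failures sets the floor on how far voltage can be dropped.

```python
import math

def errors_per_week(v):
    """Assumed exponential rise in timing-error rate as supply voltage drops."""
    return 1e-4 * math.exp((0.85 - v) / 0.04)

def relative_power(v, v_nom=0.85):
    return (v / v_nom) ** 2          # dynamic power scales ~V^2 at fixed frequency

def lowest_safe_voltage(max_errors_per_week, v_min=0.50, v_nom=0.85, step=0.01):
    """Pick the lowest voltage whose estimated error rate meets the tolerance."""
    v = v_min
    while v <= v_nom and errors_per_week(v) > max_errors_per_week:
        v += step
    return round(v, 2)

for tolerance in (1.0, 0.25):        # failures per week the workload can absorb
    v = lowest_safe_voltage(tolerance)
    print(f"tolerate {tolerance}/week -> run at {v:.2f} V, "
          f"~{relative_power(v):.0%} of nominal dynamic power")
```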
Conclusion
Incremental gains can add up to meaningful savings. “Whether designers are looking at more advanced power management with baby steps, or looking at more aggressive, fully autonomous fine-grained control of voltage and frequency, there’s going to be a range in people’s comfort level and how fast these solutions are deployed,” said Karvat. “But when deployed in their entirety, there is an opportunity to recoup potentially 10% to 15% of power at the chip level, and then, using some of these other techniques, potentially improve another 20% at the data center. People are fighting to get low, single-digit improvements right now. So that would be a huge win, but it’s only one element. When you combine those things with continued innovation in how our models work, and more effective or efficient use of our models, that will help.”
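As a back-of-the-envelope check on how those partial savings stack, the snippet below assumes the chip-level and facility-level reductions apply to largely independent portions of the power budget and therefore compound rather than simply add. The percentages are Karvat's illustrative figures, not measured results.

```python
# Compounding of the partial savings described above (illustrative only).
chip_level = 0.12        # ~10%-15% from fine-grained on-chip power management
facility   = 0.20        # ~20% from data-center-level techniques

combined = 1 - (1 - chip_level) * (1 - facility)
print(f"combined reduction: {combined:.0%}")   # savings multiply rather than add
```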
Related Reading
Crisis Ahead: Power Consumption In AI Data Centers
Four key areas where chips can help manage AI’s insatiable power appetite.