More efficient hardware, better planning, and better utilization of available power can help significantly.
Before generative AI burst onto the scene, no one predicted how much energy would be needed to power AI systems. Those numbers are just starting to come into focus, and so is the urgency about how to sustain it all.
AI power demand is expected to surge 550% by 2026, from 8 TWh in 2024 to 52 TWh, and then climb another 1,150% to 652 TWh by 2030. Commensurately, U.S. power grid planners have nearly doubled the estimated U.S. load growth forecast, from 2.6% to 4.7%, an increase of nearly 38 gigawatts through 2028, which is the equivalent of adding two more states the size of New York to the U.S. power grid in five years.
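For readers who want to sanity-check those percentages, here is a minimal sketch showing how they follow from the TWh figures above (the inputs are the forecasts already quoted, not new data):

```python
# Back-of-envelope check of the growth percentages quoted above.
def pct_increase(start_twh: float, end_twh: float) -> float:
    """Percentage increase from start to end."""
    return (end_twh - start_twh) / start_twh * 100

print(f"2024 -> 2026: {pct_increase(8, 52):.0f}% increase")    # ~550%
print(f"2026 -> 2030: {pct_increase(52, 652):.0f}% increase")  # ~1,154%
```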
Microsoft and Google, meanwhile, each report electricity consumption that surpasses the power usage of more than 100 countries, and Google’s latest report shows a roughly 50% rise in greenhouse gas emissions from 2019 to 2023, partly due to data centers.
This has put the entire tech sector on a worrisome trajectory. The chip industry had been keeping the power consumed for computation roughly in check, with demand largely offset by efficiency gains. Until AI, there was no push for anywhere near as much compute power as is needed today, and many in the industry report they were caught by surprise. This may be why there is so much research into alternatives to traditional power sources, including nuclear power plants, which are now being planned, built, or recommissioned.
“AI models will continue to become larger and smarter, fueling the need for more compute, which increases demand for power as part of a virtuous cycle,” said Dermot O’Driscoll, vice president of product solutions in Arm’s Infrastructure Line of Business. “Finding ways to reduce the power requirements for these large data centers is paramount to achieving the societal breakthroughs and realizing the AI promise. Today’s data centers already consume lots of power. Globally, 460 terawatt-hours (TWh) of electricity are needed annually, which is the equivalent of the entire country of Germany.”
To fully harness the potential of AI, the industry must rethink compute architectures and designs, O’Driscoll said. But while many of the largest AI hyperscalers are using Arm cores to reduce power, that’s only part of the solution. AI searches need to deliver more reliable and targeted information for each query, and AI models themselves need to become more efficient.
“AI applications are driving unprecedented power demand,” said William Ruby, senior director of product management for power analysis products at Synopsys. “The International Energy Agency in its 2024 report indicated that a ChatGPT request consumes 10X the amount of power consumed by a traditional Google search. We are seeing this play out for semiconductor ICs. Power consumption of SoCs for high-performance computing applications is now in the hundreds of watts, and in some cases exceeding a kilowatt.”
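To put that 10X per-query ratio in aggregate terms, consider the rough sketch below. The 0.3 Wh per-search baseline and the daily query volume are illustrative assumptions, not figures from the article:

```python
# Rough aggregate-energy sketch built on the ~10X-per-query ratio cited above.
# The per-search baseline and the query volume are illustrative assumptions.
SEARCH_WH = 0.3                   # assumed energy per traditional search
AI_QUERY_WH = 10 * SEARCH_WH      # the roughly 10X ratio cited by the IEA report
QUERIES_PER_DAY = 1_000_000_000   # assumed daily query volume

ai_gwh_per_day = AI_QUERY_WH * QUERIES_PER_DAY / 1e9
search_gwh_per_day = SEARCH_WH * QUERIES_PER_DAY / 1e9
print(f"AI queries:            {ai_gwh_per_day:.1f} GWh/day")
print(f"Traditional searches:  {search_gwh_per_day:.1f} GWh/day")
```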
The rollout and rapid adoption of AI were as much of a surprise to the tech world as they were to the power utilities. Until a couple of years ago, most people assumed AI was plodding along at the same pace it had been for decades.
“You could argue the internet back in the mid-to-late ’90s was a big life changing thing — one of those once-in-a-generation type technologies,” said Steven Woo, distinguished inventor and fellow at Rambus. “Smart phones are another one. But with AI the ramp is faster, and the potential is like the internet — and in some ways maybe even greater. With so many people experimenting, and with the user base being able to do more sophisticated things that need more power, the semiconductor industry is being asked to try and become more power-efficient. In a lot of ways these architectures are becoming more power efficient. It’s just that you’re still getting dwarfed by the increase in the amount of compute you want to do for more advanced AI. It’s one of those things where you just can’t keep up with the demand. You are making things more power-efficient, but it’s just not enough, so now we must find ways to get more power. The models are getting bigger. The calculations are more complex. The hardware is getting more sophisticated. So the key things that happen are that we’re getting more sophisticated as the model is getting bigger, more accurate, and all that. But a lot of it now is coming down to how we power all this stuff, and then how we cool it. Those are the big questions.”
AI and sustainability
Where will all the power come from? Do the engineering teams that are writing the training algorithms need to start being more power-aware?
“Sustainability is something that we have been addressing in the semiconductor industry for 20 years,” said Rich Goldman, director at Ansys. “There’s been awareness that we need low-power designs, and software to enable low-power designs. Today, it comes down to an issue of engineering ethics and morality. Do our customers care about it when they buy a chip or when they buy a training model? I don’t think they make their decisions based on that.”
What also comes into play is how engineers are rewarded, evaluated, and assessed. “Commitment to sustainability is typically not included on what they must put into the product, so they aren’t motivated, except by their own internal ethics and the company’s ethics towards that. It’s the age-old ethics versus dollars in business, and in general we know who wins that. It’s a huge issue. Maybe we should be teaching ethics in engineering in school, because they’re not going to stop making big, powerful LLMs and training on these huge data centers,” Goldman noted.
Still, it’s going to take huge numbers of processors to run AI models. “So you want to take your data centers and rip those CPUs out and put in GPUs that run millions of times more efficiently to get more compute power out of it,” he said. “And while you’re doing that, you’re increasing your power efficiency. It might seem counterintuitive, because GPUs take so much power, but per compute cycle it’s much, much less. Given that you have limited space in your data center — because you’re not going to add more space — you’re going to take out the inefficient processors and put in GPUs. This is a bit self-serving for NVIDIA, because they sell more GPUs that way, but it’s true. So even today, when we’re at Hopper H100s, H200s — and even though Blackwell is coming, which is 10 or 100 times better — people are buying the Hopper because it’s so much more efficient than what they have. In the meantime, they’re going to save more on power expense than they are in buying and replacing. Then, when Blackwell becomes available, they’ll replace the Hopper with Blackwell, and that’s sufficient for them in a dollar sense, which helps with the power issue. That’s the way we have to tackle it. We have to look at the dollars involved and make it attractive for people to expend less power based on the dollars that go to the bottom line for the company.”
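A minimal, entirely hypothetical payback sketch illustrates the dollars-and-power argument Goldman describes. Every figure below (performance per watt, electricity price, replacement cost) is an assumption chosen only for illustration:

```python
# Hypothetical payback sketch for the "save more on power than on replacing" argument.
# All figures are illustrative assumptions, not vendor or article data.
OLD_W_PER_UNIT_PERF = 10.0    # watts per unit of throughput, older hardware
NEW_W_PER_UNIT_PERF = 2.5     # watts per unit of throughput, newer accelerator
THROUGHPUT_UNITS = 1_000_000  # fixed workload the data center must serve
POWER_COST_PER_KWH = 0.08     # USD, assumed industrial electricity rate
NEW_HW_COST = 5_000_000       # USD, assumed cost of the replacement hardware

old_kw = OLD_W_PER_UNIT_PERF * THROUGHPUT_UNITS / 1000
new_kw = NEW_W_PER_UNIT_PERF * THROUGHPUT_UNITS / 1000
savings_per_hour = (old_kw - new_kw) * POWER_COST_PER_KWH
payback_days = NEW_HW_COST / (savings_per_hour * 24)
print(f"Power saved: {old_kw - new_kw:,.0f} kW; payback in roughly {payback_days:,.0f} days")
```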
Meeting the AI energy/power challenges
Meeting the current and upcoming energy and power demands from large-scale deployments of AI creates three challenges. “One is how to deliver power,” said Woo. “There’s a lot of talk in the news about nuclear power, or newer ways of supplying nuclear power-class amounts of power. Two is how to deal with the thermals. All these systems are not just trying to become more powerful. They’re doing it in small spaces. You’re anticipating all this power, and you’ve got to figure out how to cool all of that. Three involves opportunities for co-design, making the hardware and the software work together to gain other efficiencies. You try to find ways to make better use of what the hardware is giving you through software. Then, on the semiconductor side of things, supplying power is really challenging, and one of the biggest things that’s going on right now in data centers is the move to a higher voltage supply of power.”
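That last point, the shift to higher supply voltages, comes down to simple resistive losses. The sketch below uses an assumed rack power and distribution-path resistance chosen only to show the scaling:

```python
# Why higher distribution voltages help: resistive loss scales with current squared.
# Rack power and path resistance below are assumed, illustrative values.
RACK_POWER_W = 10_000        # assumed power delivered to one rack
PATH_RESISTANCE_OHM = 0.001  # assumed resistance of the distribution path

for volts in (12, 48):
    amps = RACK_POWER_W / volts               # I = P / V
    loss_w = amps ** 2 * PATH_RESISTANCE_OHM  # P_loss = I^2 * R
    print(f"{volts:>2} V bus: {amps:>5.0f} A, {loss_w:>5.0f} W lost in distribution")
# Quadrupling the voltage cuts current by 4x and resistive losses by 16x.
```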
At the very least, product development teams must consider energy efficiency at initial stages of the development process.
“You cannot really address energy efficiency at the tail end of the process, because by then the architecture has been defined and many design decisions have already been made,” said Synopsys’ Ruby. “Energy efficiency in some sense is an equal opportunity challenge, where every stage in the development process can contribute to energy efficiency, with the understanding that earlier stages can have a bigger impact than later stages. Collectively, every seemingly small decision can have a profound impact on a chip’s overall power consumption.”
A ‘shift-left’ methodology, in which hardware is designed and software is written simultaneously and early in the development process, can have a profound effect on energy efficiency. “This includes decisions such as overall hardware architecture, hardware versus software partitioning, software and compiler optimizations, memory subsystem architecture, application of SoC level power management techniques such as dynamic voltage and frequency scaling (DVFS) – to name just a few,” he said. It also requires running realistic application workloads to understand the impact.
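As a concrete illustration of one technique Ruby mentions, dynamic power scales roughly with CV²f under DVFS, so a modest drop in voltage and frequency buys a disproportionate power reduction. The capacitance and operating points below are placeholders, not figures from the article:

```python
# Minimal DVFS sketch: dynamic power scales roughly as C * V^2 * f.
# Switched capacitance and operating points are illustrative placeholders.
def dynamic_power(cap_f: float, volts: float, freq_hz: float) -> float:
    """Approximate dynamic power, with the activity factor folded into cap_f."""
    return cap_f * volts ** 2 * freq_hz

nominal = dynamic_power(1e-9, 0.90, 2.0e9)
scaled = dynamic_power(1e-9, 0.80, 1.6e9)   # ~11% lower voltage, 20% lower clock
print(f"Scaled operating point uses {scaled / nominal:.0%} of nominal power")  # ~63%
```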
That’s only part of the problem. The mindset around sustainability also needs to change. “We should be thinking about it, but I don’t think the industry as a whole is doing that,” said Sharad Chole, chief scientist at Expedera. “It’s only about cost at the moment. It’s not about sustainability, unfortunately.”
But as generative AI models and algorithms become more stable, the costs can become more predictable. That includes how many data center resources will be required, and ultimately it can include how much power will be needed.
“Unlike previous iterations of model architectures, where architectures were changing and everyone had slightly different tweaks, the industry-recognized models for Gen AI have been stable for quite a long time,” Chole said. “The transformer architecture is the basis of everything. And there is innovation in terms of what support needs to be there for workloads, which is very useful.”
There is a good understanding of what needs to be optimized, as well, which must be balanced against the cost of retraining a model. “If it’s something like training a 4 billion- or 5 billion-parameter model, that’s going to take 30,000 GPUs three months,” Chole said. “It’s a huge cost to pay.”
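To see why that is “a huge cost to pay,” consider a rough energy estimate for a run of that size. The per-GPU power draw and facility overhead below are assumptions, not figures from Chole:

```python
# Rough energy estimate for a ~30,000-GPU, three-month training run.
# Per-GPU draw and facility overhead (PUE) are assumed, illustrative values.
GPUS = 30_000
GPU_POWER_KW = 0.7   # assumed average draw per GPU
PUE = 1.3            # assumed facility overhead for cooling and power delivery
HOURS = 90 * 24      # roughly three months

energy_gwh = GPUS * GPU_POWER_KW * PUE * HOURS / 1e6
print(f"Roughly {energy_gwh:.0f} GWh for the run")  # on the order of tens of GWh
```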
Once those formulas are established, then it becomes possible to determine how much power will be needed to run the generative AI models when they’re implemented.
“OpenAI has said it can predict the performance of its model 3.5 and model 4 while projecting the scaling laws onto growth of the model versus the training dataset,” he explained. “That is very useful, because then the companies can plan that it’s going to take them 10 times more computation, or three times more data sets, to be able to get to the next generation accuracy improvement. These laws are still being used, and even though they were developed for a very small set of models, they can scale well in terms of the model insights into this. The closed-source companies that are developing the models — for example, OpenAI, Anthropic, and others are developing models that are not open — can optimize in a way that we don’t understand. They can optimize for both training as well as the deployment of the model, because they have better understanding of it. And because they’re investing billions of dollars into it, they must have better understanding of how it needs to be scaled. ‘In the next two years, this is how much funding I need to raise.’ It is very predictable. That allows users to say, ‘We are going to set this much compute. We’re going to need to build this many data centers, and this is how much power I’m going to need.’ It is planned quite well.”
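A minimal sketch of that kind of planning follows, using the commonly cited rule of thumb that training compute is roughly 6 FLOPs per parameter per token. The model size and token counts are assumptions for illustration, not OpenAI’s figures:

```python
# Sketch of scaling-law planning: training compute ~ 6 * parameters * tokens.
# Model size and token counts below are illustrative assumptions.
def training_flops(params: float, tokens: float) -> float:
    """Rule-of-thumb training compute."""
    return 6.0 * params * tokens

this_gen = training_flops(params=400e9, tokens=10e12)   # assumed current model
next_gen = training_flops(params=400e9, tokens=30e12)   # "three times more data"
print(f"Compute, and roughly energy, grows ~{next_gen / this_gen:.0f}x between generations")
```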
Stranded power
A key aspect of managing the increasing power demands of large-scale AI involves data center design and utilization.
“The data center marketplace is extremely inefficient, and the inefficiency is a consequence of the split between the two market spaces of the building infrastructure and the EDA side where the applications run,” said Hassan Moezzi, founder of Future Facilities, which was acquired by Cadence in July 2022. “People talk about the power consumption and the disruption that it’s bringing to the marketplace. The AI equipment, like NVIDIA has, is far more power-hungry perhaps than the previous CPU-based products, and the equivalency is not there because no matter how much processing capability you throw at the marketplace, the market wants more. No matter how good and how efficiently you make your chips and technology, that’s not really where the power issue comes from. The power issue comes from the divide.”
According to Cato Digital, in 2021, 105 gigawatts of power was created for data centers, but well over 30% of that was never used, Moezzi said. “This is called stranded capacity. The data center is there to give you the power to run your applications. That’s the only reason you build these very expensive buildings and run them at huge costs. And the elephant in the room is the stranded capacity. However, if you speak to anybody in the data center business, especially on the infrastructural side, and you say, ‘stranded capacity,’ they all nod, and say they know about it. They don’t talk about it because they assume this is only about over-provisioning to safeguard risk. The truth is that some of it is over-provisioning deliberately, which is stranded capacity. But they do over-provisioning because they don’t know what’s going on inside the data center from a physics point of view. The 30%-plus statistic doesn’t do the situation justice in the enterprise marketplace, which is anybody who’s not hyperscale, since those companies are more efficient given their engineering orientation, and they take care of things. But the enterprises, the CoLos, the government data centers, they are far more inefficient. This means if you buy a megawatt of capacity — or you think you bought a megawatt — you will be lucky as an enterprise to get 60% of that. In other words, it’s more than 30%.”
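Putting rough numbers on those utilization figures makes the scale of stranded capacity clearer; the sketch below simply restates the percentages quoted above:

```python
# Quantifying the stranded-capacity figures quoted above.
provisioned_gw = 105           # capacity built for data centers in 2021, per Cato Digital
stranded_fraction = 0.30       # "well over 30% ... never used"
enterprise_utilization = 0.60  # enterprises "lucky to get 60%" of what they buy

print(f"Stranded overall:        ~{provisioned_gw * stranded_fraction:.0f} GW")
print(f"Per enterprise megawatt: ~{1000 * (1 - enterprise_utilization):.0f} kW never delivered")
```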
This is important because a lot of people are jumping up and down about environmental impacts of data centers and the grids being tapped out. “But we’re saying you can slow this process down,” Moezzi said. “You can’t stop data centers being built, but you can slow it down by a huge margin by utilizing what you’ve already got as stranded capacity.”
Conclusion
Generative AI is unstoppable, and attempts to slow it are unrealistic, given its rapid spread and popularity. But it can be significantly more efficient than it is today, and this is where economics will drive the industry. What’s clear, though, is there is no single solution for making this happen. It will take a combination of factors, from more efficient processing, to better AI models that can achieve sufficiently accurate results using less power, to more effective utilization of the power that is available today.
Related Reading
AI Drives IC Design Shifts At The Edge
Rollout of artificial intelligence has created a whole new set of challenges, along with a dizzying array of innovative options and tradeoffs.
HW and SW Architecture Approaches For Running AI Models
Custom hardware tailored to specific models can unlock performance gains and energy savings that generic hardware cannot achieve, but there are tradeoffs.