Managing kW Power Budgets

Strategies for dealing with increasing compute demands from AI and other applications.


Experts at the Table: Semiconductor Engineering sat down to discuss increasing power demands and how to address them with Hans Yeager, senior principal engineer, architecture, at Tenstorrent; Joe Davis, senior director for Calibre interfaces and EM/IR product management at Siemens EDA; Mo Faisal, CEO of Movellus; and Trey Roessig, CTO and senior vice president of engineering at Empower Semiconductor. This discussion was held in front of a live audience at DAC.


[L-R]: Tenstorrent’s Yeager; Siemens EDA’s Davis; Movellus’ Faisal; Empower’s Roessig

SE: We’re seeing massive jumps in power consumption. Is this all about AI, or is there more going on?

Yeager: There are a couple of things happening. For a long time we’ve had process node compaction, so power density has been increasing just from process scaling. On top of that is data movement, which is a huge consumer of power. And AI is really driving up computation.

Davis: From the design side, it’s always about getting more in the same bucket. You’ve got reticle size limits, although now we’re pushing even beyond that. But I don’t want to be melting silicon, so how do I make it so I can win that socket? The answer is you need more and more tricks to meet that requirement, whether that’s for communication or high-performance computing. Across the board, you see power demands increasing — except where power is your limiting factor, such as in mobile devices.

Faisal: It’s opening up a lot of opportunities for innovation. Everybody has to worry about margins and timing, and all the over-design that you put into your chip design to close it. All of that needs to be squeezed out by leveraging techniques at every level, from the transistor all the way up to the system. But timing is a big deal. The quality of your clock directly tells you your Fmax and Powermax. There are a lot of problems to solve.
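
As a rough sketch of the relationship Faisal is pointing to, the following back-of-envelope Python example shows how clock uncertainty eats directly into the cycle time, and therefore into Fmax and dynamic power. All numbers here (critical-path delay, jitter/skew budget, effective switched capacitance) are hypothetical, chosen only to illustrate the tradeoff.

```python
# Illustrative only: hypothetical numbers showing how clock quality
# (jitter/skew) trades directly against Fmax and dynamic power.

def fmax_ghz(critical_path_ps, clock_uncertainty_ps, margin_ps):
    """Maximum clock frequency once jitter/skew and design margin are budgeted."""
    period_ps = critical_path_ps + clock_uncertainty_ps + margin_ps
    return 1000.0 / period_ps  # 1000 ps per ns -> GHz

def dynamic_power_w(c_eff_nf, vdd, f_ghz):
    """Classic C*V^2*f estimate of switching power (C in nF, f in GHz -> W)."""
    return c_eff_nf * 1e-9 * vdd**2 * f_ghz * 1e9

# A design with a 250 ps critical path:
noisy_clock = fmax_ghz(250, clock_uncertainty_ps=40, margin_ps=30)
clean_clock = fmax_ghz(250, clock_uncertainty_ps=15, margin_ps=30)

print(f"Fmax with 40 ps clock uncertainty: {noisy_clock:.2f} GHz")
print(f"Fmax with 15 ps clock uncertainty: {clean_clock:.2f} GHz")
print(f"Dynamic power at 0.75 V, 50 nF effective, {clean_clock:.2f} GHz: "
      f"{dynamic_power_w(50, 0.75, clean_clock):.0f} W")
```

The same reclaimed margin can instead be spent on lowering the supply voltage at a fixed frequency, which is usually where the power savings come from.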

Roessig: For years, the standard way to get power from point A to point B has worked just fine. But as densities and the power levels in the chip have increased to where they are now, bottlenecks that were never an issue before are starting to show up. It’s equivalent to trying to shove too much water through the same size pipe. We’ve gotten to the point where, just in terms of the characteristics of the packaging, you can only carry so much power into the silicon. So it’s very much a case of modern power density surpassing the mechanical capability of getting power from point A to point B.

SE: If we stay on this trajectory, what happens? Can we just keep adding more and more devices?

Faisal: We are going to stay on this trajectory because of the demand, the applications, the workloads. AI is kind of like the raw material. You can go make specific things around applications in automotive and mobile and edge. That’s not stopping. What that means is we need a lot of awareness on the part of the systems guys and the software guys as they’re programming. What’s happening with the power delivery when I write this line of code? The trajectory is staying, so we have to innovate around it.

Davis: Humans are good at doing more. Even less is more. But what is the value of that? What are we willing to pay for? In certain applications, it’s worth it. In others, power density or long-lasting battery life are more important. And so the market is bifurcating into different applications. At the high end, where you need power to get the value, you’re going to continue along that trend.

Yeager: We’re going to see another trend, though, which is more and more bespoke IP blocks that do exactly what’s needed. We’ve already seen a lot of this in SoCs. We have video encoders and decoders, and they’re super-efficient at doing exactly what they do. You turn them off when you don’t need them, and you monitor them when you do need them. Now you’re seeing people using GPUs for AI, and Tenstorrent is working on moving functions out of the GPU onto more efficient AI-specific hardware. We’ll continue to see more of that trend — to build exactly the hardware that you need. That’s one of the ways we’re going to get better optimization. The other is going to be improving the efficiency of what we’re building.

Roessig: A few years ago you had to be an Intel or an NVIDIA to produce a chip that was really hard to power. Now anyone can do it. This year we’re starting to see chips that are that difficult to power built in processes everybody has access to, and it only gets worse from there. I don’t see that bottleneck being alleviated anytime soon.

SE: Are the design tools and methodologies capable of dealing with this much power? That’s critical because it’s not just Intel anymore. About 30% to 35% of designs are going to systems companies that are creating massive data centers.

Roessig: It’s a multi-disciplinary problem. That’s the big thing, and that absolutely plays into the tools. For example, we’re doing converters, but we are not a silicon company. We are co-designing the silicon, the package, the inductance, the capacitance. It all has to meld together. We’re past the point where treating everything as an individual component works with these kinds of speeds and parallelism.

Davis: The software often lags the innovation in the hardware. What is it that the software needs to do? Today, we can manage with the software. We can get the designs done, as evidenced by the fact that all these companies are bringing chips to market. Are the tools limiting? No. But what they can do is create better efficiencies. Can we do it cheaper? Can we create more automation so you can explore these corners faster and get better optimization? For a lot of these tricks Hans is talking about, we’re going to do the specialized thing instead of the generic thing. That applies to all algorithms. You’ve got to get more specialized, and that requires more corners and more exploration from the software.

Yeager: I still think there’s a gap. To really do this right, we can’t use the tools and run extraction once the [power distribution network] is done. By the time you’re that far along in the design, either you’re taping out or you’ve missed your tape-out by a year, or a year-and-a-half. Somehow, we’ve got to pull stuff like PDN analysis into the architectural phase of the development. We need to be able to simplify the problem down to just what we need to solve because this problem is too complex. And what we’re finding now is that you’ve got to pull power consumption all the way up into the architectural phase and your high-level block diagrams, because it will make or break your products.
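
A minimal sketch of what that architectural-phase PDN awareness could look like at its crudest: an Ohm’s-law IR-drop budget check per block, long before any extraction exists. The block names, block powers, and the lumped PDN resistance below are hypothetical placeholders, not numbers from the discussion.

```python
# Minimal, illustrative architectural-phase PDN sanity check.
# All block powers and the effective PDN resistance are hypothetical;
# a real flow would refine these with extraction much later.

VDD = 0.75           # supply voltage (V)
DROOP_BUDGET = 0.05  # allowed static IR drop as a fraction of VDD

blocks = {            # block name -> estimated peak power (W)
    "ai_compute_cluster": 350.0,
    "noc_and_sram":        80.0,
    "serdes_io":           45.0,
}

r_pdn_effective_mohm = 0.1   # hypothetical lumped PDN resistance per block (milliohms)

for name, power_w in blocks.items():
    current_a = power_w / VDD                            # I = P / V
    ir_drop_v = current_a * r_pdn_effective_mohm * 1e-3  # V = I * R
    ok = ir_drop_v <= DROOP_BUDGET * VDD
    print(f"{name:22s} {current_a:7.1f} A  drop {ir_drop_v*1000:5.1f} mV  "
          f"{'OK' if ok else 'EXCEEDS BUDGET'}")
```

Even a crude check like this can flag, at the block-diagram stage, which parts of the floorplan will need more power straps, more bumps, or a rethink of where the current is drawn.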

Faisal: There’s another side to this. You can make the design tradeoffs and margin tradeoffs by putting intelligence in silicon, which is made up of circuits and IPs and sensors. That way you can move your design challenges into silicon, where it automatically detects and corrects things. Adaptive clocking is an example of that. You sense what’s happening to your power network. If there’s a droop, you stretch the clock, and you can reduce the margin. That’s very complex to analyze at the chip level. You need your workload, all kinds of models and extractions — not just RC extractions or LT extractions. You also need the inductances, and that is much more difficult. It’s a huge opportunity, and an absolute must. I’m waiting for the day when place-and-route tools can automatically put these intelligent sensors in there as the chip is being designed. That aligns with the full up-and-down co-design that everyone has been talking about.
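
To make the adaptive-clocking idea concrete, here is a toy control loop, not any vendor’s actual implementation, that stretches the clock period when a sensed supply droop crosses a threshold and relaxes back toward nominal once the droop recovers. All thresholds, gains, and the voltage trace are invented for the illustration.

```python
# Illustrative sketch of adaptive clocking: stretch the clock period when a
# supply droop is sensed, then recover toward the nominal period.
# This is a toy model, not any real droop-response circuit.

NOMINAL_PERIOD_PS = 300.0
VDD_NOMINAL = 0.75
DROOP_THRESHOLD = 0.72      # stretch the clock below this sensed voltage
MAX_STRETCH = 1.25          # cap period stretch at 25%

def next_period(sensed_vdd, current_period_ps):
    """Return the next clock period given the sensed supply voltage."""
    if sensed_vdd < DROOP_THRESHOLD:
        # Deeper droop -> proportionally longer period, up to the cap.
        stretch = 1.0 + (DROOP_THRESHOLD - sensed_vdd) / VDD_NOMINAL * 5.0
        return NOMINAL_PERIOD_PS * min(stretch, MAX_STRETCH)
    # No droop: relax back toward the nominal period.
    return max(NOMINAL_PERIOD_PS, current_period_ps * 0.98)

# Simulate a droop event caused by a sudden load step.
vdd_trace = [0.75, 0.74, 0.70, 0.68, 0.70, 0.73, 0.75, 0.75]
period = NOMINAL_PERIOD_PS
for vdd in vdd_trace:
    period = next_period(vdd, period)
    print(f"Vdd={vdd:.2f} V -> clock period {period:.1f} ps")
```

The point of the technique is that the hardware rides through the droop at a slightly lower frequency instead of the designer provisioning worst-case voltage margin for it up front.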

Davis: That’s the balance with variation, as well. Do you put margin in there to get all the way to six sigma, or do you adjust for that when you hit it?

SE: Given the growth rate of AI, are we now going to run into issues where you can’t do some of this stuff anymore?

Yeager: This circles back to the first question. One of the ways we’re doing the optimization, and actually reducing total power, is by getting chiplets and computation closer together. That increases the power density problem, so then it becomes a power delivery and a thermal challenge. What we’re trying to achieve is reducing data movement power and overhead. That’s the trajectory we’re on, and it will continue to evolve and make improvements.

Davis: It’s all about economics. If we continue on this path, how expensive is it at the chip level, the system level, and the compute-center level? Power is a function of the cost of communication and of the computing element itself. We have transistors that do the computing, and wires and communication that make them work together. Chiplets allow us to put things closer together and take out that cost. You can do the same thing with the distances between your computing elements, and between racks in computing centers. Supercomputers today draw megawatts when they’re running. There have been projections of them reaching tens or hundreds of megawatts, which is the power consumption of a city. Can we tolerate that? Well, it depends on what value we get out of it as a culture, and individually, what we are willing to pay for. So yes, we’re going to run into limits, and we’re going to continue to innovate on all those pieces that reduce the cost of communication between the computing elements. That will enable us to continue to scale where it’s valuable.
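
The communication-cost argument can be sketched with a toy energy model. The picojoule-per-bit figures below are invented for the illustration, but they show why shortening the distance data has to travel (on-die wire, die-to-die chiplet link, or off-package link) comes to dominate the total once enough bits are moving.

```python
# Illustrative only: hypothetical energy-per-bit figures showing why moving
# compute closer together (chiplets, shorter links) cuts total energy.
# The pJ/bit values below are made up for the example, not measured data.

ENERGY_PER_BIT_PJ = {
    "on_die":       0.1,   # hypothetical: short on-die wires
    "chiplet_link": 0.5,   # hypothetical: die-to-die link in the same package
    "off_package":  5.0,   # hypothetical: board-level link to another package
}

def workload_energy_j(compute_j, bits_moved, link):
    """Total energy = compute energy + data-movement energy over one link type."""
    movement_j = bits_moved * ENERGY_PER_BIT_PJ[link] * 1e-12
    return compute_j + movement_j

BITS = 8 * 1e12          # 1 TB of activations/weights shuffled per step
COMPUTE_J = 2.0          # hypothetical compute energy per step

for link in ENERGY_PER_BIT_PJ:
    total = workload_energy_j(COMPUTE_J, BITS, link)
    print(f"{link:13s}: {total:6.2f} J per step")
```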

Faisal: Every time you ask ChatGPT a question, it’s equivalent to keeping a 60-watt light bulb on for two to three minutes. Now try doing that 10 million times a day. And by the way, it’s projected to be tens of billions of inquiries in the next four to five years. So the problem is big. You can do the numbers. It’s absolutely ridiculous. Innovation is needed at every level, starting at the backside vias. As an industry, we’ll figure out a way to get through this. It’s all about efficiency per operation or per question. That’s the metric, and even my layout designers should be aware of it. That’s how you innovate, even if it’s not their problem or what they’re being held accountable for. And the software guys need to be aware of that metric, as well. Part of it is just education. It’s not just my memory usage when I’m coding. It’s also power usage and energy usage, so I know what the cost of my code is when it’s running.
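
Working through the figures quoted above as a back-of-envelope check:

```python
# Back-of-envelope check of the figures quoted above: a 60 W bulb for
# two to three minutes per query, 10 million queries per day.

BULB_W = 60
MINUTES_PER_QUERY = 2.5          # midpoint of "two to three minutes"
QUERIES_PER_DAY = 10_000_000

joules_per_query = BULB_W * MINUTES_PER_QUERY * 60      # W * s = J
wh_per_query = joules_per_query / 3600                  # ~2.5 Wh per query
mwh_per_day = wh_per_query * QUERIES_PER_DAY / 1e6      # megawatt-hours per day

print(f"Energy per query : {joules_per_query/1000:.0f} kJ (~{wh_per_query:.1f} Wh)")
print(f"Energy per day   : {mwh_per_day:.0f} MWh at {QUERIES_PER_DAY:,} queries/day")
```

If query volume were to grow to tens of billions per day, the same arithmetic lands in the tens of gigawatt-hours per day, which is why efficiency per query becomes the metric.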

Davis: There’s a lot of research being done in energy-efficient algorithms. For example, ‘Don’t do a sort the way you’ve always done it. If you do it this way, it’s going to be much more efficient.’ MIT has done a lot of research on that.

Roessig: At the end of the day, though, it’s going to be a macro-economic discussion about how much power for the value we’re getting. At the component level, the package can only take in two kilowatts, or whatever it is, because we can’t get the heat out. So now we have to put two chips next to each other. The ultimate limiter is the roll-up of all this power versus how much value we’re getting. We’ll have a lot of efficiencies in algorithms and everything else. But if you build a bigger highway, more cars will show up. Ultimately it’s going to be a tradeoff of value for power.


