Striking A Balance On Efficiency, Performance, And Cost

More efficient designs can save a lot of power, but in the past those savings have been co-opted for higher performance.

Experts at the Table: Semiconductor Engineering sat down to discuss power-related issues such as voltage droop, application-specific processing elements, the impact of physical effects in advanced packaging, and the benefits of backside power delivery, with Hans Yeager, senior principal engineer, architecture, at Tenstorrent; Joe Davis, senior director for Calibre interfaces and EM/IR product management at Siemens EDA; Mo Faisal, CEO of Movellus; and Trey Roessig, CTO and senior vice president of engineering at Empower Semiconductor. This discussion was held in front of a live audience at DAC. To view part one of this discussion, click here. Part 2 is here.


[L-R]: Tenstorrent’s Yeager; Siemens EDA’s Davis; Movellus’ Faisal; Empower’s Roessig. Source: Jesse Allen/Semiconductor Engineering

SE: As you start lowering the voltage, which is the way we’ve dealt with a lot of these issues in the past, tolerances become tighter, the impact of noise is greater. These issues weren’t major factors at older nodes. And it gets worse as you move into 2.5D and 3D, right?

Yeager: The voltage droop alone is problematic. Max VCC on reliability is going to drop on newer nodes, and min VCC is going to be a challenge coming up. So the voltage headroom gets smaller, and droops consume more of that headroom. We need to figure out ways to deal with the pathological droop cases that real workloads don't actually hit, so that real workloads can run without extra margin. Otherwise, we're running into a space where DVFS scalability is going to be something like 200 millivolts or less.
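Yeager's headroom arithmetic can be sketched as follows. This is an illustrative back-of-the-envelope calculation only; all voltage numbers are assumptions chosen to show the trend he describes, not figures from the panel.

```python
# Illustrative only: as the max-to-min VCC window shrinks at newer nodes
# while droop margin stays roughly fixed, the range left for DVFS
# scaling collapses. All numbers below are assumed for illustration.

def dvfs_range_mv(vmax_mv, vmin_mv, worst_droop_mv, guardband_mv):
    """Usable DVFS voltage range after reserving margin for droop.

    vmax_mv:        reliability-limited max VCC
    vmin_mv:        functional min VCC
    worst_droop_mv: worst-case transient droop to margin for
    guardband_mv:   additional static guardband (aging, sensing error)
    """
    window = vmax_mv - vmin_mv
    return window - worst_droop_mv - guardband_mv

# Hypothetical older node: wider window
older = dvfs_range_mv(vmax_mv=1050, vmin_mv=550, worst_droop_mv=80, guardband_mv=50)
# Hypothetical newer node: max VCC drops, same droop margin
newer = dvfs_range_mv(vmax_mv=850, vmin_mv=550, worst_droop_mv=80, guardband_mv=50)
print(older, newer)  # 370 170 mV with these assumed numbers
```

The point is structural, not numeric: the droop and guardband terms are roughly constant costs, so they consume a growing fraction of a shrinking window, pushing the scalable range toward the ~200 mV regime Yeager mentions.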

Davis: We need to figure out how a chip is going to operate in real life so you know where it matters that there’s droop. Then you can model those things and only make corrections where it’s important. That’s a really important development going forward.

Roessig: This is where 2.5D and 3D can help, because if you are converting inside the package, what your chip sees and what you’re converting are a lot closer than outside the package. You gain some back, but you do have to deal with the cost and the thermal modeling. There’s a whole lot of complexity, but at least you’ve taken down that PDM (power distribution module) problem, to some extent. We’ve done some studies where you look at power conversion outside, and you look at the load transient outside, in the package, and in the chip, and those are wildly different.

Faisal: The transistor is where it all counts. This continues to create opportunities for everybody. But when it comes to solving the power problem, it's 'this…and.' Whatever technology and ideas people have, let's grab it and add something else. That's all the sensors, tools, packaging technologies, packaging modeling, droop mitigation, more intelligence, and faster voltage regulators. All of that adds up. Solving the big problem of energy per inquiry isn't something that questions the value of AI. An acquaintance of mine is 85 years old, and ChatGPT is actually keeping her sane in her older years. Can you put a dollar amount on that? Maybe you're saving the nurses some time, but it goes beyond what we can calculate. So it's not a competitive situation. It's more collaborative: 'Give me what technology you have, I'm going to develop my own, and we're going to go after this mission that energy per inquiry needs to go down.'

Yeager: There are a lot of permutations. It's not obvious to me that you can take an integrated VR and merge that with reactive systems. When you integrate the VR and the decoupling capacitors, you completely change your PDN resonance. You completely change your droop slew rate. First droop is now very different from second droop. Will reactive systems work together with that? You don't know until you do a fairly detailed analysis for a very specific design.
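The resonance shift Yeager describes follows from the first-order PDN model, where the resonant frequency is 1/(2π√(LC)). The sketch below uses assumed component values purely to illustrate the magnitude of the shift when loop inductance and capacitance both move by orders of magnitude; none of the values come from the panelists' designs.

```python
import math

# First-order PDN resonance: f = 1 / (2*pi*sqrt(L*C)).
# Integrating the regulator and decoupling caps in-package changes
# both L and C, moving the resonance (and the droop slew rate) by
# orders of magnitude. All component values below are assumptions.

def pdn_resonance_hz(l_henry, c_farad):
    return 1.0 / (2.0 * math.pi * math.sqrt(l_henry * c_farad))

# Board-level loop: nH-scale inductance, uF-scale decap (assumed)
board = pdn_resonance_hz(1e-9, 10e-6)      # ~1.6 MHz
# In-package integrated VR: pH-scale loop, nF-scale on-die cap (assumed)
in_pkg = pdn_resonance_hz(10e-12, 100e-9)  # ~160 MHz
```

A reactive droop-mitigation scheme tuned for a megahertz-scale resonance sees a completely different problem at hundreds of megahertz, which is why Yeager says the combination has to be re-analyzed per design.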

Davis: It’s a complicated control system.

Yeager: Yes, and the number of permutations that need to be evaluated to determine what's optimal for the problem you're trying to solve with your chip is going to be really challenging.

SE: Where is the low-hanging fruit? Is it less movement of data, sparser algorithms, adjusting your design for real workloads?

Davis: There is no low-hanging fruit. It’s all hard.

Roessig: The move to backside power delivery is ‘lower’ hanging fruit. It’s not easy by any stretch of the imagination. Any closer and you’re inside the package, and that gets a lot more complex. But backside power is already happening. If the voltage doesn’t have to run laterally to get from point A to point B, and instead is right under point B, that’s relatively free efficiency if you can pull it off.

Yeager: Backside power delivery will help some. I've been in enough meetings that I'm sure someone is going to say, 'What? I can get a low-RC wire and run a bus on the backside where that power is, and steal some of those power wires?' But one of the biggest things backside power is going to do is get rid of the vertical obstruction of wires when you do power gates. When you do power gates in an area and have true VDD coming down and virtual VDD going up, the porosity of metal is almost zero. Being able to move that to the backside lets you route over those with impunity.

Davis: Backside power gives you a level up. FinFETs got us something. Optoelectronics gets you something. All of these are step functions. They’re never easy. They’re the result of decades of research. We’re going to look at fully optical communication between chassis. And all of these allow us to lower the power, and then to spend it on performance.

Faisal: The big guys all have their performance architects who are optimizing every single day and looking ahead. How do we democratize that and bring it to the masses in the semiconductor industry? That would be a huge step, because big chips are no longer just for the top 10 guys. It’s now everybody’s business. There is a 10-person team building a reticle-sized chip.

SE: Where do specialized processors fit into the picture? Are they going to be successful, or is it all going to be GPUs in a data center?

Yeager: If you step back and look over the past 50 or 60 years, we had ASICs, and then the general-purpose computer came out. That was great, because hardware was expensive, and you could amortize the cost of the hardware. The software was cheap. Now software is expensive. Many times you put something in hardware, and the software guys say they don't have time to implement it. 'The product is good enough.' So we ship it. You see this in the SoC. We're going to move back to dedicated hardware to do exactly what we want. SystemC and high-level synthesis are another way people can rapidly prototype things and do this. There was an era of dark silicon where people just powered things off. The problem with that is you still have to route over it. So where the economics make sense, because there is enough demand for it, we're going to move to totally dedicated IPs that do exactly what we want, and then balance the memory hierarchy with that and the software.

Davis: Absolutely. If you look at modern CPU design, it's not just one big CPU. There are dozens of cores on there, and they're not all the same. There are different cores that are optimized for different loads and different things. It's the same for image sensors today. Image sensors used to just transmit the whole image. But we can save power by putting the processing on the same die, sometimes even in the sensor cell itself. That's extremely specialized. You're doing pattern recognition on the die. You only transmit an indicator that says what this object is, rather than sending the whole image somewhere else. You've saved on communication costs, on the power of the system, and on security. Where it makes sense — where you know what your application is going to be — that is the biggest lever we have. You don't use a general algorithm. You use very specific things to achieve very specific ends.
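The scale of the savings Davis describes can be illustrated with a quick calculation. The frame format and message size below are assumptions chosen for the example, not specifications from any real sensor.

```python
# Back-of-the-envelope data reduction (assumed numbers) from sending a
# classification result instead of the raw frame, as Davis describes.

raw_frame_bytes = 1920 * 1080 * 2   # hypothetical 1080p sensor, 16 bits/pixel
label_bytes = 8                     # hypothetical small object-ID message

savings = raw_frame_bytes / label_bytes
print(f"{savings:,.0f}x less data per frame")
```

Even with generous allowances for protocol overhead, moving recognition on-die turns a multi-megabyte-per-frame link into a few bytes per event, which is where the communication-power savings come from.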

Faisal: Here’s an analogy. Humans have a natural intelligence, and we have a really sophisticated way of communicating. When I say ‘cat,’ everyone has an image of a cat. That’s really efficient communication. It’s data movement. How do we figure out how to do data movement between AI systems in a way that is similar to how humans communicate? As Joe was saying, figure out what it is and then only transmit the results. Developments in that direction would be very helpful, and would definitely move the needle.

Roessig: I agree that it’s going to go toward the targeted silicon, not using any power you don’t need and no dark silicon to route over. It’s back to the system-level problem. Do the tools keep up with the average customer’s ability to put this all together and get it to work?

SE: Is this problem solvable by the chip industry, or will we ultimately need more power sources?

Yeager: We either need more power sources, or the power becomes expensive enough that the economics stop us from playing around with ChatGPT just for entertainment. Then it balances itself out. For training and inference, the power consumption is huge. Let’s say we cut 30% out of that with algorithms and eliminating margin. Then we’ll just use it up. But something has got to give.

Davis: People put data centers next to hydroelectric dams for a reason. Do we need more? That will depend on how efficient we are, how fast we grow, and what the economics are. Where do we put them, how do we fund them, and what do we get out of it?

Faisal: There has been discussion about small, modular nuclear reactors next to data centers. There are real dollars being invested in that. So imagine if you’re a data center company with a nuclear reactor.

Davis: And a Geiger counter.

Yeager: And next to a large body of water that’s not on a fault line.


