Using Real Workloads To Assess Thermal Impacts

A number of methods and tools can be used to determine how a device will react under thermal constraints.


Thermal analysis is being driven much further left in the design flow, fueled by demand for increased transistor density and more features on a chip or in a package, as well as the unique ways the various components may be exercised or stressed.

However, getting a clear picture of the thermal activity in advanced-node chips and packages is extremely complex, and it can vary significantly by use case and by application. Unlike in the past, simply adding more margin into the design no longer works, because that extra circuitry can push a design past the thermal budget due to increased resistance, capacitance, and dynamic power density, not to mention current leakage. So thermal modeling needs to be more accurate, based on more and better data about how a chip will be used, and it needs to happen earlier, with continual monitoring throughout the design cycle.

“If you want the most accurate thermal results, you need true inputs — layout, power, and all of that,” said John Ferguson, director of product management at Siemens EDA. “The closer you have to what is real, the better. The problem is that you’ve got chiplets being designed by multiple organizations, maybe not even in the same company. And then you’ve got packaging, maybe from another organization. All this data is coming together at the same time. The only time you have the true real workloads is when you’re done, but if you wait to do your thermal analysis and you find a problem, it’s too late. You’ve already lost your window.”

Effectively addressing thermal issues requires analysis at two levels — benchmarks and use cases — both of which are based on real workloads. “Benchmark analysis pushes the design to its maximum level to see how long it will last at that maximum,” said Melika Roshandell, product management director at Cadence. “When you do a CPU design and you have, let’s say, eight cores in your SoC, you can say the maximum frequency that each of these cores can go is 3GHz. You then design a benchmark that puts all these eight cores running at the same time at 3GHz to see how long they last.”

Thermal profile of a phone.
Source: Cadence

That becomes the basis for a benchmark score. If the cores last for three seconds at 3GHz, that may be better or worse than what the competition is doing. “By three seconds, if 99% of the time the chip stops running at 3GHz and drops to lower and lower frequencies, it’s because of your thermal,” Roshandell said. “So you’re hitting your thermal mitigation, because imagine the eight cores running at 3GHz. That’s so much power that none of the chips is going to run more than one or two seconds before the thermal mitigation hits and reduces the frequency to reduce that temperature.”
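
To make that throttling behavior concrete, here is a minimal sketch of a benchmark-style run, assuming a first-order lumped thermal model: eight cores start at their maximum frequency, and the clock steps down each time a junction-temperature threshold trips. Every constant in it (thermal resistance, per-core power, temperature limit) is a made-up placeholder for illustration, not data from any real SoC or tool.

```python
# Toy benchmark-style throttling sketch (illustrative only; all numbers are
# hypothetical, not taken from any real SoC or from the tools quoted here).
AMBIENT_C = 35.0            # ambient temperature, deg C (assumed)
R_TH = 3.0                  # junction-to-ambient thermal resistance, K/W (assumed)
C_TH = 0.5                  # lumped thermal capacitance, J/K (assumed)
T_THROTTLE_C = 95.0         # junction temperature that triggers mitigation (assumed)
P_PER_CORE_AT_FMAX = 3.0    # watts per core at the maximum frequency (assumed)
FREQ_STEPS_GHZ = [3.0, 2.6, 2.2, 1.8, 1.4]

def run_benchmark(duration_s=10.0, dt=0.01, n_cores=8):
    temp = AMBIENT_C
    step = 0
    last_drop_t = -1e9
    t = 0.0
    while t < duration_s:
        freq = FREQ_STEPS_GHZ[step]
        # crude assumption: dynamic power scales roughly linearly with frequency
        power = n_cores * P_PER_CORE_AT_FMAX * (freq / FREQ_STEPS_GHZ[0])
        # first-order RC thermal update: dT/dt = (P - (T - T_amb)/R_th) / C_th
        temp += dt * (power - (temp - AMBIENT_C) / R_TH) / C_TH
        if (temp > T_THROTTLE_C and step < len(FREQ_STEPS_GHZ) - 1
                and t - last_drop_t > 0.5):   # crude hysteresis between drops
            step += 1
            last_drop_t = t
            print(f"t={t:5.2f}s  T={temp:5.1f}C  throttling to {FREQ_STEPS_GHZ[step]} GHz")
        t += dt
    return temp

if __name__ == "__main__":
    run_benchmark()
```

With these assumed numbers the first mitigation event lands a little before the three-second mark, which is the kind of tradeoff a benchmark score is meant to expose.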

Nevertheless, that benchmark analysis helps understand what will happen with a real-world use case. “A use case could be a video recording,” she said. “The video recording doesn’t require a lot of frequencies in CPUs. It doesn’t need a lot of processing power in the CPU. It needs more in the GPU. Let’s say you’re doing a FaceTime with your family. In this case, your modem is also activated, or your WiFi chip is activated, so the use cases are not really pushing to the maximum frequency. But they do have multiple IPs acting at the same time. Based on that, you can determine what the user experience is going to be. If your thermal mitigation hits and reduces the GPU frequency, how does that affect the user experience? Determining the workload and doing the thermal analysis is one of the most important aspects to determine how your design is actually going to work in a real product.”

Others agree. “Using a real workload versus a synthetic simulation vector provides a means of generating a realistic power profile, which is needed for thermal analysis, based on the software application and actual data rates in the system,” said William Ruby, product management director in Synopsys’ EDA Group. “Using synthetic simulation vectors may result in an overly pessimistic thermal result, leading to unnecessary high costs of packaging and cooling.”

These issues only become more pressing at each new process node and in advanced packages, where heat may be harder to dissipate.

“In the future, designers will need more information at a much earlier stage of design,” said Andy Heinig, head of department for efficient electronics at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “Currently, a lot of information is only visible at a later design stage — too late to change the architecture. More flexible architectures will also be needed in the coming years. The current architectures are too inflexible to change on a project-specific basis and at a reasonable cost.”

Designing for use cases
This is where advanced packaging fits in, because it can be customized more easily than trying to pack everything onto a planar SoC. But package choices also can have a significant impact on the ability to dissipate heat. Design engineers will need to make sure that maximum workloads do not push the package past its thermal design power (TDP) limit or the junction temperature past its rated maximum, said Eric Hong, vice president of engineering at Mixel.
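
As a rough illustration of that check, the snippet below compares a sustained workload against a hypothetical package TDP and junction-temperature limit using the familiar estimate Tj = Ta + P × θJA. The θJA, TDP, and limit values are placeholders, not vendor numbers.

```python
# Back-of-the-envelope junction-temperature check; all values are assumptions.
THETA_JA_C_PER_W = 1.8   # junction-to-ambient thermal resistance (assumed)
TJ_MAX_C = 105.0         # junction temperature limit (assumed)
TDP_W = 15.0             # package thermal design power (assumed)

def check_thermal_budget(workload_power_w, ambient_c=45.0):
    """Return (junction_temp, within_budget) for a sustained workload power."""
    tj = ambient_c + workload_power_w * THETA_JA_C_PER_W   # Tj = Ta + P * theta_ja
    ok = (workload_power_w <= TDP_W) and (tj <= TJ_MAX_C)
    return tj, ok

# Example: a maximum workload drawing 14 W in a 45 degC ambient
tj, ok = check_thermal_budget(14.0)
print(f"Tj = {tj:.1f} C, within budget: {ok}")
```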

Workloads are not always easy to determine, however. They can change by user, by application, and over time through aging and software and firmware updates. So while modeling and simulation are important, getting them right isn’t always straightforward.

“Chipmakers have to very carefully decide what workload they want to simulate to make sure they are simulating the worst scenario for their chips, in order to understand how the heat is going to affect the performance and the reliability of the chip,” said Suhail Saif, principal product manager for power products at Ansys. “If it is an AI core chip, then naturally AI workloads are going to be the most dominant workloads, and that would also be most compute-intensive for the chip, so those are the right workloads to use to analyze the thermal effects of the chip. But if you have not-so-specific custom chips, like CPUs that do myriad things — being in a sleep mode, idle mode, high-performance gaming mode, being on a call, among hundreds of other applications — in those cases, the designers of those chips have to be intelligent and careful about which workloads they choose to simulate the chip for power and thermal in order to find the worst scenario.”

This is potentially even more difficult with chiplets. “Maybe you have an idea of where the chiplets are going to go, but you don’t necessarily know all the metallization of the chiplets, the power inputs,” said Siemens’ Ferguson. “Make your guess. Take a stab at it, and try it with three or four different ways of laying things out. This is tricky, because you’ve got three dimensions now, so you can lay these things out in an unlimited array of possibilities. Try out some of the main ones you want to see, and then at least you can throw away ones that are clearly bad ideas. Then, as data comes in, maybe you don’t have the whole die, but maybe you’ve got the DEF (design exchange format) routing, for instance, as some piece of extra information. As you get more, add that in, rerun the analysis, and eventually you hone in on a better direction.”

All of this data is then captured in a thermal model. But that opens another can of worms, because there are a lot of different models that get used here.

“When you do a detailed analysis of any chip, even a 2D chip, there’s a tremendous amount of detail that is involved,” said Joseph Davis, senior director for Calibre interfaces and mPower EM/IR product management at Siemens EDA. “When you’re doing packaging at the chip level, the EMIR tool generates a chip power model that’s consumed by the package tool. The package tool provides a package model to the chip tool. And now you add a thermal model for those chips so that you don’t have to carry around all of that data for the levels of abstraction just like you have with P&R. You’ve got LEF-DEF, you don’t have everything in the GDS when you’re doing P&R because you don’t need it. It’s the same thing here. As you go to a system-level analysis, I’m doing compact models that provide the effective modeling of that element, whether it’s a chip power model or a thermal model. I can model the thermal with resistances and capacitances so that it looks very much like an electrical circuit.”
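
Davis’ electrical analogy maps directly onto a compact model. The sketch below evaluates the step response of a small Foster-style RC thermal network, where power plays the role of current and temperature rise the role of voltage. The stage values are invented for illustration and are not tied to any particular chip, package, or tool.

```python
import math

# Minimal Foster-style compact thermal model (illustrative; R/C values assumed).
FOSTER_STAGES = [      # (R_th in K/W, C_th in J/K) -- hypothetical values
    (0.4, 0.02),       # die
    (0.9, 0.50),       # package
    (2.1, 8.00),       # board / heat spreader
]

def temp_rise_step(power_w, t_s):
    """Temperature rise above ambient for a constant power step applied at t=0."""
    return sum(r * power_w * (1.0 - math.exp(-t_s / (r * c)))
               for r, c in FOSTER_STAGES)

for t in (0.001, 0.01, 0.1, 1.0, 10.0):
    print(f"t={t:6.3f}s  dT={temp_rise_step(5.0, t):5.2f} K")   # 5 W power step
```

The attraction of such a compact model is exactly what Davis describes: the detailed layout data stays with the chip tool, while the system-level analysis only needs the handful of R and C values that reproduce the thermal response.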

Put simply, raising the abstraction level to determine power consumption and heat distribution of the overall system helps significantly, as long as design teams can drill down into the data as needed.

“In the next few years it should be possible to determine such information at the behavioral level, for example, with SystemC,” said Fraunhofer’s Heinig. “In addition, new methodologies and tools should be introduced to determine the heat distribution at the same level, and also take into account the performance of the power delivery system. Currently, the power delivery network is not considered in such analyses.”

More challenges, more data
Still, this is a lot of data to crunch, and that requires massive compute horsepower and parallelization. Synopsys’ Ruby noted that a real workload can be run on a high-performance emulator, where the emulator generates the activity database for downstream power analysis, which in turn drives thermal. “This activity database can be hundreds of millions of clock cycles, representing the actual system operation or a system boot-up sequence. In order to analyze such a large amount of activity data, a fast cycle-based power analysis capability is required. Emulation-driven power analysis solutions take the output activity database from the emulator and compute the power profile based on that data in a short amount of time.”
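
The arithmetic behind such cycle-based power analysis can be sketched simply: convert per-window switching activity into an average-power profile. The toy version below assumes a made-up activity format (toggle counts per window) and made-up coefficients; production flows read the emulator’s own database formats.

```python
# Hedged sketch of cycle-based power profiling from an activity dump.
# Input format and coefficients are assumptions for illustration only.
V_DD = 0.75                   # supply voltage, V (assumed)
F_CLK = 2.0e9                 # clock frequency, Hz (assumed)
C_EFF_PER_TOGGLE = 1.0e-15    # effective switched capacitance per toggle, F (assumed)
P_LEAK_W = 0.8                # static leakage, W (assumed)

def power_profile(toggles_per_window, cycles_per_window):
    """Convert per-window toggle counts into an average-power-per-window profile."""
    window_s = cycles_per_window / F_CLK
    profile = []
    for toggles in toggles_per_window:
        # dynamic energy = toggles * C_eff * Vdd^2; divide by window time for power
        p_dyn = toggles * C_EFF_PER_TOGGLE * V_DD ** 2 / window_s
        profile.append(p_dyn + P_LEAK_W)
    return profile

# Example: three 1M-cycle windows with rising activity (hypothetical counts)
print(power_profile([5.0e12, 1.0e13, 2.0e13], cycles_per_window=1_000_000))
```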

Ansys’ Saif agreed. “Most of the top semiconductor designers want to focus on the real workload, so emulation is the way to go. Emulation gives them that real-world scenario where they can run the real usage of the chip. The problem is scalability. An emulator dumps out billions of cycles. Typically, in a simulation, you can limit the time window and the signal size so the data is manageable, but when we talk about real-world scenarios with emulation, the scalability is a thousand or a million times more, so you are getting billions of cycles of data in terms of vectors. You need to be careful and intelligent in selecting what you choose to simulate. When you feed all these billions of cycles into simulation — any kind of simulation, either a voltage drop, IR drop, power, thermal, or security analytics — it’s going to consume that much more compute and time. But today’s timelines for delivering the chips are shrinking, so this is the problem statement.”

The solution space contains automated, scalable ways to select the most critical window out of these billions of cycles.

One approach is a transient solution, in which these billions of cycles are taken into the tool, which then generates a faster power profile. This is not real power analysis, but it is an intelligent way of finding the most critical signals in the design and tracking them, instead of tracking the entire design.

“What this does for users is give them a very accurate power profile for a long duration of the vector, which they can use to select the narrow window of simulation analysis,” Saif said. “We’ll give them the worst-case scenario. A lot of users want to stick to the real-world scenario, but then they don’t have a good way of finding what is the best window to select, and they can’t afford to just push all those billions of cycles through simulation.”
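
In principle, once a long power profile exists, picking the worst-case window reduces to a sliding-window search, as in the minimal sketch below. The profile values and window length are hypothetical, and real tools operate on far larger emulation databases.

```python
# Illustrative selection of the worst-case analysis window from a power profile.
def worst_case_window(power_profile, window_len):
    """Return (start_index, average_power) of the highest-average sliding window."""
    assert len(power_profile) >= window_len
    running = sum(power_profile[:window_len])
    best_sum, best_start = running, 0
    for i in range(window_len, len(power_profile)):
        running += power_profile[i] - power_profile[i - window_len]
        if running > best_sum:
            best_sum, best_start = running, i - window_len + 1
    return best_start, best_sum / window_len

profile = [3.1, 3.4, 9.8, 10.2, 9.9, 4.0, 3.2, 8.7, 9.1, 3.0]  # W, hypothetical
print(worst_case_window(profile, window_len=3))   # -> (2, 9.966...)
```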

Other methods use statistical approaches, including vector scoring. “If we have long vectors or many vectors coming from emulation, we have a scalable scoring method where we take those vectors and also read in the design to find out the score of each vector in terms of coverage. Which vector has the most coverage for the design? Which vector has the capacity to exercise the most power-hungry scenarios in the design, as well? Based on these two parameters, we score each of the vectors for the given design. That tells users, ‘Out of these 100 vectors, these top 5 have the highest score, so you don’t have to worry about the remaining 95 vectors.’ They can just focus on the top 5 vectors, which gives them full design coverage, and brings efficiency into the flow,” Saif noted.
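
One way to picture that kind of scoring is a weighted rank over coverage and peak power, as in the sketch below. The weights, input fields, and vector names are assumptions for illustration, not Ansys’ actual algorithm.

```python
# Hedged sketch of vector scoring: rank vectors by coverage and power-hungriness,
# then keep only the top-ranked ones for detailed power/thermal simulation.
def score_vectors(vectors, coverage_weight=0.6, power_weight=0.4, top_k=5):
    """vectors: list of dicts with 'name', 'coverage' (0..1), 'peak_power_w'."""
    max_power = max(v["peak_power_w"] for v in vectors) or 1.0
    scored = []
    for v in vectors:
        score = (coverage_weight * v["coverage"]
                 + power_weight * v["peak_power_w"] / max_power)
        scored.append((score, v["name"]))
    scored.sort(reverse=True)
    return scored[:top_k]   # keep only the highest-scoring vectors

vectors = [
    {"name": "ai_inference",  "coverage": 0.82, "peak_power_w": 14.5},
    {"name": "video_record",  "coverage": 0.55, "peak_power_w": 9.0},
    {"name": "idle_loop",     "coverage": 0.10, "peak_power_w": 1.2},
    {"name": "boot_sequence", "coverage": 0.70, "peak_power_w": 6.5},
]
print(score_vectors(vectors, top_k=2))
```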

Approaching thermal up front
As with energy efficiency and power consumption, the thermal impact must be considered early in the design flow, starting at the architectural level. “At this stage, architectural decisions can be made that profoundly affect the power, performance, area – and thermal characteristics – of the design,” said Synopsys’ Ruby. “In many cases today, power consumption and thermal issues constrain design performance. Conversely, if power consumption is lowered, higher performance can be achieved.”

Nearly everyone agrees on the need to shift thermal left. Mixel’s Hong observed that estimating the total power and power budgeting to different blocks is crucial from the outset of a design. Fraunhofer’s Heinig added that power optimization should consider both power delivery and power removal solutions.

But as the industry shifts from monolithic designs to heterogeneous chiplets, all of this becomes significantly more difficult.

“Let’s say you have five chiplets that you’re going to be putting into this package,” said Siemens’ Ferguson. “You have to think about where each of them goes. There are lots of ways to combine them. You can stack things, you can connect things with pillars or with TSVs, or through interposers. TSMC has three or four main categories, and if you drill in, there are three or four subcategories of how you can do it, and so on. You can’t realistically model everything, so you’ve got to have some initial guess. Maybe it’s based just on what you need from a footprint perspective. Which of these possibly will have the footprint you need? So start with that, but right away look at some kind of estimate of the thermal impact, because ultimately your temperatures, your thermal changes, are going to mess around with how they electrically behave. You can have all known-good-die for your chiplets, but when you start changing the temperatures on them, they’re not going to behave in that package the same as they do standalone. So you’ve got to start getting some feel for that early.”

One thing to remember with all of this is that thermal behavior operates on long time constants. “Thermal generates heat, and it dissipates throughout your chip at a much lower rate than frequency and workload changes,” Siemens’ Davis noted. “The good news is for a thermal analysis, you’re looking on the time horizons of milliseconds to seconds, so you’re looking at average workloads, not instantaneous workloads. The architect is going to know, ‘I’m going to put this functionality on this chiplet, that functionality on that chiplet, and those two things are never going to be ‘on’ at the same time because I’m going to program them that way.’ Or, ‘This thing is going to sit next to a high-power driver, and it might get temperature gradient from that higher power driver, but that’s going to be on pretty much constantly.’ So at the architectural level, these workloads at a macro level are things that the architect can know, and start with. As you progress through the design cycle, you get more and more data about what that real cycle, on and off, is like. Again, these things are never going to be on at the same time but occasionally we need them to be, and how often does that happen? The good thing from a thermal perspective is that it’s more of a macro, rather than an instantaneous measurement, whereas power is fairly instantaneous. You can change the power very quickly by changing your workload, but then it’s going to generate heat, which propagates across your chip. What’s the average? What’s the RMS that’s going to happen over a different timescale than your high-frequency operations?”
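
Davis’ timescale point can be shown with a few lines of arithmetic: the same power trace looks very different once it is averaged, or RMS-filtered, over a thermal time constant. The trace and window length below are invented for illustration.

```python
import math

# Power can change every sample, but temperature responds to the average (or RMS)
# over a much longer thermal time constant. Trace and window are hypothetical.
def moving_average(trace, window):
    return [sum(trace[i:i + window]) / window
            for i in range(0, len(trace) - window + 1)]

def moving_rms(trace, window):
    return [math.sqrt(sum(x * x for x in trace[i:i + window]) / window)
            for i in range(0, len(trace) - window + 1)]

# 1 ms samples: bursts of high instantaneous power inside a mostly idle trace
power_trace_w = [12.0, 12.0, 1.0, 1.0, 1.0, 12.0, 1.0, 1.0, 12.0, 1.0] * 10

# Thermal analysis cares about the 50 ms average, not the 1 ms spikes
print(max(power_trace_w), max(moving_average(power_trace_w, 50)))
print(max(moving_rms(power_trace_w, 50)))
```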

Conclusion
Managing thermal and power issues requires a holistic end-to-end methodology, from architecture to implementation and sign-off, because these challenges are critical to the overall performance and longevity of silicon-based systems.

“Decisions made early in the design process have a profound impact on the power and thermal characteristics of the design,” said Ruby. “The end-to-end methodology ensures that thermal considerations are not an afterthought, but a fundamental aspect of the design process essential for achieving high-performance, reliable, and energy-efficient silicon devices.”

Related Reading
Thermal Integrity Challenges Grow In 2.5D
Work is underway to map heat flows in interposer-based designs, but there’s much more to be done.

 


