But is it easy to cook?
People like me are challenged in the culinary department. We believe that all we have to do is put the meat, vegetables, sauce and everything else in the recipe into the crockpot and a few hours later, out comes dinner. We (desperately) believe that we can dump the ingredients into a Ninja blender and get a healthy, tasty shake in a few minutes. (I have been politely informed that it is NOT the kale that ruined the taste of that power shake, forgive the pun.)
Some would have you believe that power-performance-thermal tradeoffs are as easy as that. It is true that, for the first time, we can now run software on the RTL (or gates) using an emulator. It is true that we can get rid of those pesky, humongous dump files. And then there is that trivial matter of scanning the resultant activity profiles for peaks, or connecting the resultant power numbers to thermal models. So where is our tasty power-shake? Here are some of the practical roadblocks:
Power Management is like playing Jenga
Modern SoCs depend on hundreds of blocks to function. Imagine playing Jenga with blocks of different sizes, shapes and textures. Nine times out of 10, you try to fix something in power and it breaks something else, somewhere else. That is just the nature of it. Want to put a memory into standby state? Your bit cells are likely to be disrupted if you don’t do it right, and you will probably need a new voltage regulator. One of the core mantras I introduced to power management is unfortunately an unyielding one: density, delivery, leakage and lifetime/reliability have to be balanced in everything you do. So, when you add a block or remove one, you never know if the stack is going to fall. It’s a delicate balancing game – fun like Jenga, but nerve-wracking.
Software-driven needs to be well-done, not medium-rare
Software is often referred to as a monolithic entity. However, software that runs on SoCs is a very layered, stacked beast, with generic layers such as apps and operating systems as well as layers customized for the hardware. When an SoC ships, the customized layers ship with it – there is no SoC without its software. In the real world, just loading an application takes millions or billions of cycles across a complex stack. Modern multi-core SoCs run parallel threads of applications and OS services which, in turn, pull the puppet strings of power management to (ideally) manage resources efficiently. Every action in software translates to real toggles and real current draws, so scanning and profiling isn’t just for apps; it is for the entire stack. This makes it hard to find the real stress points for power. Which events heat up one corner of the silicon? Which events heat up the whole thing? Which events cause the regulator to slow its response? What is the latency of handling a thermal interrupt? Or a battery-low interrupt?
Verification can’t be separated from power analysis
Perhaps the moral of the preceding section is that power management is so intrinsic to software functionality, and so woven into the layers, that separating it is not realistic. One must run basic functional tests of the software stack and hardware together, and only then arrive at the optimizations. Software-driven needs to be well-done. You can’t serve this steak medium-rare.
Power numbers are like potatoes
You don’t do much with raw potatoes. If you know how to cook, though, potato chips, fries, baked potatoes, soups… there are a number of ways to use a potato. The same goes for power – it must be integrated over time into energy (battery life), divided by area for thermal analysis (and more), factored as current for IR drop, differentiated for di/dt and cross-sectioned as current for reliability. None of these is a simple recipe. Perhaps more important, some of this analysis needs to be done at a macro level, and some of it needs to zoom in, in both time and space, on a particular section of the die for a specific time window. When you run about 2 billion cycles a second over a billion transistors on a die, good luck with the zooming in. The trick is actually elsewhere: you have to select the appropriate scenario to generate the power numbers, and that is best done in the middle of the software stack.
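To make those transformations concrete, here is a minimal sketch of how a raw power trace turns into the derived quantities above. The trace, block area, supply voltage and sample interval are all illustrative assumptions, not real silicon data or any particular tool’s flow.

```python
# Minimal sketch: cooking a raw power trace into derived quantities.
# All values below (area, voltage, sample interval, trace) are assumptions.

AREA_MM2 = 2.5        # assumed block area, for power density
VDD = 0.8             # assumed supply voltage, volts
DT = 1e-6             # assumed sample interval of the trace, seconds

power_trace_w = [0.42, 0.45, 0.95, 1.30, 0.60, 0.41]  # watts per sample

# Energy: integrate power over time (relates to battery life)
energy_j = sum(p * DT for p in power_trace_w)

# Power density: divide by area (an input to thermal analysis)
density_w_per_mm2 = [p / AREA_MM2 for p in power_trace_w]

# Current: factor power as I = P / V (an input to IR-drop analysis)
current_a = [p / VDD for p in power_trace_w]

# di/dt: differentiate current (an input to transient noise/reliability)
di_dt = [(i2 - i1) / DT for i1, i2 in zip(current_a, current_a[1:])]

print(f"energy = {energy_j:.2e} J, "
      f"peak density = {max(density_w_per_mm2):.2f} W/mm^2")
print(f"peak |di/dt| = {max(abs(d) for d in di_dt):.2e} A/s")
```

None of this is hard arithmetic; the hard part, as noted above, is picking the scenario that produces a power trace worth cooking.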
Thermal behavior is not easily modeled
Back to the potatoes: how likely are you to find identical potatoes in a 10-pound sack? The same operation or transaction can consume variable amounts of power and result in vastly different thermal deltas. This has confounded system designers for decades. It is not that the phenomena are unknown – it is that the constraint space is so large on one hand and the variability of materials is so wide on the other. You might have noticed that your phone stays cool playing a YouTube video for 40 minutes, but heats up 4 minutes into a call.
At high-density thermal points on the die, we have the problem of finding spatial windows or zones that rapidly heat up to critical points (dark silicon) before you can cool them. The complexity of these zones comes from the fact that thermal impedance models are specific to the floorplan and packaging. Even if you identify the spatial zone, you still need to find out whether a worst-case activation pattern exists. Today, we pre-emptively shut down entire cores to deal with dark silicon problems, and that impacts performance. For instance, I crudely compared a 4-core smartphone to an 8-core smartphone. In theory, the octa-core is a much more powerful compute engine. In practical use, however, I found the quad-core to be more efficiently managed. The octa-core spends a lot of energy doing, guess what, shifting workloads around the 8 cores. In fact, it gets hot to the touch just sitting idle under some conditions.
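As a rough illustration of what a thermal impedance model captures, here is a single-node, first-order RC sketch of one zone heating under a sustained power burst. The resistance, capacitance and power values are made up for the example; real models have many coupled nodes and are tied to the specific floorplan and package.

```python
# Sketch of a single-node thermal RC model: one die "zone" under a power step.
# R_TH, C_TH and the 3 W burst are illustrative assumptions only.

R_TH = 8.0        # thermal resistance, degC per watt (assumed)
C_TH = 0.05       # thermal capacitance, joules per degC (assumed)
T_AMBIENT = 25.0  # degC
DT = 0.001        # integration step, seconds

def simulate(power_w, steps):
    """Integrate dT/dt = (P - (T - T_amb)/R_th) / C_th with forward Euler."""
    temp = T_AMBIENT
    trace = []
    for _ in range(steps):
        temp += (power_w - (temp - T_AMBIENT) / R_TH) / C_TH * DT
        trace.append(temp)
    return trace

# A sustained 3 W burst in this zone heads toward 25 + 3*8 = 49 degC.
# The dark-silicon question is whether it gets there faster than you can react.
trace = simulate(power_w=3.0, steps=2000)
print(f"zone temperature after 2 s: {trace[-1]:.1f} degC")
```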
Software responds to thermal sensors when it schedules voltage transitions. The variability of on-die thermal sensors, and the latency in responding to them, are part of the dynamic software control loop that needs to be verified. This takes us back to a point I made earlier – verification of the hardware and software together isn’t really separate from power analysis. They both start in the same place.
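To show why that latency matters, here is a rough sketch of a throttling loop fed by a delayed sensor. The temperature trace, threshold and latency are illustrative assumptions; the point is only that the die can sit above its limit for several control ticks before the governor ever sees it.

```python
# Rough sketch of a thermal throttling loop with sensor/response latency.
# Threshold, latency and the temperature ramp are illustrative assumptions.

from collections import deque

THROTTLE_AT = 85.0    # assumed junction temperature limit, degC
SENSOR_LATENCY = 5    # assumed delay, in control ticks, before a reading
                      # reaches the governor

def run_loop(temps):
    """Feed a temperature trace through a delayed sensor and report when
    the governor would finally throttle."""
    in_flight = deque([temps[0]] * SENSOR_LATENCY)  # stale readings in flight
    throttled_at = None
    for tick, actual in enumerate(temps):
        in_flight.append(actual)
        observed = in_flight.popleft()       # what software actually sees
        if observed >= THROTTLE_AT and throttled_at is None:
            throttled_at = tick
        if actual > THROTTLE_AT and throttled_at is None:
            print(f"tick {tick}: die is {actual - THROTTLE_AT:.1f} degC over "
                  f"the limit, but the governor has not seen it yet")
    return throttled_at

# A fast ramp: the zone crosses the limit well before software reacts.
ramp = [70 + 2.5 * t for t in range(12)]
print("governor throttles at tick", run_loop(ramp))
```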
Rapid iteration is the key to shipping fast
So far, we haven’t really covered performance. Power consumed all of the energy. As we put together solutions for software-driven power-performance analysis, the key is to build good models with reasonable accuracy and iterate over software routines and policies really fast. This requires a number of pieces, such as accurate profiles of high-level software, power models for IP blocks and thermal models for the implementation, all of which can be fed into a virtual platform. That is what allows us to trade off power and performance in a software-driven manner.
The good news is that it is a possibility today. We have all the ingredients. All we need are the right recipes and the right chefs.