Is it Hot? Ask Joules

Phones are never actually off, but that doesn’t mean they can’t be more efficient.


Over the last decade it has become clear that power reduction techniques targeting individual parts of a chip would matter far more than they had historically. Before smartphones, a phone appeared to be either in use (making a call, texting, gaming) or off. In fact, a cell phone can’t ever be completely off or it would never be able to receive a call. Under the hood of a 2G phone, everything except the real-time clock could be powered down when the phone was not in use, leaving the clock to wake the receiver up every second or so to listen to the paging channel for an incoming call or text.

Dividing the design into regions and having a power policy for each one gave a finer grain of control at the cost of a big increase in complexity. Each region could run at a different voltage, be powered on and off, and even have its voltage varied. The first problem was that design tools could not cope with this. Vdd and Vss were not explicit in the netlist, so there was no way to capture these decisions. This drove the creation of CPF and UPF (since unified into IEEE 1801) to capture power policy so that tools could correctly create power networks and add level shifters, retention registers, and more.

This was just an entry to the game, though. It captured power policy but did nothing to help decide what that power policy should be.

Management of power suffers from a number of problems. First, actual power dissipation depends on what the chip is doing (making a cell-phone call, playing a video game, sitting in your pocket). As a result, average power is a poorly defined idea that depends on the duty cycles of the various applications and, in turn, which blocks on the chip are actually active.

In fact, an up-to-the-minute example of this is the controversy over the different power dissipation of the two versions of the A9 chip manufactured by TSMC and Samsung in the iPhone 6s. Depending on what assumption you make about duty cycles, the differences vary from insignificant (the transmitter and the screen consume most of the power in many operations) to huge (run the chip at full speed with the screen off). These may be unrealistically extreme examples, but in fact there is a big difference between watching a video, making a phone call, and reading email.
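To make the duty-cycle point concrete, here is a minimal Python sketch of a duty-cycle-weighted average. Everything in it is an assumption for illustration: the mode names, the per-mode power figures, and the two usage profiles are invented and are not measurements of the A9 or of any real phone.

```python
# All numbers below are made up for illustration; they are not measurements
# of any real chip or phone.

def average_power_mw(mode_power_mw, duty_cycle):
    """Duty-cycle-weighted average power in milliwatts."""
    total = sum(duty_cycle.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"duty cycles must sum to 1, got {total}")
    return sum(mode_power_mw[mode] * share for mode, share in duty_cycle.items())

# Hypothetical per-mode power (mW) for two variants of the same design.
chip_a = {"standby": 10.0, "call": 800.0, "video": 1200.0, "full_speed": 3000.0}
chip_b = {"standby": 10.0, "call": 810.0, "video": 1230.0, "full_speed": 3600.0}

# Two hypothetical usage profiles (fraction of time spent in each mode).
light_user = {"standby": 0.90, "call": 0.05, "video": 0.05, "full_speed": 0.00}
benchmark = {"standby": 0.00, "call": 0.00, "video": 0.00, "full_speed": 1.00}

for name, profile in (("light user", light_user), ("benchmark", benchmark)):
    a = average_power_mw(chip_a, profile)
    b = average_power_mw(chip_b, profile)
    print(f"{name}: A={a:.1f} mW, B={b:.1f} mW, B is {100 * (b - a) / a:.1f}% higher")
```

With these invented numbers the two variants differ by about 2% for the light user and by 20% under the full-speed benchmark, which is the shape of the argument above: the same silicon gives very different answers depending on the assumed mix of modes.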


Second, there is a contrast between early and late in the design cycle. Early in the design cycle, at the architectural level, the potential power reductions are the largest, but the ability to compare two candidate choices is at its weakest. Late in the design cycle, at the netlist level (never mind physical layout), the actual power numbers are known fairly accurately, but the impact of any change is comparatively small. Plus, most designs are too large to be simulated at the netlist level in a realistic time in any case.

The sweet spot seems to be at RTL. At that level it is possible to make changes that have a large impact while still having a reasonable estimate of what that impact will be. But this requires a very good estimate of how the RTL will look after physical design without going to the expense of actually doing physical design. For example, in a typical chip, perhaps 30% to 40% of the power is consumed in the clock tree, which does not exist yet at RTL, so it has to be estimated.

The key technology for making this happen is a fast, physically aware synthesis that doesn’t give up too much accuracy. A blazingly fast synthesis that is only 50% accurate is the wrong tradeoff, as is one that reaches 95% accuracy but saves only 20% of the time of doing the whole implementation. Once that has been done, vectors for the various modes (what the chip is doing) can be run and power dissipation numbers obtained. These can then be used as the basis for making RTL-level changes to the design.
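As a rough sketch of how vectors turn into power numbers, the Python below applies the textbook switching-power approximation, 0.5 × toggle rate × C × Vdd² × f summed over nets, to two sets of made-up vectors. This is an assumption-laden illustration, not a description of how Joules or any other tool computes power: the nets, capacitances, voltage, and frequency are invented, and leakage, short-circuit power, glitching, and the clock tree are all ignored.

```python
# Hypothetical illustration only: the nets, capacitances, voltage, and
# frequency are invented, and this is not how any particular tool works.

def toggle_rate(samples):
    """Average number of value changes per clock cycle for one net."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to count toggles")
    toggles = sum(1 for prev, cur in zip(samples, samples[1:]) if prev != cur)
    return toggles / (len(samples) - 1)

def switching_power_w(nets, vdd_v, freq_hz):
    """Sum 0.5 * toggle_rate * C * Vdd^2 * f over all nets, in watts."""
    return sum(
        0.5 * toggle_rate(samples) * cap_f * vdd_v ** 2 * freq_hz
        for cap_f, samples in nets
    )

# Each net: (capacitance in farads, value sampled once per clock cycle).
busy_mode = [
    (2e-15, [0, 1, 0, 1, 0, 1, 0, 1, 0]),
    (5e-15, [0, 0, 1, 1, 0, 0, 1, 1, 0]),
]
idle_mode = [
    (2e-15, [0, 0, 0, 0, 1, 1, 1, 1, 1]),
    (5e-15, [0, 0, 0, 0, 0, 0, 0, 0, 0]),
]

for name, nets in (("busy", busy_mode), ("idle", idle_mode)):
    power = switching_power_w(nets, vdd_v=0.9, freq_hz=1.0e9)
    print(f"{name}: {power * 1e6:.3f} uW")
```

The same two nets dissipate very different power in the two modes purely because of how often they toggle, which is why the vectors chosen for each mode matter as much as the netlist itself.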

Cadence’s Joules RTL Power Solution is a tool whose power estimates come within 10% to 15% of signoff. It keeps performance high by parallelizing the analysis across multiple CPUs and by analyzing multiple stimulus files (the different modes for what the chip might be doing) in parallel. Joules is actually built on top of an ultra-fast prototype mode of the Genus Synthesis Solution released earlier in the year.

For additional accuracy, the Joules solution can be linked into Palladium Dynamic Power Analysis, which is based on emulation. This is especially useful when the analysis requires running a realistically large software load, which, in effect, results in literally billions of vectors.

The result is the capability to do RTL power analysis with good accuracy, about 20X faster than any other approach. For example, 20-million-instance designs can be run overnight at RTL, producing power results within 15% of signoff.

Oh, and that picture. That is James Joule, who Wikipedia describes as a physicist (I knew that) and brewer (who knew?). He is generally credited with discovering the conservation of energy. However, more important for this blog, he also discovered the relationship between the current through a resistor and the heat dissipated.
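For the record, that relationship is usually called Joule's first law. In standard notation (the symbols are the usual textbook ones, not taken from this post), with current I flowing through a resistance R for a time t:

```latex
P = I^{2} R, \qquad Q = P\,t = I^{2} R\,t
```

Here P is the power dissipated as heat and Q is the total heat energy, which is measured, fittingly, in joules.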
