Correlating Software Execution With Switching Activity To Save Power In SoC Designs

A practical example of debugging a tricky power problem.


There is probably no more pointless waste of energy than lighting and heating an empty room. The obvious optimization is to notice that no one is there and turn off the lights. The same idea applies to an SoC or embedded system: to save energy, system developers are adding the ability to turn off the parts of the system that are not being used. The result is big energy savings with no compromise in functionality.
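The “turn off what nobody is using” idea is often implemented with a per-domain reference count, and the reference count register that comes up later in this article points to that kind of scheme here. The sketch below is a minimal Python model under that assumption. `PowerDomain`, `acquire`, and `release` are hypothetical names, not part of any Veloce, Codelink, or customer API.

```python
class PowerDomain:
    """Hypothetical model of a reference-counted power domain.

    Each user (process) calls acquire() before using the peripheral and
    release() when it is done; the domain is powered only while count > 0.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.refcount = 0
        self.powered = False

    def acquire(self) -> None:
        # The first user powers the domain up.
        if self.refcount == 0:
            self.powered = True
        self.refcount += 1

    def release(self) -> None:
        if self.refcount == 0:
            raise RuntimeError(f"{self.name}: release without matching acquire")
        self.refcount -= 1
        # The last user out turns off the lights.
        if self.refcount == 0:
            self.powered = False


periph_a = PowerDomain("peripheral A")
periph_a.acquire()       # process 1 starts using A
periph_a.acquire()       # process 2 starts using A
periph_a.release()       # process 1 done; A must stay on for process 2
print(periph_a.powered)  # prints True
periph_a.release()       # process 2 done; no users left
print(periph_a.powered)  # prints False
```

The important property is that the domain goes dark only when the count returns exactly to zero; if a release is ever lost, the domain stays on indefinitely.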

I was working with a customer who had put this type of system in place, but they were observing a problem. Most of the time battery life was very good, but occasionally (about 10% of the time) the battery would die long before it should. The developers were stumped. After a lot of debugging, they discovered that one of the energy-hungry peripherals was being turned on and left on continuously, even though no process was using it.

To debug the problem, they set the prototype aside and went back to emulation on Veloce to find out what was going on. Veloce has a feature that lets developers create an “activity plot” of the design running on the emulator. The activity plot shows a sparse sampling of the switching activity of the design. While switching activity does not give an absolute, exact measurement of the power consumed, it does show where likely power hogs are hiding.


Figure 1. Design switching activity over time — enables power spikes to be identified and further investigated.
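Switching activity is essentially a count of signal toggles. As a hypothetical illustration (the article does not describe how Veloce computes its activity plot, and this is not that method), the sketch below treats each snapshot of a group of signals as an integer, counts the bits that changed between successive sampled snapshots, and shows why sparse sampling gives only a relative picture. `toggle_count` and `activity_plot` are hypothetical names.

```python
def toggle_count(prev: int, curr: int) -> int:
    """Number of bits that changed between two snapshots of a signal group."""
    return bin(prev ^ curr).count("1")


def activity_plot(snapshots: list[int], stride: int) -> list[int]:
    """Sparse activity estimate: toggles between every `stride`-th snapshot.

    A bit that rises and falls again between two samples is never counted,
    which is one reason sampled switching activity is a relative indicator
    of power rather than an exact measurement.
    """
    sampled = snapshots[::stride]
    return [toggle_count(a, b) for a, b in zip(sampled, sampled[1:])]


trace = [0b00, 0b01, 0b00, 0b11, 0b11]
print(activity_plot(trace, 1))  # prints [1, 1, 2, 0]
print(activity_plot(trace, 2))  # prints [0, 2]
```

With `stride=1` the trace shows 4 toggles; with `stride=2` it shows only 2, because bit 0 goes up and back down between the first two samples. The relative shape of the plot, such as where spikes appear or vanish, is what remains useful.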

They ran their design and captured the activity plot. The design was configured to run two processes: one using peripheral A, and the other using both peripheral A and peripheral B. As the graph shows, the first process accesses its peripheral at one frequency, creating one set of spikes in switching activity. The second process accesses both peripherals, but less frequently, producing the taller set of spikes.


Figure 2. Activity plot showing switching activity in the design.

As Figure 2 shows, at some point the spikes for peripheral A disappear: when peripheral B is turned on, peripheral A is left on. Someone “left the lights on,” as it were. Examining the system confirmed that the power domain for peripheral A had indeed been left on.

With Codelink, a hardware/software debug environment that works with Veloce, the designers were able to correlate where the cores were, in terms of software execution, with the changes in switching activity shown in the activity plot.

Since the problem was related to turning off the power to one of the power domains, they set the Codelink correlation cursor to the point where the system should have powered down peripheral A (see Figure 3).


Figure 3. The Codelink correlation cursor set to where the system should have powered down peripheral A.

At this point, two processes on two different cores were both turning off peripheral A at the same time (see Figure 4).


Figure 4. Side-by-side view of two cores.

The developers were able to single-step through the section of code where the power domain got stuck in the on position. They saw two processes, each on a different core, both turning off the same power domain.

It turned out that the AXI fabric implemented the notion of “master” as the AXI master ID assigned by the fabric. Because the ARM processor’s four cores all sent their traffic onto the AXI bus through the same master port, all four cores were seen as the same master.

From the perspective of both the fabric and the slave, the reads and writes all originated from the same master, so the accesses were allowed. There was no differentiation between accesses from core 0 and core 1. An exclusive access from one core could be followed by an exclusive access from another core in the same cluster, and it would be allowed (see Figure 5). As a result, both cores’ updates to the reference count for peripheral A’s power domain could succeed, with one overwriting the other, so the count never returned to zero and the domain stayed on. This was the crux of the bug.


Figure 5. AXI “exclusive access” implementation.
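The article does not show the fabric’s monitor logic or the power-management code, so the following is a deliberately simplified Python model. It assumes that each core releases peripheral A by decrementing the reference count register with an exclusive read followed by an exclusive write, retrying if the write is rejected. The model monitor remembers only which requester last did an exclusive read; real AXI exclusive monitors also track addresses and other conditions. `ExclusiveMonitor` and `release_race` are hypothetical names.

```python
class ExclusiveMonitor:
    """Deliberately simplified exclusive monitor guarding one register.

    It remembers only which requester last did an exclusive read; an
    exclusive write succeeds if it comes from that same requester.
    """

    def __init__(self, value: int) -> None:
        self.value = value
        self.owner = None

    def read_exclusive(self, requester) -> int:
        self.owner = requester
        return self.value

    def write_exclusive(self, requester, new_value: int) -> bool:
        if requester != self.owner:
            return False  # reservation lost: the caller must retry
        self.value = new_value
        return True


def release_race(key_core0, key_core1, refcount: int = 2) -> int:
    """Both cores decrement the count with an exclusive read-modify-write.

    Both reads happen before either write. A core whose exclusive write is
    rejected retries until it succeeds. Returns the final reference count.
    """
    mon = ExclusiveMonitor(refcount)
    v0 = mon.read_exclusive(key_core0)
    v1 = mon.read_exclusive(key_core1)
    pending = []
    if not mon.write_exclusive(key_core0, v0 - 1):
        pending.append(key_core0)
    if not mon.write_exclusive(key_core1, v1 - 1):
        pending.append(key_core1)
    for key in pending:
        # Re-read and retry until the exclusive write is accepted.
        while not mon.write_exclusive(key, mon.read_exclusive(key) - 1):
            pass
    return mon.value


# Buggy fabric: both cores are identified only by their shared master port.
print(release_race("port0", "port0"))            # prints 1
# Cores distinguished individually: the rejected release is retried.
print(release_race(("port0", 0), ("port0", 1)))  # prints 0
```

With the shared master key, both writes succeed from the same stale value and the count ends at 1, which matches the “lights left on” symptom. Once each core is a distinct requester, one write is rejected and retried against the updated value, and the count reaches 0.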

The ID of the core that originates an AXI transaction is encoded in part of the transaction ID. By adding this core ID to the master identifier used to determine the exclusivity of accesses to the reference count register, the design could correctly process the exclusive accesses.
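The field layout of the transaction ID depends on how the interconnect is configured, and the article does not give it. The sketch below assumes, purely for illustration, that the low two bits of the AXI transaction ID carry the originating core in a four-core cluster, and shows how an exclusivity key that includes the core ID keeps the cores from colliding. `CORE_ID_BITS`, `core_id`, and `exclusivity_key` are hypothetical.

```python
# Hypothetical transaction-ID layout (an assumption for illustration):
# the low CORE_ID_BITS bits carry the originating core.
CORE_ID_BITS = 2  # enough to distinguish the four cores in one cluster
CORE_ID_MASK = (1 << CORE_ID_BITS) - 1


def core_id(axi_id: int) -> int:
    """Extract the originating core from a transaction ID."""
    return axi_id & CORE_ID_MASK


def exclusivity_key(master_port: int, axi_id: int) -> tuple[int, int]:
    """Requester identity used to decide whether an exclusive access holds.

    Before the fix the identity was effectively just `master_port`, so all
    four cores collided; adding the core ID makes each core distinct.
    """
    return (master_port, core_id(axi_id))


print(exclusivity_key(0, 0b1001))  # prints (0, 1)
print(exclusivity_key(0, 0b1000))  # prints (0, 0)
```

Two cores behind master port 0 now produce different keys, so an exclusive access from one core no longer satisfies the reservation held by another.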

Veloce emulation gave the developers the performance needed to run the algorithm to the point where the problem could be reproduced. Codelink delivered the debug visibility needed to find the cause of the problem. The activity plot is a great feature that lets developers see the relative power consumption of their design. Together, these give engineers the information and the means to make higher-performing, more efficient designs.

You don’t have to worry about someone “leaving the lights on” anymore.

To learn more about power management validation using Veloce and Codelink and for fuller details on this particular implementation, read the full whitepaper Power Management Validation.