Can we stop power-hungry bugs from clawing their way through application software stacks?
By Achim Nohl
Identifying and describing power issues is tough, let alone trying to solve them. “Power” issues can be very diverse. It’s even more difficult to explain how virtual prototypes can help to analyze “power” consumption. We often approach it by introducing how power information can be reflected in virtual prototype models, but there are many different goals and conflicting views on accuracy, granularity, modeling approach, data sources, etc.
Some engineers will care about power, but most will care about energy efficiency. Some engineers will look at just a single block, while others care about processor subsystems and entire platforms. Some software engineers focus on a single piece of software, others care about the entire software stack.
So instead, we need to approach it from the software engineers’ intent. For example, software stacks such as Android, with tens of millions of lines of code, implement power-saving strategies on almost every software layer, starting at the OS kernel and ending in the application layer. A software power inefficiency or malfunction can quickly cause a 5x drop in device standby time. A Google example shows that a simple RSS feed application that wakes up the phone every 10 minutes, for just 8 seconds at a time, to fetch updates over the Internet can cut the phone’s standby time in half. An almost endless list of battery drain problems reported by end-users can be found here. Of course, the most prominent example is the recent iOS bug. Why is it so difficult to get this under control? Let’s examine some of the key issues:
1. Profiling the impact of software changes on energy. A software developer needs to be able to determine the relative impact of implementation choices on energy. For example, is it “better” to use a spin-lock or a semaphore in the Android sensor poll function when waiting for accelerometer data from the hardware? A spin-lock is faster, but does not allow the CPU to go idle while waiting. The developer needs to find the right balance between responsiveness and energy consumption. While this example only looks at the CPU, most cases are concerned with the efficient use of other HW components, too.
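To make the tradeoff concrete, here is a minimal user-space sketch in Java (not the actual kernel poll code; the class and method names are purely illustrative), contrasting a busy-wait with a blocking wait on a semaphore. The spin loop reacts fastest but keeps the core from ever idling; the blocking variant lets the CPU sleep until the data arrives, at the cost of a longer wake-up path.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;

public class WaitStrategies {

    // Spin-waiting: lowest latency, but the core stays busy (and burns power)
    // for the entire duration of the wait.
    static void spinWait(AtomicBoolean dataReady) {
        while (!dataReady.get()) {
            // busy loop until the producer sets the flag
        }
        // ... consume the sensor sample ...
    }

    // Blocking wait: the thread sleeps until the producer releases the
    // semaphore, so the CPU can drop into an idle state while waiting.
    static void blockingWait(Semaphore dataReady) throws InterruptedException {
        dataReady.acquire();
        // ... consume the sensor sample ...
    }
}
```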
Here’s another example: in an application that requires location information, is it “better” to use fine-grained updates from the GPS, or coarse-grained updates from the 3G network? The data needed to compare the two options are: 1) the time it takes to complete the task, and 2) the power level, to compute the energy (e.g. GPS: 25 seconds * 140mA, about 1mAh; network: 2 seconds * 180mA, about 0.1mAh). Again, the decision cannot be based only on the energy savings. It also requires an understanding of the performance demands of the particular service: are we looking at driving navigation or walking navigation? The two have different precision requirements.
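The back-of-the-envelope arithmetic from this example can be written down directly; the figures are the ones quoted above, and the helper method and class names are just for illustration:

```java
public class LocationEnergyEstimate {

    // Energy in mAh = current (mA) * time (s) / 3600 (s per hour)
    static double energyMah(double currentMilliAmps, double seconds) {
        return currentMilliAmps * seconds / 3600.0;
    }

    public static void main(String[] args) {
        double gps = energyMah(140, 25);   // ~0.97 mAh per fix
        double net = energyMah(180, 2);    // ~0.10 mAh per fix
        System.out.printf("GPS fix:     %.2f mAh%n", gps);
        System.out.printf("Network fix: %.2f mAh%n", net);
        // The network fix costs roughly a tenth of the energy, but delivers a
        // coarser position -- whether that is acceptable depends on the use
        // case (driving vs. walking navigation).
    }
}
```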
2. Interference and side effects of local software changes on other modules. An application or service that requires a periodic network update can trigger a cascade: other, less important services may just be waiting for a connection to be established. So instead of a 1-second update every 10 minutes, we may easily end up with a full minute of online activity as background applications wake up. Here, the software developer needs to understand those cascades through an energy profile that reveals the unexpectedly high peak, along with an estimate of the resulting standby time. Handset makers face this interference challenge when integrating software modules from multiple suppliers.
3. Ensuring that multiple concurrent power management frameworks collaborate. System-wide and runtime power management are just two examples in the Linux kernel that need to collaborate. The former puts the whole phone into a suspend state and wakes it up without loss of data; the latter controls the power states of individual hardware components while the phone is in use. A typical issue is a component that has been shut down by runtime power management preventing the device from entering system-wide suspend, because it needs to be woken up first. Power management debugging can easily become a nightmare, because even the debugging and tracing services are affected and may shut down during suspend/resume phases. Moreover, wrong settings in the constraints for the voltage regulator drivers can even result in permanent hardware damage.
4. Many inefficiencies and malfunctions only appear in context-sensitive usage scenarios. Energy inefficiencies and power malfunctions are highly context-sensitive. The context is defined by how the device interacts with its environment: besides the user, the device and the software stack are linked to the environment through 3G, WiFi, NFC/Bluetooth, cameras and many other sensors (acceleration, orientation, magnetic field, proximity, temperature, etc.). When optimizing software for energy, the usage scenario that drives the stimuli to the device needs to be repeatable in order to get deterministic and comparable results. Testing requires automation of those scenarios, along with a certain level of randomization. During testing, developers need to check that all applications and services follow the coding guidelines. For applications using the sensors, for example, the programming reference says: “Always make sure to disable sensors you don’t need, especially when your activity is paused. Failing to do so can drain the battery in just a few hours. Note that the system will not disable sensors automatically when the screen turns off.”
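That quoted guideline maps onto a familiar Android pattern: register the sensor listener in onResume() and unregister it in onPause(). The activity below is a minimal sketch along those lines (the class name and sensor choice are just for illustration):

```java
import android.app.Activity;
import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;
import android.os.Bundle;

public class AccelerometerActivity extends Activity implements SensorEventListener {

    private SensorManager sensorManager;
    private Sensor accelerometer;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        sensorManager = (SensorManager) getSystemService(SENSOR_SERVICE);
        accelerometer = sensorManager.getDefaultSensor(Sensor.TYPE_ACCELEROMETER);
    }

    @Override
    protected void onResume() {
        super.onResume();
        // Only listen while the activity is in the foreground.
        sensorManager.registerListener(this, accelerometer,
                SensorManager.SENSOR_DELAY_NORMAL);
    }

    @Override
    protected void onPause() {
        super.onPause();
        // Disable the sensor when the activity is paused; the system will not
        // do this for us, even when the screen turns off.
        sensorManager.unregisterListener(this);
    }

    @Override
    public void onSensorChanged(SensorEvent event) { /* handle samples */ }

    @Override
    public void onAccuracyChanged(Sensor sensor, int accuracy) { }
}
```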
Looking at the above challenges, it becomes clear that to effectively debug and optimize power using virtual prototypes, we have to look beyond them as just a way to instrument models with power information. In my next blog posts I will explain how the four requirements outlined here can be successfully addressed using virtual prototypes. Maybe we can tone down those software bugs’ appetites.