Making Sense Of Virtualization

A closer look at software development for ARM’s big.LITTLE Processing—Part I.


By Achim Nohl
In the last month I’ve had the opportunity to get some hands-on experience with hardware virtualization and hypervisors. My knowledge so far on this has been mainly limited to what I could read about it and what other people are saying about it. However, the PowerPoint slides I’ve seen leave a lot of white fog between the bullet items.

This didn’t make me feel very comfortable talking about this topic, but there was no escape. Hypervisors play an increasingly important role for system designers in the context of supporting multiple guest operating systems on the same device, or taking advantage of ARM’s new big.LITTLE™ processing. The fog is not all gone, but let me provide some insight on what I found out. As a disclaimer, I’m not going to (and I cannot) write an expert almanac about all the aspects of virtualization covering Xen, VMware, etc. Instead, I’m going to focus on my personal experience that I believe will be relevant to you as well. This post is the starting point for a series on this topic in this blog.

Clearing up terminology confusion
The fact that this whole concept is (also) called “virtualization” is high on the list of challenges I faced when explaining it to my colleagues or management. As background, I work in technical marketing in the area of “virtual prototyping,” and our product is called “Virtualizer.” To avoid the same confusion here, let’s separate those terms in the context of embedded systems:

  1. Virtual prototypes/Virtual platforms (VP): A virtualized embedded system in the form of a simulation on a host PC for the purpose of developing the embedded system (HW and/or SW).
  2. Virtualization using hypervisors: An embedded software (ESW) layer between the operating system (OS) and the embedded system hardware. The hypervisor can serve multiple guest operating systems and bridges differences between the real underlying hardware and the target hardware for which the OS has been built.

I used a VP on my desktop PC to bring up ARM Linux along with an embedded hypervisor. Using a VP, I’ve been able to integrate, run and debug a software stack including Android, Linux and a hypervisor on an ARM big.LITTLE processing-based system. The VP integrates Fast Models from ARM to simulate the processors.

big.LITTLE processing?
The first example of this processing concept combines a high-performance ARM Cortex-A15 MPCore processor with an energy-efficient Cortex-A7 processor. The two main use models that have been introduced by ARM are called “big.LITTLE Task Migration” and “big.LITTLE MP.” Those use models are nicely described by the Linaro organization.

In the so-called big.LITTLE Task Migration use model, the software can seamlessly migrate from one processor to the other, depending on the use context and the resulting performance requirements. It sounds as if this requires a heavy rewrite or modification of the software stacks (e.g., Linux/Android); in fact, it doesn’t. Task migration is achieved by having the software run not directly on the hardware, but on top of a new layer. This layer operates in hypervisor mode and performs the task migration without Linux/Android even knowing about it, although the Linux/Android power management initiates it, as illustrated in the next figure.
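The transparency idea can be sketched in a few lines of Python. This is a toy model and not ARM’s actual switcher software: the classes `Cluster` and `Switcher`, the method names, and the register dictionary are all hypothetical. The only point is that the payload keeps seeing the same logical CPU while the physical cluster underneath changes.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """One physical processor cluster (illustrative model only)."""
    name: str
    powered: bool = False
    registers: dict = field(default_factory=dict)


class Switcher:
    """Toy stand-in for the layer that runs in hypervisor mode."""

    def __init__(self, big: Cluster, little: Cluster):
        self.clusters = {"big": big, "little": little}
        self.active = "little"
        little.powered = True

    def logical_cpu_id(self) -> int:
        # The payload always sees the same logical CPU,
        # regardless of which physical cluster executes it.
        return 0

    def request_performance(self, high: bool) -> None:
        """Entry point used by the payload's power management (assumed)."""
        target = "big" if high else "little"
        if target != self.active:
            self._migrate(target)

    def _migrate(self, target: str) -> None:
        src = self.clusters[self.active]
        dst = self.clusters[target]
        dst.powered = True
        dst.registers = dict(src.registers)  # hand over the saved context
        src.registers = {}
        src.powered = False
        self.active = target


sw = Switcher(Cluster("Cortex-A15"), Cluster("Cortex-A7"))
sw.clusters["little"].registers["pc"] = 0x8000
sw.request_performance(high=True)
```

After the call, the context lives on the big cluster and the little cluster is powered down, yet `logical_cpu_id()` returns the same value as before.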

Hypervisors and Interrupts
The task of the hypervisor software layer is to shield the hardware and the payload software (e.g., Linux/Android) from directly communicating with each other. We call Linux/Android the payload, as it is the software the user intends to run. Why is a hypervisor needed? Interrupts are a good example here. Let’s assume we run multiple guest OSes on a system. Interrupts coming from the hardware could be for any of the OSes. Therefore, a hypervisor needs to first intercept the interrupt from the system, and then decide to which guest OS it was addressed.
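The intercept-then-decide step for multiple guests can be sketched as a simple lookup. This is a minimal illustration under assumptions: the class `RoutingHypervisor`, the guest names, and the interrupt numbers are made up for the example, and a real hypervisor does this in exception handlers rather than in Python.

```python
class RoutingHypervisor:
    """Toy model: decide which guest OS a physical interrupt belongs to."""

    def __init__(self):
        self.owner_of_irq = {}  # interrupt number -> guest name
        self.delivered = {}     # guest name -> interrupts forwarded so far

    def assign(self, irq: int, guest: str) -> None:
        """Record that a device interrupt belongs to a given guest."""
        self.owner_of_irq[irq] = guest
        self.delivered.setdefault(guest, [])

    def on_physical_interrupt(self, irq: int):
        """Intercept first, then forward to the owning guest (if any)."""
        guest = self.owner_of_irq.get(irq)
        if guest is None:
            return None  # unassigned interrupt: not forwarded in this sketch
        self.delivered[guest].append(irq)
        return guest


hyp = RoutingHypervisor()
hyp.assign(34, "linux")    # hypothetical interrupt numbers
hyp.assign(35, "android")
first = hyp.on_physical_interrupt(35)
second = hyp.on_physical_interrupt(99)
```

Here the guest never sees an interrupt it does not own, which is the shielding role described above.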

For big.LITTLE processing it’s the other way around. Here, we have multiple processor clusters sharing the same interrupt controller. The hypervisor (open source example available here) ensures the transparency of the clusters for the OS and does the task migration. But the hypervisor needs its own interrupts and should not interfere with the OS.

Hardware-Supported Interrupt Virtualization
In order to enable interrupt trapping in the hypervisor, ARM provides specific hardware support in its processors. In the processor’s hypervisor mode, a higher-privileged exception vector allows trapping interrupts even before the OS can react to them. Thus, when an interrupt comes in, it is the hypervisor that handles it first. If the interrupt is for the OS, the hypervisor configures a virtualized interrupt controller, which is a replica of the real interrupt controller as seen by the OS. The OS will then handle the interrupt as if there were no hypervisor in between. Here, the MMU (Memory Management Unit) virtualization plays an important role as well. More about that in my next blog.
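The flow of “hypervisor first, then a virtual interrupt controller for the OS” can be modeled roughly as follows. This is a sketch under assumptions, not a description of ARM’s hardware interface: the classes `VirtualInterruptController` and `Hypervisor`, their methods, and the interrupt numbers are hypothetical.

```python
class VirtualInterruptController:
    """Toy replica of the interrupt controller as the OS sees it."""

    def __init__(self):
        self.pending = []

    def inject(self, irq: int) -> None:
        # Mark the interrupt pending once, in arrival order.
        if irq not in self.pending:
            self.pending.append(irq)

    def acknowledge(self):
        """What the OS would read when it takes the interrupt."""
        return self.pending.pop(0) if self.pending else None


class Hypervisor:
    """Toy model of the trap handler running in hypervisor mode."""

    def __init__(self, own_irqs):
        self.own_irqs = set(own_irqs)  # interrupts the hypervisor keeps
        self.vic = VirtualInterruptController()
        self.handled_locally = []

    def trap(self, irq: int) -> str:
        """Every physical interrupt lands here before the OS sees it."""
        if irq in self.own_irqs:
            self.handled_locally.append(irq)
            return "hypervisor"
        self.vic.inject(irq)  # make it visible to the OS
        return "os"


hyp = Hypervisor(own_irqs=[26])  # hypothetical interrupt numbers
route_a = hyp.trap(26)
route_b = hyp.trap(40)
os_sees = hyp.vic.acknowledge()
```

The OS side only ever talks to `vic`, so from its point of view interrupt 40 arrived exactly as it would without a hypervisor, and interrupt 26 never existed.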

Interestingly, an interrupt has quite a way to go before it arrives at the user’s application, all the way from the hardware, through the hypervisor, into the OS. Virtual prototypes are extremely useful for debugging this chain, as all aspects of the system can be observed and traced as shown in the figure below.

Keeping an overview of what is going on and having a bird’s-eye view to assess where things go wrong are really helpful here. Where does the interrupt stop? Is it stuck in the hypervisor? Is it not arriving at the interrupt controller? The VP ensures that, at any time, the different layers of the SW and even the HW can be traced and debugged to spot and identify software integration bugs. First of all, you have clarity on which CPU is currently active. Second, you can spot the time when the actual interrupt arrives from the hardware. Next, tracing of the system is possible across the layers of the software, from the hypervisor up to Android. The VP tracing and debugging is aware of the hypervisor layer by tracking the mode that is exposed by the underlying CPU models (ARM Fast Models).
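The question “where does the interrupt stop?” can be answered mechanically once trace events from all layers are available. The following sketch assumes a hypothetical trace format, a list of `(irq, layer)` pairs, and a hypothetical layer ordering; it does not reflect the actual trace output of any particular tool.

```python
# Assumed path of an interrupt, from the hardware up to the user's application.
LAYERS = ["hardware", "interrupt_controller", "hypervisor", "kernel", "application"]


def last_layer_reached(trace, irq):
    """Return the last layer, in path order, that saw the given interrupt."""
    seen = {layer for event_irq, layer in trace if event_irq == irq}
    reached = None
    for layer in LAYERS:
        if layer not in seen:
            break  # the interrupt never got past the previous layer
        reached = layer
    return reached


trace = [
    (40, "hardware"),
    (40, "interrupt_controller"),
    (40, "hypervisor"),
    (41, "hardware"),
]
stuck_at = last_layer_reached(trace, 40)
```

In this made-up trace, interrupt 40 reaches the hypervisor but never shows up in the kernel, which points the investigation at the hypervisor’s forwarding logic.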

Furthermore, debugging services are always available, because debugging does not depend on any embedded software daemons to function. This is very useful for debugging the interaction between the hypervisor and the Linux kernel. You can simply have one debugger attached to the hypervisor and another one attached to the Linux kernel. This becomes really compelling when debugging the task migration. Even during phases where one CPU is powering down and the other is powering up, debugging and tracing are possible without limitation. This is a delicate phase, as the entire context of one CPU is saved and subsequently restored by the other CPU. This can result in obscure and hard-to-find defects, for example, if the context saving is incomplete because it forgets to save the secure mode registers. I will write more about the task migration, in the context of energy/performance optimizations, in a follow-up post.
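The kind of defect described here, an incomplete context save, can be illustrated with a small check that compares the outgoing CPU’s state with what the incoming CPU ends up with. Everything in this sketch is an assumption made for illustration: the register names, the two register banks, and the helper functions are hypothetical and far simpler than a real ARM context.

```python
NON_SECURE = ("r0", "sp", "pc")
SECURE = ("secure_sp", "secure_config")  # hypothetical register names


def save_context(cpu: dict, banks) -> dict:
    """Save only the registers listed in the given banks."""
    return {name: cpu[name] for bank in banks for name in bank}


def restore_context(saved: dict) -> dict:
    """The incoming CPU starts with exactly what was saved."""
    return dict(saved)


def missing_after_migration(outgoing: dict, incoming: dict) -> list:
    """List registers whose value did not survive the migration."""
    return sorted(
        name for name, value in outgoing.items()
        if incoming.get(name) != value
    )


outgoing = {"r0": 1, "sp": 0x7000, "pc": 0x8000,
            "secure_sp": 0x100, "secure_config": 0x3}

# Buggy migration: the secure bank is forgotten.
buggy = restore_context(save_context(outgoing, [NON_SECURE]))
lost = missing_after_migration(outgoing, buggy)

# Complete migration: both banks are saved.
good = restore_context(save_context(outgoing, [NON_SECURE, SECURE]))
```

With the buggy save, `lost` names the two secure registers; with both banks saved, the comparison comes back empty. On a VP, the same comparison can be made by inspecting the register state of both CPU models around the switch.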

If you’d like to see a live demonstration and explanation of all this, we’re presenting at the DesignWest Conference (formerly called the Embedded Systems Conference) in San Jose in the ARM Partner Pavilion and also in a joint technical session with ARM as part of the Android Summit. You can also see a demonstration from ARM and Synopsys at SNUG Silicon Valley on Monday at the Designer Community Expo or Tuesday in the afternoon tutorial of the Systems track.