Virtualization Revisited

As Moore’s Law becomes more difficult to follow, virtualization architectures are getting a second look.


Virtual instruction set computing (VISC) is getting a second look as power and performance improvements begin to slow.

While the current crop of finFETs will likely be extended for at least one more process node, there is some debate about what comes next, whether that is horizontal or vertical nanowire FETs or whether the compute architecture itself needs to be changed.

That could involve everything from advanced packaging to changing the code that runs on processors.

Modern code is still stuck in single-processor design mode. The one-to-one correlation between cores and threads in most multicore processors is hardwired, and has been since the dawn of coding. This is the backdrop for Amdahl’s Law, which states that the speedup of any parallelized program is limited by the portion of its code that must run serially. Breaking ranks from that is difficult. There are proven economies of scale based on mature and well-understood processes, a well-tested supply chain and design-through-manufacturing flow, and a fully vetted one-to-one thread/processor design paradigm.

So the practical solution, which is commonly implemented, is running multiple threads of single-code design on multiple cores. That improves performance more than incremental jumps in processor technology, but it’s still a bit of a haphazard way to improve performance. It is kind of like having a multi-lane highway, but not all lanes are always used. And even when they are used, they may not carry the optimal amount of traffic. It’s better than a single, serial lane, but inefficiencies abound in this approach.

Therefore, the number of simultaneous threads that can be run is limited by the number of processors within the chip. That is the main, but not the only, factor that limits performance. Other constraints include the difficulty of maintaining concurrency across large numbers of cores, and the fact that applications must be coded for multiple cores, which also holds performance below the theoretical maximum. So multi- or many-core solutions have narrow bounds for successful deployment, and performance does not scale linearly with the number of cores. Just adding cores eventually reaches a point of diminishing returns in economics, power, footprint and other factors.
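The non-linear scaling described above can be made concrete with Amdahl’s Law, which bounds the speedup on n cores at 1 / ((1 - p) + p / n), where p is the fraction of the program that can run in parallel. The following is a minimal sketch, not a model of any particular chip. The function name and the 90% parallel fraction are assumptions chosen for illustration.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup from Amdahl's Law.

    parallel_fraction: share of the work that can run in parallel (0.0 to 1.0).
    cores: number of cores the parallel share is spread across.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the code is parallelizable.
    for n in (1, 2, 4, 8, 16, 64):
        print(f"{n:>3} cores -> {amdahl_speedup(0.9, n):.2f}x")
```

Under that assumption, 16 cores yield at most a 6.4x speedup, and no number of cores can push the figure past 10x, because the serial 10% always runs on one core.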

One technique that has gained some traction over the past decade is virtualization, particularly for improving utilization of servers inside of data centers. More recently, it has begun resurfacing in the semiconductor space using a VISC architecture.

Soft Machines, the company leading this push, claims to have working prototypes of this architecture. Mark Casey, Soft Machines’ vice president of marketing and business development, said the company expects to tape out by the end of the year, with general availability next year.

Because the Internet of Anything (IoX) is expected to drive a proliferation of connected devices, and of the processors inside them, being able to virtualize cores on a processor has some interesting implications. With virtualized versions of multicore processing, power scaling becomes more attractive, but so does scaling overall. The idea is that performance can be ramped up without increasing power consumption.

What is VISC?
Thanks to advances in virtualization technology, the theory of virtual instruction set computing is sound. In fact, virtualization is in widespread use for everything from servers to semiconductor software development and manufacturing flows.

So far, however, the only demonstration of such architectural innovations has come from Soft Machines, which coined the term VISC for its product. VISC technology may show up in other chip architectures before long because of its impact on power. The core of VISC technology is essentially the virtualization of multiple CPU cores into a single virtual core, which in turn enables much higher single-threaded performance.

Figure 1 shows the Soft Machines concept. While the theory is not proprietary, the Soft Machines concept is. “The key concept is to separate the physical cores from the virtual cores,” says Mohammad Abdallah, Soft Machines’ president and CTO.

Fig. 1: VISC architecture. Source: Soft Machines

Abdallah noted that the company implemented “multiple innovations to make these virtual cores work.”

The Soft Machines design removes the one-to-one correlation between cores and threads. One thread might utilize both cores. Or the system might allocate resources, such as an adder or a multiplier, from another core. This allows a complex thread to have the largest share of resources for the time it takes to execute a particular task. Spreading a thread across all of the chip’s resources in this way reduces execution time significantly.

Moreover, this design abstracts the cores as well as the instruction set architecture (ISA). The software layer primarily converts from a guest (ARM, x86, Power, MIPS, etc.) to VISC instructions for compatibility purposes. It doesn’t provide the performance and scalability. The hardware virtual cores give the performance scalability.

Previous architectures from other vendors such as Transmeta and Nvidia (Project Denver) relied on the software layer for performance, but VISC relies on the hardware for performance and software for compatibility.

Abstracting the ISA is key for broad market adoption because it allows any instruction set to run on the virtual machines. However, running on existing architectures (vs. the Soft Machines product) requires some minor modifications to those architectures. “If someone has a natively designed ARM or x86 core, it would be a very difficult thing to add virtual core support into their existing designs,” said Abdallah. “But the modifications are simple and do not require any extensive redesign.”

VISC drill down
As with any multi-core processor, virtual or otherwise, the advantage comes when a single thread can be processed across multiple cores. Unfortunately, most code is written as a single thread, so spreading it across multiple cores requires a bit of creativity. Soft Machines does that by using a global front end to control and route VISC instructions to each core based on dynamic load balancing (see Figure 1).
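The general idea of routing work by load can be sketched with a simple greedy policy in which each chunk of work goes to whichever core currently has the least queued. This is only an illustration of dynamic load balancing and is not Soft Machines’ actual front-end logic, which the article does not detail. The function name, the notion of a "chunk" with an integer cost, and the example costs are all hypothetical, and the sketch ignores the data dependencies between instructions that real hardware has to track.

```python
import heapq


def dispatch(chunk_costs: list[int], num_cores: int = 2) -> list[list[int]]:
    """Greedy least-loaded dispatch (hypothetical illustration).

    chunk_costs: estimated cost of each chunk of work, in arrival order.
    Returns one list per core holding the costs of the chunks routed to it.
    """
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    # Min-heap of (queued cost, core index): the least-loaded core pops first.
    heap = [(0, core) for core in range(num_cores)]
    heapq.heapify(heap)
    assignment: list[list[int]] = [[] for _ in range(num_cores)]
    for cost in chunk_costs:
        load, core = heapq.heappop(heap)
        assignment[core].append(cost)
        heapq.heappush(heap, (load + cost, core))
    return assignment


if __name__ == "__main__":
    # One heavy chunk followed by several lighter ones, spread over two cores.
    print(dispatch([5, 1, 1, 1, 4], num_cores=2))
```

In this example the heavy first chunk occupies one core while the lighter chunks that follow are steered to the other, which is the balancing behavior the front end is described as providing.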

Dynamically allocating the resources of multiple cores to a single thread allows single-threaded code to execute faster. The relationship isn’t linear, but there is significant improvement.

However, not all threads are created equal. Some are thin and streamlined, others fat and heavy. So the processor has to be “smart” as well as multi-core and be able to analyze the code, knowing what is going on in each core and thread. Then it has to dynamically route, reroute and balance the load across the cores, whether virtual or physical. It just so happens that is easier and much more efficient across virtual cores.

Soft Machines claims its VISC architecture does that. It is flexible enough to take a fat thread and steal cycles from the second VISC core, and give it the equivalent of 1.5 cores, for example. Then, the rest of the second core can run a lightweight thread.
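The "1.5 cores" example can be worked through with a toy model that pools the execution resources of both cores and divides them among threads in proportion to demand. It is a sketch under stated assumptions and not the VISC allocation mechanism. The function name, the idea of four interchangeable resource "slots" per core, and the 3:1 demand ratio are all hypothetical.

```python
from fractions import Fraction


def allocate_core_share(demands: dict[str, int], cores: int = 2,
                        slots_per_core: int = 4) -> dict[str, Fraction]:
    """Split pooled resources among threads in proportion to demand.

    demands: relative resource demand per thread (hypothetical units).
    Returns each thread's share expressed in core-equivalents.
    """
    total_demand = sum(demands.values())
    if total_demand <= 0:
        raise ValueError("total demand must be positive")
    total_slots = cores * slots_per_core  # resources pooled across all cores
    shares = {}
    for name, demand in demands.items():
        slots = Fraction(demand, total_demand) * total_slots
        shares[name] = slots / slots_per_core  # convert slots to core-equivalents
    return shares


if __name__ == "__main__":
    # A fat thread that wants three times the resources of a thin one.
    for name, share in allocate_core_share({"heavy": 3, "light": 1}).items():
        print(f"{name}: {float(share)} cores")
```

With those assumed numbers the heavy thread receives the equivalent of 1.5 cores and the light thread runs on the remaining 0.5, matching the split described above.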

Abdallah explains it this way. During rush hour traffic, one side of a highway is frequently more congested than the other. So imagine that during heavy process thread activity on one of the processors, the “other” side of the highway, the second processor, can allocate some “lanes” to the core that can use the additional resources for the heavy thread. The concept is to dynamically allocate resources among cores to optimize both efficiency and power utilization, thereby fully utilizing both cores.

There is overhead involved, which is why the gain isn’t linear. And not all threads are well coded. To be 100% efficient would require code to be optimized to the VISC architecture, which is not the case today.

Another design metric is that each VISC core has a relatively high number of instructions per cycle (IPC). That translates into better and more efficient performance at lower clock speeds.

Security issues
Because virtual processors are still in the experimental stages, it is difficult to say with certainty what the security issues will be. But it is safe to say that, at a minimum, they will have many of the same issues that standard processors and similar chips have.

However, such processors are dynamic and process code differently than a serial methodology does, and virtualization generally adds more transition layers. More layers generally mean more attack surfaces, but Soft Machines believes it can maintain security at the virtual core level.

Considering that such virtual processors have both hardware and software elements (more on the software side than standard processors), the best solution may be a combination of both. For example, on the software side there can be a piece of code that has an encryption algorithm in it, and the key to the algorithm is hidden there. If the code is being transferred from one of the new stages that are used for virtualization to another, it can become vulnerable at the transition point, or edge as it is also called. If that point isn’t well secured, it can leak some clues about what is going on with the code. And given enough time and resources, it is possible to extract the key from the code.

The hardware side is pretty standard, with the usual potential soft spots such as design flaws, back doors, and level of encryption. So while the focus on virtual processors is on enhanced performance, security also should be a top priority and considered as part of the design, and not after the performance is achieved.

The potential of virtual processors is staggering. It may well be the technology that lays the foundation for a radical shift in technology that the chip industry is looking for, or it may be just one more interesting idea.

Being at the technology wall is challenging. New applications pop up all the time that need new and innovative methodologies to become viable. The cloud, the IoX, expanding mobile platforms, smart devices, and many more innovations are going to present new processor challenges that will require agile, efficient and scalable designs. Virtual processors may be an answer.
