Part one in a series. Processing architectures continue to become more complex, but is the software industry getting left behind? Who will help software developers utilize the hardware being created?
Eleven years ago processor clock speeds stopped scaling due to diminishing returns and the breakdown of Dennard scaling. That set in motion a chain of events from which the industry has still not fully recovered.
The transition to homogeneous multi-core processing presented software teams with a problem they did not know how to solve, namely how to optimize the usage of the compute capabilities made available to them. They continue to struggle with that problem even today. At the same time, many systems required the usage of processing cores with more specialized functionality. The mixing of these processing elements gave us heterogeneous computing, an approach that sidestepped many of the software problems.
Homogeneous compute systems were found mainly on the desktop and in the datacenters, whereas heterogeneous systems were found primarily in mobile phones. Today, datacenters are deploying GPUs as processing elements and more recently field programmable gate arrays (FPGAs). At the same time, mobile phones have significantly expanded the capabilities of their applications processors, making them multicore. The complexity change does not stop there. Even within applications processors, cores with different operating characteristics are being integrated together that share an instruction set and programming paradigm. ARM's big.LITTLE is just one example of this.
Most designs today are multi-core CPUs, with some well-coupled elements such as GPUs. “That subset of the die is highly coherent and runs in virtual memory space,” says Drew Wingard, chief technology officer at Sonics. “Then there is the rest of the chip where we still have lots of heterogeneous hardware. They exist because they can do a much better job getting to the frequency, throughput, power and area targets of the application. The best model we have is to abstract those into sub-systems. At the top level, you have a loosely coupled selection of sub-systems, and that model is practical and works.”
It’s also shifting. “We are seeing both homogeneous and heterogeneous processes happening, but it is getting more heterogeneous,” says Tom De Schutter, director of product marketing for physical prototyping at Synopsys. “People want to get more specialized, and rather than one processor for everything it is becoming more specific.”
Tipping the scales toward heterogeneous
The reasons for heterogeneous elements are fairly clear within the hardware space. "The problems remain largely unchanged," says Chris Jones, product marketing group director for Tensilica IP at Cadence. "Design teams are tasked with delivering maximum performance with minimal power consumption and beating the competition to market."
Abstraction creates separation. “This was part of a divide and conquer design philosophy that enabled reuse of software and took advantage of chip scaling more than anything else,” says Larry Lapides, vice president of sales for Imperas. “You had a simple mailbox system of communication between them. Since then we have seen an increase in the need for communication between some of the heterogeneous processors.”
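The mailbox scheme Lapides describes can be sketched in a few lines. This is a conceptual stand-in, not any vendor's API: a hardware mailbox reduced to a queue that one core posts into and the other drains.

```python
from queue import Queue

class Mailbox:
    """Toy model of an inter-processor hardware mailbox."""

    def __init__(self):
        self._slots = Queue()

    def post(self, message):
        # On real silicon this would write a mailbox register and
        # raise an interrupt on the receiving core.
        self._slots.put(message)

    def drain(self):
        # The receiver's interrupt handler reads all pending slots.
        messages = []
        while not self._slots.empty():
            messages.append(self._slots.get())
        return messages

# The CPU hands the DSP a work item; field names are illustrative.
cpu_to_dsp = Mailbox()
cpu_to_dsp.post({"cmd": "decode", "buffer": 0x1000})
print(cpu_to_dsp.drain())  # [{'cmd': 'decode', 'buffer': 4096}]
```

The point of the model is its simplicity: each side only needs to know the message format, not anything about the other processor's internals, which is precisely why it scaled poorly once richer communication was needed.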
The means of communications also has changed. “The migration of architectures is a function of the evolution of communications between the heterogeneous parts,” says David Kruckemyer, chief hardware architect for Arteris. “We have had a processor and DSP or a GPU, and now the nature of that communications has changed from being an interrupt model to a shared memory model. That is where cache coherence and other programming paradigms come in to play. This is a major trend within the industry. You could get away with a lot of sins in the past just by adding more processors and doing everything in software, but that strategy appears to be running out of steam.”
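The shift Kruckemyer describes can be illustrated with threads standing in for processors: a lock emulates the coherence fabric that keeps shared data safe, and an event emulates the inter-processor interrupt. All names and values here are invented for illustration.

```python
import threading

shared_buffer = []
lock = threading.Lock()    # stand-in for hardware cache coherence
done = threading.Event()   # stand-in for an inter-processor interrupt

def dsp_producer():
    # Shared-memory model: the "DSP" writes directly into a
    # structure the "CPU" will also touch.
    for sample in range(4):
        with lock:  # coherent write, visible to the other core
            shared_buffer.append(sample * 2)
    done.set()      # interrupt-style signal: data is ready

def cpu_consumer():
    done.wait()
    with lock:
        return sum(shared_buffer)

t = threading.Thread(target=dsp_producer)
t.start()
total = cpu_consumer()
t.join()
print(total)  # 12
```

In the older interrupt-only model, the signal was the whole protocol; in the shared-memory model the data structure itself is the interface, which is why correctness now depends on coherence hardware rather than on software convention alone.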
Yet just bringing the cores into a state of coherence does not solve the problem. “The challenges that we are facing have moved, but we are still stuck in the homogeneous CPU-centric view of the world,” says Anush Mohandass, vice president of business development at NetSpeed Systems. “We call it divide and conquer, where people cut the problem into 10 different pieces and pretend they do not talk to one another. They are designed individually, stuck together, and then you hope it gives you the performance that you need. This approach creates holes on the verification side, holes in the performance side, and ultimately holes in what we deliver to our end customer. You have to look at the system as a whole. The techniques of divide and conquer do not scale.”
So what comes next? The answer isn’t entirely clear, a problem being made worse by a lack of expertise in this area.
“Heterogeneous multiprocessing is still very much a specialized domain for a few experts,” says James Aldis, architecture specialist for verification platforms at Imagination Technologies. “As an industry we definitely haven’t succeeded in making heterogeneous platforms programmable for the average software engineer. In fact, even homogeneous multiprocessing is not mature.”
Still, this does not encapsulate everything that is going on. Deciding which pieces of the software should run on which homogeneous core is the old problem. Several new ones are arising. “Today it’s difficult to optimize or scale GPU+CPU applications, or to port from one SoC platform to another,” continues Aldis. “Software programming models for coherent heterogeneous processors aren’t standardized, and even when they are, they are different for different types of processors, making heterogeneous applications very inflexible and non-portable. You need to decide before you code your app which parts are on GPU, which are on DSP, and you probably even need to know whose DSP and whose GPU you will be running on.”
It can even be questionable what optimization means. "Most of the current scaling has been in the homogeneous portions of the design, but it is starting to get more heterogeneous in nature," says Kumar Venkatramani, VP of business development at Silexica. "What does efficiency mean in this context? It could mean the fastest throughput, the shortest latency, or the lowest cost. Another parameter might be power, and that itself can break down into multiple things such as peak power or average power. So it is a multi-parameter optimization problem."
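The multi-parameter trade-off Venkatramani describes can be made concrete with a toy scheduler that scalarizes two objectives, latency and power, into one weighted cost. The cores, operating points, and weights below are invented for illustration.

```python
# Hypothetical operating points for one task on three candidate cores.
cores = {
    "big_cpu":    {"latency_ms": 2.0, "power_mw": 900},
    "little_cpu": {"latency_ms": 8.0, "power_mw": 150},
    "dsp":        {"latency_ms": 3.5, "power_mw": 250},
}

def best_core(cores, w_latency, w_power):
    # Collapse the multi-objective problem into one weighted cost
    # and pick the core that minimizes it.
    def cost(point):
        return w_latency * point["latency_ms"] + w_power * point["power_mw"]
    return min(cores, key=lambda name: cost(cores[name]))

# A latency-critical task lands on the big core...
print(best_core(cores, w_latency=100.0, w_power=0.1))  # big_cpu
# ...while a battery-sensitive task prefers the little core.
print(best_core(cores, w_latency=1.0, w_power=1.0))    # little_cpu
```

Even this sketch shows why the problem is hard: the "best" mapping flips entirely depending on how the objectives are weighted, and real systems add cost, thermal limits, and peak-versus-average power on top.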
Warren Kurisu, director for product management and marketing for the Mentor Graphics Embedded Software Division, agrees. “The trend toward heterogeneous multicore architectures started a few years ago and is accelerating like crazy. How do you connect the different programming paradigms? How do you develop on them within the context of a massive heterogeneous system? It is a system architecture and optimization problem when you are attempting to consolidate different kinds of functionality onto the same, similar or disparate types of cores. What is an optimized architecture from a software perspective?”
There are two schools of thought developing. “The best thing we have come up with so far is the sub-system abstraction, which says we should put small processors into these sub-systems so that the code that runs the sub-systems lives with the sub-system, and from the perspective of the top-level design all it sees are a couple of relatively simple APIs,” says Wingard. “The hardware that we are talking to may be complex, but we can still build simple APIs.”
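The sub-system abstraction Wingard describes is essentially a facade: complex hardware plus local firmware hidden behind a couple of simple calls. The class and method names below are hypothetical, chosen only to make the idea concrete.

```python
class AudioSubsystem:
    """Hypothetical sub-system: internally a small DSP, DMA engines
    and local firmware, but the top-level design sees only two calls."""

    def __init__(self):
        self._configured = False

    def configure(self, sample_rate_hz):
        # Internally this might load firmware and program clocks;
        # the caller sees none of that.
        self._rate = sample_rate_hz
        self._configured = True

    def play(self, buffer):
        if not self._configured:
            raise RuntimeError("configure() must be called first")
        # Local firmware would stream the buffer; here we just
        # return what the top level needs to know.
        return {"samples": len(buffer), "rate": self._rate}

audio = AudioSubsystem()
audio.configure(sample_rate_hz=48000)
print(audio.play([0.0] * 128))  # {'samples': 128, 'rate': 48000}
```

The code that runs the sub-system lives with the sub-system, exactly as Wingard describes, so the top-level integration only has to reason about the narrow API, not the hardware behind it.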
The other end of the scale is looking for the ability to dynamically move code to the best possible processing solution that is available and meets all of the dynamic operating requirements. “When you think about an SoC platform, you have to think about the programming model first,” Wingard says. “Because of that, we see increasingly large numbers of platforms where the central processing complex is friendlier toward software developers. We see cache coherence that has gone into the central processing complex. We see increasing numbers of cores with best-in-class programming models available to us.”
But to make full use of this requires a change of programming paradigm, which has not been developed even for the homogeneous programming space. “Early on, we saw that while semiconductor manufacturers were mixing in all of these cores, some of them were not sure how to position the use case of them,” says Mentor’s Kurisu. “As semiconductor manufacturers continue to design and develop even more complex systems, they are thinking more about the software architecture and the ways to support it.”
So who is really driving the architecture? “There is a shift in who drives the specification, who drives the requirements and how they are being tested,” says Synopsys’ De Schutter. “Organizationally, it takes a while to change and we are seeing differences throughout the industry. The balance has shifted and the types of collaboration have shifted.”
Sometimes, the race to market can cause certain decisions to be made. “Hardware and software teams must collaborate at the architectural phase to determine which parts of the design should exist as hardware vs. software,” says Cadence’s Jones. “The more programmable the design, the shorter the verification time and the faster time to market. That is accomplished most easily by employing highly optimized, programmable offload engines.”
But that may just be moving the problem. “Hardware teams are discovering that effective heterogeneous platforms require much more complex interaction between hardware components (cache coherency, synchronization, etc.),” points out Aldis. “The hardware protocols are richer and require much more verification for full coverage. Exercise of these protocols may not be possible without significant software being available and mature pre-silicon. The days of LEGO-like hardware building will soon be over.”
And legacy always has to be considered. “Unless you are in the rare circumstance that you are starting from a blank sheet of paper, you always have software that has to run,” says Wingard. “What you did before has to work on the new stuff. That is table stakes for being in the business. That does not mean that the software people are driving things. It means that you have a seat at the table and may have some requirement that goes into the process.”
But before the ideal solution has been achieved, progress can be made. "At the end of the day, continuous integration of hardware and software becomes critical," says Frank Schirrmeister, senior group director for product management in the System & Verification Group of Cadence. "As a result, the industry is working hard on 'shift-left' technologies. The number of developers touching both sides is still pretty limited, though. Even if an issue is found, moderation between the software and hardware stakeholders can be an experience rivaling marriage counseling."
Ultimately, shift left brings you to people who can look at the entire problem. “We still see a fair degree of separation between the hardware and software teams, and these barriers have to be broken down,” says Lapides, noting that some of the system perspective that used to exist has been lost. “The space race brought teams together. Hardware and software worked closely together even though software was fairly minimal. It was all about systems engineering.”
Some say that is still the case. “The guys who are winning in the product space are the ones who understand the whole system, not just hardware or software,” says Mohandass. “We lack the means to translate what the system guys need to the software guys and what the software guys say to the hardware guys. We do not have a solution today. It may require cross-pollination of skills.”
Lapides believes there is a hero culture for the semiconductor designers. “The software guys never evolved in the same way. The semiconductor industry looked at the defense industry and saw massive amounts of overhead and decided that they needed to get away from that. They threw out most of the methodology and culture of that industry and got rid of the things that were good as well. We need to get some of that back.”
Aldis agrees. “There is a definite shortage of software engineers who can cope with the level of heterogeneity, or hardware engineers who understand the constraints of modern software. Trying to square these circles without slacking off on the pressures of product roadmap delivery requires a mix of platform architecture definition, standardization, open-source collaboration, and a wide range of partnerships with hardware and software vendors.”
Part two of this series will explore some of the methods that are being used to help solve these problems both in the short term and long term.