Integration of different operating systems across SoCs with a focus on power is forcing some unusual combinations—and lots of headaches.
By Ed Sperling
Software is the next big target in the quest to make electronics more energy efficient, but it’s proving a far bigger challenge than most systems architects originally believed it would be.
There are several very large problems to deal with in software. Writing efficient code for small processors isn’t one of them. In fact, the proliferation of small processors across an SoC makes it easier to deal with at least a portion of the software. Code can run directly on bare metal, some of it can be nothing more than an executable file, and still other code can run on a real-time operating system written for a specific purpose, or even on a slimmed-down version of a general-purpose operating system.
But bringing all of this code under coordinated control across an SoC is another matter, despite the fact that this is the best way to manage power and minimize physical effects in a chip. Solving this problem requires integration and coherency across the chip, which in turn requires software architects and system architects to work together up front. That may be a stated goal at many companies, but it certainly isn’t yet a reality.
“You need coherence to develop a high-end software design,” said Dan Driscoll, Nucleus software architect for Mentor Graphics’ Embedded Software Division. “At this point integration is a large portion of the effort, and the problem has yet to be solved. One thing that helps is a single development environment. If you use multiple profiling tools, it’s more difficult to pull that together into a system.”
Devils in the details
Just understanding the interactions between the various hardware portions of a complex SoC has far exceeded human limits, even at mainstream process nodes. Most companies use a block or subsystem approach to deal with this complexity, working on smaller pieces, then assembling them into the whole and hoping it works as a single system.
Software increases the complexity by orders of magnitude, because an increasing amount of software now controls functionality across the chip. It determines what remains on, what gets turned off, in what sequence, at what speed, and what gets priority. It also determines how much power and memory can be allocated to a given function or logic subsystem—at least in 2D designs. (In stacked die, it may be possible to dedicate portions of memory to logic blocks to minimize this issue.)
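A rough sketch of what that kind of control code can look like is shown below. The power domains, register addresses and sequencing are invented purely for illustration, not taken from any real SoC.

```c
/* Hypothetical power-sequencing sketch: the domain names, register
 * addresses and ordering below are illustrative, not from a real chip. */
#include <stdint.h>

#define PWR_CTRL_BASE  0x40001000u   /* made-up power-controller base address */
#define PWR_DOMAIN_ON(d)  (*(volatile uint32_t *)(PWR_CTRL_BASE + 4u * (d)) = 1u)
#define PWR_DOMAIN_OFF(d) (*(volatile uint32_t *)(PWR_CTRL_BASE + 4u * (d)) = 0u)

enum power_domain { DOM_DSP = 0, DOM_GPU = 1, DOM_AUDIO = 2 };

/* Bring up only the domains a given use case needs, in a fixed order,
 * and shut everything else down to stay inside the power budget. */
void enter_audio_playback_mode(void)
{
    PWR_DOMAIN_OFF(DOM_GPU);   /* not needed for this use case: power it down */
    PWR_DOMAIN_ON(DOM_AUDIO);  /* then bring up the audio subsystem           */
    PWR_DOMAIN_ON(DOM_DSP);    /* DSP decodes the stream for the audio block  */
}
```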
“This is the job of the controller software for the overall system,” said Frank Schirrmeister, group director for product marketing of the system development suite at Cadence. “You tell it to execute this API or put data over here. This is a high-level sequence, and it can do connectivity between different cores of a processor. You also can add up the energy transactions and memory transactions that will trigger.”
Multi-core, many-core, and multiple processors
A second big problem stems from the types of processors being used. The difficulty of writing software applications that can take advantage of multiple cores is an old and well-understood issue—about four decades old, in fact. And while it’s easy for processor makers to add more cores onto a piece of silicon and hand it off to applications developers to deal with, the reality is that most applications cannot be parsed to take advantage of more than eight cores, and in many cases the number is likely to be fewer than four.
Databases, scientific calculations and graphics rendering, where there is extreme redundancy, are the exceptions. Even some games can have functionality parsed across cores. For most other applications, though, the limit is probably two to four cores. And if those cores are running popular general-purpose operating systems such as Windows, Mac OS X or Linux, chances are pretty good that it’s not the most efficient implementation of a function, even though it may be the most convenient.
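Amdahl’s law puts numbers on that ceiling: if a fraction p of a workload can run in parallel, the best possible speedup on n cores is 1 / ((1 − p) + p/n). The short sketch below assumes a 90% parallel fraction purely for illustration, and shows how quickly the returns flatten out.

```c
/* Amdahl's law: speedup = 1 / ((1 - p) + p / n), where p is the fraction
 * of the work that can run in parallel and n is the number of cores.
 * The 0.90 parallel fraction is an assumption for illustration only. */
#include <stdio.h>

static double amdahl_speedup(double parallel_fraction, int cores)
{
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores);
}

int main(void)
{
    const double p = 0.90;
    const int core_counts[] = { 2, 4, 8, 16 };

    for (int i = 0; i < 4; i++)
        printf("%2d cores: %.2fx speedup\n",
               core_counts[i], amdahl_speedup(p, core_counts[i]));

    return 0;   /* prints roughly 1.82x, 3.08x, 4.71x and 6.40x */
}
```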
RTOSes have been used by the military for decades as a much more energy-efficient alternative, although most of that work was far less concerned with energy than with security and performance. As they shift into commercial applications such as mobile phones, they are especially well suited to managing specific functions on separate processor cores in an SoC. It doesn’t make sense, for example, to use a multicore general-purpose processor for audio enhancements, and if a function isn’t running on a general-purpose processor then it probably doesn’t need a general-purpose OS, either. But those functions still have to work with other parts of the chip without affecting signal integrity or creating hardware proximity effects such as heat, ESD and electromigration.
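To make that concrete, a dedicated function such as audio processing can run as a lightweight task under an RTOS on its own small core rather than as a process under a general-purpose OS. The sketch below uses FreeRTOS only as a familiar stand-in for whatever RTOS a design actually ships with; the stack depth, priority and 10ms block period are arbitrary assumptions.

```c
/* Minimal RTOS-task sketch using FreeRTOS as a stand-in. The stack depth,
 * priority and 10 ms processing period are placeholder values. */
#include "FreeRTOS.h"
#include "task.h"

static void audio_task(void *params)
{
    (void)params;
    for (;;) {
        /* process one block of audio samples here ... */
        vTaskDelay(pdMS_TO_TICKS(10));   /* then sleep until the next block */
    }
}

void start_audio_processing(void)
{
    /* One small task on a dedicated core replaces a full OS process. */
    xTaskCreate(audio_task, "audio", 512, NULL, tskIDLE_PRIORITY + 2, NULL);
}
```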
“The idea of SMP (symmetric multiprocessing) beyond 8 to 16 cores is not realistic for most applications,” said Mentor’s Driscoll. “We’re almost stuck with AMP (asymmetric multiprocessing) as part of large multicore implementations. But we’re seeing cases where you may have a TI OMAP 5, running a dual-core ARM Cortex-A9, an A4 and a DSP. You may have six or seven cores, and a general-purpose operating system going through this part of the system. That operating system may control other DSP interfaces, including RTOSes.”
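In an AMP configuration like that, the general-purpose OS and the RTOS or DSP cores typically hand work to one another through shared memory. The sketch below shows one common pattern; the mailbox layout, address and doorbell convention are assumptions made for illustration, not any particular chip’s interface.

```c
/* Sketch of a common AMP pattern: the general-purpose OS core hands work to
 * an RTOS or DSP core through a shared-memory mailbox. The structure layout,
 * address and polling scheme are assumptions, not a real chip's interface. */
#include <stdint.h>

struct amp_mailbox {
    volatile uint32_t ready;     /* set to 1 when a new message is pending */
    volatile uint32_t command;   /* e.g. start or stop an audio decode job */
    volatile uint32_t payload;   /* index or offset into a shared buffer   */
};

/* Assumed location of the mailbox in memory visible to both cores. */
#define SHARED_MAILBOX ((struct amp_mailbox *)0x9F000000u)

/* Runs on the general-purpose OS side: post a command for the remote core. */
void amp_post(uint32_t command, uint32_t payload)
{
    SHARED_MAILBOX->command = command;
    SHARED_MAILBOX->payload = payload;
    SHARED_MAILBOX->ready   = 1u;   /* the RTOS/DSP core polls or takes an IRQ */
}

/* Runs on the RTOS/DSP side: service the mailbox when work is pending. */
void amp_service(void (*handler)(uint32_t command, uint32_t payload))
{
    if (SHARED_MAILBOX->ready) {
        handler(SHARED_MAILBOX->command, SHARED_MAILBOX->payload);
        SHARED_MAILBOX->ready = 0u;
    }
}
```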
Verification and testing brain freezes
This approach leads to another problem, though. How do engineering teams verify and test this complex SoC, which now may include multiple types of processors and processor cores, various types of software, and a central software management scheme that probably involves a standard operating system? There may even be middleware making some of the connections, and in homogeneous environments possibly even a virtualization layer that may include hypervisors that can run on bare metal.
“The first thing you have to deal with is a traffic debug issue,” said Cadence’s Schirrmeister. “In many cases, the partitioning may happen by hand. But how you pull this all together may affect your debug strategy. Tensilica presented an extreme example involving a printer design, where they had a block diagram of the functionality and the cores. The printer company used Tensilica cores, which allowed them to replace the functions done in RTL with programmable functions. The connections worked, the memories worked, and the functionality was done in software as bare-metal, low-level software.”
There’s a tradeoff in doing that, however. Driscoll said that pushing functionality down to lower-end processors makes integration more complex. In addition, measuring power consumption becomes more difficult, because it means adding up the energy that all of those memory transactions will trigger.
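A rough sketch of that kind of energy bookkeeping is shown below: walk a trace of memory transactions and accumulate an estimate. The transaction types and per-byte energy figures are placeholders, not silicon-characterized numbers.

```c
/* Sketch of transaction-level energy accounting. The per-byte energy costs
 * below are placeholder values, not characterized silicon data. */
#include <stdio.h>
#include <stddef.h>

enum mem_target { TARGET_SRAM, TARGET_DRAM };

struct mem_txn {
    enum mem_target target;
    size_t bytes;
};

/* Assumed energy cost per byte, in picojoules, for illustration only. */
static double energy_per_byte_pj(enum mem_target t)
{
    return (t == TARGET_SRAM) ? 1.0 : 20.0;
}

static double total_energy_pj(const struct mem_txn *trace, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += energy_per_byte_pj(trace[i].target) * (double)trace[i].bytes;
    return sum;
}

int main(void)
{
    const struct mem_txn trace[] = {
        { TARGET_SRAM, 256 },    /* on-chip buffer fill             */
        { TARGET_DRAM, 4096 },   /* larger fetch from off-chip DRAM */
    };
    printf("estimated energy: %.1f pJ\n",
           total_energy_pj(trace, sizeof trace / sizeof trace[0]));
    return 0;
}
```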
“That means you need data to verify what works at the block level, the subsystem and in the overall system,” Schirrmeister said. “And some chips have processors you can’t access from outside for security reasons. You need flexibility in the software because of security, but you are not allowed to see it from the outside.”
Conclusion
While there has been much attention devoted to finding a common language between hardware and software engineers, the real path forward may be more focused on matching goals at the architectural stage, and then being able to swap information as a design progresses.
Virtual platforms that allow software to be developed earlier in the process help. So do some of the features that are being built into RTOSes these days. In addition, stacked die will help eliminate some issues, while creating new ones. But the real challenges will continue to be integration of hardware and software, and of various types of software with other software—with an eye toward remaining within a power budget and understanding how code affects energy consumed over time.