The Quest To Better Define Applications

Understanding how software utilizes hardware is changing how SoCs and systems-in-package are designed and used.


By Ed Sperling
For nearly five decades, just being able to get software to run on hardware and communicate with other systems was considered a feat of engineering. But with that part of the technology solved well enough, the next big challenge is to make sure that applications can run as efficiently as possible to maximize performance, minimize power consumption and limit the area required to do both.

These tradeoffs of area, power and performance have always been the basis of semiconductor design. What's new is that those variables now need to be optimized at a much higher level in the software stack than in the past. That means a more customized matchup between hardware and software, which has always been difficult because it involves multiple companies, different specializations, and completely different tool sets, languages and even engineering cultures. But what's becoming apparent is that in some cases one processor, or at the very least one core, doesn't provide enough performance to get the job done, while in others a full-scale processor provides too much in terms of cost, performance and the area required to deliver that performance.

“If you know what applications you need to run, you can define what processor to run them on,” said Chris Rowen, chief technology officer at Tensilica. “But the more narrowly you define the task, the more you gain in efficiency. Sometimes that can be a factor of 10 or 20 in terms of performance, area and power.”
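The kind of gain Rowen describes can be made concrete with a toy cycle-count model. The numbers below are invented for illustration, not measured on any real core: a general-purpose core runs a 64-tap FIR filter one instruction at a time, while a hypothetical specialized core with parallel multiply-accumulate units retires several taps per cycle.

```python
# Illustrative only: toy cycle-count model comparing a 64-tap FIR filter
# on a general-purpose core versus a core with a specialized MAC datapath.
# All instruction costs here are assumptions made up for this sketch.

TAPS = 64
SAMPLES = 1000

def general_purpose_cycles():
    # Per tap: load coefficient, load sample, multiply, add, loop overhead.
    cycles_per_tap = 5
    return SAMPLES * TAPS * cycles_per_tap

def specialized_cycles():
    # Hypothetical datapath with 4 parallel MAC units and zero-overhead
    # loops, so 4 taps retire per cycle.
    macs_per_cycle = 4
    return SAMPLES * (TAPS // macs_per_cycle)

gp = general_purpose_cycles()
sp = specialized_cycles()
print(f"general-purpose: {gp} cycles")
print(f"specialized:     {sp} cycles")
print(f"speedup: {gp // sp}x")   # 20x in this toy model
```

In this simplified model the specialized datapath lands at the 20x end of the range Rowen cites, and because it finishes sooner it can also be clock-gated sooner, which is where the power savings come from.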

At the two extremes of design are general-purpose processors and hard-wired logic, which doesn’t require a processor at all. The big changes are coming in the middle, where SoCs are being designed with multiple processors for specific purposes, or as a system-in-package where those specialized chips are created separately.

Building a data path
This isn’t a simple change, however. Making a single processor work effectively is hard enough. Making multiple processors work together is even tougher, in part because data needs to move among them and cache coherency must be maintained across multiple processors. This has opened up a slew of new opportunities, though, particularly for companies that can provide the glue and flexibility needed to create dynamic data paths.
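To see why coherency is the hard part, consider a minimal sketch of a snooping protocol in the MSI style, with two cores sharing one memory location. The states and bus behavior are heavily simplified for illustration; real protocols (MESI, MOESI, directory-based schemes) add states and optimizations.

```python
# A minimal sketch of MSI-style snooping cache coherency between two
# cores sharing a single memory location. Simplified for illustration.

INVALID, SHARED, MODIFIED = "I", "S", "M"

class Cache:
    def __init__(self, name):
        self.name = name
        self.state = INVALID
        self.value = None

class Bus:
    def __init__(self, memory_value=0):
        self.memory = memory_value
        self.caches = []

    def read(self, cache):
        if cache.state == INVALID:
            # A cache holding the line Modified must write back first.
            for other in self.caches:
                if other is not cache and other.state == MODIFIED:
                    self.memory = other.value
                    other.state = SHARED
            cache.value = self.memory
            cache.state = SHARED
        return cache.value

    def write(self, cache, value):
        # A write invalidates every other copy (snooped invalidate).
        for other in self.caches:
            if other is not cache:
                other.state = INVALID
        cache.value = value
        cache.state = MODIFIED

bus = Bus(memory_value=0)
core0, core1 = Cache("core0"), Cache("core1")
bus.caches = [core0, core1]

bus.read(core0)          # core0: Shared
bus.read(core1)          # both Shared
bus.write(core0, 42)     # core0: Modified, core1: Invalid
print(bus.read(core1))   # forces write-back; prints 42
```

Even in this toy version, every write generates traffic that every other cache must observe, which is why coherency across many heterogeneous processors becomes a first-order design problem.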

“The total number of hardware blocks is getting to be so big that you want the big CPUs optimized around the best-case performance,” said Drew Wingard, chief technology officer at Sonics. “You don’t want perpetual interrupts because you want your battery to last as long as possible.”

One piece that has become particularly useful in this regard is what’s known as a maintenance processor. It’s used to service other processors. In the memory world, this processor is used for repairing other processors and keeping track of where the errors occur. In the virtualization world, it’s used to manage hypervisors. For these kinds of devices, software is either embedded or custom-coded, but functionality is purposely kept limited.
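The maintenance-processor idea can be sketched in a few lines: a small, deliberately limited supervisor that polls the other processors, records where errors occur, and resets the faulty ones. The `Worker` class and its status flags below are invented for illustration, not drawn from any real device.

```python
# A sketch of the maintenance-processor pattern: a small fixed-function
# supervisor that services other processors. Names and fields here are
# hypothetical, invented for this illustration.

class Worker:
    def __init__(self, wid):
        self.wid = wid
        self.faulted = False

    def reset(self):
        self.faulted = False

class MaintenanceProcessor:
    """Deliberately limited: it only observes, logs and resets."""
    def __init__(self, workers):
        self.workers = workers
        self.error_log = []          # (worker id, tick) of each fault seen

    def service(self, tick):
        for w in self.workers:
            if w.faulted:
                self.error_log.append((w.wid, tick))
                w.reset()

workers = [Worker(i) for i in range(4)]
mp = MaintenanceProcessor(workers)

workers[2].faulted = True            # inject a fault on core 2
mp.service(tick=1)
print(mp.error_log)                  # [(2, 1)]
print(workers[2].faulted)            # False: repaired
```

The point of keeping the functionality this narrow is the one the article makes: a maintenance processor needs only enough capability to service its neighbors, so a full-scale core would be wasted on it.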

All of this has to work over some sort of network, as well. ARM’s AMBA bus provides some of this capability, mostly tied to ARM cores. Outside of that, both Arteris and Sonics are battling for market share in the growing network-on-chip arena, which is a more sophisticated approach to a network fabric because it provides more programmability and flexibility at the front end of the design, as well as throughout the development process.

“The fabric is important to put things together,” said Wingard. “Unlike a lot of IP and other parts of the chip, the network is not re-usable.”

Moving up a level of abstraction from there, all of these pieces have to work seamlessly together in all possible states. That requires the integration of hardware blocks, various processors running in various states, as well as the software. Even though a particular processor core may be optimized for a specific application, that application needs to share data with other parts of the chip.

“The interesting opportunity is in how you can get it all to work together,” said Mike Gianfagna, vice president of marketing at Atrenta. “The problem is that you need to run some software across the complete architecture where you synchronize the clocks, and with scenarios that have power-domain sequencing. If it doesn’t work, you don’t rewrite the software. You change the hardware. That’s a tremendous opportunity and from what we’re seeing the trend is real. But it also means that each sub-block has to be harmonized, and you need to be able to do it quickly and in a fast and iterative manner.”
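The power-domain sequencing check Gianfagna describes can be sketched as a simple ordering verifier: given which domains depend on which, confirm that every domain powers up only after all of its dependencies. The domain names and dependency graph below are invented for illustration.

```python
# A sketch of a power-domain sequencing check. The domains and their
# dependencies are hypothetical, invented for this illustration.

DEPENDS_ON = {
    "always_on": [],
    "memory":    ["always_on"],
    "cpu":       ["always_on", "memory"],
    "gpu":       ["always_on", "memory"],
}

def check_sequence(power_up_order):
    """Return True if every domain powers up after all its dependencies."""
    powered = set()
    for domain in power_up_order:
        if any(dep not in powered for dep in DEPENDS_ON[domain]):
            return False
        powered.add(domain)
    return True

print(check_sequence(["always_on", "memory", "cpu", "gpu"]))  # True
print(check_sequence(["cpu", "always_on", "memory", "gpu"]))  # False
```

A check like this is cheap to rerun, which matters because, as Gianfagna notes, each change to the hardware means re-verifying every scenario quickly and iteratively.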

So how exactly do you design a fully optimized system? The history of semiconductor design has largely been hardware first. But there are rumblings throughout the industry that it may be shifting to software first, particularly for the more popular applications.

“No one writes software anticipating the hardware,” said Tensilica’s Rowen. “It’s always applications first. But as you move down the stack you need to look at what the processors are doing. There may be a variety of ways of getting to the lowest cost, lowest power, with more freedom to define both.”

But given these goals, a better understanding of how software utilizes the hardware would be a welcome benefit. Hardware and software engineers may never speak the same language, but their goals are becoming much more intertwined.