An Architectural Choice Overdue For Change

Risk aversion is often the reason architectural decisions get made a certain way, even when other options might provide advantages.


The past appears much simpler than the present, and when we look into the future the right decisions often seem highly uncertain. That is the value of hindsight, but it also reflects the fact that the winner gets to write history. Semiconductors could look very different today if different decisions had been made 20 years ago. What if the industry had adopted a parallel language rather than C? What if the basic architecture of the processor had been different? Either would have had a profound impact on the industry and the way in which everything is done.

Many other architectural choices have paved the way forward. They are not as dramatic as those two examples, but they are important nonetheless. One of them is the decision about when to implement something in hardware, in software, or in one of several options in between. This is an area in which the decision process appears to be based on very sketchy information.

Twenty years ago, the processor and the ASIC were worlds apart. The hardware/software partition was defined and the interface between them created. The decision about what to put into hardware was based on performance: if a function would take too long to run in software, custom hardware was built.

This led to extensive research around the notion of hardware/software co-design. “It reminds me of where ESL was a number of years ago,” recalls Drew Wingard, chief technology officer at Sonics. “We didn’t have any shortage of ideas about algorithms that could be used to optimize the tradeoffs between hardware and software, but we horribly missed having sufficiently accurate models around which those algorithms could be run. So we had the synthesis algorithms but lacked the models. Synthesis attempts to explore a range of options in order to achieve a goal or objective function, such as minimizing the energy. Without accurate models you don’t know the tradeoff space. When the synthesizer compares two options it has to know which one is better.”
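As a rough sketch of Wingard’s point (the component names, energy numbers, and error bars below are hypothetical, not taken from any real flow), an objective-function search can only be trusted when the modeling error is smaller than the gap between the options it is ranking:

```python
# Hypothetical sketch: a synthesis-style search can only rank options if the
# model error is smaller than the gap between the options it is comparing.

candidates = {
    "custom_rtl":     {"energy_nj": 1.2, "model_error_nj": 0.5},
    "dsp_offload":    {"energy_nj": 1.5, "model_error_nj": 0.5},
    "sw_on_host_cpu": {"energy_nj": 4.0, "model_error_nj": 0.5},
}

def pick_min_energy(options):
    """Return the option the objective function prefers, plus a reliability flag."""
    ranked = sorted(options.items(), key=lambda kv: kv[1]["energy_nj"])
    best, runner_up = ranked[0], ranked[1]
    gap = runner_up[1]["energy_nj"] - best[1]["energy_nj"]
    worst_error = best[1]["model_error_nj"] + runner_up[1]["model_error_nj"]
    return best[0], gap > worst_error  # False: the models cannot tell them apart

choice, trustworthy = pick_min_energy(candidates)
print(choice, "reliable comparison" if trustworthy else "models too coarse to decide")
```

With these made-up numbers the best and runner-up options differ by less than the combined model error, so the “winner” is not a meaningful result, which is exactly the gap Wingard describes.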

The notion of hardware/software co-design was largely abandoned, leaving the decisions to the system architects, but today those architects have many more options to choose from. “It is a spectrum: a general-purpose microprocessor at one end, which has a ton of flexibility but is not very power or performance efficient, and at the other end hand-coded RTL,” explains Matt Gutierrez, director of marketing for the Solutions Group of Synopsys. “Specialized hardware is often necessary to meet performance goals, especially if the processor is being shared by multiple high-bandwidth applications that would cause potential performance degradation or contention. The argument against hardwired solutions is time to market. If a processing function can be bought off the shelf with an existing ecosystem, then it is more efficient for an architect to look at this solution.”

Pranav Ashar, chief technology officer of Real Intent, points to another factor in how the decisions are made. “Custom hardware works better when the algorithm can be designed to operate in a streaming dataflow, or has many parallel operations, each on a small amount of locally generated data that is quickly discarded. Custom hardware does not work well when the algorithm generates a lot of temporary data that must be maintained for an extended period and the control-data-flow depends on the temporary data.”
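A toy illustration of the distinction Ashar draws, with two invented kernels standing in for real algorithms: a sliding-window average whose state fits in a few registers, and a running median that keeps and reprocesses its entire history:

```python
# Illustrative only: two toy kernels showing the contrast between a streaming
# dataflow and an algorithm that depends on long-lived temporary data.
from collections import deque

def moving_average(samples, taps=4):
    """Streaming dataflow: each output needs only a small, fixed window of recent
    data, so the state maps naturally onto registers in custom hardware."""
    window = deque(maxlen=taps)
    for s in samples:
        window.append(s)
        if len(window) == taps:
            yield sum(window) / taps  # local data is discarded as the window slides

def running_median(samples):
    """Data-dependent work over a growing temporary data set: the whole history
    must be kept and re-sorted, which fits memory-rich processors better."""
    history = []
    for s in samples:
        history.append(s)
        history.sort()                # cost and storage grow with the input
        yield history[len(history) // 2]

print(list(moving_average([1, 2, 3, 4, 5, 6])))
print(list(running_median([1, 2, 3, 4, 5, 6])))
```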

New considerations are becoming important in the decision making process. “Security can dictate choices,” says Bernard Murphy, chief technology officer for Atrenta. “For example, encryption in hardware can be more secure than software.”

Between the two ends of the spectrum, a number of options exist. The first to see widespread adoption was the digital signal processor (DSP). Among other differences, DSPs contained additional instructions that perform operations such as multiply much faster than a general-purpose processor can. Over time, DSPs were tuned for specific applications.
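For illustration, the inner loop below shows the kind of multiply-accumulate that DSP instruction sets were built around; the filter and its coefficients are invented, not taken from any particular device:

```python
# Illustrative sketch of the workload DSP instructions target: a general-purpose
# core issues a separate multiply and add per tap, while a DSP's multiply-accumulate
# (MAC) instruction fuses them, often retiring one tap per cycle.

def fir_filter(samples, coeffs):
    """Direct-form FIR filter: the body of the inner loop is one multiply-accumulate."""
    n = len(coeffs)
    out = []
    for i in range(n - 1, len(samples)):
        acc = 0
        for j in range(n):
            acc += coeffs[j] * samples[i - j]  # the MAC a DSP executes as one instruction
        out.append(acc)
    return out

print(fir_filter([1, 2, 3, 4, 5], [0.5, 0.25, 0.25]))
```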

Today, an SoC contains many functions, each of which would have been a separate piece of silicon in the past. “Processors, in a variety of different forms, are showing up to perform a number of functions on a chip,” points out Gutierrez. “It all depends on the functionality of the chip as to how many different processing functions, but increasingly we are seeing different kinds of cores used to perform specialized functions.”

Power is another reason for making the choice. Gutierrez explains that “the host processor tends to be large and power-hungry, and so certain things such as voice detection, speech recognition, even low-end audio, do not need a large processor turned on for this purpose. So we see embedded processors around the main processors enabling functions to be offloaded to more power efficient processors.”

But the choices are not always clear. “If a special-purpose instruction-set-processor is more suited for an application than a general purpose processor, you might as well go whole hog and use a custom hardware block and do away with the overhead of instruction-set processing altogether,” says Real Intent’s Ashar. “When the algorithm on the general-purpose processor and custom hardware are about the same complexity, power reduction with customized hardware comes from the ability to obtain the same performance at a lower clock frequency, allowing for the customized hardware to be operated at a lower Vdd, and to some extent, from being able to avoid the overhead of instruction processing and the general-purpose register and pipeline architecture of a processor.”
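A back-of-envelope sketch of the effect Ashar describes, using the standard dynamic-power relation P ≈ C·Vdd²·f; the capacitance, voltages, and frequencies are assumed values chosen only to show the shape of the tradeoff, not figures from any real design:

```python
# Toy comparison: the same throughput delivered by a general-purpose core at full
# clock and nominal Vdd, versus custom hardware at half the clock and a lower Vdd.

def dynamic_power(c_eff_farads, vdd_volts, freq_hz):
    """Classic dynamic-power approximation P = C * Vdd^2 * f."""
    return c_eff_farads * vdd_volts ** 2 * freq_hz

C_EFF = 1e-9  # assumed effective switched capacitance, held equal for both options

p_cpu = dynamic_power(C_EFF, vdd_volts=1.0, freq_hz=1.0e9)  # processor meeting the target
p_hw = dynamic_power(C_EFF, vdd_volts=0.8, freq_hz=0.5e9)   # custom block at half clock, lower Vdd

print(f"custom hardware uses ~{p_hw / p_cpu:.0%} of the processor's dynamic power")
# -> roughly 32% with these assumed numbers (0.8^2 * 0.5), before counting the
#    instruction-fetch and pipeline overhead the custom block also avoids
```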

Many times the decisions are based on the degree of certainty of functionality. “For a lot of consumer SoC-level applications, we do know, in general terms, what we need to run,” says Wingard. “The whole idea of partitioning the design into sub-systems identifies the functions and in each of these we find we can get more optimal results from using some software and a fair amount of custom hardware.”

One option selected quite frequently is the adoption of special-purpose processors that have been tuned for these applications, much like DSPs were in the past. But today this optimization can be done by the designer using specialized tools that come with the extensible processors.

Example of energy and cycle count reduction running a sensor application with ARC Processor EXtensions (APEX)

An underused option
One possible solution that should exist in the spectrum of possibilities is the use of field programmable gate array (FPGA) types of structures. “An FPGA shines when you are building the hardware and you have very little idea about what the hardware will need to run,” says Wingard. “This provides you with the maximum flexibility, so long as you have enough FPGA gates to use.”

“The programmability of FPGAs makes them extremely attractive and for some designs they are a good choice,” says Gutierrez, “but they are relatively large. For applications that run on batteries an FPGA is usually not a practical choice.”

Atrenta’s Murphy also sees advantages in FPGA structures. “Anything that requires extensive configurability, such as bug fixes or changing standards, has to be in software or FPGA. A good example is software-defined radios that need to support multiple standards.”

A study performed by analysis company BDTI in 2009 showed that an FPGA can significantly outperform a high-end DSP on computation-intensive, highly parallelizable tasks and can beat DSPs in terms of performance per dollar. The report showed a 40X performance improvement and a 30X reduction in cost per unit of performance. While the authors were unable to evaluate power tradeoffs, other reports have suggested that an FPGA can perform the same function for less total energy, although those results were based on power estimates from tools in which the authors had a low level of confidence. In addition, no comparable results have been published for an FPGA structure integrated on-chip.

While SoCs may not be favoring the inclusion of FPGA structures, Intel has become a believer. In June 2014 the company announced a Xeon chip with an integrated FPGA and claimed a 20X performance boost.

Today, FPGAs are not common in an SoC, but the SoC has come to the FPGA. “The availability of cache-coherent multiprocessor clusters in FPGAs has enabled a whole new generation of designs without the cost and fabrication time of custom chips,” says Tom Anderson, vice president of marketing for Breker. “However, there are at least three cases in which the SoC architect may prefer a custom implementation despite the progress in FPGAs. Chips that need the maximum capacity of an advanced process node will most likely exceed the capabilities of a programmable device. SoCs that require extremely fine-grained control of power and voltage domains may also outgrow FPGAs. Finally, proprietary high-speed I/O requirements may push a design from an FPGA into a process that supports customized I/O cell design.”

Making the Decision
The question remains: how does an architect make an informed decision about which option to use? “Virtual prototypes are being used for doing hardware/software tradeoffs,” says Synopsys’ Gutierrez. “Even putting aside how something may be implemented, the virtual prototype allows you to run real application software and use it to determine whether you will meet performance goals and to trade off hardware against software.”
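A real virtual prototype runs the actual application binary on fast processor and interconnect models. As a crude stand-in for the kind of question it answers, the toy budget check below uses invented workload and timing numbers to compare a software-only mapping against offloading one kernel to a hypothetical accelerator:

```python
# Toy hardware/software tradeoff check. All workload sizes, clock rates, and the
# accelerator's throughput are made-up illustrative values.

FRAME_BUDGET_MS = 16.7  # e.g. one 60 fps video frame

workload = {"decode": 9.0e6, "filter": 6.0e6, "blend": 2.0e6}  # operations per frame

def frame_time_ms(ops_per_frame, ops_per_cycle, clock_hz):
    """Crude cycle-count model: total ops divided by throughput, converted to ms."""
    cycles = sum(ops_per_frame.values()) / ops_per_cycle
    return 1e3 * cycles / clock_hz

sw_only = frame_time_ms(workload, ops_per_cycle=1, clock_hz=1.0e9)

# Offload "filter" to a hypothetical accelerator retiring 8 ops/cycle at 400 MHz,
# keeping the rest on the host CPU and assuming the two run in parallel.
hw_part = 1e3 * (workload["filter"] / 8) / 400e6
sw_part = frame_time_ms({k: v for k, v in workload.items() if k != "filter"},
                        ops_per_cycle=1, clock_hz=1.0e9)
offloaded = max(sw_part, hw_part)

for name, t in [("software only", sw_only), ("with accelerator", offloaded)]:
    status = "meets" if t <= FRAME_BUDGET_MS else "misses"
    print(f"{name}: {t:.1f} ms per frame ({status} budget)")
```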

But not everyone is convinced we’re far enough along in this regard. “I still don’t think we have the models to enable us to make many of the tradeoffs,” says Wingard. “Should we use a special-purpose engine, a configurable instruction-set processor, or maybe some FPGA-like structure? It is a very difficult tradeoff space.”

When a problem is well enough understood, informed choices can be made about the optimal structure, but when it is not, we tend to lean toward more general-purpose solutions. The challenge is that there are very few times when all of the necessary information is available to make the optimum choice. Instead, the choice remains based on what worked in the past.


