Hardware Accelerators Earn Their Keep


With the proliferation of multicore chips, hardware accelerators maintain their usefulness.

By Ann Steffora Mutschler
Hardware accelerators have been used for years, but with the proliferation of multicore chips and SoCs their use is evolving.

Multicore processors have reduced the reliance on hardware accelerators, but that doesn’t mean the number of hardware accelerators is shrinking. The insatiable demand for performance while also reducing power consumption means that accelerators are still needed in multicore environments, said Abhishek Ranjan, senior director of engineering at Calypto.

This need is primarily driven by the smartphone and tablet markets, because most modern smartphones and tablets are multicore and run very sophisticated applications such as image and speech recognition and multimedia games. Those devices have ever-increasing performance requirements as well as low power requirements. Hardware accelerators provide additional speed-ups by using special-purpose hardware to perform critical computations, which is also more power-efficient than a software implementation on a multicore processor, Ranjan said.

“For SoCs or multicore SoCs, we definitely see more functions having dedicated processing all on the same chip,” observed Pat Sheridan, senior staff product marketing manager in the system level team for Synopsys. “These could be represented by a CPU core from one of the standard vendors or a GPU that has within it multiple cores, or it could be accelerators that the designer is adding to this architecture—and those could be programmable, like the processor designer functions, or they could be just dedicated hardware. When the architect has a specification for a product, they have to look at if they are going to implement the design as one SoC where they are integrating these things together, or are they going to have some things on separate chips like maybe there are some functions that they put on an FPGA or something like that.”

When talking about system-level tools, it’s about trying to enable “architecture prototyping,” which is an early phase of simulation, Sheridan noted. “Here, designers want to do some simulation early and have less risk by trying out different combinations and simulations and they don’t just have to rely on a paper spec or a spreadsheet that has estimates in it but not actual simulation results.”

Commercial tools can do that by exploring hardware/software partitioning to allow designers to examine the number of processors along with how software tasks would be assigned to different processors. This is done in a non-functional way with performance modeling that doesn’t require knowing whether there are ARM processors or GPU cores or dedicated processors. It’s a simulation method using workload models, Sheridan explained.
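The kind of non-functional performance modeling Sheridan describes can be sketched in a few lines: tasks carry only abstract cycle costs, and candidate partitionings are compared by estimated latency rather than by executing real code. The names and numbers below are hypothetical, purely for illustration, not taken from any commercial tool.

```python
# Hypothetical workload-model sketch: tasks are characterized only by an
# abstract compute cost (cycles), not by functional behavior.
from dataclasses import dataclass

@dataclass
class Task:
    name: str
    cycles: int          # estimated compute cost, not real code

@dataclass
class Core:
    name: str
    cycles_per_us: int   # abstract throughput of this processing element

def estimate_runtime_us(mapping, cores):
    """Return the makespan: the busiest core bounds overall latency."""
    per_core = {
        cname: sum(t.cycles for t in tasks) / cores[cname].cycles_per_us
        for cname, tasks in mapping.items()
    }
    return max(per_core.values())

cores = {"cpu0": Core("cpu0", 1000), "accel": Core("accel", 4000)}
tasks = [Task("decode", 8000), Task("filter", 16000), Task("ui", 2000)]

# Compare two candidate partitionings before any RTL or software exists.
all_on_cpu = {"cpu0": tasks, "accel": []}
offloaded = {"cpu0": [tasks[0], tasks[2]], "accel": [tasks[1]]}
```

With these made-up numbers, offloading the filter task to the faster accelerator drops the estimated makespan from 26 to 10 microseconds, which is exactly the kind of early comparison an architect would run across many mappings instead of relying on a spreadsheet.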

Still demand for offloading
Considering the traditional use of a hardware accelerator, which is to offload a compute-intensive task from the main CPU for power and performance reasons, it's the same in multicore designs today, said Markus Willems, product marketing for system-level solutions at Synopsys. "If you think in terms of multicore, multiple ARM cores or GPUs in there, there is still a significant demand to offload compute intensive tasks from those main cores into accelerators."

However, what’s new and happening today is that these accelerators might not be the traditional hardware accelerators, which would execute exactly one function. Instead, the accelerators can be processors in and of themselves, he noted. “They could be programmable accelerators, not a processor in a straight sense with a whole RISC instruction set maybe, but really a processor that could perform a wide range, for example, of filtering operations. They could implement different algorithms that would offload the main CPUs. In a sense, this accelerator becomes a processor within a multicore environment. By any means, this offloading of compute-intensive tasks is a significant trend whenever you think about embedded devices because that’s the only way to address the power efficiency requirements.”

This is significant because by tuning the instruction set of a processor, "you can be way more efficient in terms of how you are accessing data and the number of cycles you will need to execute a function," Willems said. "It’s very much like what we did 20 years ago when building dedicated accelerators. You’re trading off area typically, because you parallelize things, in return for higher performance and power efficiency. Twenty years ago when I was involved in GSM designs, we didn’t know exactly how it would look in the end so we built things in a kind of flexible way by putting in lots of parallel paths and programming it through registers. That was the way we built accelerators, and you will find them in all the GSM phones in the world. Rather than programming through register setting, that’s moving toward, ‘Why not have a real programmable machine in there that would help to keep that flexibility by combining it with the efficiency of dedicated hardware accelerators?’"
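The register-programmed flexibility Willems recalls can be modeled conceptually: the datapath (here, a fixed multiply-accumulate array for filtering) never changes, while writable coefficient registers select its behavior. This is a hypothetical Python model of the idea, not actual accelerator firmware; the class and register names are invented for illustration.

```python
# Hypothetical model of a register-configured accelerator: the datapath
# (a fixed multiply-accumulate array) stays the same, while registers
# select which filter is applied -- flexibility through configuration
# rather than a full instruction set.
class FilterAccel:
    def __init__(self, num_taps):
        # Coefficient registers, as if written over a register bus.
        self.coeff_regs = [0] * num_taps

    def write_reg(self, index, value):
        """Host CPU programs one coefficient register."""
        self.coeff_regs[index] = value

    def run(self, samples):
        """FIR filtering: one dot product per input window."""
        n = len(self.coeff_regs)
        return [
            sum(c * s for c, s in zip(self.coeff_regs, samples[i:i + n]))
            for i in range(len(samples) - n + 1)
        ]

accel = FilterAccel(num_taps=3)
for i, c in enumerate([1, 2, 1]):   # configure a simple smoothing filter
    accel.write_reg(i, c)
out = accel.run([0, 1, 2, 3, 4])
```

Reprogramming the same registers with different coefficients retargets the hardware to a different algorithm without touching the datapath, which is the trade-off the quote describes: dedicated-hardware efficiency with a sliver of programmability.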

Frank Schirrmeister, group director for product marketing of the System Development Suite at Cadence, agreed that hardware accelerators are not going away. “It’s really a question of what can you do better in an application-specific way from a low-power perspective. That’s often the driver. Even when you look at things like some of the multimedia algorithms for which high-level synthesis works very well, there’s always this component of the dedicated implementation that gives you better power consumption. That’s better than running it in software.”

Determining when to implement a hardware accelerator or leave it out comes down to the power budget. From a very high level, there are essentially three components in a system: the implementation of the hardware blocks, the implementation of the software, and the integration piece, where the goal is to bring integration in as early as possible. That's why virtual prototyping is so important, he said.

“Then, all of this needs to be assembled,” Schirrmeister continued, “but with respect to new blocks, you will find out very fast, ‘I have this new function which I need to add to my existing chip for the next platform. What is the best way to implement it?’ Then you look at your different options and parameters and you decide based on that what’s the best implementation. Do I just add it on to some of my software, which I’m writing for the processors in the system anyway, or is it better to create a dedicated piece of hardware, which in exchange may have all kind of other interesting side effects like the interconnect getting new requirements to be able to move more data back and forth. You find this out fairly early when you do the partitioning of your system.”

At the end of the day, hardware accelerators are required for low power and often for performance as well, and it all becomes very, very specific to the application domain, he concluded.
