Age Of Acceleration

Focus shifts from the fastest processors to faster processes.


A shift from the fastest processors to accelerating specific functions is underway, supplanting an era of dark silicon in which one or more processor cores remain in a ready state whenever a single core’s performance bogs down.

In effect, the dark silicon/multi-core approach is being scrapped for many functions in favor of an accelerator-based microarchitecture that is far more granular. The advantage of accelerators, which include DSPs, FPGAs, eFPGAs, GPUs, and even microcontrollers, is that they can be appropriately sized for a discrete task, with just the right amount of memory and throughput to speed up a particular function. And in many cases, chipmakers are adding programmability into the mix to help future-proof designs, because many of the end markets for which these chips are being designed are still in transition.

This trend isn’t new. Back in the early days of the PC era, Intel decided that it could achieve better floating-point performance from its 8086 processor by adding a co-processor. The company introduced the 8087 in 1980, and repeated that strategy for the next three generations of chips up through the 486/487.


Fig. 1: The 80387 co-processor. Source: Wikipedia.

After that, Intel combined floating point and general-purpose processing into a single Pentium chip. But starting in 2006, which is when Moore’s Law began showing the first signs of stress (whether this appeared before the 1 micron litho wall depends on who you talk to), Intel began adding cores rather than processors to offset the thermal problems associated with ratcheting up the clock frequency.

That was 2006. The iPhone was introduced the following year, setting the stage for more power-efficient processing. A decade later, the smartphone market is flattening, and all the next big markets for semiconductors (cloud, IIoT, IoT, virtual/augmented reality, automotive, medical, machine learning and AI) are still in the formative stages. Nobody is quite sure how those devices will look, behave and interact. But what each of those markets ultimately demands in terms of chips will likely be very different from the others. And perhaps more important, there is uncertainty about how they will change and whether they ultimately will become more personalized.

The result is that, at least for now, a one-size-fits-all strategy no longer applies. And because energy efficiency will be essential in all of these markets, architectures need to be created specifically for each of them. The fastest and least-expensive way to do that is by creating discrete processing units with programmability built into those devices, and new memory architectures that can handle big increases in the volume of data for all of these devices.

Processor makers have seen this coming for some time. Intel’s Turbo Boost has been around since the beginning of this decade, allowing the clock speed to ramp up as needed for short periods of intensive computing. The acquisition of Altera provided yet another option, allowing Intel to connect small FPGAs in a package with its x86 processors. Intel has been hinting at this for the past couple of years, utilizing its Embedded Multi-Die Interconnect Bridge to connect them. Samsung’s RDL-level interconnect is a similar approach.

ARM’s DynamIQ architecture is another approach to this problem, expanding on its big.LITTLE heterogeneous multi-core approach to create mini compute clusters with dedicated processors for machine learning and AI. And Synopsys’ ARC processor cores have been configurable for years.


Fig. 2: ARM DynamIQ. Source: ARM

The common thread here is that no single chip does everything well, but if everything is unbundled these chips can be designed to do one or two things extremely well. And rather than creating everything from scratch, it’s much easier to utilize what’s already available and develop less-expensive, highly targeted and customizable accelerators to reduce power and increase performance. That can come in the form of an FPGA, embedded FPGA, DSP, or some other programmable logic built into the chip or package.

As these markets expand and mature, the most efficient and least-expensive approaches ultimately will win out. But for the foreseeable future, meaning at least the next five years and maybe even the next decade or more, accelerators will be a big growth market for processors, as well as for I/O, memory and the IP necessary to support those processors.