Flexible, Energy-Efficient Neural Network Processing At 16nm

How eFPGAs measure up on inferencing.


At Hot Chips 30, held in August in Silicon Valley, Harvard University researchers (Paul Whatmough, SK Lee, S Xi, U Gupta, L Pentecost, M Donato, HC Hsueh, Professor Brooks and Professor Gu-Yeon Wei) presented “SMIV: A 16nm SoC with Efficient and Flexible DNN Acceleration for Intelligent IoT Devices.” (The complete presentation is available now on the Hot Chips website for attendees and will be publicly available to everyone in December.)

The researchers were interested in evaluating the architectural tradeoffs of implementing efficient and flexible hardware acceleration for DNN (deep neural network) inference on edge devices.

There is an inherent tension between flexibility and efficiency, and they wanted to quantify that tradeoff across different architectures.

One of the architectures they wanted to evaluate was eFPGA. They asked us for a 2×2 EFLX4K array: two DSP cores and two Logic cores, for a total of 14K LUT4s, 80 MACs (22×22 multipliers with 48-bit accumulators) and 44Kbits of RAM in the array. The eFPGA array connected directly to a 128-bit AXI4 bus.
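
For a rough sense of what that configuration provides, here is a minimal sketch that tallies the delivered resources and estimates peak MAC throughput. The clock frequency used below is an assumed placeholder for illustration, not a figure from the presentation or from Flex Logix.

```python
# Resource tally for the 2x2 EFLX4K array described above
# (2 DSP cores + 2 Logic cores). The clock frequency is an assumed
# placeholder, not a number reported by Harvard or Flex Logix.

EFLX_ARRAY = {
    "lut4s": 14_000,          # total 4-input LUTs across the array
    "macs": 80,               # 22x22 multipliers with 48-bit accumulators
    "ram_bits": 44_000,       # ~44 Kbits of RAM inside the array
    "axi_width_bits": 128,    # AXI4 interface to the rest of the SoC
}

ASSUMED_CLOCK_HZ = 400e6      # illustrative clock rate only

def peak_gmacs(macs: int, clock_hz: float) -> float:
    """Peak multiply-accumulates per second, in GigaMACs/s, assuming
    every MAC fires every cycle (an upper bound, not a benchmark)."""
    return macs * clock_hz / 1e9

print(f"Peak throughput: {peak_gmacs(EFLX_ARRAY['macs'], ASSUMED_CLOCK_HZ):.0f} GMAC/s")
```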

The EFLX array was delivered and integrated into their SoC in less than six months by a team of seven, who also implemented the rest of the SoC, which totaled 25mm² in TSMC16FFC and was packaged in a 672-pin flip-chip BGA. Flex Logix supported them with a weekly phone call and occasional emails; integrating EFLX eFPGA is straightforward, similar to integrating an SRAM array.

The whole chip, including the eFPGA, worked the first time.

Besides the eFPGA, three other compute blocks were implemented: an ARM Cortex-A53 CPU cluster, Cache-Coherent Datapath Accelerators (ACC), and a Near-Threshold Always-On Cluster (AON).

Of the four blocks, only the AON is not reconfigurable; the other three are.

For neural networks, flexibility is practically a must. Algorithms are changing rapidly, and any silicon designed today must be expected to run neural networks that had not yet been invented when the silicon was architected.

To test the efficiency of the sub-blocks, representative DNN kernels were executed on all four blocks.
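
For context, the sketch below shows a generic fully connected layer expressed as multiply-accumulate loops. It is an illustrative stand-in for the kind of workload involved, not one of the specific kernels Harvard benchmarked on SMIV.

```python
# A generic fully connected DNN layer expressed as multiply-accumulates.
# Illustrative only; not one of the specific kernels benchmarked on SMIV.

from typing import List

def fully_connected(weights: List[List[int]],
                    activations: List[int],
                    biases: List[int]) -> List[int]:
    """y[i] = ReLU(bias[i] + sum_j weights[i][j] * activations[j]).
    The inner loop performs one multiply-accumulate per iteration,
    which is exactly the operation the hardware MACs accelerate."""
    outputs = []
    for i, row in enumerate(weights):
        acc = biases[i]
        for w, a in zip(row, activations):
            acc += w * a                # one MAC operation
        outputs.append(max(acc, 0))     # ReLU activation
    return outputs

# Tiny usage example: 2 outputs, 3 inputs
print(fully_connected([[1, 2, 3], [-1, 0, 2]], [4, 5, 6], [0, 1]))  # [32, 9]
```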

The most efficient block in both energy and area was the AON, but it is a completely fixed architecture.

Of the flexible DNN engines, the eFPGA array achieved much better energy efficiency than the other two: 4.5X the energy efficiency of the CPUs.

In area efficiency, the eFPGA was similar to the CPU but lower than the ACC.

This is an impressive result for the eFPGA, considering it is not optimized for DNN inference.

So how would an eFPGA be optimized for neural network inference?

First, the existing 22×22 MACs would be replaced with 8×8 MACs, which could optionally be combined into 16×16 MACs (the sketch following these three points illustrates the combining idea). Approximately three 8×8 MACs would fit in the same area as one 22×22 MAC. This would significantly increase both energy efficiency and area efficiency.

Second, an 8×8 MAC would run faster than a 22×22 MAC since the critical path would be shorter.

Third, the density of MACs could be increased by having more area dedicated to MACs and less to LUTs.
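
The combining idea from the first point is just the textbook decomposition of a wide multiply into narrow partial products. The sketch below shows a 16×16 product assembled from four 8×8 products; the actual combining logic inside an AI-optimized EFLX MAC is not described here, so treat this purely as an illustration of the principle.

```python
# Building a 16x16 multiply from 8x8 partial products (textbook form).
# This illustrates why 8x8 MACs can optionally be combined for 16x16;
# it is not a description of the actual EFLX combining circuitry.

def mul16_from_mul8(a: int, b: int) -> int:
    """Unsigned 16x16 multiply assembled from four 8x8 partial products."""
    assert 0 <= a < 1 << 16 and 0 <= b < 1 << 16
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF
    return ((a_hi * b_hi) << 16) + ((a_hi * b_lo + a_lo * b_hi) << 8) + (a_lo * b_lo)

# Sanity check against Python's native multiply
assert mul16_from_mul8(51234, 40321) == 51234 * 40321
```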

The result would be a >10X increase in GigaMACs/second, along with gains in energy efficiency and area efficiency that would make this the best of the flexible options.
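
As a rough back-of-envelope check, the >10X figure can be viewed as the product of the three factors above. The individual multipliers in the sketch are assumptions chosen to illustrate the reasoning, not published Flex Logix numbers.

```python
# Back-of-envelope decomposition of the projected >10X GigaMACs/s gain.
# Each factor below is an illustrative assumption, not a published figure.

macs_per_area  = 3.0   # ~three 8x8 MACs in the area of one 22x22 MAC
clock_speedup  = 1.3   # shorter critical path in an 8x8 MAC (assumed)
mac_area_share = 3.0   # more array area devoted to MACs vs. LUTs (assumed)

total_gain = macs_per_area * clock_speedup * mac_area_share
print(f"Estimated GigaMACs/s gain: ~{total_gain:.1f}X")   # ~11.7X, i.e. >10X
```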

Flex Logix is now working to implement such an AI-optimized eFPGA architecture. We'd like to thank Harvard University for their hard work and impressive results; their team was a pleasure to work with. We are excited that they are already working on a second chip that will use EFLX eFPGA to further their research.


