Flexible, Energy-Efficient Neural Network Processing At 16nm

How eFPGAs measure up on inferencing.


At Hot Chips 30, held in August in Silicon Valley, Harvard University researchers (Paul Whatmough, SK Lee, S Xi, U Gupta, L Pentecost, M Donato, HC Hsueh, Professor Brooks and Professor Gu-Yeon Wei) presented “SMIV: A 16nm SoC with Efficient and Flexible DNN Acceleration for Intelligent IoT Devices.” (The complete presentation is available now on the Hot Chips website for attendees and will be publicly available to everyone in December.)

The researchers were interested in evaluating the architectural tradeoffs of implementing efficient and flexible hardware acceleration for DNN (deep neural network) inference on edge devices.

There is an inherent tension between flexibility and efficiency, and they wanted to quantify that tradeoff across different architectures.

One of the architectures they wanted to evaluate was eFPGA. They asked us for a 2×2 EFLX4K array: two DSP cores and two Logic cores, for a total of 14K LUT4s, 80 MACs (22×22 multipliers with 48-bit accumulators) and 44Kbits of RAM in the array. The eFPGA array connected directly to a 128-bit AXI4 bus.
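
For a rough sense of what that configuration provides, here is a minimal sketch that tallies the delivered resources and estimates peak MAC throughput. The clock frequency used below is an assumed placeholder for illustration, not a figure from the presentation or from Flex Logix.

```python
# Resource tally for the 2x2 EFLX4K array described above
# (2 DSP cores + 2 Logic cores). The clock frequency is an assumed
# placeholder, not a number reported by Harvard or Flex Logix.

EFLX_ARRAY = {
    "lut4s": 14_000,          # total 4-input LUTs across the array
    "macs": 80,               # 22x22 multipliers with 48-bit accumulators
    "ram_bits": 44_000,       # ~44 Kbits of RAM inside the array
    "axi_width_bits": 128,    # AXI4 interface to the rest of the SoC
}

ASSUMED_CLOCK_HZ = 400e6      # illustrative clock rate only

def peak_gmacs(macs: int, clock_hz: float) -> float:
    """Peak multiply-accumulates per second, in GigaMACs/s, assuming
    every MAC fires every cycle (an upper bound, not a benchmark)."""
    return macs * clock_hz / 1e9

print(f"Peak throughput: {peak_gmacs(EFLX_ARRAY['macs'], ASSUMED_CLOCK_HZ):.0f} GMAC/s")
```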

The EFLX array was delivered and integrated into their SoC in less than six months by a team of seven, who also implemented the rest of the SoC, which totaled 25mm² in TSMC16FFC and was packaged in a 672-pin flip-chip BGA. Flex Logix supported them with a weekly phone call and occasional emails; integrating EFLX eFPGA is straightforward, similar to integrating an SRAM array.

The whole chip, including the eFPGA, worked the first time.

Besides the eFPGA, three other compute blocks were implemented: an ARM Cortex-A53 CPU cluster, Cache-Coherent Datapath Accelerators (ACC), and a Near-Threshold Always-On Cluster (AON).

Of the four blocks, only the AON is not reconfigurable; the other three are.

For neural networks, flexibility is practically a must. Algorithms are changing rapidly, and any silicon designed today must be expected to run neural networks that had not yet been invented when the silicon was architected.

To test the efficiency of the sub-blocks, representative DNN kernels were executed on all four blocks.
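
For context, the sketch below shows a generic fully connected layer expressed as multiply-accumulate loops. It is an illustrative stand-in for the kind of workload involved, not one of the specific kernels Harvard benchmarked on SMIV.

```python
# A generic fully connected DNN layer expressed as multiply-accumulates.
# Illustrative only; not one of the specific kernels benchmarked on SMIV.

from typing import List

def fully_connected(weights: List[List[int]],
                    activations: List[int],
                    biases: List[int]) -> List[int]:
    """y[i] = ReLU(bias[i] + sum_j weights[i][j] * activations[j]).
    The inner loop performs one multiply-accumulate per iteration,
    which is exactly the operation the hardware MACs accelerate."""
    outputs = []
    for i, row in enumerate(weights):
        acc = biases[i]
        for w, a in zip(row, activations):
            acc += w * a                # one MAC operation
        outputs.append(max(acc, 0))     # ReLU activation
    return outputs

# Tiny usage example: 2 outputs, 3 inputs
print(fully_connected([[1, 2, 3], [-1, 0, 2]], [4, 5, 6], [0, 1]))  # [32, 9]
```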

The most efficient block in both energy and area was the AON, but it is a completely fixed architecture.

Of the flexible DNN engines, the eFPGA array achieved much better energy efficiency than the other two: 4.5X the energy efficiency of the CPUs.

In area efficiency, the eFPGA was similar to the CPU but lower than the ACC.

This is an impressive result for the eFPGA, considering it is not optimized for DNN inference.

So how would an eFPGA be optimized for neural network inference?

First, the existing 22×22 MACs would be replaced with 8×8 MACs, which could optionally be combined into 16×16 MACs (the sketch following these three points illustrates the combining idea). Approximately three 8×8 MACs would fit in the same area as one 22×22 MAC. This would significantly increase both energy efficiency and area efficiency.

Second, an 8×8 MAC would run faster than a 22×22 MAC since the critical path would be shorter.

Third, the density of MACs could be increased by having more area dedicated to MACs and less to LUTs.
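
The combining idea from the first point is just the textbook decomposition of a wide multiply into narrow partial products. The sketch below shows a 16×16 product assembled from four 8×8 products; the actual combining logic inside an AI-optimized EFLX MAC is not described here, so treat this purely as an illustration of the principle.

```python
# Building a 16x16 multiply from 8x8 partial products (textbook form).
# This illustrates why 8x8 MACs can optionally be combined for 16x16;
# it is not a description of the actual EFLX combining circuitry.

def mul16_from_mul8(a: int, b: int) -> int:
    """Unsigned 16x16 multiply assembled from four 8x8 partial products."""
    assert 0 <= a < 1 << 16 and 0 <= b < 1 << 16
    a_hi, a_lo = a >> 8, a & 0xFF
    b_hi, b_lo = b >> 8, b & 0xFF
    return ((a_hi * b_hi) << 16) + ((a_hi * b_lo + a_lo * b_hi) << 8) + (a_lo * b_lo)

# Sanity check against Python's native multiply
assert mul16_from_mul8(51234, 40321) == 51234 * 40321
```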

The result would be a >10X increase in GigaMACs/second, along with gains in energy efficiency and area efficiency that would make this the best of the flexible options.
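
As a rough back-of-envelope check, the >10X figure can be viewed as the product of the three factors above. The individual multipliers in the sketch are assumptions chosen to illustrate the reasoning, not published Flex Logix numbers.

```python
# Back-of-envelope decomposition of the projected >10X GigaMACs/s gain.
# Each factor below is an illustrative assumption, not a published figure.

macs_per_area  = 3.0   # ~three 8x8 MACs in the area of one 22x22 MAC
clock_speedup  = 1.3   # shorter critical path in an 8x8 MAC (assumed)
mac_area_share = 3.0   # more array area devoted to MACs vs. LUTs (assumed)

total_gain = macs_per_area * clock_speedup * mac_area_share
print(f"Estimated GigaMACs/s gain: ~{total_gain:.1f}X")   # ~11.7X, i.e. >10X
```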

Flex Logix is now working to implement such an AI-optimized eFPGA architecture. We'd like to thank Harvard University for their hard work and impressive results; their team was a pleasure to work with. We are excited that they are already working on a second chip that will use EFLX eFPGA to further their research.


