Software Is At Least As Important As Hardware For Inference Accelerators

All the TOPS in the world are of little use if software can’t utilize them efficiently.


In articles and conference presentations on inference accelerators, the focus is primarily on TOPS (frequency times the number of MACs), a little bit on memory (DRAM interfaces and on-chip SRAM), very little on interconnect (also very important, but that's another story), and almost not at all on the software!

Without software, the inference accelerator is a rock that does nothing. Software is what breathes life into an inference accelerator (but can’t rescue a bad hardware architecture).

Ready, fire, aim
Several customers have told us “our vendors cannot give us performance projections before silicon.” And some customers who have designed their own inference accelerators have complained “we have lots of TOPS but the software guys can’t seem to utilize them efficiently.”

And other customers have told us that getting reasonable performance from some well-known inference accelerators requires very low-level programming to manage memory storage and transfers, because the vendors' software cannot. It appears many inference accelerator teams have left their software to a late stage rather than developing the software and hardware together to make sure they work well with each other.

All inference accelerators have in common the following elements:

  • MACs
  • On-chip SRAM
  • Off-chip DRAM
  • Control logic
  • On-chip interconnect between all of the units

The number and organization of these elements vary widely between inference accelerators.

When architecting an inference accelerator, how do you know if you are building a chip that will deliver high throughput/watt and high throughput/$? The answer is the inference software.

In architecting our InferX X1, we had a performance estimation model very early on for the key benchmarks customers most often request, such as YOLOv3 on megapixel images and ResNet-50 at 224×224 and megapixel images. Using these performance estimation models along with cost models from our silicon/package vendors allowed us to determine the optimum die size, number of MACs, number of SRAM bytes and number of DRAM interfaces to maximize throughput/$ and throughput/watt for megapixel images.
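The kind of trade-off described above can be sketched as a small design-space sweep. This is purely illustrative, not Flex Logix's actual model: the `estimate_fps` and `estimate_cost` functions and every number in them are made-up placeholders standing in for a real roofline-style throughput model and a vendor cost model.

```python
# Illustrative design-space sweep (NOT the actual InferX model):
# pick the configuration that maximizes throughput per dollar.
# All constants below are hypothetical placeholders.

def estimate_fps(macs, sram_mb, dram_ifaces):
    """Toy throughput model: compute-bound rate capped by DRAM bandwidth."""
    compute_fps = macs * 0.9e9 / 70e9      # assumed ~70 GMACs per frame
    memory_fps = dram_ifaces * 8e9 / 6e8   # assumed 8 GB/s per interface, 600 MB traffic/frame
    return min(compute_fps, memory_fps)

def estimate_cost(macs, sram_mb, dram_ifaces):
    """Toy cost model: base die cost plus area for MACs/SRAM plus package pins."""
    return 2.0 + macs * 1e-3 + sram_mb * 0.5 + dram_ifaces * 1.5

# Sweep candidate configurations: (MACs, SRAM MB, DRAM interfaces).
best = max(
    ((m, s, d) for m in (2048, 4096, 8192)
               for s in (4, 8, 16)
               for d in (1, 2, 4)),
    key=lambda cfg: estimate_fps(*cfg) / estimate_cost(*cfg),
)
print(best)
```

The point of the sketch is the method, not the numbers: with early throughput and cost models in hand, "how many MACs, how much SRAM, how many DRAM interfaces" becomes an optimization you can run before committing to a die size.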

How can we be confident in our performance estimates before silicon? Because our architecture is totally deterministic: for a given model and image size, we know the execution time to the cycle. Most other inference accelerators appear to have non-deterministic behavior, such as bus contention, SRAM contention and DRAM contention. With contention, performance modelling is very difficult without simulating a very large number of images through the full model.

Today our customers can use our performance modelling tool to determine how fast their model/image size will run on X1: it takes a few minutes maximum. Because it’s fast, customers can quickly try modifications to their model to see if it improves throughput by better utilizing the underlying hardware.

Some customers have shared their models with us, especially where they have non-standard applications, to see if we could improve performance. In several cases we have been able to optimize performance 2x or 4x on key layers by implementing new algorithms in our software compiler.

Our full-chip RTL is running on Mentor emulators for multiple inference layers at full megapixel image sizes. Doing this requires our software to actually generate the control code for the X1, so our software is ready for silicon (which we will get soon).

Our nnMAX inference compiler takes neural network models in ONNX and TensorFlow Lite and compiles them directly to control code for the InferX X1. The customer does not need to do any low-level programming, unlike what we hear of most other inference accelerators. X1 supports BF16, so customers with models trained in FP32 can get up and running very quickly without having to wait for quantization (and when they do quantize, X1 runs in INT8 mode too).
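The reason BF16 works as a drop-in for FP32-trained models is a property of the format itself: BF16 keeps FP32's sign bit and full 8-bit exponent and simply truncates the mantissa from 23 bits to 7, so the dynamic range is identical and no calibration is needed. A minimal sketch of that truncation (round-toward-zero for simplicity; real hardware typically rounds-to-nearest):

```python
# BF16 is the top 16 bits of an IEEE 754 FP32 value: same sign and
# 8-bit exponent, mantissa truncated from 23 bits to 7.
import struct

def fp32_to_bf16_bits(x: float) -> int:
    """Truncate an FP32 value to its top 16 bits (round-toward-zero BF16)."""
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits32 >> 16

def bf16_to_fp32(bits16: int) -> float:
    """Re-expand BF16 bits to FP32 by zero-filling the low mantissa bits."""
    return struct.unpack(">f", struct.pack(">I", bits16 << 16))[0]

# Values whose mantissa fits in 7 bits survive exactly...
w = 0.15625
assert bf16_to_fp32(fp32_to_bf16_bits(w)) == w

# ...others keep the same magnitude with ~2-3 decimal digits of precision.
print(bf16_to_fp32(fp32_to_bf16_bits(3.14159)))
```

This is why an FP32-trained model runs on BF16 hardware immediately, while INT8 requires a separate quantization step to map each tensor's range onto 256 integer levels.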

When our silicon comes back in Q2, we expect to run numerous open-source models (YOLOv3, etc.) and numerous customer proprietary models within a week to confirm our performance estimates, then to sample customers with boards so they can confirm as well.

Developing the software performance estimation model, and then the full software compiler, in parallel with and ahead of silicon is critical to ensure the hardware+software combination delivers optimum throughput/$ and throughput/watt. A deterministic architecture is very helpful in making this possible.
