New ML networks far outperform old standbys, so why do so many fixate on the old?
The ResNet family of machine learning networks was introduced to the AI world in 2015. A slew of variants quickly followed, pushing ResNet accuracy close to the 80% threshold (78.57% Top-1 accuracy for ResNet-152 on ImageNet). That state-of-the-art performance, coupled with a rather simple operator structure readily amenable to hardware acceleration in SoC designs, turned ResNet into the go-to litmus test of ML inference performance. Scores of design teams built ML accelerators in the 2018-2022 time period with ResNet in mind.
One common trait shared by all of these accelerator – or “NPU” – designs is the use of integer arithmetic instead of floating-point math. Integer formats are preferred for on-device inference because an INT8 multiply-accumulate (the basic building block of ML inference) can be 8X to 10X more energy-efficient than the same calculation in full 32-bit floating point. The process of converting a model’s weights from floating-point to integer representation is known as quantization. Unfortunately, some degree of fidelity is always lost in that conversion.
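To make that fidelity loss concrete, here is a minimal NumPy sketch of symmetric, per-tensor INT8 weight quantization. It is illustrative only – not any vendor’s toolchain – and the layer shape is an arbitrary stand-in:

```python
import numpy as np

# Toy stand-in for one layer's FP32 weights.
rng = np.random.default_rng(0)
weights = rng.standard_normal((64, 64)).astype(np.float32)

# Symmetric quantization: map the float range onto the signed
# 8-bit range [-127, 127] with a single per-tensor scale factor.
scale = np.abs(weights).max() / 127.0
q_weights = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)

# Dequantize to see what the integer representation can recover.
deq_weights = q_weights.astype(np.float32) * scale

# The round-trip error is the lost fidelity; accumulated over every
# layer of a network, it surfaces as a drop in Top-1 accuracy.
max_err = np.abs(weights - deq_weights).max()
print(f"scale = {scale:.6f}, max round-trip error = {max_err:.6f}")
```

Production toolchains refine this naive recipe with per-channel scales, calibration data, and quantization-aware training, but the underlying trade-off is the same.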
NPU designers have spent the past half-decade fine-tuning their weight quantization strategies to minimize the accuracy drop of an integer version of ResNet relative to the original FP32 model (a drop often referred to as Top-1 loss). For most builders and buyers of NPU accelerators, a loss of 1% or less is the accepted mark of goodness. Many continue to fixate on that number today, in the face of dramatic evidence that now-ancient networks like ResNet should be relegated to the dustbin of history. What evidence, you ask? Look at the “leaderboard” for ImageNet accuracy on the Papers With Code website:
Image classification leaderboard. (Source: https://paperswithcode.com/sota/image-classification-on-imagenet)
As the leaderboard chart aptly demonstrates, today’s leading-edge classifier networks – such as the Vision Transformer (ViT) family – deliver Top-1 accuracies exceeding 90%, a full 10 percentage points ahead of the top-rated ResNet on the board. For on-device ML, inference performance, power consumption, and accuracy must all be balanced for a given application. In applications where accuracy really, really matters – such as automotive safety – you’d think design teams would be rushing to embrace these new network topologies to capture that extra 10+ points of accuracy.
Wouldn’t it make more sense to embrace 90% accuracy with a modern ML network – such as ViT, Swin Transformer, or DETR – rather than spend effort fine-tuning an archaic network in pursuit of its original 79% ceiling? Of course it would. So why haven’t more teams made the switch?
Perhaps those who are not Embracing The New are stuck with accelerators that support only a limited set of ML operators. If a team implemented a fixed-function accelerator in an SoC four years ago, that accelerator cannot add new ML operators, and many newer networks – Transformers chief among them – cannot run on those chips today. Running them requires a silicon respin, which takes 24 to 36 months.
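For intuition about what those missing operators look like, here is a minimal NumPy sketch of scaled dot-product attention, the core operator of Transformer networks like ViT (a generic textbook formulation, not any particular accelerator’s code):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: a data-dependent normalization
    # with no counterpart in a convolution-only datapath.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    d_k = q.shape[-1]
    scores = (q @ k.swapaxes(-2, -1)) / np.sqrt(d_k)
    return softmax(scores) @ v

# Toy shapes: 197 tokens (a 14x14 ViT patch grid plus a class token),
# 64-dimensional attention heads.
rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((197, 64)).astype(np.float32) for _ in range(3))
print(attention(q, k, v).shape)  # (197, 64)
```

A pipeline hard-wired for convolutions and ReLUs has no place to execute the softmax and large matrix multiplies above, let alone the LayerNorm and GELU operators that accompany them in a full Transformer.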
But one look at the leaderboard chart tells you that picking a fixed-function set of operators now, in 2024, and waiting until 2027 for working silicon only repeats the same cycle: you are trapped again as new innovations push state-of-the-art accuracy higher or deliver the same accuracy with less computational complexity. If only there were a way to run today’s known networks efficiently and tomorrow’s SOTA networks on device, with full programmability!
Luckily, there is now an answer. Since mid-2023, Quadric has been delivering its Chimera General-Purpose NPU (GPNPU) as licensable IP. With performance scaling from 1 TOPS to 100s of TOPS, the Chimera GPNPU delivers accelerator-like efficiency while maintaining C++ programmability to run any ML operator. Any ML graph. Any new network. Even ones that haven’t been invented yet. Embrace the new! Experience it for yourself at www.quadric.io.