Operator Anxiety

Avoid getting stranded on the machine learning roadside.


Are you one of the early pioneers who have purchased an electric car? In the United States in Q3 2022, 6% of new vehicle sales were pure electric models. Despite all the hype, and significant purchase subsidies for battery cars, pure plug-in electric vehicles still make up only 1% of the cumulative number of vehicles in service in the US. One reason electric car sales have not overtaken traditional car sales is the lack of widespread, readily available fast-charging infrastructure. Consumer studies highlight the fear of running low on battery power without a fast-charging station nearby as a major barrier to adoption. Automotive market analysts have termed this concern “Range Anxiety.” In a strange parallel, the semiconductor market has an anxiety problem of its own!

Designers of advanced semiconductors for nearly every end market are adopting and deploying machine learning (ML) processing power in new silicon designs. System-on-chip (SoC) designers are deploying mixtures of programmable cores (CPUs, GPUs, DSPs) and dedicated machine learning inference accelerators (NPUs) in a bid to offer high-efficiency solutions that can run today’s latest state-of-the-art ML inference models.

Yet the world of machine learning is rapidly changing. Innovative data scientists keep discovering and inventing new techniques to improve the state of the art (SOTA). The benchmark ML networks used today in 2023 to select intellectual property (IP) building blocks for new SoC designs did not exist three or four years ago. The silicon being designed today will be in production in 2025 and 2026, at which point SOTA models may look nothing like today’s leading-edge networks. ML models change in two ways: new model topologies that rearrange known operators into ever deeper and more complex networks, and new fundamental ML operators. The former, rearranging known building blocks, is not a daunting prospect for SoC designers. But the latter, new operators, raises major concerns for chip designers. What if the ML accelerator you pick today in 2023 can’t support a new operator invented in 2026? Analysts in our market have borrowed from the aforementioned automotive market and coined the term “Operator Anxiety” to describe the worry that choices made now might come back to haunt the chip designer in the future.

The typical architectural approach in today’s silicon solutions pairs an NPU “accelerator” for machine learning with a fully programmable, but much lower performance, CPU, GPU, or DSP. The accelerators are typically hardwired to run the most common ML operators (convolutions, activations, pooling) as efficiently as possible. Some accelerators have no ability to add new functions once silicon is in hand. Others have limited flexibility in the form of micro-coded command streams hand-written by the IP vendor. In both cases, the NPU vendor promises the SoC designer that, in the worst case, the companion CPU or DSP can run newly emergent operators in what the IP vendor calls Fallback mode. But a programmable CPU or DSP may be an order of magnitude slower than the NPU, and often two orders of magnitude slower. (After all, if the CPU were almost as fast as the NPU, why would you need an NPU in the first place?) This is the source of the anxiety! Overall system performance craters if even one layer of a SOTA network must move off the fast accelerator and run on the slow processor.
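A quick back-of-the-envelope calculation shows how badly a single fallback layer hurts. The sketch below is illustrative only: the layer count (50) and slowdown factor (100x) are assumptions chosen for the example, not measurements of any particular NPU or network.

    #include <cstdio>

    // Illustrative fallback-cost model. All numbers are hypothetical:
    // a 50-layer network, 1 time unit per layer on the NPU, and a
    // fallback CPU/DSP that runs a layer 100x slower.
    int main() {
        const double layers = 50.0;
        const double npu_time_per_layer = 1.0;
        const double fallback_slowdown = 100.0;

        // Every layer runs on the NPU.
        double all_on_npu = layers * npu_time_per_layer;

        // One unsupported layer falls back to the slow processor.
        double one_fallback = (layers - 1.0) * npu_time_per_layer
                            + npu_time_per_layer * fallback_slowdown;

        std::printf("All layers on NPU:     %.0f time units\n", all_on_npu);
        std::printf("One layer in fallback: %.0f time units (%.1fx slower)\n",
                    one_fallback, one_fallback / all_on_npu);
        return 0;
    }

Even under these charitable assumptions, a single fallback layer roughly triples total inference time (149 vs. 50 time units), and the penalty compounds with every additional unsupported operator.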

Running a new ML operator on a slow DSP or CPU is the equivalent of charging your EV with an extension cord plugged into a 110V wall socket. What you want for your EV is an 800V fast charger that tops off the battery in 20 minutes, not a low-voltage, low-amperage wall socket that needs 18 hours to recharge your car.

The answer to EV Range Anxiety is widespread, readily available fast chargers. The parallel for SoCs: Operator Anxiety could be cured by a readily available processor that can run any operator, yet does so with the performance efficiency of a dedicated NPU.

There is a cure for Operator Anxiety!

The Chimera GPNPU from Quadric, available in 1 TOPS, 4 TOPS, and 16 TOPS variants, is that sought-after anxiety-relieving solution. Chimera GPNPUs deliver the matrix-optimized performance you expect from an ML-optimized compute engine while remaining fully C++ programmable by the software developer. New ML operators can be quickly written and run just as fast as the “native” operators written by Quadric engineers. With a Chimera core there is no fallback and no operator anxiety, only fast execution, no matter what new forms of operators or graphs the future brings.
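To make “C++ programmable” concrete, here is a minimal sketch of what a newly invented operator might look like in portable C++, using Hard-Swish (the activation popularized by MobileNetV3 in 2019) as a stand-in for some future operator. The function name and flat-buffer signature are assumptions made for this example; they are not drawn from Quadric’s actual SDK.

    #include <algorithm>
    #include <cstddef>

    // Hypothetical custom operator: Hard-Swish activation applied
    // element-wise over a flat tensor buffer.
    // hard_swish(x) = x * clamp(x + 3, 0, 6) / 6
    void hard_swish(const float* input, float* output, std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            const float x = input[i];
            const float gate = std::min(std::max(x + 3.0f, 0.0f), 6.0f);
            output[i] = x * gate / 6.0f;
        }
    }

On a fixed-function accelerator, an operator like this would have triggered fallback the day it was invented; on a fully programmable core, the same few lines of C++ can be compiled and scheduled alongside the built-in operators.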

 


