Author's Latest Posts


In Memory, At Memory, Near Memory: What Would Goldilocks Choose?


The children’s fairy tale of ‘Goldilocks and the Three Bears’ describes the adventures of Goldi as she tries to choose among three options for bedding, chairs, and bowls of porridge. One meal is “too hot,” another “too cold,” and finally one is “just right.” If Goldi were faced with making architecture choices for AI processing in modern edge/device SoCs, she would also face...

Can You Rely Upon Your NPU Vendor To Be Your Customers’ Data Science Team?


The biggest mistake a chip design team can make in evaluating AI acceleration options for a new SoC is to rely entirely upon spreadsheets of performance numbers from the NPU vendor without going through the exercise of porting one or more new machine learning networks themselves using the vendor toolsets. Why is this a huge red flag? Most NPU vendors tell prospective customers that (1) the v...

ConvNext Runs 28X Faster Than Fallback


Two months ago in our blog we highlighted the fallacy of using a conventional NPU accelerator paired with a DSP or CPU for “fallback” operations (Fallback Fails Spectacularly, May 2024). In that post we calculated the expected performance of a system with a DSP needing to perform the new operations found in one of today’s leading new ML networks – ConvNext. The result wa...

KANs Explode!


In late April 2024, a novel AI research paper was published by researchers from MIT and Caltech proposing a fundamentally new approach to machine learning networks: the Kolmogorov-Arnold Network, or KAN. In the six weeks since its publication, the AI research field has been ablaze with excitement and speculation that KANs might be a breakthrough that dramatically alters the trajectory of AI mod...

Fallback Fails Spectacularly


Conventional AI/ML inference silicon designs employ a dedicated, hardwired matrix engine – typically called an “NPU” – paired with a legacy programmable processor: a CPU, a DSP, or a GPU. The common theory behind these two-core (or even three-core) architectures is that most of the matrix-heavy machine learning workload runs on the dedicated accelerator for maximum efficienc...

Hybrid Architecture Blends Best Of Both Worlds


Quadric chose the brand name Chimera to describe the company’s novel general purpose neural processing unit (GPNPU) architecture. According to the online Oxford dictionary, in biology a chimera is “an organism containing a mixture of genetically different tissues (or DNA).” Quadric made that naming choice to reflect the fact that its Chimera GPNPU has characteristics of both conventiona...

Embrace The New!


The ResNet family of machine learning algorithms was introduced to the AI world in 2015. A slew of variations was rapidly discovered that pushed the accuracy of ResNets close to the 80% threshold (78.57% Top 1 accuracy for ResNet-152 on ImageNet). This state-of-the-art performance at the time, coupled with the rather simple operator structure that was readily amenable to hardware ac...

Thanks For The Memories!


“I want to maximize the MAC count in my AI/ML accelerator block because the TOPS rating is what sells, but I need to cut back on memory to save cost,” said no successful chip designer, ever. Emphasis on “successful” in the above quote. It’s not a purely hypothetical quotation. We’ve heard it many times. Chip architects, or their marketing teams, try to squeeze as much brag-...

Is Transformer Fever Fading?


The hottest, buzziest thing bursts onto the scene and captures the attention of the business press and even the general public. Scads of articles and videos are published about The Hot Thing. And then, in the blink of an eye, the world’s attention shifts to the Next New Thing! Are we talking about the latest pop song that leads the Spotify streaming charts? Perhaps a new fashion trend that...

BYO NPU Benchmarks


In our last blog post, we highlighted the ways that NPU vendors can shade the truth about performance on benchmark networks such that comparing common performance scores such as “ResNet50 Inferences / Second” can be a futile exercise. But there is a straightforward, low-investment method for an IP evaluator to short-circuit all the vendor shenanigans and get a solid apples-to-apples result...
