To (B)atch Or Not To (B)atch?


When evaluating benchmark results for AI/ML processing solutions, it is very helpful to remember Shakespeare’s Hamlet and the famous line: “To be, or not to be.” Except in this case the “B” stands for Batched. Batch size matters: there are two different ways in which a machine learning inference workload can be used in a system. A particular ML graph can be used one time, preced... » read more
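The tradeoff the excerpt sets up can be sketched with a toy experiment (the single-layer "graph" and all numbers below are invented for illustration, not taken from the post): running requests one at a time minimizes per-request latency, while grouping them into a batch reuses the loaded weights and typically raises throughput.

    import time
    import numpy as np

    rng = np.random.default_rng(0)
    weights = rng.standard_normal((2048, 2048)).astype(np.float32)  # stand-in for an ML graph's weights

    def infer(batch):
        # The whole toy "graph" is one matrix multiply; weights are reused across the batch
        return batch @ weights

    def throughput(batch_size, total_requests=256):
        inputs = rng.standard_normal((total_requests, 2048)).astype(np.float32)
        start = time.perf_counter()
        for i in range(0, total_requests, batch_size):
            infer(inputs[i:i + batch_size])
        return total_requests / (time.perf_counter() - start)  # requests per second

    for bs in (1, 8, 64):
        print(f"batch={bs:2d}  ~{throughput(bs):,.0f} requests/s")

On most hardware the larger batches win on throughput, which is exactly why a benchmark quoted at batch 64 says little about a latency-bound, batch-1 deployment.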

GDDR7 Memory Supercharges AI Inference


GDDR7 is the state-of-the-art graphics memory solution, with a performance roadmap of up to 48 Gigatransfers per second (GT/s) and memory throughput of 192 GB/s per GDDR7 memory device. The next generation of GPUs and accelerators for AI inference will use GDDR7 memory to provide the memory bandwidth needed for these demanding workloads. AI comprises two distinct applications: training and inference. With tr... » read more
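As a sanity check on the quoted figure, peak per-device bandwidth follows from the data rate and the interface width (the 32-bit device interface below is an assumption, since the excerpt does not state it):

    # Peak bandwidth of a single GDDR7 device (assumes a 32-bit device interface)
    data_rate_gt_s = 48          # gigatransfers per second, per pin
    interface_width_bits = 32    # bits moved in parallel per device
    bandwidth_gb_s = data_rate_gt_s * interface_width_bits / 8
    print(bandwidth_gb_s)        # 192.0 GB/s, matching the figure quoted above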

GDDR7: The Ideal Memory Solution In AI Inference


The generative AI market is experiencing rapid growth, driven by the increasing parameter size of Large Language Models (LLMs). This growth is pushing the boundaries of performance requirements for training hardware within data centers. For an in-depth look at this, consider the insights provided in "HBM3E: All About Bandwidth". Once trained, these models are deployed across a diverse range of... » read more

Dedicated Approximate Computing Framework To Efficiently Compute PCs On Hardware


A technical paper titled “On Hardware-efficient Inference in Probabilistic Circuits” was published by researchers at Aalto University and UCLouvain. Abstract: "Probabilistic circuits (PCs) offer a promising avenue to perform embedded reasoning under uncertainty. They support efficient and exact computation of various probabilistic inference tasks by design. Hence, hardware-efficient compu... » read more
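As a rough illustration of what "exact computation by design" means, the toy circuit below (invented here, not taken from the paper) answers joint and marginal queries with a single bottom-up pass over its product and sum nodes:

    # Toy probabilistic circuit: a sum node (mixture) over two product nodes,
    # each factorizing binary variables A and B.
    leaf_p1 = {            # P(var = 1) under each mixture component
        ("A", 0): 0.9, ("B", 0): 0.2,
        ("A", 1): 0.3, ("B", 1): 0.7,
    }
    mixture_weights = [0.6, 0.4]

    def evaluate(a, b):
        # One bottom-up pass: leaves -> product nodes -> weighted sum node
        total = 0.0
        for k, w in enumerate(mixture_weights):
            p_a = leaf_p1[("A", k)] if a == 1 else 1.0 - leaf_p1[("A", k)]
            p_b = leaf_p1[("B", k)] if b == 1 else 1.0 - leaf_p1[("B", k)]
            total += w * p_a * p_b
        return total

    print(evaluate(1, 1))                         # exact joint P(A=1, B=1)
    print(sum(evaluate(1, b) for b in (0, 1)))    # exact marginal P(A=1)

Because every query reduces to a fixed dataflow of multiplies and adds, the structure maps naturally onto hardware, which is where a dedicated approximate-computing framework like the one in the paper comes in.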

The Implications Of AI Everywhere: From Data Center To Edge


Generative AI has upped the ante on the transformative force of AI, driving profound implications across all aspects of our everyday lives. Over the past year, we have seen AI capabilities placed firmly in the hands of consumers. The recent news and product announcements emerging from MWC 2024 highlighted what we can expect to see from the next wave of generative AI applications. AI will be eve... » read more

Why A DSP Is Indispensable In The New World of AI


Chips being designed today for the automotive, mobile handset, AI-IoT (artificial intelligence - Internet of things), and other AI applications will be fabricated in a year or two, designed into end products that will hit the market in three or more years, and then have a product lifecycle of at least five years. These chips will be used in systems with a large number and various types of senso... » read more

IBM’s Energy-Efficient NorthPole AI Unit


At this point it is well known that, from an energy efficiency standpoint, the biggest bang for the buck is to be found at the highest levels of abstraction. Fitting the right architecture to the task at hand, i.e., an application-specific architecture, will lead to benefits that are hard or impossible to claw back later in the design and implementation flow. With the huge increase in the inter... » read more

Your AI Chip Doesn’t Need An Expensive Insurance Policy


Imagine you are an architect designing a new SoC for an application that needs substantial machine learning inferencing horsepower. The team in marketing has given you a list of ML workloads and performance specs that you need to hit. The in-house designed NPU accelerator works well for these known workloads – things like MobileNet v2 and ResNet-50. The accelerator speeds up 95+% of the comput... » read more
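The "95+% of the compute" framing is essentially an Amdahl's-law question: how much the unaccelerated remainder limits end-to-end gains, and therefore what any fallback engine actually has to cover. A back-of-the-envelope sketch (the 20x NPU speedup and the 80% coverage case are assumptions for illustration, not figures from the post):

    # Amdahl's-law view of an NPU that accelerates only part of a workload
    def overall_speedup(accelerated_fraction, npu_speedup):
        unaccelerated = 1.0 - accelerated_fraction       # still runs on the host CPU/DSP
        return 1.0 / (unaccelerated + accelerated_fraction / npu_speedup)

    print(round(overall_speedup(0.95, 20.0), 2))  # 10.26 -> NPU covers 95% of the compute
    print(round(overall_speedup(0.80, 20.0), 2))  # 4.17  -> a new operator drops coverage to 80%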

A Packet-Based Architecture For Edge AI Inference


Despite significant improvements in throughput, edge AI accelerators (Neural Processing Units, or NPUs) are still often underutilized. Inefficient management of weights and activations leaves fewer cores available for multiply-accumulate (MAC) operations. Edge AI applications frequently need to run on small, low-power devices, limiting the area and power allocated for memory and comp... » read more
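Utilization here is simply the MACs actually issued divided by what the array could issue while it waits on data; a toy model (all numbers invented for illustration) shows how weight/activation stalls drag it down:

    # Toy NPU utilization model: useful compute cycles vs. cycles stalled on memory
    def utilization(mac_ops, macs_per_cycle, stall_cycles):
        compute_cycles = mac_ops / macs_per_cycle
        return compute_cycles / (compute_cycles + stall_cycles)

    # Illustrative layer: 100 M MACs on a 1024-MAC/cycle array
    print(round(utilization(100e6, 1024, 0), 2))        # 1.0  -> ideal, never starved
    print(round(utilization(100e6, 1024, 150_000), 2))  # 0.39 -> memory stalls dominate

The packet-based approach the post describes targets that stall term, keeping more of the MAC array fed within a tight area and power budget.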

A Bridge From Mars To Venus


In a now-famous 1992 pop psychology book titled "Men Are from Mars, Women Are from Venus," author John Gray posited that most relationship troubles in couples stem from fundamental differences in socialization patterns between men and women. The analogy that the two partners came from different planets was used to describe how two people could perceive issues in completely different and sometim... » read more
