Implementing AI Activation Functions


Activation functions play a critical role in AI inference, providing the nonlinear behaviors that let AI models capture complex patterns. This makes them an integral part of any neural network, but nonlinear functions can be fussy to build in silicon. Is it better to have a CPU calculate them? Should hardware function units be laid down to execute them? Or would a lookup table (LUT) suffice? Most architectures inc... » read more
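
To make the LUT option concrete, here is a minimal Python sketch of the idea (the table size and input range are illustrative assumptions, not parameters from the article): a precomputed table trades a small, bounded approximation error for a much cheaper per-element operation.

```python
import numpy as np

# Build a 256-entry lookup table for tanh over [-4, 4]. Table size and
# range are illustrative choices, not figures from the article.
TABLE_SIZE = 256
X_MIN, X_MAX = -4.0, 4.0
_lut = np.tanh(np.linspace(X_MIN, X_MAX, TABLE_SIZE))

def tanh_lut(x):
    # Map each input to its nearest table entry; inputs outside the
    # table range saturate at the endpoints, much as tanh itself does.
    pos = (x - X_MIN) / (X_MAX - X_MIN) * (TABLE_SIZE - 1)
    idx = np.clip(np.round(pos), 0, TABLE_SIZE - 1).astype(int)
    return _lut[idx]

x = np.linspace(-6.0, 6.0, 1001)
print("worst-case LUT error:", np.abs(tanh_lut(x) - np.tanh(x)).max())
```

In silicon, the same structure maps to a small ROM plus index logic; interpolating between adjacent entries buys extra accuracy at the cost of a multiplier.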

HW-Aligned Sparse Attention Architecture For Efficient Long-Context Modeling (DeepSeek et al.)


A new technical paper titled "Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention" was published by DeepSeek, Peking University and University of Washington. Abstract "Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant computational challenges. Sparse attention... » read more

Normalization Keeps AI Numbers In Check


AI training and inference are all about running data through models — typically to make some kind of decision. But the paths that the calculations take aren’t always straightforward, and as a model processes its inputs, those calculations may go astray. Normalization is a process that can keep data in bounds, improving both training and inference. Forgoing normalization can result in at... » read more
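
As a concrete illustration, here is a minimal layer-normalization sketch in Python (one of several normalization schemes; the epsilon and the learned gamma/beta parameters follow common practice rather than anything specified in the article):

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    # Normalize each row's features to zero mean and unit variance,
    # then restore expressiveness with a learned scale and shift.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.array([[0.2, 180.0, -45.0, 3.1]])   # activations drifting out of range
print(layer_norm(x, gamma=np.ones(4), beta=np.zeros(4)))
```

However far the raw activations drift, the normalized output stays in a well-behaved range, which is exactly the in-bounds property the article describes.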

New AI Data Types Emerge


AI is all about data, and how that data is represented matters greatly. But after focusing primarily on 8-bit integers and 32-bit floating-point numbers, the industry is now looking at new formats. There is no single best type for every situation, because the choice depends on the type of AI model; whether accuracy, performance, or power is prioritized; and where the computing happens, ... » read more
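
The tradeoff between 32-bit floats and 8-bit integers is easy to see in a few lines of Python. This is a symmetric per-tensor scheme chosen for simplicity; production quantizers offer finer-grained options.

```python
import numpy as np

def quantize_int8(x):
    # Symmetric, per-tensor quantization: one FP32 scale maps the
    # whole tensor onto the int8 grid.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.82, -1.73, 0.05, 0.41, -0.99], dtype=np.float32)
q, s = quantize_int8(w)
print(q)                       # int8 codes, 4x smaller than FP32
print(dequantize_int8(q, s))   # close to w, but with quantization error
```

The round trip shows the small error that integer formats accept in exchange for smaller storage and cheaper arithmetic, which is the balance every new data type renegotiates.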

Mass Customization For AI Inference


Rising complexity in AI models and an explosion in the number and variety of networks are leaving chipmakers torn between fixed-function acceleration and more programmable accelerators, and creating some novel approaches that include some of both. By all accounts, a general-purpose approach to AI processing is not making the grade. General-purpose processors are exactly that. They're not des... » read more

Novel NorthPole Architecture Enables Low-Latency, High-Energy-Efficiency LLM Inference (IBM Research)


A new technical paper titled "Breakthrough low-latency, high-energy-efficiency LLM inference performance using NorthPole" was published by researchers at IBM Research. At the IEEE High Performance Extreme Computing (HPEC) Virtual Conference in September 2024, new performance results for their AIU NorthPole AI inference accelerator chip were presented on a 3-billion-parameter Granite LLM. ... » read more

Supercharging AI Inference With GDDR7


A rapid rise in the size and sophistication of AI inference models requires increasingly powerful AI accelerators and GPUs deployed in edge servers and client PCs. GDDR7 memory offers an attractive combination of bandwidth, capacity, latency, and power for these accelerators and processors. The Rambus GDDR7 Memory Controller IP offers industry-leading GDDR7 performance of up to 40 Gbps and 160 G... » read more
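
As a sanity check on how a per-pin data rate translates into device bandwidth, here is a quick back-of-envelope calculation assuming a 32-bit-wide GDDR7 device interface (an illustrative configuration, not a figure from the article):

```python
# Back-of-envelope GDDR7 bandwidth math. The 32-bit interface width is an
# assumed typical device configuration, not taken from the article.
data_rate_gbps_per_pin = 40
interface_width_bits = 32
bandwidth_gbps = data_rate_gbps_per_pin * interface_width_bits  # 1280 Gb/s
print(bandwidth_gbps / 8, "GB/s")                               # 160.0 GB/s
```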

Why A DSP Is Indispensable In The New World Of AI


Chips being designed today for automotive, mobile handset, AI-IoT (artificial intelligence/Internet of Things), and other AI applications will be fabricated in a year or two, designed into end products that will hit the market in three or more years, and then have a product lifecycle of at least five years. These chips will be used in systems with a large number and variety of senso... » read more