A HW-Aware Scalable Exact-Attention Execution Mechanism For GPUs (Microsoft)

A technical paper titled “Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers” was published by researchers at Microsoft. Abstract: "Transformer-based models have emerged as one of the most widely used architectures for natural language processing, natural language generation, and image generation. The size of the state-of-the-art models has in... » read more
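In the decode phase the problem reduces to a single query vector attending over the cached keys and values for all prior tokens. A minimal NumPy sketch of that exact-attention decode step (shapes and names are illustrative, not taken from the paper's implementation):

```python
import numpy as np

def decode_attention(q, K, V):
    # q: (d,) query for the token being generated
    # K, V: (t, d) cached keys and values for the t previous tokens
    scores = K @ q / np.sqrt(q.shape[0])   # (t,) scaled dot products
    w = np.exp(scores - scores.max())      # numerically stable softmax
    w /= w.sum()
    return w @ V                           # (d,) attention output

rng = np.random.default_rng(0)
q = rng.standard_normal(8)
K = rng.standard_normal((16, 8))
V = rng.standard_normal((16, 8))
out = decode_attention(q, K, V)
```

Hardware-aware schemes like the one in the paper keep this computation mathematically exact while changing how the work is partitioned across GPU compute units.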

High-Level Synthesis Propels Next-Gen AI Accelerators

Everything around you is getting smarter. Artificial intelligence is not just a data center application but will be deployed in all kinds of embedded systems that we interact with daily. We expect to talk to and gesture at them. We expect them to recognize and understand us. And we expect them to operate with just a little bit of common sense. This intelligence is making these systems not just ... » read more

Fallback Fails Spectacularly

Conventional AI/ML inference silicon designs employ a dedicated, hardwired matrix engine – typically called an “NPU” – paired with a legacy programmable processor – a CPU, DSP, or GPU. The common theory behind these two-core (or even three-core) architectures is that most of the matrix-heavy machine learning workload runs on the dedicated accelerator for maximum efficienc... » read more
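The fallback idea behind these two-core designs can be sketched as a toy graph partitioner: each operator runs on the NPU if the accelerator supports it, and otherwise falls back to the host processor. The supported-operator set and function names here are illustrative assumptions, not any vendor's actual API:

```python
# Toy model of NPU/CPU fallback scheduling (illustrative only).
SUPPORTED_ON_NPU = {"conv2d", "matmul", "relu"}

def schedule(graph_ops):
    # Assign each op to the NPU if the hardwired engine supports it;
    # anything unsupported falls back to the programmable host core.
    return [(op, "npu" if op in SUPPORTED_ON_NPU else "cpu")
            for op in graph_ops]

plan = schedule(["matmul", "softmax"])
```

The failure mode the title alludes to is visible even in this sketch: every op that misses the supported set serializes onto the slower host, and the more novel the model, the more often that happens.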

Research Bits: April 30

Sound waves in optical neural networks Researchers from the Max Planck Institute for the Science of Light and Massachusetts Institute of Technology found a way to build reconfigurable recurrent operators based on sound waves for photonic machine learning. They used light to create temporary acoustic waves in an optical fiber, which manipulate subsequent computational steps of an optical rec... » read more

Sea Of Processors Use Case

Core counts have been climbing steadily since IBM's debut of the Power4 in 2001, now eclipsing 100 CPU cores and more than 1,000 cores in AI accelerators. While sea-of-processors architectures feature a stamp-and-repeat design, per-core workloads aren't always going to be symmetrically balanced. For example, a cloud provider (AI or compute) will rent out individual core clusters to customers for specialize... » read more

A Hypermultiplexed Integrated Tensor Optical Processor (USC, MIT et al.)

A technical paper titled “Hypermultiplexed Integrated Tensor Optical Processor” was published by researchers at the University of Southern California, Massachusetts Institute of Technology (MIT), City University of Hong Kong, and NTT Research. Abstract: "The escalating data volume and complexity resulting from the rapid expansion of artificial intelligence (AI), internet of things (IoT) a... » read more

AI Tradeoffs At The Edge

AI is impacting almost every application area imaginable, but increasingly it is moving from the data center to the edge, where larger amounts of data need to be processed much more quickly than in the past. This has set off a scramble for massive improvements in performance much closer to the source of data, but with a familiar set of caveats — it must use very little power, be affordable... » read more

SystemC-based Power Side-Channel Attacks Against AI Accelerators (Univ. of Lübeck)

A new technical paper titled "SystemC Model of Power Side-Channel Attacks Against AI Accelerators: Superstition or not?" was published by researchers at Germany's University of Lübeck. Abstract: "As training artificial intelligence (AI) models is a lengthy and hence costly process, leakage of such a model's internal parameters is highly undesirable. In the case of AI accelerators, side-chann... » read more

AI Accelerator Architectures Poised For Big Changes

AI is driving a frenzy of activity in the chip world as companies across the semiconductor ecosystem race to include AI in their product lineup. The challenge now is how to make AI run faster, use less energy, and work everywhere from the edge to the data center — particularly with the rollout of large language models. On the hardware side, there are two main approaches for accel... » read more

Considerations For Accelerating On-Device Stable Diffusion Models

One of the more powerful – and visually stunning – advances in generative AI has been the development of Stable Diffusion models. These models are used for image generation, image denoising, inpainting (reconstructing missing regions in an image), outpainting (generating new pixels that seamlessly extend an image's existing bounds), and bit diffusion. Stable Diffusion uses a type of dif... » read more
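One common way inpainting is wired into a diffusion sampler is to reimpose the known pixels after every denoising step, so the model only synthesizes the masked hole. A minimal NumPy sketch of that masking step (a RePaint-style technique; names, shapes, and the simple blend are illustrative assumptions, not the Stable Diffusion implementation):

```python
import numpy as np

def inpaint_step(x_denoised, x_known, mask):
    # mask == 1 marks the missing region to be generated; pixels outside
    # the mask are reimposed from the (appropriately noised) known image
    # after each denoising step, so only the hole is synthesized
    return mask * x_denoised + (1.0 - mask) * x_known

x_known = np.zeros((4, 4))      # original image with a hole to fill
x_denoised = np.ones((4, 4))    # model's current denoised estimate
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0            # generate only the central 2x2 patch
out = inpaint_step(x_denoised, x_known, mask)
```

Outpainting follows the same pattern, with the mask covering the new canvas area beyond the image's existing bounds.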
