Wafer-Scale Computing for LLMs (U. of Edinburgh, Microsoft)


A new technical paper titled "WaferLLM: A Wafer-Scale LLM Inference System" was published by researchers at University of Edinburgh and Microsoft Research. Abstract "Emerging AI accelerators increasingly adopt wafer-scale manufacturing technologies, integrating hundreds of thousands of AI cores in a mesh-based architecture with large distributed on-chip memory (tens of GB in total) and ultr... » read more

Potential Of Wireless Interconnects For Improving Performance And Flexibility Of Multi-Chip AI Accelerators


A new technical paper titled "Exploring the Potential of Wireless-enabled Multi-Chip AI Accelerators" was published by researchers at Universitat Politecnica de Catalunya. Abstract "The insatiable appetite of Artificial Intelligence (AI) workloads for computing power is pushing the industry to develop faster and more efficient accelerators. The rigidity of custom hardware, however, conflict... » read more

Choosing The Right Memory Solution For AI Accelerators


To meet the increasing demands of AI workloads, memory solutions must deliver ever-increasing performance in bandwidth, capacity, and efficiency. From the training of massive large language models (LLMs) to efficient inference on endpoint devices, choosing the right memory technology is critical for chip designers. This blog explores three leading memory solutions—HBM, LPDDR, and GDDR—and t... » read more
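
As a rough way to compare such options: batch-1 LLM decode is typically memory-bandwidth-bound, so peak tokens/second is bounded by effective bandwidth divided by the bytes read per token (roughly the weight footprint). The sketch below applies that rule of thumb; the bandwidth figures and model size are assumed, indicative values, not numbers from this blog.

```python
# Back-of-envelope sketch (not from the blog post): an upper bound on
# decode throughput is (effective bandwidth) / (bytes read per token,
# approximately the model weight size for batch-1 inference).
# Bandwidth figures below are assumed, approximate per-device/stack peaks.

GiB = 1024**3

candidates_gb_per_s = {                # assumed, approximate peak bandwidths
    "HBM3 (one stack)": 819,
    "GDDR6 (one 32-bit device, 16 Gbps)": 64,
    "LPDDR5X (x64 channel)": 68,
}

model_params = 7e9                     # hypothetical 7B-parameter model
bytes_per_param = 2                    # FP16/BF16 weights

weight_bytes = model_params * bytes_per_param
for name, bw in candidates_gb_per_s.items():
    toks = bw * 1e9 / weight_bytes     # memory-bound tokens/s upper bound
    print(f"{name}: <= {toks:.1f} tokens/s "
          f"(weights {weight_bytes / GiB:.1f} GiB)")
```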

MACs Are Not Enough: Why “Offload” Fails


For the past half-decade, countless chip designers have approached the challenges of on-device machine learning inference with the simple idea of building a “MAC accelerator” – an array of high-performance multiply-accumulate circuits – paired with a legacy programmable core to tackle the ML inference compute problem. There are literally dozens of lookalike architectures in the market t... » read more
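
Amdahl's law makes the argument quantitative: if only the MAC-heavy operators are offloaded, the operators left on the legacy core (softmax, normalization, data reshuffling) cap the end-to-end speedup. The fractions below are hypothetical, chosen only to illustrate the effect.

```python
# Hypothetical numbers illustrating the article's point via Amdahl's law:
# offloading only the matrix/MAC operators leaves the remaining ops on the
# legacy programmable core, which caps overall speedup.

def end_to_end_speedup(mac_fraction: float, mac_speedup: float) -> float:
    """Overall speedup when only `mac_fraction` of runtime is accelerated."""
    return 1.0 / ((1.0 - mac_fraction) + mac_fraction / mac_speedup)

# A classic CNN might be ~95% MAC work; a transformer with heavy softmax
# and normalization traffic might be closer to 70% (assumed figures).
for mac_fraction in (0.95, 0.80, 0.70):
    s = end_to_end_speedup(mac_fraction, mac_speedup=100.0)
    print(f"{mac_fraction:.0%} MAC work, 100x MAC array -> {s:.1f}x overall")
# 95% -> ~16.8x, 80% -> ~4.8x, 70% -> ~3.3x: the legacy core dominates.
```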

Designing Heterogeneous AI Acceleration SoCs


A new technical paper titled "Open-Source Heterogeneous SoCs for AI: The PULP Platform Experience" was published by researchers at University of Bologna. Abstract "Since 2013, the PULP (Parallel Ultra-Low Power) Platform project has been one of the most active and successful initiatives in designing research IPs and releasing them as open-source. Its portfolio now ranges from processor co... » read more

Enhancing Compute Security Architecture For New-Age Applications


New-age AI-powered applications are becoming increasingly essential in our daily lives. For that to continue, these applications and services must meet three primary challenges: achieving high performance for complex compute tasks; ensuring cost-effectiveness and seamless integration with existing infrastructure; and maintaining robust security and privacy measures. Historicall... » read more

Chiplet-Based NPUs to Accelerate Vehicular AI Perception Workloads


A new technical paper titled "Performance Implications of Multi-Chiplet Neural Processing Units on Autonomous Driving Perception" was published by researchers at UC Irvine. Abstract "We study the application of emerging chiplet-based Neural Processing Units to accelerate vehicular AI perception workloads in constrained automotive settings. The motivation stems from how chiplets technology i... » read more

Can You Rely Upon Your NPU Vendor To Be Your Customers’ Data Science Team?


The biggest mistake a chip design team can make in evaluating AI acceleration options for a new SoC is to rely entirely upon spreadsheets of performance numbers from the NPU vendor without going through the exercise of porting one or more new machine learning networks themselves using the vendor toolsets. Why is this a huge red flag? Most NPU vendors tell prospective customers that (1) the v... » read more

A Hardware-Aware Scalable Exact-Attention Execution Mechanism For GPUs (Microsoft)


A technical paper titled "Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers" was published by researchers at Microsoft. Abstract: "Transformer-based models have emerged as one of the most widely used architectures for natural language processing, natural language generation, and image generation. The size of the state-of-the-art models has in... » read more
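
For context, the decode phase this paper targets computes attention for a single new query token against the accumulated KV cache, which makes the kernel memory-bound as the sequence grows. The plain NumPy reference below shows that baseline computation only; LeanAttention's actual partitioning and scheduling are described in the paper, and the cache length and head dimension here are assumed.

```python
# Plain NumPy reference of decode-phase attention: one query token attends
# over the whole KV cache. This is the baseline computation the paper
# optimizes; LeanAttention's partitioning/scheduling is not shown here.
import numpy as np

def decode_attention(q, K_cache, V_cache):
    """q: (d,) query for the newest token; caches: (seq_len, d)."""
    d = q.shape[-1]
    scores = K_cache @ q / np.sqrt(d)      # (seq_len,) dot-product scores
    scores -= scores.max()                 # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum()               # softmax over past tokens
    return weights @ V_cache               # (d,) attention output

rng = np.random.default_rng(0)
seq_len, d = 4096, 128                     # assumed cache length / head dim
q = rng.standard_normal(d).astype(np.float32)
K = rng.standard_normal((seq_len, d)).astype(np.float32)
V = rng.standard_normal((seq_len, d)).astype(np.float32)

out = decode_attention(q, K, V)            # each step reads the full cache,
print(out.shape)                           # so cost grows with seq_len
```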

High-Level Synthesis Propels Next-Gen AI Accelerators


Everything around you is getting smarter. Artificial intelligence is not just a data center application but will be deployed in all kinds of embedded systems that we interact with daily. We expect to talk to and gesture at them. We expect them to recognize and understand us. And we expect them to operate with just a little bit of common sense. This intelligence is making these systems not just ... » read more
