Tool-Assisted LLM Targets RTL Code Generation (UC Riverside, Futurewei)


Researchers from University of California, Riverside and Futurewei published a technical paper titled “LLM4RTL: Tool-Assisted LLM for RTL Generation.” Abstract: “Large language models (LLMs) have facilitated impressive progress in software engineering, code generation, tooling, and systems. Concurrently, a significant body of research has developed which explores a growing variety o... » read more

Why Vision LLMs Force A Rethink Of Edge AI Hardware


As vision-centric large language models move on-device, performance measured in raw TOPS is no longer enough. Architectures need to be built around real workloads, memory behavior, and sustained utilization, especially at the edge. Vision LLMs are changing the edge AI equation For the last decade, most edge AI silicon has been built to do one job extremely well: run convolutional networks for... » read more

A Review Of Acoustic Side-Channel Attacks: An AI View (Penn State Univ.)


A new technical paper titled "A Survey on Acoustic Side-Channel Attacks: An Artificial Intelligence Perspective" was published by researchers at Penn State University. Abstract "Acoustic Side-Channel Attacks (ASCAs) exploit the sound produced by keyboards and other devices to infer sensitive information without breaching software or network defenses. Recent advances in deep learning, large ... » read more

Small Language Models Create New Security Risks


The rollout of edge AI is creating new security risks due to a mix of small language models (SLMs), their integration into increasingly complex hardware, and the behavior and interactions of both over time. AI data centers still garner the most attention due to massive investments and an ongoing flood of deals and acquisitions, but the edge is quietly starting to take shape for several reaso... » read more

Small Vs. Large Language Models


The proliferation of edge AI will require fundamental changes in language models and chip architectures to make inferencing and learning outside of AI data centers a viable option. The initial goal for small language models (SLMs) — roughly 10 billion parameters or less, compared to more than a trillion parameters in the biggest LLMs — was to leverage them exclusively for inferencing. In... » read more

Implementing Power Dynamic Response For Greener AI Data Centers (Univ. of Cambridge, Nyobolt, Nanyang Tech)


A new technical paper titled "Improving AI Efficiency in Data Centres by Power Dynamic Response" was published by researchers at University of Cambridge, Nyobolt Limited and Nanyang Technological University. Abstract "The steady growth of artificial intelligence (AI) has accelerated in the recent years, facilitated by the development of sophisticated models such as large language models and... » read more

GDDR7 Tackles Massive-Context AI Inference


The AI hardware landscape is evolving at breakneck speed, and memory technology is at the heart of this transformation. NVIDIA’s recent announcement of Rubin CPX, a new class of GPU purpose-built for massive-context inference, underscores this trend. Rubin CPX is designed to tackle workloads that require reasoning across millions of tokens. Use cases include long-form generative video, comple... » read more

Overflowing Zoo: The Power Of Compilers


The term “model zoo” first gained prominence in the world of Artificial Intelligence/Machine Learning (AI/ML) beginning in the 2016-2017 timeframe. Originally used to describe open-source public repositories of working AI models — the most prominent of which today is Hugging Face — the term has since been adopted by nearly all vendors of AI chips and licensable Neural Processors Units (... » read more

System-HW Co-Design Approach Combines Mono3D DRAM, NMP, and GPU Acceleration (UCSD, Georgia Tech, UIUC, Illinois Tech)


A new technical paper titled "Stratum: System-Hardware Co-Design with Tiered Monolithic 3D-Stackable DRAM for Efficient MoE Serving" was published by researchers at UC San Diego, Georgia Tech, University of Illinois Urbana-Champaign and Illinois Institute of Technology. Abstract "As Large Language Models (LLMs) continue to evolve, Mixture of Experts (MoE) architecture has emerged as a preva... » read more

Heterogeneous System With Specialized HW For Disaggregated LLM Inference (Princeton Univ., Univ. of Washington)


A new technical paper titled "SPAD: Specialized Prefill and Decode Hardware for Disaggregated LLM Inference" was published by researchers at Princeton University and University of Washington. Abstract "Large Language Models (LLMs) have gained popularity in recent years, driving up the demand for inference. LLM inference is composed of two phases with distinct characteristics: a compute-boun... » read more

← Older posts