The Edge LLM Offload Story


By Karthikeyan Shanmuga Vadivel and Sauryadeep Pal Developers and system architects today face a growing demand to enable large language model variants on device. They are facing pressure to support transformer-capable models on constrained devices to ensure data privacy, eliminate cloud API charges, and provide offline reliability. On-device execution is also becoming a necessity to meet st... » read more

Toward Agentic Verification


Key Takeaways: Agentic verification provides flow orchestration for common repetitive tasks. Capabilities will expand when tools can learn from a larger context, including the specification. Design houses need to fully understand the costs and benefits and plan accordingly. Agentic verification is more than a buzzword. It is a pivotal moment in the evolution of verification ... » read more

Faster Verification Debug With AI


Every stage of semiconductor development takes longer and requires more effort with each new generation of chips. At no stage is this more apparent than functional verification. Industry consensus is that verification consumes roughly two-thirds of development time and resources. Within verification, debug is the most challenging step, consuming a third to two-thirds of the effort. Any serious ... » read more

The Shape Of Prompts: Exploring Their Effect On Inference Infrastructure


AI inference prompts exhibit a shape-shifting behavior, arriving in many forms and attempting to fit themselves within the constraints of the inference stack. Ultimately, it is the design of the inference infrastructure that determines whether it can sustain a large volume of prompts or only a limited number. Prompts are not uniform transactions; they represent dynamic workload profiles whose ... » read more

Large-scale, SRAM-based LLM Inference Deployment (Groq)


A new technical paper, "SHIP: SRAM-Based Huge Inference Pipelines for Fast LLM Serving," was published by researchers at Nvidia, with work done while at Groq. Abstract "The proliferation of large language models (LLMs) demands inference systems with both low latency and high efficiency at scale. GPU-based serving relies on HBM for model weights and KV caches, creating a memory bandwidth b... » read more

Why Vision LLMs Force A Rethink Of Edge AI Hardware


As vision-centric large language models move on-device, performance measured in raw TOPS is no longer enough. Architectures need to be built around real workloads, memory behavior, and sustained utilization, especially at the edge. Vision LLMs are changing the edge AI equation For the last decade, most edge AI silicon has been built to do one job extremely well: run convolutional networks for... » read more

Silent Data Corruption: A Major Reliability Challenge in Large-Scale LLM Training (TU Berlin)


A new technical paper, "Exploring Silent Data Corruption as a Reliability Challenge in LLM Training," was published by researchers at Technische Universitat Berlin. Abstract "As Large Language Models (LLMs) scale in size and complexity, the consequences of failures during training become increasingly severe. A major challenge arises from Silent Data Corruption (SDC): hardware-induced faults... » read more

Automated Security Assertion Generation Using LLMs (U. of Florida)


A new technical paper, "Assertain: Automated Security Assertion Generation Using Large Language Models," was published by University of Florida. Abstract "The increasing complexity of modern system-on-chip designs amplifies hardware security risks and makes manual security property specification a major bottleneck in formal property verification. This paper presents Assertain, an automated ... » read more

An Exploration of Agent Scaling for HLS Design Space Exploration (IBM)


A new technical paper, "Agent Factories for High Level Synthesis: How Far Can General-Purpose Coding Agents Go in Hardware Optimization?" was published by IBM. Abstract "We present an empirical study of how far general-purpose coding agents – without hardware-specific training – can optimize hardware designs from high-level algorithmic specifications. We introduce an agent factory, a ... » read more

How SW and HW Vulnerabilities Can Complement LLM-Specific Algorithmic Attacks (UT Austin, Intel et al.)


A new technical paper, "Cascade: Composing Software-Hardware Attack Gadgets for Adversarial Threat Amplification in Compound AI Systems," was published by the University of Texas, Austin, Intel Labs, Symmetry Systems, Microsoft and Georgia Tech. Abstract "Rapid progress in generative AI has given rise to Compound AI systems - pipelines comprised of multiple large language models (LLM), so... » read more

← Older posts