Customizing An LLM Tailored Specifically For VHDL Code And Design Of High Performance Processors (IBM)


A new technical paper titled "Customizing a Large Language Model for VHDL Design of High-Performance Microprocessors" was published by researchers at IBM. Abstract "The use of Large Language Models (LLMs) in hardware design has taken off in recent years, principally through its incorporation in tools that increase chip designer productivity. There has been considerable discussion about the ... » read more

Arithmetic Intensity In Decoding: A Hardware-Efficient Perspective (Princeton University)


A new technical paper titled "Hardware-Efficient Attention for Fast Decoding" was published by researchers at Princeton University. Abstract "LLM decoding is bottlenecked for large batches and long contexts by loading the key-value (KV) cache from high-bandwidth memory, which inflates per-token latency, while the sequential nature of decoding limits parallelism. We analyze the interplay amo... » read more

HW-based Heterogeneous Memory Management for LLM Inferencing (KAIST, Stanford Unversity)


A new technical paper titled "Hardware-based Heterogeneous Memory Management for Large Language Model Inference" was published by researchers at KAIST and Stanford University. Abstract "A large language model (LLM) is one of the most important emerging machine learning applications nowadays. However, due to its huge model size and runtime increase of the memory footprint, LLM inferences suf... » read more

LLM-based Agentic Framework Automating HW Security Threat Modeling And Test Plan Generation (U. of Florida)


A new technical paper titled "ThreatLens: LLM-guided Threat Modeling and Test Plan Generation for Hardware Security Verification" was published by researchers at University of Florida. Abstract "Current hardware security verification processes predominantly rely on manual threat modeling and test plan generation, which are labor-intensive, error-prone, and struggle to scale with increasing ... » read more

Wafer-Scale Computing for LLMs (U. of Edinburgh, Microsoft)


A new technical paper titled "WaferLLM: A Wafer-Scale LLM Inference System" was published by researchers at University of Edinburgh and Microsoft Research. Abstract "Emerging AI accelerators increasingly adopt wafer-scale manufacturing technologies, integrating hundreds of thousands of AI cores in a mesh-based architecture with large distributed on-chip memory (tens of GB in total) and ultr... » read more

Pooling CPU Memory for LLM Inference With Lower Latency and Higher Throughput (UC Berkeley)


A new technical paper titled "Pie: Pooling CPU Memory for LLM Inference" was published by researchers at UC Berkeley. Abstract "The rapid growth of LLMs has revolutionized natural language processing and AI analysis, but their increasing size and memory demands present significant challenges. A common solution is to spill over to CPU memory; however, traditional GPU-CPU memory swapping ofte... » read more

Language’s Role In Embodied Agents


Large Language Models (LLMs) and models cross-trained on natural language are a major growth area for edge applications of neural networks and Artificial Intelligence (AI). Within the spectrum of applications, embodied agents stand out as a major developing focal point for this AI. This article will address developments in this space and how the application of language-trained models improves t... » read more

Generative AI On Mobile Is Running On The Arm CPU


By Adnan Al-Sinan and Gian Marco Iodice 2023 was the year that showcased an impressive number of use cases powered by generative AI. This disruptive form of artificial intelligence (AI) technology is at the heart OpenAI's ChatGPT and Google’s Gemini AI model, with it demonstrating the opportunity to simplify work and advance education through generating text, images, or even audio content ... » read more

LLM Inference On CPUs (Intel)


A technical paper titled “Efficient LLM Inference on CPUs” was published by researchers at Intel. Abstract: "Large language models (LLMs) have demonstrated remarkable performance and tremendous potential across a wide range of tasks. However, deploying these models has been challenging due to the astronomical amount of model parameters, which requires a demand for large memory capacity an... » read more

Applications Of Large Language Models For Industrial Chip Design (NVIDIA)


A technical paper titled “ChipNeMo: Domain-Adapted LLMs for Chip Design” was published by researchers at NVIDIA. Abstract: "ChipNeMo aims to explore the applications of large language models (LLMs) for industrial chip design. Instead of directly deploying off-the-shelf commercial or open-source LLMs, we instead adopt the following domain adaptation techniques: custom tokenizers, domain-ad... » read more

← Older posts