Reliable Training Data Paramount To AI Model Success


AI systems are increasingly being integrated into safety- and mission-critical applications ranging from automotive to health care and industrial IoT, stepping up the need for training data that is reliable, secure, and which is generated from trusted sources. AI activity is growing exponentially, as everybody tries to figure out how to apply it to their domain, application, or workload. In ... » read more

LLM Inference: Core Bottlenecks Imposed By Memory, Compute Capacity, Synchronization Overheads (NVIDIA)


A new technical paper titled "Efficient LLM Inference: Bandwidth, Compute, Synchronization, and Capacity are all you need" was published by NVIDIA. Abstract "This paper presents a limit study of transformer-based large language model (LLM) inference, focusing on the fundamental performance bottlenecks imposed by memory bandwidth, memory capacity, and synchronization overhead in distributed ... » read more

Co-Designing Data Center Architecture To Support LLMs (Intel, Georgia Tech)


A new technical paper titled "Scaling Intelligence: Designing Data Centers for Next-Gen Language Models" was published by Intel Corporation and Georgia Tech. An excerpt from the paper's abstract: "Our work provides a comprehensive co-design framework that jointly explores FLOPS, HBM bandwidth and capacity, multiple network topologies (two-tier vs. FullFlat optical), the size of the scale-ou... » read more

Customizing An LLM Tailored Specifically For VHDL Code And Design Of High Performance Processors (IBM)


A new technical paper titled "Customizing a Large Language Model for VHDL Design of High-Performance Microprocessors" was published by researchers at IBM. Abstract "The use of Large Language Models (LLMs) in hardware design has taken off in recent years, principally through its incorporation in tools that increase chip designer productivity. There has been considerable discussion about the ... » read more

Arithmetic Intensity In Decoding: A Hardware-Efficient Perspective (Princeton University)


A new technical paper titled "Hardware-Efficient Attention for Fast Decoding" was published by researchers at Princeton University. Abstract "LLM decoding is bottlenecked for large batches and long contexts by loading the key-value (KV) cache from high-bandwidth memory, which inflates per-token latency, while the sequential nature of decoding limits parallelism. We analyze the interplay amo... » read more

HW-based Heterogeneous Memory Management for LLM Inferencing (KAIST, Stanford Unversity)


A new technical paper titled "Hardware-based Heterogeneous Memory Management for Large Language Model Inference" was published by researchers at KAIST and Stanford University. Abstract "A large language model (LLM) is one of the most important emerging machine learning applications nowadays. However, due to its huge model size and runtime increase of the memory footprint, LLM inferences suf... » read more

LLM-based Agentic Framework Automating HW Security Threat Modeling And Test Plan Generation (U. of Florida)


A new technical paper titled "ThreatLens: LLM-guided Threat Modeling and Test Plan Generation for Hardware Security Verification" was published by researchers at University of Florida. Abstract "Current hardware security verification processes predominantly rely on manual threat modeling and test plan generation, which are labor-intensive, error-prone, and struggle to scale with increasing ... » read more

Wafer-Scale Computing for LLMs (U. of Edinburgh, Microsoft)


A new technical paper titled "WaferLLM: A Wafer-Scale LLM Inference System" was published by researchers at University of Edinburgh and Microsoft Research. Abstract "Emerging AI accelerators increasingly adopt wafer-scale manufacturing technologies, integrating hundreds of thousands of AI cores in a mesh-based architecture with large distributed on-chip memory (tens of GB in total) and ultr... » read more

Pooling CPU Memory for LLM Inference With Lower Latency and Higher Throughput (UC Berkeley)


A new technical paper titled "Pie: Pooling CPU Memory for LLM Inference" was published by researchers at UC Berkeley. Abstract "The rapid growth of LLMs has revolutionized natural language processing and AI analysis, but their increasing size and memory demands present significant challenges. A common solution is to spill over to CPU memory; however, traditional GPU-CPU memory swapping ofte... » read more

Language’s Role In Embodied Agents


Large Language Models (LLMs) and models cross-trained on natural language are a major growth area for edge applications of neural networks and Artificial Intelligence (AI). Within the spectrum of applications, embodied agents stand out as a major developing focal point for this AI. This article will address developments in this space and how the application of language-trained models improves t... » read more

← Older posts Newer posts →