What’s Different About HBM4


Memory bandwidth is limiting the flow of huge datasets that are needed to train AI models. There is much more data to process, store, and retrieve, but the speed at which that data moves through high-bandwidth memory (HBM) stacks is significantly lower than the speed at which data can be processed. Frank Ferro, group director for product management at Cadence, talks about the new HBM4 standard,... » read more

Epitaxial Growth Of Up To 120 Si/SiGe Bilayers In View of 3D DRAM Applications (imec, Ghent Univ.)


A new technical paper titled "Epitaxial growth of up to 120× {Si0.8Ge0.2/Si} bilayers in view of three dimensional dynamic random access memory applications" was published by researchers at imec and Ghent University. Abstract "Epitaxially grown Si/Si1−xGex multi-stacks with ≥100 bilayers (≥200 sublayers) are being considered for three dimensionally vertically stacked dynamic rando... » read more

LLM Inference: Core Bottlenecks Imposed By Memory, Compute Capacity, Synchronization Overheads (NVIDIA)


A new technical paper titled "Efficient LLM Inference: Bandwidth, Compute, Synchronization, and Capacity are all you need" was published by NVIDIA. Abstract "This paper presents a limit study of transformer-based large language model (LLM) inference, focusing on the fundamental performance bottlenecks imposed by memory bandwidth, memory capacity, and synchronization overhead in distributed ... » read more

Rowhammer Attack On NVIDIA GPUs With GDDR6 DRAM (University of Toronto)


A new technical paper titled "GPUHammer: Rowhammer Attacks on GPU Memories are Practical" was published by researchers at University of Toronto. Abstract: "Rowhammer is a read disturbance vulnerability in modern DRAM that causes bit-flips, compromising security and reliability. While extensively studied on Intel and AMD CPUs with DDR and LPDDR memories, its impact on GPUs using GDDR memorie... » read more

Stacking Persistent Embedded Memories Based On Oxide Transistors Upon GPGPU Platforms (Georgia Tech)


A new technical paper titled "CMOS+X: Stacking Persistent Embedded Memories based on Oxide Transistors upon GPGPU Platforms" was published by Georgia Tech. Abstract "In contemporary general-purpose graphics processing units (GPGPUs), the continued increase in raw arithmetic throughput is constrained by the capabilities of the register file (single-cycle) and last-level cache (high bandwidth... » read more

Novel Assembly Approaches For 3D Device Stacks


The next big leap in semiconductor packaging will require a slew of new technologies, processes, and materials, but collectively they will enable orders of magnitude improvement in performance that will be essential for the AI age. Not all of these issues are fully solved but the recent Electronic Components Technology Conference (ECTC) provided a glimpse into the huge leaps in progress that... » read more

The Best DRAMs For Artificial Intelligence


Artificial intelligence (AI) involves intense computing and tons of data. The computing may be performed by CPUs, GPUs, or dedicated accelerators, and while the data travels through DRAM on its way to the processor, the best DRAM type for this purpose depends on the type of system that is performing the training or inference. The memory challenge facing engineering teams today is how to keep... » read more

Benefits Of Memory-Centric Computing (ETH Zurich)


A new technical paper titled "Memory-Centric Computing: Solving Computing's Memory Problem" was published by researchers at ETH Zurich. Abstract "Computing has a huge memory problem. The memory system, consisting of multiple technologies at different levels, is responsible for most of the energy consumption, performance bottlenecks, robustness problems, monetary cost, and hardware real esta... » read more

Chiplet Tradeoffs And Limitations


The semiconductor industry is buzzing with the benefits of chiplets, including faster time to market, better performance, and lower power, but finding the correct balance between customization and standardization is proving to be more difficult than initially thought. For a commercial chiplet marketplace to really take off, it requires a much deeper understanding of how chiplets behave indiv... » read more

GPU Analysis Identifying Performance Bottlenecks That Cause Throughput Plateaus In Large-Batch Inference


A new technical paper titled "Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference" was published by researchers at Barcelona Supercomputing Center, Universitat Politecnica de Catalunya, and IBM Research. Abstract "Large language models have been widely adopted across different tasks, but their auto-regressive generation nature often leads to inefficient resource util... » read more

← Older posts Newer posts →