From Latency To Reaction: Simulating The Next Wafer Demand Inflection


The semiconductor industry faces an unprecedented paradox: AI demand is booming, fab investments are rising, yet wafer shipments remain stubbornly flat. What's driving this disconnect, and when will it break? As of mid-2025, the global silicon wafer market appears calm on the surface, but underlying structural tensions are quietly mounting. The demand for AI semiconductors remains resilient,... » read more

Scheduling Architecture Integrated With M3D BEOL Memories For LLM Inference (Georgia Tech, Samsung)


A new technical paper titled "Architecting Long-Context LLM Acceleration with Packing-Prefetch Scheduler and Ultra-Large Capacity On-Chip Memories" was published by researchers at Georgia Institute of Technology and Samsung. Abstract "Long-context Large Language Model (LLM) inference faces increasing compute bottlenecks as attention calculations scale with context length, primarily due to t... » read more

Maximize Uptime And Improve TCO: RAS And Telemetry In HBM4 For Data Centers


As AI workloads scale and data center operations become increasingly complex, it is critical to keep the infrastructure up and running. Total Cost of Ownership (TCO) is a key metric that includes not only the upfront cost of hardware but also the ongoing expenses of power, cooling, maintenance, and—most importantly—downtime. A single memory failure in a hyperscale AI cluster can cascade int... » read more

Scaling DRAM Technology To Meet Future Demands: Challenges And Opportunities


Since the invention of the 1T1C bit cell more than 50 years ago, DRAMs have become the main memory of choice for processors in computer systems and many consumer electronics devices. As new computing paradigms have been created, including 3D graphics, cloud computing, smartphones, and AI processing, specialized processors and DRAM memories have been developed that are optimized for these u... » read more

LLM Inference: Core Bottlenecks Imposed By Memory, Compute Capacity, Synchronization Overheads (NVIDIA)


A new technical paper titled "Efficient LLM Inference: Bandwidth, Compute, Synchronization, and Capacity are all you need" was published by NVIDIA. Abstract "This paper presents a limit study of transformer-based large language model (LLM) inference, focusing on the fundamental performance bottlenecks imposed by memory bandwidth, memory capacity, and synchronization overhead in distributed ... » read more

Five Questions To Ask When Selecting A Temporary Bonding And Debonding System


High-bandwidth memory (HBM), microprocessors, field-programmable gate arrays (FPGAs), AI accelerators, and other devices used in advanced system-level packaging all rely on temporary bonding and debonding systems to shrink their footprint. Understanding which properties play the most crucial role in device reliability and efficient production will ensure you are maximizing your yie... » read more

Examination Of Thermal Issues Related To Hybrid Bonding Of 3D-Stacked HBM


A new technical paper titled "Thermal Issues Related to Hybrid Bonding of 3D-Stacked High Bandwidth Memory: A Comprehensive Review" was published by researchers at Chungbuk National University. Abstract "High-Bandwidth Memory (HBM) enables the bandwidth required by modern AI and high-performance computing, yet its three dimensional stack traps heat and amplifies thermo mechanical stress. We... » read more

Co-Designing Data Center Architecture To Support LLMs (Intel, Georgia Tech)


A new technical paper titled "Scaling Intelligence: Designing Data Centers for Next-Gen Language Models" was published by Intel Corporation and Georgia Tech. An excerpt from the paper's abstract: "Our work provides a comprehensive co-design framework that jointly explores FLOPS, HBM bandwidth and capacity, multiple network topologies (two-tier vs. FullFlat optical), the size of the scale-ou... » read more

System-Level Approach To Reducing HBM Cost For AI Inference (RPI, IBM)


A new technical paper titled "Breaking the HBM Bit Cost Barrier: Domain-Specific ECC for AI Inference Infrastructure" was published by researchers at Rensselaer Polytechnic Institute and IBM. Abstract "High-Bandwidth Memory (HBM) delivers exceptional bandwidth and energy efficiency for AI workloads, but its high cost per bit, driven in part by stringent on-die reliability requirements, pose... » read more

Physics Limits Interposer Line Lengths


Electrical interposers provide a convenient surface for mounting multiple chips within a single package, but even though interposer lines theoretically can be routed anywhere, insertion losses limit their practical length. Lines on interposers — and on silicon interposers in particular — can be exceedingly narrow. Having a small cross-section makes such lines resistive, degrading signals... » read more
