Four-Tier Memory Hierarchy for LLM Reasoning (USC, UW)


A new technical paper, "Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning," was published by researchers at USC and University of Wisconsin-Madison. Abstract "Reasoning LLMs produce thousands of chain-of-thought tokens whose KV cache must reside in scarce GPU HBM. The dominant response -- permanently evicting low-importance tokens -- is catastrophic for reasoni... » read more

Flash Getting Stacked High-Bandwidth Version


Key takeaways: A new HBF 3D flash stack is similar to HBM for use in AI processing. HBF capacity will be much higher, allowing static storage of AI model weights, with optimized read speed. Samples are due out later this year, with accelerators featuring it coming out next year. AI inference using modern models requires billions of parameters, and moving them to where they c... » read more

AI Accelerator Testing Depends On DFT Innovations


Key Takeaways: I/O and lane repair capabilities are becoming critical to improving yield. System-level testing catches marginal defects and rare defects such as silent data corruption errors. Synopsys and TSMC developed a multi-die demo vehicle capable of full test, monitor, debug, and repair capability across the system’s lifecycle. The proliferation of accelerators in AI... » read more

HBM Shifts Testing Left To Preserve AI Chip Yield


Key Takeaways: A high-yield, known-good stack requires multiple test insertions. Known good stack testing poses challenges for power delivery and thermal management. The shift to HBM4 and HBM5 will increase the pressure for shift-left test flows. Taller high-bandwidth memory (HBM) stacks and tighter TSV pitch are impacting AI module yields. The solution is to push test furth... » read more

Replacing GPU Compute Dies With PNM-Enabled HBM Cubes For Long-Context Decode Attention (UCSD, Columbia, Yonsei U., NVIDIA, Samsung)


A new technical paper, "AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving," was published by researchers at UC San Diego, Columbia University, Yonsei University, NVIDIA, and Samsung. Abstract "All current LLM serving systems place the GPU at the center, from production-level attention-FFN disaggregation to NVIDIA's Rubin GPU-LPU heterogeneous p... » read more

When Semiconductor Materials Misbehave


Key Takeaways Material behavior in production depends on the process context that no development environment can fully replicate. In advanced packaging, the interactions that cross domain boundaries are increasingly where failures originate. The most accurate materials data is also the most commercially sensitive, leaving simulation models calibrated against generic inputs rather tha... » read more

TSV Complexity Leads To Manufacturing Bottleneck


Key Takeaways: Through-silicon vias are the biggest enabler of 3D chip stacking and chip-to-PCB connections through silicon interposers. The AI boom is causing HBM and advanced assembly shortages, straining the supply chain. Optimization around etch, fill and reveal help reduce TSV cost. Through-silicon vias (TSVs) provide essential interconnects between DRAM dies inside hig... » read more

Panel-Level Packaging’s Second Wave Meets Engineering Reality


Key Takeaways Panel-level packaging is arriving not because the engineering is ready, but because wafer-level economics are breaking down. Glass improves the warpage and dimensional stability problems of organic substrates but introduces a different class of failure modes that require materials solutions, not process adjustments. The central challenges of panel-level processing are m... » read more

PDN Challenges In DRAM-Based Compute-In-Memory Systems (UT Austin)


A new technical paper, "A comparative study on power delivery aspects of compute-in/near-memory approaches using DRAM," was published by researchers at UT Austin. Abstract "Compute-in-memory (PIM) mitigates the memory wall by performing computation within memory, reducing data movement and improving energy efficiency. DRAM-based PIM is particularly attractive due to its high density, matu... » read more

Early HBM4 Validation Points The Way For Next Generation AI And HPC Systems


As AI and high‑performance computing systems continue to scale, memory bandwidth has emerged as a primary system‑level constraint. Larger models, higher compute density, and increasingly complex multi‑die designs are driving the need for memory interfaces that can deliver extreme bandwidth while operating within tight power and signal‑integrity margins. High‑Bandwidth Memory (HBM) has... » read more

← Older posts