Chiplet Tradeoffs And Limitations


The semiconductor industry is buzzing with the benefits of chiplets, including faster time to market, better performance, and lower power, but finding the correct balance between customization and standardization is proving to be more difficult than initially thought. For a commercial chiplet marketplace to really take off, it requires a much deeper understanding of how chiplets behave indiv... » read more

GPU Analysis Identifying Performance Bottlenecks That Cause Throughput Plateaus In Large-Batch Inference


A new technical paper titled "Mind the Memory Gap: Unveiling GPU Bottlenecks in Large-Batch LLM Inference" was published by researchers at Barcelona Supercomputing Center, Universitat Politecnica de Catalunya, and IBM Research. Abstract "Large language models have been widely adopted across different tasks, but their auto-regressive generation nature often leads to inefficient resource util... » read more

Experimental Characterization Results and State-of-the-Art Device-Level Studies of DRAM Read Disturbance


A new technical paper titled "Revisiting DRAM Read Disturbance: Identifying Inconsistencies Between Experimental Characterization and Device-Level Studies" was published by researchers at ETH Zurich. Abstract "Modern DRAM is vulnerable to read disturbance (e.g., RowHammer and RowPress) that significantly undermines the robust operation of the system. Repeatedly opening and closing a DRAM ro... » read more

Improving DRAM Performance Using Dual Work-Function Metal Gate (DWMG) Structures


Dynamic Random Access Memory (DRAM) serves as the backbone of modern computing, enabling devices ranging from smartphones to high-performance servers. As the demand accelerates for higher density and lower power consumption in memory devices, innovation in reducing DRAM leakage currents and enhancing performance becomes essential. One significant challenge in scaling DRAM technology is gate-... » read more

The Rise Of Thin Wafer Processing


The shift from planar SoCs to 3D-ICs and advanced packages requires much thinner wafers in order to improve performance and reduce power, reducing the distance that signals need to travel and the amount of energy needed to drive them. Markets calling for ultrathin wafers are growing. The combined thickness of an HBM module with 12 DRAM dies and a base logic chip is still less than that of a ... » read more

Solution To Read Disturbance For Current And Future DRAM Chips at Low Area, Performance And Energy Costs (ETH Zurich et al.)


A new technical paper titled "Chronus: Understanding and Securing the Cutting-Edge Industry Solutions to DRAM Read Disturbance" was published by researchers at ETH Zurich, TOBB, and University of Sharjah. Abstract "We 1) present the first rigorous security, performance, energy, and cost analyses of the state-of-the-art on-DRAM-die read disturbance mitigation method, Per Row Activation Count... » read more

Effects Of Reduced Refresh Latency On RowHammer Vulnerability Of DDR4 DRAM Chips


A new technical paper titled "Understanding RowHammer Under Reduced Refresh Latency: Experimental Analysis of Real DRAM Chips and Implications on Future Solutions" was published by researchers at ETH Zurich, TOBB University of Economics and Technology, and University of Sharjah. Abstract "RowHammer is a major read disturbance mechanism in DRAM where repeatedly accessing (hammering) a row of... » read more

SW-HW Co-Design Mitigation To Strengthen ASLR Against Microarchitectural Attacks (MIT)


A technical paper titled "Oreo: Protecting ASLR Against Microarchitectural Attacks" was published by researchers at MIT. Abstract "Address Space Layout Randomization (ASLR) is one of the most prominently deployed mitigations against memory corruption attacks. ASLR randomly shuffles program virtual addresses to prevent attackers from knowing the location of program contents in memory. Microa... » read more

Interconnects Approach Tipping Point


As leading devices move to next generation nanosheets for logic, their interconnections are getting squeezed past the point where they can deliver low resistance pathways. The 1nm (10Å) node will have 20nm pitch and larger metal lines, but the interconnect stack already consumes a third of device power and accounts for 75% of the chip's RC delay. Changing this dynamic requires a superior co... » read more

Memory Wall Problem Grows With LLMs


The growing imbalance between the amount of data that needs to be processed to train large language models (LLMs) and the inability to move that data back and forth fast enough between memories and processors has set off a massive global search for a better and more energy- and cost-efficient solution. Much of this is evident in the numbers. The GPU market is forecast to reach $190 billion in ... » read more

← Older posts