Open-Source RISC-V Cores: Analysis Of Scalar and Superscalar Architectures And Out-Of-Order Machines


A new technical paper titled "Ramping Up Open-Source RISC-V Cores: Assessing the Energy Efficiency of Superscalar, Out-of-Order Execution" was published by researchers at ETH Zurich, Università di Bologna and Univ. Grenoble Alpes, Inria. Abstract "Open-source RISC-V cores are increasingly demanded in domains like automotive and space, where achieving high instructions per cycle (IPC) throu... » read more

Hardware-Oriented Analysis of Multi-Head Latent Attention (MLA) in DeepSeek-V3 (KU Leuven)


A new technical paper titled "Hardware-Centric Analysis of DeepSeek's Multi-Head Latent Attention" was published by researchers at KU Leuven. Abstract "Multi-Head Latent Attention (MLA), introduced in DeepSeek-V2, improves the efficiency of large language models by projecting query, key, and value tensors into a compact latent space. This architectural change reduces the KV-cache size and s... » read more

Integration of High-Density Polymer Waveguides With Silicon Photonics for CPO (imec, Ghent)


A technical paper titled "Low-Loss Integration of High-Density Polymer Waveguides with Silicon Photonics for Co-Packaged Optics" was published by researchers at imec and Ghent University. Abstract "Co-Packaged Optics applications require scalable and high-yield optical interfacing solutions to silicon photonic chiplets, offering low-loss, broadband, and polarization-independent optical coup... » read more

Quantifying The PFAS Impact In ICs Manufacturing (Harvard University)


A new technical paper titled "Modeling PFAS in Semiconductor Manufacturing to Quantify Trade-offs in Energy Efficiency and Environmental Impact of Computing Systems" was published by researchers at Harvard University and Mohamed Bin Zayed University of AI (MBZUAI). "The electronics and semiconductor industry is a prominent consumer of per- and poly-fluoroalkyl substances (PFAS), also known a... » read more

Arithmetic Intensity In Decoding: A Hardware-Efficient Perspective (Princeton University)


A new technical paper titled "Hardware-Efficient Attention for Fast Decoding" was published by researchers at Princeton University. Abstract "LLM decoding is bottlenecked for large batches and long contexts by loading the key-value (KV) cache from high-bandwidth memory, which inflates per-token latency, while the sequential nature of decoding limits parallelism. We analyze the interplay amo... » read more

Roadmap for AI HW Development, With The Role of Photonic Chips In Supporting Future LLMs (CUHK, NUS, UIUC, Berkeley)


A new technical paper titled "What Is Next for LLMs? Next-Generation AI Computing Hardware Using Photonic Chips" was published by researchers at The Chinese University of Hong Kong, National University of Singapore, University of Illinois Urbana-Champaign and UC Berkeley. Abstract "Large language models (LLMs) are rapidly pushing the limits of contemporary computing hardware. For example, t... » read more

SRAM Cell Scaling With Monolithic 3D Integration Of 2D FETs (Penn State)


A new technical paper titled "Enabling static random-access memory cell scaling with monolithic 3D integration of 2D field-effect transistors" was published by researchers at The Pennsylvania State University. Abstract "Static Random-Access Memory (SRAM) cells are fundamental in computer architecture, serving crucial roles in cache memory, buffers, and registers due to their high-speed perf... » read more

Chiplet-to-Chiplet Gateway Architecture, A C2C Interface Bridging Two Chiplet Protocols (Peter Grünberg, Jülich Supercomputing Centre)


A new technical paper titled "Modeling Chiplet-to-Chiplet (C2C) Communication for Chiplet-based Co-Design" was published by researchers at Peter Grünberg Institute and Jülich Supercomputing Centre. Abstract "Chiplet-based processor design, which combines small dies called chiplets to form a larger chip, enables scalable designs at economical costs. This trend has received high attention s... » read more

Offline RL Framework That Dynamically Controls The GPU Clock And Server Fan Speed To Optimize Power Consumption And Computation Time (KAIST)


A new technical paper titled "Power Consumption Optimization of GPU Server With Offline Reinforcement Learning" was published by researchers at Korea Advanced Institute of Science and Technology (KAIST) and KT Research and Development Center. "Optimizing GPU server power consumption is complex due to the interdependence of various components. Conventional methods often involve trade-offs: in... » read more

Role of Josephson Junctions In Propelling Quantum Technologies Forward (LBNL, UC Berkeley, et al.)


A new technical paper titled "Josephson Junctions in the Age of Quantum Discovery" was published by researchers at Lawrence Berkeley National Laboratory, UC Berkeley, Gwangju Institute of Science and Technology, Korea University, Max Planck and Anyon Computing. Abstract "The unique combination of energy conservation and nonlinear behavior exhibited by Josephson junctions has driven transfor... » read more

← Older posts Newer posts →