Heterogeneous NPU Data Movement: What The Execution Flow Shows


Heterogeneous NPU designs bring together multiple specialized compute engines to support the range of operators required by modern AI models. This approach enables coverage across diverse workloads, but it also introduces a structural consequence: intermediate data must move between those engines. That movement consumes power, adds latency, and requires additional silicon resources, with effect... » read more

Research Bits: Apr. 14


Authentication for edge devices Researchers from the University of Hong Kong, Tsinghua University, and the Southern University of Science and Technology designed a privacy-preserving system for edge devices that combines physically unclonable functions and compute-in-memory. The Co-Located Authentication and Processing (CLAP) system integrates authentication and processing functions within ... » read more

PCIe 8.0: Enabling The Next Generation Of High Bandwidth Systems


As compute architectures evolve to support increasingly data‑intensive workloads, the role of high‑speed I/O has never been more critical. Artificial intelligence, high‑performance computing, hyperscale infrastructure, and advanced networking all depend on moving massive volumes of data efficiently, reliably, and at scale. The PCI‑SIG’s announcement of PCIe 8.0, which targets 256.0... » read more

Early HBM4 Validation Points The Way For Next Generation AI And HPC Systems


As AI and high‑performance computing systems continue to scale, memory bandwidth has emerged as a primary system‑level constraint. Larger models, higher compute density, and increasingly complex multi‑die designs are driving the need for memory interfaces that can deliver extreme bandwidth while operating within tight power and signal‑integrity margins. High‑Bandwidth Memory (HBM) has... » read more

The Coming Breakup Between AI And The Cloud


For a decade, cloud AI has felt inevitable. It powers our voice assistants, photo libraries, recommendation engines, and a growing list of “smart” features we barely notice anymore. Yet beneath the convenience is a fragile dependency: if your connection stutters, your intelligence does too.​ We rarely question this arrangement, but we should. As models grow larger and expectations grow... » read more

Power Integrity Without Blind Spots: A System Level Approach To 3D-ICs


Power delivery has become one of the defining challenges of next-generation semiconductor systems. As AI, high-performance computing, and data-centric workloads drive higher performance and tighter integration, traditional 2D SoC design approaches are reaching their limits. The industry’s shift toward 2.5D and 3D heterogeneous integration promises breakthroughs in performance and efficiency�... » read more

DRAM’s Whac‑A‑Mole Security Crisis


Key takeaways: Rowhammer remains a DRAM security threat, while Rowpress has increasingly become a related threat. New commands issued by the memory controller can help manage refreshes, but they’re not a perfect solution. A smaller, vertical DRAM cell may eliminate the problem, but it’s years away. Rowhammer has been a persistent DRAM issue across several memory generati... » read more

A New Era For Co-Processing


Key Takeaways: There is no single processor capable of executing everything efficiently, meaning that multiple processors are required. Maximum efficiency is gained by minimizing the movement of data. Architects must maximize efficiency for today's workloads, while also adding enough flexibility to handle tomorrow's. New processor architectures are rapidly evolving thanks to... » read more

Rethinking Robotics Reinforcement Learning: A Practical Humanoid Training Workflow


Reinforcement learning (RL) for robotics is often associated with large GPU clusters, distributed infrastructure, and x86-based development environments. Training a humanoid robot with high-fidelity simulation is a resource-intensive workflow that runs in the data center. What if that workflow could run on a single workstation? In this blog post, we explore a complete robotics pipeline bu... » read more

Fast Isn’t Fast Enough: Redefining Metrics for Edge AI


Key Takeaways: Edge AI performance is about low latency and power efficiency, not peak TOPS. Memory bandwidth and data movement now limit edge AI more than compute. Successful edge AI requires balanced hardware, software, and fast model updates. Experts At The Table: Today’s chip architect must contend with multiple factors when architecting AI processors for fast and effi... » read more

← Older posts Newer posts →