A Bench-To-In-Field Telemetry Platform For Data Center Power Management


By Aakash Jani and Venkatesh Santhanagopalan

NVIDIA's Blackwell platform delivered roughly 15% lower energy and 13% higher throughput [1]. Those gains came from hardware-firmware co-design that matches operating points to each workload, not a new process node. Most SoCs do not adapt: their margins are set and frozen the day silicon ships, based on the workloads measured at the bench. The mi...

Improving GPU Energy Efficiency With Component-Level Power Management (AMD)


Researchers from AMD released “CompPow: A Case for Component-level GPU Power Management”. Abstract “The ever increasing demand for ML-driven intelligence in a wide spectrum of domains has led to ubiquity of GPUs. At the same time, GPUs are notorious for their power consumption needs and often dominate power allocation in a typical ML datacenter. While datacenter-level power opti...

Robust Dynamic Voltage Droop Mitigation And Power Management


Power management is one of the keys to developing successful semiconductor products. There are virtually no applications for which power consumption is not a concern. Many creative solutions have been developed to reduce and manage power. Making these schemes work robustly in real-world conditions can be a challenge. This post considers widely used methods—voltage droop/glitch detection and...

The Future of Powering AI


We are at the dawn of the next technological revolution. A revolution driven by Artificial Intelligence (AI) and accelerating at an unprecedented speed. At the end of 2022, with the launch of ChatGPT, our world changed significantly; within two months, it gained 100 million active users. AI not only has the potential to transform every area of life, be it financial services, digital assistants,...

PCIe Low-Power Validation Challenges And Potential Solutions


As chip complexities increase and the industry evolves to more battery-powered devices, power-aware/consumption research becomes an integral part of design in the industry. Low power is crucial in ASIC applications to ensure longevity, durability, and reliability. PCI-SIG has focused on reducing power consumption while the PCIe interface is active to enable better platform power management (...

Efficiency Defines The Future Of Data Movement


For decades, chip performance was measured by how much raw compute could be packed onto a die. However, that equation has changed. Moving data across a system-on-chip (SoC) now consumes more energy than the computations it performs. Efficient data movement has become a significant challenge for next-generation SoC designs. AI workloads are multiplying, hyperscale data centers are approaching po...

Power Stabilization To Allow Continued Scaling Of AI Training Workloads (Microsoft, OpenAI, NVIDIA)


A new technical paper titled "Power Stabilization for AI Training Datacenters" was published by researchers at Microsoft, OpenAI, and NVIDIA. Abstract "Large Artificial Intelligence (AI) training workloads spanning several tens of thousands of GPUs present unique power management challenges. These arise due to the high variability in power consumption during the training. Given the synchron...

Connecting AI Accelerators


Experts At The Table: Semiconductor Engineering sat down to discuss the various ways that AI accelerators are being applied today with Marc Meunier, director of ecosystem development at Arm; Jason Lawley, director of product marketing for AI IP at Cadence; Paul Karazuba, vice president of marketing at Expedera; Alexander Petr, senior director at Keysight; Steve Roddy, chief marketing office...

Offline RL Framework That Dynamically Controls The GPU Clock And Server Fan Speed To Optimize Power Consumption And Computation Time (KAIST)


A new technical paper titled "Power Consumption Optimization of GPU Server With Offline Reinforcement Learning" was published by researchers at Korea Advanced Institute of Science and Technology (KAIST) and KT Research and Development Center. "Optimizing GPU server power consumption is complex due to the interdependence of various components. Conventional methods often involve trade-offs: in...

Future-proofing AI Models


Experts At The Table: Making sure AI accelerators can be updated for future requirements is becoming essential due to the rapid introduction of new models. Semiconductor Engineering sat down to discuss the challenges of future-proofing these designs with Marc Meunier, director of ecosystem development at Arm; Jason Lawley, director of product marketing for AI IP at Cadence; Paul Karazuba, vic...
