More Massive Still: Why AI Infrastructure Demands A Unified Design Approach


At the recent Data Center World 2026 in Washington, D.C., one message came through louder than ever: AI infrastructure is scaling faster than any system we’ve built before, and the industry can no longer afford to design it in silos. The workshop “More Massive Still! Delivering AI-Driven Scale in the Face of Historic Constraints” captured this perfectly: the industry is shifting fr... » read more

Cloud HPC For AI: Addressing Latency, Cost, And Scale At The Architectural Level


Many organizations assume that moving HPC workloads to the cloud is simply a matter of lifting and shifting on-premises clusters. In practice, that approach often erodes performance, inflates costs, and undermines AI training efficiency. Getting the most out of HPC in the cloud requires a fundamentally different architectural approach — one that minimizes latency, maximizes utilization, an... » read more

Building A Production-Ready Optically Connected Rack For AI Scale-Up


By Nandita Aggarwal and Nicholas Chang
As AI models drive compute demand, servers keep getting bigger. Rack-scale AI systems (such as the 72-GPU systems from NVIDIA or AMD) enable many GPUs to work together through system-level optimization. They push beyond the limits of single-chip performance and meet the soaring compute needs of the AI era. But this is just the beginning. The next s... » read more

CPO Will Dominate Scale-Up: Link Budgets For dB And $ Are Key


In the next five years, scale-up interconnects will transition from copper to optics: primarily co-packaged optics (CPO), with some near-packaged optics (NPO) and perhaps some vertical-cavity surface-emitting lasers (VCSELs). The demand for AI has become visibly real, with Anthropic hitting a $47 billion annual run rate, followed closely by OpenAI and Google Gemini. Anthr... » read more

1-Megawatt Racks In Data Centers


The demand for performance in AI data centers is driving a huge spike in power consumption. Within a rack are a half-dozen SoC components housed in different types of advanced packages and connected by an assortment of blazing-fast interface IP and optical signaling. Manmeet Walia, director of product management for mixed-signal PHY IP in the Synopsys Solutions Group, talks... » read more

A Bench-To-In-Field Telemetry Platform For Data Center Power Management


By Aakash Jani and Venkatesh Santhanagopalan
NVIDIA's Blackwell platform delivered roughly 15% lower energy and 13% higher throughput [1]. Those gains came from hardware-firmware co-design that matches operating points to each workload, not from a new process node. Most SoCs do not adapt: their margins are set and frozen the day silicon ships, based on the workloads measured at the bench. The mi... » read more

SOCAMM2: Bringing LPDDR5X Benefits To AI Servers


The rapid scaling of artificial intelligence is reshaping nearly every dimension of data center design. While much of the focus has been on GPUs, accelerators and advanced packaging, another constraint is emerging as equally critical: power. As AI models grow larger and more complex, power consumption, not raw compute, is increasingly the limiting factor in system scalability. Modern AI work... » read more

Test Distribution Evolves To Meet AI Challenges


The proliferation of artificial intelligence (AI) is rapidly accelerating the growth of the semiconductor market, which analysts now predict will reach $1 trillion this year. Many of those semiconductor devices will be GPUs populating the data centers that run AI workloads. Driven by strong, sustained investments from hyperscaler operators, high-performance computing (HPC)/AI data centers are expected... » read more

Ensuring AI Reliability: Mitigating Silent Data Corruption Risks


Silent data corruption (SDC) is an industry challenge affecting data centers worldwide with increasing frequency. It stems from hardware failures that leave no trace, which makes detection notoriously difficult: SDCs leave no record in system logs and trigger no exception mechanisms. The corrupted data they produce can propagate unnoticed, causing cascading failures that often demand ext... » read more

The AI Server Challenge: Testing Power At Scale


Artificial intelligence is most often framed as a story of compute advancements: faster GPUs, denser accelerators, and advanced process nodes. But behind every AI workload, the most fundamental constraint is power.
Fig. 1: AI server market. Source: Grand View Research
As AI servers scale to meet data center demand, power delivery is becoming one of the most critical and complex engine... » read more
