SPONSOR BLOG

More Massive Still: Why AI Infrastructure Demands A Unified Design Approach

Tokens-per-watt is now the primary metric driving AI data center optimization.

June 25th, 2026 - By: Antti Lautanen

At the recent Data Center World 2026 in Washington, D.C., one message came through louder than ever: AI infrastructure is scaling faster than any system we’ve built before—and the industry can no longer afford to design it in silos.

The workshop “More Massive Still! Delivering AI-Driven Scale in the Face of Historic Constraints” captured this shift: the industry is moving from traditional data centers, where IT and facility teams operate in silos and optimize against each other around competing metrics (uptime and PUE), to fully integrated “AI factories” where the entire system is judged by one metric: tokens-per-watt.

And that shift changes everything.

The discussion brought together leaders across the ecosystem, including Greg Stover (Vertiv), Al Nichols (Silverback Data Center Solutions), Josh Claman (Accelsius), Kourosh Nemati (NVIDIA), Nathan Mallamace (Supermicro), Rob Curtis (AMD), and Sherman Ikemoto (Cadence Design Systems).

The era of fragmented design and management is over

For decades, data center architecture evolved in layers—chips, packages, racks, cooling, power, and facilities—each optimized independently. That model worked when Moore’s Law drove progress.

But as the session highlighted, “now the system is the chip.”

This was reinforced across speakers:

Integrated systems are now winning, requiring a holistic approach across compute, cooling, and operations.
The emergence of the 1 GW AI factory is driving the need to optimize everything—from location selection to cooling strategy—to maximize tokens-per-watt.
Thermal challenges are intensifying at the chip and package level, particularly with HBM and advanced packaging.
Power density is accelerating rapidly, with AI factories scaling from 100 MW to 1 GW and beyond.

Across all perspectives, the conclusion was consistent: the ecosystem must design together, or it will fail to scale.

The stack tax is real—and it’s massive

One of the most powerful concepts discussed was the “stack tax.”

When each layer of the infrastructure is overbuilt independently “for safety,” inefficiencies compound across the system. According to the session data:

Each layer can add ~35% overhead
Fragmented designs can drive full-stack PUE as high as 5.4
Integrated designs can reduce this to ~1.2
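
The compounding behind the stack tax can be sketched with simple arithmetic. The snippet below is an illustration only, not the session's model: it assumes every layer multiplies the one beneath it by the same 1.35 factor, and the layer counts are hypothetical, since the session figures do not state how many layers were counted.

```python
def compounded_overhead(per_layer_overhead: float, layers: int) -> float:
    """Total multiplier when each layer adds the same fractional safety margin.

    Assumption: margins stack multiplicatively, so n layers at 35% each
    give 1.35 ** n rather than 1 + 0.35 * n.
    """
    if layers < 0:
        raise ValueError("layers must be non-negative")
    return (1.0 + per_layer_overhead) ** layers


# Hypothetical layer counts; 0.35 is the ~35% per-layer figure quoted above.
for n in range(1, 7):
    print(f"{n} layer(s): {compounded_overhead(0.35, n):.2f}x")
```

Under these assumptions, five to six stacked layers land between roughly 4.5x and 6x, the same order of magnitude as the 5.4 figure quoted in the session.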

The impact is staggering.

A well-optimized, 1 GW NVIDIA Vera Rubin AI factory will operate approximately 300,000 GPUs, while the same AI factory, unoptimized, might reach only 65% of its throughput potential.

This isn’t just an efficiency problem—it’s a business problem. The difference directly translates into token output, revenue, and competitive advantage.

And as AI workloads shift increasingly toward inference at massive scale, this gap will only widen.

AI factories demand full-stack optimization

NVIDIA described AI infrastructure as a “five-layer cake”: energy, infrastructure, chips, models, and applications. All five must scale together.

That is exactly the challenge the industry faces today.

Cooling is no longer a facility concern—it’s deeply linked to chip performance and total factory token throughput.
Packaging decisions affect rack density and thermal behavior.
Power constraints determine where AI factories can even be built.
Workloads now dynamically impact system behavior in real time.

Even emerging innovations like:

Two-phase cooling
Agentic AI systems
AI-driven design workflows using simulation-ready assets

…all require coordination across domains that historically never interacted closely.

This is where the industry must evolve—from connected tools to a connected system design methodology.

Rendering of the SimReady NVIDIA GB300 NVL72 model in the Cadence Reality Digital Twin Platform, powered by NVIDIA Omniverse libraries, demonstrating detailed airflow simulation within an AI factory environment.

The glue across the AI factory

This is precisely where Cadence plays a unique role. As multiple speakers acknowledged during the session, Cadence is helping bridge the gap between disciplines—from chip design to facility optimization.

Why does this matter? Because the modern AI factory is not a collection of components—it’s a tightly coupled system.

Cadence enables:

Chip-to-chiller design integration: Connecting silicon, packaging, and cooling models into a unified simulation flow.
Physics-based digital twins for full-stack optimization: Operators can simulate the physical behavior of an AI factory across power, cooling, and workloads, understand how these interactions drive token throughput, and optimize for tokens-per-watt before a single piece of infrastructure is deployed.
SimReady assets and AI-driven workflows: Supporting technologies such as NVIDIA Omniverse libraries to enable interactive simulation and rapid design iteration.
Cross-domain collaboration: Providing a common environment where chip designers, system architects, and data center operators can work together.

This is not just incremental improvement—it’s a fundamental shift.

In fact, Cadence’s Reality Digital Twin approach demonstrates tangible impact:

Maximize tokens-per-watt: Continuously adjust operating parameters (cooling set points, workload distribution, and GPU operating point on the P-Q curve) to maintain peak efficiency as workloads and environmental conditions change. Running GPUs in an efficient (MaxQ) mode, validated by digital twins, delivers up to 30% more tokens and a 17% improvement in tokens/watt.
Scale the upside: Efficiency gains translate into up to billions of dollars of additional annual revenue at 1 GW AI factory scale.
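
Why a more efficient operating point can produce more tokens overall is easier to see with a toy calculation. The sketch below assumes a fixed IT power budget and two operating points with made-up per-GPU power and throughput values. The names `OperatingPoint`, `factory_output`, and `best_point` are hypothetical, and the numbers are not intended to reproduce the 30% or 17% figures above. It also assumes that any power saved per GPU can be spent on additional GPUs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    name: str
    gpu_power_w: float    # power per GPU at this point (hypothetical value)
    tokens_per_s: float   # throughput per GPU at this point (hypothetical value)


def factory_output(point: OperatingPoint, it_power_budget_w: float):
    """Return (gpu_count, total_tokens_per_s, tokens_per_joule) for one point."""
    gpus = int(it_power_budget_w // point.gpu_power_w)
    total_tokens_per_s = gpus * point.tokens_per_s
    # Tokens per second per watt is the same quantity as tokens per joule.
    tokens_per_joule = point.tokens_per_s / point.gpu_power_w
    return gpus, total_tokens_per_s, tokens_per_joule


def best_point(points, it_power_budget_w: float) -> OperatingPoint:
    """Pick the point that yields the most tokens within the power budget."""
    return max(points, key=lambda p: factory_output(p, it_power_budget_w)[1])


# Illustrative values only.
POINTS = [
    OperatingPoint("max_performance", gpu_power_w=1000.0, tokens_per_s=1000.0),
    OperatingPoint("max_efficiency", gpu_power_w=800.0, tokens_per_s=900.0),
]

for p in POINTS:
    print(p.name, factory_output(p, 1_000_000.0))
print("best:", best_point(POINTS, 1_000_000.0).name)
```

With these illustrative numbers, the efficient point is slower per GPU but fits more GPUs into the same budget, so total output rises. A physics-based digital twin would replace the fixed values with simulated power and thermal behavior.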

That’s the power of eliminating the stack tax.

Designing for tokens, not just watts

Another key takeaway from the session was the shift in performance metrics.

The industry is moving toward:

Watts per token
Performance per token

This reframes optimization entirely.

It’s no longer enough to:

Improve chip efficiency
Optimize cooling systems
Or reduce facility costs

Instead, the goal is to maximize end-to-end token output under real-world constraints.

That means:

Designing for peak and average workloads
Dynamically balancing cooling and power
Understanding how infrastructure behaves over time—not just at design point
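
The gap between design-point and real-world efficiency can be illustrated by averaging over a workload trace. This is a minimal sketch with a hypothetical trace format of (duration, power, throughput) samples and made-up values. The function name `tokens_per_joule` is an assumption for illustration, not part of any tool mentioned in the session.

```python
def tokens_per_joule(samples) -> float:
    """Energy-weighted efficiency over a trace.

    samples: iterable of (duration_s, power_w, tokens_per_s) tuples.
    Returns total tokens divided by total energy, which is the time-averaged
    equivalent of tokens-per-watt.
    """
    total_tokens = 0.0
    total_energy_j = 0.0
    for duration_s, power_w, tokens_per_s in samples:
        total_tokens += tokens_per_s * duration_s
        total_energy_j += power_w * duration_s
    if total_energy_j == 0:
        raise ValueError("trace contains no energy use")
    return total_tokens / total_energy_j


# Hypothetical trace: one hour at peak load, three hours at partial load
# where power falls less than throughput does.
DESIGN_POINT = [(3600, 1000.0, 1000.0)]
TRACE = [(3600, 1000.0, 1000.0), (10800, 600.0, 300.0)]

print("design point:", tokens_per_joule(DESIGN_POINT))
print("over the trace:", round(tokens_per_joule(TRACE), 3))
```

In this made-up trace the system looks ideal at its design point, yet delivers noticeably fewer tokens per joule across the day, which is why behavior over time matters as much as peak behavior.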

This is a systems problem—and it demands systems thinking.

The path forward: Open, integrated, and collaborative

The main takeaway from the session was clear: the industry must align around a shared ecosystem.

Key enablers include:

Common data models and standards (e.g., OpenUSD)
Simulation-ready assets across vendors
Interoperability between tools and platforms
Collaboration across the entire value chain

Because no single company can solve this alone.

But together, the industry can build AI infrastructure that is:

Scalable
Efficient
Sustainable
Profitable

Final thoughts

We are in the era of the AI factory—where data centers are no longer passive infrastructure, but active systems producing intelligence in real time.

The scale is unprecedented. The constraints are real. And the margin for inefficiency is gone. The only way forward is integration. And that is where Cadence becomes the glue—connecting chips to cooling, power to performance, and design to operations.

Because in the age of AI, the system is the product—and only a unified approach will unlock its full potential.

Antti Lautanen

Antti Lautanen is a senior marketing lead for strategy & new ventures at Cadence Design Systems.
