Examining The Impact Of Chip Power Reduction On Data Center Economics

Moving beyond conventional adaptive voltage scaling methods.

In the rapidly evolving landscape of data centers, optimizing energy consumption has become a critical focus. In this blog post, we’ll delve into the economics of power consumption for three key components: CPUs, GPUs, and AI accelerators, and explore how the proteanTecs power reduction solution transforms both power efficiency and computational capabilities.

The significance of power optimization in data centers

Cloud-scale data center requirements push the boundaries of infrastructure performance and resilience, with power optimization standing out as a critical factor. As data centers continue to expand to meet the demands of an increasingly digitized world, efficient use of power becomes not only a cost-saving strategy but also a fundamental approach to sustainability and environmental responsibility.

Why power optimization is essential:

  • Cost Efficiency: With the colossal energy demands of data centers, optimizing power usage directly impacts operational costs. Efficient power consumption translates into significant savings for data center operators, enhancing the overall economic viability of these facilities.
  • Environmental Impact: Beyond financial considerations, power optimization aligns with environmental sustainability goals. Reducing the carbon footprint of data centers has become a priority in an era where the ecological impact of technology is under serious scrutiny.

Beyond energy efficiency: Enhancing infrastructure performance

Reducing the power consumption of individual devices within a data center not only contributes to energy efficiency, but also holds the key to unlocking greater infrastructure performance. By lowering the power demand of each system, there is a proportional increase in available power within the data center’s overall budget. This surplus power can then be strategically redistributed, allowing for higher system utilization rates. Effectively, the optimization of per-device power translates into an empowered data center infrastructure, capable of handling more computational tasks without exceeding power constraints. This dynamic equilibrium ensures a responsive and efficient data center, where each watt saved on device power contributes to an amplified capacity for meeting computing demands.
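
To make that budget arithmetic concrete, here is a minimal sketch; the facility budget, per-server draw, and reduction percentage are assumed figures for illustration, not numbers from this post:

```python
# Illustrative sketch: a fixed facility power budget supports more systems when
# each system draws less power. All numbers here are assumptions for illustration.

facility_budget_kw = 10_000          # assumed power budget available for IT load, in kW
power_per_server_kw = 0.5            # assumed average draw per server, in kW
power_reduction = 0.10               # assumed 10% per-device power reduction

servers_before = facility_budget_kw / power_per_server_kw
servers_after = facility_budget_kw / (power_per_server_kw * (1 - power_reduction))

print(f"Servers within budget before reduction: {servers_before:,.0f}")
print(f"Servers within budget after reduction:  {servers_after:,.0f}")
print(f"Additional capacity unlocked:           {servers_after - servers_before:,.0f} servers")
```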

However, achieving effective power optimization in data centers is not without its challenges:

  • Dynamic Workloads: Data centers host a diverse array of applications with varying workloads. Optimizing power across dynamically changing workloads requires sophisticated technologies that can adapt to fluctuations in demand.
  • Aging Infrastructure: Many data centers have legacy infrastructure that may not be inherently designed for optimal power efficiency. Upgrading or retrofitting such infrastructure poses challenges in achieving comprehensive power optimization.
  • Cooling Requirements: As data centers generate substantial heat, cooling systems are essential. The equation is simple: the more power a device consumes, the more power it takes to cool it. It’s a vicious cycle. Balancing the power consumption of IT equipment with the energy demands of cooling systems presents a complex challenge in achieving overall efficiency. Companies strive to drive Power Usage Effectiveness (PUE) as close as possible to the ideal value of 1.0, where all facility power goes to the IT equipment itself (see the short sketch after this list).
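
As a quick illustration of the PUE relationship described above (the load figures below are assumptions, not data from this post):

```python
# PUE (Power Usage Effectiveness) is total facility power divided by IT power.
# A PUE of 1.0 means every watt drawn goes to compute; real facilities run higher
# because of cooling and power-delivery overhead. Figures below are illustrative.

it_load_kw = 1_000                 # assumed power drawn by servers, storage, network
cooling_and_overhead_kw = 300      # assumed power for cooling, conversion losses, lighting

pue = (it_load_kw + cooling_and_overhead_kw) / it_load_kw
print(f"PUE = {pue:.2f}")          # 1.30 -- every IT watt costs 1.3 facility watts
```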

proteanTecs has introduced a solution to the intricate task of power management in data centers. In response to the challenges of dynamic workloads, process variations, environmental factors, and aging effects, proteanTecs AVS Pro provides a real-time, deep data application that monitors power usage in mission-mode, going far beyond conventional adaptive voltage scaling (AVS) methods.

By employing on-chip Agents that continuously monitor millions of logic paths in real time, AVS Pro uniquely identifies the lowest margin to timing failure. This real-time resolution allows the system to dynamically adjust the supply voltage based on actual timing margins, optimizing power consumption while ensuring error-free functionality. Unlike traditional AVS methods limited by local on-chip sensors or emulators, AVS Pro’s Margin Agents address dynamic effects throughout the lifetime of the device.
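
The specifics of the AVS Pro control algorithm are not disclosed in this post; purely to illustrate the general idea of margin-driven voltage scaling, here is a minimal conceptual sketch in which the guard band, step size, voltage limits, and telemetry values are all hypothetical:

```python
# Conceptual sketch of margin-driven adaptive voltage scaling. This is NOT the
# proteanTecs AVS Pro algorithm (which is not published here); the guard band,
# step size, voltage limits, and telemetry values below are all hypothetical.

GUARD_BAND_PS = 50          # assumed minimum acceptable timing margin, in picoseconds
V_STEP = 0.005              # assumed voltage adjustment step, in volts
V_MIN, V_MAX = 0.65, 0.90   # assumed allowed supply-voltage range, in volts

def adjust_supply_voltage(current_v: float, worst_margin_ps: float) -> float:
    """Trim supply voltage while the worst observed timing margin stays comfortably
    above the guard band; raise it back if the margin gets too tight."""
    if worst_margin_ps > 2 * GUARD_BAND_PS:
        # Excess margin: lower voltage to cut dynamic power (which scales with V^2).
        return max(V_MIN, current_v - V_STEP)
    if worst_margin_ps < GUARD_BAND_PS:
        # Margin too small: restore voltage to guard against timing failures.
        return min(V_MAX, current_v + V_STEP)
    return current_v            # margin within the target window: hold steady

# Hypothetical telemetry: the tightest margin reported across monitored paths.
voltage = 0.85
for margin_ps in [180, 160, 140, 120, 60, 40, 90]:
    voltage = adjust_supply_voltage(voltage, margin_ps)
    print(f"worst margin {margin_ps:>4} ps -> supply voltage {voltage:.3f} V")
```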

AVS Pro not only leverages excess margins to enable power and performance optimization, but also safeguards against timing failures, redefining the equilibrium between reliability, efficiency, and performance. Proven in multiple customer systems, proteanTecs AVS Pro has demonstrated power savings of 9-14%. For data centers and cloud providers, this translates to millions of dollars in savings per year.

Redefining the economics of power

Let’s take a closer look at the financial implications of system power reduction and its effects on infrastructure utilization. Factoring in the savings described above, the following model provides a comprehensive framework for evaluating the impact on both cost and operational aspects of data centers. [1]

Power gains across CPUs, GPUs, and AI accelerators.

CPUs

Understanding the baseline: Before incorporating proteanTecs, the data center operated with the following assumptions for CPUs:

  • Power Consumption (kW): 0.15
  • Utilization Rate: 60%
  • PUE: 1.3
  • Total Power per CPU (kW): 0.117

Given 500,000 CPUs and an electricity cost of $0.10 per kWh, the annual power costs amounted to $51,246,000.

Impact: The introduction of proteanTecs technology resulted in an 11% power savings per CPU. This translates to an annual cost reduction of $5,637,060.

Transactional performance boost: Beyond cost savings, transactions per second (TPS) can increase by 1.9 billion, attributable to the per-CPU power reduction.

GPUs

Understanding the baseline: Before proteanTecs, the modeled GPU infrastructure consisted of:

  • Power Consumption (kW): 0.35
  • Utilization Rate: 60%
  • PUE: 1.3
  • Total Power per GPU (kW): 0.273

With 500,000 GPUs and an electricity cost of $0.10 per kWh, the annual power costs totaled $119,574,000.

Impact: proteanTecs implementation brought a 10% power savings per GPU, equating to an $11,957,400 potential annual cost reduction for the data center.

FPS enhancement: If the power reduction is channeled toward a utilization boost, frames per second (FPS) stand to grow by 4 million, directly attributable to the GPU power reduction.

AI accelerators

Understanding the baseline: The modeled AI accelerator setup initially featured:

  • Power Consumption (kW): 0.2
  • Utilization Rate: 60%
  • PUE: 1.3
  • Total Power per AI Accelerator (kW): 0.156

With 500,000 AI accelerators and an electricity cost of $0.10 per kWh, the annual power costs amounted to $68,328,000.

Impact: Integration of proteanTecs resulted in a 12% power savings per AI accelerator, potentially translating to an $8,199,360 annual cost reduction for the data center.

Inference speed surge: In terms of inference, inferences per second (Inf/s) can see a remarkable increase of 12.3 billion, attributable to the power reduction.
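
The figures above follow directly from the stated assumptions for each device class (per-device power draw, 60% utilization, a PUE of 1.3, 500,000 devices, $0.10 per kWh, and the measured savings percentages). A minimal sketch of the model [1] that reproduces the quoted annual costs and savings:

```python
# Reproducing the annual cost and savings figures quoted above. Per-device power,
# utilization, PUE, fleet size, electricity rate, and savings percentages are the
# ones stated in the text; 8,760 is simply the number of hours in a year.

HOURS_PER_YEAR = 8_760
ELECTRICITY_RATE = 0.10      # dollars per kWh
DEVICE_COUNT = 500_000
UTILIZATION = 0.60
PUE = 1.3

devices = {
    # name: (power draw in kW, fractional power savings with proteanTecs)
    "CPU":            (0.15, 0.11),
    "GPU":            (0.35, 0.10),
    "AI accelerator": (0.20, 0.12),
}

for name, (power_kw, savings) in devices.items():
    total_power_kw = power_kw * UTILIZATION * PUE      # e.g. 0.117 kW per CPU
    annual_cost = DEVICE_COUNT * total_power_kw * HOURS_PER_YEAR * ELECTRICITY_RATE
    annual_savings = annual_cost * savings
    print(f"{name:15s} baseline: ${annual_cost:,.0f}/yr   savings: ${annual_savings:,.0f}/yr")
```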

The case study underscores the transformative impact of per-device power reduction across data center electronics, including CPUs, GPUs, and AI accelerators. Beyond the substantial cost savings, the technology unlocks increased computational capabilities, demonstrating a paradigm shift in data center efficiency.

From OPEX to CAPEX

Reducing energy consumption per device not only delivers immediate power and performance gains, but also opens the door to a myriad of additional benefits, creating a ripple effect throughout the infrastructure. One notable advantage is the extension of the system’s lifetime. By optimizing power usage, the wear and tear on hardware components is mitigated, leading to increased longevity and a higher mean-time-to-failure (MTTF). This, in turn, contributes to lower maintenance costs and a reduction in capital expenditures (CAPEX), as the need for hardware replacements is deferred. In today’s data center landscape, optimizing CAPEX is a paramount strategy. Organizations aim to transition from conventional 3-4 year hardware replacement cycles to more sustainable 5-6 year cycles, maximizing the useful life of their hardware and reinforcing the long-term viability of their infrastructure.
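
As a rough illustration of that CAPEX effect, annualizing fleet acquisition cost over a longer replacement cycle looks like this; the per-server price and fleet size below are assumptions for illustration, not figures from this post:

```python
# Illustrative annualized-CAPEX comparison for longer replacement cycles. The
# per-server price and fleet size are assumptions, not figures from this post.

fleet_size = 500_000
cost_per_server = 10_000     # assumed acquisition cost per server, in dollars

def annualized_capex(cycle_years: float) -> float:
    """Fleet acquisition cost spread evenly over its replacement cycle."""
    return fleet_size * cost_per_server / cycle_years

for years in (4, 6):
    print(f"{years}-year cycle: ${annualized_capex(years):,.0f} per year")

print(f"Annual CAPEX deferred: ${annualized_capex(4) - annualized_capex(6):,.0f}")
```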

Conclusion

Part of the proteanTecs power reduction solution, AVS Pro represents a significant advancement in power management technology, addressing the limitations of existing methods. With real-time monitoring of timing margins and application-specific workloads, it offers a comprehensive solution for optimizing power consumption while maintaining reliability. The benefits include reduced dynamic power, protection against timing failures, and the ability to adapt to changing conditions over the lifetime of the device. Integrating proteanTecs AVS Pro is a strategic move toward achieving superior power efficiency and performance in the face of evolving challenges in data center power consumption.

The journey toward power optimization in data centers represents a pivotal step in shaping a more sustainable and efficient future for the information technology landscape. Despite the challenges and limitations, the strides made exemplify the potential for significant improvements.

proteanTecs AVS Pro is proven and in use in ICs from leading chip makers, hyperscale cloud vendors and mobile brands. The solution is embedded in advanced process nodes down to 3nm.

[1] The model provides estimates based on assumptions. Actual results may vary depending on specific hardware configurations, workloads, usage patterns, and data center environments.


