SPONSOR BLOG

For AI Hardware, Power Optimization Starts With Software And Ends At Silicon

The new era of artificial intelligence hardware calls for efficient software-to-silicon power analysis and optimization.

September 4th, 2020 - By: Solaiman Rahim

Artificial intelligence (AI) processing hardware has emerged as a critical piece of today’s tech innovation. AI hardware architectures are highly regular, with large arrays of up to thousands of processing elements (tiles), leading to designs with more than a billion gates and very high power consumption. For example, the Tesla auto-pilot software stack consumes 72W of power, while the neural network accelerator consumes 12W (Source: The Verge). A recent study from Stanford has shown that building and training a complex neural network can produce up to 78,000 pounds of carbon emissions (the equivalent of flying 60 passengers from San Francisco to New York). Designing AI hardware for efficient energy consumption has become critical, not only to reduce the cost of running server farms and to improve battery life, but also to preserve our planet.

Optimizing AI power requires a comprehensive approach, which includes 1) analyzing software and hardware together with the goal of optimizing both, 2) defining the best possible architecture and power management, 3) obtaining early total and glitch power at the RTL stage to identify the best micro-architectures, 4) making power a cost function during implementation, and 5) performing efficient power and signal integrity signoff.

1. System-level power analysis, or how to define the best architecture for AI hardware

System-level analysis is key to identifying the architecture that delivers maximum performance at the lowest power. Running an algorithm on AI hardware generates intense tile-to-tile traffic and a huge amount of synchronous switching activity, so it is critical to analyze the execution of the software application on a model of the hardware and to define a software and hardware architecture that spreads the switching activity. Techniques include clock spreading, distributing memory accesses over time, developing better dynamic voltage and frequency scaling (DVFS), improving power shutdown schemes, and optimizing power management strategies.

Example: Power vs. performance vs. energy trade-off analysis
Source: Synopsys
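As a rough illustration of the kind of trade-off such an analysis explores, the sketch below applies the textbook CMOS dynamic power relation P = C·V²·f to three DVFS operating points. The operating points, the effective capacitance, and the workload size are hypothetical values chosen for readability, not data from any real chip, and leakage power is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatingPoint:
    name: str
    voltage_v: float  # supply voltage in volts
    freq_hz: float    # clock frequency in hertz


def dynamic_power_w(op, c_eff_f):
    """Textbook dynamic power: P = C_eff * V^2 * f (leakage ignored)."""
    return c_eff_f * op.voltage_v ** 2 * op.freq_hz


def evaluate(op, c_eff_f, workload_cycles):
    """Return (runtime_s, power_w, energy_j) for a fixed-cycle workload."""
    runtime_s = workload_cycles / op.freq_hz
    power_w = dynamic_power_w(op, c_eff_f)
    return runtime_s, power_w, power_w * runtime_s


# Hypothetical numbers, for illustration only.
C_EFF_F = 2e-9                   # effective switched capacitance (farads)
WORKLOAD_CYCLES = 1_000_000_000  # cycles needed to finish the workload
POINTS = [
    OperatingPoint("low", 0.6, 0.5e9),
    OperatingPoint("nominal", 0.8, 1.0e9),
    OperatingPoint("turbo", 1.0, 1.5e9),
]


def tradeoff_table():
    """Map each operating point name to (runtime_s, power_w, energy_j)."""
    return {op.name: evaluate(op, C_EFF_F, WORKLOAD_CYCLES) for op in POINTS}


if __name__ == "__main__":
    for name, (t, p, e) in tradeoff_table().items():
        print(f"{name:8s} runtime={t:.3f}s power={p:.3f}W energy={e:.3f}J")
```

Under this simplified model, the energy for a fixed number of cycles depends only on V², so the slowest point uses the least energy while the fastest point draws the most power. A real system-level analysis would also account for leakage, memory traffic, and deadlines.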

2. Power profiling of software and hardware using emulation

Another way to analyze the power of a tile in the context of the full chip and software is to use emulation. Emulation lets the user run power analysis on the real workload (up to billions of cycles) and identify windows of interest for di/dt, peak power, or average power analysis. Because of the large number of multiply-accumulate (MAC) operations per cycle, identifying these windows is critical for IR drop and peak power analysis. Emulation quickly produces a power profile of the workload and provides feedback to both software and hardware engineers. For example, it can reveal power wasted during tile-to-tile operation that the software could eliminate by turning off clocks through hierarchical clock gating.
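To show what "identifying windows of interest" can mean in practice, here is a minimal sketch that scans a per-cycle power trace for the window with the highest average power and for the largest cycle-to-cycle jump, used here as a crude stand-in for di/dt. The trace values and function names are hypothetical. A real emulation flow would export far longer profiles in a tool-specific format.

```python
def window_averages(power_w, window):
    """Average power of every length-`window` slice, using a running sum."""
    if window <= 0 or window > len(power_w):
        raise ValueError("window must be between 1 and len(power_w)")
    running = sum(power_w[:window])
    averages = [running / window]
    for i in range(window, len(power_w)):
        running += power_w[i] - power_w[i - window]
        averages.append(running / window)
    return averages


def peak_window(power_w, window):
    """Return (start_index, average_power) of the highest-power window."""
    averages = window_averages(power_w, window)
    start = max(range(len(averages)), key=averages.__getitem__)
    return start, averages[start]


def max_power_step(power_w):
    """Return (index, delta) of the largest cycle-to-cycle power increase.

    At a fixed supply voltage this is a rough proxy for di/dt.
    """
    if len(power_w) < 2:
        raise ValueError("need at least two samples")
    deltas = [b - a for a, b in zip(power_w, power_w[1:])]
    idx = max(range(len(deltas)), key=deltas.__getitem__)
    return idx + 1, deltas[idx]


# Hypothetical per-cycle power samples in watts.
TRACE_W = [1.0, 1.2, 1.1, 4.0, 4.5, 4.2, 1.3, 1.0, 0.9, 1.1]

if __name__ == "__main__":
    print("peak window:", peak_window(TRACE_W, 3))
    print("largest step:", max_power_step(TRACE_W))
```

The windows found this way are the ones worth handing to more detailed, and much slower, IR drop or peak power analysis.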

3. Early power analysis and optimization at RTL

Because AI hardware is symmetric and highly replicated, it is very important to identify the best possible micro-architecture, clock gating, memory gating, and data gating for the tile at the RTL stage. Reducing power in a highly replicated tile leads to large energy savings at the chip level. This is enabled by physically aware RTL power analysis, which provides early but accurate power estimates (typically within 10% of signoff). RTL power analysis in turn enables fast what-if analysis to identify the best micro-architecture and provides guidance on how to improve clock gating efficiency and memory access rate. Additional data gating at this stage can yield up to 25% power savings for an AI processing tile.
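The arithmetic behind these claims can be sketched in a few lines. The example below uses one common definition of clock gating efficiency (the fraction of register clock cycles that are gated off) and then scales a per-tile saving up to the chip. The register activity, tile count, and power numbers are hypothetical. Only the 25% tile saving comes from the text above.

```python
def clock_gating_efficiency(enabled_cycles_per_reg, total_cycles):
    """Fraction of register clock cycles that are gated off.

    enabled_cycles_per_reg: for each register, the number of cycles in
    which its clock was actually delivered (enable active).
    """
    if total_cycles <= 0 or not enabled_cycles_per_reg:
        raise ValueError("need at least one register and one cycle")
    gated = sum(total_cycles - en for en in enabled_cycles_per_reg)
    return gated / (total_cycles * len(enabled_cycles_per_reg))


def chip_level_saving(tile_power_w, num_tiles, other_power_w, tile_saving):
    """Fraction of total chip power saved when every tile saves `tile_saving`."""
    tiles_w = tile_power_w * num_tiles
    return tiles_w * tile_saving / (tiles_w + other_power_w)


if __name__ == "__main__":
    # Hypothetical: four registers observed over 1000 cycles.
    cge = clock_gating_efficiency([1000, 250, 250, 500], 1000)
    print(f"clock gating efficiency: {cge:.0%}")

    # Hypothetical chip: 1000 tiles at 50 mW each plus 12.5 W of other logic.
    saving = chip_level_saving(0.05, 1000, 12.5, 0.25)
    print(f"chip-level saving from a 25% tile saving: {saving:.0%}")
```

Because the tile is replicated so many times, the chip-level saving approaches the tile-level saving as the tiles' share of total power grows.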

4. Glitch power – A significant concern for AI-style designs

Due to the huge number of operations performed when an AI algorithm runs on hardware, glitch power has become a major contributor to power consumption. It can represent up to 40% of total power. Typically, glitch power is computed very late in the flow, once gate-level simulation with timing delays is available. That is too late to change the micro-architecture, to account for glitch power in the power cost function during implementation, or to perform targeted ECOs to reduce it.

Percentage of glitch power vs total power for different designs
Source: Synopsys
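To make the notion of a glitch concrete: within one clock cycle a net needs at most one transition to reach its new value, and any extra transitions are glitches that burn power without doing useful work. The sketch below separates the two for a single net, assuming a hypothetical input format in which each cycle is a list of sampled values starting at the previously settled value and ending at the newly settled one. It counts toggles only. A real power estimate would also weight each toggle by the capacitance it switches.

```python
def count_transitions(samples):
    """Number of value changes between consecutive samples."""
    return sum(1 for a, b in zip(samples, samples[1:]) if a != b)


def split_toggles(cycle_samples):
    """Return (functional, glitch) toggle counts for one net.

    cycle_samples: one non-empty list of sampled values per clock cycle,
    from the previously settled value to the newly settled value.
    """
    functional = glitch = 0
    for samples in cycle_samples:
        total = count_transitions(samples)
        # At most one transition per cycle is needed to reach the new value.
        useful = 1 if samples[0] != samples[-1] else 0
        functional += useful
        glitch += total - useful
    return functional, glitch


def glitch_ratio(cycle_samples):
    """Fraction of all toggles on the net that are glitches."""
    functional, glitch = split_toggles(cycle_samples)
    total = functional + glitch
    return glitch / total if total else 0.0


# Hypothetical samples of one net over four clock cycles.
NET = [
    [0, 1, 0, 1],  # settles 0 -> 1 with two extra transitions
    [1, 1, 1, 1],  # stable
    [1, 0, 1, 1],  # returns to 1: both transitions are glitches
    [1, 0, 0, 0],  # clean 1 -> 0 transition
]

if __name__ == "__main__":
    print("functional, glitch toggles:", split_toggles(NET))
    print(f"glitch ratio: {glitch_ratio(NET):.2f}")
```

Deep combinational logic with unbalanced path delays, such as wide multiplier and adder trees, is where this kind of extra toggling tends to accumulate.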

Newer approaches can estimate glitch power accurately from RTL or zero-delay simulation. They make it possible to estimate glitch power within 5 percent of signoff very early in the flow, which drives better design decisions during RTL development, better power costing during implementation and ECO, and a drastic reduction in glitch power.

Early glitch estimator combinational power results within 5 percent of GLS
Source: Synopsys

5. Final chip-level power signoff

The last step is to sign off on power and IR drop. The main challenge is the size of the design and the number of cycles to analyze. This can be addressed by massively parallelizing the analysis workloads across whatever on-premises and cloud resources are available. Chip-level signoff analysis can be sped up further by reusing tile-based power analysis. For IR drop analysis, vectorless techniques can be used to generate vectors that achieve the maximum instantaneous peak power or maximum IR drop.
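The parallelization idea can be sketched as a split, analyze, and merge pattern: cut the cycle range into independent windows, analyze each window separately, and combine the results. In the sketch below a thread pool stands in for distributed workers, and `analyze_window` is a hypothetical placeholder for what would be a full signoff analysis job. The trace values are invented.

```python
from concurrent.futures import ThreadPoolExecutor


def split_windows(num_cycles, window):
    """Cut [0, num_cycles) into consecutive (start, end) windows."""
    if window <= 0:
        raise ValueError("window must be positive")
    return [(s, min(s + window, num_cycles)) for s in range(0, num_cycles, window)]


def analyze_window(trace_w, bounds):
    """Placeholder per-window analysis: peak and summed power of the slice."""
    start, end = bounds
    chunk = trace_w[start:end]
    return {"start": start, "end": end, "peak_w": max(chunk), "sum_w": sum(chunk)}


def signoff_summary(trace_w, window, workers=4):
    """Analyze windows concurrently and merge into (peak_w, average_w)."""
    if not trace_w:
        raise ValueError("trace must not be empty")
    windows = split_windows(len(trace_w), window)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda b: analyze_window(trace_w, b), windows))
    peak_w = max(r["peak_w"] for r in results)
    average_w = sum(r["sum_w"] for r in results) / len(trace_w)
    return peak_w, average_w


if __name__ == "__main__":
    # Hypothetical per-cycle chip power in watts.
    trace = [1.0, 2.0, 3.0, 10.0, 2.0, 1.0, 1.0]
    print(signoff_summary(trace, window=3))
```

The merge step works because peak and average power compose cleanly across windows. Analyses with state that carries across window boundaries would need overlapping windows or a warm-up period.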

Conclusion

Powering modern and future AI hardware must start with understanding the software. A comprehensive design solution for AI power establishes an intrinsic connection with the micro-architecture early in the design process and provides the framework to follow through to design completion and final signoff, minimizing the risk of late-stage surprises.

Solaiman Rahim

Solaiman Rahim is group director for R&D in Synopsys' Design Group.

