RTL Design-for-Power In Mobile SoCs

Power is a concern for hardware and software, and needs to be addressed throughout a design.

If you are one of the more than 2 billion smartphone users today, it is hard to imagine life without one! Breaking new frontiers, wearable smart devices and the Internet of Things are the latest buzz. Mobile system-on-chips (SoCs) continue to clock faster and pack in more functionality, yet they must consume less power to meet battery life and thermal constraints.

Power consumption is a key mobile concern from both a hardware and a software standpoint, and it is addressed during all aspects of the design. This discussion focuses on register transfer level (RTL) best practices for low-power semiconductor design, including early power budgeting, reduction, debug, and regressions. It explains the design-for-power approach at RTL, a shift from the traditional gate-level approach to power.

Mobile momentum to power gap
Mobile devices and applications have revolutionized communication and entertainment, bringing information literally to our fingertips. Social networking site Facebook recently reported that 751M of its 1.1B monthly active users in Q1 2013 used a mobile device. Statistics from a Groupe Speciale Mobile Association (GSMA) study underscore the mobile momentum (http://www.gsmamobileeconomy.com). It reports that mobile data traffic in 2012 was higher than in all previous years combined. The study estimates the mobile ecosystem’s contribution to the gross domestic product (GDP) from 2013 to 2017 to be more than $10 trillion. Semiconductor spending also reflects the mobile momentum, as shown in the iSuppli chart in Figure 1. The wireless communication market net spending growth forecast for 2013 is 13.5%, up from $62.3 billion to $69.6 billion, and double that of the other market segments.

Figure 1. 2013 Semiconductor Net Spending Growth by Application Market Forecast.

The increasing feature convergence and performance are great, but how useful is your mobile device if the battery won’t last a whole day? Mobile processors are clocking at more than 3 GHz and packing more cores to enable multi-tasking, instant access to data, and high-performance gaming, all of which demand more power. The difference between the power a device needs and what the supply can deliver is termed the Power Gap. Managing the Power Gap is crucial for mobile devices, which are limited by battery life.

Figure 2. Power Gap Illustration

RTL versus gate-level power
Semiconductor design teams address power in multiple ways throughout the design flow. Doing so requires technology, tools, and methodology. The earlier you address power in the design flow, the greater the benefit. Once the design is implemented, the architecture is largely determined, leaving room for only incremental changes. Managing power at RTL has become increasingly popular. As a high-level hardware design abstraction, RTL provides the flexibility and performance designers need to make early, high-impact power-related decisions. Because RTL is cycle-accurate, RTL power offers predictable accuracy, empowering designers to make reliable early decisions. A closer look at how RTL compares to the traditional gate-level power methodology highlights the RTL value:

1. Objective: RTL enables power-related design tradeoffs, whereas gate-level is best suited for power signoff. Running RTL and gate-level power analysis on the same design demonstrates how RTL enables functional visibility and debug (Figure 3). In this example, the adder is identified as a power hotspot at RTL. In the RTL view, the designer can explore shutting off the adder by tracing its logic upstream and downstream. In contrast, the same adder is lost among the hundreds of gates after synthesis.

Figure 3. RTL Versus Gate-level Power Analysis

2. Performance: RTL power analysis runs at least an order of magnitude faster than gate-level analysis. For example, a mobile graphics processor team took 22 minutes to run RTL power analysis for a design block that took 20 hours to get to gate-level power numbers, a flow dominated by synthesis and gate-level simulation overhead.
3. Activity: Activity has a first-order impact on power, and power consumption management must adequately represent this. RTL simulations are easy to bring up and can provide wide coverage across multiple modes of operation. Gate-level simulations, by contrast, are harder to bring up and are often available too late, sometimes after tapeout.
4. Accuracy: RTL power is typically within 15% to 20% of post-layout power numbers. It is possible to model synthesis and physical effects adequately at RTL for consistent accuracy versus gates, without compromising the runtime benefit. For instance, clock network and wire capacitance both have a large impact on power, yet very little information about them is available at RTL. However, calibrating and characterizing representative layouts to generate RTL models for these effects can significantly improve RTL power predictability.

RTL design-for-power methodology
RTL power analysis runs fast, enables debug, provides coverage, and is reliable, which makes it well suited for making design decisions and tradeoffs early in the design flow. It is no surprise that power-sensitive mobile design teams are early adopters of the RTL design-for-power methodology. Some of their best practices include:

A. Performing design tradeoffs
RTL power analysis runs within a few minutes for a 1 million-instance design, enabling fast and effective evaluation of multiple architectures for power efficiency. Examples of RTL design tradeoffs include choosing between a parallel and a serial architecture, what-if prototyping of low-power techniques such as power gating, and clock and data gating strategies to eliminate redundant switching.

Figure 4. Design Trade-off Example.

In another example, a mobile RTL designer cut peak power consumption by 70% by re-architecting the design. RTL simulation activity analysis revealed high design activity concentrated in a short duration, and the designer then spread that activity over a longer duration.

B. Profiling simulation vectors
As mentioned, simulation activity analysis at RTL is a useful tool to uncover power ‘bugs’ related to activity. While functional debug tools provide visibility into signal waveforms, RTL power tools augment that by computing cumulative activity across all signal nets. Visualizing the hierarchical instances switching as a whole can identify significant clock and data gating opportunities. Figure 5 shows residual redundant activity otherwise undetected in functional simulations focused on individual signal waveforms.

Figure 5. Residual Redundant Activity

Power is not limited to one magic number today. Average power, cycle average peak power, transient peak power, and sustained worst-case power are all important for different design considerations. RTL activity analysis can be applied to rapidly scan through millions of simulation cycles and zero in on the worst peak power or the worst cycle-to-cycle change in power, for example to select a simulation time window appropriate for power grid design. Figure 6 demonstrates activity analysis leading to an informed selection of the simulation window relevant to the power metric. It also improves turnaround time, because the FSDB waveform files analyzed are no larger than needed.

Figure 6. Identifying the Right Simulation Window
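
The window selection described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular RTL power tool works: it assumes a per-cycle power (or activity) estimate is already available as a plain list, and the function names `worst_average_window` and `worst_cycle_step` are hypothetical.

```python
from typing import List, Tuple


def worst_average_window(power: List[float], width: int) -> Tuple[int, float]:
    """Return (start_cycle, average) of the width-cycle window with the highest average power."""
    if width <= 0 or width > len(power):
        raise ValueError("width must be between 1 and len(power)")
    # Sliding-window sum: add the cycle entering the window, drop the one leaving it.
    window_sum = sum(power[:width])
    best_start, best_sum = 0, window_sum
    for start in range(1, len(power) - width + 1):
        window_sum += power[start + width - 1] - power[start - 1]
        if window_sum > best_sum:
            best_start, best_sum = start, window_sum
    return best_start, best_sum / width


def worst_cycle_step(power: List[float]) -> Tuple[int, float]:
    """Return (cycle, delta) for the largest absolute cycle-to-cycle change in power."""
    if len(power) < 2:
        raise ValueError("need at least two cycles")
    deltas = [abs(power[i] - power[i - 1]) for i in range(1, len(power))]
    worst = max(range(len(deltas)), key=deltas.__getitem__)
    return worst + 1, deltas[worst]


# Hypothetical per-cycle power trace (arbitrary units), assumed for illustration only.
trace = [1.0, 1.0, 5.0, 6.0, 5.0, 1.0, 1.0, 2.0]
print(worst_average_window(trace, 3))
print(worst_cycle_step(trace))
```

The first function targets a sustained-power metric, the second a cycle-to-cycle change metric. Only the cycles around the reported positions would then need detailed analysis.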

C. Checking power versus budget
RTL analysis provides early feedback for the design’s power consumption versus target. The decisions made when a design is 4X over its allocated power budget versus 40% over will be radically different. The earlier this knowledge is available, the more significant the impact will be due to greater flexibility! Some commonly tracked RTL power metrics include the following:

  1. Power by design hierarchy
  2. Power by switching, internal and leakage
  3. Power by category including registers, latches, memories, combinational logic, I/Os, and clock network
  4. Power by clock domain
  5. Power by supply domain
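
As a rough sketch of a budget check over one of these breakdowns (power by design hierarchy), the Python below compares measured power per block against its allocation. The block names, numbers, and the `over_budget` helper are assumptions made for illustration; a real flow would pull these values from the power tool's reports.

```python
from typing import Dict, List, Tuple


def over_budget(measured_mw: Dict[str, float],
                budget_mw: Dict[str, float]) -> List[Tuple[str, float]]:
    """Return (block, measured/budget ratio) for blocks above budget, worst first."""
    report = []
    for block, budget in budget_mw.items():
        ratio = measured_mw.get(block, 0.0) / budget
        if ratio > 1.0:
            report.append((block, ratio))
    return sorted(report, key=lambda item: item[1], reverse=True)


# Hypothetical numbers in milliwatts: one block at 4X its budget, one 40% over, one within budget.
measured = {"gpu": 400.0, "modem": 140.0, "dsp": 45.0}
budget = {"gpu": 100.0, "modem": 100.0, "dsp": 50.0}
for block, ratio in over_budget(measured, budget):
    print(f"{block}: {ratio:.2f}x budget")
```

Sorting by ratio puts the blocks that need architectural attention ahead of those that may only need incremental tuning.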

Another popular metric is clock-gating efficiency (CGE). Designers have adopted two ways of looking at CGE: static CGE computes the percentage of gated flops, while dynamic CGE computes the percentage of gated clock cycles. Static CGE is a structural metric predicting the number of flops that synthesis will clock gate. A low static CGE indicates that synthesis will leave many flops ungated. It is also possible that synthesis inserts a clock gate but still allows clocks to toggle even when data is stable, as shown in Figure 7. Dynamic CGE is a vector-dependent metric computed by a cycle-by-cycle analysis of clock, data, and enables; a low dynamic CGE indicates that clock gate enables are ineffective in shutting off clocks.

Figure 7. Synthesis Clock Toggling with Stable Data
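
The two CGE definitions can be made concrete with a small Python sketch. It takes the definitions literally (percentage of gated flops, percentage of gated flop clock cycles) and assumes a simplified data model, the hypothetical `Flop` record below, rather than the output format of any real tool.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flop:
    name: str
    has_clock_gate: bool        # structural: is a clock gate present upstream?
    clock_enabled: List[bool]   # per cycle: did the clock reach the flop?


def static_cge(flops: List[Flop]) -> float:
    """Percentage of flops that sit behind a clock gate."""
    return 100.0 * sum(f.has_clock_gate for f in flops) / len(flops)


def dynamic_cge(flops: List[Flop]) -> float:
    """Percentage of flop clock cycles in which the clock was actually shut off."""
    total = sum(len(f.clock_enabled) for f in flops)
    gated = sum(f.clock_enabled.count(False) for f in flops)
    return 100.0 * gated / total


# Hypothetical four-cycle simulation of four flops.
flops = [
    Flop("a", True, [True, False, False, False]),
    Flop("b", True, [True, True, True, True]),    # gated, but the enable never shuts the clock off
    Flop("c", True, [True, True, False, False]),
    Flop("d", False, [True, True, True, True]),   # no clock gate at all
]
print(static_cge(flops))   # 75.0
print(dynamic_cge(flops))  # 31.25
```

Flop `b` shows why both numbers matter: it counts toward static CGE, yet contributes nothing to dynamic CGE because its enable is ineffective for this vector.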

Mobile teams extensively use RTL tools to identify such inefficient enables at the block and register level. RTL CGE reports compute enable efficiency per clock gate along with a measure of the power each clock gate controls downstream. One mobile design team was able to reduce idle power by 60% by focusing on inefficient block-level clock gates.

D. Identifying and debugging power hotspots
Interactive power hotspot debug is a critical step in understanding power consumption distribution in the design. A designer spots wasted power, understands what is causing it, and then develops a solution to eliminate it. Typically, interactive debug locates block-level power issues and provides significant power savings.

However, RTL designers may not necessarily be familiar with power, so a visual debug tool that presents power data closely tied to the RTL goes a long way. Even power-savvy engineers inheriting legacy code will find an effective visual debug tool a great enabler. Useful capabilities include a sortable display of power per hierarchy, power-annotated schematics cross-probed to lines of RTL code, the ability to traverse cones of logic to understand the data and activity flow, and quick access to relevant power data.

Figure 8. RTL Power Graphical User Interface

Most mobile teams start their power debug graphically. Power is examined both from an absolute and relative perspective across hierarchies and categories for idle and active modes. Idle mode power is especially important for mobile applications that are not in constant use.

E. Reducing power with RTL automatic techniques
Automated RTL power reduction is an important tool, especially for those new to power. RTL analysis can automatically identify and implement low-power RTL changes that complement synthesis. Adding and improving clock gate enables, ensuring memory accesses are not redundant, and shutting off large cones of logic when not needed are a few examples of combinational and sequential RTL techniques. In addition, an RTL tool must accurately and reliably predict power savings for each identified reduction. This analysis-driven approach enables designers to focus on large opportunities and ensures that power savings do not disappear during the implementation phase.

Figure 9 underscores the pitfalls of blind automation. It plots cumulative power savings for an application processor against RTL reductions. Of the approximately 300 RTL power reduction opportunities identified, the top five are sufficient to realize half of the identified savings. In addition, power savings saturate at around 200 changes with the remaining 100 being ineffective for power and potentially impacting other design parameters such as timing.

Figure 9. Cumulative Power Savings Versus RTL Reductions
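
The shape of such a cumulative-savings curve can be explored with a short Python sketch. The savings values and the `changes_needed` helper are hypothetical and serve only to illustrate ranking opportunities by predicted savings; they are not taken from the application processor data above.

```python
from itertools import accumulate
from typing import List


def changes_needed(savings_mw: List[float], fraction: float) -> int:
    """Smallest number of changes, largest saving first, that reaches
    `fraction` of the total positive savings. Returns 0 if nothing saves power."""
    ranked = sorted((s for s in savings_mw if s > 0), reverse=True)
    target = fraction * sum(ranked)
    for count, running in enumerate(accumulate(ranked), start=1):
        if running >= target:
            return count
    return 0


# Hypothetical predicted savings per RTL change, in milliwatts.
savings = [40, 25, 15, 8, 6, 3, 2, 1, 0, 0]
print(changes_needed(savings, 0.5))  # 2
print(changes_needed(savings, 1.0))  # 8
```

In this made-up data set, two changes deliver half of the savings and the last two deliver none, which is the kind of pattern that argues for analysis-driven selection over applying every suggested change.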

F. Tracking power via regressions
Like functional regressions, power regressions are intended to guard the design against an undue increase in power. Power regressions have been widely adopted, both while the RTL is under development and when it is functionally complete. To realize the value of regressions, consider an example where a power bug slipped in among 100 other RTL changes across several weeks. Regular regressions can efficiently catch the power bug when it is introduced; otherwise, it is lost in a multitude of changes. In another example, the easiest functional fix for an RTL designer under schedule pressure was to constantly enable a block-level clock gate; however, power went up by 40%. Without power regressions, this late-breaking design change would have gone unnoticed. This design team now runs block-level regressions daily and longer chip-level regressions weekly.

Figure 10. Power Regressions

An important requirement for a regression framework is a power database that allows custom queries to search and compare data across design versions. Indeed, custom reports can go a long way in automating power reduction beyond what a tool has to offer.
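
A custom comparison across two design versions might look like the following Python sketch. It assumes per-block power numbers have already been extracted from the power database into dictionaries; the `power_regressions` function, the 5% tolerance, and the block names are illustrative assumptions rather than features of any specific tool.

```python
from typing import Dict, List, Tuple


def power_regressions(baseline_mw: Dict[str, float],
                      candidate_mw: Dict[str, float],
                      tolerance: float = 0.05) -> List[Tuple[str, float]]:
    """Return (block, relative increase) for blocks whose power grew by more
    than `tolerance` between the baseline and candidate design versions."""
    flagged = []
    for block, new in candidate_mw.items():
        old = baseline_mw.get(block)
        if old is None or old <= 0:
            continue  # new block or no usable baseline: nothing to compare against
        increase = (new - old) / old
        if increase > tolerance:
            flagged.append((block, increase))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical nightly results in milliwatts.
baseline = {"video": 100.0, "audio": 20.0, "usb": 10.0}
candidate = {"video": 140.0, "audio": 20.5, "usb": 9.0}
for block, increase in power_regressions(baseline, candidate):
    print(f"{block}: +{increase:.0%}")
```

Run daily at block level, a check like this would flag a change such as a permanently enabled clock gate on the day it lands instead of weeks later.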

“RTL design-for-power is indispensable for mobile SoCs.”
The quote is from a mobile RTL designer whose team employed RTL design-for-power best practices to cut idle power consumption by a third across multiple blocks, with active power down by 5% to 25%.

Power is a key design concern, especially for mobile devices. RTL enables early power budgeting and design decisions, interactive and automated reductions, and a regression framework leading to significant power savings. Fast runtimes enable analysis across a multitude of complex operating modes. So, if you are a smartphone user and also happen to be a mobile SoC RTL designer who can actually make the battery last longer, make sure you have investigated using an RTL design-for-power methodology.

To learn more, check out this Educast: RTL Design-for-Power for Mobile SoCs: Best Practices. This Educast focuses on RTL best practices for low-power mobile semiconductor design including early and reliable power budgeting, reduction, debug, and regressions.