Advances in Power Management for Physical IP in 28nm and FinFET Process Nodes

Advanced nodes bring new challenges to IP design, but there’s much already available to reduce a design’s overall power consumption.


Engineering techniques that reduce power consumption by lowering the supply voltage and slowing the clock have reached the practical limits of current semiconductor technologies. Newer solutions, which not only reduce power but also actively manage it during the course of SoC (system-on-chip) activity, are emerging. This article describes these innovations from the foundation intellectual property (IP) level, including logic libraries and embedded memories, through more complex physical IP such as high-speed memory interfaces (DDR) and USB.

Up to the most recent planar processes, leakage was the IP designer's biggest concern, while dynamic power declined steadily with each node. With the adoption of finFET processes, designers see a large (~50%) leakage reduction. So while they still focus on leakage reduction (using higher-VT devices and longer channel lengths), they now focus on reducing dynamic power as well. The best way to reduce dynamic power (think CV²F) is to reduce the operating voltage when the highest speeds are not necessary, which is generating a lot of interest in low-voltage operation.
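To make the CV²F relationship concrete, here is a minimal sketch (the capacitance, voltage, and frequency values are illustrative only, not from any specific process) of how a modest supply reduction compounds into a larger dynamic-power saving:

```python
def dynamic_power(capacitance_f, voltage_v, freq_hz, activity=1.0):
    """Dynamic (switching) power: P = activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz

# Illustrative values only: 1 nF of switched capacitance at 1 GHz.
p_nominal = dynamic_power(1e-9, 0.90, 1e9)  # full-speed operating point
p_reduced = dynamic_power(1e-9, 0.72, 1e9)  # supply lowered by 20%

# Because power scales with V^2, a 20% voltage drop saves 36%.
savings = 1 - p_reduced / p_nominal
```

The quadratic voltage term is why voltage scaling beats frequency scaling alone: lowering the clock saves power only linearly, while lowering the supply saves it quadratically.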

Multicore processing is commonplace in mobile and enterprise applications, and combined with the use of graphics processing units (GPUs), SoC power budgets are increasing significantly. Power optimization is therefore often the most important constraint: the design challenge becomes getting the best possible performance within the available power budget. The 28nm and finFET technologies bring a new set of challenges for logic library and memory compiler IP design, as well as for DDR and USB interface design.

Logic Libraries and Memory Compilers: Taking Advantage of Multiple VTs and Channel Lengths for Power Optimization
At 28nm and finFET technologies, multiple VTs and multiple transistor gate lengths are available at the same gate pitch. This process feature enables multi-channel-length libraries without the area penalty of designing to the worst-case channel length to achieve footprint compatibility. These interchangeable libraries facilitate late-stage leakage recovery by automatic place-and-route tools and allow very fine granularity in power optimization. Additional VT cells (ultra-high VT, ultra-low VT) provide even more granularity, but at increased cost due to wafer add-ons.
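The late-stage leakage recovery described above can be sketched as a greedy per-cell swap on paths with positive slack. The variant names and leakage/delay figures below are hypothetical placeholders, not data from any real library:

```python
# Hypothetical per-variant (leakage, delay) figures in arbitrary units.
# Footprint-compatible variants let the P&R tool swap them in place.
VARIANTS = {
    "ulvt": (10.0, 1.0),   # ultra-low VT: fastest, leakiest
    "lvt":  (4.0, 1.2),
    "svt":  (1.5, 1.5),
    "hvt":  (0.5, 2.0),    # high VT: slowest, lowest leakage
}

def recover_leakage(cell_delay_budget):
    """Pick the lowest-leakage variant whose delay fits the slack budget."""
    feasible = [(leak, delay, name) for name, (leak, delay) in VARIANTS.items()
                if delay <= cell_delay_budget]
    # min() sorts by leakage first; fall back to the fastest cell if
    # even it cannot meet the budget (the path is simply critical).
    return min(feasible)[2] if feasible else "ulvt"
```

A real tool sweeps every cell on every path this way after routing, which is why footprint compatibility (no re-placement needed) matters so much.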

Multi-bit Flip-Flops
Using multi-bit flip-flops is an effective method to reduce clock power consumption. Multi-bit flip-flops can significantly reduce the number of individual loads on the clock tree, reducing the overall dynamic power used in the clock tree. Area and leakage power savings can also be achieved by sharing a single clock-inverter structure among the flip-flops in the cell.

Figure 1: Combining two single-bit flops into a dual flop with shared clocking is an effective method to reduce power consumption.

Multi-bit flip-flops provide a set of additional flops that have been optimized for power and area with a minor tradeoff in performance and placement flexibility. The flops share a common clock pin, which decreases the overall clock loading of the N flops in the multi-bit flop cell, reduces area with a corresponding reduction in leakage, and reduces dynamic power on the clock tree significantly (up to 50% for a dual flop, more for quad or octal).

Multi-bit flip-flops are typically used in blocks that are not in the critical path at the highest chip operating frequency. They range from small, bus-oriented registers of SoC configuration data that are clocked only at power-up, to major datapaths that are clocked every cycle, with a number of variants in between. SoC designers use the replacement ratio (the fraction of standard flops in the design that can be replaced by their multi-bit equivalents) and the resulting PPA improvements to determine their overall chip power and area savings. The single-bit flip-flops to be replaced with multi-bit flip-flops must have the same function (clock edge, set/reset, and scan configuration).
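The replacement-ratio math can be estimated with a back-of-the-envelope model. This is a sketch with assumed parameters; the 50% per-bit clock-load saving for a dual flop is the figure cited above:

```python
def clock_load_reduction(total_flops, replacement_ratio,
                         shared_clock_saving=0.5):
    """Estimate the relative clock-tree load after multi-bit banking.

    replacement_ratio: fraction of single-bit flops merged into
    multi-bit cells. shared_clock_saving: per-bit clock-load saving
    inside a multi-bit cell (~0.5 for a dual flop, higher for quad/octal).
    Returns the remaining clock load as a fraction of the original.
    """
    replaced = total_flops * replacement_ratio
    kept = total_flops - replaced
    # Each replaced bit presents (1 - saving) of a single flop's clock load.
    return (kept + replaced * (1 - shared_clock_saving)) / total_flops
```

For example, merging 80% of the flops into dual flops leaves about 60% of the original clock-tree load, which is why designers track the replacement ratio so closely.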

Using the aforementioned techniques, a new High Performance Cores (HPC) design kit was created to optimize processors such as CPUs, GPUs and DSPs. The HPC Design Kit offers a single package of enhanced memories and standard cells that enables optimal implementation across power, area and speed. Using the HPC Design Kit can yield the results shown in Figure 2.

Figure 2: Optimizing GPUs, CPUs and DSPs is critical to managing power. The DesignWare HPC Design Kit offers power, area and speed improvements for leading processors.

Unlike logic libraries, a memory's lowest operating voltage is limited by bitcell performance. Assist circuitry techniques that enable bitcells to function reliably at low voltages are now being implemented. At 10nm finFET nodes, operation near the retention voltage is an area of active research.

DDR Roadmap Reflects Power Requirements for Enterprise, Mobile & Consumer Applications
DDR interfaces are commonplace in SoCs targeting enterprise, mobile and consumer applications. Enterprise applications favor the “PC” style of SDRAM, which is DDR3 and DDR4. Mobile applications favor the “LP” or “Low Power” style of SDRAM, which is LPDDR2 or LPDDR3 today with a roadmap to LPDDR4 in the near future. Consumer applications may use either type of SDRAM but they tend to favor the cheapest, which today is the PC SDRAM.

Enterprise applications typically require a large memory subsystem consisting of wide channel interfaces supporting a large number of ranks distributed across multiple DIMMs. These large memory subsystems require terminated interfaces [implemented via on-die termination (ODT) at each receiver] to obtain sufficient signal integrity. The DDR PHY on the host SoC needs to be very efficient at enabling and disabling the ODT, not only in the PHY but also in the SDRAM: disabling termination whenever it is not actually in use is critical to conserving power. The DDR PHY can go one step further and disable other circuitry when it is not reading or writing. DDR4 also offers a data bus inversion feature that can save up to 25% of the interface power, but unfortunately this feature is not available in the 4-bit-wide version of the DDR4 SDRAM favored in enterprise applications.
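The idea behind data bus inversion can be sketched for one 8-bit lane as follows. This is a simplified model of DDR4's DC-balance DBI; the exact behavior is defined by the JEDEC DDR4 specification:

```python
def dbi_dc_encode(byte):
    """Simplified DDR4-style DC data bus inversion for one 8-bit lane.

    On a bus terminated high, every line driven low burns termination
    power. If more than four of the eight bits would drive low, invert
    the byte and assert the DBI flag, so at most four lines draw
    termination current per beat. Returns (encoded_byte, dbi_flag).
    """
    zeros = 8 - bin(byte & 0xFF).count("1")
    if zeros > 4:
        return (~byte) & 0xFF, True
    return byte & 0xFF, False
```

The receiver simply re-inverts flagged bytes, so the saving costs one extra signal (DBI_n) per lane, which is exactly why the 4-bit-wide DDR4 parts omit it.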

Mobile applications typically require a single large-capacity Low Power DDR SDRAM, often connected to the host SoC (e.g., an applications processor) using Package-on-Package (PoP) technology. These systems are most often point-to-point with very short transmission lines. At lower data rates, a well-designed PHY can forgo termination to save considerable power. The DDR PHY in the SoC should be capable of training for two operating frequencies: one for full-speed operation (e.g., 3D games) and one for periods when only low-performance applications are running (e.g., music playback). This makes the battery drain proportional to the task at hand, so the charge lasts longer when the device is not running high-performance applications. LPDDR4 also offers data bus inversion, which can be employed for significant power savings.
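The two-operating-point scheme can be sketched as a simple selection policy. The operating-point names and rates below are hypothetical examples, loosely based on the LPDDR4 speeds mentioned in this article:

```python
# Hypothetical pre-trained operating points: name -> data rate in Mbps.
# Both points are trained at boot so switching needs no retraining.
OPERATING_POINTS = {
    "full_speed": 3200,   # e.g., 3D games
    "low_power":  800,    # e.g., music playback
}

def select_ddr_point(required_bandwidth_mbps):
    """Pick the slowest trained DDR operating point that meets demand,
    so interface power tracks the workload rather than the peak spec."""
    for name, rate in sorted(OPERATING_POINTS.items(),
                             key=lambda kv: kv[1]):
        if rate >= required_bandwidth_mbps:
            return name
    return "full_speed"   # demand exceeds all points: run flat out
```

Because both frequencies are trained ahead of time, the switch can be made quickly at runtime without the long retraining sequence a cold frequency change would require.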

Consumer applications can also benefit from lower power in the DDR interface. While the DRAM array is much smaller (e.g., 2-4 SDRAMs) than it is in an enterprise application and the devices are mains powered, lower power allows less expensive chip packaging to be used and can lead to less localized heating of the PCB. With DDR4, consumer applications typically use 8-bit or 16-bit wide SDRAMs, which brings data bus inversion and its power savings back into play.

Another complication that all applications encounter is the I/O transistor available in the logic process used for the SoC. For example, a 28nm process may use a 0.9V core supply but the I/O interface must use 1.8V transistors to support the DDR interfaces that operate between 1.5V (DDR3) and 1.1V (LPDDR4). The DDR PHY I/O must use receivers and drivers built with the 1.8V devices that are under-driven with the supply of the DDR protocol being used. For LPDDR4, that means a 1.1V nominal interface running up to 3200Mbps and using transistors designed to operate at 1.8V. The I/O must be carefully designed to operate at such high speeds without duty cycle distortion. The easiest method to use is to run the I/O receivers off of a higher voltage (e.g., 1.8V) than the interface requires to create operating headroom. This of course introduces more power into the interface. More complex receiver designs can circumvent this requirement to keep the power in check.

The final method to save DDR interface power sounds too simple but is often overlooked: use a memory controller capable of scheduling DRAM commands in the most efficient manner possible. No DDR interface achieves its maximum theoretical bandwidth (data rate × channel width), because the scheduling of commands from various clients on the SoC to the DRAM banks results in command-to-command blockages that eat away at the interface bandwidth. Re-ordering the commands to minimize these conflicts yields higher effective bandwidth, which may allow a lower operating frequency on the interface to save power. Benchmarking DDR controllers against typical traffic patterns can help find the optimum solution.
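As an illustration of such re-ordering, here is a minimal "row-hit first" scheduling policy, a simplified cousin of the first-ready, first-come-first-served (FR-FCFS) policies real controllers use. The data structures are hypothetical:

```python
def schedule_next(queue, open_rows):
    """Pick the oldest queued command that hits an already-open row
    (a 'row hit' needs no precharge/activate, avoiding a blockage);
    otherwise fall back to the oldest command. Commands are
    (bank, row) tuples; open_rows maps bank -> currently open row."""
    for i, (bank, row) in enumerate(queue):
        if open_rows.get(bank) == row:   # row hit: issue out of order
            return queue.pop(i)
    bank, row = queue.pop(0)             # row miss: oldest command wins
    open_rows[bank] = row                # opening the new row
    return (bank, row)
```

By promoting row hits ahead of older row misses, the controller spends fewer cycles on precharge/activate overhead, raising effective bandwidth for the same interface frequency.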

Save Power by Unplugging the USB
For many portable products, power consumption when USB is not in use is more important than USB's active power. Although USB is ubiquitous, it is used sparingly: syncing a mobile phone, for example, takes only a few minutes per day, and even 15 minutes of sync time is just 1% of the phone's power-on time. Power-optimized USB PHYs allow the power supplies to be collapsed or switched off when the PHY is not in use, ensuring there are no leakage current paths between the different PHY power domains, the USB controller and the SoC core.

Adding power switches to the PHY completely eliminates any leakage power. Power switches must be designed for each power supply to ensure the PHY is fully operational over PVT. Integrated power switches make it easier to integrate the PHY in an SoC and minimize silicon area.

DesignWare USB femtoPHYs and picoPHYs use innovative low power circuit designs to minimize active power consumption. The USB picoPHY and femtoPHY product lines use multiple voltage device domains to minimize total power consumption while meeting the USB electrical specifications such as 3.3V Full Speed and Low Speed operation and protecting the PHY during a 5V short event over the D+ and D- signal lines. For example, the USB 2.0 femtoPHY uses slightly more than 18mA current from the analog power supplies when transmitting High Speed USB signals with maximum transition density. The USB specification defines High Speed USB signaling as 17.8mA current into a 45Ω load.
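As a rough sanity check on those figures (simple Ohm's-law arithmetic, not a PHY power model), the power the specified signaling current dissipates in the load can be computed directly:

```python
def hs_drive_power_mw(current_ma=17.8, load_ohms=45):
    """Power dissipated in the termination load: P = I^2 * R,
    using the High Speed USB signaling values quoted above."""
    i_amps = current_ma / 1000.0
    return i_amps ** 2 * load_ohms * 1000.0  # result in mW
```

The 17.8mA specified drive current into 45Ω dissipates roughly 14.3mW in the load, which suggests the ~18mA analog supply current quoted above is dominated by the mandatory signaling current itself, leaving little overhead for the PHY's own circuits.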

The lowest possible USB "suspend" power consumption is critical for many USB products, such as wireless LTE modems used in USB dongles, PCIe Minicards and M.2 cards. When suspended, the modem can be in clockless sleep and the PHY in a low-power mode. Even lower PHY power can be achieved by implementing a PHY state-retention mode with reduced core voltage. The Synopsys USB controller can also be hibernated in suspend; when exiting suspend, the controller state is restored from always-on memory. This feature, combined with Synopsys' low-power PHY modes, provides very low USB system power consumption in suspend. Implementing system software control and side-band signaling allows the PHY and controller to be completely powered off along with most of the modem when in suspend; this mode of operation is often described as PowerLess Sleep.

Summary
This article described some of the new power management techniques that require innovative design at the physical IP level. While 28nm and finFET technologies bring a new set of challenges for IP design, SoC designers can take advantage of IP vendors' innovations in logic libraries, memory compilers and interface IP to reduce a design's overall power consumption.


