IBM introduces dozens of new features to drop energy consumption in PowerPC chips by up to 50%.
By Pallab Chatterjee
Just how much power can you squeeze out of a processor without destroying performance?
Ask IBM. The company has introduced a new methodology for power and energy management on its multicore processor chips. The new PowerPC chip, the Power7, has eight main processor cores, each with its own L2 and L3 cache, and two central memory controllers. The architecture for the design is built around an energy and power management schema called EnergyScale.
The EnergyScale system is a data-dependent, policy-based system that interprets activity in the processor cores, the memory hierarchy, and the main memory. It is made up of four distinct parts: Sense, Decide, Control, and Actuate. The Sense function is performed using both digital thermal sensors (DTS) and critical-path monitors (CPM). The DTS uses 44 on-chip sense points: five per chiplet, with others dedicated to emergency self-protect thermal throttling and to the main memory controllers. The CPM detects circuit timing margin to help guide the optimal frequency and voltage adjustments.
The Decide block is an off-chip, dedicated-function microcontroller that gets its information on the status of the chip through an EnergyScale I2C slave communication port. To assist the EnergyScale microcontroller, the system minimizes the communications bandwidth in three ways: packing the sensor data to reduce the number of read operations, multicasting the responses to reduce the number of writes, and creating an automated on-chip transaction table that allows the sensor data to be streamed out in a single I2C command.
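The idea of packing sensor data so that one transfer replaces many reads can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each of the 44 sense points yields one unsigned 8-bit reading; the actual Power7 data layout is not described here, and the function names are hypothetical.

```python
import struct

NUM_SENSE_POINTS = 44  # number of DTS sense points cited in the article

# Assumption: one unsigned byte per reading. The real format is not given.
PAYLOAD_FORMAT = f"{NUM_SENSE_POINTS}B"


def pack_readings(readings):
    """Pack all sensor readings into one contiguous payload."""
    if len(readings) != NUM_SENSE_POINTS:
        raise ValueError(f"expected {NUM_SENSE_POINTS} readings")
    return struct.pack(PAYLOAD_FORMAT, *readings)


def unpack_readings(payload):
    """Recover the individual readings on the microcontroller side."""
    return list(struct.unpack(PAYLOAD_FORMAT, payload))
```

With this layout, the microcontroller would issue one read of a 44-byte payload rather than 44 separate single-byte reads, which is the kind of saving the packing step is aiming for.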
The Control block features per-core frequency control ranging from -50% to +10% of the nominal frequency, on-chip support for off-chip voltage control, memory power management, and a command-rate interface control. To minimize latency, the core frequency control has an automated fast frequency slew of more than 50 MHz per microsecond. The voltage control is done through a serial voltage I2C command interface, and is fully automated based on the policies that are defined. The memory management includes power-down modes for the DIMMs as well as reducing the data access rate as needed. As the Power7 chip is a symmetric multiprocessing (SMP) system with SMP-based memory interfaces, the command-rate interface control was built with asynchronous control to be as adaptable as possible while addressing the needs of any core chiplet.
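The frequency range and slew rate lend themselves to a quick back-of-the-envelope calculation. The sketch below clamps a requested frequency to the -50% to +10% window and estimates an upper bound on transition time at 50 MHz per microsecond. The function names are hypothetical, and any nominal frequency passed in is an assumption, since the article does not state one.

```python
MIN_PERCENT = 50    # -50% of nominal, per the article
MAX_PERCENT = 110   # +10% of nominal, per the article
SLEW_MHZ_PER_US = 50.0  # article says "more than" this, so times are upper bounds


def clamp_frequency(requested_mhz, nominal_mhz):
    """Limit a requested core frequency to the allowed window around nominal."""
    low = nominal_mhz * MIN_PERCENT / 100
    high = nominal_mhz * MAX_PERCENT / 100
    return max(low, min(high, requested_mhz))


def worst_case_slew_us(from_mhz, to_mhz):
    """Upper bound on the time to move between two frequencies."""
    return abs(to_mhz - from_mhz) / SLEW_MHZ_PER_US
```

For an assumed nominal frequency of 4,000 MHz, the window would run from 2,000 MHz to 4,400 MHz, and sweeping the full 2,400 MHz range would take at most 48 microseconds at the quoted slew rate.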
The Actuate function uses three different power-down modes besides the normal operating mode. These modes are per-core, and are based on both the level of power reduction and the latency to return to full function. The modes are “Nap,” which targets about 5 microseconds of latency to return to operation and works by turning off the clocks to the execution units; “Sleep,” which has 1 millisecond of turn-on latency and shuts off the clocks while also purging the local caches; and “Heavy Sleep,” which has a 2 millisecond target recovery time. In this mode, all the cores are in “Sleep” mode, the voltage to all the cores and caches is reduced, and the states are loaded into low-voltage retention registers. The exit from Heavy Sleep includes an automated voltage ramp back to full operating voltage as the hardware is automatically initialized. These energy policies are in addition to the per-core frequency scaling and the associated core voltage scaling that goes with the frequency adjustment.
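The trade-off between power savings and wake-up latency can be expressed as a simple selection rule: pick the deepest mode whose recovery time fits within what the workload can tolerate. The following Python sketch uses the latency targets quoted above; the selection policy itself and the function name are assumptions for illustration, not IBM's documented algorithm.

```python
# Wake-up latency targets in microseconds, shallowest mode first.
# "Run" stands in for the normal operating mode.
WAKE_LATENCY_US = [
    ("Run", 0),
    ("Nap", 5),
    ("Sleep", 1000),
    ("Heavy Sleep", 2000),
]


def deepest_mode(latency_budget_us, all_cores_idle=False):
    """Return the deepest mode whose wake-up latency fits the budget.

    Heavy Sleep is only eligible when every core is idle, since the
    article describes it as a state in which all cores are asleep.
    """
    chosen = "Run"
    for name, latency_us in WAKE_LATENCY_US:
        if latency_us > latency_budget_us:
            break
        if name == "Heavy Sleep" and not all_cores_idle:
            continue
        chosen = name
    return chosen
```

A core that must respond within 10 microseconds would be limited to Nap under this rule, while a chip with every core idle and a multi-millisecond budget could drop into Heavy Sleep.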
In addition to direct sensing, the firmware of the off-chip microcontroller can estimate functions based on the incoming data to adjust energy for leakage, temperature, and power supply variation. The last portion of intelligence for the energy-control system is the CPM. Its circuitry dynamically detects margin in circuit timing and eliminates the need for static, conservative guard-banding in the active designs.
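To see why measured timing margin can replace a fixed guard band, consider one step of a feedback loop that trims voltage when the monitor reports surplus margin and restores it when margin runs short. This Python sketch is a hypothetical illustration only: the step size, voltage limits, units, and function name are all assumed, and the article does not describe the actual control law.

```python
def next_voltage_mv(current_mv, measured_margin_ps, target_margin_ps,
                    step_mv=5, floor_mv=800, ceiling_mv=1200):
    """One step of a hypothetical margin-tracking voltage loop.

    All default values are placeholders chosen for illustration.
    """
    if measured_margin_ps > target_margin_ps:
        candidate = current_mv - step_mv   # surplus margin: trim voltage
    elif measured_margin_ps < target_margin_ps:
        candidate = current_mv + step_mv   # too little margin: restore voltage
    else:
        candidate = current_mv             # on target: hold
    return max(floor_mv, min(ceiling_mv, candidate))
```

A static guard band would instead hold the voltage at a level chosen for the worst case, regardless of how much margin a particular chip actually has at a given moment.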
The net result is more than a 50% improvement in power for the individual cores as a system package, achieved through the automated on-chip controls and the signal loop driven by the off-chip microcontroller firmware (as shown in the following figure).