Power Or Performance?

Most of today’s processors aren’t run according to the standard performance specs, which has an effect on power consumption.


By Pallab Chatterjee
Most microprocessors have shifted to new small-geometry processes to get the best combination of power efficiency and performance. However, there is always a trade-off between power, performance and area (PPA) for semiconductors, and this is especially relevant for processors. In the current design space, processors are created as general-purpose products, but they are generally put into user applications that need to be optimized for either power or performance.

The main processors, such as Intel’s iX series and AMD’s Phenom II series CPUs and Nvidia’s GPU products, are routinely not operated at their standard performance specifications. They are either over-clocked or operated at alternate core voltages in the end-user applications to increase performance and data throughput. Because the processors are operated in a non-standard condition, the design requirements have to include acceptable limits for these additional modes of operation. The chip cores can be run at a higher voltage, up to 50% more than the standard voltage. The main clock rate for the chips, the core master clock, may be as much as 50% faster than the nominal clock frequency. To keep all of the other functions working, such as thermal management, the I/O and memory interface, and the standard bus handshake, the chips need additional control logic that supports operation at different performance specifications.

Nvidia’s GPUs support additional power supply modes and connections as standard. The nominal core voltage is 1.2V and can be increased up to 1.4V. This configuration alone does not maximize performance. Key portions of the chip also need to be over-clocked, both with and without the voltage adjustment. The over-clocking has to be balanced across the portions of the chip that get the performance increase so the design does not overrun the local memory or the bus interface and introduce wait states. When these performance changes are made, they are a static change that affects the overall configuration of the graphics board and fan.

Parameters that can be adjusted include the FSB, memory bus, AGP bus, PCI-E bus, GPU core clock, GPU memory bus, memory timing registers, and hardware-specific performance tuning registers. Because these changes affect the dynamic power of the board, fan and cooling controls are included to help keep the design at the nominal die operating temperature. The higher-performance operation can increase the die temperature by as much as 20°C if upgraded cooling is not applied. Due to the complexity of the performance enhancement, voltage scaling and clock scaling are no longer done by just putting in a different regulator and a different crystal.
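To give a rough sense of why cooling becomes an issue, the sketch below applies the standard first-order CMOS relation, in which dynamic power scales with capacitance times voltage squared times frequency. It is an illustration only: the function name is hypothetical, switched capacitance and activity are assumed constant, and leakage power is ignored. The inputs are the 1.2V and 1.4V figures and the 50% limits quoted earlier.

```python
def relative_dynamic_power(v_nominal, v_new, f_scale=1.0):
    """Dynamic power relative to nominal, assuming P ~ C * V^2 * f.

    Hypothetical helper: capacitance and switching activity are assumed
    unchanged, and leakage power is not modeled.
    """
    return (v_new / v_nominal) ** 2 * f_scale


# Core voltage raised from 1.2 V to 1.4 V with the clock left at nominal.
voltage_only = relative_dynamic_power(1.2, 1.4)

# Upper end of the quoted ranges: +50% voltage together with +50% clock.
worst_case = relative_dynamic_power(1.0, 1.5, f_scale=1.5)

print(f"1.2 V -> 1.4 V, same clock: {voltage_only:.2f}x dynamic power")
print(f"+50% voltage, +50% clock:   {worst_case:.2f}x dynamic power")
```

Because voltage enters as a square, even the modest step from 1.2V to 1.4V raises dynamic power by roughly a third under this model, before any clock increase is applied.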

To control these changes and make sure that the chip still operates in a safe design area, Nvidia has produced a control software program, nTune, that lets end users adjust these parameters.

General-purpose CPUs have a few more data dependencies than GPUs, but have the same customer performance issues. Ever since CPUs were introduced, people have been pushing the performance aspect of the PPA trade-off. Just as with GPUs, you can adjust the core voltage and over-clock portions of the chip. Unlike on the GPU, the settings are not static and do not produce the same results under all data conditions. For this reason, the higher-performance processors now have automated algorithms that improve performance based on the data set.

For the Intel processors this is part of the “Turbo Mode,” which automatically over-clocks the processor for the duration of the operations that need the higher performance. The power envelope for the processor design, including the thermal management, has to take into account these dynamic over-clock modes in addition to traditional systematic over-clocking. Unlike most SoCs, processor designs and most multi-core embedded designs have data-dependent timing and performance characteristics as well as user-adjustable application ranges.
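The idea of automatic, workload-dependent over-clocking inside a thermal envelope can be sketched as a simple control loop. The code below is a hypothetical policy written for illustration, not Intel's actual Turbo Mode algorithm; the function name, clock steps, temperature limit, load threshold, and the sample trace are all assumed values.

```python
def next_clock_mhz(current_mhz, load, die_temp_c, *,
                   base_mhz=3000, max_turbo_mhz=3600, step_mhz=100,
                   temp_limit_c=90.0, load_threshold=0.8):
    """Pick the clock for the next control interval (hypothetical policy)."""
    # At or over the thermal limit: back off toward base regardless of load.
    if die_temp_c >= temp_limit_c:
        return max(base_mhz, current_mhz - step_mhz)
    # Demanding workload with thermal headroom: step up toward the ceiling.
    if load >= load_threshold:
        return min(max_turbo_mhz, current_mhz + step_mhz)
    # Light workload: drift back down to the base clock.
    return max(base_mhz, current_mhz - step_mhz)


# Made-up trace of (load, die temperature in C) samples, one per interval.
trace = [(0.95, 60), (0.95, 70), (0.95, 80), (0.95, 92), (0.30, 85), (0.30, 75)]

clock = 3000
history = []
for load, temp in trace:
    clock = next_clock_mhz(clock, load, temp)
    history.append(clock)

print(history)
```

In this sketch the clock climbs while the load is high and the die is cool, steps back as soon as the temperature limit is reached, and returns to base once the demanding work ends. That is why the power envelope has to be budgeted for the boosted state and not only for the nominal clock.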