Microarchitecture Design For Low Power

A look at first-in, first-out design tradeoffs.


As designs move to finFET process nodes, dynamic power reduction has become a requirement. To reduce dynamic power, designers have to eliminate or minimize every source of redundant switching activity in the design.

In our last blog, we looked at dynamic power wastage due to redundant adders and multipliers and how to gate these operators to save power. We also mentioned a couple of methods for gating these operators and the tradeoffs between them. In this blog, we discuss a few things to keep in mind when designing FIFOs for maximum power efficiency.

A FIFO, or first-in, first-out buffer, is a common data structure in hardware design. It is usually “n” stages long. An n-stage FIFO is used when data stored in a given clock cycle is needed only “n” clock cycles later. An n-stage FIFO typically has a read pointer, a write pointer, and storage for “n” stages. How can this data structure be optimized for power? Before we answer this question, let us look at how a FIFO can be implemented.

A FIFO can be implemented in two ways:

  1. We can implement a FIFO with fixed read and write pointers, so that the data shifts between the “n” stages. This is the simplest and most common implementation of a FIFO, and it is also known as a shift register. Figure 1(a) shows the scheme for a 5-stage FIFO. In this structure the write pointer always writes the data to location 1 and the read pointer always reads from location 5. During every clock cycle, the data in each location is moved to the next location.
  2. Another method for implementing a FIFO is to use variable read and write pointers. In this method the write pointer initially points to location 1, where the first value is written. Instead of shifting that value to location 2 during the next cycle, as a shift register would, the write pointer advances to location 2 and the new value is written there. In this fashion, the write pointer advances all the way to location 5 and then wraps back to location 1. The read pointer follows the write pointer five clock cycles later, so the stored values are read from locations 1 through 5 in subsequent cycles. This implementation is shown in Figure 1(b).
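To make the two schemes concrete, here is a minimal behavioral sketch in Python. It is not RTL and not taken from any particular design: the class names `ShiftRegisterFifo` and `PointerFifo` and the `clock` method are hypothetical, locations are numbered from 0 rather than 1, and one value is assumed to be written on every clock cycle.

```python
class ShiftRegisterFifo:
    """Scheme 1: fixed pointers, data shifts one location per clock."""

    def __init__(self, n):
        self.stages = [None] * n  # stages[0] is "location 1" in the text

    def clock(self, data_in):
        data_out = self.stages[-1]                   # read pointer fixed at the last location
        self.stages = [data_in] + self.stages[:-1]   # every stored value moves one location
        return data_out


class PointerFifo:
    """Scheme 2: data stays in place, read and write pointers advance and wrap."""

    def __init__(self, n):
        self.n = n
        self.mem = [None] * n
        self.wr_ptr = 0
        self.rd_ptr = 0
        self.count = 0  # number of values written before reading starts

    def clock(self, data_in):
        data_out = None
        if self.count == self.n:
            # The read pointer starts following once n values are stored.
            data_out = self.mem[self.rd_ptr]
            self.rd_ptr = (self.rd_ptr + 1) % self.n
        else:
            self.count += 1
        # Only the location under the write pointer is updated.
        self.mem[self.wr_ptr] = data_in
        self.wr_ptr = (self.wr_ptr + 1) % self.n
        return data_out


if __name__ == "__main__":
    data = list(range(12))
    fixed, moving = ShiftRegisterFifo(5), PointerFifo(5)
    print([fixed.clock(d) for d in data])
    print([moving.clock(d) for d in data])
    # Both lines print [None, None, None, None, None, 0, 1, 2, 3, 4, 5, 6]
```

Under these assumptions both models produce each value five cycles after it was written; they differ only in how much stored state changes on each clock.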

Figure 1: (a) FIFO implemented as a shift register with fixed read and write pointers; (b) FIFO with variable read and write pointers.

Which method is better for power? The first one is easier to implement. However, every stored value is rewritten into the next stage on every clock cycle, so there is a lot of redundant switching activity, and the power penalty grows with the length of the FIFO. In the second method, there is a fixed penalty for switching the read and write pointers, but the data switching is limited to what is needed: only the location being written changes. We can even reduce the switching of the read and write pointers by Gray-coding them, so that only one bit switches between consecutive addresses.
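The comparison can be illustrated with a rough toggle count. The sketch below is only a proxy for dynamic power, under stated assumptions: it counts bit flips in the storage elements and in one pointer, and ignores the read multiplexer, address decoding, and clock network. All function names are hypothetical. The Gray-code example uses a depth of 8 because the single-bit property holds across the wrap-around only when the depth is a power of two.

```python
import random


def bit_flips(old, new):
    """Number of bit positions that differ between two non-negative integers."""
    return bin(old ^ new).count("1")


def shift_register_flips(samples, n):
    """Storage bit flips when every stage shifts on every clock."""
    stages = [0] * n
    flips = 0
    for d in samples:
        shifted = [d] + stages[:-1]
        flips += sum(bit_flips(o, s) for o, s in zip(stages, shifted))
        stages = shifted
    return flips


def pointer_fifo_flips(samples, n):
    """Storage bit flips when only the written location changes."""
    mem = [0] * n
    wr_ptr = 0
    flips = 0
    for d in samples:
        flips += bit_flips(mem[wr_ptr], d)
        mem[wr_ptr] = d
        wr_ptr = (wr_ptr + 1) % n
    return flips


def to_gray(value):
    """Standard binary-reflected Gray code."""
    return value ^ (value >> 1)


def pointer_flips(n, cycles, encode=lambda v: v):
    """Bit flips of one pointer that advances and wraps every clock."""
    flips, ptr = 0, 0
    for _ in range(cycles):
        nxt = (ptr + 1) % n
        flips += bit_flips(encode(ptr), encode(nxt))
        ptr = nxt
    return flips


if __name__ == "__main__":
    rng = random.Random(0)  # arbitrary seed, 8-bit random data
    samples = [rng.getrandbits(8) for _ in range(1000)]
    print("shift register storage flips:", shift_register_flips(samples, 8))
    print("pointer FIFO storage flips:  ", pointer_fifo_flips(samples, 8))
    print("binary pointer flips:", pointer_flips(8, 1000))
    print("Gray pointer flips:  ", pointer_flips(8, 1000, to_gray))
```

In this model the pointer-based FIFO never flips more storage bits than the shift register on the same data, and the gap widens as the depth grows. A real estimate would still need to account for the logic this sketch leaves out.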

In summary, we have explained a couple of ways to implement a FIFO and what to look for when doing tradeoff analysis between these implementations. There are additional microarchitecture techniques that require similar analysis to understand how they can be used for low power designs. Analyzing and choosing an appropriate microarchitecture up front not only reduces power, but also keeps the impact on performance and area low. This avoids trading off performance or area for power at a later stage in the design.

Calypto will be offering free tutorials on “Low Power Microarchitecture for FinFET Designs” during the upcoming DAC. To register for one of these sessions, please follow this link. In this blog, we outlined the trade-offs associated with one of the microarchitecture techniques for low power. Are you aware of all the microarchitecture choices for your low power design?