Another Tool In The Bag

Multi-bit registers allow engineering teams to realize additional power reduction on top of widely used clock gating. Their use in advanced nodes is picking up steam.


Clocks can account for 25% to 40% of total dynamic power consumption in a complex chip, so when looking for areas to reduce power, the clock tree network is a good place to start.

Structurally, it is certainly possible to build a design from single-bit flip flops, with the clock connected to every one of them. In that case the clock power is, in general, proportional to the number of buffers in the clock tree, on the order of 7X, not including the register power.

“Instead of having each individual flip flop connected to a clock (because in a synchronous design the clock drives every single flip flop), why not have the same buffer from a clock connect to multiple flip flops?” asks Krishna Balachandran, product marketing director for low power at Cadence. “If you could do that, and then if the synthesis engine could intelligently swap out single flip flops and instead use multi-bit flip flops (also called multi-bit registers, which are a combination of flip flops) then what you are essentially doing is sharing the same clock enable out of one buffer and reducing the load on the clock tree. By doing that you are cutting down on the clock power so you save a lot of the clock power. We’ve seen savings of 5% to 10% on the overall power of a chip just by intelligent swapping in synthesis, followed by placement-aware synthesis of these flip flops — swapping out the single-bit flip flops and substituting with multi-bit flip flops/multi-bit registers. That alone does that.”

Technically speaking, multi-bit registers are sequential elements built from multiple flops that share the same clock, explained Sudhakar Jilla, group director of marketing for place & route at Mentor Graphics. A single instance of a multi-bit register can represent multiple bits that have the same clock condition. Multi-bit registers are usually inferred during RTL synthesis and are handled as banked registers during floor planning and physical design.

Further, because multi-bit registers share the same clock, they help reduce the number of buffers/inverters compared to single bit registers. This directly translates to lower dynamic power in the clock tree. And because dynamic power is becoming problematic for finFETs at 16nm and below, this technology is essential for effectively controlling dynamic power. Multi-bit registers also result in smaller area and lower leakage power, he said.
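The buffer reduction described above can be illustrated with a rough back-of-the-envelope sketch. It assumes an idealized balanced clock tree in which every buffer drives a fixed maximum number of loads. The function names, the fanout of 16, and the 64,000-bit design size are hypothetical values chosen for illustration, and the model ignores pin capacitance, wire load, and placement.

```python
import math


def clock_sinks(num_bits: int, bits_per_register: int) -> int:
    """Number of clock pins the tree must reach when bits are banked
    into registers of the given width (one clock pin per register)."""
    return math.ceil(num_bits / bits_per_register)


def clock_tree_buffers(num_sinks: int, max_fanout: int) -> int:
    """Count buffers in an idealized balanced tree where each buffer
    drives at most `max_fanout` loads (sinks or lower-level buffers)."""
    buffers = 0
    loads = num_sinks
    while loads > 1:
        # One level of buffers is needed to drive the current loads.
        loads = math.ceil(loads / max_fanout)
        buffers += loads
    return buffers


# Hypothetical design: 64,000 register bits, fanout limit of 16 (assumptions).
bits = 64_000
fanout = 16

single_bit = clock_tree_buffers(clock_sinks(bits, 1), fanout)
eight_bit = clock_tree_buffers(clock_sinks(bits, 8), fanout)

print(f"single-bit flops: {single_bit} clock buffers")
print(f"8-bit registers:  {eight_bit} clock buffers")
```

In this toy model the buffer count falls roughly in proportion to the register width. Real savings depend on the library cells and on how well the tools can actually bank the flops.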

Increased density is the primary reason why dynamic power has become a serious issue at 16/14nm and below. While finFETs have sharply reduced leakage current, there are still more transistors packed into a tight space. And high-performance cores (HPCs) are characterized by very aggressive power, performance and area requirements, so controlling clock power is critical in achieving those targets, according to Jilla. “Multi-bit register is one of the power reduction techniques, and it needs to be addressed throughout the flow starting from RTL synthesis, floor planning, placement, clock tree synthesis, routing and optimization in a multi-mode, multi-corner, multi-voltage context.”

Ashwini Mulgaonkar, director of product marketing for physical implementation at Synopsys, Inc., said the concept of multi-bit registers is not new, but their inclusion is on the rise at 20nm and 16nm more than at 28nm and 40nm. She noted there were leading-edge customers using it even at 130nm, but these were the very, very power-conscious users. “Typically, if you’re looking at both leakage and dynamic power, until 16nm and finFET, most of the concern was on leakage power, less so on dynamic power. But with the advent of finFET technology, the leakage concern has been somewhat mitigated and dynamic power has risen to the top again. With the clock network getting bigger and bigger, it is consuming a bigger share of the total power on the chip, and a bigger share of the total area on the chip.”

Design teams are concluding that, from a dynamic power or even a total power perspective, it is the clock tree power that must be reduced. One straightforward way to reduce it is to move to multi-bit flops, for example one 8-bit flop in place of eight 1-bit flops. The multi-bit flop itself has less area because it is essentially a group of single-bit cells sharing one clock line, she noted. “So, you save on area, you save on the pin cap and also, now that you have multi-bit flops, your leaf-level flops reduce in number, your net capacitance goes down as well. And going back to the engineering school equation of power, any reduction in capacitance is going to help reduce dynamic power.”
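The "engineering school equation" referred to here is presumably the standard first-order expression for CMOS switching power. The symbols below are the conventional textbook ones, not notation taken from the interview: $\alpha$ is the switching activity factor, $C$ the switched capacitance, $V_{DD}$ the supply voltage, and $f$ the clock frequency.

```latex
P_{\text{dyn}} = \alpha \, C \, V_{DD}^{2} \, f
\qquad\Longrightarrow\qquad
\Delta P_{\text{dyn}} = \alpha \, \Delta C \, V_{DD}^{2} \, f
```

At fixed voltage and frequency, dynamic power scales linearly with switched capacitance. The clock net toggles every cycle, so any pin or wire capacitance removed from the clock tree translates directly into a power reduction.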

The choice of single-bit or multi-bit flops is made when accessing the cell library. “And here you really need to have a properly characterized library with the multi-bit flops in it — both from a timing perspective, from a scan perspective or DFT perspective,” Mulgaonkar stressed. “You need a good group of multi-bit flops characterized in the library. From a RTL versus netlist perspective, you can infer registers, you can infer multi-bit flops for those as you’re going through the synthesis flow and Design Compiler will do that. At the netlist level, what we do is if the customer wants to use multi-bit flops, then they can let us know that they have multi-bit flops in the library and here is the kind of mapping they want to see. The customer can provide a mapping to declare their intentions, and then in both synthesis and place & route, we can go in and do physically aware merging of these single-bit flops into appropriate multi-bit flops.”

Samsung and MediaTek have both recently discussed their experiences with multi-bit flops publicly.

Balachandran agreed. “Regarding the physical awareness, you could swap out these things in synthesis, but if your place & route is not smart enough later on to recognize these multi-bit flip flops, then you would have a clock tree where some of these things would be placed at different parts of the physical layout and then you would get long wires again and have the same problem. So the downstream technology — the floor planning and the place & route — have to understand this so that it doesn’t cause routing congestion, it doesn’t cause long wires to appear later. The whole thing has to be thought through as a total solution, starting at synthesis and going all the way, and making sure the place & route system can handle it too. Then you get the effect of having a multi-bit register, which is smaller in area than the individual flip flops because you’ve combined those and there is some optimization at the library cell level and then you’ve got less wires and less wire length, which also contributes to better power. You minimize the congestion, you even improve the timing as a result of this for the overall design, in addition to the power savings.”
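The idea of physically aware merging can be sketched with a toy example. This is not any vendor's algorithm. It is a minimal greedy illustration that assumes flops may be banked only if they share the same clock net and sit within a given Manhattan distance of each other. The `Flop` class, the `bank_flops` function, the bank width of 4, and the distance limit are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Flop:
    name: str
    clock: str  # clock net (including any gating condition) driving the flop
    x: float    # placement coordinates, arbitrary units
    y: float


def manhattan(a: Flop, b: Flop) -> float:
    return abs(a.x - b.x) + abs(a.y - b.y)


def bank_flops(flops, width=4, max_dist=10.0):
    """Greedily group flops into banks of at most `width` bits.

    Two flops may share a bank only if they are on the same clock net
    and the candidate lies within `max_dist` of the bank's seed flop.
    """
    by_clock = defaultdict(list)
    for flop in flops:
        by_clock[flop.clock].append(flop)

    banks = []
    for group in by_clock.values():
        remaining = sorted(group, key=lambda f: (f.x, f.y))
        while remaining:
            seed = remaining.pop(0)
            bank = [seed]
            # Consider the closest candidates first.
            for cand in sorted(remaining, key=lambda f: manhattan(seed, f)):
                if len(bank) == width:
                    break
                if manhattan(seed, cand) <= max_dist:
                    bank.append(cand)
            for merged in bank[1:]:
                remaining.remove(merged)
            banks.append(bank)
    return banks


# Hypothetical placement: four nearby flops, one far away, one on a gated clock.
flops = [
    Flop("a", "clk", 0, 0),
    Flop("b", "clk", 1, 0),
    Flop("c", "clk", 2, 1),
    Flop("d", "clk", 100, 100),
    Flop("e", "clk_gated", 1, 1),
    Flop("f", "clk", 3, 0),
]

bank_names = [[f.name for f in bank] for bank in bank_flops(flops)]
print(bank_names)
```

In this example the four neighboring flops on `clk` form one 4-bit bank, while the distant flop and the flop on the gated clock stay single-bit. That mirrors the point above: merging without regard to placement would reintroduce long wires.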

The tradeoffs
Anand Iyer, director of product marketing for Calypto, conceded that using multi-bit registers might make clock gating a bit more complex, but said that combining multi-bit registers with clock gating allows design teams to realize greater power savings.

“The real penalty is actually now you’re characterizing two bits at a time and you’re making a new cell, so the cell library will increase. We have seen cases where the cell library went from 2,000 cells to 5,000 cells because you need to provide all the flavors of the multi-bit. As a result, the tools are slower sometimes because they need to read this gigantic library and all the algorithms need to understand both single bit and multi bit. Thus, the flow is more challenging but this is the methodology that people normally go through in 28nm and below,” Iyer pointed out.

At the end of the day, as power becomes a bigger and bigger challenge, it’s hard to meet the power numbers for a particular design. As a result, “designers are adding more and more techniques and adding complexity in their power design to make it happen in order to meet those goals. They are looking at where to shave off that extra 5% or 10% in order to achieve their power numbers. Using multi-bit registers becomes one more trick in the bag that they can use to reduce the power,” Balachandran concluded.