Understanding design topology and making decisions on clock/register gating has evolved to include physical data for more accuracy.
By Ann Steffora Mutschler
RTL tools generally need vector sets to understand a design topology and decide how to gate clocks and registers. However, if certain constraints are set on all enable signals in RTL, they can be re-used for gating clocks and registers downstream where enables are not available, even without a vector set.
Mike Gianfagna, vice president of corporate marketing at Atrenta, noted that clock and register gating are popular ways to reduce power, and they can be applied without vectors, but it’s not easy. “Clock gating requires the addition of circuitry, and more transistors means more power. You need to ensure the added overhead of those transistors ‘pays off’ with a net (significant) power reduction, but this isn’t always the case.”
Sometimes, adding gating will increase power, driven by the fact that the gating doesn’t turn off enough activity, which in turn is driven by use cases and thus vectors, he said. “It’s hard to get around that. One size doesn’t fit all. You need to understand the use cases for the design.”
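The payoff question Gianfagna describes can be sketched as a simple break-even calculation. This is a rough illustration only: the function name, the linear power model, and every number below are assumptions made for the example, not figures from Atrenta or any tool.

```python
def net_gating_savings(n_flops, clk_power_per_flop, idle_fraction, gate_overhead):
    """Estimate net dynamic power saved by one clock gate (arbitrary units).

    Hypothetical linear model: each gated flop saves its clock power
    for the fraction of cycles it is idle, and the gate itself costs
    a fixed overhead. All parameters are illustrative assumptions.
    """
    saved = n_flops * clk_power_per_flop * idle_fraction
    return saved - gate_overhead


# Same gate, same eight flops, two different use cases.
mostly_idle = net_gating_savings(8, 1.0, idle_fraction=0.9, gate_overhead=2.0)
mostly_busy = net_gating_savings(8, 1.0, idle_fraction=0.1, gate_overhead=2.0)

print("mostly idle:", "pays off" if mostly_idle > 0 else "costs power")
print("mostly busy:", "pays off" if mostly_busy > 0 else "costs power")
```

Under these assumed numbers the same gate is a net win in one use case and a net loss in the other, which is why the idle fraction, and therefore the vectors, matter.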
Stepping up a level to the more general issue of topological optimization, Gianfagna explained power is a driver, but so is timing/performance and area. “If you can reduce the length of critical nets, you will get the chip to run faster. This is definitely a topological issue, and tools that can help that optimization at RTL are very useful, since the complexity of data is far less than if you’re dealing with a flattened netlist.”
Then, there’s area, which is driven by routing congestion. “There are certain RTL constructs that will simply give your place and route tool a headache. Very wide muxes are one example. If you can spot those issues at RTL and fix them, you will get a more compact, higher-performance design. Routing congestion is a very real issue. If it’s bad enough, there is no place and route tool that can solve it. You have to change the design, or you just will never close it,” he noted.
Christen Decoin, product marketing manager at Mentor Graphics, agreed. “At the RT level when you look at topology optimization, typically there is very little knowledge of what’s going to happen downstream: what is going to be the power, the timing.”
But he noted there are more and more commercial tools that enable estimation as well as power and timing optimization, which can then change the topology of the RTL block to yield a more timing-efficient or power-robust design downstream.
New challenges with new technologies
With the move to 3D-IC integration come new technology issues to work out, Decoin said. “Where it becomes critical, and where a lot of people talk about topology optimization, is for 3D-IC integration or SoCs. There is a lot of need at the SoC level, but with 3D-IC integration the topology will be extremely important because for RTL topology optimization there will have to be a better solution from the thermal viewpoint, from the power viewpoint and the timing view. And what they say is you control 70% of the power issue and 70-plus percent of the thermal issue at the topology level of the RTL. The key is that at the RTL level they can have some estimation but they don’t have a clear feedback loop for design downstream yet.”
To do this, there is ongoing work in the industry. Mentor, for one, is looking at, “how, for instance with power, you have power estimation at the RTL level, you could go downstream and do some power analysis at the gate level and then at GDS, and you could feed that back—that will give you some knowledge at the RTL level for a given IP. When you re-use that IP in a different SoC or in a different block of the design, then you have some information.”
3D-IC integration and the rise of finFET transistors are changing the power issues that must be considered in an SoC. Given that finFETs are expected to be less leaky than planar CMOS transistors, the focus will shift back to dynamic power, observed Gal Hasson, senior director of marketing, RTL synthesis and test at Synopsys.
There are many techniques to do clock gating with the objective of reducing dynamic power. “Originally, it was fairly straightforward based on some circuit structures. If there was a register where the output went back through a MUX back into the data of the register, there might be an opportunity to gate the register. By doing that to make sure the register didn’t clock, the value stays, and there is power savings,” he explained.
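The mux-feedback structure Hasson describes can be modeled behaviorally. The sketch below is a toy cycle-based simulation, not synthesis output: both function names are hypothetical, and "clock events" simply counts how many clock edges reach the flop.

```python
def run_mux_feedback(enables, data, q0=0):
    """Ungated register: clocked every cycle; a mux recirculates Q when enable is 0."""
    q, clock_events, trace = q0, 0, []
    for en, d in zip(enables, data):
        q = d if en else q   # feedback mux selects new data or the held value
        clock_events += 1    # the flop sees every clock edge
        trace.append(q)
    return trace, clock_events


def run_clock_gated(enables, data, q0=0):
    """Gated register: the clock edge only reaches the flop when enable is 1."""
    q, clock_events, trace = q0, 0, []
    for en, d in zip(enables, data):
        if en:
            q = d
            clock_events += 1  # gated clock toggles only on enabled cycles
        trace.append(q)
    return trace, clock_events


enables = [1, 0, 0, 1, 0, 0, 0, 1]
data = [5, 6, 7, 8, 9, 10, 11, 12]

ungated_trace, ungated_clocks = run_mux_feedback(enables, data)
gated_trace, gated_clocks = run_clock_gated(enables, data)

print(ungated_trace == gated_trace)   # same register contents every cycle
print(ungated_clocks, gated_clocks)   # clock edges seen by the flop
```

The register holds the same values in both versions, while the gated one receives a clock edge only on the enabled cycles. That is the source of the dynamic power saving.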
Since then, clock gating has evolved in many ways. Gating every flip-flop can be expensive in terms of the number of clock gates, so where possible the goal is to gate multiple flip-flops with one gate. “You can actually go up in the hierarchy and maybe you find out that you can gate a whole block with one gate,” Hasson said. “You can mix those in some cases, do a lower level at the register and also gate at the block level. One direction of evolution is looking at how to optimize the clock gates. How many do we want to insert, where do we want to put them, how many flops or blocks do we want to affect, how high in the clock tree do we want to put it?”
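One way to picture sharing a gate across several flip-flops is to group flops by their enable signal and insert a gate only where the group is large enough to justify it. This is a minimal sketch: the function name, the flop and enable names, and the `min_flops_per_gate` threshold are all hypothetical and do not describe any specific tool's behavior.

```python
from collections import defaultdict


def plan_clock_gates(flop_enables, min_flops_per_gate=3):
    """Group flops by enable; gate only groups that meet the size threshold.

    flop_enables maps a flop name to the name of its enable signal.
    Returns (gated, ungated): gated maps enable -> flops sharing one gate,
    ungated lists flops left on the free-running clock.
    """
    groups = defaultdict(list)
    for flop, enable in flop_enables.items():
        groups[enable].append(flop)

    gated, ungated = {}, []
    for enable, flops in groups.items():
        if len(flops) >= min_flops_per_gate:
            gated[enable] = sorted(flops)
        else:
            ungated.extend(flops)
    return gated, sorted(ungated)


# Illustrative netlist fragment (names are made up).
flop_enables = {
    "r0": "en_a", "r1": "en_a", "r2": "en_a", "r4": "en_a",
    "r3": "en_b",
}
gated, ungated = plan_clock_gates(flop_enables)
print(gated)    # one gate shared by the four en_a flops
print(ungated)  # the lone en_b flop is not worth its own gate
```

Raising or lowering the threshold trades the number of inserted gates against coverage. That is a simplified version of the "how many do we want to insert" question in the quote.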
Another aspect of clock gating is related to vectors, or activity. When activity data is added to the equation, the tool can see both the circuit structure, meaning the flops that lend themselves to clock gating, and how active those flops are, which helps determine whether it makes sense to gate them, he said. “If you see flops with a lot of activity, your benefit from gating is going to be minimal and maybe there’s going to be an impact on timing because there’s a tradeoff, so you don’t want to gate those flops.”
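As a rough illustration of how vectors feed this decision, the sketch below derives a per-flop activity figure from simulated enable traces and keeps only the quiet flops as gating candidates. The trace format, the 0.5 cutoff, the register names, and the use of enable activity as a stand-in for flop activity are all assumptions made for the example.

```python
def gating_candidates(enable_traces, max_activity=0.5):
    """Pick flops whose enable is active in at most max_activity of cycles.

    enable_traces maps a flop name to a list of 0/1 enable values, one per
    simulated cycle (a stand-in for activity extracted from a vector set).
    Returns (flop, activity) pairs, least active first.
    """
    chosen = []
    for flop, trace in enable_traces.items():
        activity = sum(trace) / len(trace)
        if activity <= max_activity:
            chosen.append((flop, activity))
    return sorted(chosen, key=lambda item: item[1])


# Hypothetical eight-cycle vector for three registers.
enable_traces = {
    "status_reg": [0, 0, 0, 1, 0, 0, 0, 0],
    "counter":    [1, 1, 1, 1, 1, 1, 0, 1],
    "cfg_reg":    [1, 0, 0, 0, 0, 0, 0, 0],
}
candidates = gating_candidates(enable_traces)
print(candidates)  # the busy counter is left ungated
```

A different vector set could change the result entirely. This is the use-case dependence both Gianfagna and Hasson point to.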
There are also more advanced techniques including self-gating, where the new data coming into the flop is examined to see if it is different from the data that’s already in the flop. If the data coming in and the value in the flop are the same, the flop doesn’t need to be clocked and can simply be gated.
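Self-gating can be modeled with an XOR comparison between the incoming data and the stored value. The following is a toy behavioral sketch with a hypothetical function name. It ignores the area and power cost of the comparator, which a real implementation would have to weigh.

```python
def run_self_gated(data_in, q0=0):
    """Clock the flop only when incoming data differs from the stored value."""
    q, clock_events, trace = q0, 0, []
    for d in data_in:
        differs = (d ^ q) != 0   # XOR comparator: nonzero means D != Q
        if differs:
            q = d
            clock_events += 1    # clock edge passes only when data changes
        trace.append(q)
    return trace, clock_events


data_in = [3, 3, 3, 7, 7, 0, 0, 0]
trace, clock_events = run_self_gated(data_in)

print(trace == data_in)                 # stored value always tracks the input
print(clock_events, "of", len(data_in)) # edges actually delivered to the flop
```

No enable signal is needed here. The gating condition is derived entirely from the data, so the approach applies to registers that have no explicit enable in the RTL.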
All of this has led to the relatively new approach of taking the physical information (the placement information) into consideration, which can help the designer reduce the length of the clock lines, improve the power on the clock tree, in many cases reduce the routing area, and get a more optimal result. “As the technology evolved over time to include more physical information in synthesis, it became clear that clock gating needed to be done in synthesis from the get go, with the physical information, with the placement information in mind so it will be better correlated,” Hasson added.
Finally, regarding the timing of the clock gates’ enable signals, when synthesis looks at the actual placement of the registers and the clock gates, it can have a very good understanding of the enable signals and their timing. “If this is a critical signal, synthesis can work on it more and make sure it meets timing, so it has a very good visibility. This is a capability that’s coming out very soon,” he concluded.