Doing more with less equates to bigger design challenges.
A long-standing approach of throwing everything into a chip is increasingly being replaced by a focus on what can be left out of it.
This shift is happening at every level, from the initial design to implementation. After years of trying to fill every square nanometer of real estate on a piece of silicon with memory and logic, doubling the number of transistors from one process node to the next, it’s becoming harder technologically and financially to keep up that pace. Heat, power, complexity and rising costs per transistor after 28nm have made it less attractive to continue doing things the way they were done in the past.
“Complexity has increased substantially in terms of how we manage 16, 10 and 7nm designs and to be able to get them to work as expected,” said Ruggero Castagnetti, distinguished engineer at Broadcom. “Even if you don’t run at lower voltages due to the challenges of scaling voltage, the voltage budgets get squeezed. And while total power may stay flat, the power or current density is increasing. Associated with that, even though current density has gone down with finFETs, dynamic power is a problem. So people start talking about the need to start early or you need more intelligent approaches. Fundamentally, from an end user standpoint, we strive for a methodology where there is a predictable turnaround time, because that’s what our customer requires, and a good tradeoff between not compromising on quality and providing sufficient overdesign. And we need to be able to do that in the amount of time it previously took to get a design out at 130nm.”
Overdesign, or margining, is a common way of building in reliability and compensating for late errors. But it’s also an inefficient way to design a chip. That inefficiency becomes more apparent at advanced nodes, where extra circuitry impacts performance and power.
“This starts very early on when the flop boundaries are laid out and the design is pipelined,” said Abhishek Ranjan, director of engineering at Mentor Graphics. “The problem is that you don’t know how it will end up in layout, and more pipelines are put in than end up being needed. Maybe you could have put more logic between two flop boundaries, but you thought the downstream tools would have trouble meeting timing so you split it more evenly across several flop boundaries. Inevitably you put in more flops, and this means a bigger clock network, so conditions on the clock side and trying to balance the clock tree starts becoming a problem. That’s on the microarchitecture side.”
These issues propagate across a design, especially because each member of the design team sees only the block he or she is working on. As a result, decisions made for that block are not weighed in the context of the full design.
“What we have seen happening very frequently is people tend to use block-level clock gating cells very aggressively,” Ranjan said. “At every block interface, they put a block-level, clock-gating cell regardless of the fact that the same would have been put in at a higher level. Really, they are duplicating the clock-gating cells that are there in the design. And the more levels of clock gating in the circuit, the more difficult it is for the clock tree to get balanced and meet timing. Once you have assembled the complete chip, it’s very difficult to know this over-designing has been done. That is where a lot of micro-level tools are required to help designers figure out redundancies.”
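The duplication Ranjan describes can be illustrated with a toy check. The following is a minimal sketch, assuming a hypothetical design representation in which each clock-gating cell is recorded by its hierarchical instance path and the name of its enable net. The `ClockGate` type, the paths, and the enable names are all invented for illustration. Production tools work on real netlists and reason about whether one enable condition implies another, not just whether the names match.

```python
from typing import NamedTuple

class ClockGate(NamedTuple):
    path: str    # hierarchical instance path (hypothetical naming)
    enable: str  # name of the enable net driving the gate (hypothetical)

def find_redundant_gates(gates):
    """Return gates whose enable already gates the clock at an ancestor level."""
    redundant = []
    for g in gates:
        for other in gates:
            # "other" is an ancestor if g's path sits underneath it in the hierarchy
            is_ancestor = g.path.startswith(other.path + "/")
            if is_ancestor and g.enable == other.enable:
                redundant.append(g)
                break
    return redundant

gates = [
    ClockGate("top/cpu", "cpu_active"),
    ClockGate("top/cpu/alu", "cpu_active"),  # duplicates the gate one level up
    ClockGate("top/cpu/fpu", "fpu_busy"),    # different condition, so it is kept
    ClockGate("top/dsp", "cpu_active"),      # same enable, but not nested under top/cpu
]

redundant = find_redundant_gates(gates)
for g in redundant:
    print(f"redundant clock gate at {g.path} (enable {g.enable})")
```

Only the block-level gate nested under a higher-level gate with the same enable is flagged. Removing it takes one level out of the clock tree without changing when the clock reaches the flops.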
Even mainstream designs at 16/14nm are beginning to recognize the need to address margin. “In today’s leading-edge digital designs, engineering teams are still putting everything but the kitchen sink on the die, but they are very careful about when it is enabled and when it isn’t,” said Pete Hardee, director of product management for the formal and automated verification group at Cadence. “That is from the point of view of not just using a power intent file with power domains to make sure logic is powered off, but also to save dynamic power. Powering off the circuitry obviously will save while it’s not being used, so there’s very little dynamic power, but you’re also saving the leakage power when you power something off. But when circuitry is being used, and it’s powered on, then there are still a lot of techniques used to save dynamic power to minimize activity.”
Adding more circuitry typically results in longer wires and routing congestion. That causes an increase in heat due to the resistance of the wires, which in turn boosts electromigration, slows performance, and affects signal integrity and sometimes even timing. The associated power noise has become a significant issue for analog and microelectromechanical systems (MEMS), as well.
“Years ago, you could never have imagined what the constraints would be for scaling,” said Tobias Bjerregaard, CEO of Teklatech. “Now, there are 10 or 12 routing layers, and often there still isn’t enough wiring. The biggest concerns are dynamic IR drop and timing. If you have routability issues and the wires are too long, you get timing delays.”
So how big an improvement can be realized by reducing margin in a design? “It all depends on what your tradeoffs are,” Bjerregaard said. “When you’re working with margins, you just need it to be good enough here for this parameter. But if you want flexibility to move things around for performance or power, mobility to move things around—if you just improve one of the parameters you can gain something in the others. Power integrity is a pivot point. If you can improve that, you can improve others. And that doesn’t even begin to address yield and total power. If you reduce your dynamic voltage drop you can reduce your supply voltage and lower the power.”
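Bjerregaard's last point follows from the standard first-order estimate of switching power, P = αCV²f, in which power scales with the square of the supply voltage. The sketch below works through one case. All of the numbers (minimum cell voltage, drop figures, activity factor, capacitance, frequency) are made-up assumptions chosen only to show the arithmetic, and the model ignores leakage and short-circuit power.

```python
def dynamic_power(alpha, cap_farads, vdd, freq_hz):
    """First-order switching-power estimate: P = alpha * C * Vdd^2 * f."""
    return alpha * cap_farads * vdd ** 2 * freq_hz

def required_supply(v_min_at_cell, worst_case_drop):
    """Supply must cover the voltage the cells need plus the worst dynamic drop."""
    return v_min_at_cell + worst_case_drop

# Assumed, illustrative values only
V_MIN = 0.72          # volts the cells need to meet timing
ALPHA = 0.15          # average switching activity
CAP = 2e-9            # total switched capacitance, farads
FREQ = 1e9            # clock frequency, hertz

vdd_before = required_supply(V_MIN, 0.080)  # 80 mV worst-case dynamic drop
vdd_after = required_supply(V_MIN, 0.040)   # drop halved to 40 mV

p_before = dynamic_power(ALPHA, CAP, vdd_before, FREQ)
p_after = dynamic_power(ALPHA, CAP, vdd_after, FREQ)
saving = 1 - p_after / p_before

print(f"supply: {vdd_before:.2f} V -> {vdd_after:.2f} V")
print(f"dynamic power saving: {saving:.1%}")
```

Under these assumptions, trimming 40 mV of drop margin lets the supply fall from 0.80 V to 0.76 V, which cuts switching power by just under 10% with no change to the logic.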
The big shrink
Still, achieving those improvements gets more difficult at every new node.
Mary Ann White, director of product marketing at Synopsys, noted that as design complexity continues to rise, there are still several ways to reduce area and power. “Ever-shrinking process nodes help to reduce design area, while ‘consumer/mobile’ variations of these technology nodes also help to mitigate power. It goes without saying that reducing area also reduces overall power. For example, our internal regressions have shown that an area reduction of 20% can save that much in leakage power, as well.”
One of the primary, and continuous, initiatives of the EDA community has been to improve quality of results in all aspects of power, performance, and area targets. Many of the technological advances introduced over the past five years have easily achieved 20% to 30% area reduction. Some recent examples of area, and subsequently power, saving optimization technologies include multi-bit register support, re-mapping and logic redundancy removal with more sharing of XOR functions across the design, and logic restructuring, she noted.
Specifically, this can include connectivity-based logic decomposition or recomposition. White said one example is where large, area- and utilization-intensive muxes are decomposed, i.e., split into a number of smaller muxes, in order to eliminate any redundant output connectivity, resulting in smaller area and less congestion. Alternatively, the optimization engine might recognize that a group of discrete gates would be more area-efficient if extracted and remapped as a full adder (see diagram).
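Both restructurings are only legal if the logic function is unchanged. As a rough illustration, the sketch below exhaustively checks two such rewrites: a cluster of XOR/AND/OR gates against a full-adder cell, and a 4:1 mux against a tree of 2:1 muxes. The function names are hypothetical, and the code checks functional equivalence only. It says nothing about the area or congestion gains, which depend on the cell library and the layout.

```python
from itertools import product

def discrete_gates(a, b, cin):
    """Sum and carry built from individual XOR, AND and OR gates."""
    p = a ^ b
    s = p ^ cin
    cout = (a & b) | (p & cin)
    return s, cout

def full_adder(a, b, cin):
    """Behavior of a single full-adder cell: arithmetic sum of three bits."""
    total = a + b + cin
    return total & 1, total >> 1

def mux4(d, sel):
    """Monolithic 4:1 mux."""
    return d[sel]

def mux2(x, y, s):
    """2:1 mux: selects y when s is 1, otherwise x."""
    return y if s else x

def mux4_decomposed(d, sel):
    """The same 4:1 selection built from three 2:1 muxes."""
    s0, s1 = sel & 1, sel >> 1
    return mux2(mux2(d[0], d[1], s0), mux2(d[2], d[3], s0), s1)

# Exhaustive comparison over every input combination
adder_ok = all(
    discrete_gates(*bits) == full_adder(*bits)
    for bits in product((0, 1), repeat=3)
)
mux_ok = all(
    mux4(d, sel) == mux4_decomposed(d, sel)
    for d in product((0, 1), repeat=4)
    for sel in range(4)
)
print("gates match full adder:", adder_ok)
print("decomposed mux matches:", mux_ok)
```

Exhaustive enumeration works here because the functions have only a handful of inputs. Synthesis tools apply the same idea to much larger cones of logic using Boolean reasoning instead of enumeration.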
Using tools differently
The focus on reducing margin for power reasons is showing up in other ways, as well. Tools are being used differently than in the past.
“It’s becoming exponentially more difficult to close the signoff loop,” said Arvind Shanmugvel, director of application engineering at Ansys. “One way to address this is to make sure you are not using the most pessimistic analysis. You have to start thinking about power integrity from the signal to the multi-physics level. That includes timing, electromigration and thermal. A chip-package-system solution has an impact on different types of physics. Every timing path has the impact of voltage in it, but for a different situation. So, we have to take hundreds of different reports.”
That’s just one of the pieces that needs to be included, too. “If the engineering team doesn’t want to rely on the synthesis tool, which knows when a register is not being used, they have to identify clock enable signals that can switch off the clock based on certain constraints or certain conditions when they know that that piece of circuitry is not going to be used. It requires functional understanding of what the design is doing, when, and — importantly — it needs verifying,” Hardee explained.
This is driving increasing usage of formal verification tools to be able to verify these optimizations, he said. “They’re looking for enable signals that say, ‘I’m going to enable a clock tree in this whole block of circuitry. I’m going to do this at the RTL design level, I want to check that I’ve got this right, and that there is no functional difference between the reference design before I made the change, and the optimized design.’ Logical equivalence checking tools struggle to do this because we’re actually talking about changing the clock cycle boundaries in the design so a sequential equivalence checking tool comes into play here.”
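The kind of check Hardee describes can be approximated, very loosely, by driving a reference model and an optimized model with the same inputs and comparing their outputs cycle by cycle. The sketch below is a bounded, simulation-based stand-in and not a formal proof. It assumes a single register, three hypothetical model classes, and an arbitrary depth of four cycles. Real sequential equivalence checkers reason about all reachable states symbolically.

```python
from itertools import product

class EnableMuxReg:
    """Reference: free-running clock, a recirculating mux holds the value."""
    def __init__(self):
        self.q = 0
    def tick(self, en, d):
        self.q = d if en else self.q
        return self.q

class ClockGatedReg:
    """Optimized: the flop only sees a clock edge when the enable is high."""
    def __init__(self):
        self.q = 0
    def tick(self, en, d):
        if en:
            self.q = d
        return self.q

class OverGatedReg:
    """Deliberately wrong: the clock is also gated off whenever d is 0."""
    def __init__(self):
        self.q = 0
    def tick(self, en, d):
        if en and d:
            self.q = d
        return self.q

def bounded_equivalent(ref_cls, opt_cls, depth):
    """Compare outputs on every (en, d) input sequence of the given length."""
    stimulus = list(product((0, 1), repeat=2))
    for seq in product(stimulus, repeat=depth):
        ref, opt = ref_cls(), opt_cls()
        for en, d in seq:
            if ref.tick(en, d) != opt.tick(en, d):
                return False
    return True

good = bounded_equivalent(EnableMuxReg, ClockGatedReg, depth=4)
bad = bounded_equivalent(EnableMuxReg, OverGatedReg, depth=4)
print("correct gating equivalent:", good)
print("over-aggressive gating equivalent:", bad)
```

The over-gated variant fails as soon as a 1 is loaded and then a 0 is written with the enable high, which is the sort of functional difference an aggressive power optimization can introduce without anyone noticing.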
There are a number of other optimizations that people are doing to save power that can be verified with the same technique, including power domains, Hardee pointed out. “The sooner you start having power domains in the design and powering stuff down, and powering stuff up, then you’re introducing a whole set of different components just to manage that. Power switches, isolation cells, and retention registers are likely to be used, with the latter needed in a power domain of any kind of size where power needs to come back up relatively quickly. They retain the previous state of the design so that you can bring power on quickly, but in normal operation they cost you a little more. They cost you a little bit of area, and a little bit more dynamic power. Whereas in designs two or three years ago, whether or not retention registers were used was a sort of binary yes or no decision, and in any given power domain if you wanted to use retention registers, you’d make all of the registers retention. Now, designers are starting to optimize that, and trying to figure out, based on the thousand flip flops in a power domain, how many of them need to be retention registers in order to get the reset cycle completed more quickly when power comes back on. Again, that is something where a power-aware variant of sequential equivalence checking can help the designer to verify that so they still get the same functionality after making those optimizations.”
EDA companies have been warning since 40nm that overdesign is a growing problem. It is no longer something to be dealt with at future nodes. Any chipmaker working at advanced nodes recognizes this is a problem that has to be dealt with now.
The problem is there is no simple solution. “Much of this comes from expertise—the confidence in the decisions that you are taking at that level, that the downstream tools will not be adversely burdened if you don’t put in margin there,” Ranjan said.
And after that, just keep your fingers crossed that it works.
—Ed Sperling contributed to this report.