Experts at the Table, part 1: What will it take to optimize a design at 10nm and 7nm? The problem gets harder with each new node.
Optimization used to be a simple timing-versus-area tradeoff, but not anymore. At each new node the tradeoffs become more complicated, involving additional aspects of the design that used to be dealt with in isolation.
Semiconductor Engineering sat down to discuss these issues with Krishna Balachandran, director of product management for low-power products at Cadence; Tobias Bjerregaard, CEO of Teklatech; Aveek Sarkar, vice president of product engineering and support at Ansys; and Sarvesh Bhardwaj, group architect for ICD Optimization at Mentor Graphics. What follows are excerpts from that conversation. Part two is here.
SE: What are the new optimization challenges for 10nm and 7nm?
Balachandran: There are challenges at different levels of the flow. At the start, 10nm requires a comprehensive look at power, timing, area, and also variation. You have to also be aware of thermal. You have to look at all of those vectors and be able to come up with the right architecture and micro-architecture that can be taken through the design flow. As you go down the flow, there has to be follow-through. Synthesis has to be intelligent enough to understand the timing effects and the variation effects, and be able to choose the proper gates so that you can meet the timing and the power. Place and route is a very critical step: power gets impacted in a big way, and there are special electromigration (EM) rules. The rules have exploded in the lower geometries, especially with finFETs, compared to the larger nodes. There is more to adhere to in those rules, and EM is one example. Signal integrity issues demand even more from a place and route standpoint, and DRC/LVS is impacted. And at the start of the flow, power estimation is important, and its accuracy becomes even more important at 10nm and below.
Bjerregaard: What is different is that what used to come for free with scaling is no longer free. Whenever you wanted better power or performance you just went to the next node; you could do more or less the same as before and you got all of the benefits. Those benefits do not come for free anymore, and a lot is being left on the table, particularly in terms of power. We are not using the technologies well enough. At 10nm, designs are truly becoming routability-constrained, which means the area is bigger than it should be, or needs to be, for that node, and that means the power goes up. Power has to be part of a holistic approach. You cannot segment the design so that power is an afterthought, which you could in the past. That is particularly true at 10nm and 7nm if you want to make economic sense of them. Otherwise you might as well stay at the older nodes and work more intelligently and creatively with those. 7nm and 10nm are pushing creativity in the industry, especially for EDA and at the design level.
Sarkar: What we see going into 10nm and 7nm, and finFET in general, is that the margin-based approach is not going to be sustainable anymore. When people traditionally closed timing, they always said, 'Let's assume a uniform voltage drop across everything. Perhaps a 200mV drop for all cells operating along a path.' With that approach, closing timing is going to be tremendously difficult, and you will end up buffering a lot more than necessary. But the drop is not going to be 200mV everywhere. You assume it is because you don't have the coverage in simulation, and you don't have the confidence that what you are simulating gives you all of the answers. People have defined boxes around the way they do design: when they do timing, when they do power, when they do routing. You define rules for yourself, and those rules are becoming over-constrained in these technology nodes. In 16nm you may have defined a 15% voltage drop margin, but at 7nm that will be horrendously difficult to meet. So people are asking, 'Do we really need 15%, or can we relax that by a couple of percent? And if we do, how does it impact all of the other variables in the design?' Breaking the silos becomes important, as does looking at the problem with broader coverage. Then you have to look at how you do it earlier in the flow; the more it is pushed toward the end, the more challenging it becomes. When we look at EM, we have followed rules, such as a certain number of mA per micron. That is becoming difficult to follow, so people are now looking at statistical approaches: for the lifetime of the product, how long will it last if it follows this particular temperature profile? There are paradigm changes in the way we look at doing design, and signoff has to evolve given that the headroom has diminished so much.
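To make the margin arithmetic concrete, here is a minimal sketch. The supply voltages are illustrative assumptions (not figures from the discussion); the point is that a fixed percentage IR-drop margin leaves less absolute headroom as supplies scale down:

```python
def ir_drop_margin_mv(vdd_mv: float, margin_pct: float) -> float:
    """Absolute voltage-drop budget implied by a percentage margin."""
    return vdd_mv * margin_pct / 100.0

# A 15% blanket margin at an assumed 800mV supply (16nm-class)
# budgets 120mV of drop; at an assumed 650mV supply (7nm-class)
# the same 15% rule leaves only 97.5mV of absolute headroom.
# Relaxing the assumption to 13%, where coverage analysis can
# justify it, reduces the pessimism applied to every timing path.
print(ir_drop_margin_mv(800, 15))  # 120.0
print(ir_drop_margin_mv(650, 15))  # 97.5
print(ir_drop_margin_mv(650, 13))  # 84.5
```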
Bhardwaj: The rules have exploded, especially at 10nm and 7nm where we are seeing problems with designs driven by routability. Because of that, and also with triple patterning and color assignment, there is a margin that needs to be incorporated. The more margin that you incorporate in timing, the more power you leave on the table. Essentially what we need is a better modeling technique to reduce the pessimism. That could possibly be statistical analysis. We need to take into account the rules upstream in the flow so that you can place your design and optimize based on certain constraints. Then you have much tighter constraints further down. This makes it a lot easier to achieve convergence. So you have to spend more time modeling at the higher levels and take care of more of the issues there in order to make the convergence process simpler.
SE: FinFET gave the industry a bonus when it comes to leakage. Did that cause people to start getting more concerned about dynamic power reduction? And what will happen to leakage in the smaller geometries?
Balachandran: FinFET does help the leakage, and from what I hear, at both 10nm and 7nm leakage will not raise its head much. That is more of a 5nm problem. This means that dynamic power is still going to be the biggest concern. If you think about wire lengths, there is a lot more wire on the chip. There is more logic and more memory. There is more integration, and therefore you will have more wire length and more dynamic power to deal with. That is power consumption outside the cells. There is also switching power in the cells themselves, but wire resistivity has gone up a lot, and this is causing a lot of problems at 10nm and 7nm. As a result, you need an engine that looks at the ROI equation for optimizing power. You have to optimize power along with timing and area; you can't first meet timing and then start working on power. That is too late, and from a margin standpoint you will leave too much on the table. When you are doing optimization you want the timing engine to be power-aware. For every transformation that is considered, you only want to accept the ones that are cost-effective from a power standpoint. At the end you will get a good power number but may not meet timing, so then you have to give something up. This is the reclaim step, and during this step, which is also power-aware, you can fix it. This is the way that tools are evolving.
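As a rough illustration of the power-aware accept/reject decision described above, here is a toy sketch of the ROI bookkeeping. This is not any vendor's actual engine; the transformation names and numbers are invented for the example:

```python
def pick_transforms(candidates, power_budget_uw):
    """Greedily accept timing transformations by ROI: picoseconds of
    slack recovered per microwatt of added power, within a budget."""
    chosen, spent = [], 0.0
    ranked = sorted(candidates,
                    key=lambda t: t["gain_ps"] / t["power_uw"],
                    reverse=True)
    for t in ranked:
        if spent + t["power_uw"] <= power_budget_uw:
            chosen.append(t["name"])
            spent += t["power_uw"]
    return chosen, spent

# Hypothetical candidate transformations on a failing path.
candidates = [
    {"name": "upsize_buffer", "gain_ps": 20, "power_uw": 5},
    {"name": "swap_to_lvt",   "gain_ps": 30, "power_uw": 15},
    {"name": "clone_driver",  "gain_ps": 5,  "power_uw": 10},
]
print(pick_transforms(candidates, power_budget_uw=18))
# (['upsize_buffer', 'clone_driver'], 15.0)
```

A real optimizer also has to keep timing feasible and avoid local myopia; this only illustrates the idea of charging each timing transformation against a power budget rather than accepting it unconditionally.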
Bjerregaard: One of the main issues is the siloing. We are used to creating designs by attacking one challenge at a time. With the new technology nodes, we need to work more intelligently, and the way to do this is to look at the problem holistically. Everything is connected; you cannot talk about power without timing. The traditional blanket margins are a killer, especially given that a lot of the timing is going into the routing. So routing is not just a question of area. It is also a question of timing, and that is driving the need for high-strength buffers, which affects power. The cost of blanket margins and the cost of siloing are going up, and that is driving the need to avoid this kind of margining. When we get past these margins we see potential for new kinds of optimization. I don't have to reduce the IR drop across the whole chip. If I reduce it right here, then I get the power benefit because I get the timing benefit.
Sarkar: It is all of the kinds of optimizations that you talked about. Timing optimization is somewhat fine-grained. Where we start to see the big effect, which people have not traditionally looked at, is block-to-block interaction. If you have four different IPs, for example, each created by a different design team, how does the overall chip behave, and how do the IPs interact with each other? Maybe the data flow to one block is not happening in the most optimal manner, so there can be a lot of wasted power even though the block does what you expect. How do you simulate that early on, at a high level of abstraction, taking into account real-life use cases? That is where we start to see the impact of creating interfaces into emulation, so you can drive seconds of real-life data. This enables very fast use-case analysis so you can understand where you are wasting power. You find the big problems first and fix those, and then you can drill down to the finer-grained levels of clock gating, etc. It has to become more systematic, looking at different levels of abstraction and letting overall power optimization drive things.
Bhardwaj: Previously we closed timing and then worked on power optimization as an afterthought. That cannot happen anymore because of the complexity of the design rules and routability. If you do not make the right decisions upstream in the flow, then by the time you come down to post-route you have locked yourself into a local minimum where you don't have many opportunities left. So you have to consider power at the placement step, look at the wire lengths, and take data from emulation to make sure you are targeting the high-power cases in placement. Then during optimization you can do remapping steps, which might combine cells to remove high-switching-activity nets or to address IR drop. By the time you come to post-route, you are in a much better state than if you hadn't considered any of this.
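The payoff of remapping away high-activity nets follows directly from the standard dynamic power relation P = alpha * C * V^2 * f. A small sketch, with all values purely illustrative:

```python
def dynamic_power_uw(alpha, cap_ff, vdd_v, freq_mhz):
    """Switching power of one net in microwatts: P = alpha * C * V^2 * f."""
    return alpha * (cap_ff * 1e-15) * vdd_v**2 * (freq_mhz * 1e6) * 1e6

# Assumed net: 10fF of wire load at 0.8V, driven by a 1GHz clock.
# Halving the switching activity (alpha) halves the dynamic power,
# which is why remapping logic off a high-activity, high-capacitance
# net is worth doing during optimization.
print(dynamic_power_uw(0.2, 10, 0.8, 1000))  # ~1.28 uW
print(dynamic_power_uw(0.1, 10, 0.8, 1000))  # ~0.64 uW
```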