Keeping Pace With Moore’s Law

Will place and route be able to keep pace with the increasing density of sub-20-nm design, 3D transistors and 3D die stacking?


By Ann Steffora Mutschler
As the number of transistors doubles with each move to a smaller manufacturing process technology, there are questions as to whether the current crop of place and route tools will be able to keep in lockstep.

Have no fear, assured Saleem Haider, senior director of marketing for physical design and DFM at Synopsys. “For the increase in densities that we get with every new node, as well as the increase in densities that is over the horizon with 3D stacking and so forth, overall there isn’t really a great big problem as far as place and route is concerned. Obviously there are things that we need to do in place and route to keep pace with the increasing density and with every new node. But overall from where we sit, there is not a great big problem. If anything, as far as this march to a new node every two years, the challenge is on the manufacturing side. There are far greater challenges on that side than place and route capacity.”

Increased design density is driving changes to the upstream design environment. One such change is that the number of gates seen by place and route tools is doubling every 18 months, which has several implications, explained Shankar Krishnamoorthy, chief scientist at Mentor Graphics.

One implication concerns the core capacity, or data model capacity, of the place and route tools because, he said, “if you could represent, let’s say, 5 million gates in your data model in 65nm, you would have to represent 10 million gates at 40nm just for a comparable size. Then 20 million gates at 28nm and almost 40 million gates at 20nm. So you need to have a scalable data model, a scalable timing analysis environment because the design size is pretty much doubling every couple of years. More than that, the design size is really more than doubling because people are taking advantage of the increase in density to integrate even more. Today, some of the largest designs we are seeing within the Mentor place and route environment are easily approaching 100 million gates, which is something close to 25 million or 30 million instances.”
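The node-by-node growth Krishnamoorthy describes can be written out as a simple doubling series. The sketch below is only an illustration: the 5 million gate baseline and the node sequence come from the quote, while the function name and the strict doubling per node are assumptions (the quote itself says "almost" 40 million at 20nm).

```python
# Node sequence and 65nm baseline are taken from the quote above.
NODES_NM = [65, 40, 28, 20]
BASELINE_GATES = 5_000_000


def gates_per_node(baseline, nodes):
    """Assume gate count doubles exactly at each successive node (a simplification)."""
    return {node: baseline * 2 ** step for step, node in enumerate(nodes)}


projection = gates_per_node(BASELINE_GATES, NODES_NM)
for node, gates in projection.items():
    print(f"{node}nm: {gates / 1e6:.0f} million gates")
```

Under this assumption the data model has to hold eight times as many gates at 20nm as it did at 65nm for a block of comparable size, before any additional integration is counted.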

A very flexible and scalable data model is needed, along with algorithms that scale with this growth in data size. “What does not change even though the number of gates is increasing is that people still want to get their chips done in six months. As such, there is a need for scalable algorithms built on top of a scalable data model,” Krishnamoorthy explained.

To equip place and route tools for 20nm and below, the underlying models for electrical and physical data are changing a lot. “The electrical models are getting more complex because you have much more sophisticated interactions that have to be modeled in timing analysis. If we look at the physical side, there are a lot more DFM requirements coming into the place and route world and so the physical requirements are also getting complex. The algorithms for placement, the algorithms for routing, and the algorithms for physical synthesis all need to evolve to take these additional requirements into effect and still be able to produce legal layouts in about the same time, because the schedule is not changing. It’s more or less staying the same. It’s really multiple effects of complexity with the same runtime requirements so there is a big need for us to look at how to speed up the execution with multicore technologies with respect to parallel algorithms and so on. Really you are seeing a 4, 5 or 6x increase in complexity of the models, size of data and the algorithms have to produce results in about the same time. So we have to rely on parallelism and multicore to achieve that,” he continued.

Silicon density also impacts floorplanning as well as the size of the place and route block. “These three effects are playing with each other—ability for the data model and the algorithms to scale, the need for hierarchical floorplanning and the need for place and route block sizes to grow much larger,” Krishnamoorthy said.

Big iron needed
Not surprisingly, with the bigger data sizes and added complexity, extra compute resources are required to handle the additional processing done by the place and route software.

“There is more processing going on,” Haider noted. “There are new things that we weren’t doing before. We are doing new timing calculations, new mask calculations, new rules that we weren’t doing before, so the processing need goes up. At 20nm you can expect that the designs themselves would be bigger. Otherwise, there’s no point. We’re doing more calculations and new ones that haven’t been done before. There is more on the chip—more functions—so the size is bigger. Therefore the compute resources required go up. But that’s exactly where ongoing R&D for capacity-oriented focus comes in. The burden is on us to go on doing things that would keep the need for additional resources somewhat in check. And the examples of this are obvious ones: We fine tune the algorithms, we put in more abstractions so that even though the size of the design and the complexity of the design is going up by 30% or 40% the runtime total turnaround time is not going up as much.”

Further, Krishnamoorthy said that at the algorithmic level basically everything has to be done in parallel to deal with this complexity. “You also need to think about how to have a methodology of implementing a design which does not expose all the complexity up front. For example, with double patterning, which can be notoriously complex, if you don’t abstract it to a certain extent (with some sort of prevention rules earlier in the flow so that you do correct-by-construction design earlier and then leave a lot of the complexities until later in the flow where you may have fewer errors to clean up), you could have an intractable problem. This calls for a combination of parallel algorithms and also being smart about how and where to introduce the complexity, because the way you construct the design can result in a lot more problems being manifested. How you do the prevention, how you lay the wires down initially, how you place the gates initially has a lot to do with how many problems you have to fix towards the end.”

For example, on the double patterning layers, if you try and keep the routing mostly in the preferred direction versus allowing a lot of same-layer jogging, you inherently create fewer double patterning violations. That’s not completely practical because your libraries are still as complex as they were at 28nm with all sorts of funky shapes, but if you have a very high penalty on jogging or on non-preferred direction routing on the double patterning layers, you tend to not create as many violations to start with and as a result you have fewer issues to fix as you go towards the end of the flow, he added.
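The jogging penalty described above can be sketched as a routing cost function. This is a minimal illustration, not any vendor's router: the layer table, the penalty multipliers and the function names are all hypothetical, and via costs are ignored.

```python
# Hypothetical layer table: name -> (preferred direction, is double-patterned).
LAYERS = {
    "M2": ("H", True),
    "M3": ("V", True),
    "M6": ("H", False),
}

# Assumed cost multipliers for non-preferred-direction (jog) segments.
JOG_PENALTY_DOUBLE_PATTERNED = 10.0
JOG_PENALTY_DEFAULT = 2.0


def segment_cost(layer, direction, length):
    """Cost of one wire segment; jogs on double-patterned layers cost the most."""
    preferred, is_double_patterned = LAYERS[layer]
    if direction == preferred:
        return float(length)
    if is_double_patterned:
        return length * JOG_PENALTY_DOUBLE_PATTERNED
    return length * JOG_PENALTY_DEFAULT


def route_cost(segments):
    """Total cost of a route given as (layer, direction, length) tuples."""
    return sum(segment_cost(layer, direction, length)
               for layer, direction, length in segments)


# Two candidate routes with the same total wire length of 14 units.
straight = [("M2", "H", 10), ("M3", "V", 4)]
jogged = [("M2", "H", 6), ("M2", "V", 4), ("M2", "H", 4)]

print(route_cost(straight), route_cost(jogged))
```

With a penalty this steep, a cost-driven router would choose the route that changes layers over the one that jogs on M2, which is the behavior that avoids creating double patterning violations in the first place.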

Ready for finFETs
All of the preparation work between the foundries and the EDA companies is already afoot and it promises to bring significant gains in performance and in power to the flow, Krishnamoorthy noted. “In terms of impact on place and route, the biggest impact is really at the standard cell level. So being able to model the performance characteristics and power characteristics of these cells accurately, being able to satisfy some of the placement requirements of the cells in light of the fact that they’re going to be finFET transistors. There are definitely going to be some requirements at the placement level and also on the timing analysis level.”

When it comes to 3D stacking of logic-on-logic, it is a different story for the moment.

Herb Reiter, president of EDA 2 ASIC Consulting, said memory suppliers already are using 3D memory stacks in various places because it’s very simple to stack memory. “You really don’t need any intelligence in regard to place and route for how to partition your structure. It’s a very regular structure. Every die level is identical so they just need to assign elevator shafts—basically the TSV chains—through their whole stack and they are done. No tools necessary at all. No place and route intelligence required at all.”

The next level of sophistication would involve putting a memory die on top of a logic die, which doesn’t require TSVs on the SoC or on the memory. This way, you get enormous bandwidth between the logic and the memory, and when you design a logic die you need to keep in mind what the best way of communicating with the memory is. That could be a Wide I/O standard you are complying with, or whatever else is appropriate for this particular application. Again, that’s no new challenge for place and route tools, he explained.

“Now, looking at partitioning logic into different levels can turn out to be quite a challenge because you want, for example, to keep the clock on one layer and only put combinatorial logic on the upper layer so that you don’t have a clock spine on every die and burn a lot of heat by running clock spines more than necessary, because you know the clock spine consumes a lot of energy in a logic chip. If you try to be smart and keep the sequential elements that need to have a clock on one layer and the combinatorial elements on the other layer, you save yourself from running a clock spine through the die and keep the power dissipation of the two die or three die combination fairly low. To my knowledge, today’s place and route tools from Synopsys, Cadence, and Mentor are not really designed to do this,” Reiter noted.
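The partitioning idea Reiter outlines, clocked elements on one die and combinatorial logic on the other, can be sketched in a few lines. Everything here is hypothetical: the cell types, the tier names and the function are illustrative only, and a real partitioner would also have to weigh timing, area balance and the number of inter-die connections.

```python
from collections import defaultdict

# Hypothetical set of cell types that need a clock.
SEQUENTIAL_TYPES = {"DFF", "LATCH"}


def assign_tiers(cells):
    """Put clocked cells on the bottom die and combinatorial cells on the top die."""
    tiers = defaultdict(list)
    for name, cell_type in cells:
        tier = "bottom" if cell_type in SEQUENTIAL_TYPES else "top"
        tiers[tier].append(name)
    return dict(tiers)


# Illustrative netlist fragment as (instance name, cell type) pairs.
cells = [("u1", "DFF"), ("u2", "NAND2"), ("u3", "INV"), ("u4", "LATCH")]
tiers = assign_tiers(cells)
print(tiers)
```

Because only the bottom tier holds clocked cells in this sketch, a clock spine would be needed on that die alone.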

However, to be fair, much with 3D is still in flux and 3D technologies are still emerging. There’s no doubt that when the tools are needed, the EDA providers will deliver an automated solution.

Something else to consider is the heroics that the EDA industry goes through to enable new technologies. “We think the EDA industry has spent about $2 billion so far on 20nm and below enablement,” said Steve Carlson, group director for silicon realization at Cadence. “We’ve spent thousands of man years … and there’s a real growing ROI issue for the EDA companies. If you look at market research that projects design starts and you look at the number of 20nm and below designs in 2013, 2014 and 2015, it adds up to be about 764 projected design starts. Recalling that $2 billion investment, that means that there’s about a $2.5 million per project R&D investment by the EDA companies.”

That’s a lot of money by anyone’s standard.
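The per-project figure Carlson cites can be checked with quick arithmetic. The two inputs below are the numbers from his quote; the variable names are just for illustration.

```python
# Figures quoted by Carlson: total EDA spend and projected 20nm-and-below design starts.
total_investment_usd = 2_000_000_000
projected_design_starts = 764

per_project_usd = total_investment_usd / projected_design_starts
print(f"${per_project_usd / 1e6:.1f} million per projected design start")
```

The division comes out slightly above $2.6 million per design start, in line with the rounded "about $2.5 million" in the quote.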