Routing Congestion Returns

More third-party IP, more wires and more features are playing havoc with designs.


By Ed Sperling
Routing congestion has returned with a vengeance to SoC design, fueled by more third-party IP, more memory, a variety of new features, and the inability to scale wires at the same rate as transistors.

This is certainly not a foreign concept in IC design. The market for place-and-route tools was driven largely by the need to automate this kind of operation and to prevent problems such as electromigration, electromagnetic interference, crosstalk, and short circuits. But add in more features, more wires, and IP that must connect to a wide variety of devices, pack all of it densely around memory, and then shrink the feature sizes to 28nm or 20nm, and suddenly those problems have returned.

“The primary cause is the huge number of IPs on the same die,” said Taher Madraswala, vice president of engineering for physical design at Open-Silicon. “Everyone is integrating IP rather than writing code. So we now have multiple interfaces, video codecs and ARM subsystems, along with homegrown embedded controllers. It’s the connection of the IP that is causing the congestion. We’re now doing a design with 37 million placeable instances. It’s like a big switchboard. It’s a massive cross connect.”

This is particularly true around the memories, which occupy a significant portion of the real estate on SoCs. The primary reason for this clustering approach is performance. Distance from memory is a critical component of that. The shorter the distance, the lower the latency.

“There’s an increase in the number of blocks of IP, more wires, and everything is talking to memory,” said Kurt Shuler, vice president of marketing at Arteris. “The area around the memory controller is very much in demand. That’s why we have three levels of cache on some chips, and there is talk about a fourth. So you need cache coherency to make it all work together, and that adds to the congestion. If N equals the number of things on a chip, the rule of thumb is that routing congestion increases faster than N. It’s combinatorial.”
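Shuler's rule of thumb can be sketched in a few lines. This is an illustrative model, not anything from the article: if every one of N blocks can potentially talk to every other block, the number of point-to-point connections grows as N(N-1)/2, which outpaces N itself as the block count climbs.

```python
def potential_connections(n_blocks: int) -> int:
    """Worst-case point-to-point links among n_blocks fully connected blocks."""
    return n_blocks * (n_blocks - 1) // 2

# Doubling the number of blocks roughly quadruples the potential wiring.
for n in (8, 16, 32, 64):
    print(n, potential_connections(n))
# 8 -> 28, 16 -> 120, 32 -> 496, 64 -> 2016
```

Real interconnect is far sparser than a full crossbar, but the direction of the trend is the point: congestion scales faster than the thing count.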

What to do
What’s interesting about the most advanced process nodes is that they’re being treated as distinct problem sets rather than as evolutionary steps. The list of things that have to be done at 28nm is different from the list of problems that need to be addressed at 20nm, for example, and routing congestion is no exception.

“At 28nm, the routing is already complex but there are tools to take advantage of it,” said Pravin Madhani, general manager for place and route at Mentor Graphics. “There are a couple of reasons for this. One is that we have more pessimistic DRC rules. The second is lithography patterns, which create yield and DFM issues. The router has to worry about the lithography patterns, and more complexity means more complex lithography patterns.”

At 28nm, this requires concurrent steps for determining the via shape and the number of metal layers it connects, modeling signal integrity and electromigration, as well as the traffic patterns in and out of memory. At 20nm, it also requires a new set of rules, via bars and special rules for double patterning.

“The router needs to honor the colors (red and blue) and worry about following the rules,” said Madhani. “You need to enhance the database to follow the colors so the tools are able to deal with that.”
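The coloring idea Madhani describes can be illustrated with a toy model (hypothetical, not Mentor's implementation): polygons that sit too close together to print on one mask become edges in a conflict graph, and the layout is legal for two masks only if that graph can be two-colored, i.e., is bipartite. An odd cycle of conflicts means no legal red/blue assignment exists.

```python
from collections import deque
from typing import Optional

def two_color(conflicts: dict[str, list[str]]) -> Optional[dict[str, int]]:
    """Assign red (0) or blue (1) to each polygon so that no two conflicting
    polygons share a mask. Returns None if an odd conflict cycle makes a
    two-mask assignment impossible."""
    colors: dict[str, int] = {}
    for start in conflicts:
        if start in colors:
            continue
        colors[start] = 0
        queue = deque([start])
        while queue:  # breadth-first traversal of the conflict graph
            node = queue.popleft()
            for nbr in conflicts.get(node, []):
                if nbr not in colors:
                    colors[nbr] = 1 - colors[node]  # opposite mask
                    queue.append(nbr)
                elif colors[nbr] == colors[node]:
                    return None  # odd cycle: two masks are not enough
    return colors

# A chain of conflicts is fine; a triangle of mutual conflicts is not.
print(two_color({"a": ["b"], "b": ["a", "c"], "c": ["b"]}))
print(two_color({"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"]}))
```

A production router works on geometry rather than a prebuilt graph, and it must also resolve conflicts by moving shapes, but the underlying legality check is this bipartiteness test.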

Trouble ahead
Until 65nm, most design teams never even considered electromigration. At 28nm it has become a huge problem because the gates remain fast but the wires are slow. Wires don’t shrink like transistors, and they add resistance in a way similar to water running from a fat pipe into a thin pipe, or highway traffic merging from five lanes to two. The solution is localized spreading of wires—which have to be kept at a minimum distance from each other—and that creates local hot spots.
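The pipe analogy falls directly out of the resistance formula. Here is a back-of-the-envelope sketch with assumed dimensions (the wire sizes are hypothetical, not figures from the article): resistance R = rho * L / (W * T), so shrinking a wire's width and thickness while keeping its length raises resistance sharply.

```python
RHO_CU = 1.7e-8  # ohm-meters, bulk copper resistivity (ignores grain/surface scattering)

def wire_resistance(length_m: float, width_m: float, thickness_m: float) -> float:
    """Resistance of a rectangular wire: R = rho * L / (W * T)."""
    return RHO_CU * length_m / (width_m * thickness_m)

# Hypothetical 1mm wires at two wire geometries:
r_wide = wire_resistance(1e-3, 50e-9, 100e-9)   # older, fatter wire
r_thin = wire_resistance(1e-3, 32e-9, 64e-9)    # scaled-down wire
print(r_wide, r_thin, r_thin / r_wide)  # resistance grows ~2.4x for this shrink
```

Gate delay improves with each shrink while wire delay, dominated by this rising resistance, does not, which is why routers spread wires apart and create the local hot spots described above.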

The easy solution is to keep down the number of wires, but that’s becoming impossible. Even built-in self-test (BiST), which is seen as a breakthrough for testing IP and hard-to-reach embedded circuitry, adds wires. Open-Silicon’s Madraswala said that on a large die, BiST can add up to 30% more gates.

“It gets even more complicated than that,” said Madraswala. “In a lot of ASICs there are now power islands, but you can’t share BiST controllers across power islands. So that means even more wires. And if you go vertical with all of this stuff, you have blocks of wires, so you have a new level of congestion.”

This is good news, of course, for EDA vendors. Mentor, Synopsys, and Cadence all have place-and-route tools, which are seeing new life in what five years ago was considered a relatively stagnant market. It’s also good news for network-on-chip vendors such as Arteris and Sonics, which have seen their businesses grow as congestion becomes more of a problem.

“There are a few ways to get around congestion right now,” said Arteris’ Shuler. “One way is to put in a big switch or crossbar. But you still need to stretch and squeeze those switches.”

Stacking effects and new options
As stacking of die becomes more common over the next couple of years, routing congestion literally will take on a new dimension. Traffic will flow across a die, but it will also flow above and below a die. That doesn’t relieve routing congestion; it makes the problem considerably more complex.

“With a stack, you have to worry about timing and what is above and below and on adjacent layers,” said Mentor’s Madhani. “You need to ensure the router stops all violations. That means more rules and more complications. It raises a lot of challenges for EDA.”

So far, EDA vendors have resisted pressure from chipmakers to adopt diagonal routing, and polygons still are laid out in one direction. But with additional challenges such as dynamic IR drop, which requires decoupling capacitors (and therefore more wires), even more IP, and now stacking complexity, that may change. For the first time, place and route actually is becoming a competitive advantage.

“We won one design just on the basis of solving the routing congestion,” said Open-Silicon’s Madraswala. “It wasn’t cost or closing timing. It was routing.”