LDEs emerged at 40nm and are having a larger impact at 28nm and 20nm. They introduce variability into circuit design and affect device performance and characteristics.
By Ann Steffora Mutschler
Designing for today’s advanced semiconductor manufacturing process nodes brings area, speed, power and other benefits but also new performance challenges as a result of the pure physics of running current through tiny wires.
Layout-dependent effects (LDEs), which emerged at 40nm and are having a larger impact at 28nm and 20nm, introduce variability into circuit design and significantly impact device performance and characteristics. As such, these effects must be accounted for during the earliest stages of design, when the chip architect is crafting the architecture. Otherwise, design teams may discover late in the game that their design does not behave as expected.
“As we get to higher levels of integration this idea that the chip architect can live in a world where they are thinking about chip architecture without thinking about what the layout is going to look like – that’s a model that gets tougher and tougher to sustain,” observed Drew Wingard, chief technology officer of Sonics. “As the total amount of things that are being integrated grows, the number of things that have to be connected together grows. The IP blocks and IP-based subsystems being integrated together, and the interconnection networks that put those things together, by their very nature can’t be flat anymore. There has to be a set of them so they are cascaded in some way, and you can do that with cascaded sets of crossbars or you can do that with routers. In doing so the system architect is assuming that there is a cluster of things that are attached to one level of this hierarchy and that they are probably close together on the die, versus something that is in another level of the hierarchy.”
The significance for chip architects comes as they look at the process technology into which they are pushing the design.
“The first question that always comes up is what technology node they are working at and whether the chip architect is going to go with a planar view of the chip, or are they going to be looking at something like 3D-IC where they are going to stack the chip up so they can get their analog in, they can get their digital and they can make sure that everything is going to communicate with each other,” noted Steve Lewis, product marketing director at Cadence Design Systems.
If they are going the planar route, their interest in layout-dependent effects will have to do with the proximity of the blocks they are pulling together: where those blocks are going to land relative to the outer boundaries of the chip, and making sure that blocks landing toward those outer boundaries are not going to be unduly impacted by LDEs, such as well-proximity effects and stress effects, he said.
While there are approximately a half dozen LDEs, the ones most likely to impact the chip architect are the well proximity effect and the stress effect.
“You’re going to need to look at the current flow going through the individual blocks that may be close to the boundary edges. If the current flow is somehow interrupted, the individual block designers may no longer be meeting their specification,” Lewis said. “Wherever those blocks land in the grand scheme of things, you need to make sure that the current flow is going to be as expected by the individual block designer. The chip architect needs to know that. He also needs to know whether or not there is going to be a stress impact, which can cause a hotspot. A stress impact has to do with when multiple blocks are together and, depending again on the node technology that you’re working on, you may start to run into these hotspots because the proximity of the transistors causes them to interact with each other. The overall current flow through them is not what you expected it to be, and the balance sort of gets out of whack.”
Chad Spackman, design manager at Open-Silicon, cited a recent design his team completed that had a very large memory in it. A system architect without physical knowledge might just put that large memory down as something that could be accessed in one cycle, which is the standard protocol for a large memory. However, if it is large enough the memory itself won’t perform. It won’t have an access time that is workable in the system.
“If you have a 40nm geometry with this big memory and it runs at 750 MHz, you have to be able to address that RAM and get the data out of it in 1.3ns, which is not a lot of time,” Spackman said. “So the memory has to be carved up into smaller memories, and finding that size is a somewhat iterative approach in order to come to a size that will allow a given access time.”
In addition to getting memories to respond within an access time that doesn’t chew up the entire clock cycle budget, the chip architect also must deal with the fact that data signals now have to emerge from that tiled memory, and there are bound to be long wires because of the sheer size of the array itself.
“The flight time in these deep submicron geometries is actually the dominant timing effect rather than what it used to be, which was gate delays,” he explained. “So you may have to register these data paths five or six times before you actually get logic that is paying attention to the signals, and you may have to do that with the address going in, too. So all of a sudden this thing that was on the whiteboard and could be accessed in one cycle might now take 10 cycles, but you need data on every clock. That’s an area where, if the architect has physical knowledge, what that person will know is that we are going to have to pipeline this and we are going to have to interleave the jobs. We may not know how deep the pipelines are, but that’s enough information for a designer to go ahead and do the rest of the work and not wind up saying, ‘What the heck was [the chip architect] thinking?’”
Many Approaches
Typically, the chip architect asserts certain things about the physical design and the layout at the time the architecture is chosen, and right now that is the way the flow works.
“But they’re not really in a position to define that, so what you find out is in these designs with the cascaded topologies you really end up in the case where once the floorplan is discovered during the layout phase you find out that your layout can be substantially harder with a cascaded design like a set of crossbars or a NoC (network on chip) than it would have been with something like a single crossbar or a bus or something like that,” said Sonics’ Wingard. “With a single crossbar on a relatively flat design or a bus, we know that everybody is going to talk to everybody else and so we know we’re going to have this mess. But at least it’s a predictable mess. The problem with the cascaded things is, if you imagine you’ve got this set of clusters connected together, then suddenly you start moving the end points of the clusters around. You end up with all these wires that end up crossing each other in a really unpredictable way.”
He suggested an approach that includes an extra form of parallelism within the network, called virtual channels, which isolates the throughput-oriented and efficiency-oriented performance aspects of the network design from the physical topology. Because the arbitration circuits work on a per-virtual-channel basis, contending traffic only competes against other traffic in the same virtual channel.
This also allows chip architects, when they discover the floorplan, to later go back and reconnect the network’s physical topology so it matches the floorplan. This preserves the arbitration and throughput aspects of the design, which is very different from the conventional approach, Wingard said. “With the conventional approach, when you build a deeper network, the behavior of the arbitration circuits is very complicated because at each cascade the behavior changes as you work your way down. So if you move a block from one crossbar or router to another one, then your arbitration behaviors change completely and you end up with a very different system result in which you have to go back and revalidate the whole system performance. That basically means when you try to adapt your network to match the floorplan, you are kind of starting over from the architecture perspective.”
In the case of the Open-Silicon team, Spackman explained, “The way we attack that problem is very early on we will produce a document that we call the ‘ground rules document.’ Included in that document is the average gate delay of a simple NAND gate or a buffer, and then there is a wire flight time, etc. That is the cornerstone of what the architect needs. It used to be that you might do two or three or four designs in a particular geometry so you would get an innate feel for those delays. The ground rules document allows you to take a geometry and have that information as though it was your experience, rather than have you depend on past experience.”
The good news for design teams and chip architects is that new approaches and technologies exist and are being developed by Cadence, Sonics, Arteris and others to attack the problem from a number of angles and in a more automated fashion.