Design complexities at advanced nodes are propelling new thinking on how interconnects should be addressed.
By Ann Steffora Mutschler
Chip interconnect protocol requirements are evolving as designs move to 20nm and below process geometries, and not always in predictable ways.
At least part of this is being driven by what an SoC is used for. The continued push to shrink features opens up real estate at each new process node. For the past decade, that real estate has been used to add more features onto a single die, but the emphasis on how to use that space appears to be shifting again.
“What people want to do right now is mostly run more applications,” said Drew Wingard, CTO at Sonics. “They want to make these consumer electronics devices more computing devices, so the convergence pendulum has swung back in the direction of compute and graphics. A technology that’s been labeled as being helpful to make it possible to program these parallel computers—these multi-core machines—is cache coherence. Cache coherence definitely changes the protocols on chip. There’s a lot of extra signaling that is required.”
The basic concept behind cache is that data is stored closer to the processor for faster access. Cache coherence allows copies of that data to be stored in multiple places. But to remain coherent, the data has to be updated regularly everywhere it is stored, and that means the interconnect has to keep up with this whole process.
“So suddenly, instead of just talking to memory you’re talking to local memories, and those local memories are talking to other people’s local memories to try and make sure whenever you need something you’ve got the right version,” Wingard said. “That has a big impact on what happens at the interconnect fabric level on these chips.”
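To make that extra signaling concrete, the toy Python model below sketches a snooping bus shared by two caches: a write to a shared line broadcasts invalidations, and a later read has to snoop the dirty copy back out. The class and message names here are invented for illustration and do not correspond to ACE or any other real protocol.

```python
# Minimal MSI-style snooping model, purely illustrative. The names are
# invented for this sketch and are not ACE (or any real protocol) signals.

class Cache:
    def __init__(self, name, bus):
        self.name = name
        self.lines = {}          # address -> ("M" | "S", data)
        self.bus = bus
        bus.attach(self)

    def read(self, addr):
        if addr not in self.lines:
            # Miss: snoop the other caches before falling back to memory.
            data = self.bus.snoop_read(addr, requester=self)
            self.lines[addr] = ("S", data)
        return self.lines[addr][1]

    def write(self, addr, data):
        # Every write to a shared line broadcasts an invalidation --
        # this is the "extra signaling" coherence adds to the fabric.
        self.bus.invalidate(addr, requester=self)
        self.lines[addr] = ("M", data)

class CoherentBus:
    def __init__(self, memory):
        self.caches = []
        self.memory = memory
        self.messages = 0        # count coherence traffic on the fabric

    def attach(self, cache):
        self.caches.append(cache)

    def snoop_read(self, addr, requester):
        for c in self.caches:
            if c is not requester and addr in c.lines:
                self.messages += 1
                state, data = c.lines[addr]
                if state == "M":              # write back the dirty copy
                    self.memory[addr] = data
                    c.lines[addr] = ("S", data)
                return data
        return self.memory.get(addr, 0)

    def invalidate(self, addr, requester):
        for c in self.caches:
            if c is not requester and addr in c.lines:
                self.messages += 1
                del c.lines[addr]

memory = {0x100: 42}
bus = CoherentBus(memory)
cpu, gpu = Cache("cpu", bus), Cache("gpu", bus)
cpu.read(0x100)                  # both caches pull in the same line
gpu.read(0x100)
cpu.write(0x100, 43)             # forces an invalidation at the GPU
print(gpu.read(0x100), bus.messages)   # prints 43 and the message count
```

Even in this stripped-down model, one shared variable generates several coherence messages on the fabric before either agent can trust its local copy, traffic that simply does not exist in a non-coherent design.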
By far the most commonly discussed interconnect fabrics at this point are the ACE (AXI Coherency Extensions) protocols from ARM, which are seeing substantial adoption. Still, many more companies appear to be taking a wait-and-see attitude, trying to determine how much performance benefit and how much simplification of the overall software the extensions actually deliver.
A second impact on chip interconnect protocols with the move to smaller geometries is cost management—or at least cost containment. “Chips have gotten so expensive that everyone tries different ways of managing this cost,” he continued. “What everyone would like to do is get more sub-applications out of one design. You can do that a couple of ways. You can overdesign a chip, but that’s really expensive. Another thing you can do is start to look at packaging technologies, and you can build the core of this platform and personalize it by adding other chips around the outside. With 2.5D packaging you can do some pretty interesting stuff there. You have the computer part and then you’ve got different kinds of I/O parts which go around the outside. It leads you to wonder, as you get to 3D integration in a more general sense, whether you’ll be partitioning these systems across multiple die in the stack, which could help from a power perspective. It’s certainly better than trying to go through bond wires on boards.”
Getting physical
From a physical perspective the protocols don’t change too much, because this is all digital RTL and it’s synthesized down to whatever the physical library is.
“The actual protocols, the language that the IP is speaking at the digital level (the 1s and 0s), doesn’t change at all,” noted Kurt Shuler, vice president of marketing at Arteris. “As you go smaller and smaller the big question is, ‘If I use TSMC this.this.this [process], can I just shrink my chip? Can I use exactly the same RTL and netlist as my previous chip and shrink it?’”
The answer isn’t always clear, because as the wires get closer and closer to each other there are all kinds of unexpected physical effects. At 10nm and beyond, those effects actually begin to merge with quantum effects.
“You’ll be able to take the same RTL through the front end, but all the back end stuff is different,” Shuler said. “The big question for a lot of people is, ‘Given how we currently do our front-end stuff, is that still going to work on the back end?’ From a physical standpoint, given that the wires don’t scale down at the same rate as the transistors, even though you’re getting more transistors in a certain amount of die area it doesn’t necessarily mean you can wire them up at that dimension, that way.”
Jeff Scott, principal SoC architect at design services provider Open-Silicon, said at least part of this is predictable. “For interconnect, going to 20nm it’s the same old story, where we are seeing more and more integration, more IPs on one chip, more bus masters trying to initiate traffic through the interconnect and more resources competing for the memory. It’s still a lot about managing IP access to memory. That’s always the bottleneck. All the different IPs have to traverse the interconnect and access memory at some point, typically.”
The goal is to manage that traffic according to the priority, latency and throughput requirements of the various IPs. Some are latency-sensitive, he said. Others need a certain amount of throughput.
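As a rough sketch of what that traffic management can look like, the illustrative arbiter below always grants a latency-sensitive master first and then rotates through the remaining masters in proportion to assigned weights. The master names and the weighting scheme are hypothetical rather than any shipping interconnect’s QoS mechanism.

```python
# Illustrative arbiter only: a latency-sensitive master (e.g. a display
# controller) always wins arbitration, while the remaining bandwidth is
# shared among throughput-oriented masters by weight. All names and the
# weighting scheme are hypothetical.

from collections import deque

class Arbiter:
    def __init__(self):
        self.queues = {}              # master name -> pending requests
        self.latency_sensitive = set()
        self.schedule = []            # weight-expanded rotation order
        self.ptr = 0

    def add_master(self, name, latency_sensitive=False, weight=1):
        self.queues[name] = deque()
        if latency_sensitive:
            self.latency_sensitive.add(name)
        else:
            self.schedule.extend([name] * weight)

    def request(self, name, payload):
        self.queues[name].append(payload)

    def grant(self):
        """Pick the next request to forward toward memory."""
        # 1. Latency-sensitive masters preempt everyone else.
        for name in self.latency_sensitive:
            if self.queues[name]:
                return name, self.queues[name].popleft()
        # 2. Otherwise rotate through the weighted schedule, so a master
        #    with weight 3 gets roughly three grants for every grant of a
        #    weight-1 master when both are busy.
        for _ in range(len(self.schedule)):
            name = self.schedule[self.ptr]
            self.ptr = (self.ptr + 1) % len(self.schedule)
            if self.queues[name]:
                return name, self.queues[name].popleft()
        return None

arb = Arbiter()
arb.add_master("display", latency_sensitive=True)
arb.add_master("gpu", weight=3)
arb.add_master("dma", weight=1)
arb.request("gpu", "rd A")
arb.request("dma", "rd B")
arb.request("display", "rd C")
print(arb.grant())   # ('display', 'rd C') -- latency-sensitive wins first
print(arb.grant())   # ('gpu', 'rd A')     -- then the weighted rotation
```

Real interconnects typically layer more machinery on top of this, but the basic tradeoff is the same: bound the latency of the masters that cannot wait without starving the ones that need sustained throughput.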
What lies ahead
What also has changed in the interconnect protocol space, at least from an architecture perspective, is the nature of the components being put together. They’re much more of a hybrid combination, observed Pranav Ashar, CTO at Real Intent. “You have to be able to connect up CPUs with graphics processors, with all sorts of interface blocks, and so on. These components have different characteristics, so the lower-level details of the arbitration and the circuit-level information have to be abstracted out. Going forward, the circuit-level details of the interconnect chassis are being abstracted out. It is sort of like the Internet, where you have layers on layers of protocols. At the lowest level you have the physical layer, but the software and all of the components that connect into the Internet don’t talk at that level. They talk at a much higher level, and layers above that. A similar thing is happening in the hardware space, so the components are being allowed to talk at the higher level of abstraction. This is becoming almost a requirement to be able to manage the complexity of the SoCs that are being designed.”
Consider, for example, Intel’s entry into the SoC space. It was prefaced by the announcement of the IOSF (Intel On-Chip System Fabric) chassis, a protocol that various components can use to connect at a higher level of abstraction.
Ashar believes that connecting components through a chassis is going to become the only way to do an SoC in the future. “Today it’s happening somewhat, but a lot of stuff is being done on the fly. As a result you have a lot of different interfaces on a chip, and chip-to-chip verification is a big challenge. Maybe one way to mitigate that is going to be to standardize these interfaces and to mandate even—at the loss of some performance maybe—that all components plug into these standardized interfaces. Then the verification becomes a one-stop thing rather than having to go into each different interface on the chip separately. Maybe some of those processes and design styles are going to have to be brought to bear to control the complexity. I don’t exactly know what 3D is going to bring, but it’s clearly going to bring a lot more interfaces on the chip, and the memory hierarchy is going to be more fragmented. As a result, issues like cache coherency can only get harder.”
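In software terms, the chassis idea Ashar describes looks something like the sketch below: every block, whatever is inside it, exposes one transaction-level contract to the fabric, so the fabric and its verification only ever deal with that contract. The interface and block names are invented for this example; they are not IOSF, ACE or any other real specification.

```python
# Sketch of the "standard socket" idea: each block plugs into the fabric
# through a single transaction-level interface. Everything here is a
# made-up illustration, not a real protocol definition.

from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Transaction:
    kind: str        # "read" or "write"
    addr: int
    data: int = 0

class FabricTarget(ABC):
    """The one contract every block exposes to the interconnect."""
    @abstractmethod
    def handle(self, txn: Transaction) -> int: ...

class ScratchpadRAM(FabricTarget):
    def __init__(self):
        self.mem = {}
    def handle(self, txn):
        if txn.kind == "write":
            self.mem[txn.addr] = txn.data
            return 0
        return self.mem.get(txn.addr, 0)

class RegisterBlock(FabricTarget):
    def __init__(self):
        self.regs = {0x0: 0xABCD}     # e.g. an ID register
    def handle(self, txn):
        return self.regs.get(txn.addr, 0) if txn.kind == "read" else 0

class Fabric:
    """Routes by address range; knows nothing about what a target is."""
    def __init__(self, address_map):
        self.address_map = address_map     # list of (base, size, target)
    def issue(self, txn):
        for base, size, target in self.address_map:
            if base <= txn.addr < base + size:
                local = Transaction(txn.kind, txn.addr - base, txn.data)
                return target.handle(local)
        raise ValueError("address not mapped")

fabric = Fabric([(0x0000, 0x1000, ScratchpadRAM()),
                 (0x1000, 0x0100, RegisterBlock())])
fabric.issue(Transaction("write", 0x0010, 99))
print(fabric.issue(Transaction("read", 0x0010)))        # 99
print(hex(fabric.issue(Transaction("read", 0x1000))))   # 0xabcd
```

The point of the abstraction is visible in the Fabric class: it routes by address range and never needs to know whether the target is a memory, a register block or something else entirely, which is what makes the verification “one-stop.”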
All of this is still in the research stage, so it’s hard to say exactly what will happen. Open-Silicon’s Scott noted some interconnect topologies are still evolving. “We’re seeing more networks and less fabric or crossbar-type interconnect, and we are seeing some ring topologies emerging. That’s as much an attempt to manage the on-chip routing as it is the performance and the access to memory.”
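A back-of-the-envelope comparison suggests why rings and networks become attractive as the number of agents grows: a full crossbar keeps every access to a single hop, but its wiring grows with the product of masters and targets, while a ring needs only one link per stop at the cost of extra hops. The numbers below are purely illustrative.

```python
# Toy comparison of interconnect topologies; figures are illustrative only.

def crossbar_paths(masters: int, targets: int) -> int:
    # Every master has a dedicated path to every target.
    return masters * targets

def ring_links(stops: int) -> int:
    # One link between each pair of neighbors around the ring.
    return stops

def ring_avg_hops(stops: int) -> float:
    # Average shortest-path distance between two stops on a bidirectional ring.
    return sum(min(d, stops - d) for d in range(1, stops)) / (stops - 1)

for n in (8, 16, 32):
    print(f"{n} agents: crossbar ~{crossbar_paths(n, n)} paths, "
          f"ring {ring_links(n)} links, avg {ring_avg_hops(n):.1f} hops")
```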
Complexity continues to grow, and there will be more use of memory technologies that can reduce contention for a single point of memory. Multiple types of memory devices are being considered so that memory usage can be divided up across different physical memories to use the interconnect more efficiently. Which approach ultimately wins is a matter of conjecture at this point, but change is a certainty.
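One common way to divide memory usage across physical memories is simple address interleaving, sketched below: spreading consecutive cache lines over several channels lets the interconnect carry them in parallel instead of queuing them at a single controller. The line size and channel count here are arbitrary examples.

```python
# Illustrative address interleaving across multiple physical memories.
# Parameters are arbitrary examples, not a recommendation.

LINE_BYTES = 64          # interleave granularity (one cache line)
NUM_CHANNELS = 4         # e.g. several DRAM channels or stacked memories

def channel_of(addr: int) -> int:
    """Which physical memory serves a given address."""
    return (addr // LINE_BYTES) % NUM_CHANNELS

def channel_offset(addr: int) -> int:
    """Address within that memory once the channel selection is removed."""
    line = addr // LINE_BYTES
    return (line // NUM_CHANNELS) * LINE_BYTES + (addr % LINE_BYTES)

# A streaming master touching consecutive lines ends up on all channels:
from collections import Counter
traffic = Counter(channel_of(a) for a in range(0, 16 * LINE_BYTES, LINE_BYTES))
print(traffic)    # Counter({0: 4, 1: 4, 2: 4, 3: 4})
```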