3 Big Bottlenecks For Design

Power and performance share the same gotchas these days, and all of them involve memory.


Throughout the history of design for ICs, systems and software, bottlenecks have emerged whenever one part of the design evolves more slowly than the next. Frequently that’s because difficult engineering issues haven’t been solved yet in one part of the design. Sometimes they can’t be solved in a reasonable amount of time or for a reasonable amount of money and something else has to take its place, which creates so-called inflection points.

For several process nodes, EDA tools were blamed for not keeping pace, complaints that seem to have miraculously vanished in the past couple of years. The next target is lithography, which at 14/16nm, and especially at 10nm, will slow things down to a crawl. And at 10nm and beyond, design teams will begin to grapple with new materials as quantum effects begin to impact the movement of electrons.

But in addition to those process node-related developments, there are some problems that are node-independent. Because it’s impossible to turn up clock speeds without cooking a chip, and more cores don’t necessarily yield any improvements beyond a certain point, demands for better performance and lower power have fallen on memory. It’s around memory that researchers and engineers are seeking to break the next bottlenecks.

1. Memory Access. The complexity of SoCs even at 40nm is making access to memory more difficult. Think about a combined Los Angeles and Bangkok traffic jam, and then multiply it by an order of magnitude. There are so many simultaneous operations in complex devices that gaining access to a single memory takes an eternity compared to the speed of computation.

The first solution was to bring memory on chip, avoiding the slow and power-hungry interfaces that have to cross a PCB. The parasitics associated with those board traces make signal integrity a nightmare. But there is limited space on chip, so only the most important memory can be fully integrated. DRAM is also relatively slow, so those memories tend to be replaced with SRAM, which is faster but takes up more area per bit. Even then, memory can still be a point of congestion in the system. The next step was to scatter multiple memories around the SoC so that individual blocks could be given private access.

Even that isn’t enough these days. The latest approaches, notably in software, treat access to memory in terms of events. ARM has proposed this with its mbed technology, which basically groups memory access requests together through a software scheduler.

Rather than relying on the hardware alone to process requests, the software takes a more active role in bundling them. As long as the requests can be logically grouped and later parsed, the impact on both performance and power can be significant.
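To make the bundling idea concrete, here is a minimal sketch in Python. It is not ARM's mbed API, and nothing here is taken from it. The function names and the 64-byte burst size are assumptions chosen purely for illustration. The sketch groups pending requests by the aligned burst that contains each address, then compares how many memory transactions are needed with and without grouping.

```python
from collections import defaultdict

# Assumed burst size in bytes; real memory interfaces vary.
BURST_BYTES = 64


def coalesce(addresses, burst_bytes=BURST_BYTES):
    """Group byte addresses by the aligned burst that contains them.

    Returns a dict mapping each burst base address to the list of
    requested addresses that fall inside that burst.
    """
    bursts = defaultdict(list)
    for addr in addresses:
        base = addr - (addr % burst_bytes)  # round down to burst boundary
        bursts[base].append(addr)
    return dict(bursts)


def transactions_unbatched(addresses):
    """One memory transaction per request, as if nothing were grouped."""
    return len(addresses)


def transactions_batched(addresses, burst_bytes=BURST_BYTES):
    """One memory transaction per distinct burst after grouping."""
    return len(coalesce(addresses, burst_bytes))


if __name__ == "__main__":
    # Hypothetical queue of pending requests (byte addresses).
    pending = [0, 8, 16, 64, 72, 4096, 4100, 24]
    print("unbatched:", transactions_unbatched(pending))
    print("batched:  ", transactions_batched(pending))
```

A real scheduler would also have to respect ordering, deadlines and read/write hazards, all of which this sketch ignores. It only shows why grouping requests that land in the same region can cut the number of trips to memory.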

2. Interconnects. Getting signals to and from memory, to I/O and to processors used to be rather straightforward. That’s not the case anymore. And while the problem gets worse at each new node, it’s also present at every node to some degree, even at established nodes as companies seek to build more complexity into those designs.

Interconnects are getting smaller, wires are getting longer, and there are predictions that at 10nm they may become much more difficult to design effectively and begin to impact both performance and the amount of power required to drive signals. This has led to much activity around 2.5D and 3D ICs in recent months. Shorter wires and fatter pipes, not to mention less worry about developing analog at new process geometries, are generating new momentum around these packaging approaches. They are also creating new issues involving who takes responsibility if known good die don’t work together as planned.

3. Cache. While cache coherency has received much attention in the multi-core world, caching strategies are less obvious.

Cache is a form of memory that is closer to the processor. It’s typically limited in size, though, to maximize performance and minimize power, which is why there are multiple levels of cache. And as wires stretch out longer and thinner at advanced nodes, there is always talk of adding yet another level of cache.

Cache typically acts like a bucket. When it’s full, it gets dumped and refilled. That may sound like the best way to keep data updated, but it’s not always the fastest and most efficient because having to wait until the cache empties can impact both performance and power. Rethinking how data gets cached is an enormously difficult task, though, which is why companies are only now beginning to address it.
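The cost of the bucket approach can be illustrated with a toy simulation. The sketch below compares a cache that is emptied whenever it fills with one that evicts only the least recently used entry. Both models, the function names and the access trace are assumptions made for illustration, and neither is meant to describe any particular processor's cache.

```python
from collections import OrderedDict


def hits_flush_when_full(trace, capacity):
    """Count hits for a cache that is emptied whenever it fills up."""
    cache = set()
    hits = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            continue
        if len(cache) >= capacity:
            cache.clear()  # dump the whole bucket
        cache.add(addr)
    return hits


def hits_lru(trace, capacity):
    """Count hits for a cache that evicts the least recently used entry."""
    cache = OrderedDict()
    hits = 0
    for addr in trace:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
            continue
        if len(cache) >= capacity:
            cache.popitem(last=False)  # evict the oldest entry only
        cache[addr] = True
    return hits


if __name__ == "__main__":
    # Hypothetical trace: addresses 1 and 2 are reused often,
    # while 3, 4 and 5 are each touched once.
    trace = [1, 2, 3, 1, 2, 4, 1, 2, 5, 1, 2]
    print("flush when full:", hits_flush_when_full(trace, capacity=3))
    print("LRU eviction:   ", hits_lru(trace, capacity=3))
```

On this trace the flush-when-full model keeps throwing away the two addresses that are reused most, so they have to be fetched again after every dump. Evicting selectively keeps them resident. Which policy wins in practice depends heavily on the workload, which is part of why rethinking caching is so difficult.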

Memory is a complex issue, and the various levels and types of memory, the best ways to access that memory, and the best approaches to improve performance and maximize energy efficiency are the subject of much debate at every level of system design. As designs begin to encounter the limits of physics, though, it’s also the place where the biggest gains—and losses—will be realized.