Lower Power Plus Better Performance

As new compute and chip architectures roll out, classic tradeoffs are changing.


The tradeoff between power and performance is becoming less about one versus the other, and more about a dual benefit, as new computing and chip architectures begin rolling out.

Neural networking, one of the hot topics for any system that relies on many distributed sensors, is essential for getting an accurate picture of what is happening around a car moving down the highway at 65 miles per hour, or in an industrial setting where any hiccup in an assembly line or business process can result in costly downtime. But as these systems are built, it’s also becoming obvious that at least some of their logic needs to reside much closer to the data being collected.

The big revelation in a big data world, and even with somewhat smaller amounts of data, is that it takes much less energy to pre-process data close to where it is created. Google started this with pre-fetch in search. Underlying all of this is the idea that it’s much faster, as well as more energy-efficient, to send small batches of relevant or “clean” data than large quantities of “dirty” data.
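As a rough illustration of pre-processing near the source, the sketch below reduces a window of raw sensor readings to a small summary before anything is sent upstream. Everything in it is an assumption made for illustration: the function name `summarize_window`, the temperature-like values, the thresholds, and the use of JSON size as a stand-in for transmission cost. It is not drawn from any particular product or from the white paper mentioned below.

```python
import json
from statistics import mean


def summarize_window(readings, low, high):
    """Reduce a window of raw readings to a compact, 'clean' summary.

    Hypothetical helper: keeps the sample count, the mean, and only
    those readings that fall outside the [low, high] range.
    """
    anomalies = [r for r in readings if r < low or r > high]
    return {
        "count": len(readings),
        "mean": round(mean(readings), 2),
        "anomalies": anomalies,
    }


# Synthetic data (assumed): 1,000 readings near 20.0 with one outlier.
raw = [20.0 + (i % 5) * 0.1 for i in range(1000)]
raw[500] = 35.7

summary = summarize_window(raw, low=15.0, high=25.0)

# Serialized size is used here as a crude proxy for data-movement cost.
raw_bytes = len(json.dumps(raw))
summary_bytes = len(json.dumps(summary))

print(f"raw payload:     {raw_bytes} bytes")
print(f"summary payload: {summary_bytes} bytes")
```

The point of the sketch is only the shape of the tradeoff: a small amount of compute at the sensor replaces a large transfer with a small one. How much is saved in practice depends on the data and on what the upstream system actually needs.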

Data centers, which arguably are like giant neural network brains, have been wrestling with this problem for at least a couple of years. Rambus released a white paper last November talking about updating the von Neumann execution model by moving processing as close to the data (and the memory) as possible, rather than moving the data to the processing. There is simply too much data to move efficiently, and physically moving it takes time and energy.

What’s changed is that this kind of computing architecture is not just about making tradeoffs between performance and power, which has been a fundamental decision in device scaling. There are now improvements to be made in every direction, even security. It’s easier to secure large amounts of data in place than in motion, and it’s easier to keep track of smaller amounts of data than large volumes. Think about trying to find something on a messy desk versus a clean one, or trying to locate lost keys somewhere in your office rather than across your entire commute to work.

On the chip side, the push into advanced packaging, including 2.1D (organic interposer), 2.5D (silicon interposer), monolithic 3D and fan-outs, offers the same kinds of benefits at the chip level. By improving throughput with larger pipes and many more of them, as well as moving memory closer to logic, there are huge benefits in performance and lower power. For one thing, data doesn’t have to move as fast to achieve big performance gains because there is less distance to cover, there is less resistance and capacitance involved in moving that data, and it takes less energy to send it back and forth.
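The link between distance, resistance, capacitance and energy can be sketched with a first-order wire model. The symbols here are assumptions introduced for illustration: r and c are the wire's resistance and capacitance per unit length, ℓ is its length, V is the signal swing, and α is the fraction of cycles in which the wire switches.

```latex
\begin{align}
  R_{\text{wire}} &= r\,\ell, \qquad C_{\text{wire}} = c\,\ell \\
  E_{\text{bit}}  &\approx \alpha\, C_{\text{wire}}\, V^{2} = \alpha\, c\,\ell\, V^{2} \\
  t_{\text{wire}} &\propto R_{\text{wire}}\, C_{\text{wire}} = r\,c\,\ell^{2}
\end{align}
```

Under these assumptions, the energy per bit grows roughly in proportion to wire length, while the delay of an unrepeated wire grows with the square of its length. Shortening the path between memory and logic therefore helps on both counts at once.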

Compare this to device scaling down to 5nm or 3nm, where the interconnects will require new materials just to be able to move electrons at a consistent rate, and the distances that data must travel between processors, co-processors, and memory will continue to grow. While it’s possible to make all of this work pretty efficiently with enough time and engineering, the tradeoffs involving power, performance, heat and NRE are becoming much more difficult to get right.

Most industry veterans still long for the days of classical scaling (pre-90nm), when each new shrink was fairly painless and there were automatic performance and power benefits. Even then, there were still tradeoffs between power and performance. With new compute models and architectures, there are still tradeoffs in terms of cost, complexity, and uncertainty, but at least for now there is far less of a performance/power tradeoff. There are gains to be made on all sides, and options for significantly more freedom and creativity along the way.