More Performance At The Edge

Scaling is about to take on a whole different look, and it's not just from shrinking features.


Shrinking features has been a relatively inexpensive way to improve performance and, at least for the past few decades, to lower power. While device scaling will continue all the way to 3nm and maybe even further, it will happen at a slower pace. Alongside that scaling, though, different approaches are on tap to ratchet up performance, even for chips developed at older nodes.

This is particularly important for edge devices, which will be called on to pre-process an explosion of data. Performance improvements there will come from a combination of more precise design, less accurate processing for some applications, and better layouts using a multitude of general-purpose and specialized processors. There also will be different packaging options, which will help shorten the physical distance between processors and both memory and I/O. And there will be improvements in memory to move data back and forth faster using less power.

The fundamental equation at the edge is less circuitry for signals to travel through, fewer bottlenecks for those signals at various junctions in a system, and much better interaction between software and hardware. Hardware-software co-design has been an on-again, off-again topic of discussion since mainframe days, when the real challenge was getting applications to run consistently without rebooting an entire machine. Intel and Microsoft improved on this with Windows on x86 processors, particularly with the introduction of Windows NT, where applications could be written to an application programming interface without crashing the operating system. That was a major step forward, but it led to increasingly bloated applications, and the cheapest fix was a process shrink for processors and DRAM rather than a better way to write software.

That problem is only now starting to be addressed. A first step in that direction is software-defined hardware. But even with better alignment between hardware design and the software that runs on it, the real performance killer is an endless series of security patches and feature updates.

What’s needed, particularly in the age of machine learning and AI, is hardware-software co-design, where the algorithms are much more transparent and flexible, and where the hardware can adapt to changes easily without massive amounts of margin or a fully programmable solution. Software is good for flexibility, but it’s slow compared to hardware, and it tends to grow over time as it amasses more patches. Hardware is much faster, but it’s fixed, and programmable hardware is not nearly as efficient. There is room for improvement everywhere, and the best solutions will require collaboration on all sides.

In addition to all of these steps, three things have to happen.

First, margin needs to be reduced in designs. That requires more and better modeling and simulation of everything from process variation to circuit aging, and it requires much tighter integration of the various tools and processes used to develop chips. This can be done more effectively if everything isn’t integrated onto a single planar die, but the goals generally are the same: shorten the distance that signals need to travel, and split functionality across different IP, possibly hardened IP. That minimizes contention for resources as well as the number of possible interactions, which in turn means less circuitry is needed for worst-case scenarios.

Second, security needs to be part of an overall system architecture so that patches don’t bog down the functionality of the rest of the system. And all patches, no matter what they address, should add no net code, or at least that should be the goal. If OEMs can demand zero defects in hardware, they also should be demanding zero increase in the number of lines of software. Code pruning should be part of the update process. The more code that is added, the greater the chance of interactions that can slow or crash a system, and that in turn has a major effect on how well the hardware functions and how long a battery lasts between charges.

Third, AI and machine-learning inferencing algorithms need to be written in a way that both hardware and software engineers can understand, because they need to be able to adjust the weightings based on probabilities to get accurate results. Some of this is available to end users now, particularly in security applications, but it needs to be transparent to systems companies.

Unlike in the past, just increasing transistor density is not going to provide performance gains of 30% or more every couple of years. Those improvements increasingly are measured across a system, and that requires changes at every level: prioritizing which parts need more performance and what resources it will take to achieve that, as well as defining an overall architecture for how those parts fit together and interact to handle an increasing volume and diversity of data.
