Rethinking Architectures Based On Power

Performance is less of a problem in new devices than power — at least for now.


The newest chips being developed for everything from the cloud to the edge of the network look nothing like designs of even a year or two ago. They are architected for speed, from the throughput of high-speed buses and external interconnects to the customized accelerators and arrays of redundant MACs. But many of these designs have barely scratched the surface for saving power, which will become much more of a concern as the edge takes shape and more computing is done closer to the source of data.

There are several different knobs to turn when it comes to power efficiency, and progress is mixed for each of them. At present, implementing low-power techniques impacts cost, yield, and time to revenue. But over the next year or two new approaches for dealing with power will become much more common, driving down costs as economies of scale begin to kick in and as tools are adapted to deal with those approaches.

Three of these stand out so far, although others will almost certainly enter into designs as more computing is done both in and out of the cloud.

Proximity. One of the big shifts here involves the location of the processor in relation to memory. In-memory and near-memory computing aren’t necessarily more energy-efficient from a processing standpoint, but processing data where it is stored and separating good data from bad close to the source can reduce the amount of data that needs to be moved. That has a huge impact on both power and latency, and the potential for saving power, and for doing more within a given power budget, is unprecedented.

Precision. Ironically, one of the key variables to making these schemes work is as much about precision as distance. There’s more to power reduction than just reducing the distance that data needs to travel. It also involves a reduction in the amount of data, and that means cleaning up that data, trashing whatever is non-essential, and structuring it so that it can be used locally and in the cloud. The problem is making sure nothing important has been discarded, and that requires more processing and localized intelligence, which in turn require more power. The more precise that equation, the more processing power is required. This will become one of the more challenging tradeoffs over the next couple of years, but it’s one that can have a big impact on overall performance and power.
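The tradeoff between filtering data locally and moving it can be made concrete with a rough back-of-the-envelope model. The sketch below is only illustrative: the function name, the per-byte energy figures, and the keep fractions are all hypothetical placeholders, not measurements from any real device. It assumes total energy is simply the cost of filtering every raw byte plus the cost of moving whatever survives the filter.

```python
def total_energy_nj(raw_bytes, keep_fraction, filter_nj_per_byte, move_nj_per_byte):
    """Energy (in nanojoules) to filter all raw data locally, then move only what is kept.

    All parameters are hypothetical inputs to a deliberately simple linear model.
    """
    filter_cost = raw_bytes * filter_nj_per_byte            # local processing on every byte
    move_cost = raw_bytes * keep_fraction * move_nj_per_byte  # transfer of the surviving data
    return filter_cost + move_cost


RAW_BYTES = 1_000_000
MOVE_NJ_PER_BYTE = 10.0  # assumed cost of moving one byte off-chip (placeholder value)

# No local filtering: everything is moved.
baseline = total_energy_nj(RAW_BYTES, 1.0, 0.0, MOVE_NJ_PER_BYTE)

# A cheap, coarse filter that keeps half the data (placeholder values).
coarse = total_energy_nj(RAW_BYTES, 0.5, 1.0, MOVE_NJ_PER_BYTE)

# A more precise filter that keeps a quarter of the data but costs more per byte.
precise = total_energy_nj(RAW_BYTES, 0.25, 4.0, MOVE_NJ_PER_BYTE)

print(baseline, coarse, precise)
```

With these made-up numbers, both filters beat moving everything, but the more precise filter ends up costing slightly more in total than the coarse one because its extra processing outweighs the data it saves. Different assumed costs would flip that result, which is the point: the balance has to be worked out per design.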

Software. In addition to defining the precision needed, software needs to be developed much more closely with hardware. This isn’t a new idea. Co-design has been talked about for decades, and big systems companies already are doing this with the shift to more customized processors. But as the industry begins shifting to more of a chiplet scheme, it’s not clear whether software will need to be developed at the chiplet level, or whether it needs to be customized at the system level. The former would allow individual chiplet makers to optimize software for a particular function and speed time to market, while the latter would take into account overall system behavior but take more time.

Both approaches have their applications and limitations. The challenge with optimizing software at the chiplet level is that all of these chiplets need to work well together. Black-box approaches can be effective in assembling systems more quickly, but they lack the overall ability to fine-tune the individual components of a system. This is why major foundries are looking at limiting the number of approved chiplets, at least initially, to ensure they can be fully characterized to work together on a pre-set platform.

Using any or all of these techniques will have a big impact on power and performance. In fact, the gains from node shrinks likely will pale in comparison to these architectural improvements, even at established nodes. And the ability to mix and match components will provide much faster time to market for multiple process nodes, which in turn could make them far more affordable to a much larger number of chipmakers.

