Scaling, Packaging, And Partitioning

Why choreographing better yield is so difficult.


Prior to the finFET era, most chipmakers focused on either shrinking or packaging, but rarely both. Going forward, the two will be inseparable, and that will lead to big challenges with partitioning of data and processing.

The key driver here, of course, is that device scaling no longer provides appreciable benefits in power, performance, and cost. What it still provides is room for additional transistors in the same area, which is why scaling continues. That extra density is critical for AI chips, which need all the processing capability they can get to handle more data from more sources.

That’s also one of the key drivers for advanced packaging. In some cases the chips themselves are too large. Rather than stitching together two die because one exceeds the reticle limit, a major challenge in itself, those die can be packaged together. In other cases, packaging chips together is simply faster and less expensive. Rather than cramming everything onto a single chip, the distance to memory can be shortened by connecting two chips with a high-speed interconnect.

All of this is required to process more data more quickly. But that’s only part of the challenge. There also are more types of data from more sources and more sensors. Optimizing the processing for some of those data types requires highly specific or customized compute elements. In some cases — particularly where various compute operations are either very similar or very different — that processing can be partitioned and parallelized across multiple processors, which can be either homogeneous or heterogeneous.
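The partition-and-parallelize step can be sketched in a few lines. This is an illustrative model only: the `partition` and `process_chunk` functions are hypothetical stand-ins, and a thread pool stands in for what would really be a mix of homogeneous or heterogeneous processing elements.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(data, n):
    """Split a workload into n roughly equal, ordered chunks."""
    k, r = divmod(len(data), n)
    chunks, start = [], 0
    for i in range(n):
        size = k + (1 if i < r else 0)
        chunks.append(data[start:start + size])
        start += size
    return chunks

def process_chunk(chunk):
    # Stand-in compute kernel; in practice each chunk might be routed
    # to a different processing element, on one die or several.
    return [x * x for x in chunk]

chunks = partition(list(range(100)), 4)
with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map preserves chunk order, which simplifies recombination later.
    partial_results = list(pool.map(process_chunk, chunks))
```

Note that `pool.map` returns results in submission order regardless of which worker finishes first, a property that matters once the partial results have to be reassembled.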

Making sense of data
The big challenge comes at the end of the processing, because data that has been partitioned needs to be recombined into something useful and coherent. That can be relatively straightforward, as in the case of embarrassingly parallel applications, or it can be incredibly complex, which results in a lot of additional coding. And depending on how that recombination is done, it can add a significant amount of overhead in terms of power and performance.
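For the embarrassingly parallel case, the recombination step really is this simple: tag each partial result with its partition index and flatten in order. The indices and values below are hypothetical; the point is that out-of-order arrival costs only a sort, whereas workloads with cross-partition dependencies need far more bookkeeping than this.

```python
# Partial results tagged with partition index; they may arrive out of order.
partials = [(2, [40, 41]), (0, [10, 11]), (1, [20, 21])]

# Ordered flatten: sort by partition index, then concatenate the chunks.
merged = [v for _, chunk in sorted(partials) for v in chunk]
# merged == [10, 11, 20, 21, 40, 41]
```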

The problem becomes even more convoluted if the computation is partitioned across chips, because it can be subject to various types of interference, physical effects, and unexpected interactions that are difficult to predict during design, manufacturing, and test. All of that, in turn, can increase latency and drag down overall performance. Put simply, the whole operation may run no faster than its slowest-performing element.
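That slowest-element bound is easy to state numerically. The latencies below are hypothetical, but the relation holds for any set of elements that must all finish before recombination can start.

```python
# Hypothetical per-element latencies (ns) for partitions running in parallel.
element_latency_ns = {"die_a": 120, "die_b": 135, "die_c": 128}

# A partitioned operation cannot complete before its slowest element does,
# so the effective latency is the maximum, not the average.
effective_latency_ns = max(element_latency_ns.values())
# effective_latency_ns == 135
```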

AI everywhere
Nor is this confined to AI chips alone. There are chips designed for AI, and there are applications of AI across large data sets, and both require close to real-time results. That is particularly important for data generated by sensors in semiconductor manufacturing equipment, because any delays add cost to the manufacturing process. Moreover, getting the partitioning wrong can affect the speed at which those sensors can make adjustments in the manufacturing flow, which has a direct impact on yield and scrap.

As more sensors and AI/ML are added into manufacturing, the same issues that affect semiconductor performance in large data centers now apply to manufacturing equipment and processes. So the further that data has to travel for processing — and in the case of parallel processing, the more elements used in that processing — the more energy required to complete the various processing steps. And that means more complex architectures for chips in manufacturing equipment, and more partitioning of data.

Putting it all together
But partitioning becomes particularly complicated in these industrial types of applications. Some chips or systems will need to be prioritized so that multiple data inputs are processed at consistent speeds across multiple processing elements located on one or more die. This is choreography on a grand scale. It’s hard enough to get right in a simulator, but in a production environment there are all sorts of issues that can crop up that make this a very difficult problem to comprehend, let alone solve. And as more variables are thrown into the mix, from packaging to scaling, that becomes more challenging.

This gives new meaning to the idea of a finely tuned machine. In the future, that may be much more complicated to achieve, but the upsides of getting this right are improved yield and fewer defects. And for anyone who thought process engineering was running out of steam due to a slowdown in scaling, it might be time to reassess.
