Next Challenge: Known Good Systems

Test is turning into a continuous process rather than a discrete step as chip architectures become more complex.


The leading edge of design is heading toward multi-die/multi-chiplet architectures, and an increasing number of mainstream designs likely will follow as processing moves closer to the edge.

This doesn’t mean every chipmaker will be designing leading-edge chips, of course. But more devices will have at least some leading-edge logic or will be connected over some advanced interconnect scheme to one or more of those leading-edge chips or chiplets. The challenge will be verifying and debugging all of these devices in the context of how they will be used, and then testing them repeatedly in the lab, in manufacturing, during and after packaging, and for as long as they are used in the field.

The common thread emerging across all of these areas is data consistency. To make a packaged-system model work, data needs to be accessible and usable across every step of the value chain, from design all the way through to final silicon and packaging. When problems are identified at any point in that flow, the data needs to be fed back to the source of those problems, whether that’s the design team or whoever is managing the supply chain for that device. And when failures occur in the field, they need to be analyzed in the context of other devices from the same manufacturing lot to determine whether the root cause was an isolated event, such as a stray alpha particle, or a flaw that affects a whole range of products.
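The lot-level triage described above can be illustrated with a minimal sketch. It assumes field-return records that carry a device ID, a lot ID and a pass/fail flag; the function name `flag_suspect_lots`, the 3x rate threshold and the two-failure minimum are all hypothetical choices for illustration, not an industry method. A single failure in a lot (the stray-alpha-particle case) stays unflagged, while a lot failing well above the rate of every other lot is flagged as a possible systemic flaw.

```python
from collections import defaultdict

def flag_suspect_lots(records, threshold=3.0, min_failures=2):
    """Hypothetical triage: flag lots whose field failure rate is well above
    the failure rate of all other lots combined.

    records: iterable of (device_id, lot_id, failed) tuples (assumed format).
    Returns {lot_id: failure_rate} for suspect lots.
    """
    shipped = defaultdict(int)
    failed = defaultdict(int)
    for _device_id, lot_id, did_fail in records:
        shipped[lot_id] += 1
        if did_fail:
            failed[lot_id] += 1

    total_shipped = sum(shipped.values())
    total_failed = sum(failed.values())
    if total_failed == 0:
        return {}

    suspects = {}
    for lot_id, n in shipped.items():
        k = failed[lot_id]
        rate = k / n
        # Compare against the rest of the population so one bad lot
        # does not inflate its own baseline.
        rest_n = total_shipped - n
        rest_rate = (total_failed - k) / rest_n if rest_n else 0.0
        # Require several failures: one isolated event is not a pattern.
        if k >= min_failures and rate > threshold * rest_rate:
            suspects[lot_id] = rate
    return suspects

# Illustrative data: lot A has one isolated failure, lot B has eight.
records = (
    [(f"A{i}", "A", i < 1) for i in range(100)]
    + [(f"B{i}", "B", i < 8) for i in range(100)]
    + [(f"C{i}", "C", False) for i in range(100)]
)
print(flag_suspect_lots(records))  # {'B': 0.08}
```

In practice the comparison would draw on far richer data (test results, process parameters, packaging history), which is why consistent data formats across the supply chain matter.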

This sounds straightforward enough, particularly with all of the big data tools and machine learning capabilities. But there are a number of complicating factors to contend with. First, failures open the door to liability in safety-critical applications, which include everything from cars to robots. So failures are no longer the problem of just the OEM or systems company. They are a shared problem, and this presents a whole new challenge for the supply chain.

This is exacerbated by the fact that chips in these devices are expected to behave flawlessly for a decade or two. Moreover, in many cases they will use at least some leading-edge logic in a package, or they will be connected over some high-speed interface. That logic, which will be developed at the latest process nodes, will be required for AI/ML/DL in order to make safety-critical decisions.

Until now, advanced-node chips have not been used at extreme temperatures while exposed to nearly continuous vibration. In fact, most devices simply shut down when they get too hot. That’s not an option in automotive or robotics, which puts the onus on design and test teams for more rigorous simulation, testing and constant monitoring to identify even the slightest aberrations in behavior. In effect, test will need to become perpetual, and so will connectivity.

Second, because many chipmakers working at advanced nodes have adopted multi-die architectures (it’s the only way they can get big improvements in performance and much lower power), they now have to test devices that are not always accessible after packaging. So instead of known-good die, they now have to think in terms of known-good systems. In the past, this was as simple as testing a board across process, voltage and temperature (PVT) variations. But in an advanced package, everything is sealed, and the package is integral to the successful functioning of the device, particularly for thermal dissipation and noise isolation. Understanding how those pieces work together over time, including how the various components in a package age, is far more complex than understanding a single chip’s behavior. A 5nm chiplet in a package requires the same kind of testing and monitoring as a standalone 5nm chip, but not everything will be accessible from the outside, and a failure will cost multiple chips rather than one.

Third, big systems companies have started developing their own chips, and they expect those chips to perform as specified throughout their projected lifetimes. They are now intimately involved in developing metrics for behavior, and most of them are experts at using data and machine learning techniques. As a result, they will pressure the entire supply chain to produce data that is consistent, useful and extremely accurate. For those with outdated data models or languages, this could be a painful transition.

The industry is shifting from functional correctness to a more holistic approach, where the emphasis is on function in context over time. Data will be the proof point, and ultimately all data will need to be either in the proper format, or translated into formats that are usable across a wide range of process steps.
