The Great Test Blur

Why test may become the next bottleneck in design.


As chip design and manufacturing shift left and right, concerns over reliability are suddenly front and center. But figuring out exactly what causes a chip to malfunction, or at least not meet specs for performance and power, is getting much more difficult.

There are several converging trends here, each of which plays an integral role in improving reliability. But how significant a role each trend plays varies from one design to the next, and sometimes from one wafer to the next. That makes it difficult to explain, let alone create solutions that can add consistency.

To begin with, the picture that is emerging is that test is no longer a discrete operation. It is a complicated process, and that process is being pushed both left into the design space and right into post-manufacturing.

This is evident at the far left of the flow, where design for test has been pushed into the architectural phase of chip design. That shift began at 28nm, but it really kicked into gear with the advent of finFETs and multi-patterning at 16/14nm. By 10/7nm, with greatly diminished power/performance/area benefits from scaling, chipmakers began playing around with architectures, adding more heterogeneous processing elements, memories, I/Os and advanced packaging. Put all these pieces together and it becomes clear why a test strategy is a major concern early in the design cycle.

Built-in self-test is a key piece of that strategy, and in some chips BiST and memory constitute the majority of the on-die real-estate. BiST works something like a pilot’s checklist before an airplane can take off, although there is talk about extending BiST to include more than just simple operational checks. For example, it also can be used as part of a security strategy to make sure there are no irregularities, and it can be incorporated into on-chip monitoring systems to determine if there is any irregular activity underway in a chip. At that point it’s questionable where the line for test stops and other functions begin, and that becomes blurrier still when AI/machine learning are added into the mix.

Machine learning is being applied across all test data. It has become the glue of test, identifying patterns across different types of test rather than checking specific functions. And like all AI components, it can adapt, using models of what is supposed to fit within a range of acceptable behavior. What’s interesting about ML in relation to test is that it can identify aberrations much faster than traditional approaches because it is looking at patterns of data rather than individual bits. Moreover, it can follow those patterns from the intended function all the way to degradations in performance and subtle changes in power.
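To make the idea of looking at patterns rather than individual values concrete, here is a minimal sketch, not a description of any production test flow. It assumes two hypothetical per-die measurements (maximum frequency and leakage current), invented baseline data, invented spec limits, and an arbitrary threshold. A die can pass every individual limit and still stand out because the combination of its readings does not match known-good parts. The sketch uses a simple statistical distance (Mahalanobis) as a stand-in for a learned model.

```python
from statistics import mean

# Hypothetical known-good dies: (max frequency in GHz, leakage in mA).
# All values are invented for illustration.
BASELINE = [
    (2.00, 10.0), (2.10, 11.2), (2.20, 11.9), (2.30, 13.1),
    (2.40, 14.0), (2.50, 15.2), (2.60, 15.8), (2.70, 17.1),
]

# Hypothetical per-parameter spec limits: (low, high) for each measurement.
LIMITS = ((1.9, 2.8), (9.0, 18.0))

# Arbitrary cutoff on squared distance, chosen only for this example.
THRESHOLD = 9.0


def within_limits(die):
    """Traditional check: each value is tested against its own limits."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(die, LIMITS))


def fit(samples):
    """Estimate the mean and 2x2 sample covariance of the baseline."""
    xs = [s[0] for s in samples]
    ys = [s[1] for s in samples]
    mx, my = mean(xs), mean(ys)
    n = len(samples)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(die, model):
    """Squared Mahalanobis distance of one die from the baseline pattern."""
    (mx, my), (sxx, sxy, syy) = model
    dx, dy = die[0] - mx, die[1] - my
    det = sxx * syy - sxy * sxy  # determinant of the covariance matrix
    return (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det


MODEL = fit(BASELINE)
TYPICAL = (2.30, 13.0)   # follows the frequency/leakage trend
SUSPECT = (2.60, 11.0)   # fast die with unusually low leakage

for name, die in (("typical", TYPICAL), ("suspect", SUSPECT)):
    flagged = mahalanobis_sq(die, MODEL) > THRESHOLD
    print(name, "passes limits:", within_limits(die), "flagged:", flagged)
```

Both dies pass the per-parameter limits, but only the second is flagged, because its readings break the relationship seen in the baseline. A real deployment would involve many more parameters and a model trained on production data.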

Put all of these pieces together and test now spans from architecture to real-time testing post-silicon, and that testing is expanding from simple functional tests to secure boot and aberrant behavior. That can make chips significantly more reliable and more secure, but it also makes it much harder to create a cohesive test strategy without a massive dose of domain expertise and foresight about how chips will be used and under what conditions.

Designing a chip once and expecting that design to work across a host of devices is getting more difficult for a variety of reasons, but for the first time test is emerging as one of them.
