Automotive, AI Drive Big Changes In Test

DFT strategies are becoming intertwined with design strategies at the beginning of the design process.

Design for test is becoming enormously more challenging at advanced nodes and in increasingly heterogeneous designs, where there may be dozens of different processing elements and memories.

Historically, test was considered a necessary but rather mundane task. Much has changed over the past year or so. As systemic complexity rises, and as the role of ICs in safety-critical markets continues to grow, design for test (DFT) has evolved into an essential ingredient for ensuring reliability. It now must be considered at the very beginning of the design flow, and it needs to be inserted at various strategic places in the flow and within the chip itself to ensure chips and entire systems are operating properly.

This has turned DFT from what was almost an afterthought into a much more interesting challenge, with a variety of new options and innovations.

“It’s a cool area of semiconductor design,” said Rob Knoth, product management director at Cadence. “Historically, DFT always has been at the kiddie table, in a position of having to justify why it’s there. Originally people asked, ‘Why are you adding all this area and routing to my design?’ Now, it’s totally normal. No one talks about it anymore. It’s just assumed.”

There are several reasons for this. One is that most of the chips being developed at the leading edge are not billion-unit designs for mobile phones or PCs. Many of them are more customized, heterogeneous designs being manufactured in smaller batches at the most advanced nodes or in advanced packages. The result is that automatic test pattern generation (ATPG) is no longer sufficient to guarantee quality even in consumer applications. But chips also are being used in more safety-critical markets such as automotive, medical and industrial, where they are expected to last a decade or two, and test is a critical element in that equation.

The challenge is maintaining sufficient test coverage, while also adding in a variety of new tests. And this becomes more difficult as the level of test abstraction changes. So while more precision is needed, there is also a need to raise the level of abstraction to test more features and interactions.

“This need to have more precision goes in both directions,” said Ron Press, technology enablement director at Mentor, a Siemens Business. “So we go much finer and to a lower level of detail in what we can do. But for the user, they can’t spend all their time doing that so they have to go a little bit higher level of abstraction.”

This becomes particularly important in large, complex designs, because one type of test is no longer sufficient. Most complex SoCs and AI chips require a variety of testing procedures, ranging from built-in self-test and mixed-signal test to system-level test, in addition to the typical testing done on the automated test equipment that has become the workhorse of the fab. The role of each of those pieces needs to be mapped out very early in the design flow, and they need to be tightly integrated, operating from a single code base, so that if memory BiST is put into a design alongside test compression, the other types of test are aware of it.

“When I’m going to do my pattern generation, it knows about what the memory BiST did and how to configure it and what type of fault the accounting is there,” Press said. “This makes it so that we could solve bigger problems much easier.”
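That "single code base" idea can be pictured as one shared test model that every inserted structure registers with, so later steps such as ATPG can see what already exists. The sketch below uses invented classes and attribute names (TestModel, insert, aware_of), not any vendor's actual API, purely to illustrate the concept.

```python
# Toy sketch of a shared test model: every inserted test structure registers
# itself, so later tools can account for it instead of treating it as unknown
# logic. Class and attribute names here are invented for illustration only.

from dataclasses import dataclass, field

@dataclass
class TestModel:
    structures: list = field(default_factory=list)

    def insert(self, kind: str, **attrs):
        """Record a DFT structure (memory BiST, compression, etc.) and its settings."""
        self.structures.append({"kind": kind, **attrs})

    def aware_of(self, kind: str) -> bool:
        """Let downstream steps check what has already been inserted."""
        return any(s["kind"] == kind for s in self.structures)

model = TestModel()
model.insert("memory_bist", memories=128, algorithm="march_c+")
model.insert("scan_compression", chains=2000, channels=8)

# ATPG setup can now account for BiST-covered memories and the compression
# configuration rather than discovering them as surprises late in the flow.
if model.aware_of("memory_bist"):
    print("ATPG: excluding BiST-covered memories from scan fault accounting.")
```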

This kind of test integration is essential for automotive and mission-critical applications, which require visibility at a much finer level of detail, he explained. “After production test, we want very few defective parts to be shipped or go to system test. Automotive-grade ATPG adds higher-fidelity patterns, going beyond cell-aware test, to catch the few defects that might have escaped traditional tests.”


Fig. 1: Memory BiST in automotive design. Source: Mentor
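One way to see why those extra patterns matter is the classic Williams-Brown approximation, which relates process yield and fault coverage to the defect level that escapes to customers. The yield and coverage numbers in this sketch are illustrative assumptions, not figures from the article.

```python
# Rough illustration of why automotive-grade ATPG pushes for extra fault coverage.
# Uses the classic Williams-Brown approximation DL = 1 - Y**(1 - T), where Y is
# process yield and T is fault coverage (both as fractions). The 90% yield and
# the coverage values below are made-up examples.

def defect_level_dppm(yield_fraction: float, fault_coverage: float) -> float:
    """Estimated defective parts per million that escape to shipment."""
    return (1.0 - yield_fraction ** (1.0 - fault_coverage)) * 1e6

for coverage in (0.99, 0.999, 0.9999):
    print(f"coverage {coverage:.2%}: ~{defect_level_dppm(0.90, coverage):.0f} DPPM")

# Going from 99% to 99.99% coverage cuts the estimated escapes by roughly two
# orders of magnitude -- the motivation for cell-aware and automotive-grade patterns.
```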

More data
More tests also result in more data, though, and that test data needs to be moved quickly. This is driving the need for increased bandwidth both on-chip and off-chip, including some innovative strategies to speed up the flow of data without adding dedicated circuitry.

“We’ve been hearing that it’s been getting harder to get enough bandwidth/test time,” said Steve Pateras, senior director of marketing for test automation at Synopsys. “You want to maintain a certain test time as designs continue to follow Moore’s Law, but it’s not just the design side. It’s also automotive designs that require lower DPPM (defective parts per million), where you need different fault models, cell-aware test, etc. The only way to keep the test time or cost down is to increase bandwidth onto the chip, but that’s getting more difficult because even though the chips are getting larger, the number of I/O pins is not growing proportionally—and certainly the number of digital pins is not growing.”

As a result, test teams are struggling to find enough test pins to bring in test data, he said, which led to a different way of thinking about the problem. “Why worry about all these dedicated test pins and trying to grow that number? Why not try to get data onto the chip some other way? The realization was that pretty much every chip today out there will have some form of high-speed functional interface like a USB or PCI Express, and these are very high bandwidth interfaces. They can go anywhere from 5 Gbps, all the way up to more than 100 Gbps—depending on the number of lanes.”


Fig. 2: Testing through high-speed functional interfaces. Source: Synopsys
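A rough back-of-envelope calculation shows the scale of the difference between a handful of dedicated test pins and a reused functional interface. All of the payload sizes, pin counts, and lane rates below are assumptions chosen only for illustration.

```python
# Back-of-envelope test-time comparison using made-up but plausible numbers:
# the same scan payload delivered over dedicated test pins vs. a high-speed
# functional interface such as USB or PCIe (per the ranges quoted above).

SCAN_DATA_BITS = 50e9  # assumed total scan stimulus + response payload, 50 Gbit

def test_time_seconds(total_bits: float, lanes: int, gbps_per_lane: float) -> float:
    """Time to stream the payload, ignoring protocol and retargeting overhead."""
    return total_bits / (lanes * gbps_per_lane * 1e9)

print(f"8 test pins @ 400 Mbps: {test_time_seconds(SCAN_DATA_BITS, 8, 0.4):.1f} s")
print(f"4-lane PCIe @ 8 Gbps:   {test_time_seconds(SCAN_DATA_BITS, 4, 8.0):.2f} s")

# The bandwidth gap, not the pin count, is what keeps test time flat as pattern
# volume grows -- which is the argument for reusing functional interfaces.
```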

Using data differently
DFT has been around since the earliest days of the chip industry, and the focus always has been on ensuring that engineers can get visibility into what’s going on inside the chip for post-manufacturing structural test, or diagnostic testing. The concept is still pretty much the same today—investing in “testability” early in the design flow by putting extra logic on-chip, and just as important, “connectors” that give the engineer access to that information. That approach can save design time, improve quality and increase profitability.

But there also are multiple ways to achieve that. “The principle is that traditional BiST and DFT have been focused on electrical and logical checking, but the problems now are at the system-wide level, giving rise to the challenge of systemic complexity,” said Rupert Baines, CEO of UltraSoC. “The main applications are debug, development, verification, and integration, but there are also applications in the testing area, as well. When you say ‘test’ to people, they tend to think of a pass/fail scenario, but as we start to gather ever more data and apply data science techniques, we can go further than that. We can get a detailed view of what’s causing particular failure modes—perhaps process variations across the wafer, or a block that’s working on the edge of spec.”

That data also can be used in a variety of other ways to improve reliability and time to market. “Data can also be fed back into the design process, gathering real-life data from the chip,” said Baines. “That can be used to instrument the device for the most common failure modes and to improve yield.”

AI test challenges
AI chips add their own special challenges, particularly inferencing chips on the edge. They need to be nimble and lightweight, but they also are very specific in their design.

“One critical consideration for companies designing for AI is not just the ability to be plug-and-play, but also getting to market,” said Mentor’s Press. “Automotive chips are like this, and a lot of the AI is going to automotive. A lot of companies are trying to compete here, and they want to get to market as fast as they can. They don’t want test to slow things down, either, but there isn’t one type of design. Some companies may have a couple dozen duplicate blocks, while others have thousands of duplicate blocks. They each approach it differently, but all of them need to have a simple plug-and-play methodology so they can finish their work at the early level as fast as they can by doing the DFT block patterns, have it plug and play, and then get to market as quickly as they can.”

Going forward, just as in other parts of the design flow, machine learning is expected to improve and augment test. These techniques already are being used to conduct diagnosis from the patterns that fail on the tester, Press said. “We use a machine-learning approach to determine systematic problems that are invisible to a human. When we have a lot of this big-data analysis, one of the changes we’ve seen is that leading companies now are taking everything that fails in production—the failed data, like scan pattern fails—and sending it up to a server, where this type of diagnosis is done. Another system does machine-learning analysis to identify systematic problems that should be addressed, and which ICs most representatively explain them, in what amounts to a virtual failure analysis of thousands of ICs. Then we can get to something like a geometry that’s failing too often, or maybe a certain type of via, and that could change some of their physical rules—something that tells the fab people how to improve the process, which a human couldn’t do alone. If you’re at a yield of 96% and you go to 97%, that’s worth a lot of money—and this is just running in the background.”
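The volume-diagnosis loop Press describes can be sketched in greatly simplified form: collect per-die diagnosis callouts, then flag layout features that fail far more often than the background rate. The feature names, counts, and thresholding below are invented stand-ins for the statistical and machine-learning models a production system would actually use.

```python
# Much-simplified stand-in for ML-driven volume diagnosis: each failing die
# contributes "callouts" (suspect layout features from scan diagnosis), and
# features that fail far more often than average are flagged as candidate
# systematic issues. All names and counts below are invented.

from collections import Counter

# Hypothetical per-die diagnosis results fed back from the tester.
diagnosis_callouts = [
    ["via_V23_type_B", "cell_NAND2X1"],
    ["via_V23_type_B"],
    ["metal4_min_space", "via_V23_type_B"],
    ["cell_DFFX2"],
    ["via_V23_type_B", "metal4_min_space"],
]

counts = Counter(feature for die in diagnosis_callouts for feature in die)
baseline = sum(counts.values()) / len(counts)   # average callouts per feature

# Flag anything failing at twice the background rate or more.
systematic = {f: n for f, n in counts.items() if n >= 2 * baseline}
print("Candidate systematic defects:", systematic)

# A production system replaces this simple thresholding with statistical and
# ML models over thousands of diagnosed die, but the feedback loop to the fab
# follows the same shape.
```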

Cadence’s Knoth sees DFT advances developed for safety-critical applications like automotive migrating rapidly into the AI/ML class of chips. “Some of them are safety-critical, some of them aren’t, but it’s this AI/ML architecture that is driving EDA and semiconductor funding for startups, and they’ve got two big challenges—one is power efficiency and the other is throughput. That’s the whole reason that they’re in silicon and that there are dedicated AI/ML processors, because if they could solve the power efficiency and throughput problem, there are massive gains to be made.”

That has a big effect on architectural choices, and ultimately on test strategies. “You’ve got tons of on-chip and distributed memory,” Knoth said. “They’re being pushed into the more advanced nodes as much as possible because you want to cram in as many of these replicated processors as possible. There is massive multiprocessing, which results in multiple levels of hierarchy, along with massive, multiple instantiations.”

For example, in a stepped-and-repeated processor array, this results in huge data volumes on the design side, and the number of instances for ATPG faults is high. And while the main processor core might be stepped and repeated, all sorts of glue logic has to fit into irregular leftover shapes, which makes the physical challenge worse. “The number of signals that are allowed to go from those blocks to that top-level logic is even more reduced, so you have to start thinking about hierarchical insertion of scan, hierarchical insertion of compression, what kind of signals you are broadcasting; a ton of concerns that fall back into the test camp to deal with,” Knoth said.

From a DFT standpoint, this puts an extremely high requirement on test quality of results (QoR) at the lowest level of the hierarchy, he said. “If you give up any little bit of test QoR at that lowest level of processor, once you step and repeat it, it’s a death of a thousand cuts. While at the automotive level there’s a drive to put everything in RTL, when you start thinking about this kind of massive reuse, massive hierarchical design, a combination of RTL and gate-level insertion gives you your best balance there.”
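The pin-budget squeeze behind hierarchical scan and compression insertion can be illustrated with simple arithmetic. The core counts and channel numbers below are assumptions for illustration, not figures from Cadence.

```python
# Rough illustration (all numbers assumed) of why replicated cores force
# hierarchical, broadcast-style scan access: hundreds of identical cores must
# share a small top-level scan interface, so core-level patterns are generated
# once and broadcast rather than each core getting its own channels.

CHIP_SCAN_PINS    = 32    # top-level scan channels available at the package
CORE_COUNT        = 400   # stepped-and-repeated AI processor instances
CHANNELS_PER_CORE = 8     # channels the core-level compression expects

naive_pins   = CORE_COUNT * CHANNELS_PER_CORE   # one-to-one wiring per core
broadcast_in = CHANNELS_PER_CORE                # same stimulus driven to every core
response_out = CHIP_SCAN_PINS - broadcast_in    # pins left for compacted responses

print(f"Naive wiring would need {naive_pins} chip pins; only {CHIP_SCAN_PINS} exist.")
print(f"Broadcasting stimulus uses {broadcast_in} pins, leaving {response_out} "
      f"for compacted responses shared across groups of cores.")

# This is why scan and compression are inserted hierarchically: the core-level
# test is signed off once, then retargeted and broadcast at the top level.
```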

Conclusion
As today’s automotive and AI/ML design challenges evolve and intensify, all design and verification practices and technologies must evolve with them—including test. Particularly in mission-critical applications, the ability to monitor and correct while in operation may drive even more creative and effective methods of making sure devices always operate properly.

This is a big shift in what was considered a sleepy part of the chip industry for many years. But for chips to evolve and play in more markets, test needs to evolve, as well. So now, rather than working in isolation, DFT engineers are facing many of the same challenges as other parts of the design team. And that may be the most important change to hit the test world in decades.

Related Stories
Why Analog Designs Fail
Analog circuitry stopped following Moore’s Law a long time ago, but that hasn’t always helped.
Design For Test Knowledge Center
DFT top articles, white papers, and blogs
Looking At Test Differently
How test strategies are changing to adapt to smaller batches of more complex designs and new packaging technologies.


