Toward System-Level Test

What’s working in test, what isn’t, and where the holes are.


The push toward more complex integration in chips, advanced packaging, and the use of those chips for new applications is turning the test world upside down.

Most people think of test as a single operation that is performed during manufacturing. In reality it is a portfolio of separate operations, and the number of tests required is growing as designs become more heterogeneous and as chips are used in markets such as automotive and industrial, where they are expected to last 10 to 20 years. In fact, testing is being pushed much further forward into the design cycle so that test strategies can be defined early and built into the flow. Testing also is becoming an integral part of post-manufacturing analysis as a way of improving yield and reliability, not just in the chip, but across an entire system in which that chip and other chips are being used.

Under the “test” banner are structural, traffic and functional tests, as well as built-in self-test to constantly monitor components. The problem is that not all of the results are consistent, which is why there is a growing focus on testing at a system level.

“From a system point of view, the focus is on the traffic test,” said Zoe Conroy, test strategy lead at Cisco. “But just putting a sensor in the corner of a die doesn’t measure anything. You need to put it right in the middle of a hotspot. The challenge is understanding where that is because hotspots found during ATE are different than the hotspots found during a traffic test. You also need to understand how memory is used in functional mode because what we’ve found at 28nm is that the whole memory is not being tested.”


Fig. 1: Different test approaches. Source: Cisco/IEEE Electronic Design Process Symposium 2017

Problems are showing up across the test spectrum as existing technologies, flows and expertise are applied to new problems—or at least more complex problems.

“With deep learning and machine learning, Nvidia is selling a lot of boards and systems into the data center,” said Craig Nishizaki, senior director of test development at Nvidia. “So we now have responsibility for board test and system test. The good is this is all in-house. The bad is this is a big transition. We’re trying to adapt integrated test flows so they do not just optimize one area. It needs to flow from the chip to SMT to board test to system test.”

This is turning out to be harder than anyone initially thought, however.

“The challenge is there are a lot more test groups, and now that they’re talking to each other we’re finding that everyone has their own test procedures,” said Nishizaki. “What unites everyone is data and information. We’re trying to adopt common tools, but there is no good tool for full coverage of system-level test. With system-level test, it’s more difficult to know how we will test.”

There are economic considerations, as well. Chipmakers and OSATs traditionally have used a fixed percentage of their total operating budget for test. But test is getting more complex alongside chips. It’s taking longer, and it is requiring much more up-front planning.

“You can’t test everything at the same time,” said Derek Floyd, director of business development for power, analog and controller solutions at Advantest. “No one will pay for it. Tests need to be multi-domain, but that’s very different. ATE is predictably deterministic. It’s a nice clean environment. With system-level test, you’re adding in things like cross-talk, jitter, parametric effects, and you need to do code monitoring. But you don’t necessarily get the access you want inside of the chip, so you look at the limits and what is the most critical thing to isolate in a design.”


Fig. 2: A comparison of two test approaches. Source: Cisco/IEEE Electronic Design Process Symposium 2017

Defining system-level test
System-level test is the ability to test a chip, or multiple chips in a package, in the context of how it ultimately will be used. While the term isn’t new, the real-world application of this technology has been limited to a few large chipmakers.

That is beginning to change, and along with that the definition is beginning to evolve. Part of the reason is the growing role that semiconductors are playing in various safety-critical markets, such as automotive, industrial and medical. It’s also partly due to the shift away from a single processing element to multiple processor types within a device, including a number of accelerators such as FPGAs, eFPGAs, DSPs and microcontrollers. But even in mobile devices, the cloud, or machine learning/AI, understanding how real-world use cases affect a chip’s performance, along with physical effects such as thermal migration and its impact on electromigration and mean time to failure, is becoming a critical metric for success.

This requires much more up-front planning, however. Rather than waiting until a chip gets into manufacturing, the strategy for what gets tested, when it gets tested, and how it will be tested needs to be well thought out at the beginning of the chip design process.

“Packaging, test and DFT are now the rock stars,” said Mike Gianfagna, vice president of marketing at eSilicon, explaining that until recently test and packaging were rather straightforward exercises. As a result, test and packaging discussions happened much later in the design flow. “DFT is now involved much earlier in the process. It’s a critical pacing item for designs.”

But building test for fan-outs and 2.5D chips into the design cycle is a challenge. “You have to convert design vectors into test vectors,” said Calvin Cheung, vice president of business development and engineering at ASE. “The goal is to know ahead of time what the expected output is on the test side, so that when the silicon comes out all you really have to focus on is performance. In the past this was a ‘nice to have.’ It’s now becoming a requirement.”

That’s a good goal, but at this point test remains fragmented at each step from design through manufacturing.

“In the flow from pre-silicon or pre-PCB to post-silicon or post-PCB, every step of the process is siloed,” said George Zafiropoulos, vice president of solutions marketing at National Instruments. “There are discontinuities at each step, too. At the early behavioral level, which is an abstract algorithmic model, there is no implementation detail. At the implementation stage, which is the detail stage, you’re relying on SPICE or SystemVerilog. With post-silicon, in the lab, you bring up the chip and assume it’s functional, and measure parametric performance on boards. Then it goes to a manufacturing production line where you test each one.”

Testbenches and analysis are developed around the design under test (DUT), but the DUT changes along the way from code to a netlist or RTL model and finally to a physical chip. “The problem is that at each stage, tests are only applicable to that stage,” said Zafiropoulos. “When you get to the last couple stages, you beat up the chip to make sure it works and do an exhaustive set of tests, and then at the last stage you do a minimal amount of testing but make sure you have adequate coverage. The goal there is to get it through the tester as fast as possible.”

Different approaches
This is one of the places where change is required. Speed is essential on the manufacturing and packaging side, and there are several distinct approaches emerging to limit the time it takes to do system-level test and thereby minimize the cost. One involves testing more things more quickly using existing equipment, which is where Advantest is heading.

“We’ve built redundancy into ATE with core processors to add more features,” said Advantest’s Floyd. “So a test may be application-dependent, but you can have redundancy.”

A second approach extends that further by adding massive parallelism into the test equipment to be able to test more things simultaneously.

“We’re solving really tough test challenges in a different way,” said Anil Bhalla, senior manager at Astronics. “Most people try to take a commercial solution and apply it to other companies. We look at a customer’s problem first and then build the right building blocks to solve that economically. Then, if it looks like there is a bigger opportunity in the market, we take it commercial. With system-level test, we’re trying to find something that works for the industry in a way that isn’t currently available, which is our massively parallel approach.”
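Why parallelism matters economically can be seen from simple throughput arithmetic. The sketch below is a hypothetical model, not Astronics’ or any vendor’s method: it assumes every site runs the same test program at once and folds handler index time and shared-resource losses into a single parallel-efficiency factor. The function names and all numbers are illustrative.

```python
# Illustrative multi-site throughput and cost model (hypothetical numbers).
# Simplifying assumption: every site runs the same program concurrently, and
# losses from handler index time and shared tester resources are folded into
# a single parallel-efficiency factor between 0 and 1.


def units_per_hour(test_time_s: float, sites: int, efficiency: float = 1.0) -> float:
    """Devices completed per hour when `sites` devices are tested at once."""
    if test_time_s <= 0 or sites < 1 or not 0.0 < efficiency <= 1.0:
        raise ValueError("invalid throughput parameters")
    return 3600.0 / test_time_s * sites * efficiency


def cost_per_unit(cell_cost_per_hour: float, test_time_s: float, sites: int,
                  efficiency: float = 1.0) -> float:
    """Test cost per device: hourly cost of the test cell divided by throughput."""
    return cell_cost_per_hour / units_per_hour(test_time_s, sites, efficiency)


if __name__ == "__main__":
    # A 10-minute system-level test: single site vs. 100 sites at 90% efficiency.
    print(units_per_hour(600, 1))                # 6.0
    print(round(units_per_hour(600, 100, 0.9)))  # 540
```

Under these assumptions, a 10-minute system-level test on one site yields only six devices per hour, while 100 sites at 90% efficiency yield about 540, so the per-device cost falls roughly in proportion even if each site is relatively slow.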


Fig. 3: Relationship between overall SoC transistor count on left axis, and missing coverage (<1%) transistor count on right axis. Source: Astronics

Speed of test in manufacturing is particularly important for complex SoCs at advanced nodes as well as in packaging, because there are multiple chips to test. From the outside, a system-in-package (SiP) looks the same to a tester as a single-die package. There are only so many external leads for connecting a tester, and that doesn’t change whether there is one chip or five. But there is more to test, and access to some of those components may be limited.

“Accessibility decreases because the connectivity is all on the inside,” said Robin Wei, director for global test and strategy at STATS ChipPAC. “Even with known good die, there are a number of variables. Every die in a package may be operating at the corner of the spec, so all of your performance budget gets eaten up. Or from an assembly point of view, you are dealing with vias, traces on the substrate and bumping, which are all grouped under connectivity. If there’s a problem, it’s hard to narrow where in the connectivity piece it failed. And as you approach the edge of the wafer, it’s more sensitive to drift. On top of that, the test area is in the center of the wafer and the die area is around the test pattern. It’s different there than across the rest of the wafer.”

Wei noted that equipment today is keeping up with the demand, but as new nodes are added and more companies turn to packaging, parallelism will be required.

A third approach uses big data techniques to improve coverage, regardless of which equipment is employed, by pinpointing where problems occur during and after manufacturing.

“Treating everything the same is wasteful,” said David Park, vice president of worldwide marketing at Optimal+. “So you need to test some populations of chips more and some less. If you have a fixed test budget and you can excuse some devices from that level of testing, you can apply those resources to more exhaustive testing where it is needed.”
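The budget trade Park describes can be sketched as a greedy allocation. This is a minimal illustration, not Optimal+’s algorithm: it assumes each device already carries a hypothetical `risk_score` derived from upstream data, that every device gets a base test, and that the rest of a fixed time budget is spent upgrading the riskiest devices to a longer, more exhaustive program.

```python
from dataclasses import dataclass


@dataclass
class Device:
    device_id: str
    risk_score: float  # hypothetical upstream score; higher = more likely to escape


def allocate_extended_tests(devices, budget_s, base_time_s, extended_time_s):
    """Pick which devices get the more exhaustive (extended) test program.

    Every device receives the base test. Whatever budget remains is spent on
    upgrading devices to the extended test, highest risk first.
    """
    if extended_time_s < base_time_s:
        raise ValueError("extended test must take at least as long as the base test")
    remaining = budget_s - base_time_s * len(devices)
    if remaining < 0:
        raise ValueError("budget does not cover the base test for every device")
    upgrade_cost = extended_time_s - base_time_s
    selected = []
    for dev in sorted(devices, key=lambda d: d.risk_score, reverse=True):
        if upgrade_cost > remaining:
            break  # budget exhausted; remaining devices keep the base test only
        selected.append(dev.device_id)
        remaining -= upgrade_cost
    return selected


if __name__ == "__main__":
    lot = [Device("a", 0.1), Device("b", 0.9), Device("c", 0.5), Device("d", 0.7)]
    print(allocate_extended_tests(lot, budget_s=80, base_time_s=10, extended_time_s=30))
    # ['b', 'd']
```

In this toy example, the same 80 seconds that would buy a uniform 20-second program for all four devices instead buys a 10-second base test for everyone plus 30-second extended tests for the two riskiest parts.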

To maximize the value of the big-data approach, electronics companies need to share data with semiconductor companies. That allows companies to trace back problems to the root cause, which may be as detailed as the day and time it was manufactured, when it was put on a tester, or the origin of a particular lot of chips. The goal here is to identify the aberrations in a data plot, and to find patterns that are not visible with individual tests.
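One common way to flag “aberrations in a data plot” is a robust outlier screen on a parametric measurement within a lot. The sketch below uses the median absolute deviation (MAD) modified z-score with the conventional 3.5 cutoff; it is an illustrative technique chosen here, not one the companies quoted have said they use, and the leakage readings are invented.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices of values whose modified z-score exceeds `threshold`.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation from the median. Because median and MAD are robust,
    a few bad parts cannot hide themselves by inflating the spread, as they
    would with a mean/standard-deviation screen.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case: most readings identical; flag anything that differs.
        return [i for i, v in enumerate(values) if v != med]
    return [i for i, v in enumerate(values)
            if abs(0.6745 * (v - med) / mad) > threshold]


if __name__ == "__main__":
    # Hypothetical per-die leakage readings (arbitrary units) from one lot.
    leakage = [1.02, 0.98, 1.01, 0.99, 1.00, 1.03, 0.97, 1.45]
    print(mad_outliers(leakage))  # [7]
```

All eight parts might pass a fixed spec limit, yet the last one stands out clearly from its lot-mates, which is the kind of pattern that individual pass/fail tests do not reveal.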

“You can see this already where performance is different with a part from vendor A compared with a part from vendor B,” Park said. “And that’s with no shared data. With shared data, you can correlate everything and figure out only 10 tests are relevant to a PCB failure, so you can relax the others. In some cases, yield will increase for zero difference in cost. In others, yield may drop initially, but there are still more products to sell because you know what to look for.”
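With shared data, finding the handful of tests relevant to a downstream failure can start as a correlation ranking. The sketch below is hypothetical: it assumes per-device chip-test measurements can be joined with a 0/1 board-failure flag from the PCB line, and it ranks tests by the absolute Pearson correlation with that flag (the point-biserial correlation). Test names and data are invented, and correlation alone does not prove a test is safe to relax.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def rank_tests_by_failure_correlation(measurements, board_failed, top_n):
    """Rank chip-level tests by |correlation| with downstream board failure.

    measurements: dict of test name -> per-device values, in the same device
                  order as board_failed (hypothetical joined dataset).
    board_failed: per-device 0/1 flags from PCB test.
    """
    fails = [1.0 if f else 0.0 for f in board_failed]
    scores = {name: abs(pearson(vals, fails)) for name, vals in measurements.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    data = {
        "idd_q": [1.0, 1.1, 2.0, 0.9, 2.2, 1.0],
        "vmin":  [0.70, 0.72, 0.71, 0.70, 0.72, 0.71],
        "fmax":  [3.0, 3.1, 3.0, 3.1, 3.0, 3.1],
    }
    print(rank_tests_by_failure_correlation(data, [0, 0, 1, 0, 1, 0], top_n=2))
    # ['idd_q', 'fmax']
```

Tests that fall outside the top of such a ranking become candidates for relaxed limits, pending engineering review and a much larger sample than this toy one.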

Cisco’s Conroy pointed to the value of traceability and data, as well. “The supplier has numbers and history, but they have no idea what’s happening with yield. In the future, suppliers might have to use machine learning for this. They don’t need to necessarily ship back parts if they have good enough data.”

Yield is a particularly critical subject when it comes to 2.5D, because losing one of these packages involves the cost of all of the chips involved. ASE’s Cheung said yield for chips that will be used in an advanced package needs to be at least 99.8%. “You need a methodology to screen out process defects at the edge or center of the wafer, which is the structural test,” he said. “When it goes to the OSATs, we need to know how to characterize it, how to do split lots and corner lots. We need to be able to screen out process defects so that silicon will work in all corners.”
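One way to see why die-level yield targets are so high is that package yield compounds. The sketch below assumes die defects are independent, that a bad die cannot be reworked once assembled, and that assembly has its own separate yield; under those assumptions package yield is simply the product of the individual yields.

```python
from math import prod


def package_yield(die_yields, assembly_yield=1.0):
    """Probability that every die is good and assembly succeeds.

    Assumes independent die defects and no rework after assembly.
    """
    return prod(die_yields) * assembly_yield


def required_die_yield(target_package_yield, n_dies):
    """Per-die yield needed for n identical dies to hit a package-yield target."""
    return target_package_yield ** (1.0 / n_dies)


if __name__ == "__main__":
    print(round(package_yield([0.998] * 4), 4))     # 0.992
    print(round(package_yield([0.95] * 4), 4))      # 0.8145
    print(round(required_die_yield(0.99, 4), 4))    # 0.9975
```

With four dies, 99.8% per die still gives roughly 99.2% package yield, but 95% per die drops it to about 81.5%, and every failed package scraps the good dies along with the bad one.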

Conclusion
System-level test is happening today in a somewhat ad hoc way. Companies are making use of the tools that are available to them and crafting methodologies that can help improve quality and reliability.

“We made a shift on high Vt because we couldn’t catch all of the problems,” said Nvidia’s Nishizaki. “Now if we have high Vt, we increase the level of characterization. Our characterization has increased almost four times in the past couple of years with the move to 16nm.”

System-level test is only beginning to gain attention in the market. What it ultimately looks like, and how it is defined, remains to be seen, but test equipment makers and EDA vendors are starting to take a serious look at how the approach can be streamlined and improved, and where it can be applied. But almost everyone agrees that at advanced nodes, within new markets, and with advanced packaging, test is going to look very different than it did in the past.
