Software-Driven and System-Level Tests Drive Chip Quality

A new system-level test for SoCs is gaining traction because it catches problems not detected at wafer probe and package test.

Traditional semiconductor testing involves tests executed by automatic test equipment (ATE). But engineers are beginning to favor an additional late test pass that exercises systems-on-chip (SoCs) in a system context in order to catch design issues prior to end-product assembly.

“System-level test (SLT) gives a high-volume environment where you can test the hardware and software together and find design faults that you probably would not have found in lower-volume engineering testing,” said Peter Reichert, SLT system architect at Teradyne.

While the design-for-test (DFT) infrastructure on an SoC is used for delivering test vectors in a way that keeps the test cycle as short as possible, focus is moving to tests that are executed by software on the CPU. This aligns well with a new system-level test insertion following final test.

Testing of ICs always involves a delicate balance. On the one hand, the quality of outgoing product needs to be high in order to minimize returns and maximize customer satisfaction. On the other hand, the cost of testing must be minimized wherever possible.

That often can mean some strategic shuffling of where certain tests are performed. Traditionally, there have been three insertion points — before wafers leave the fab at wafer-acceptance test (WAT); at wafer sort, prior to dicing up the wafer; and at final test, after a device is packaged.

Tests that have a high failure rate should be done as early as possible — near wafer sort — to ensure additional money isn’t spent manufacturing parts that will be thrown away.  But there also may be opportunities to perform high-yielding tests in the lowest-cost insertion, since the likelihood of scrapping a device is low.

For SoCs — or any device that can execute software — an additional SLT insertion is gaining momentum. SLT tests the chip in a system context, and that includes the ability to run tests implemented in software and run on the chip’s CPU.

“Engineers are seeing higher defect rates due to shrinking process nodes,” said Paul Klonowski, SLT marketing director at Teradyne. In addition, “Packaging has become more complex as higher-functionality devices are getting smaller. You get escapes from wafer test, and you get escapes from package test. You want to catch those using system-level test.”

Vectors vs. software
Logic chip testing involves two types of tests: functional tests, and parametric tests, which measure things like current and voltage. To minimize test costs, functional tests are executed using dedicated hardware inside a chip. This DFT infrastructure allows test vectors to be applied directly to internal circuits. After the chip is clocked appropriately, the results are sent to the outputs for evaluation.

To keep the silicon cost low, the input vectors and output results are scanned in and out. Results aren’t output directly, but are rather compressed into a signature that can be compared to an expected signature for pass/fail determination.
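To make the idea concrete, here is a minimal, purely illustrative sketch of response compaction: an LFSR-style compactor folds a stream of scanned-out response bits into a 16-bit signature, and only the signature is compared against the value predicted for a fault-free device. The polynomial, widths, and bit values are assumptions chosen for illustration; production DFT compressors (MISRs and the like) are considerably more elaborate.

```
#include <stdint.h>
#include <stdio.h>

/* Conceptual sketch only: scanned-out response bits are folded into a
 * short signature by an LFSR-style compactor, and only the signature is
 * compared against the expected ("golden") one. The polynomial, width,
 * and bit streams below are illustrative assumptions. */
static uint16_t compact(const int *bits, size_t n)
{
    uint16_t sig = 0;
    for (size_t i = 0; i < n; i++) {
        int feedback = ((sig >> 15) ^ bits[i]) & 1;   /* fold next response bit in */
        sig = (uint16_t)(sig << 1);
        if (feedback)
            sig ^= 0x1021;                            /* CRC-16-CCITT polynomial   */
    }
    return sig;
}

int main(void)
{
    /* Fault-free response (what simulation predicts) vs. what the device produced. */
    const int golden[]   = { 1, 0, 1, 1, 0, 0, 1, 0, 1, 1 };
    const int observed[] = { 1, 0, 1, 1, 0, 1, 1, 0, 1, 1 };   /* one bit flipped */
    size_t n = sizeof golden / sizeof golden[0];

    uint16_t expected = compact(golden, n);
    uint16_t actual   = compact(observed, n);
    printf("expected=0x%04X actual=0x%04X -> %s\n",
           expected, actual, expected == actual ? "PASS" : "FAIL");
    return 0;
}
```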

Fig. 1: Traditional scan-based testing uses built-in design-for-test (DFT) infrastructure that is separate from the main functional architecture. This simplified conceptual drawing shows scan chains in blue controlled by a block that connects to external test pins. Source: Bryon Moyer/Semiconductor Engineering

The intent is to pack as many tests into as few vectors as possible. That efficiency allows for higher test coverage in the shortest possible time, keeping test costs under control.

What is important to note, however, is that such testing does not resemble how the chip would function in a system. Multiple blocks, which might functionally never be used at the same time, can be tested at the same time simply to reduce the number of vectors. So the test modes put the device into an unnatural state from the standpoint of a target system — so much so that power profiles during testing can be far higher than would be experienced in so-called “mission mode.”

Driving tests through software executed on the CPU keeps the device in a natural state, but it doesn’t have the efficiency and test density of scanned vectors. It does allow specific use cases to be tested, however – in particular, corner cases involving the interactions of multiple blocks in the SoC.

Fig. 2: Software-driven tests originate as programs loaded into the CPU. The CPU then executes those programs to control various elements in the SoC. A simplified flow is indicated with red lines. Source: Bryon Moyer/Semiconductor Engineering

Generating software-defined functional tests
The generation of vector-based tests uses technology that has evolved over decades. It leverages stimulus and observation of specific points within the circuit, and the scan chains allow direct access to those points.

So traditional vector generation involves identifying for each point what is needed to activate the needed state, and then what the desired output should be. Effort is then put into combining as many vectors as possible to reduce the overall count and, hence, the test time.

Software-driven tests are different. The CPU has no direct access to the internal circuits in the way that the scan chains do, so this is less about confirming that specific circuits are working. It’s more about exercising real-world scenarios to confirm that the chip is operating as expected overall.

Those scenarios may be chosen because they’re expected to stress the chip more. Or the intent could be to run the tests under extreme voltage or temperature scenarios to make sure the chip works as expected under those conditions.

That means, in general, that SoC design teams need to decide which tests to perform and how to perform them. In reality, some software tests may be run during chip verification before the chip is even built, often for use during emulation.

Unlike lower-level tests, which operate at the bit or vector level, these tests operate at the level of the C language and are compiled down for the CPU in the SoC.  “Software-driven tests are typically easier to create than they are in ATE, because it involves standard operating steps that you would see in a device that’s out in the field,” said Klonowski. “There’s hardware initialization, there’s memory training, there’s booting the operating system, putting the device into sleep and low power modes, performing your high-load operations, and checking against benchmarks. But you’re really operating the device as it would be in an application and then running these different operational tests against them.”
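As a rough idea of what such a test can look like, the sketch below walks through a simplified version of that flow in C: exercise a memory region with a pattern, run a compute-heavy loop, and check both against expected results before reporting pass/fail. The function names, buffer, and workload are made up for illustration; on real silicon the memory region would be external DRAM reached after training, and the initialization and benchmark checks would come from the vendor's board-support code.

```
#include <stdint.h>
#include <stdio.h>

/* Illustrative sketch of a software-driven test running on the SoC CPU.
 * Everything here is simulated so the sketch stays self-contained. */

#define TEST_WORDS 4096

static uint32_t buffer[TEST_WORDS];      /* stands in for a DRAM region */

static int memory_pattern_test(void)
{
    for (uint32_t i = 0; i < TEST_WORDS; i++)
        buffer[i] = i ^ 0xA5A5A5A5u;      /* write a known pattern */
    for (uint32_t i = 0; i < TEST_WORDS; i++)
        if (buffer[i] != (i ^ 0xA5A5A5A5u))
            return -1;                    /* read-back mismatch */
    return 0;
}

static int compute_stress_test(void)
{
    /* Integer workload standing in for a "high-load operation checked
     * against a benchmark": sum of squares computed iteratively, checked
     * against the closed-form result truncated to 32 bits. */
    const uint64_t n = 100000;
    uint32_t acc = 0;
    for (uint32_t i = 1; i <= n; i++)
        acc += i * i;                                 /* wraps mod 2^32 */
    uint32_t expected = (uint32_t)(n * (n + 1) * (2 * n + 1) / 6);
    return acc == expected ? 0 : -1;
}

int main(void)
{
    int fail = 0;
    fail |= memory_pattern_test();
    fail |= compute_stress_test();
    printf("SLT software test: %s\n", fail ? "FAIL" : "PASS");
    return fail ? 1 : 0;
}
```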

The Portable Test and Stimulus Standard (PSS) provides one way to generate tests that can be ported to any of the verification nodes, silicon bring-up tests, or high-volume commercial test. PSS helps to promote re-use.

Moshik Rubin, product management group director at Cadence, said that some tests can be re-used across multiple target platforms, such as simulation or emulation. “The generated tests can easily be ported to post-silicon and manufacturing tests,” he said.

But that re-use may not be as prevalent today as it could be. “I’ve seen teams on the pre-silicon verification side that did directed tests or maybe integrated software test,” said Filip Thoen, scientist at Synopsys. “You see people doing post-silicon testing for silicon wakeup. I’ve seen teams develop artificial test suites to stress the chip to optimize yield. And then later on, you still have to develop software for your development board, or maybe you’ll develop diagnostics. And there’s no re-use across all of this spectrum, which is a huge waste.”

For instance, SoC software tests done during emulation could be re-used for SLT. “What you’re loading into the emulator is probably not completely identical to the final software,” said Thoen. “What you have to customize [for emulation vs. testing] are the drivers, which set up the external environments.”
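One common way to structure that reuse (an assumption here, not any specific vendor's flow) is to keep the test body identical across targets and swap only the platform-setup "drivers" at build time, roughly as in this sketch. The function names and the EMULATION macro are hypothetical.

```
#include <stdio.h>

/* Hedged sketch of reusing one test body across emulation and SLT:
 * only the environment setup changes per platform. */
#ifdef EMULATION
static void platform_init(void)
{
    /* Emulator build: external memories and PHYs are modeled, so skip
     * lengthy training and use back-door memory preloading instead. */
    puts("init: emulation stubs, back-door memory load");
}
#else
static void platform_init(void)
{
    /* Silicon/SLT build: bring up clocks and run real memory training. */
    puts("init: PLLs locked, DRAM training complete");
}
#endif

static int run_test_body(void)
{
    /* The test body itself is identical on every target. */
    return 0; /* 0 = pass */
}

int main(void)
{
    platform_init();
    return run_test_body() ? 1 : 0;
}
```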

Acquiring test results
While the focus of the test is software, some hardware consideration still may be needed for obtaining the test result and for examining chip characteristics for diagnostic purposes.

It’s easy to think of a software-defined test passing simply by virtue of the fact that “nothing went wrong.” But a more typical test would look for some specific result, and that result would need to be delivered to an output somewhere.

Some such tests might naturally have results that show up on output pins (like, perhaps, a memory address appearing on a memory port). But that often may not be the case, so some other way of bringing out an internal result is needed.
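A frequently used pattern (assumed here for illustration, not tied to any particular device) is for the test to write its verdict and a detail code to a "mailbox" location that the tester or a debug port can read back. The sketch below simulates that mailbox as a plain variable so it stays self-contained; the names report_result, mailbox, and the result codes are hypothetical.

```
#include <stdint.h>
#include <stdio.h>

#define RESULT_PASS  0xCAFE0000u
#define RESULT_FAIL  0xDEAD0000u

static volatile uint32_t mailbox;   /* stands in for a memory-mapped register */

/* Write the verdict plus a detail code identifying the test step. */
static void report_result(int failed, uint16_t detail_code)
{
    mailbox = (failed ? RESULT_FAIL : RESULT_PASS) | detail_code;
}

int main(void)
{
    int failed = 0;                  /* outcome of some software-driven test */
    report_result(failed, 0x0001);
    printf("mailbox = 0x%08X\n", (unsigned)mailbox);
    return 0;
}
```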

One way to provide more output detail is to leverage the scan-chain structure inside the chip. That would suggest some interplay between the chip in this mostly functional mode and the test modes. But context also would help to understand why something failed.

“If I load software onto the processor, the software runs, and I get a pass/fail back,” explained Klaus-Dieter Hilliges, platform extension manager at Advantest. “I just ran a black box. I have a hard time making it work, because I’ve little observability of what’s going on. We have to be able to receive trace information that observes the functional test execution. If you go to Cadence, Breker, or other EDA companies that auto-generate such functional test cases, you can see what’s going on while you execute the test.”

In addition, many chips on modern process nodes also contain sensors and monitors. In the event of a failure, those circuits may provide valuable clues as to the conditions under which the failure occurs. So for diagnostic purposes, there also should be a way to take those sensor signals and send them out with the results.

“There’s a DFT interface, like JTAG, and I can go to a sensor hub that I can read to get this or that information,” noted Hilliges. “Those sensors are typically accessible by the embedded software. If I run a functional test, then it can read an internal interface.”
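For illustration, a failing software test might read those monitors and log them alongside the result, roughly as in the sketch below. The sensor-read functions are hypothetical stand-ins for whatever sensor-hub access a given SoC exposes, and the values are simulated.

```
#include <stdint.h>
#include <stdio.h>

/* Hypothetical sensor-hub accessors; values are simulated here. */
static int32_t read_temp_monitor(void)    { return 87; }   /* degrees C  */
static int32_t read_voltage_monitor(void) { return 742; }  /* millivolts */

/* On a failure, capture the conditions along with the result so the
 * failure context can be reconstructed later. */
static void log_failure(const char *test_name)
{
    printf("FAIL %s: temp=%ld C, vdd=%ld mV\n",
           test_name, (long)read_temp_monitor(), (long)read_voltage_monitor());
}

int main(void)
{
    int failed = 1;                       /* pretend some test step failed */
    if (failed)
        log_failure("ddr_stress_loop");
    return failed;
}
```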

Despite all of that, however, it’s harder to quantify what you’re testing with software-driven tests. “That’s a huge challenge of SLT. There’s no good way to evaluate your fault coverage,” said Reichert.

Fig 3: The outputs of software-defined tests may be routed back onto the DFT infrastructure in order to derive diagnostic information about a failure. Those diagnostics may be assisted by data from on-chip monitors and sensors (orange lines) that may provide context on system conditions when something failed. Source: Bryon Moyer/Semiconductor Engineering

System-level test
The whole point of software-driven tests is to focus on scenarios that can occur in a system context. That makes them a natural fit for a newer type of test insertion being performed on some sophisticated chips after traditional final test.

This SLT insertion runs on a completely different tester from the ones used for wafer sort or final test. “The test board is typically very much like the application board or the end use board that’s going to be seen out in the field,” said Klonowski. “You’re actually testing the device as it would be operating in the field.”

Teradyne’s Reichert agreed. “It’s real functional mode, as opposed to test mode. Things like thermal patterns on the die, clock noise, or power-supply noise will be different in a test mode versus actual operation. It’s a chance to get at other faults in the chip that you might not get during a normal structural ATE test.”

SLT has, to some extent, been around for a while. But in the past, SLT was done using smaller testers that handled smaller volumes, and there was not enough floorspace available to accommodate the number of testers that would be needed for high-volume test.

“That worked okay until the volumes increased dramatically and the SKU counts increased dramatically,” said Keith Schaub, vice president of technology and strategy at Advantest. “But the silicon process nodes shrank dramatically, which made thermal a much bigger problem and made SLT that much more important. So now we need to have something that’s really HVM (high-volume manufacturing) production-worthy.”

Lower-cost testing
While these testers can be more expensive than standard testers, they also can run hundreds of sites in parallel. And unlike final test, some SLT testers can run tests on each site independently, or “asynchronously” – not in lockstep. So if a device fails, it theoretically can be ejected and replaced with a new device to test without waiting for all of the other devices in its “cohort” to finish their tests. It even may be possible to run tests on two different chips at the same time, dividing up the sites between them.

Each individual site may not be completely independent, however. The testers have boards or boxes with some number of sites. These boards are inserted and removed as a unit. The sites still run their tests on their own, but all sites on a board start at the same time. If one fails early, it must wait until the entire board is complete to be removed. But that’s less restrictive than having to wait for hundreds of units to complete.

This independent characteristic allows more flexible flows for each device, according to how the tests proceed. “Asynchronous test sites allow for fault tolerance and repairs while online,” noted Klonowski.

Asynchronous testing, coupled with the high number of sites, makes this a lower-cost insertion. While wafer sort and final test try to minimize every microsecond of test time, SLT tests may run for minutes. “With hundreds of sites in the system, you get an overall lower cost per site, which enables longer test times,” said Klonowski.
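A back-of-the-envelope comparison shows why. All numbers in the sketch below are illustrative assumptions, not real tester pricing: cost per device is taken as the per-site cost per hour multiplied by the test time. Even though the SLT test runs roughly 60 times longer, the much lower per-site cost keeps the per-device cost within a small multiple of the ATE insertion.

```
#include <stdio.h>

int main(void)
{
    /* Illustrative assumptions only: cell cost per hour, parallel sites,
     * and per-device test time for two insertions. */
    struct insertion { const char *name; double cost_per_hour; int sites; double test_seconds; };
    struct insertion steps[] = {
        { "final test (ATE)",  400.0,   8,   5.0 },   /* short test, few sites  */
        { "system-level test", 600.0, 400, 300.0 },   /* long test, many sites  */
    };

    for (int i = 0; i < 2; i++) {
        double per_site_hourly = steps[i].cost_per_hour / steps[i].sites;
        double per_device = per_site_hourly * (steps[i].test_seconds / 3600.0);
        printf("%-18s $%.2f/site-hour -> ~$%.4f per device\n",
               steps[i].name, per_site_hourly, per_device);
    }
    return 0;
}
```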

Unlike the earlier test insertions, SLT usually doesn’t make explicit use of the scan test infrastructure – at least from the standpoint of executing vectors. It’s much more natural to run tests that execute on the CPU, and software-defined tests fit that description nicely. There still may be some connection to the internal test and sensor infrastructure for reporting purposes, but it’s the CPU that kicks off the tests.

If necessary, it still may be possible to run standard vectors if the tester supports it. “SLT testers can be equipped to run scan,” said Klonowski. “In fact, given the lower per-second cost of test and the high parallelism SLT systems provide, SerDes-based scan is one of the most likely ATE test candidates to be ported over to SLT. Equipping an SLT tester to run scan requires scan-specific hardware and instrumentation.”

It’s important to emphasize that, despite the SLT name, this is not the test of an actual system. It’s a test of a chip in a system context before the chip is shipped for use in building a system. It’s part of the chip manufacturing flow, not a system manufacturing flow.

When to execute software-driven tests
While additional test insertions usually are viewed as adding costs, the addition of a potentially lower-cost SLT insertion may help to reduce overall test costs.

Vector tests can be run at wafer sort or final test, so it might seem that software-defined tests could run in both of those insertions and in SLT. So what’s the right place to run them?

First, there’s a practical aspect to this. Software-defined tests need access to memory, which is typically not available on standard probe cards for wafer sort – especially multi-site ones. “Let’s say you bought some memory chip, and you try to figure out how to get it close to the ASIC,” explained Schaub. “How are you going to do that? Customers have tried, but have been unsuccessful.”

It’s easier to do when you’re not dealing with a whole wafer. “They’ve figured out how to get it to work at final test,” he added.

Test infrastructure aside, yield expectations or experience can help determine where different tests should be run. “If I find a high failure-rate test, I want to get that test over to wafer test if I can,” said Klonowski. “If it’s a very high-yielding test, I want to try and do it in system-level test, because that’s where my lowest cost of test per site exists.”

As a product matures, tests may move back and forth between insertions to optimize yields and costs. But that’s where companies would benefit from an easy way to convert vectors to software, and vice versa, so those tests can be moved around easily. “If you do find these software-test failures, you want to quickly be able to turn them into vectors to push it back from SLT to final test,” said Schaub.

Fig. 4: Two SLT systems. On the left is Advantest’s version; Teradyne’s is on the right. Source: Advantest, Teradyne

Users of SLT
The SLT insertion isn’t appropriate for every chip. It tends to be used for system-oriented chips in markets where quality is critical.

“End users are driving higher quality requirements, which is leading people to say, ‘Through SLT, I will find faults I wouldn’t find in standard ATE,’” said Klonowski.

This is especially true where the external environment during testing is important. Temperature and voltage can be controlled on a per-site basis during SLT, so this can be a lower-cost way of ensuring that chips perform properly under all conditions.

Three markets in particular have demand for SLT — smartphones, automotive, and high-performance computing. Each has its own reasons for using it and drives different test conditions.

Cell-phone chips usually will be tested at room temperature. The goal of this insertion for these high-volume chips is to ensure the highest quality to minimize equipment returns.

Automotive chips are more demanding and make use of automatic temperature control. They need to be tested from -40°C to 150°C to ensure chips can survive these environmental extremes in a safety-critical system.

Makers of chips for high-performance computing, on the other hand, are mostly concerned with ensuring their chips don’t overheat during test. So rather than the test site enforcing a specific temperature, the goal is to provide enough cooling to keep the temperature below 125°C. Cold testing isn’t generally needed because the chips will quickly self-heat once started up.

Chips are now systems
With the advent of sophisticated SoCs powering high-volume applications where the costs of failure are high, it can make sense to test these chips in a system mode under real-world conditions. The first word in SoC is, of course, “system,” making such chips systems in their own right.

Software-driven functional tests and SLT provide ways of ensuring these complex chips are shipped with high confidence, and that they will be able to fulfill their mission successfully in some larger system.


