Dealing With Test More Effectively

Routing congestion, noise, power and performance impacts are forcing designers to rethink when, where and how they deploy test.


By Ed Sperling
Shrinking geometries are starting to have the same effect on test as they are on other parts of an SoC, with the focus shifting from area to leakage, heat, noise, signal integrity, and the impact on overall system performance.

The warning that design teams have to consider test much earlier in the design was issued to chipmakers years ago and largely ignored. At 28nm that warning is finally starting to be noticed, but heeding it still doesn’t completely solve the problem. Testing a chip is becoming more difficult, and the level of difficulty will grow at each new node, as well as when dies are stacked together in a single package.

Testing can be done one of two ways. Either the chip can be hooked up to an external tester, which is the fastest and easiest approach, or it can be tested with memory or logic built-in self-test (BiST). Testing has always been about getting done as quickly as possible because time literally costs money. But as features shrink and wires become thinner, using normal testing methods can destroy a chip. The accepted alternative is to lower the power used during test, but that lengthens test time, a problem compounded by the fact that there are more features and circuitry on every new generation of SoCs.

“Test is also more susceptible to noise and switching activity at advanced nodes,” said Brion Keller, senior architect for Encounter Test in Cadence’s front-end design group. “The way around this is to decrease the switching activity during test and apply more tests. But that takes more time and increases the cost.”

Those kinds of tradeoffs are becoming more common these days as testing circuitry becomes more complex in order to deal with everything from power islands that may be off or on and multiple voltages to more features on a chip and more cores in a processor. Consider, for example, what to do with a chip that has multiple cores, not all of which are functional.

“All of this causes complexity issues with test,” said Keller. “You may not be able to switch all the gates on a chip. Or you may have lots of instantiations of the same core and a partially good die that is still good enough to sell but maybe one core is bad.”
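The partially good die Keller mentions amounts to a binning decision made from per-core test results. A minimal sketch of that decision follows, assuming a hypothetical `bin_die` helper and a hypothetical minimum number of working cores for a sellable part; real binning flows weigh many more factors, such as speed and power.

```python
from typing import Sequence


def bin_die(core_results: Sequence[bool], min_good_cores: int) -> str:
    """Classify a multi-core die from per-core pass/fail results.

    core_results   -- True for each core that passed its tests
    min_good_cores -- hypothetical threshold for a sellable partial part
    """
    if not core_results:
        raise ValueError("at least one core result is required")
    good = sum(core_results)
    if good == len(core_results):
        return "full"      # every core works
    if good >= min_good_cores:
        return "partial"   # some cores failed, but still sellable
    return "reject"        # too few working cores


# A four-core die with one bad core, sold as a three-core part.
print(bin_die([True, False, True, True], min_good_cores=3))
```

For this to work, the test circuitry has to be able to isolate each core instance and report results per core, which is part of the added complexity described above.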

Logic and memory BiST add even more circuitry and overhead to the die, and there is even circuitry that enables in-field defect repair.

“Given the complexity of today’s systems, this is the only way to guarantee a high level of product quality,” said Indavong Vongsavady, CAD director of technology and research at STMicroelectronics. “But test circuitry consumes area and power and may affect performance if not properly designed. Overall, it is a tradeoff between design overhead and product quality. The higher the quality requirement, the greater the overhead.”

So far, he said, the overhead ratio has not been affected by shrinking dimensions. But as more logic functions are added into the same device, the overall test time does tend to increase. ST’s solution to this problem is more elaborate test compression solutions, Vongsavady said.
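A first-order estimate shows why compression helps with test time. Scan shift time is roughly the number of patterns multiplied by the length of the longest scan chain, divided by the shift clock frequency, and compression lets the same scan pins feed many more, much shorter internal chains. The sketch below assumes evenly balanced chains and ignores capture cycles and any extra patterns that compression may require. The function name and all figures are hypothetical and do not describe ST's flow.

```python
import math


def scan_shift_seconds(flops: int, chains: int,
                       patterns: int, shift_hz: float) -> float:
    """First-order scan shift time.

    ASSUMPTIONS: flops are spread evenly across chains, each pattern
    costs one full shift of the longest chain, and capture cycles are
    ignored.
    """
    if min(flops, chains, patterns) <= 0 or shift_hz <= 0:
        raise ValueError("all arguments must be positive")
    longest_chain = math.ceil(flops / chains)
    return patterns * longest_chain / shift_hz


# Hypothetical design: 1M scan flops, 10,000 patterns, 50 MHz shift clock.
uncompressed = scan_shift_seconds(1_000_000, chains=8,
                                  patterns=10_000, shift_hz=50e6)
compressed = scan_shift_seconds(1_000_000, chains=800,
                                patterns=10_000, shift_hz=50e6)
print(uncompressed, compressed, uncompressed / compressed)
```

In this idealized model, going from 8 chains to 800 internal chains cuts shift time by a factor of 100. Actual gains are smaller, because compression hardware adds its own overhead and can increase the pattern count.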

Complications and solutions
Test creates other issues, as well. Power, in particular, has become an issue for testing because it literally can cook an SoC if the circuits are blasted with high currents. Robert Ruiz, product marketing manager at Synopsys, said that earlier this year there was a case where BiST melted the socket balls on a chip.

“A limitation of BiST is high coverage,” he said. “But there also are self-test versions, which is what they’re starting to use in automotive applications. When you stop at a red light it scans the circuits to see if there are any problems. We’re also seeing more system-level tests, where customers are moving away from internal self-tests to a commercial solution. There are networking applications where tests after manufacturing now include environmental factors.”

Still, adding circuitry into a chip for test also adds the same kinds of complications that any active circuitry does. To control power, clock gating is often necessary during a scan test. That adds more circuits in itself. So does compression, which can cause congestion because it requires more connections between the scan pins and the scan chain.

One solution to that problem is a hybrid compression model, which uses multiple test engines to complete a test rather than a single one, as well as multiple channels to do those tests.

“There are more patterns to test if you’re using a lower toggle rate,” said Steve Pateras, product marketing director for Mentor Graphics’ silicon test products. “With a hybrid design you use the capabilities of both engines and decrease the combined overhead by 30% to 40%.”

Improved relevance
A second solution is to not test everything, or at least not to test it twice. This becomes particularly important in stacked die, where testing can damage a 50-micron-thick chip and access may not be so simple. The IEEE 1838 working group is attempting to solve this issue by defining chip-to-chip communication for running tests, but so far no standard has been released.

In other cases, tests need to be routed around critical paths, so they can be run as needed in the background. Mentor’s Pateras said programmable test engines help significantly there, because each chip’s critical paths are different. “We’ve reduced overhead by three to four times using that approach,” he said. “In some cases we’ve shared a bus interface with ARM.”

Having flexibility to reroute tests is important. Done wrong, tests can wreak havoc on yield, result in false positives for errors (particularly at advanced nodes, where noise and heat can affect results), and impact the overall performance of the SoC.

Having standard ways of doing tests is likewise important, particularly in stacked die. “The reality is that you have to plan for this very well,” said Bassilios Petrakis, product marketing director for Cadence’s front-end design group. “There are a lot of proprietary tests out there. When you mix and match die with different tests, that will be a challenge for engineers.”

And having an understanding of what can go wrong at the architectural level, and building that into test approaches early in the flow, is perhaps most important of all. Most experts agree that test is no longer something to be done later in the design. It now needs to be part of the initial planning or chipmakers will pay in time, in performance, in power, and in trying to fix problems that should never have occurred.