Growing complexity and the increasing difficulty of testing SoCs are forcing a rethink of when and what to test.
By Ed Sperling
The rule of thumb at 90nm—still one of the mainstream process nodes—has been that test is something you do when a chip is done. You attach electrodes on either side, make sure the signal comes through clearly, and that the SoC functions properly.
Try the same thing at 40nm, with multiple power islands, multiple voltage rails, lots of third-party IP and usually a slew of processors, and it doesn’t work. This is no surprise to design engineers working at these process nodes. What is surprising, though, is just how far up in the design process test has moved, how quickly it has moved there, and how many different facets there are now under the general heading of “test.”
The reality is that there is no longer a single test function. There is a suite of tests that increasingly need to be considered first at the architectural level, followed by additional tests conducted throughout the design process. This is no simple task. Moreover, the tools industry has only partially addressed this change, creating both opportunity in the EDA world and a level of frustration on the chip design side.
“There are a lot of different aspects to a chip now,” said Jonah Alben, senior vice president of engineering at Nvidia. “If you look back 10 to 15 years ago it was a straightforward pipeline. All you had to worry about was, ‘Does it do what it’s supposed to do.’ Now you’ve got to worry about performance, power, and noise, and even power supply integrity.”
He said the goal is no longer just reliability testing. It's getting rid of the problem before a design gets to silicon. "You need to get all the stuff right up front." That casts test in a completely different light, as preventive as well as reactive, which is why there is so much activity in this part of the market these days. Done right, test can save time and money on the design cycle, and it has a measurable effect on manufacturing yield.
Defining the problem
What’s interesting is that the market is finally getting around to recognizing the value of test. Usually it’s the market that clamors for tools, while tool vendors wait until there is critical mass before committing resources. In this case, at least some of the tools have been available for years with very few customers.
“We’ve been at the International Test Conference four years in a row talking about front-end testability, and while some of the engineers were interested their managers would always do things the same way they had always been done,” said Mike Gianfagna, vice president of marketing at Atrenta. “People don’t change until they can’t do it the same way anymore. What’s happening is companies are forcing RTL designers to get done on time, and to do that you have to do testing earlier in the flow.”
This has caused a spike in demand for front-end testing tools across the board.
“Test and diagnostics have been around for the past 15 years—more if you look at what IBM was doing internally,” said Steve Pateras, product marketing director at Mentor Graphics. “It never took off in the mainstream. Even memory BiST (built-in self-test) has been around for several years. Now we’re seeing logic BiST and mixed-signal BiST. But to make all of this work requires up-front planning. You need to understand the design architecture, power and clock domains and how they communicate.”
That communication is particularly critical in an SoC with multiple power islands because, in order to self-repair, a power domain may need to re-initialize after being turned off. All of this has to be thought about at the architectural level, not further down in the flow.
Connecting the dots
There are several points where test needs to be considered. The architectural level is by far the most effective, because the less of the design that has been completed, the easier it is to change. But as every design engineer knows all too well, things can go wrong at any moment, and last-minute engineering change orders don't help.
A second point is at RTL. In the past, this was typically handled with static testing. That doesn't work at advanced nodes with multiple power islands and processors, however. What's needed at that stage is delay testing, with high delay-test coverage.
“There needs to be far more planning at the RTL level,” said Mike Vachon, group director for the Encounter Test product team at Cadence. “But the addition of delay test and fault do drive up the cost.”
Part of the reason for the cost increase is that these kinds of tests are built into the silicon. The test logic consumes both area and energy, and designing it requires engineering resources. While this seemed superfluous at 90nm, at advanced nodes it isn't. But depending on how the budget for an SoC is allocated, the time-to-market and yield improvements that come from catching problems earlier may be hard to justify further up in the flow. This is particularly problematic with power, because power-related problems are not always obvious until chips are actually on the tester.
“There’s a big effort to understand this much better and push planning for this kind of testing way up in the design cycle,” Vachon said.
The third point of testing is the back end, and that is becoming more challenging, too. Higher density, the need to keep enough of the chip powered on to observe physical effects such as crosstalk, and the difficulty of taking accurate measurements are all making back-end test extremely complex. Add stacked die and it becomes tougher still, because it is difficult even to track signals as they flow through multiple silicon layers.
Hierarchical design and the future of test
The challenge now is to have test follow the rest of the design process much more closely. In test, speed is essential. That becomes more difficult as density increases, more components and functions are added, and more interactions of hardware and software need to be considered.
One approach is to make testing more hierarchical. “As designs are done more hierarchically, test will move that way,” predicts Mentor’s Pateras. “You may have 50 different cores in a design. With hierarchical testing you do the testing on each core separately, which allows your RTL to run faster.”
Still, most chipmakers use tools from multiple EDA vendors, while test historically has been almost exclusively a single-vendor endeavor. Weaving test into a multivendor tool flow is especially difficult when complexity is pushing it even further toward a single-vendor approach. And so far there are no significant efforts to standardize test to eliminate this problem.
"There are not a lot of standardization efforts under way," said Vachon. "And right now, more complex issues require more of a single-vendor flow."