Balancing The Cost Of Test

The cost of test was approaching the cost of manufacturing, but have new test methodologies brought that under control? And are new challenges around the corner?


As semiconductor devices became larger and more complex, the cost of test increased. Testers were large pieces of capital equipment designed to execute functional vectors at speed, and the technology they used had to keep up with the increasing demands placed on it. Because of this, the cost of test did not decrease in the way that the cost of other high-tech equipment did. Around the turn of the century, Intel declared war on test times, which had grown with the complexity of its microprocessors, and on tester costs, which had increased by 25X in the span of two decades. This translated to a doubling in test costs per generation and led to projections that test costs could exceed manufacturing costs.

Up until this point, testing was performed functionally. This meant a series of vectors had to be created that would exercise the design such that all possible manufacturing faults would be detected. These tests were created manually, and fault simulation was used to grade the effectiveness of the test set. Manufacturing test primarily used the stuck-at model: any wire or device input could be shorted to supply or ground, and the expectation was that the test program would detect these faults. Over time it became harder to generate such tests, and the length of the tests increased, adding to the time spent on the tester.
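To make the stuck-at model and fault grading concrete, here is a minimal sketch in Python. The three-input circuit, the net names and the `fault_coverage` helper are all hypothetical and chosen only for illustration; production fault simulators work on full netlists and are far more elaborate.

```python
# Hypothetical 3-input circuit: y = (a AND b) OR (NOT c).
# Every net is named so that a stuck-at fault can be forced onto it.
NETS = ["a", "b", "c", "n1", "n2", "y"]

def simulate(a, b, c, fault=None):
    """Evaluate the circuit; fault is (net, stuck_value) or None."""
    def force(net, value):
        # Override the net's value if the injected fault sits on it.
        if fault is not None and fault[0] == net:
            return fault[1]
        return value
    a = force("a", a)
    b = force("b", b)
    c = force("c", c)
    n1 = force("n1", a & b)
    n2 = force("n2", 1 - c)
    return force("y", n1 | n2)

def fault_coverage(vectors):
    """Fraction of single stuck-at faults detected by the vector set."""
    faults = [(net, v) for net in NETS for v in (0, 1)]
    detected = [
        f for f in faults
        # A fault is detected if any vector makes the faulty output
        # differ from the fault-free output.
        if any(simulate(*vec) != simulate(*vec, fault=f) for vec in vectors)
    ]
    return len(detected) / len(faults)

one_vector = [(1, 1, 1)]
all_vectors = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)]
print(fault_coverage(one_vector), fault_coverage(all_vectors))
```

A single vector detects only the faults that change the output for that particular input, which is why hand-written functional test sets grew so long as designs became larger.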

At the same time, the cost of silicon had been steadily dropping. This made it economical to add complexity to a design if that could help reduce test costs. This is the basic premise behind design for test (DFT), scan test, built-in self-test and a myriad of other techniques that increase the observability or controllability of a chip. Testing that relies on such added logic is generally referred to as structural test. Structural test not only significantly reduces the time spent on the tester, but also reduces the time spent on test development.

Semiconductor Engineering spoke with experts in the industry to find out the state-of-the-art in DFT solutions and to look at new test challenges on the horizon.

“There is a minimum set of DFT capabilities that people always adopt in terms of test infrastructure,” explains Savita Banerjee, senior product marketing manager for embedded test repair at Synopsys. “Scan is not questioned anymore and typically that means scan everything. For memory, it is similar.”

Test, as with many other chip design and verification functions, is not a one-size-fits-all solution. Every company and device has to balance many factors, including the cost of the tester, the cost and time spent on DFT tools, chip real estate costs and defect rates.

There is a direct correlation between fault coverage and the number of bad devices that can make it out into the field. “People used to be satisfied with 98% fault coverage,” says Kiran Vittal, senior director of product marketing at Atrenta. “In the mobile and consumer space, the remaining 2% could be a very large number of parts. Those companies now have a goal of 99.5% stuck-at fault coverage. To generate patterns that target the additional 1.5% can double the number of ATPG patterns. This pattern explosion results in longer tester time and costs.”
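The link between coverage and field escapes can be roughed out with a few lines of arithmetic. The sketch below uses the commonly cited Williams-Brown approximation, DL = 1 - Y^(1 - T), where Y is process yield and T is fault coverage. This model is not named in the interviews; it is swapped in here for illustration, and the 90% yield and one-million-unit volume are invented numbers.

```python
def defect_level(process_yield, fault_coverage):
    """Williams-Brown estimate of the fraction of shipped parts that are bad.

    DL = 1 - Y ** (1 - T). This is a textbook approximation, used here
    as an assumption; real escape rates depend on the defect
    distribution and on how well the fault model matches real defects.
    """
    return 1.0 - process_yield ** (1.0 - fault_coverage)

ASSUMED_YIELD = 0.90          # hypothetical process yield
SHIPPED_PARTS = 1_000_000     # hypothetical shipment volume

for coverage in (0.98, 0.995):
    dl = defect_level(ASSUMED_YIELD, coverage)
    print(f"coverage {coverage:.1%}: about {dl * SHIPPED_PARTS:.0f} "
          f"defective parts per {SHIPPED_PARTS:,} shipped")
```

Under these assumptions the estimated escape rate falls by roughly a factor of four when coverage moves from 98% to 99.5%, which helps explain why high-volume suppliers accept the extra patterns.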

Memory is one component that has employed built-in self-test (BIST) for some time, but according to Hem Hingarh, vice president of engineering for Synapse Design, the scope is growing. “BIST adoption is evolving beyond embedded memories to logic and mixed-signal blocks. For standard mixed-signal interfaces, such as PCIe, USB 3.0, HDMI and other SerDes interfaces, the IP contains BIST logic and uses IEEE 1500 ‘Standard for Embedded Core Test’, for running analog loopback tests that can also adjust eye waveforms via on-chip fuses.”

Logic BIST has been a niche capability in the past, but there is increasing adoption that is being driven by the need for in-system test. “This is important when you need to be able to run tests in the field where there are no testers available,” says Steve Pateras, product marketing director for test at Mentor Graphics. “These are mainly safety-critical applications driven by automotive. There are requirements in this space driven by ISO 26262 and LBIST is one way to achieve that.”

Pateras goes on to explain that it is design impact that is stopping more people from adopting LBIST. It requires a very clean design and that means no unknown states as this would corrupt operation. This means much more stringent design and test rules and insertion of LBIST is more complex than scan. “The cost benefit ratio has not been there for most people,” says Pateras, “but this is changing.”
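Why unknown states are so damaging to LBIST can be seen in a toy model. In a typical LBIST arrangement, a linear feedback shift register (LFSR) generates pseudo-random patterns and a multiple-input signature register (MISR) compacts the responses into a single signature that is compared against a known-good value. The Python sketch below is an illustration only: the 8-bit width, the tap positions, the `cut_response` logic and the flop that is unknown on the first capture are all assumptions, not a description of any vendor's implementation.

```python
WIDTH = 8
TAPS = (7, 5, 4, 3)   # assumed feedback taps, shared by LFSR and MISR
MASK = (1 << WIDTH) - 1

def _feedback(state):
    """XOR of the tapped bits of the register state."""
    bit = 0
    for t in TAPS:
        bit ^= (state >> t) & 1
    return bit

def lfsr_patterns(seed, count):
    """Yield `count` pseudo-random patterns, starting with the seed."""
    state = seed
    for _ in range(count):
        yield state
        state = ((state << 1) | _feedback(state)) & MASK

def misr_signature(responses):
    """Compact a stream of response words into one signature."""
    sig = 0
    for r in responses:
        sig = (((sig << 1) | _feedback(sig)) ^ r) & MASK
    return sig

def cut_response(pattern, unknown_bit):
    # Hypothetical circuit under test: inverts the pattern. Bit 0 also
    # picks up the value of a flop that may be uninitialized.
    return (~pattern & MASK) ^ unknown_bit

def run_lbist(x_value, count=16):
    """Run the self-test; the flop is unknown on the first capture only."""
    responses = [
        cut_response(p, x_value if i == 0 else 0)
        for i, p in enumerate(lfsr_patterns(seed=0x5A, count=count))
    ]
    return misr_signature(responses)

# The same good circuit yields two different signatures depending on
# how the unknown value happens to resolve.
print(hex(run_lbist(0)), hex(run_lbist(1)))
```

Because one unknown bit is enough to change the final signature, there is no single golden value to compare against, which is one way to read the requirement for a design with no unknown states.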

Analog is a particularly thorny part of the testing process.

“Analog has always been an extremely difficult problem to improve the defect coverage and reduce test time,” says Robert Ruiz, senior product marketing manager for test automation products at Synopsys. “We have seen customers trying to provide means of quantifying the effectiveness of their analog test programs and one issue is that there are no industry standards for analog faults.”

An added complexity is that any test infrastructure added to the chip also has to be tested. “Customers do want this logic tested,” says Ruiz, “and when they can’t get to it using structural test they often supplement with some functional patterns. But the goal is to minimize this.”

New faults emerging
New technologies are adding new complications. “FinFET process technologies require new fault models,” claims Hingarh. “With new technologies, whether in the form of new circuit design, a new algorithm or new IP modules, the EDA tools need to be updated.”

But even that won’t completely solve the problem.

“Variability is the biggest contributor to failure,” says Vittal. “With the latest nodes, the fault models do not work well anymore. Stuck-at is no longer good enough and we have to add at-speed testing and cell-aware ATPG. This means that we cannot just look at the gates. We have to look at the transistors, and this creates a lot more faults.”

But not all agree. “The stuck-at model is not breaking down,” says Ruiz. “It is a matter of how much tester time, die area and productivity they want to trade off for higher-quality test programs. The big difference with finFETs is that analysis shows the fin is the most likely area for defects. Many of those defects result in a delay in the gate. If part of the fin has not been fully built, it will either not fully shut down or turn on and the net effect is a delay. This is being addressed by taking slack information from timing analysis and using this information to generate tests to find these types of faults along the longest paths. This is an at-speed test that is intelligently guided.”

Adds Synopsys’ Banerjee: “Whenever there is a new process node, we invest heavily in both memory design and the associated defects. This is the advantage of having the memory design in-house. By using simulations we are able to look at over 200 fault effects that exist in memories. We do not target each of these with a unique test, instead optimizing the tests based on what the memory physically looks like and the dominant failure types for that silicon.”

In addition to new technology nodes, new and very different packaging approaches are being adopted, such as die stacking. “Hierarchical test is very applicable to 2.5D and 3D test,” says Pateras. “Instead of having a hierarchy of cores, we have a hierarchy of devices, dies and package. The biggest issue is standardization because you need to know about the test capabilities between the cores. This is being addressed by IEEE P1838, Standard for Test Access Architecture for Three-Dimensional Stacked Integrated Circuits.”

Power-aware DFT
The complexity of many chips means it is not possible to power up the entire device at the same time. The dynamic power consumed within the device, especially during test when activity levels are likely to be higher than in normal operation, could either cause IR drop or create so much heat that the chip could be compromised. “Most SoCs use multiple power islands,” says Hingarh. “DFT has to be power domain-aware and scan chains constructed appropriately, leveraging industry standard power specifications like UPF and CPF.”

Vittal agrees: “You used to be able to test the whole chip at the same time, but today with multiple Intellectual Property subsystems you may not be able to test all of them at the same time. You have to shut down some of the system and test those with power. This can lead to a significant increase in the total test time.”

“The question is how to schedule testing of the various cores of the design,” says Ruiz.
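The scheduling question can be framed as a packing problem: group core tests into sessions that run concurrently without exceeding a power budget, then run the sessions back to back. The sketch below uses a simple greedy heuristic in Python. The core names, test times, power figures and budget are invented for illustration, and the heuristic is a stand-in rather than the method any particular tool uses.

```python
def schedule_tests(cores, power_budget):
    """Greedy packing of core tests into sessions under a power cap.

    cores: dict of name -> (test_time, test_power).
    Cores in one session are tested concurrently, so a session lasts
    as long as its slowest core. Longest tests are placed first, which
    means the first core in each session sets the session time.
    """
    sessions = []
    by_time = sorted(cores.items(), key=lambda kv: -kv[1][0])
    for name, (time, power) in by_time:
        if power > power_budget:
            raise ValueError(f"{name} alone exceeds the power budget")
        for session in sessions:
            if session["power"] + power <= power_budget:
                session["cores"].append(name)
                session["power"] += power
                break
        else:
            # No existing session has headroom, so open a new one.
            sessions.append({"cores": [name], "power": power, "time": time})
    return sessions

def total_time(sessions):
    """Sessions run one after another, so their times add up."""
    return sum(s["time"] for s in sessions)

# Hypothetical cores: (test time, test power), in arbitrary units.
CORES = {"cpu": (10, 6), "gpu": (8, 5), "modem": (4, 3), "dsp": (3, 2)}

print(total_time(schedule_tests(CORES, power_budget=16)))  # everything at once
print(total_time(schedule_tests(CORES, power_budget=8)))   # tighter power cap
```

With the looser budget all four cores share one session, while the tighter budget forces two sessions and a longer total test time, which mirrors the increase in test time that Vittal describes.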

New test methodologies
This added complexity is adding schedule pressures, though. “The amount of time required to generate ATPG vectors to meet the fault goals is a big schedule challenge,” says Hingarh. “Sometimes companies have to compromise by generating directed tests. Through early identification of areas in the design that have poor controllability and observability, the RTL designer can make changes to improve the efficiency, and generate optimum patterns that not only improve the quality of test, but also lower the cost of test.”

Also helping to keep costs in check are some new test methodologies. One attempt to reduce the cost of test is multi-site testing, in which multiple dies are tested under one test program at the same time. This is adding new pressures on existing DFT tools. “We have customers wanting fewer pins to be used for test,” says Ruiz. “Using fewer pins is not just about form factor and packaging, it is the result of increasing adoption of multi-site testing. They need higher compression and fewer pins. As you reduce the number of test pins, you reduce the bandwidth and so you need more compression that can deal with this. This means that we have to evolve our compression technology.”
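The trade-off between pins, sites and compression can be sketched with simple arithmetic. The Python below assumes a tester with a fixed number of digital channels, a die whose scan pins are split evenly between inputs and outputs, and a fixed number of internal scan chains. All of the numbers and the `multisite_plan` helper are hypothetical, and the model ignores second-order effects such as the extra patterns that very high compression ratios can require.

```python
def multisite_plan(tester_channels, scan_pins_per_die, internal_chains):
    """Return (dies tested in parallel, required compression ratio).

    Assumptions: every scan pin needs its own tester channel, and half
    of a die's scan pins are inputs feeding the decompressor.
    """
    sites = tester_channels // scan_pins_per_die
    input_channels = scan_pins_per_die // 2
    # Ratio of internal scan chains to external scan-in channels.
    compression_ratio = internal_chains / input_channels
    return sites, compression_ratio

# Same hypothetical tester and design, with two different pin budgets.
for pins in (32, 8):
    sites, ratio = multisite_plan(
        tester_channels=256, scan_pins_per_die=pins, internal_chains=512
    )
    print(f"{pins} scan pins: {sites} sites, {ratio:.0f}x compression")
```

Cutting the pin count by four quadruples the number of dies tested in parallel in this model, but the same internal chains must then be fed through a quarter of the channels, so the compression ratio has to rise by the same factor.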

There is also a desire to make DFT tradeoffs earlier in the flow. “Simple checks at RTL can ensure that a design is fully scannable,” says Vittal. “It means you don’t have to wait until you get through synthesis and do scan insertion before the problem is found. For a design to be scannable, every register must have a clock that is controllable from the primary input of the chip and the reset must also be controllable from the primary input. Latches must be transparent during test. Otherwise they can block the passage of signals.”
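The clock and reset rules Vittal describes lend themselves to a simple structural check. The Python sketch below traces each register's clock and reset back through buffer-like connections and flags any that do not originate at a primary input. The data model (a `drivers` map for pass-through nets, a `registers` map of pin connections) and every net name are hypothetical; a real RTL checker would work on an elaborated design and would also account for test-mode muxing and latches.

```python
def trace_to_source(net, drivers):
    """Follow pass-through drivers back until a net with no known driver."""
    seen = set()
    while net in drivers and net not in seen:
        seen.add(net)          # guards against combinational loops
        net = drivers[net]
    return net

def scan_violations(registers, drivers, primary_inputs):
    """List (register, pin, source) for clocks and resets that are not
    controllable from a primary input."""
    violations = []
    for name, pins in registers.items():
        for pin in ("clock", "reset"):
            source = trace_to_source(pins[pin], drivers)
            if source not in primary_inputs:
                violations.append((name, pin, source))
    return violations

# Hypothetical design: r1 is clocked by an internally divided clock,
# which is generated by a flop and so has no pass-through driver entry.
PRIMARY_INPUTS = {"clk", "rst_n"}
DRIVERS = {"clk_buf": "clk", "rst_sync": "rst_n"}
REGISTERS = {
    "r0": {"clock": "clk_buf", "reset": "rst_sync"},
    "r1": {"clock": "div_clk", "reset": "rst_sync"},
}

print(scan_violations(REGISTERS, DRIVERS, PRIMARY_INPUTS))
```

Here the check reports `r1` because its clock comes from an internal divider. The usual remedy is a test-mode bypass to a controllable clock, which this toy model does not represent.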

Some want to take this even further. “One question is how do we put more test up-stream?” asks Ruiz. “How can synthesis be more intelligent about test? For example, if we are talking about wrapping cores, some blocks may already have inputs and outputs that are registered, and so the synthesis can recognize those and not replicate the register. They can be reused and that minimizes the die impact.”

Yield enhancement
Finding bad devices is only one of the functions of manufacturing test. Another role is as a feedback loop for locating and understanding defects so yield can be improved. This requires finding the root cause of problems as quickly as possible so that systemic problems in the layout or mask issues can be located and addressed. This is a growing problem caused by increased variability in chips and their increased sensitivity to these variations.

“We are seeing issues that are becoming a lot more difficult to characterize and to find the root cause,” says Taqi Mohiuddin, senior director of sales, marketing, and program management in the microelectronics test and engineering division at Evans Analytical Group. “Some of the issues are becoming softer failures and this is putting pressure on failure analysis tools so that problems can be pinpointed and characterized.”

[Image courtesy of Evans Analytical Group]

“For a single die failure, additional testing may be run on parts,” says Ruiz. “Yield analysis takes many different dies and correlates the locations of defects. Most of this may result in additional patterns being applied, but for volume applications, the net result may be a change in the design or manufacturing equipment to correct the defect.”

That data also can be used to figure out where test can be minimized. “You can use this data to get smarter about how to do testing,” says Mohiuddin. “By collating data from the fab, yield, production testing, and with field data, we can work out where there is the least amount of risk and then scale back on testing in those areas and add testing in areas where there is higher risk.”