With few measurable methods to assess analog quality, it’s not clear how that can impact safety-critical applications.
As the amount of analog content in connected devices explodes, ensuring that the analog portion works properly has taken on a new level of urgency.
Analog circuitry is required for interpreting the physical world and for moving data to other parts of the system, while digital circuitry is the fastest way to process it. So a sensor that gives a faulty reading, whether in a car moving at high speed or in a medical device, could be dangerous. The problem is that analog test is nowhere near as mature as digital test, and analog designs are so unique that establishing that kind of consistency is difficult.
In a recent blog, Stephen Pateras, product marketing director within the Silicon Test Solutions group of Mentor Graphics, talked about the wide gap between the quality of digital parts on an SoC and the analog mixed-signal content. “The majority of field failures in automotive ICs now occur within the mixed-signal portion of the chip.”
Figure 1 (below) was supplied by On Semiconductor. While those numbers look bad, Stephen Sunter, engineering director for mixed-signal DFT at Mentor Graphics, adds that “there are companies that have even more dramatic numbers but have not made them public. The difference in failures between analog and digital is huge.”
This rather shocking figure would seem to indicate that much more attention needs to be paid to the problem. But it also may be an indication that digital has improved so much that it is unfairly magnifying the analog failure rates. “One customer says the defects per million (DPM) for digital are down into the single digits, but DPM for analog is around 500,” says Pateras. “That number is not bad when it is not a functional safety issue. This is probably considered acceptable from a business sense in the consumer space.”
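As a rough illustration of the arithmetic behind those figures, DPM is the number of defective parts scaled to one million shipped. The sketch below uses hypothetical shipment counts, chosen only to reproduce the single-digit and roughly 500 DPM values quoted above. The function name and the counts are assumptions, not data from any customer.

```python
def defects_per_million(defective: int, shipped: int) -> float:
    """Scale a defect count to defective parts per million shipped."""
    if shipped <= 0:
        raise ValueError("shipped must be positive")
    return defective * 1_000_000 / shipped


# Hypothetical counts for illustration only.
digital_dpm = defects_per_million(defective=5, shipped=1_000_000)
analog_dpm = defects_per_million(defective=500, shipped=1_000_000)

# How many times more often the analog portion fails in this example.
ratio = analog_dpm / digital_dpm

print(f"digital: {digital_dpm:.0f} DPM, analog: {analog_dpm:.0f} DPM, ratio: {ratio:.0f}x")
```

With these made-up counts the analog failure rate is two orders of magnitude above the digital one, which is the scale of the gap Pateras describes.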
Sunter adds that many of these chip errors will get caught at the board or system test level and may not get through to the end customer. “If you choose to diagnose these failures, that is a business decision.”
This issue is acknowledged by Rob Knoth, product management director for the Digital and Signoff group at Cadence. “At the International Test Conference of 2016, there was a panel dedicated to the topic of cost of test reduction. The panelists were hammering on the fact that analog test is by far the dominant test time and also the biggest opportunity for innovation.” Knoth points out that reliability and test are two sides of the same coin. “Something becomes reliable because you tested it. With testing, it becomes robust and you have confidence in it. The easier something is to test, the more confidence you can get and thus the more reliable it becomes.”
To get to the bottom of it, you have to consider the progress made in digital test and compare verification and manufacturing in the analog and digital domains.
Comparing analog and digital
Analog content is increasing in many domains. “The addition of sensor and wireless technologies on processors creates interesting design and fabrication challenges,” says Luke Schreier, director of automated test product marketing at National Instruments. “Many times those have to be addressed with software compensation/calibration for optimum power and transmit capability.”
Bob Lefferts, group director for R&D at Synopsys, characterizes the problem. “A lot of the analog content is associated with getting data on and off chip. That may be running at 6GHz and is fast enough that it gets hard to literally move the data through the medium, which absorbs high frequency signals much faster than low frequency signals. The IP has become a lot more sophisticated with calibration. It has regulators that are often built in because if you are sending data at 20X the rate of the data on the chip, (serial transmission versus parallel), you are susceptible to jitter which is often induced by power spikes. You want low noise circuits that don’t pick up a lot of power supply noise generated by the digital. That includes a lot of analog content including op-amps in the regulators.”
So how does this analog content get tested? “People have been asking for a revolution in analog test for a long time,” says Knoth. “The amount of digital content exploded and that became the dominant problem and had to be brought under control. Scan and JTAG, and technologies such as LBIST and MBIST, and then compression were all used to address the problem. Today, people don’t talk about problems in digital. Now the focus is analog because it has not seen that renaissance.”
There are differing thoughts on the right approach to the problem. “Why is analog hard to test when digital is easy?” asks Knoth. “It comes down to the nature of the design. Analog isn’t about standardization. It is very custom and people tend to be artists. It is very hard to test art.”
But Lefferts is not in complete agreement. “Nobody does digital design today without taking test into account. In sophisticated IP, you have to build the test in from the beginning. And if you do that, it is not so bad. If you don’t and try to test something externally on a tester, that gets to be hard.”
Sunter acknowledges that separating design and test is not a good idea. “The failures in analog tend to be more parametric and they have no way of measuring the quality of the coverage of the tests that are developed. They are just going by experience. In many cases the data sheet is thrown over the wall to the test engineer who then has to write as many tests as they can to cover the specification in the time available and to maintain a test cost of less than 10% of the selling price.”
There are primarily two types of errors that occur: random faults in the manufacturing process, and parametric variation. Part of the analog verification process is to show that the circuit continues to operate over normal parametric variation as defined by the process/voltage/temperature (PVT) corners. That verification ensures normal variation does not push the design outside of its specified ranges, and variation beyond the normal range should be detected by characterization cells placed in the device. So while analog circuitry may be sensitive to such variations, they should not represent the types of errors that would need to be caught in analog test.
Sunter concurs. “PVT analysis is done to explore the range of normal variation, but this misses the shorts and opens and extreme variation that are not normal – those that are not Gaussian distribution. They are blobs that landed somewhere or anomalies in the optical masks. These are not covered by Monte Carlo simulation and cannot be anticipated.”
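The distinction between normal variation and a hard defect can be sketched in a few lines. The example below sweeps a toy amplifier-gain model across a 3x3x3 grid of PVT corners and checks each result against a spec window. The gain model, the corner values and the spec limits are all invented for illustration and do not describe any real process or IP. It then shows that a short, crudely modeled as the gain collapsing to zero, is simply not a point in that sweep.

```python
from itertools import product

# Hypothetical corner definitions (assumptions, not real process data).
PROCESS = {"slow": 0.9, "typical": 1.0, "fast": 1.1}  # gain multipliers
VOLTAGE = [1.62, 1.80, 1.98]                          # volts, 1.8 V +/- 10%
TEMPERATURE = [-40, 25, 125]                          # degrees Celsius

SPEC_MIN, SPEC_MAX = 14.0, 26.0                       # allowed gain window


def amplifier_gain(process_factor: float, vdd: float, temp_c: float) -> float:
    """Toy model: gain scales with process and supply, drifts with temperature."""
    return 20.0 * process_factor * (vdd / 1.8) * (1 - 0.0005 * (temp_c - 25))


def in_spec(gain: float) -> bool:
    return SPEC_MIN <= gain <= SPEC_MAX


# Evaluate every combination of process, voltage and temperature.
results = {
    (pname, vdd, temp): amplifier_gain(pfactor, vdd, temp)
    for (pname, pfactor), vdd, temp in product(PROCESS.items(), VOLTAGE, TEMPERATURE)
}
violations = {corner: g for corner, g in results.items() if not in_spec(g)}

# A shorted device is not one of the corners; model it as zero gain.
shorted_gain = 0.0

print(f"{len(results)} corners checked, {len(violations)} violations")
print(f"shorted device in spec: {in_spec(shorted_gain)}")
```

Every corner passes in this toy setup, yet the shorted device fails outright. That is the class of error the corner analysis never exercises and manufacturing test has to catch.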
So where are most of the random failures likely to occur? “The analog portion of the chip tends to be larger, so statistically speaking it is more likely that a defect will land on it,” says Sunter. “But on the other hand, analog designers tend to allow more space and margin, so if the defect does land on it, it is less likely to kill the transistor.”
Area and density both play a role. “The most likely place for an error is in the digital,” counters Lefferts. “That is because they have many more nodes per square micron than analog. If you have a large driver on a regulator, then it is a thick gate device and has long channel length, it is routed with fat wires because there is a lot of current, and so the number of nets per square micron is way smaller than digital. In many cases we don’t even use minimum spacing when routing. It is done by hand and it is easier to give a little more margin so that adjustments can be made when post layout simulation is performed.”
Lefferts notes there also are better processes in place to deal with recurring problems in the digital parts. “By using diagnostic capabilities that exist in the standardized methodology, you can track down where the failure is, and there is amazing sophistication in finding the failure point so you can go back to the fab and look at ways to fix the problem.”
Extreme process variation can be identified, as well. “We put process monitors in our IP so that we can make sure we are not debugging a fab problem,” adds Lefferts. “What if we get a die that doesn’t work but the processing of that die is outside of the range of what you simulated? We have ring oscillators in our design that can be accessed through a diagnostic port and measure their speed to find if they are outside of the expected range.”
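A minimal sketch of that check, assuming the ring oscillator frequency has already been read out through the diagnostic port. The frequency limits, class name and function name are hypothetical, standing in for whatever range the design was actually simulated over.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorLimits:
    """Frequency range covered by simulation (hypothetical values)."""
    min_mhz: float
    max_mhz: float


def classify_die(ring_osc_mhz: float, limits: MonitorLimits) -> str:
    """Compare a measured ring oscillator frequency with the simulated range.

    Returns 'slow' or 'fast' when the die was processed outside the range
    the design was simulated for, otherwise 'in_range'.
    """
    if ring_osc_mhz < limits.min_mhz:
        return "slow"
    if ring_osc_mhz > limits.max_mhz:
        return "fast"
    return "in_range"


limits = MonitorLimits(min_mhz=450.0, max_mhz=650.0)

# Made-up readings from three dice.
for reading in (520.0, 410.0, 700.0):
    print(reading, classify_die(reading, limits))
```

A die classified as slow or fast points to a fab problem rather than a design bug, which is the triage step Lefferts describes.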
ISO 26262 forcing change
Safety critical application areas are pushing for change. “Given the emphasis on medical devices and transportation, you also see mandates for physical testing much more often to ensure quality and safety,” says NI’s Schreier. “Regulatory standards, like ISO 26262, are being pushed down to the ICs. The best way to prepare for these trends is to provide test capabilities that are both flexible and software-defined.”
“This is driving things in two directions,” explains Sunter. “One is to get zero DPM, and that means higher coverage. Analog fault simulation is attracting attention because it provides a way to actually measure the coverage. The second thing is that it requires you to address reliability. That means putting in self-monitoring such as voltage monitors. Some customers put a number of them around the chip. These are part of a safety mechanism so that if you don’t have enough design margins to tolerate many of the defects that can occur, then at least the safety mechanism will trigger and say that I have a fault and allow the chip or system to be put into a safe state.”
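To make the idea of measured coverage concrete, here is a toy fault simulation of a resistor divider. Each fault from a short list is injected in turn, the single test measurement is re-evaluated, and coverage is the fraction of faults that push the measurement outside the test limits. The circuit, fault list and limits are assumptions made up for this sketch. A real analog fault simulator works on a full netlist with a far richer defect model.

```python
VIN = 1.8                                  # supply voltage in volts
NOMINAL = {"r1": 10_000.0, "r2": 10_000.0} # nominal resistances in ohms
LIMITS = (0.85, 0.95)                      # pass window for Vout in volts

# Hypothetical fault list: shorts and opens modeled as extreme resistances,
# plus one parametric drift.
FAULTS = {
    "r1_short": {"r1": 1.0},
    "r1_open": {"r1": 1e9},
    "r2_short": {"r2": 1.0},
    "r2_open": {"r2": 1e9},
    "r2_drift_plus_8pct": {"r2": 10_800.0},
}


def divider_vout(r1: float, r2: float, vin: float = VIN) -> float:
    """Output voltage of an unloaded resistor divider."""
    return vin * r2 / (r1 + r2)


def simulate_faults() -> dict:
    """Inject each fault and record whether the test detects it."""
    detected = {}
    for name, override in FAULTS.items():
        vout = divider_vout(**{**NOMINAL, **override})
        detected[name] = not (LIMITS[0] <= vout <= LIMITS[1])
    return detected


detected = simulate_faults()
coverage = sum(detected.values()) / len(detected)

print(f"fault coverage: {coverage:.0%}")
print("undetected:", [name for name, hit in detected.items() if not hit])
```

In this example the shorts and opens are all caught, but the 8% drift slips through the test window. Without a number like this, a test engineer has no way of knowing that the escape exists.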
Knoth sees it as a higher-level problem. “From a functional safety perspective, they are looking at things from a higher level of abstraction. They want to inject faults and make sure that the product goes to a safe state. We have to look at what that means for analog – what does controllability mean? What does visibility mean? We may find that by looking at fault simulation, but that may be too brute force of a solution to what is a more complex problem.”
Progress is being made. “Every single defective device in automotive has to be returned to the manufacturer (this is required by 26262 and part of the 0 defect parts per million) and the company has to track down how the defect occurred, how it escaped, how it will never happen again,” says Sunter. “This is very labor-intensive. And what is being reported is that defect quality for automotive, while getting better, is asymptotically approaching 1 or 2 DPM.”
New standards expected
There are various standards being developed to help mitigate this problem. “We have seen IEEE 1687 (iJTAG) being used in mixed-signal ICs and it allows a systematic way to develop tests at the block level and then to be able to retarget those tests when that block is embedded within the chip and then to be able to reuse it on the next chip,” says Sunter. “That means you can spend more effort on making sure that test is really good and then automation takes care of migrating it to the rest of the chip or into other chips.”
But that has yet to lead to many new tools. “IEEE 1687, which defines a methodology for accessing instrumentation embedded within a semiconductor device, is being extended to do analog fault modeling,” says Knoth. “We have to approach it in a structured way and the key is to focus on the definitions and models for both. When we all agree on a useful set, then we can start talking about how to deal with fault simulation and how to address the impact of functional safety. Then we can work on making that faster and to scale. So it requires the three steps.”
Sunter sees the fault simulator as being central to any improvement strategy. “How did digital make all of its progress? The reason was because they had fault simulators, and this enabled them to see how poor their manually generated tests were. That drove the development of scan in digital, which drove the automation to put the scan into the design, and that drove BIST and scan compression and all of those advances.”
As the old adage says, if it isn’t tested, then it probably doesn’t work. “Smart devices need smarter test systems,” says Schreier. “That means the test system needs to upgrade or evolve with software faster than the device roadmap. It needs a modular hardware infrastructure that’s easily upgradable. And perhaps most importantly, it needs to leave a lot of flexibility for the user to adapt to quickly changing requirements and standards.”
Lefferts believes the way forward is more self-test. “There is some overhead, such as the ADC, and some extra overhead for the oscillators. There is extra overhead for the necessary routing so the signals can be routed around for measurement. So there is overhead, but it is similar to digital. It is probably less than 10% for testability features. But if they are not built in up front, you won’t have any place to put them. If you leave all of the testing to pins and the outside world, you will find that to be very expensive because the tester has to be smart.”
Reining in the art
Analog designers are often seen as old curmudgeons, but these designers understand their working environment really well, and that is key to the quality of the work they do. More junior analog designers do not have that wealth of experience, which makes problems more likely.
“You can build a better, faster, cooler amplifier if you can do whatever you want,” says Knoth. “But if you can’t test it and prove it is safe, it will never find its way into an automotive chip. As an industry, you have to start maturing and following some rules.”
Lefferts concludes that “if analog is causing all of these issues, it may be because they have not yet figured out the best way to do built-in self-test.”