New data suggests that more chips are being forced to respin due to analog issues.
Analog and mixed-signal design has always been tough, but a recent survey suggests the industry has seen a significant increase in failures over the past year because the analog circuitry within an ASIC was out of tolerance.
What is causing this spike in failures? Is it just a glitch in the data, or are these problems real? The answer is complicated, and it depends heavily on analog tuning.
Fig. 1: Flaws contributing to ASIC re-spins. Source: Wilson Research and Mentor, a Siemens Business
“Analog tuning means you need to be very clear about the performance of the analog circuit, given the context of the entire system, that you need to achieve in silicon,” says Sathish Balasubramian, senior product manager for AMS Verification at Mentor, a Siemens Business. “And that needs to be pretty close to what you get from silicon.”
Harry Foster, chief scientist at Mentor, analyzed the results of the Wilson Research/Mentor survey to see whether the spike was restricted to designs at the latest technology nodes, or if it was more widespread. It turns out that while 7nm or below was the most popular answer for those experiencing the problem, it only accounted for about 16% of the cases. Almost all nodes, including 150nm and larger, were seeing these types of failures.
A second area of exploration was whether these issues cropped up on large designs or small ones. The results can be seen in Figure 2 below, which shows that while all design sizes are experiencing a rise in problems associated with analog tuning, the biggest percentage involves the largest designs.
Fig. 2: Tuning analog circuit flaws by design size. Source: Wilson Research and Mentor, a Siemens Business
Is the number believable? “One way to track the progress of analog design methodology is the percent of field failures due to analog elements of a design compared to other elements,” says Art Schaldenbrand, senior product manager at Cadence. “At a recent VLSI Test Symposium, it was reported that 95% of field failures are due to the analog elements of the design. Analog is difficult. The challenge of analog design is getting harder, and the impact of analog elements on the design is becoming more intractable. Plus, there are increasing pressures on analog designers. It takes more time to scale down analog power than it does to scale down digital power. This is because we have to re-architect what we’re doing in analog to achieve that.”
Designs of all sizes and technology nodes are experiencing increasing problems with analog. “ASICs are getting more complex, driven by two main areas,” says Mentor’s Balasubramian. “One is going to be the migration to advanced nodes. But the biggest driver is that the number of complex mixed signal designs is increasing. This is mainly due to companies trying to optimize the area footprint to include analog within the same technology node. Everyone is trying to migrate to a single substrate, or a single technology node. That by itself poses a lot of challenges. When teams need to get exposure to analog design, or get exposed to some of the effects of advanced nodes — such as lower threshold, such as being really finicky in terms of parasitics — they can’t do a schematic simulation and say everything is working.”
A significant area of growth for the industry is coming from IoT devices. “When you look at mobile or IoT or handheld devices, you will find lots more analog components,” says Farzin Rasteh, analog and mixed-signal applications engineering manager at Synopsys. “You will get PLLs, RF, and high frequency modulation circuits. You will find charge pumps and op amps. And all of these devices have sensors, which are analog in nature.”
New nodes mean more issues. “At the latest technology nodes, there are new things to learn and it takes time to fully understand them and incorporate these new effects into tools and models,” says Benjamin Prautsch, group manager of advanced mixed-signal automation at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “For example, at the latest nodes, variability is very difficult to model. Parasitic extraction becomes a lot more difficult when many root effects come into the picture, and it can take time to isolate layout-dependent effects, which can exacerbate the parasitic issues.”
One such new issue is noise. “TSMC has published papers saying that porting an analog design into a higher technology node or advanced technology node comes with its own challenges,” says Balasubramian. “The best-case scenario is going to be that everything works and you’re probably lucky. To make that happen you probably have high margins. In most cases that will result in a performance hit. But the worst-case scenario is that it does not function.”
When you combine noise and variability, things get worse. “Consider the voltage regulator, or the charge pump, which is providing power to a datapath,” says Synopsys’ Rasteh. “What if there is variation in that, and what if there’s noise on that? How does that noise manifest itself in the timing of the signals and clocks, which in turn could lead to failure? What type of cause-and-effect relationships exist at these small geometries, at these high frequencies, where every fraction of a picosecond matters? These are the kinds of things that usually lead to failures.”
Some problems cause chips to fail, but there are other reasons why re-spins may be necessary. “The biggest challenge, when it comes to variation, is yield,” says Haran Thanikasalam, senior staff applications engineer in the Design Group at Synopsys. “We depend on simulation to provide an accurate sigma-based analysis, so that we can relate that to yield fall-out. Tools have a difficult time applying different sigma values to different parts of the circuit. As a result, companies send out test chips, which provide a vehicle for all kinds of analysis. They can push the limits on the silicon, and then make the correlation between the simulations and the actual part.”
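As a rough illustration of how a sigma target maps to yield fall-out (this is not from the survey, and it assumes a single normally distributed parameter with a symmetric pass window), a few lines of Python show why each additional sigma changes the expected defect rate so dramatically:

```python
# Illustrative sketch only: expected fall-out for a +/- k-sigma pass window,
# assuming one normally distributed parameter. Real circuits have many
# correlated parameters, which is exactly why tools struggle here.
import math

def yield_fraction(k_sigma: float) -> float:
    """Fraction of parts expected to land inside a symmetric +/- k_sigma window."""
    return math.erf(k_sigma / math.sqrt(2.0))

for k in (3, 4, 5, 6):
    fallout_ppm = (1.0 - yield_fraction(k)) * 1e6
    print(f"{k}-sigma window: ~{fallout_ppm:.3g} ppm expected fall-out")
```

Under that simple assumption, 3-sigma corresponds to roughly 2,700 ppm fall-out, while 6-sigma is down around parts per billion, which is why test-chip correlation matters so much when the models are uncertain.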
Expecting more from analog
There is continued pressure to improve analog performance even when the process technology is fighting in the opposite direction. “With in-chip integration, there is no longer a simple analog/digital boundary,” says Balasubramian. “We see designs architected such that there is no one-way traffic where analog is driving digital. Instead, there are feedback loops. Consider the digital calibration of a PLL. A PLL used to be purely analog, but today they are adding digital calibration to make it faster and easier to converge. Now, your basic analog block has a digital component, and it’s not a uni-directional flow anymore. This requires more advanced methodologies.”
Rasteh agrees. “To check and adjust, we need more intelligence built into the design. This may include self control with a feedback loop to monitor the conditions for these events — whether it’s temperature, signal power that it is receiving, whether it’s an error-checking mechanism, whether it’s measuring the jitter or variation — and self-correct and compensate for it.”
This creates additional potential for failure. “Configurable analog has become very trendy,” adds Rasteh. “This is where you use digital or software to instruct analog to go to high power mode or high frequency mode, change the output of a current source, or change the output of a charge pump. Designs have to be resilient to extreme conditions, whether from variation or external factors, and when you pack so much in such a small area, you get crosstalk noise from high frequency circuits, digital circuits to analog, or vice versa. That’s not easy to simulate and model.”
Pushing performance
All communications rely on analog. At a minimum there will be a SerDes driving the signal across a harsh environment, and new versions of standards normally come with greater demands on the SerDes. “With more speed you need to have more accuracy,” says Balasubramian. “The margins are getting smaller and the speeds necessary to satisfy some of the interface requirements mean you have to take into account many more physical effects. That’s not easy to achieve in some designs, and it requires a lot more tuning. They used to sign off analog circuits without even taking into account the device noise that happens on a technology. But device noise can increase to the point where it really affects the performance of the PLL.”
Very small errors in interfaces can lead to catastrophic failures. “Process variation could produce errors or delays in the read/write of a memory,” says Rasteh. “If you miss that clock edge by a fraction of a picosecond, it’s enough to make read and write unreliable. If one out of every 30 or 40 writes is erroneous, that’s enough to make the chip or memory controller useless. The primary factors we attribute these problems to are variation and higher frequencies. So the tolerance for jitter or any mistakes is much less. Process variations create bigger variations in those frequencies, clock phases, or jitter. And because there is little margin in terms of absolute time, these designs react in a more pronounced way under variation.”
Higher quality
Not only are speeds going up and environments getting more extreme and noisier, some markets are demanding higher quality. “Any chip that goes into automotive requires extreme precision,” says Balasubramian. “For automotive, that means at least 5-sigma and possibly 6-sigma. There is no way, in a reasonable amount of time, that they can verify analog circuitry using Monte Carlo simulation. We are bringing machine learning technologies into variation analysis. With this we can make device verification possible in a limited number of simulations, rather than running billions of simulations.”
This requires a change in the development process. “Running multiple corner (PVT) simulations is never going to catch variation problems,” says Synopsys’ Thanikasalam. “They have to make use of statistical models that come from the foundry, and they need to sweep the entire range in order to catch these problems.”
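A back-of-the-envelope sketch (my illustration, not a flow from either vendor) shows why brute-force Monte Carlo breaks down at automotive sigma targets. Assuming a single normally distributed parameter and a one-sided failure boundary, just observing a handful of failures at 6-sigma takes on the order of ten billion random samples:

```python
# Illustrative sketch only: how many random Monte Carlo runs are needed to
# expect to observe ~10 failures at a given one-sided sigma level.
import math

def tail_probability(k_sigma: float) -> float:
    """One-sided tail probability of a standard normal beyond k_sigma."""
    return 0.5 * math.erfc(k_sigma / math.sqrt(2.0))

TARGET_FAILURES = 10  # rough count needed to estimate a failure rate at all

for k in (3, 4, 5, 6):
    p_fail = tail_probability(k)
    runs_needed = TARGET_FAILURES / p_fail
    print(f"{k}-sigma: p_fail ~ {p_fail:.1e}, ~{runs_needed:.1e} Monte Carlo runs")
```

This is the arithmetic behind the push toward statistical foundry models, importance-sampling-style methods, and machine learning for high-sigma verification.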
Companies on the leading-edge nodes are aware of this. “When the process and the models are still evolving, test chips become highly important and can be an important step in validating extraction,” says Fraunhofer’s Prautsch. “Many of these new process steps start off being very manual in nature, and it takes time before repetitive and/or error-prone tasks can be incorporated into tools.”
Tools are emerging that can deal with many of these issues. “How could a tool identify the parts of your design that have the most effect on the output?” asks Rasteh. “Which part of the design is potentially contributing most to an error? If you have a PLL, and the jitter on that PLL is extremely important to you, controlling that jitter is important. So which parts of the design are the most likely to impact it? Is it the VCO, the feedback loop, the charge pump?”
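At its simplest, the kind of analysis Rasteh describes can be approximated by perturbing one block at a time and watching how much the output moves. The sketch below is purely illustrative: the pll_jitter model, its weights, and the block names are hypothetical placeholders, not any vendor's tool.

```python
# Illustrative sketch only: a crude one-at-a-time sensitivity check on a toy
# jitter model, asking which block's variation moves the output most.
import random

def pll_jitter(vco_noise, cp_noise, fb_noise):
    # Hypothetical toy model: output jitter as a weighted sum of block noise terms.
    return 0.6 * vco_noise + 0.3 * cp_noise + 0.1 * fb_noise

nominal = {"vco_noise": 1.0, "cp_noise": 1.0, "fb_noise": 1.0}

for block in nominal:
    samples = []
    for _ in range(10_000):
        params = dict(nominal)
        params[block] = random.gauss(1.0, 0.05)  # perturb one block at a time
        samples.append(pll_jitter(**params))
    mean = sum(samples) / len(samples)
    variance = sum((s - mean) ** 2 for s in samples) / len(samples)
    print(f"{block}: output jitter variance ~ {variance:.2e}")
```

Real tools rank contributors over correlated process variables rather than independent per-block knobs, but the question they answer is the same: where should the tuning effort go first?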
Conclusion
While the survey number represents an extreme jump in ASICs that require analog tuning, the results correctly identify that analog design is going through many changes. Those at the leading edge face new challenges with each node, and those challenges get successively more complex. Test chips, where the analog can be tuned, may just be part of the plan.
For designs on legacy nodes, teams are potentially being disrupted as they deal with integration issues for the first time. Many legacy methodologies, where analog and digital were designed and verified separately, are having to be rethought, and that impacts team dynamics.
All designs at all nodes are being pushed to be faster, use less power, and achieve higher yields, and each of those incrementally adds to the complexity.
Related
Analog Knowledge Center
Top stories, special reports, videos, blogs and white papers about Analog.
Problems And Solutions In Analog Design
At 7nm and beyond, and in many advanced packages, all devices are subject to noise and proximity effects.
Why Analog Designs Fail
Analog circuitry stopped following Moore’s Law a long time ago, but that hasn’t always helped.
Integrity Problems For Edge Devices
Noise becomes a significant issue at older nodes when voltage is significantly reduced, which is a serious issue for battery-powered devices.
Low-Power Analog
The amount of power consumed by analog circuits is causing increasing concern as digital power drops, but analog designers have few tools to help them.
Comments

Yet getting anybody to sign up for developing SystemVerilog-AMS seems to be a lost cause.
Most of the problems stem from the analog guys and digital guys not understanding each other, both in IC design and at EDA companies, and failing to integrate their flows. Verilog-AMS was introduced ~25 years ago; little has improved since then (and all the implementations are half-assed).
Very interesting article, thank you Brian!
We faced the same challenge a few years ago when architecting the next-generation MCU platform with complex analog/mixed-signal and RF functionality.
It was clear we needed formal verification at all hierarchy levels, including analog, and we came up with a new methodology that brought us from 5-10 architecture bugs at 1p0 to zero architecture bugs.
The trick was to formally verify each IP within the digital flow, and it seemed to work quite well 🙂
In Figure 2, the biggest percentages seem to be the smallest designs?
“verify each IP within the digital flow” makes sense for digital, but analog is a completely different planet. Tools are the same from the 70s, no innovation so far or in the spotlight.
This confused me too. I think that perhaps what he means to say is that the largest INCREASE in problems comes from the biggest designs.
There are newer tools out there that definitely address those issues, but analog designers don’t necessarily want to spend the time using them because it ultimately translates into more work (and they’re already quite busy).
Analog circuits are typically tested based on design specs – i.e., taking functional design requirements written as simulation testbenches and converting them into equivalent silicon tests. Most designers then assume that silicon devices with manufacturing defects (or large process variations) will fail such tests. That’s a flawed assumption – there are many ways an analog circuit might pass tests even in the presence of silicon defects. There are tools out there to highlight such test coverage oversights. But it takes the willingness to address them and the effort to make some changes in order to get there.
The industry probably needs the analog equivalent of a DFT engineer. Someone whose job is to ensure that all devices are correctly tested (and testable) and truly live up to their specs, while all defective devices effectively get screened out and don’t simply become test escapes. Someone that stands between analog designers and test engineers, addressing the needs of both while infuriating neither…