Problems And Solutions In Analog Design

At 7nm and beyond, and in many advanced packages, all devices are subject to noise and proximity effects.


Advanced chip design is becoming a great equalizer for analog and digital at each new node. Analog IP has more digital circuitry, and digital designs are more susceptible to the kinds of noise and signal disruption that have plagued analog designs for years.

This is making the design, test and packaging of SoCs much more complicated. Analog components cause the most chip production test failures, and possibly as many as 95% of field failures. And with multiple chips in a package, any component failure can render all chips in that package useless.

“There’s so much redundancy being built into die-to-die IPs,” said Manmeet Walia, senior product manager for high speed SerDes IP at Synopsys. “Even if things break down during the manufacturing process, there’s a lot of redundancy and testability built into them. But in a die-to-die link, leads are not coming out on a package, so you cannot connect something to a connector and take it to test equipment over some wire. No matter how good that wire is, it’s a 10mm to 20mm link. Today’s IP has a humongous amount that needs to be tested — everything from different loopbacks to different pattern generators to scope functionality.”

Redundancy helps overcome that, but it also adds margin. “There’s a transmitter built in every receiver, and there’s a receiver built in every transmitter — a dummy transmitter, dummy receiver — so that there is a full loop within everything,” said Walia. “There’s a lot being done to improve the KGD (known good die) testing, and that’s very important, because once you build your MCM together, the cost can be very high. An interposer-based MCM can be more than $100 just because of the cost of packaging. And if you find out the dies you’re using to build that MCM are broken, then your cost-per-good-die will go up. The idea is to do a lot of testability with these die, and make sure that the die are good before you put them in a package.”
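
The cost argument can be made concrete with a simple compound-yield model. This is a minimal sketch, assuming die failures are independent and that a single bad die scraps the whole module; the die costs and yields are hypothetical, only the $100 packaging figure comes from the quote above, and the cost of die-level test itself is ignored.

```python
from math import prod

def cost_per_good_module(die_costs, die_yields, packaging_cost):
    """Cost of one working MCM when any bad die scraps the whole package.

    Assumes independent die failures, so module yield is the product of
    the individual die yields (a hypothetical simplification).
    """
    module_yield = prod(die_yields)
    build_cost = sum(die_costs) + packaging_cost
    return build_cost / module_yield

# Hypothetical example: four dies at $20 each, $100 interposer packaging.
costs = [20.0] * 4
untested = cost_per_good_module(costs, [0.90] * 4, 100.0)  # weak die-level screening
kgd = cost_per_good_module(costs, [0.99] * 4, 100.0)       # near-known-good die
print(f"weakly screened dies: ${untested:.2f} per good module")
print(f"KGD-screened dies:    ${kgd:.2f} per good module")
```

Because the packaging cost is paid for every assembled module, each percentage point of die yield lost is multiplied across all the dies in the package.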


Fig. 1: Aggregated functions in a multi-chip module. Source: Synopsys

That’s easier said than done, however. While digital designers have been automating test for 30 years, that’s not the case in analog.

“In analog design we’ve never had the equivalent of fault simulation so we don’t even have the first step in place that’s required,” said Art Schaldenbrand, senior product manager for transistor-level simulation at Cadence. “In digital design, we typically talk about time to market. Analog people worry about the time to get samples to their customers so they can start building products and evaluating them. They want to get that design-in, and a lot of that has to do with how to characterize something faster. To make that happen, there’s a lot of design and test interaction that has to be more automated than it is right now.”

Two IEEE working groups currently are focusing on those issues. One group is looking at defect modeling in order to do fault simulation, while the other is working on the equivalent of JTAG for analog.

Today, analog test buses typically are included in designs, but they often are hidden because for test there are only the input nodes and the output nodes, making it difficult to generate the right stimulus and get the right outputs. With digital, there are automated ways to build in test. Analog is more manual. It requires accessing intermediate points and bringing signals out to a single test point, which is basically a pin multiplexer. That allows engineers to select different nodes and look at what’s happening internally, which helps for debug and potentially production test because it provides access to an internal node.
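
A minimal behavioral sketch of the pin-multiplexer idea, assuming a hypothetical select register that routes one internal node at a time to a single test pin. The class, node names, and voltages are invented for illustration and do not describe any real IP or test standard.

```python
from dataclasses import dataclass

@dataclass
class AnalogTestBus:
    """Toy model of an analog test mux: one select code routes one internal node."""
    nodes: dict      # select code -> (node name, simulated voltage in volts)
    select: int = 0  # hypothetical select register

    def test_pin(self):
        """Return the node currently routed to the test pin and its voltage."""
        if self.select not in self.nodes:
            raise ValueError(f"unused select code {self.select}")
        return self.nodes[self.select]

# Hypothetical internal nodes of a small analog block.
bus = AnalogTestBus(nodes={
    0: ("bandgap_vref", 1.20),
    1: ("ldo_out", 0.80),
    2: ("bias_mirror_gate", 0.45),
})

# Debug sweep: step the select code and read each node through the one pin.
for code in sorted(bus.nodes):
    bus.select = code
    name, volts = bus.test_pin()
    print(f"sel={code}: {name} = {volts:.2f} V")
```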

The problem is that this cannot yet be done entirely in an automated flow. “We have defect simulation working now,” Schaldenbrand said. “It’s in production, but it’s still at the first stage of people looking at it and saying, for example, ‘I want to see what my test coverage is.’ If you do that at the top level on a chip, that’s a very simulation-intensive problem, and it’s difficult to do. Ideally, we’d like to get there because right now we don’t have a heat map: ‘I ran my tests, I’ve got 90% coverage of defects, and only 10% coverage from this block, so I’ll need to add that test interface there.’ We don’t have any automated tools for doing that right now. One of the holy grails we have to work on, going forward, is getting the capacity to get insight into what the coverage is at the block level for the top-level tests. Then you can touch-automate a test best methodology, but we’re not there yet.”
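
The block-level coverage view described above can be sketched as a simple aggregation: given which injected defects the top-level tests detected, compute coverage per block and flag blocks that need a test interface. This assumes defect-simulation results are already available as per-defect detect flags; the block names, counts, and the 80% threshold are hypothetical.

```python
from collections import defaultdict

def block_coverage(results):
    """results: iterable of (block, defect_id, detected) tuples from defect simulation."""
    totals = defaultdict(int)
    detected = defaultdict(int)
    for block, _defect, hit in results:
        totals[block] += 1
        detected[block] += int(hit)
    return {b: detected[b] / totals[b] for b in totals}

def needs_test_access(coverage, threshold=0.8):
    """Blocks whose defect coverage falls below a (hypothetical) target."""
    return sorted(b for b, c in coverage.items() if c < threshold)

# Hypothetical results: the LDO is well covered, the PLL charge pump is not.
results = (
    [("ldo", i, i % 10 != 0) for i in range(100)]      # 90 of 100 detected
    + [("pll_cp", i, i % 10 == 0) for i in range(50)]  # 5 of 50 detected
)
cov = block_coverage(results)
overall = sum(hit for _block, _defect, hit in results) / len(results)
print({b: f"{c:.0%}" for b, c in cov.items()}, f"overall={overall:.0%}")
print("add test access to:", needs_test_access(cov))
```

The hard part Schaldenbrand points to is producing `results` at top level in reasonable time, not this bookkeeping.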

Part of the reason for the lack of automation in the analog domain is mindset.

“It’s not easy to generate analog fault models,” he said. “In digital, there are two states. Either my high state is bad or my low state is bad. But for analog, I might have an op amp that has gain and bandwidth and slew rate. My ADC is going to have other kinds of faults. So the approach that the industry has come up with over the last decade is modeling the defects. Traditionally for analog, we look at functional verification. Does the circuit work right? Parametric verification: are the parameters correct? What we’re trying to do now is say, ‘With my test program, can I test to make sure that the structure of my die is correct? Was it manufactured without defects?’ That becomes a problem because you have to figure out where the defects are going to occur in the design. Then you have to test all those. Take a design that has 3,000 or 5,000 transistors, for example. There is a gate-drain junction, a gate-source junction, and so on, so there are five or six places to check for each one of those transistors. And that’s for just a simple block. I’m not even talking about the 30,000 to 40,000 simulations that have to be run for each test to measure whether the test vector is false. This is a prohibitive simulation problem, which makes it a very big challenge.”
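
The scale of the problem follows from simple counting. This sketch assumes a three-terminal transistor model (bulk ignored) with one short per terminal pair plus one open per terminal, which gives six candidate defects per device, in line with the “five or six places” quoted above. Real defect models may enumerate sites differently, and the test count is hypothetical.

```python
from itertools import combinations

TERMINALS = ("gate", "drain", "source")  # assumed 3-terminal model, bulk ignored

def defect_sites(terminals=TERMINALS):
    """Candidate defects for one transistor: a short per terminal pair plus an open per terminal."""
    shorts = [f"short:{a}-{b}" for a, b in combinations(terminals, 2)]
    opens = [f"open:{t}" for t in terminals]
    return shorts + opens

def defect_simulations(n_transistors, n_tests):
    """One simulation per (defect, test) pair when each defect is injected individually."""
    return n_transistors * len(defect_sites()) * n_tests

per_device = len(defect_sites())  # 3 shorts + 3 opens = 6
for n in (3_000, 5_000):
    print(f"{n:,} transistors -> {n * per_device:,} defects, "
          f"{defect_simulations(n, 10):,} simulations for 10 tests")
```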

Necessary steps
Getting this right is critical to the design. Before signing off an analog design, Zhimin Li, solutions architect at Mentor, a Siemens Business, recommends these five tips to avoid silicon re-spins, delays to market, and reduced profit:

  1. Account for layout-dependent effects (LDE) in pre-layout simulations.
  2. Include reliability analysis and methodology in your development plan.
  3. Utilize mixed-signal verification to improve throughput and coverage.
  4. Avoid extrapolating yield from an insufficient number of Monte Carlo runs; instead, move to advanced variation-aware methods that use machine learning.
  5. Use optimal methods for addressing both deterministic and random noise, from the architecture phase through final verification.

“First, we must consider the critical impact of layout early in the design and take into account non-ideal effects such as the well proximity effect (WPE), length of diffusion (LOD), oxide-to-oxide spacing effect (OSE), and poly spacing effect (PSE), in addition to estimated routing RC. Running pre-layout simulations without accounting for layout-dependent effects may leave you far from the post-layout simulation results, resulting in multiple design iterations and delaying time to market,” Li said.

Second, reliability is very important for safety-critical and long-lifecycle ICs. For blocks that handle large signals or power up/down sequences, aging and/or self-heating simulations can be performed to analyze how long the blocks can reliably function under certain stressed conditions, he noted.

Third, with today’s complex mixed-signal SoCs, it is imperative to ensure there are no functional errors due to interactions between the analog and digital domains. This requires an easy-to-use mixed-signal verification methodology applicable to both top-level and subsystem validation, Li said.

Fourth, designs must account for variability that depends on common variations such as process, voltage, and temperature, as well as on local mismatch. “The common methodology of running an insufficient number of Monte Carlo simulations and extrapolating the data to the targeted yield, under the assumption of a Gaussian distribution, often leads to false results, especially for high-sigma targets (sigma>=4). Machine learning techniques used in variation-aware design software can greatly facilitate the process,” he pointed out.
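
Why Gaussian extrapolation misleads can be shown with a skewed output distribution. A minimal sketch, assuming a hypothetical circuit metric that is lognormally distributed (chosen only because its exact tail is known in closed form): 200 Monte Carlo samples give a mean and standard deviation, a spec is placed at mean + 4 sigma, and the failure rate the Gaussian formula predicts is compared with the true one.

```python
import math
import random
import statistics

def gaussian_tail(x, mean, std):
    """P(X > x) if X were Gaussian with the given mean and std."""
    return 0.5 * math.erfc((x - mean) / (std * math.sqrt(2)))

def lognormal_tail(x, mu, sigma):
    """Exact P(X > x) for X = exp(N(mu, sigma^2))."""
    return 0.5 * math.erfc((math.log(x) - mu) / (sigma * math.sqrt(2)))

# Hypothetical skewed output metric; not real silicon data.
MU, SIGMA = 0.0, 0.5
random.seed(1)
samples = [random.lognormvariate(MU, SIGMA) for _ in range(200)]  # a few hundred MC runs

m = statistics.mean(samples)
s = statistics.stdev(samples)
spec = m + 4 * s  # spec limit placed at the extrapolated 4-sigma point

predicted = gaussian_tail(spec, m, s)     # what "mean + k*std" implies
actual = lognormal_tail(spec, MU, SIGMA)  # true failure probability
print(f"Gaussian extrapolation: {predicted:.2e}")
print(f"Actual tail probability: {actual:.2e}")
```

For this heavy right tail the true failure rate comes out orders of magnitude higher than predicted; a left-skewed metric would err the other way, which is why the direction of the error cannot be known without more data.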

Fifth, one of the most important aspects of analog design is accounting for noise. “Various noise sources such as device noise, crosstalk, inductive and capacitive coupling, substrate noise, PCB packaging effects and electromagnetic interference (EMI) should be considered and budgeted,” Li said.
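
Budgeting usually starts from the fact that uncorrelated noise contributions add in power, so RMS values combine as a root-sum-square. This sketch assumes every source is uncorrelated with the others; correlated aggressors add closer to linearly, so the root-sum-square total would then be optimistic. The source names and microvolt values are hypothetical.

```python
import math

def total_rms_noise(contributions_uv):
    """Root-sum-square of uncorrelated RMS noise contributions (same units in and out)."""
    return math.sqrt(sum(v * v for v in contributions_uv.values()))

# Hypothetical input-referred budget, in microvolts RMS.
budget = {
    "device (thermal + flicker)": 40.0,
    "crosstalk": 15.0,
    "substrate": 10.0,
    "supply/package": 20.0,
    "EMI": 5.0,
}
total = total_rms_noise(budget)
spec_uv = 50.0  # hypothetical allowance
print(f"total = {total:.1f} uV rms vs spec {spec_uv} uV")
for name, v in sorted(budget.items(), key=lambda kv: -kv[1]):
    print(f"  {name:28s} {v:5.1f} uV  ({v * v / total ** 2:.0%} of noise power)")
```

Listing each source's share of total power shows where effort pays off: here the device noise dominates, so halving the EMI term barely moves the total.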

Better coverage, faster sign-off
Layout-dependent effects (LDEs), such as well proximity effects (WPE), length of diffusion (LOD), oxide-to-oxide spacing effects (OSE), and poly spacing effects (PSE), play an increasingly important role in advanced designs. If they are not considered and taken into account at the early stages of a design, the post-layout simulation results can deviate significantly from the pre-layout ones. That, in turn, adds time and expense to the design process.

“In schematics, designers can include LDE effects for each device,” Li said. “However, the values to model the LDEs heavily depend on the final layout. Thus, it is crucial for the designers to start thinking from the beginning about how the circuits would be laid out to accurately catch the LDEs in pre-layout simulations. Estimated routing RC parasitics also can be added into the schematics to catch their effects, instead of waiting for the final RC extraction from the finished layout, because it might be too late to fix the relevant design issues.”
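
One way to estimate routing parasitics before layout is to derive R and C from an expected wire length and per-unit-length values, then check the delay with the Elmore approximation, a first-order time-constant estimate. The per-micron values, driver resistance, and load below are hypothetical placeholders, not foundry data.

```python
from dataclasses import dataclass

@dataclass
class WireEstimate:
    length_um: float
    r_per_um: float  # ohms per micron (hypothetical)
    c_per_um: float  # farads per micron (hypothetical)

    @property
    def r(self):
        return self.length_um * self.r_per_um

    @property
    def c(self):
        return self.length_um * self.c_per_um

def elmore_delay(r_driver, wire, c_load):
    """Elmore delay of driver -> distributed RC wire -> load.

    The wire is treated as a pi-model: the driver sees all capacitance,
    while the wire resistance sees half the wire capacitance plus the load.
    """
    return r_driver * (wire.c + c_load) + wire.r * (wire.c / 2 + c_load)

# Hypothetical: 500 um route, 1 kOhm driver, 50 fF load.
wire = WireEstimate(length_um=500, r_per_um=0.5, c_per_um=0.2e-15)
with_wire = elmore_delay(1e3, wire, 50e-15)
ideal = elmore_delay(1e3, WireEstimate(0, 0.5, 0.2e-15), 50e-15)
print(f"with estimated routing: {with_wire * 1e12:.0f} ps, ideal wire: {ideal * 1e12:.0f} ps")
```

Even this crude estimate shows the schematic-only delay understating the routed one by a factor of about 3.5, the kind of gap Li warns shows up late if routing is ignored.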

Reliability analysis is particularly important in safety-critical ICs, such as those in automobiles and medical devices, and ICs used in harsh environments.

“Aging and self-heating simulations are analyses that quantify how long a chip can reliably work under certain stressed conditions, including high voltage and temperature, taking into account device degradation due mainly to hot carrier injection (HCI) and negative/positive bias temperature instability (NBTI/PBTI). The aging and self-heating models usually are provided by foundries or EDA vendors. Not every sub-block needs reliability simulations, but those handling large signals and power up/down sequences, such as mixers, ADCs, VCOs and power amplifiers, are more likely to suffer device degradation. Hence, it is important to check their reliability,” Li said.
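
Foundry aging models are proprietary, but threshold-voltage drift from BTI and HCI is commonly described with an empirical power law in stress time, ΔVth = A·t^n. The sketch below inverts that law to estimate time to reach a drift limit. A, n, and the 50 mV limit are hypothetical, and real sign-off should use the foundry- or vendor-supplied models Li mentions.

```python
HOURS_PER_YEAR = 8760

def delta_vth(t_hours, a, n):
    """Empirical power-law threshold shift (volts) after t_hours of stress."""
    return a * t_hours ** n

def time_to_limit(limit_v, a, n):
    """Stress time (hours) at which delta_vth reaches limit_v: t = (limit/A)^(1/n)."""
    return (limit_v / a) ** (1.0 / n)

A, N = 0.005, 0.2  # hypothetical fit: 5 mV after 1 hour of stress, exponent 0.2
LIMIT = 0.050      # hypothetical 50 mV drift budget for the block

t_fail = time_to_limit(LIMIT, A, N)
print(f"drift after 10 years: {delta_vth(10 * HOURS_PER_YEAR, A, N) * 1e3:.1f} mV")
print(f"time to {LIMIT * 1e3:.0f} mV: {t_fail:.3g} hours ({t_fail / HOURS_PER_YEAR:.1f} years)")
```

The small exponent is what makes this delicate: a modest error in the fitted A moves the predicted lifetime by a large factor, since lifetime scales as (1/A)^(1/n).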

Verification of mixed-signal SoCs is particularly challenging. As complexity grows, designers and verification engineers cannot rely on the divide-and-conquer approach of verifying digital and analog blocks individually, and then stitching them together for full-chip verification.

“As many design failures happen at the interfaces between analog and digital blocks, mixed-signal simulations at the top level, as well as at the subsystem, are imperative to make sure there are no functional errors due to interactions between analog and digital domains,” he said. “Even trivial bugs can result in costly silicon re-spins. For example, a wrong bus order in the programming bits from digital control can cause the functional failure of an analog subsystem. While mixed-signal functional verification is important for analog blocks, analog designers often are concerned with the accuracy in subsystem verifications like those for time-interleaved ADCs, PLLs, and RF transceivers, where digital calibrations are commonly utilized. In such scenarios, the mixed-signal tool and verification flow should offer both performance and accuracy.”
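
The bus-order example is the kind of interface bug a self-checking mixed-signal test can catch cheaply. A minimal sketch, assuming a hypothetical 4-bit binary-weighted current DAC behind a digital trim register: sweeping every code and checking that the analog output rises monotonically flags a bit-reversed connection immediately. A real flow would run the equivalent check in the mixed-signal simulator against a behavioral or transistor-level model.

```python
N_BITS = 4
LSB_UA = 1.0  # hypothetical LSB current, microamps

def dac_current(bits):
    """Binary-weighted DAC model; bits[i] drives the cell weighted 2**i."""
    return sum(b * (2 ** i) * LSB_UA for i, b in enumerate(bits))

def register_bits(code):
    """Digital side: trim register as a list of bits, LSB first."""
    return [(code >> i) & 1 for i in range(N_BITS)]

def sweep(wiring):
    """Analog output for every code; `wiring` maps register bits to DAC inputs."""
    return [dac_current(wiring(register_bits(c))) for c in range(2 ** N_BITS)]

def is_monotonic(values):
    """True if each code produces strictly more current than the previous one."""
    return all(b > a for a, b in zip(values, values[1:]))

correct = sweep(lambda bits: bits)             # bus connected in order
reversed_bus = sweep(lambda bits: bits[::-1])  # MSB/LSB swapped at the interface
print("correct wiring monotonic:", is_monotonic(correct))
print("reversed wiring monotonic:", is_monotonic(reversed_bus))
```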

While digital verification techniques have evolved rapidly over the years, mixed-signal verification is still catching up. To address the challenges, mixed-signal simulation solutions should be fast, accurate, easy to set up, easy to debug, and integrate seamlessly into existing analog and digital verification flows. Interestingly, the analog components on a mixed-signal SoC use the same process technology as their digital counterparts, which means the MOS corners are extracted based on digital delay, power and 1/0 strength, which are not optimal for analog verification.

Different design houses, groups, or even individual designers may adopt different methodologies to check variations due to PVT corners and local mismatches. Common issues include:

  • There could be hundreds or even thousands of PVT corners, and it is very expensive to identify the worst one.
  • Relying solely on digital corners and running Monte-Carlo at only a typical or FF/SS corner may cause over-design or under-design for analog functions.
  • Because of time-to-market pressure, limited computing resources, and limited understanding of statistics (designers may wrongly assume a perfect Gaussian distribution for an output quantity), designers may only run tens or at most hundreds of Monte-Carlo iterations and apply a “mean +/- 3*Std” equation to extrapolate 3-sigma or even 6-sigma yield. Those numbers may be significantly off, leading to over- or under-design, or poor yield.
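
The corner explosion in the first bullet is simple combinatorics. This sketch enumerates a hypothetical corner set with `itertools.product`; the specific process, voltage, temperature, extraction, and aging lists are illustrative assumptions, and real PDKs and projects define their own.

```python
from itertools import product

# Hypothetical corner axes; real PDKs and projects define their own.
process = ["TT", "FF", "SS", "FS", "SF"]
voltage = [0.72, 0.80, 0.88]   # nominal +/- 10%, volts
temperature = [-40, 25, 125]   # degrees C
rc_extraction = ["typ", "cbest", "cworst", "rcbest", "rcworst"]
aging = ["fresh", "aged"]      # hypothetical end-of-life condition

corners = list(product(process, voltage, temperature, rc_extraction, aging))
print(len(corners), "corners")  # 5 * 3 * 3 * 5 * 2 = 450
print(len(corners) * 200, "simulations with 200 Monte Carlo runs per corner")
```

Each added axis multiplies the count, which is why running full Monte Carlo at every corner quickly becomes impractical.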

“Using a brute-force method for full coverage is highly unlikely in most cases,” Li said. “For example, it requires 740 runs to catch a single failure for a 3-sigma yield target, and 1 billion runs for a 6-sigma target. So it is critical to adopt an intelligent variation-aware verification solution that is able to dramatically reduce the number of runs and offer verifiable results, for both corner sweeps and Monte-Carlo iterations.”
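
The run counts quoted above match the one-sided Gaussian tail at k sigma: the expected number of samples per observed failure is 1/P(Z > k). A quick check, assuming a one-sided spec and a truly Gaussian output:

```python
import math

def tail_probability(k):
    """One-sided Gaussian tail P(Z > k)."""
    return 0.5 * math.erfc(k / math.sqrt(2))

def runs_per_failure(k):
    """Expected Monte Carlo samples needed to observe one failure beyond k sigma."""
    return 1.0 / tail_probability(k)

for k in (3, 4, 5, 6):
    print(f"{k}-sigma: ~{runs_per_failure(k):,.0f} runs per observed failure")
```

Seeing a single failure is also far from a confident yield estimate; bounding the failure rate with reasonable statistical confidence takes several times more runs than these figures.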

Noise considerations
Finally, noise tolerances are tightening at each new node for digital, and in all fully analog implementations. Noise is one of the most critical specifications in the majority of analog and mixed-signal designs, and it has to be carefully considered throughout the design flow, from architecture to final verification.

“Device noise is often the dominating noise source of an analog block,” Li said. “To quantify its impact, different noise analyses, such as small-signal noise analysis, periodic noise analysis, or transient noise analysis, can be used depending on whether the circuits are continuous-time or discrete-time, periodic or non-periodic. Designers need to understand the pros and cons of each analysis to choose the appropriate one. It is also important to keep the tradeoff between performance and accuracy in mind so that at different design stages the analysis can be used wisely to achieve the simulation goals. For a mixed-signal SoC, we must take into account the layout floorplan at both the chip and block level to minimize the impact of noise, including but not limited to inductive and capacitive coupling, substrate noise, and PCB packaging effects.”
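
The continuous-time versus discrete-time distinction shows up even in hand calculations. As a first-order sanity check before running simulator noise analyses, a resistor's thermal noise is evaluated as a density integrated over bandwidth, while a sampled (switched-capacitor) node is set by kT/C noise. These are textbook formulas, not a substitute for small-signal, periodic, or transient noise analysis; the component values are hypothetical.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K (exact SI value)

def resistor_noise_density(r_ohm, temp_k=300.0):
    """Thermal (Johnson) noise voltage density in V/sqrt(Hz): sqrt(4kTR)."""
    return math.sqrt(4 * K_B * temp_k * r_ohm)

def resistor_noise_rms(r_ohm, bandwidth_hz, temp_k=300.0):
    """RMS thermal noise over an ideal brick-wall bandwidth (continuous-time view)."""
    return resistor_noise_density(r_ohm, temp_k) * math.sqrt(bandwidth_hz)

def ktc_noise_rms(c_farad, temp_k=300.0):
    """Sampled kT/C noise on a capacitor in V rms; independent of switch resistance."""
    return math.sqrt(K_B * temp_k / c_farad)

print(f"1 kOhm: {resistor_noise_density(1e3) * 1e9:.2f} nV/rtHz, "
      f"{resistor_noise_rms(1e3, 10e6) * 1e6:.1f} uV rms in 10 MHz")
print(f"1 pF sampling cap: {ktc_noise_rms(1e-12) * 1e6:.1f} uV rms")
```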

Signal integrity can be affected by everything from crosstalk to power and ground noise in packages and PCBs. EMI, which was largely ignored by digital designers for years, now can impact performance in both analog and digital designs below 10nm, and it can propagate through the air or through conductive layers in the package.

“A good example is that if multiple LC-tank VCOs on the chip oscillate at close frequencies, pulling can cause one VCO to oscillate at the same frequency as the other, and the phase noise could be degraded,” Li said. “Even worse, the functionality could deteriorate. Another example is that when a sensitive receiver and a power management unit (PMU) are on the same die, the EMI from the PMU (aggressor) can degrade the noise figure of the receiver (victim). The EMI issues exist even if analog/RF and digital functions are on different chips but share the same package.”
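
Whether two VCOs actually lock or merely pull each other can be estimated to first order with Adler's approximation for the injection-locking range, ω_L ≈ (ω0/2Q)(I_inj/I_osc), valid when the injected current is much smaller than the oscillator's own. This sketch assumes a single dominant coupling path summarized by a hypothetical current ratio; the frequencies, Q, and coupling are illustrative, and extracting real coupling needs EM and package-level analysis.

```python
def adler_lock_range_hz(f0_hz, q, injection_ratio):
    """One-sided injection-locking range from Adler's approximation.

    injection_ratio = I_inj / I_osc, assumed much less than 1 (weak coupling).
    """
    return f0_hz / (2 * q) * injection_ratio

def coupling_outcome(f_victim_hz, f_aggressor_hz, q, injection_ratio):
    """Classify the victim VCO's response to a coupled aggressor tone."""
    lock = adler_lock_range_hz(f_victim_hz, q, injection_ratio)
    offset = abs(f_aggressor_hz - f_victim_hz)
    return "injection-locked" if offset < lock else "outside lock range (pulling, spurs)"

# Hypothetical: two 5 GHz LC VCOs, tank Q of 10, 1% coupled current.
print(f"lock range: {adler_lock_range_hz(5e9, 10, 0.01) / 1e6:.2f} MHz")
print("1 MHz apart: ", coupling_outcome(5e9, 5.001e9, 10, 0.01))
print("50 MHz apart:", coupling_outcome(5e9, 5.050e9, 10, 0.01))
```

Outside the lock range the aggressor still pulls the victim and creates spurs, with the effect weakening as the frequency offset grows, so the classification here is only a first cut.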

Conclusion
To avoid test failures and field failures, Schaldenbrand said it is important not to skip any steps, to follow best practices, and to use tools as much as possible. “You don’t have to spend a lot of time on corner analysis or Monte Carlo,” he said. “Those are things you can set up quickly and send off and run. In the same way, for verification, there are new tools that automate verification. You just tell the tool what you want it to do. You send it off, and then you look at the results of coverage. In this way, it’s a lot more like digital verification, and so they have other constraints such as getting a good chip out, but also getting it out quickly. That’s the big challenge.”


