Test Chips Play Larger Role At Advanced Nodes

Opinions diverge about whether to use fewer test chips, or whether to put more diagnostics into those chips.


Test chips are becoming more widespread and more complex at advanced process nodes as design teams use early silicon to diagnose problems before production. But this trend also is spurring questions about whether the approach remains viable at 7nm and 5nm, given the rising cost of prototyping advanced technology, including mask tooling and wafer costs.

Semiconductor designers have long been making test chips to validate test structures, memory bit cells, larger memory blocks, precision analog circuits like current mirrors, PLLs, temperature sensors and high-speed I/Os. This has been done at 90nm, 65nm, 40nm, 32nm, 28nm, etc., so having test chips at 16nm, 7nm or finer geometries should not be a surprise. Still, as costs rise, there is debate about whether those chips are over-used given advancements in tooling, or whether they should be utilized even more, with more advanced diagnostics built into them.

“Modern EDA tools are very good,” said Tom Wong, director of marketing, design IP at Cadence. “You can simulate and validate almost anything with a certain degree of accuracy and correctness. The key to having good and accurate tools and accurate results for simulation is the quality of the foundry data provided. The key to having good designs—layouts, for example—is having a high-quality, accurate DRC deck that catches all the things you are not supposed to do in the layout.”

Most of the challenges at advanced nodes are in the front end of the line (FEOL), where semiconductor physics and lithography play an outsized role. Issues that were negligible at more mature nodes can manifest as big problems at 7nm or 5nm. Process variation across the wafer, and variation across a large die, also presents problems that were of no consequence at more mature nodes, Wong said.

As those front-end manufacturing teams perform diagnosis, find defects, do failure analysis, drive yield with scan test, and work toward silicon bring-up in order to get the first silicon out, test chips are just one piece of the puzzle.

“It’s something that doesn’t get a lot of attention because it’s not volume,” said Matthew Knowles, product marketing manager for silicon learning products at Mentor, a Siemens Business. “No one sells a test chip, so making the investment into a test chip is a challenge. Looking back to the late ’90s, there were SRAM memory-based test chips, and you could get pretty far with that because it had a pretty dense array structure. So you could look at the tightest pitches and look at defectivity on your in-line inspection. Then, as that correlation between what you saw in the SRAM stopped matching the random logic as much as it used to, test chips as they were previously thought of ran out of gas, so engineering teams started leveraging scan diagnosis for failure analysis. But scan diagnosis was rather manual and arduous and problematic, so the industry streamlined that, made some tools to support that, and now there are tools available from every EDA vendor.”

Coupled with this, there began to be more product-specific issues that couldn’t be correlated to test chips, so scan diagnosis was used even more to turn production material into a yield vehicle, Knowles said. “Here you have built scan test into these devices, and then you leverage that through volume diagnosis to do yield learning. That brings us close to present day.”

At last year’s International Test Conference, Knowles noted a couple of good sessions on test chip design. “People have to have complex random logic in their test chip and running diagnosis there. They’re doing volume diagnosis on the production vehicles. They’re doing memory on the test chips, and now they’re adding all these other IP blocks. As such, there is a ping pong back and forth between the test chips not being as good as the in-die type of diagnosis. Now you absolutely have to have both.”

For complex SoCs today, Wong believes the questions on the table now include:

  • What is the role of test chips in SoC designs?
  • Do all hard IPs require a test chip for validation?
  • Are test chips more important at advanced nodes compared to more mature nodes?
  • Does the importance of test chip validation depend on the type of IP protocol?
  • What are the risks if I do not validate in silicon?

He noted that there are many high-performance protocols, such as LPDDR4/4x PHY, PCIe 4 PHY, USB 3.0 PHY, and 56G/112G SerDes, and each one of these IPs is very complex. “If there is any chance of failure that is not detected prior to SoC (tape-out) integration, the cost of retrofit is huge. This is why the common practice is to validate each one of these complex IPs in silicon before committing to use such IP in chip integration. The test chips are used to validate that the IPs are properly designed and meet the functional specifications of the protocols. They also are used to validate if sufficient margins are designed into the IP to mitigate variances due to process tolerances. All high-performance hard IPs go through this test chip/silicon validation process. Oftentimes, marginality is detected at this stage. At advanced nodes, it is also important to have the test chips built under different process corners. This is intended to simulate process variations in production wafers so as to maximize yields. Advanced protocols such as 112G, GDDR6, HBM2 and PCIe4 are incredibly complex and sensitive to process variations. It is almost impossible to design these circuits and try to guarantee their performance without going through the test chip route.”

Besides validating performance of the IP protocols, test silicon also is used to validate robustness of ESD structures, sensitivity to latch-up and performance degradation over wide temperature ranges. “All of these items are more critical at advanced nodes than more mature nodes. Test chips are vehicles to guarantee design integrity in bite-sized chunks. It is better to deal with any potential issues in smaller blocks than to try to fix them in the final integrated SoC,” Wong added.

Hugh Durdan, vice president of strategy and products at eSilicon, agreed that test chips continue to be an important part of IP development. “We recently taped out 7nm test chips for a gearbox/retimer, our 112G SerDes and an updated library of AI functions. What is common about all these test chips is they contain complex analog circuitry. No matter how much simulation you do, a test chip is still needed to validate performance when you’re pushing the envelope from an analog perspective. This is not so critical for digital designs.”

According to Knowles, the process variation that inherently exists today, coupled with product-specific design, is why engineering teams are now applying machine learning along with cell-level diagnosis. The goal is statistical analysis that pinpoints root causes, which in volume production is monitored over time.

“Another thing we’re seeing is people making the leap into Design For Diagnosability,” he said. “The diagnosis piece itself is inherently ambiguous: if you just do the diagnosis of one die, you have a probability that the defect will be in a certain location because of the design of the circuit, the logic code, etc., and that’s why we do machine learning on it. Now, people are seeing this as such a challenge and they’re considering modifying designs to make it better.”
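How volume diagnosis reduces that per-die ambiguity can be illustrated with a small sketch. The report format and net names below are hypothetical; the point is simply that aggregating many ambiguous per-die callouts makes a common root cause stand out statistically:

```python
from collections import Counter

# Each diagnosed die yields a list of (candidate_net, score) pairs.
# Any single report is ambiguous -- several nets could explain the failure.
die_reports = [
    [("net_a", 0.6), ("net_b", 0.4)],   # hypothetical per-die diagnosis results
    [("net_a", 0.7), ("net_c", 0.3)],
    [("net_b", 0.5), ("net_a", 0.5)],
]

def rank_candidates(reports):
    """Sum candidate scores across many dies; the systematic
    defect location accumulates the most weight."""
    totals = Counter()
    for report in reports:
        for net, score in report:
            totals[net] += score
    return totals.most_common()

print(rank_candidates(die_reports)[0][0])   # → net_a
```

In real flows the aggregation is far more sophisticated (layout features, defect classes, Bayesian scoring), but the statistical principle is the same: volume turns ambiguous individual diagnoses into a ranked list of likely root causes.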

A number of papers have been published lately on Design For Diagnosability. On a test chip, engineering teams are designing specific circuits that are both representative and optimized for the highest-resolution diagnosis.

One approach here is a two-dimensional scan. “There’s chain diagnosis and logic diagnosis,” Knowles said. “Chain diagnosis is the gross reality check to make sure that the scan chains are working, so if you have a really low yield, people do chain diagnosis just to see if the scan chains work. If the scan chains don’t work, you can’t access the logic anyway. In order to increase the resolution of that, one approach is to scan in one way, and when you hit a defect, scan out the other way. This means when you run into a defect, you turn around and back out, you can tell exactly where it came from. We’ve heard this discussed in the context of test chips, but there’s a possibility that it could make it into production chips if people are so desperate because it does take some extra real estate. I’m very curious to see how far Design For Diagnosability makes it into production, or if we’re going to take that same route of using production vehicles as yield vehicles to the point where we’re designing them for detectability and diagnosability.”
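The turn-around idea can be sketched with a toy model. This is an illustrative simulation, not production DFT tooling: the chain is a list of cells, one of which (hypothetically) is stuck-at-0, and a probe bit is shifted progressively deeper, then backed out the same side, until it first comes back corrupted.

```python
def make_chain(length, defect_pos):
    """Model a scan chain as a list of booleans: True = stuck-at-0 cell."""
    return [i == defect_pos for i in range(length)]

def shift_through(cells, bit):
    """Shift one bit through a sequence of cells; a stuck-at-0 cell zeroes it."""
    for stuck in cells:
        bit = 0 if stuck else bit
    return bit

def locate_defect(chain):
    """Shift a 1 in to increasing depths, then turn around and back out.
    The first depth at which the bit returns corrupted pinpoints the defect."""
    for depth in range(1, len(chain) + 1):
        into = shift_through(chain[:depth], 1)                  # scan in
        back = shift_through(reversed(chain[:depth]), into)     # back out
        if back == 0:
            return depth - 1    # zero-based index of the stuck cell
    return None                 # chain is clean

print(locate_defect(make_chain(16, defect_pos=9)))   # → 9
```

The real-estate cost Knowles mentions is visible even in the model: every cell needs to be able to shift in both directions, which is exactly the extra hardware a bidirectional scan chain requires.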

Either way, test chips are becoming a given in advanced designs. “Whatever your motivation for doing a test chip, the priority is obviously to gain as much insight as you possibly can from the exercise,” said Rupert Baines, CEO of UltraSoC. “Here, it is vital to include structures on chip that can collect as much data as possible about how the system is functioning as a whole — and can turn that into information that an engineer can understand and act upon.”

This is why it is becoming critical to have the right kind of hardware monitors architected-in. Those monitors need to be sophisticated enough, in terms of run-time configurability and integrated smarts, to deliver solid information to be able to shine a light wherever the engineer’s investigation takes them, Baines said. “It’s equally important to have the right kind of data science tools on hand. We’re collecting more information from chips now than a human brain can possibly grasp.”

Companies like UltraSoC are looking at algorithms and software tools that can bridge that gap. In practice, that means methods like anomaly detection, heat mapping and trend analysis.
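A minimal sketch of the anomaly-detection piece, assuming the on-chip monitors produce a stream of numeric samples (the data here is synthetic): flag any sample more than a few standard deviations from the mean.

```python
from statistics import mean, stdev

def flag_anomalies(samples, threshold=3.0):
    """Return indices of samples more than `threshold` standard deviations
    from the mean -- a simple stand-in for on-chip monitor analytics."""
    mu, sigma = mean(samples), stdev(samples)
    return [i for i, x in enumerate(samples) if abs(x - mu) > threshold * sigma]

# e.g. a bus-latency counter that spikes once (synthetic monitor data)
latencies = [10] * 20 + [100]
print(flag_anomalies(latencies))   # → [20]
```

Production tools use far richer methods than a z-score, but the workflow is the same: reduce a monitor stream a human cannot scan by eye to a short list of events worth investigating.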

Along these lines, many companies are building database-style pools of information, which they can use to look at physical-level effects like process variation. “If a system-level view of the impact of those effects at the functional level is added, you’ve got a truly powerful tool,” Baines said.

The quality connection
As chips find their way into markets such as automotive and industrial, there has been a growing focus on quality and reliability. Test chips have a significant role in ensuring that reliability, and increasingly that means test chips for IP.

“PLLs, which are high-performance clock generators, and SerDes, which are high-speed data interfaces, are like the heart and the lungs of the chip, so basic functionality is quite critical,” said Randy Caplan, CEO of Silicon Creations. “If these circuits don’t work, the whole chip can fail, so the standards are quite high. We typically look at it in terms of statistics more than pass or fail as a quality metric. Above all of that are the statistics of mass production. Nobody wants to be the first to test an IP, and the mass production numbers are the most important metric to customers.”

To provide this data to customers, foundries give statistics of wafer volume and number of chips that have used a given IP, Caplan said. “That’s the strongest indicator of statistical reliability. To this point, we have a couple of PLLs that just crossed a million wafers in volume production, one of them on 125 different chips. When we show that to a potential customer, that’s usually the end of the discussion, but that’s just the first step. Someone has to be the first customer, so typically if the volume production is small or if it’s a new IP, then of course the next question is whether they can see the silicon test report.”

IP providers have very active test labs in order to provide the necessary test reports, which can range from 50 to 500 pages of data. Silicon Creations typically runs test chips with skew lots from the foundry, which is where the foundry shifts the process in each direction if they think it might vary in production, Caplan explained. “For a test report with skew lots, we have 12 test benches with thermal chambers and voltage controls. We try to vary the environment across every likely permutation for our customer’s production. Then, in the report, we essentially measure each of the key performance metrics that the customer is defining for that IP.”
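What “every likely permutation” looks like in practice can be sketched as a simple corner sweep. The specific skew names, voltages and temperatures below are assumed for illustration, not Silicon Creations’ actual test plan:

```python
from itertools import product

PROCESS_SKEWS = ["SS", "TT", "FF"]    # slow, typical, fast skew lots (assumed)
VOLTAGES = [0.72, 0.80, 0.88]         # nominal 0.8 V ±10% (assumed)
TEMPERATURES_C = [-40, 25, 125]       # extended temperature range (assumed)

def corner_matrix():
    """Every process/voltage/temperature permutation a bench would cycle
    through when characterizing an IP block for a test report."""
    return [
        {"process": p, "vdd": v, "temp_c": t}
        for p, v, t in product(PROCESS_SKEWS, VOLTAGES, TEMPERATURES_C)
    ]

print(len(corner_matrix()))   # → 27
```

Each of those 27 conditions then gets every key performance metric measured against the data sheet, which is how a test report grows to hundreds of pages.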

This test report is not a standard document, but it’s expected to have all of these permutations of process, voltage, temperature and performance operating range included. “That’s the next step—to have taped-out your own chips and measured it in your lab, which is critical for any type of modern-day IP company,” Caplan said. “In fact, most leading chipmakers do their own test chip programs, because to prove an IP on an IP vendor’s test chip isn’t enough, as it actually may still have issues in the customer’s application because there is a disconnect there. For instance, we make our power supplies perfectly clean. We give the IP every advantage on a test chip, and even if we have noise generators and other things to disturb it, it’s likely that those don’t behave in the same way as our customer’s chip. It’s well intentioned, but it’s simply not possible for us to know the exact environment of our customers. As a result, almost all of the major chip companies discount the IP vendor’s test reports. Most of them have internal groups that they call test chip groups or quality groups that take the IP and prove it internally.”

The smaller the manufacturing node, the more weight the reports and data hold.

“If you look at 7nm, the first several companies that have gone to mass production, and the first companies that are preparing for production at 5nm, they all have internal test chip groups. The critical factor there is the test chip results. Basically, they don’t want the design to change after the test chip. That is very costly. It has schedule delays. It’s not as costly as changing a mass production chip, but the expectation is not that we’re going to use this test chip to try things out and then iterate. The expectation is first-pass silicon perfection, so to speak, on the first test chip of theirs, so there are a lot of costs involved and schedule,” he said.

For IP at the most advanced nodes, customers typically ask for simulation and reliability reports. In this case, top-level simulation results will be provided that go line-by-line through the data sheets. “The key here is that the definition of quality is that the silicon behaves in a way predicted by a data sheet, or an agreed-upon document, so a data sheet failure is no different than a silicon failure,” Caplan said. “In fact, they’re identical, so there’s a lot of engineering negotiation involved between the customers’ architects and the IP architects to agree on a data sheet that includes and defines all of the important performance metrics. Without that we don’t have a definition of quality.”

Test chips will continue to play a vital role in helping IP and SoC teams lower the risk of their designs, and assuring optimal quality and performance. As new technologies and approaches are developed and refined, engineering teams will gain new resources to leverage current and established nodes for the highest return on investment possible.
