What’s Missing In Test

Different types of test all work, but catching every real and potential issue remains a challenge.


Experts at the Table: Semiconductor Engineering sat down to discuss how functional test content is brought up at first silicon, and the balance between ATE and system-level testing, with Klaus-Dieter Hilliges, V93000 platform extension manager at Advantest Europe; Robert Cavagnaro, fellow in the Design Engineering Group at Intel (responsible for manufacturing and test strategy of data center products); Nitza Basoco, technology and market strategist in Teradyne’s Semiconductor Test Group; and Robert Ruiz, senior director of product management at Synopsys. What follows are excerpts of that conversation. To view part 1 of this conversation, click here.


L-R: Hilliges, Cavagnaro, Basoco, Ruiz

SE: There are challenges with software-based functional testing, such as creating a worst-case electrical result. Where does the Portable Stimulus Standard (PSS 2.0) fit in, and is it getting traction?

Ruiz: We do see customers starting to use the Portable Stimulus Standard for software-based test content. The advantage is they initially are doing this pre-silicon. And then, if they need to adapt for whatever reason for actual silicon, it’s very easy to do with PSS. So yes, we’re seeing some cases of adoption, and definitely interest.

Hilliges: Pre-silicon is crucial. The good thing about pre-silicon is they often have the same constraints that we have in production test. They do not have a lot of simulation time. It’s targeted to create some coverage. Of course, Rob, you’re right. At the end, we need to also drive the electrical stress on top of that content.

SE: Let’s focus on first silicon. You have different platforms in which you can run your content. You have wafer test, packaged part test, system test and you have your designers in the labs. Where and how does functional test play a role during the first silicon process?

Cavagnaro: Everywhere. Part of the reason we got away from pin stimulus testing a long time ago is that it is not portable. It's super difficult to do in a sort environment. It doesn't carry up to system-level test very well. It was very narrowly focused on certain platforms. That's why we moved to structural-based functional tests. We even have system-based wrappers that allow us to do the exact same thing we're doing on ATE at the system level. Or we can run that same content in native system mode and see the differences once we get to a platform. And the correlation between each insertion is critical. Obviously, at sort you don't see a lot of the electrical things that you see in a packaged part. It's more isolated, especially in our situation, because we have FIVR (fully integrated voltage regulator) regulation, which does two completely different things between sort and class. There's so much you need to learn about the part using functional tests these days. If you don't have that back at sort, you're setting yourself up for a pretty bad experience in the back end from a yield and coverage perspective.

SE: So are you talking about first silicon or manufacturing or both?

Cavagnaro: Both. The market demands of today require that you have content enabled at power-on. The industry standard is that content is up and running within a week or two. And you need to know what your chip is doing quickly so that you can enable your downstream platform people and your downstream customers. Nobody tolerates taking months to get content up and running, correlated, and dialed in. Those days are long gone.

Hilliges: On one hand, it’s time-to-market. On the other hand, it’s coverage. You have to have coverage on the die to have valid chiplets. You can’t afford not to have functional test at probe.

SE: But at wafer test you’re limited in how much power you can apply, right?

Cavagnaro: You can get most everything done. It just means you run it slower. But you can still get a ton done at sort.

Basoco: The question is whether you can do your worst-case power corner. But at wafer sort, you should be able to accomplish 99% of your test list and your test conditions, so that once you go to the next stage and you’re integrating everything together, you’re okay.

Hilliges: The biggest problem at wafer probe is that your memory is not attached. You cannot run a workload that exercises the HBM. What can we still do in terms of workloads that are relevant at probe, despite the device not yet being a completed package?

Basoco: When we bring content up with the ATE engineer, and then designers on the bench, the concerns are a bit different. Some designers care about a particular block. ‘What is the performance? What is that block really doing?’ Versus the system engineers, who would be thinking, ‘What does my signal path look like? What does my performance look like over that path?’ And the ATE engineer is thinking, ‘I need to be able to feed key information and data back to everybody.’ The preparation for first silicon and the integration with the test development engineer becomes super critical.

Hilliges: You described the craziness of first silicon bring-up. All these teams are trying to bring up their pieces. For example, the poor guy wants to bring up his IP block, but unfortunately the DRAM IP is failing. And suddenly you have a nightmare on the bench. We now see some customers use the Portable Stimulus Standard as a way to divide and conquer. I can use a PSS setup to bring up one IP block, or even take over some of the low-level driver control in that PSS model, concentrating on this IP block. Then the next engineer does the same, and so on. I don't have to stall the entire path because one guy isn't ready. The other guys should still be able to proceed.

Cavagnaro: All you can do is build out in your concentric circles. You start with localized scan, and you make sure things are good. And then you have localized functional tests for certain things. Then you start to build out to more serious functional tests that integrate things. You have to approach it from the perspective of how you’re going to power-on the part, and then you can kind of keep going down the line. That’s why PSS might be a nice thing to evaluate, because there’s no easy button today. Once you cross into functional content, you need a lot of stuff to work.

SE: What happens with functional test content for production test? When a customer has tons of functional test content available, how do they choose which ones to use in production?

Cavagnaro: We do two things. One is we fault-grade our content. We fault-grade our scan tests, then assess functional tests on top of scan to see how we can top off coverage. In addition, sometimes functional test is just way better at hitting things, and silicon data will tell you that later. The other part is we do electrical grading of our content, because scan does a pretty good job of getting at most of the hard defects. It's really the transition defects that generally are the most challenging, so let silicon dictate that content. You run all your content over 1,000 units, and you can tell pretty quickly which content is most effective and which is least effective. The issue is that every time you change silicon, or your process, you need to redo that, because those changes can affect how your content is working.
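As a rough sketch of the silicon-driven grading Cavagnaro describes (the pattern names and fail data below are hypothetical, not Intel's actual flow), one way to rank content is by how many failing units each pattern catches across a characterization sample, and how many only it catches:

```python
# Hypothetical sketch of silicon-based content grading: run all patterns on a
# characterization sample (e.g., ~1,000 units), then rank each pattern by how
# many failing units it catches and how many it catches uniquely.
from collections import defaultdict

def rank_content(fail_log):
    """fail_log maps pattern name -> set of unit IDs that failed that pattern."""
    caught_by_others = defaultdict(set)
    for p in fail_log:
        for q in fail_log:
            if q != p:
                caught_by_others[p] |= fail_log[q]
    ranking = []
    for p, fails in fail_log.items():
        total = len(fails)                         # units this pattern catches
        unique = len(fails - caught_by_others[p])  # units only this pattern catches
        ranking.append((p, total, unique))
    # Most effective first: unique catches, then total catches
    return sorted(ranking, key=lambda r: (r[2], r[1]), reverse=True)

# Hypothetical fail data for three functional patterns
sample = {
    "mem_stress":  {"u017", "u233", "u402"},
    "core_fmax":   {"u017", "u551"},
    "fabric_loop": {"u233"},
}
for name, total, unique in rank_content(sample):
    print(f"{name}: catches {total} units, {unique} uniquely")
```

In this toy example, "fabric_loop" catches nothing the other patterns miss, which is the kind of signal that lets silicon data, rather than the fault model alone, decide what stays in the production flow.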

Hilliges: Pass/fail grading needs lots of devices, and that's the tradeoff you just described. So do you run voltage and frequency shmoos and low-margin tests in production, or grade electrically, to get closer to the right content?

Cavagnaro: All of the above. Unfortunately, that’s why people complain. Product development is expensive. At the end of the day, we’re the only thing that stands between all the dreams and hopes of everybody in the company and the reality of the customer experiences.

Ruiz: Are you saying that for the functional test, ultimately, it’s what you observe on silicon?

Cavagnaro: Yes, we're doing both. We are trying to grade them, but I don't trust the grading, so I measure everything. For first silicon, we'll use what the models say, but it's always wrong, because when you mix process with the actual interactions of design, it's just always wrong, especially now. We have all sorts of clock gating and power management, etc. The operating model of the SoC today is incredibly complicated with respect to how it works in the real world, and that doesn't translate really well to a test. There's a big gap, and we have to span that divide. The models don't predict, for instance, this process corner with this design, that marginality coupled with the package state, and all of a sudden you have a problem. No model in the world is going to tell you that pre-silicon. It's just too elaborate. You need silicon data to really answer some of your questions.

SE: That gets into how many patterns you can run at the different test insertions. What about system-level test?

Hilliges: There was a time when people said, 'I must have SLT because I cannot trust that my structural tests will cover functionality issues.' People decided to invest in SLT. While this may be necessary, it is not sufficient. So people now are saying, 'I need much more targeted bare metal content to prove that all my paths from any core in my 3D package to any memory in my 3D package are working at the proper speed that I guarantee at the end.' You cannot effectively program this in a system-level test. We need more bare metal content that is targeted and explores all the complex interactions that are possible between IP. In fact, that's the kind of partnership with the development engineers you have to learn: understanding the necessary stress elements on top of functionally exercising everything, and bringing that kind of graded content into production test.

Basoco: If you have one type of SoC device, you can have a lot of control. But as you aggregate lots of different chiplets, it becomes really difficult to get there. It's almost that thought process of, 'Let's just do some gut checks and make sure everything in the signal paths I care about is working correctly.' But knowing in what corner and in what condition I have to be working is becoming really difficult. I've heard SLT is the answer. But we have to do mission-mode type testing. These are different types of tests that you keep doing on ATE. But what are we doing holistically, from a test coverage or testability standpoint, for our devices as a whole?

Cavagnaro: Bare metal is required to get access to certain things. Mission mode is required to get after certain other things. It's become wildly expensive. It is known in the industry that silent data corruption is a huge problem. It is the tail of the tail of the tail of what you missed. It is exceptionally hard to catch, and exceptionally expensive. I'm embarrassed to even say how much time I spend on system-level test today. We spend hours on it. When you're building devices with 2,000 to 4,000 mm² of active area, with 99.99% coverage upstream, you're still going to have thousands of DPPM going out the door. With leading-edge processes, even with a low defect density, once it's multiplied by 3,000 mm² you just have an impossible surface to cover.
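To put rough numbers on that scaling, here is a back-of-the-envelope Poisson defect model (the area, defect density, and coverage values are assumptions for illustration, not figures from the discussion) showing how a few thousand square millimeters of active silicon turns small gaps in effective coverage into thousands of DPPM:

```python
import math

# Illustrative only: how defect escapes scale with die area and effective coverage.
# Area, defect density, and coverage values are assumptions, not quoted figures.
area_cm2 = 30.0      # ~3,000 mm^2 of active silicon across the package
d0 = 0.05            # assumed random defects per cm^2 at a leading-edge node
lam = area_cm2 * d0  # expected defects per device (Poisson mean)

for coverage in (0.9999, 0.999, 0.99):
    # A device ships defective if it has >= 1 defect and every defect escapes test.
    p_ship = math.exp(-lam * coverage)                       # passes all screens
    p_bad_ship = math.exp(-lam * coverage) - math.exp(-lam)  # defective and passes
    dppm = 1e6 * p_bad_ship / p_ship
    print(f"effective coverage {coverage:.2%}: ~{dppm:,.0f} DPPM shipped defective")
```

Under these assumptions, 99.99% effective coverage still lets roughly 150 DPPM through, and every factor-of-ten gap between modeled fault coverage and coverage of real defects multiplies the escapes by about ten, which is where the thousands-of-DPPM tail comes from.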

Read part one of the discussion:
Doing More At Functional Test
New approaches for cutting costs and improving reliability for increasingly complex chips.


