Why Data Format Slows Chip Manufacturing Progress

Adoption of a new format will take time, but it also will add consistency to data as volumes grow.


The Standard Test Data Format (STDF), the workhorse format used to pull test results from automated test equipment, is running out of steam after 35 years. It cannot keep up with the explosive increase in data generated by the growing number of sensors across semiconductor manufacturing processes.

First developed in 1985 by Teradyne, STDF is a binary format that is translated into ASCII or some other format. It has evolved over the years, but it also has splintered as various companies have developed their own custom flavors of STDF. That has made the standard far less standard-like.

Teradyne gave SEMI a license to manage STDF as a standard in 2010. SEMI defined a memory fail datalog standard (SEMI G91) under STDF, but it disbanded the STDF taskforce in 2019 to instead pursue a more efficient, flexible standard with IoT-like features that could enable adaptive testing for real-time visibility.

This is important for a couple of reasons. First, advanced chips and packages are being used in safety- and mission-critical applications, and utilizing data to prevent field failures is essential. Second, many of the designs are much more customized than in the past, when an SoC was fine-tuned and extensively tested and then produced in billion-unit volumes. Current volumes in most market segments are much smaller, and being able to adapt testing around various packaging options and layouts is vital.

This is where the Rich Interactive Test Database (RITdb) fits in. It is the standard intended to replace STDF. The SEMI group CAST (Collaborative Alliance for Semiconductor Test) is borrowing from the IoT world to develop RITdb. The new standard will use a SQL database and MQTT messaging, which means data can move back and forth to a server. Potentially, more of the team can see the data live during manufacturing and make changes to the chip's manufacturing on the fly, while metadata stays consistent thanks to an electronic ID that travels with the chip.
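The combination of a persistent electronic ID and MQTT-style publish/subscribe messaging can be sketched as follows. The topic scheme, field names, and chip-ID format here are invented for illustration and are not taken from the RITdb specification; a real system would hand the payload to an MQTT client library so servers and dashboards receive results live.

```python
import json

def make_result_message(chip_id: str, test_name: str,
                        value: float, unit: str) -> tuple[str, str]:
    """Build an illustrative (topic, payload) pair for one test result.

    The topic scheme and field names are assumptions for this sketch,
    not part of any standard. A real implementation would publish the
    payload with an MQTT client so subscribers see results in real time.
    """
    topic = f"rit/results/{chip_id}/{test_name}"
    payload = json.dumps({
        "chip_id": chip_id,   # electronic ID that travels with the die
        "test": test_name,
        "value": value,
        "unit": unit,
    })
    return topic, payload

topic, payload = make_result_message(
    "LOT42-W03-X011-Y007", "vdd_leakage", 1.3e-6, "A")
```

Because every message carries the same chip ID, any subscriber that sees the topic stream can associate results with the right die without parsing file names or guessing at free-form fields.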

None of this is going to happen overnight, of course. “This industry is pretty slow-moving. Things just don’t change that often,” said Keith Arnold, senior director of solutions at PDF Solutions. “I suspect STDF will probably be around for a long time to come just because it’s just ubiquitous. It’s everywhere.”

The human factor
As with anything, people can gum up the works. STDF is clunky and limited, and its structure leaves a lot open to interpretation. “One of the biggest problems we see with STDF is that almost every customer uses the fields differently,” Arnold said. “Even though it is a standard, nothing restricts users from putting in pretty much whatever they want. It can be as simple as someone just misspelling something because the operator has to hand enter it.”

Much of that is due to how the tester interfaces with the manufacturing execution system (MES), the computerized system that documents each stage of a product’s manufacture. Getting those associations correct is important, and that requires collecting the data and cleaning it. The data cleaning process is iterative because of all the personal recording styles and testing tricks.

“You just have to look at the data and try to, just from experience, know, ‘Okay, I think this is what this is — what they mean, even though they’ve done the wrong field,’” Arnold said. “I think this is what they mean. It takes someone who’s really done it before. We go through a fairly lengthy integration process — that’s where we go through and do that mapping. And it’s very iterative. And then it changes in time, which creates even more interesting problems.”

Here is what an iterative process looks like, according to Arnold:

“With engineers or the manufacturer people, we may look at it [the data], we say, ‘I think this is what you meant.’ And then we do something, and they go, ‘No, no, that’s not right. This is supposed to go there,’ or, ‘Oh, actually you know what? This field, what we need to do is we need to split that field into two different fields.’ ‘Really? Okay, fine.’ The format is a bit restrictive. And so, if you’ve got extra information you want to put in there, either you start concatenating a lot of things together, or some people do things — they’ll stick it in the file name. Just add another underscore and then some other parameter that nobody else knows what that means. But, so people kind of do whatever they have to do to get all this information in there. And then a lot of the fields, they just don’t fill in it all. Like for instance, ‘I don’t know what facility this came from. Did this come from Taiwan, or did this come from Malaysia? I have no idea, because that information is maybe not included. All that kind of stuff is really important when you’re trying to collect a large data set, and you’re trying to do what we call alignment. And the data alignment and all the proper association, it’s just not there.”
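The manual mapping Arnold describes often ends up codified in small cleanup scripts. The sketch below shows two of the workarounds from the quote: a hand-built alias table for misspelled facility names, and recovery of fields that were concatenated into the file name with underscores. The alias table, the filename layout, and all field names are invented for this example; every customer's conventions differ, which is exactly the problem.

```python
# Illustrative cleanup of the STDF metadata problems described above.
# The alias table and the assumed filename layout are invented examples.
FACILITY_ALIASES = {
    "taiwan": "TW_FAB1",
    "taiwn": "TW_FAB1",      # a misspelling mapped by hand, as Arnold notes
    "malaysia": "MY_OSAT1",
}

def normalize_facility(raw: str) -> str:
    """Map hand-entered facility strings onto canonical site codes."""
    return FACILITY_ALIASES.get(raw.strip().lower(), "UNKNOWN")

def split_filename_metadata(filename: str) -> dict:
    """Recover fields that were packed into the file name with
    underscores. Assumed layout: product_lot_facility_temp.stdf
    """
    stem = filename.rsplit(".", 1)[0]
    product, lot, facility, temp = stem.split("_")
    return {"product": product, "lot": lot,
            "facility": normalize_facility(facility), "temp": temp}

meta = split_filename_metadata("soc9_LOT42_Taiwn_25C.stdf")
```

The brittleness is the point: add one more underscore-delimited parameter to the file name and the split breaks, which is why such mappings have to be revisited every time conventions drift.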

The catchall
Some operators favor dumping data into a generic record, called the Generic Data Record (GDR), as one long string that someone farther down the line has to decode. “Engineers start putting in text records and generic records, and they start fudging with the meaning in order to stick it in there,” said Mark Roos, co-chair of SEMI’s RITdb effort and CEO of Roos Instruments. “Yes, people are managing to use STDF to do these more complex operations. But there are limitations, and it’s really clunky. Everybody has a personal favorite way of doing it. This is a problem, because how do you know what things mean when you have to code it uniquely for each product?”

There is a lot of logging information that needs to be kept, and the only place to do that is the GDR.

“The Generic Data Record, as the name implies — it’s just a string,” said Arnold. “Stick anything you want in there. It’s pretty much asynchronous. It’s hard to figure out where that occurred in the string. It is just a dump of lots of different stuff, and oftentimes when we’re looking at the data, the engineer will say, ‘Oh yeah, we stuck all that stuff in the GDR record.’ The GDR has more data volume than the whole rest of the file, practically, and it’s got zero structure. And so you’ve just stepped off into the Wild West, and that’s what you’re depending on to do some analysis.”
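What analyzing such a blob looks like in practice is a best-effort scrape. The example below pulls key=value fragments out of a GDR-style string; the pattern, the sample blob, and the delimiters are invented, and anything that doesn't happen to match the guessed pattern is silently lost — the "Wild West" Arnold describes.

```python
import re

def scrape_gdr(blob: str) -> dict:
    """Best-effort scrape of key=value fragments from a GDR-style string.

    The GDR carries no schema, so this pattern is a guess that has to be
    re-tuned per product. Free text that doesn't match is simply dropped.
    """
    return dict(re.findall(r"(\w+)=([^;|\s]+)", blob))

fields = scrape_gdr("temp=25C;vdd=0.9|retest_bin=7 free text nobody can decode")
```

Note that the trailing free text vanishes without any error — there is no way to even detect that information was lost.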

Still, it can be hard to avoid using the GDR. “Even from the design perspective, for instance, when we’re capturing data from the design, as we move into the realm of high speed interfaces, protocol-based testing, we find that you want to keep a duplication, for instance of the observed data coming out,” said Brian Archer, senior staff solutions architect at Synopsys. “But there’s no real way to capture the same data multiple ways in the same file without using the user defined registers or user defined section of the STDF. So that becomes a challenge, as well.”

No place for an electronic ID
The big problem is associating information. Making predictions at the die level requires some type of electronic chip ID and a consistent way of identifying each chip. Inconsistent metadata creates a mess.

“If you are going to make it a lot easier to do any kind of machine learning and mass-volume data analysis, we’ve got to come up with a way to ensure that we’ve got consistent information being populated in the fields. For instance, a lot of this information the tester already knows. The tester should just populate it. Honestly, there’s a lot of this metadata that almost shouldn’t be the domain of the engineer to decide. There needs to be a tighter integration between, say, STDF and the company that’s making the tester, so that the tester company ensures the data is being properly applied and recorded and is consistent. And part of that would require some kind of an interface to their MES system. You can’t always rely on an operator interface, or on the operator providing all that, because a lot of it is optional.”

“And you really would like a container that doesn’t care what you’re testing. It only cares that you can provide the structure and convey it in the future,” said Roos.

Controlling the quality of the metadata needs to happen during test, on the tester, with an interface to the MES system. The idea is to automate this and remove the human from the loop in order to have some consistency.
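Taking the operator out of the loop might look like the sketch below: the tester pulls context from the MES and composes the die ID itself, so no one hand-types a facility name or lot number. The field names and ID format are illustrative assumptions, not part of any standard.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MesContext:
    """Context the tester would pull from the MES rather than rely on
    operator typing. Field names here are illustrative, not standard."""
    facility: str
    fab_lot: str
    wafer: int

def die_id(ctx: MesContext, x: int, y: int) -> str:
    """Compose a consistent electronic die ID from MES context plus die
    coordinates. Zero-padding keeps IDs sortable and unambiguous."""
    return f"{ctx.facility}-{ctx.fab_lot}-W{ctx.wafer:02d}-X{x:03d}-Y{y:03d}"

ctx = MesContext(facility="TW_FAB1", fab_lot="LOT42", wafer=3)
```

Because the same function runs on every tester, every downstream consumer sees identically formatted IDs — the consistency that hand entry can never guarantee.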

A yield management system enforces some of this good behavior. “When doing volume analysis … users quickly realize the need for having standardized metadata, or it takes time to aggregate data by labelling the data on-line afterwards (which you can also do),” said Marie Ryan, marketing executive at yieldHUB, in a recent blog. “In this case, it is important that steps should be taken to standardize datalog metadata as soon as possible.” Without good data, you cannot produce accurate yield information.

Storing data
Having enough storage for the data is important. YieldHUB offers advice on how to structure STDF data to improve database processing. The STDF data goes into a yield management system and is viewed in real time during test. YieldHUB stresses speed, and recommends either keeping data storage down where possible or provisioning the right amount of storage up front. Using tables with row-level locking is one way to speed up the process.
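One concrete way to keep a results database fast under test-time data rates is to batch inserts into transactions and index the columns queried in real time. The sketch below uses Python's standard-library SQLite for self-containedness; note that row-level locking, as mentioned above, is a property of engines such as MySQL's InnoDB, while SQLite locks the whole database, so this only illustrates batching and indexing. The table layout is invented.

```python
import sqlite3

# Sketch of bulk-loading parametric test results. Schema is illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE result (
    die_id TEXT, test_name TEXT, value REAL)""")

rows = [(f"LOT42-W03-D{i:04d}", "vdd_leakage", 1.0e-6 + i * 1e-9)
        for i in range(1000)]

with conn:  # one transaction for 1,000 rows instead of 1,000 commits
    conn.executemany("INSERT INTO result VALUES (?, ?, ?)", rows)

# Index the column the real-time views filter on.
conn.execute("CREATE INDEX idx_result_die ON result (die_id)")
count = conn.execute("SELECT COUNT(*) FROM result").fetchone()[0]
```

The single-transaction batch is typically orders of magnitude faster than per-row commits, which matters when data is being viewed live during test.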

“AMD has a slightly different format they use. They needed a more hierarchical type of structure, because the structure of STDF is a little dated. It fits most people’s needs, but AMD has some special needs, where the data they’re collecting kind of branches,” said Arnold. “So it creates a fairly complex structure. But if you have the ability to store the structure, and you understand what it is, it’s not that difficult to do the analysis. But STDF really doesn’t allow that. So they have several choices. They can either just stick it in an external file, or they can put it in other fields, like the GDRs (generic data records). Or they can come up with their own format, which is what they did.”

He noted that RITdb also wanted an unconstrained structure. “There are certain parts of the structure that we know for sure we have to have, because it’s just part of our business,” said Arnold. “There are fab lots, fab lots have wafers, wafers have die, they go through different processes — all that kind of stuff. That’s all fine and good, but when it comes to the raw test data itself, you need the flexibility to be able to define some relatively complex structures. That’s where RITdb really comes in handy. It has places for all the standard information, so it provides structure where structure is really needed, but it also gives flexibility where more flexibility is needed, especially in the actual collecting of the test data itself.”
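The split Arnold describes — a fixed hierarchy for lots, wafers, and die, with free-form structure for the test payload itself — can be sketched as a toy relational schema. The table and column names below are invented for illustration and are not the RITdb schema; the flexible part is modeled as a JSON text column so each test can carry its own structure.

```python
import sqlite3

# Toy sketch of the fixed hierarchy (fab lots -> wafers -> die) with a
# free-form JSON column for per-test data. Names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fab_lot (lot_id TEXT PRIMARY KEY);
CREATE TABLE wafer   (wafer_id TEXT PRIMARY KEY,
                      lot_id   TEXT REFERENCES fab_lot(lot_id));
CREATE TABLE die     (die_id   TEXT PRIMARY KEY,
                      wafer_id TEXT REFERENCES wafer(wafer_id));
CREATE TABLE test_result (die_id  TEXT REFERENCES die(die_id),
                          payload TEXT);  -- JSON: structure varies per test
""")
conn.execute("INSERT INTO fab_lot VALUES ('LOT42')")
conn.execute("INSERT INTO wafer VALUES ('LOT42-W03', 'LOT42')")
conn.execute("INSERT INTO die VALUES ('LOT42-W03-X011-Y007', 'LOT42-W03')")
conn.execute("INSERT INTO test_result VALUES ('LOT42-W03-X011-Y007',"
             " '{\"test\": \"vdd_leakage\", \"value\": 1.3e-06}')")
n_die = conn.execute("SELECT COUNT(*) FROM die").fetchone()[0]
```

The fixed tables give every consumer the same joins for lot-to-die traversal, while the payload column can branch however a given product's test flow requires — the combination of structure and flexibility the quote calls for.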

STDF is not going away
Despite all of the problems, STDF will be around for a while. In fact, RITdb has STDF features built into its block diagrams. But as chips become more complicated, and as they are disaggregated into advanced packages, the kind of flexibility and uniformity offered by RITdb will become much more attractive.

After 35 years, the industry apparently is ready for something new.

—Anne Meixner contributed to this report.
