Enablers And Barriers For Connecting Diverse Data

Integrating multiple types of data is possible in some cases, but it’s still not easy.


More data is being collected at every step of the manufacturing process, raising the possibility of combining data in new ways to solve engineering problems. But this is far from simple, and combining results is not always possible.

The semiconductor industry’s thirst for data has created oceans of it from the manufacturing process. In addition, semiconductor designs large and small now have on-die circuitry, which provides additional electrical test information flowing into these data oceans. Engineering teams need to manage this deluge of data, and they need to make it consumable by diverse engineering teams.

More data can be connected together than in the past, but not all of it and not 100% of the time. Even where it is possible, it often requires an up-front, concerted engineering effort. For any one product, connecting all data sources into one system or model is not necessarily practical or essential for everyday engineering.

The complexity of CMOS technologies has increased both the amount and the types of data collected during the manufacturing process. For the most part, engineers have used this data in a siloed fashion, particularly during wafer fabrication, the assembly process, and the associated test processes. To face the challenges of advanced CMOS process nodes (22nm and below), engineers increasingly have turned to merging different types of data to meet yield and quality goals. Examples include:

  • Tool history data and unit-level test results to determine the root cause of yield excursions;
  • Product physical layout data and volume test to improve manufacturability;
  • On-die circuit monitors and test data to tune test pass/fail limits, and
  • 100% inspection and test data to optimize test cost.

“The desire to combine many multi-faceted data sources has been prevalent for a long time. There is a direct correlation between having many different data sources available to analyze and achieving higher quality of results, since there is more data you can correlate together to help isolate issues,” said Guy Cortez, product marketing manager in the digital design group at Synopsys. “However, what has held customers back is their inability to collect all of the various data — and if they had the data, understand how to parse, align, normalize, merge, and stack the data in a reasonable amount of time.”
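The parse/align/normalize/merge steps Cortez describes can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation; all record layouts and field names here are hypothetical.

```python
# A minimal sketch of aligning and merging per-die test data from two
# sources that use different field names. All schemas are hypothetical.

def normalize(record, field_map):
    """Rename source-specific fields to a common schema."""
    return {field_map.get(k, k): v for k, v in record.items()}

def merge_by_die(sources):
    """Stack records from several sources, keyed by (lot, wafer, x, y)."""
    merged = {}
    for records in sources:
        for r in records:
            key = (r["lot"], r["wafer"], r["x"], r["y"])
            merged.setdefault(key, {}).update(r)
    return merged

# Two hypothetical sources describing the same die with different names.
wafer_sort = [{"lot": "L1", "wafer": 3, "x": 10, "y": 4, "vdd_min": 0.71}]
final_test = [{"lot_id": "L1", "wfr": 3, "x": 10, "y": 4, "ft_bin": 1}]

combined = merge_by_die([
    wafer_sort,
    [normalize(r, {"lot_id": "lot", "wfr": "wafer"}) for r in final_test],
])
# One die record now carries both wafer-sort and final-test results.
print(combined[("L1", 3, 10, 4)])
```

In production this alignment runs over millions of die records and many more sources, which is where the "reasonable amount of time" constraint bites.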

On-die monitors, which are becoming more common in safety- and mission-critical applications, where devices are expected to perform consistently for longer lifetimes, have only added to the amount of data being generated.

“As we implement silicon lifecycle solutions, the issue of data management is very much a live one,” said Aileen Ryan, senior director of portfolio strategy, Tessent silicon lifecycle solutions at Siemens EDA. “There are now lots of data sources. The manufacturing process (including manufacturing test) is one. But functional, structural, and parametric monitors embedded in the chip can also gather information during the initial bring-up and debug phase, and right through the chip’s life in the field.”

This trend of combining diverse data sources also can be observed in mature technologies, which commonly have lower ASPs. In these technologies, combining diverse data sources enables engineers to respond more rapidly to yield excursions and to squeeze an additional 2% to 3% yield at final test. The same trend can be observed in both advanced and mature packaging technologies.

“Advanced manufacturing facilities have been combining data for the purposes of operational improvement for more than 25 years. This is not a new activity for successful manufacturing facilities. In more recent years this activity has gone mainstream due to changes in the performance and advancements in architectural capabilities for large scale databases,” said Mike McIntyre, director of software product management at Onto Innovation. “As these databases can now exceed 100 or 200 terabytes, they are still effective at retrieving the requested data in a timely manner. Tools that help process and cleanse data prior to its entry into the database, as well as interfaces a general user can navigate to access this data, have helped make these large data stores ever more valuable.”

Information and smart manufacturing technology enablers
Engineers always value more data, but it needs to have context to enable decisions both in real-time and months or even years after its creation. Data storage and database architectures provide the foundation upon which an engineering query can be made. Computational options follow suit, and they need to consider the types of decisions being made.

Those decisions range from simple to complex, with the latter heightening the demands on computational architectures. Standard analyses like statistical process control charts and yield dashboards assist in general monitoring of factory operations and of product health, respectively. Taking it up a notch is the ability to combine massive volumes of test data in wafer maps, and then permit drill-downs to discern anomalies and tool history signatures.
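As a concrete example of the "standard analyses" end of that spectrum, a toy statistical process control check is sketched below. It uses the conventional mean ± 3σ limits computed from a baseline window; the data values are illustrative only.

```python
# A toy SPC check: flag measurements outside control limits derived
# from a baseline window. Limits are the conventional mean +/- 3 sigma.
from statistics import mean, stdev

def spc_violations(samples, baseline):
    """Return indices of samples outside the baseline's control limits."""
    center = mean(baseline)
    sigma = stdev(baseline)
    ucl, lcl = center + 3 * sigma, center - 3 * sigma
    return [i for i, x in enumerate(samples) if x > ucl or x < lcl]

baseline = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.1, 9.9]
new_lots = [10.0, 10.1, 12.5, 9.9]   # third lot is an excursion
print(spc_violations(new_lots, baseline))  # -> [2]
```

The drill-down step the article describes is essentially this check repeated per parameter and per tool, with the flagged indices linked back to wafer maps and tool history.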

For more than two decades, engineering teams have relied upon machine learning for wafer and package inspection. But it’s the prospect of connecting more diverse sets of data — inspection, wafer test, unit test, tool history — that prompts engineers and data analytic platform suppliers to delve into the data oceans with deep learning computation.

To meet high quality and reliability expectations, 100% inspection is being extended to more wafer layers and assembly processing steps. This results in a significant increase in data to manage and consume. Traditionally, engineers have used this data as feedback for specific equipment and processing steps. Now, for some applications, they are feeding inspection data forward to augment pass/fail decisions at test.

Another contributor to the data deluge has been the adoption of IoT in factories and subsequent streaming of IoT sensor data. This enables real-time processing of data and, hence, real-time reaction to perceived anomalies.

“In wafer front engineering processes (e.g., etch, deposition, atomic-layer deposition) there’s more capability in embedded devices, which has allowed compute resources to be applied directly to data processing, streaming, and integration. It can happen locally, or in close proximity to the instrumentation that is generating data, or be aggregated together a little further away,” said Eli Roth, smart manufacturing product manager at Teradyne. “Improved bandwidth and edge compute capability support streaming analytics. Time-series analytics, with integration of disparate entities in processes or flows, are enabling connections and inferences to be identified. Prescriptive analysis converts all this new interconnected data into actionable intelligence.”
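The streaming time-series analytics Roth mentions can be sketched as a rolling-window anomaly detector running at the edge, near the instrumentation. This is an illustrative toy, assuming made-up sensor readings and an arbitrary z-score threshold, not any product's algorithm.

```python
# A sketch of streaming anomaly detection on IoT sensor data: compare
# each new reading against a rolling window of recent readings.
from collections import deque
from statistics import mean, stdev

def stream_anomalies(readings, window_size=5, z_limit=3.0):
    """Yield (index, value) for readings far outside the recent window."""
    window = deque(maxlen=window_size)
    for i, x in enumerate(readings):
        if len(window) == window.maxlen:
            m, s = mean(window), stdev(window)
            if s > 0 and abs(x - m) / s > z_limit:
                yield i, x
        window.append(x)

# Hypothetical chamber-temperature stream with one spike.
chamber_temp = [350.1, 350.0, 350.2, 349.9, 350.1, 350.0, 358.4, 350.1]
print(list(stream_anomalies(chamber_temp)))  # flags the spike at index 6
```

Because the window and state are small, this kind of check can run on the embedded compute next to the tool, with only flagged events streamed upstream.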

Key to connecting diverse data is integration in the factory setting.

“The big changes that have enabled our customers to embrace end-to-end product data are factory integration with greater connectivity of tools/equipment, more sophisticated equipment and testers that generate more data, and massively scalable cloud-based analytics solutions that offer lower data storage costs per terabyte with higher performance storage and retrieval of data for faster analytics,” said Greg Prewitt, director of Exensio solutions at PDF Solutions.

In back-end assembly and test services, engineering teams also reap the benefits of real-time IoT sensor data to adjust downstream equipment. Reviewing data from a test floor of hundreds of test cells with data visualizations eases the burden of identifying inefficiencies. All of this necessitates more investment in IT infrastructure for storage, networking, and computation.

“Containers, Kubernetes, artificial intelligence algorithms, and connected devices and equipment with sensors have been recent key technology advances,” noted George Harris, vice president of global test services at Amkor Technology. “The advent of data transmission and higher compute architectures (at the source, edge, data center, and cloud) along with specialized processors (CPUs vs. DSPs vs. GPUs) and memory/storage have been critical to move and process the exponentially growing magnitude of data.”

Cloud storage, cloud computing, and machine learning are necessary to cope with the oceans of data that come from diverse data sources.

“Most important are cloud-hosted data analytics platforms that enable deep data from multiple sources, sites, and geographies to be uploaded into a single coherent and scalable data lake. Then, machine learning is the best way to process and produce meaningful and actionable insights out of these large data sets,” said Nir Sever, senior director of product marketing at proteanTecs. “The optimal approach is to know what to look for when generating the data in the first place. If the data is extracted with analytics in mind, that provides a much more coherent picture of the issues at hand.”

Others agree on the need to extract data with analytics in mind. “Specific data type tagging becomes critical as data size grows,” said Melvin Lee Wei Heng, applications manager at Onto Innovation. “How accurately and how quickly the data tags can reference the data becomes important as data sizes grow. Close-to-real-time processing of data as it is collected could become the norm in the future.”

Challenges to combining data
Recent advances in computation architectures, database structures, and data storage technologies enable the combination of different data types. But operational barriers still stand in the way. Two common issues are the lack of a standardized data format in assembly equipment and the lack of compliance with the existing STDF format for ATE-generated data. In addition, the fragmentation of the semiconductor device supply chain complicates leveraging data from different sources.

Those issues can be overcome, but not without tedious engineering effort. While some standards do exist for porting data from fab, assembly, and test equipment, there are no standards for naming conventions (a.k.a. data governance).

“A key challenge is data naming capability. As an example, some call it WAT (wafer acceptance test), some call it WET (wafer electrical test), and some call it SST (scribe structure test),” said Onto’s Lee Wei Heng. “Also, the data format structure is a major challenge. As back-end packaging becomes more critical in the supply chain, there is still no standard format data structure. And what we are observing in the market at this time is that many companies struggle with different format outputs from the vast amount of equipment available in the back-end packaging world. Parsing and formatting the data before loading it into a database structure becomes a common, necessary task.”

An often-unsaid challenge is that no single engineer or engineering team understands all the data in the vast ocean stored locally or in the cloud. This is particularly true in the foundry/fabless model. Consider that a design house may have at least two foundries and two OSATs manufacturing a single IC device. Aligning data between these factories so the design house can view the overall picture is an operational barrier, due to data governance issues and data security concerns.

“One of the key challenges is the disaggregation of the industry and the move to contract manufacturing flows. In these cases different parties own the tools, which are the sources of data, as opposed to everything being under ‘one roof’ in the IDM models that were dominant years ago,” noted Ken Butler, strategic business creation manager at Advantest America. “These new boundaries make it more challenging to integrate data sources from different players and still maintain everybody’s information security and intellectual property.”

While the fabless/foundry ecosystem continues to closely guard process and design IP, this is beginning to change.

“IDMs have an inherent advantage from fewer hurdles to sharing data across design, manufacturing and test domains,” said Jay Rathert, senior director of strategic collaborations at KLA. “But there is a huge incentive for foundries to find a path forward to sharing data at an appropriate level of granularity that preserves their process IP, but which allows test engineers to better select the appropriate test regimen for each incoming device to reduce the incidence of low-reliability die escaping into the field.”

Engineers benefit from connecting diverse data sources across the semiconductor manufacturing supply chain. Combining data in new ways supports their missions of increasing quality and yield, and at the same time reduces manufacturing costs.

Analytics platforms to connect multiple data sources exist today. Yet managing the oceans of semiconductor device manufacturing data remains a non-trivial task. It’s not so much that engineers lack the technologies to store and organize the data. It’s that connecting the data requires domain expertise to determine which data sources to connect to each other. On top of that, data governance issues and data security concerns impede engineers from effectively and efficiently connecting diverse data types. So the integration of diverse data types is not always smooth, but engineering teams persevere in striving for those connections.

“Data leads to knowledge, and knowledge leads to one’s ability to effectively solve problems,” said Onto’s McIntyre. “Fundamentally this is the motivation that drives our customers to combine what initially was highly disparate data into an organized data repository. Without this combination of data, engineers in factories would be hindered in trying to solve their everyday problems.”

Related Stories:

Too Much Fab And Test Data, Low Utilization
For now, growth of data collected has outstripped engineers’ ability to analyze it all.

Data Issues Mount In Chip Manufacturing
Master data practices enable product engineers and factory IT engineers to deal with variety of data types and quality.

Cloud Vs. On-Premise Analytics
Not all data analytics will move to the cloud, but the very thought of it represents a radical change.

Making Test Transparent With Better Data
How the new test data standard can make the test floor more accessible in real-time.

Big Shifts In Big Data
Why the growth of cloud and edge computing and the processing of more data will have a profound effect on semiconductor design and manufacturing.
