Demand for IC Resilience Drives Methodology Changes

New ways of connecting design, verification, test, and in-field data are needed for longer lifetimes and more critical applications.


Applications that demand safety, security, and resilience are driving new ways of thinking about design, verification, and the long-term reliability of chips on a mass scale.

The need is growing for chips that can process more data faster, over longer periods of time, and often within a shrinking power budget. That, in turn, is forcing changes across the architecture, design, verification, and test phases, each of them bolstered by in-field data analysis that is looped back to add incremental reliability improvements at every step of the flow.

“For design engineers, one of the critical changes is engaging with people who will benefit downstream from the augmentations that are done at the design stage,” said Aileen Ryan, senior director of portfolio strategy at Siemens EDA. “For the DFT engineers, there’s already quite a strong link with the test community. By adding embedded analytics, there’s also a link between embedding IP up front in the design, and using that in the bring-up phase, where you’re trying to get your chip out to market, drive down costs, and maximize revenue by getting to market as fast as you can. One of the critical things for design engineers is to look beyond that stage, meaning once that chip is shipped, what additional capabilities have you enabled that chip to have? Also, who’s going to benefit from that? And how are they going to benefit from it? One of the unique changes in the mindset, and the relationships that need to be built throughout that value chain, is that link between the people who are designing the chip upfront, and the people who benefit from the design augmentation, which can last during the entire lifetime of the chip.”

Even before the terms were coined, the original intent of silicon lifecycle management/solutions was to bring a sense of what’s happening inside a chip as it goes through its lifecycle, along with how it behaves in the field, said Uzi Baruch, chief strategy officer at proteanTecs. “When you want to validate your design, and correlate between the first silicon that arrives with what you had in mind when you first designed all the simulation points and so on, in-chip sensors play a key role just before you go into first phases of the new product introduction process and high-volume manufacturing, to get a sense of what’s actual versus planned.”

This helps to break down the siloed thinking that chip designers have long operated with. Although strides have been made to bring hardware and software teams together, a substantial change in mindset is both necessary and underway. And while product lifecycle management initiatives have been around for years in some industries, they are relatively new for semiconductors.

“Silicon is actually late to the game,” said Frank Schirrmeister, senior group director, solutions marketing at Cadence. “What’s propelling this now is, if you look into product development, a lot of what we do today is pre-silicon, after which comes lab testing, then the actual lifecycle of a product. But what do you do throughout a lifecycle? It’s driven by things like long-lifecycle markets, and needing to be prepared that this device will be available for 20 years. One problem that you have to deal with is obsolescence. You have to prepare yourself for a very long-lifecycle market. This is happening in areas like automotive for some components, and it has to be taken into account at the design stage, which means things like redundancy, and flexibility for enhancements.”

To some extent, this is an extension of design for test (DFT), which now spans from the architectural stage all the way through manufacturing. DFT has been growing steadily more complicated for the past several years as designs become increasingly complex and heterogeneous. The next step is to push it into the field, where it can be enhanced with a variety of sensors to monitor chip and system behavior and to identify irregularities before they turn into serious problems.

“It was clear there was a trend toward wanting to do more and more testing, not just in manufacturing, not just before you ship the part, but in the field with the idea of power-on test, and so on,” said Steve Pateras, senior director of marketing for hardware analytics and test at Synopsys. “That grew into a concept of continuous testing, which evolved into the understanding of the need to monitor devices continuously, and optimize what they’re doing. This is because in the lower process nodes there are a lot of critical applications like automotive ADAS, as well as IoT applications that are critical, along with data centers, where the growth has exploded and the costs are unbelievable. You need reliability, you need performance, and given the complexity of these systems, something more was needed beyond just testing a part and shipping it. This has grown into the concept of ‘silicon lifecycle management.’”

Fig. 1: Silicon lifecycle components. Source: Synopsys


The idea here is to be able to manage parts after they are shipped, which is essential due to circuit aging as well as in-field software updates. Historically, the focus of most EDA companies has been how to create and verify the best designs, but those efforts essentially stop once the part is shipped.

“We need to do more than that,” Pateras said. “These parts just cannot be left alone anymore. You can’t just put your head in the sand and hope these are going to continue functioning properly throughout their life, so we need to do something to provide that kind of support maintenance of these parts. The natural thing from a test perspective was to extend test — extend the ability to monitor and analyze parts in the field. That required two things. It required the ability to extract data from the chip, along with some kind of analysis of that data, which are the two cornerstones of SLM. But once we start thinking about that, we realized this data can be used not just with parts in the field, but it can actually help us understand all aspects of the device through its lifecycle, from design optimization through yield ramp, through test optimization, through bring-up, time to market, and ultimately in-field. It really expanded into this whole lifecycle management concept. This touches many aspects of semiconductors in general. How to design, how to manufacture, how to test, how to debug. It’s all about data analytics. It’s all about gathering data, then analyzing it to understand the issues and trends. Ultimately you want to be able to react to that analysis so it’s all about optimization, and how to send information back to change things in a full loop.”

Fig. 2: Silicon lifecycle solutions encompass the complete cycle of design, realization, and utilization, addressing debug, test, yield management, safety and security, and in-field optimization. Source: Siemens EDA

Pieces of the puzzle
In-chip monitoring, and analysis of the data generated by those sensors, is just one piece of the required technology. That data also needs to be stored throughout the lifetime of the product.

“You need to have the ability to store that data, massive amounts of data, across all the product lines,” said Randy Fish, director of marketing for silicon lifecycle management at Synopsys. “You may be shipping millions or tens of millions of components, and there must be an ability to store the historical test data or parametric data, as well as the data being gathered in the field, and be able to then, throughout the lifetime — which could be 10 or 15 years in an automotive application — continue to run analytics based on new information or augmenting that data. There are technologies available today without which we wouldn’t have been able to do this, namely big data analytics and machine learning. These are giant tools in the toolbox, opening up huge opportunities. Gathering data could be done before, but being able to do something with this massive, sometimes unstructured data is a fairly recent development.”
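
As a rough illustration of the kind of analysis Fish describes, the minimal sketch below (in Python, with entirely hypothetical device IDs, metrics, and guard bands) compares stored final-test parametrics against newer in-field readouts and flags devices whose behavior has drifted beyond a set limit:

```python
# Minimal sketch: flag devices whose in-field readouts have drifted too far from
# their own final-test baseline. Device IDs, metrics, units, and the guard band
# are all hypothetical placeholders.

# Historical final-test parametrics, keyed by device ID.
final_test = {
    "dev-001": {"ring_osc_mhz": 412.0, "idd_leak_ua": 18.2},
    "dev-002": {"ring_osc_mhz": 405.5, "idd_leak_ua": 19.1},
    "dev-003": {"ring_osc_mhz": 409.8, "idd_leak_ua": 17.6},
}

# Latest readouts reported by on-chip monitors in the field.
in_field = {
    "dev-001": {"ring_osc_mhz": 401.3, "idd_leak_ua": 18.9},
    "dev-002": {"ring_osc_mhz": 379.0, "idd_leak_ua": 33.5},  # suspicious drift
    "dev-003": {"ring_osc_mhz": 404.1, "idd_leak_ua": 18.0},
}

def drifted(baseline, field, metric, max_pct=5.0):
    """Return (device, % drift) pairs whose metric moved more than max_pct."""
    flagged = []
    for dev, readings in field.items():
        ref = baseline[dev][metric]
        pct = abs(readings[metric] - ref) / ref * 100.0
        if pct > max_pct:
            flagged.append((dev, round(pct, 1)))
    return flagged

for metric in ("ring_osc_mhz", "idd_leak_ua"):
    print(metric, "->", drifted(final_test, in_field, metric))
```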

Silicon lifecycle management also needs to include verification, noted Rob van Blommestein, head of marketing at OneSpin Solutions. “Without continuous verification effort, the chip can be susceptible to security and safety issues. Devices must meet up-to-date safety standards as well as protect against the evolving hardware vulnerabilities.”

Formal verification is ideal to handle this ongoing effort. Its exhaustive nature is effective in proving the absence of vulnerabilities and safety-related issues. But verification, in general, also has evolved into a continuous process that never really ends.

“Even if your designs are out in the market, it makes sense to continuously improve your tests and show that you’re working on old designs, as well,” said Shubhodeep Roy Choudhury, CEO of Valtrix Systems. “From time to time there are some improvements that can be made, and then you would want to run that on your previous designs, as well. When a design is being verified in the simulation, you might have an earlier design being verified in the silicon. So it’s a continuous process, and we have to continuously improve our tests and get better coverage.”

Keeping systems up-to-date is a challenge, particularly in markets such as automotive, where older vehicles will have to share the road with vehicles that utilize newer technologies. Staying current requires continuous firmware and software updates. That means the company responsible for the chip will need to write a verification plan before starting a project, detailing how the chip will be verified throughout its life.

Looping back
One of the benefits of lifecycle management is the ability to loop reliability data back into the design process, which in turn can be used to reduce the number of defects. This is particularly useful when redundancy is built into chips and systems, because it provides baseline comparison data for analytics.

“When doing design and verification, we are building in redundancy mechanisms so that most of the functions are doubled up, and have a backup option if one fails,” said Aleksandar Mijatovic, digital design manager at Vtool. “We are also adding in-line testing for silicon defects in order to detect them. There will be defects over the lifecycle of a chip. It’s inevitable. You do silicon iteration. But what we can do is prevent the end user getting in trouble because of that. This is nothing new. All of these concepts are as old as silicon technology.”
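
A toy version of the redundancy idea Mijatovic describes might look like the following sketch, where two copies of the same function stand in for duplicated hardware blocks and a comparator flags any mismatch. The names and values are illustrative only, not an actual safety architecture:

```python
# Minimal sketch of dual redundancy: run the same operation on two redundant
# units, compare results, and raise a flag on mismatch. The "units" here are
# just Python callables; in silicon they would be duplicated hardware blocks
# checked by a comparator.

def redundant_execute(primary, backup, data):
    """Run both copies and return the result plus a mismatch flag for the safety logic."""
    a = primary(data)
    b = backup(data)
    mismatch = (a != b)
    # On mismatch a real system would log the event, run diagnostics (e.g. logic
    # BIST) to decide which copy is faulty, and switch over to the healthy one.
    return a, mismatch

# Hypothetical example: a checksum block and its duplicate.
def checksum(words):
    return sum(words) & 0xFFFF

def checksum_copy(words):
    return sum(words) & 0xFFFF

result, mismatch = redundant_execute(checksum, checksum_copy, [0x12, 0x34, 0x56])
print(f"result=0x{result:04X}, mismatch={mismatch}")
```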

But there have been improvements recently with the addition of sensors in digital chips. “This is something quite new,” said Mijatovic. “CMOS sensors for analog we are all doing. There are certain techniques to check the quality of silicon. You can always try to go through some PLL checks such as, ‘What is the maximum frequency that is possible at the time?’ But it is rarely used. You’re just trying to check whether your silicon is still operational at the standard frequency it was supposed to use, and do checks like that. Techniques to do that have escaped from DFT up to functional logic BIST and memory BIST, and there is another set of techniques for safety-related issues. All of those are brought together in the latest automotive and safety standards that the industry follows.”
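
The frequency check Mijatovic mentions can be pictured as a simple in-field guard-band test. The sketch below mocks the monitor read-back; the rated frequency, scaling, and margin policy are hypothetical assumptions:

```python
# Toy version of the "is the silicon still good at its rated frequency?" check.
# The monitor read-back is mocked; all numbers are hypothetical.

RATED_MHZ = 800.0
MIN_MARGIN_PCT = 8.0     # guard band the part must retain over its rated clock

def read_speed_monitor_mhz() -> float:
    """Stand-in for reading an on-chip process/speed monitor (e.g. a ring oscillator)."""
    return 905.0          # mocked measurement of achievable frequency

def frequency_health_check() -> bool:
    measured = read_speed_monitor_mhz()
    margin_pct = (measured - RATED_MHZ) / RATED_MHZ * 100.0
    healthy = margin_pct >= MIN_MARGIN_PCT
    print(f"measured {measured:.0f} MHz, margin {margin_pct:.1f}% -> {'ok' if healthy else 'degraded'}")
    return healthy

frequency_health_check()
```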

Fault injection needs to be added into this process, as well. “There is safety logic, but on top of it we also need to make sure that we have ways to inject errors to make sure we are able to test this, even on chip, not just in simulation,” noted Darko Tomusilovic, verification lead at Vtool. “On top of all the safety mechanisms we now add additional test logic to be able to access it to make sure it really works.”
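
As a purely illustrative example of the principle Tomusilovic outlines, the sketch below injects a single-bit fault into a parity-protected word and confirms that the protection mechanism reports it. Real fault injection would target registers in simulation or use on-chip test logic:

```python
# Minimal fault-injection sketch: flip one bit in a parity-protected word and
# confirm the check actually flags the corruption. Purely illustrative.

def parity(word: int) -> int:
    """Even-parity bit for a word."""
    return bin(word).count("1") & 1

def store(word: int):
    """Return the data word plus its protection (parity) bit."""
    return word, parity(word)

def check(word: int, stored_parity: int) -> bool:
    """True means no error detected."""
    return parity(word) == stored_parity

data, p = store(0b1011_0010)
corrupted = data ^ (1 << 3)            # injected single-bit fault
assert check(data, p)                  # clean word passes
assert not check(corrupted, p)         # safety mechanism must flag the fault
print("parity check caught the injected fault")
```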

On top of that, aging effects need to be determined and inserted into the data sets to determine how a device will work over time.
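
A first-order way to fold aging into those data sets is a simple power-law degradation model of the kind often used as a stand-in for NBTI-style threshold-voltage shift. The coefficients and failure threshold below are hypothetical placeholders, not process data:

```python
# Illustrative aging projection: power-law threshold-voltage shift over time.
# Coefficients and the margin limit are hypothetical, not characterized values.

def vth_shift_mv(hours: float, a: float = 3.0, n: float = 0.2) -> float:
    """Projected threshold-voltage shift (mV) after `hours` of stress: a * t^n."""
    return a * hours ** n

LIMIT_MV = 30.0  # hypothetical shift at which timing margin is considered consumed

for years in (1, 5, 10, 15):
    shift = vth_shift_mv(years * 8760)
    status = "ok" if shift < LIMIT_MV else "margin consumed"
    print(f"{years:>2} yr: {shift:5.1f} mV  ({status})")
```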

What’s changed
While there are many technology approaches for examining and analyzing the lifetime of silicon, there are no open standards for data models. On top of that, there is no agreed upon mechanism for sharing data across the supply chain, a problem that is becoming more challenging as chips are used in safety- and mission-critical applications, and as more IP and even chips and chiplets are packaged together.

“We absolutely will need to get there in order to realize the fullness of the vision,” Siemens EDA’s Ryan said. “It’s an emerging concept. Does what we have today add value? Absolutely. But we’ve still got work to do.”

Nevertheless, the work to collect, analyze, and act upon all types of chip data in new ways could help design and verification teams to understand the impact of a design decision on a manufactured device.

“Previously, you would have released a board by checking its power supply, for example, and based on metrics and tests that you would run, you would have seen ‘pass’ or ‘fail,’ and that’s it,” proteanTecs’ Baruch explained. “Now you can bring in data from how the chip behaves, and suddenly you see voltage fluctuations that you have not seen before. Previously, the only thing you would have seen is that it was a passing board, and moved on. Now you can see the behavior, and maybe it is tied to the software, because when you load the software onto that board, for example, you have the software on the chip, you have the software on the board, and now you understand the impact of the actual final behavior of the system versus the different chip components.”

The ability to develop a methodology for defect-free devices and reliability over a product lifecycle relies on many traditional concepts, but it brings in a new data set. “The concept of outlier detection that was done for ages on test can now be brought to life,” Baruch said. “The same concepts at the board level, for example, now apply on a system level because you get a sense of modular behavior that the chip is bringing to the table that you have not seen before. It may be different than the rest of the board, so from a quality standpoint it allows you to drive the root-cause concepts much deeper than previously. You actually can measure how the chip is behaving using the same data set between the chip vendor and the system vendor, just by comparing the chip readouts themselves, along with the data the chip is producing: how it was functioning at final test or system-level test on the chip vendor side versus how it behaves in the system.”

SLM approaches today contain sophisticated sensors and/or on-chip monitors that measure a variety of metrics. These sensors and monitors need to be embedded throughout the design to make sure data is available as the design moves along the design and verification chain.

The full power of chip and test data is harnessed by examining it, through algorithms, in conjunction with the other data sources that exist at each stage.

“Creating an outlier detection algorithm in high-volume manufacturing based both on test data and on the new deep data from the chip can bring a much better solution than just looking at one data source. The same goes for the system level, where you can now take structural tests like ICT and functional tests, along with the software associated with them, and combine those data sources to understand the root cause, variances, and so on,” Baruch said. “Then you need an analytical platform that understands the data, and has the ability to combine data — different measurement data with that new deep data from different stages of the lifecycle. Then you need all the algorithms and the mechanics on top of that data to make it actionable. What do I do with it? What does it say? How do I interpret certain signatures that I see? Are they correlated, or are they standalone problems that I need to handle, or just ignore maybe? This gives you all the tools and capabilities to learn and understand what you were seeing and what kind of action should be taken.”
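
In practice, that combination of sources can be as simple as building one feature vector per unit from manufacturing test, in-system test, and in-field monitor data, then handing it to a generic anomaly detector. The sketch below assumes scikit-learn is available; the feature names and values are hypothetical:

```python
# Sketch: combine several data sources into one feature vector per unit and let
# a generic anomaly detector flag units that look inconsistent across sources.
# All features and values are hypothetical; scikit-learn is assumed available.
from sklearn.ensemble import IsolationForest

# Each row: [final-test fmax margin (%), ICT resistance delta (%),
#            in-field supply droop (mV), workload-normalized temperature (C)]
units = [
    [12.1, 0.4, 18.0, 61.2],
    [11.8, 0.6, 19.5, 60.7],
    [12.4, 0.3, 17.2, 62.0],
    [11.5, 0.5, 20.1, 61.5],
    [ 6.2, 2.9, 44.0, 71.8],   # behavior diverges across several sources at once
    [12.0, 0.4, 18.8, 60.9],
]

detector = IsolationForest(contamination=0.17, random_state=0).fit(units)
labels = detector.predict(units)          # -1 marks an outlier
outliers = [i for i, label in enumerate(labels) if label == -1]
print("units flagged for root-cause analysis:", outliers)
```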

A gradual evolution
While this all sounds promising, it may be a few years before all the pieces of these methodologies are in place and working at full steam.

Synopsys’ Fish said that while some monitors are available now, others are still in development, but adoption in the field is the long pole in the tent. “We work very closely with the test community, and in volume production we have a very big footprint there now. We work with the ATE providers. Then, as you go into the field, there’s no single proven use case. There are a lot of moving parts. There’s the semiconductor supplier, there’s the foundry, there’s the end user. Is the end user the data center, or is it you sitting there accessing data? There are a lot of questions for both data centers and automotive applications, which are expected to be the two initial drivers. But over the next few years, the infrastructure will start to be built.”

Other tools, such as digital twins, can play a role here, as well.

“Essentially, the whole lifecycle management extends from the chip to the system, which then makes it a digital twin at that level, to the system of systems where you then can do things like optimizing a car based on individual usage,” said Cadence’s Schirrmeister. “Think about the potential positive environmental aspects in a smart city environment to help save energy, and optimize costs. The models of the data bring it all together. This is like system design 20 years ago and the notion of whether to design it top-down. At a chip level, do I have a representation of the whole chip? This is now becoming elevated levels higher.”

Conclusion
At the end of the day, a transformation is underway in how chips and systems are designed, verified, manufactured, and managed in the field.

“We’re showing the systems companies something they’ve never seen before, and which they could not have done before,” proteanTecs’ Baruch added. “That changes the perspective of what else they can do with that, such as how to improve efficiency and quality. ‘I have a return, an RMA came back, how can I troubleshoot it?’ Before that, it was a black box. Now I can get a sense of whether it’s the chip vendor, or maybe not. Maybe it’s my component. So many questions can now be answered. That is what’s transforming this whole industry.”


