SLM Evolves Into Critical Aspect Of Chip Design And Operation

Silicon lifecycle management applications and techniques are gaining traction as chipmakers figure out how to more effectively use them.


Silicon lifecycle management has evolved greatly in the past five years, moving from novel concept to a key part of design flows at industry leaders such as NVIDIA, Amazon Web Services, Ericsson, and others.

Along with becoming a major focus for companies developing semiconductors, SLM has seen its use cases expand. Initially focused on post-silicon insights, it now covers the entire lifecycle of chips, from design through manufacturing and field operation. There is also growing interest in integrating SLM data with the design process, creating a design-data continuum that allows insights from field operation to inform future chip designs.

“Just over the last few years there is general awareness, which sounds trivial, but three years ago when you talked about SLM it was new to most people,” said Randy Fish, product line director for SLM at Synopsys. “If you went into a company and said, ‘Who should I talk to about SLM?’ people would say, ‘I don’t know.’ We’ve moved beyond that. Most people are very aware now. At this year’s ITC (International Test Conference), SLM had a strong presence. There’s presence at other trade shows, and there are dedicated events around RAS (reliability, availability, serviceability) now too, which is part of the SLM theme.”

Others agree. “A few years ago, SLM was an idea,” said Lee Vick, vice president of strategic marketing at Movellus. “It was something that people were moving toward. It felt like we were trying to convince people that this was a good idea. Given the ongoing challenges with design — that things are only getting worse with reticle-limit designs, with more advanced geometries, and now in the chiplet space and 3D stacked chips — it was ultimately going to move to something that we saw as an inevitability. Are we 10 years ahead of that? Five years ahead of that? Five weeks ahead of it? You never know where you are in the timeline. My current expectation is that we’re much farther along than I thought we were.”

At ITC, Vick said it was very clear that SLM has evolved from a concept to something that is moving into the implementation phase. “The hockey stick hasn’t quite taken off yet, but the raw adoption, the understanding, the acceptance, is absolutely there from everyone. I didn’t really see people saying, ‘Oh, that’s interesting, but not really all that useful.’ It was, ‘We have to do this.’”

Technically, SLM involves instrumenting the devices, capturing the data those instruments produce, running analytics on it, and folding the results into the test methodology. There are various pieces up and down the chain: a variety of on-chip sensors from proteanTecs, Synopsys, Siemens EDA, and others, as well as analytics engines connected to them to interpret the output of those sensors, both in manufacturing and throughout the lifetime of a device in the field.
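
To make those pieces concrete, here is a minimal, vendor-agnostic sketch in Python of what a monitor data record and a first-pass analytics step might look like. The field names, stages, and statistics are illustrative assumptions for this article, not any supplier's actual data format or API.

```python
from dataclasses import dataclass
from statistics import mean

# Hypothetical record for a single on-chip monitor sample; the field names
# are illustrative, not drawn from any vendor's format.
@dataclass
class MonitorSample:
    die_id: str          # which device the sample came from
    monitor_type: str    # e.g. "temperature", "voltage", "path_margin"
    stage: str           # "wafer_sort", "final_test", or "in_field"
    value: float

def summarize(samples: list[MonitorSample], monitor_type: str) -> dict:
    """Collapse raw monitor samples into per-die statistics that a test or
    reliability engineer could fold into pass/fail limits or field alerts."""
    per_die: dict[str, list[float]] = {}
    for s in samples:
        if s.monitor_type == monitor_type:
            per_die.setdefault(s.die_id, []).append(s.value)
    return {die: {"mean": mean(vals), "min": min(vals), "max": max(vals)}
            for die, vals in per_die.items()}

samples = [MonitorSample("X17", "temperature", "in_field", 71.5),
           MonitorSample("X17", "temperature", "in_field", 88.0)]
print(summarize(samples, "temperature"))  # {'X17': {'mean': 79.75, 'min': 71.5, 'max': 88.0}}
```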

SLM use is expanding
Various tools are available from commercial EDA suppliers to analyze in-silicon data generated by the monitors, with research underway to determine the best use cases for monitor IP once the chip moves to in-field use.

At the same time, it is becoming clearer who the users of SLM are. “The chip designer is responsible for getting it implemented, but who gets the most value from all this data and analysis could be test engineering and manufacturing,” said Fish. “It could be the in-field user. It could be the person who owns the fleet of vehicles.”

As such, there are many user personas for SLM, given the growing use of monitor IP across a broad range of chips.

“In some cases, we have very deep knowledge of how our monitor IP is used,” Fish said. “And in some cases, we really don’t know how they’re using it. We know that things like AVS or DVFS have been going on for a while, and this is an area where we’re looking to provide more value. We’ve talked about power management and low power design for 30 years, and there’s stronger interest today in being able to add more monitor information and providing firmware stacks to manage data.”

When focusing on DVFS, for example, path margin monitors can be inserted into a design. That information is then fed back through the design process so optimizations can be made in cell selection, Vmin adaptation, and related areas.

“There are use models for these kinds of IPs,” said Adam Cron, distinguished architect at Synopsys. “Selecting the endpoints for the paths, for example, is a process of physical layout and timing analysis and topology analysis, etc., to figure out what goes where so we can feed that data back and see the difference between silicon modeling versus pre-silicon models delivered to customers prior to perhaps any tapeout. Also, for Vmin adaptations, the raw SLM data, the temperatures, the voltages, the timing margins, etc., we’re also using DFT infrastructure, because that gives us massive observability. You can imagine a logic BiST or a memory BiST. We’re not just looking at a sensor dropped in this little, tiny piece of a design. We’re looking at many effective sensors, in terms of flip-flops or memory cells, gathering data indirectly or virtually directly. So we’re combining that, and we see that as a place to move to.”
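
As a rough illustration of the Vmin-adaptation idea, the sketch below assumes per-device readings from path margin monitors and derives a conservative per-part Vmin from the worst observed margin. The constants, the slack-to-voltage sensitivity figure, and the function name are hypothetical, not drawn from any vendor's methodology.

```python
# Hypothetical per-device Vmin adaptation from path-margin-monitor readings.
# All numbers are invented for illustration.

NOMINAL_VMIN_V = 0.75          # signed-off worst-case Vmin
GUARD_BAND_PS = 20.0           # timing margin to preserve after adaptation
SENSITIVITY_PS_PER_MV = 1.5    # assumed slack gained per mV of supply increase

def adapt_vmin(path_margins_ps: list[float]) -> float:
    """Lower Vmin for devices whose monitored paths show surplus timing
    margin, but never below what keeps the worst path above the guard band."""
    worst_margin = min(path_margins_ps)
    surplus = worst_margin - GUARD_BAND_PS
    if surplus <= 0:
        return NOMINAL_VMIN_V                 # no headroom: keep signed-off Vmin
    # Convert surplus slack into a conservative voltage reduction (in volts),
    # capped at 50 mV.
    reduction_v = (surplus / SENSITIVITY_PS_PER_MV) / 1000.0
    return round(NOMINAL_VMIN_V - min(reduction_v, 0.05), 3)

# Example: a part with ample margin on all monitored paths
print(adapt_vmin([95.0, 110.0, 87.5]))   # 0.705, below the signed-off 0.75
```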

How this data gets used is evolving, as well. In the past, it was largely used to determine how devices were behaving in real time. But that data also can be used to add resiliency into designs as they age or as workloads change, without relying solely on redundancy.

“Everyone wants to operate on the lowest margins, but the exponential increase in demand for more compute power means these minimum margins may be eaten up very quickly, compromising reliability,” said Noam Brousard, vice president of solutions engineering at proteanTecs. “Ironically, these applications are the ones with the highest reliability requirements. On the other hand, accommodating worst-case scenarios means that most of the time we are over-provisioning and leaving performance on the table. One approach would be to dynamically adjust operating points, the voltage and frequency, so margins are always kept at a minimum, but still a safe amount. But to truly squeeze maximum power/performance, these should be optimized not only per process variation and silicon aging, but also per instantaneous workload. This can be achieved by monitoring performance margins in a device and adjusting voltage and frequency so that the margins are always at the desired level. If margins do become critically low, such a mechanism must readjust quickly to maintain reliability — in other words, workload-aware power customization with a safety net.”
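
The approach Brousard describes can be pictured as a small control loop. The sketch below is one possible shape for it, with invented thresholds and voltages; it is not proteanTecs' implementation.

```python
# Minimal control-loop sketch of the workload-aware approach described above:
# keep the measured timing margin near a small target, and snap back to a safe
# operating point if it becomes critically low. All values are illustrative
# assumptions, not any product's parameters.

TARGET_MARGIN_PS = 30.0     # desired steady-state margin
CRITICAL_MARGIN_PS = 10.0   # below this, jump straight to the safe point
STEP_MV = 5                 # small voltage step per control iteration
SAFE_POINT_MV = 800         # worst-case-qualified voltage (the "safety net")

def next_voltage(current_mv: int, measured_margin_ps: float) -> int:
    if measured_margin_ps < CRITICAL_MARGIN_PS:
        return SAFE_POINT_MV                 # readjust quickly to stay reliable
    if measured_margin_ps < TARGET_MARGIN_PS:
        return current_mv + STEP_MV          # margin eroding: add voltage
    if measured_margin_ps > TARGET_MARGIN_PS * 2:
        return current_mv - STEP_MV          # over-provisioned: claw back power
    return current_mv                        # margin is where we want it

# Example: margin collapses under a heavy workload burst
print(next_voltage(720, 8.0))   # -> 800, the safe operating point
```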

SLM’s evolution from DFT
Where exactly did SLM come from? Marc Hutner, director of product management for Tessent Yield Learning at Siemens Digital Industries Software, said that over the past decade, while working on the Heterogeneous Integration Roadmap, the group started to see a theme develop around where DFT got used, what customers wanted to do with it, and the need to extract data about temperature and other physical effects. “That started five to seven years ago. It became clear that there are certain markets where they really wanted to get the data out. It’s really getting to the point where we can integrate use of that data in a lot of ways, and how it impacts the system. There are a bunch more use cases, beyond adaptive voltage and beyond adaptive frequency. What we talked about in the Heterogeneous Integration Roadmap was the ‘mostly good die.’ How do you keep it working for a longer period of time? At that point it might be that you turn the frequency down a little bit. That way maybe it’s a de-rated product, like your phone saying, ‘My battery’s low, I want to keep it alive for a while longer.’ There are a lot more applications like that we’ve just started to scratch the surface on.”

High-speed access in test, aimed at solving test bandwidth issues, is another SLM application. “As chips get more complicated, you need to do more scan vectors,” Vick said. “You want more I/O pins on the tester to do that, because they only run at a couple hundred megahertz, and you need more of them when physically they’re being reduced in terms of the number of pins they have available.”

Where those demands conflict, existing high-speed serial interfaces like USB and PCIe can be used to move the test vectors back and forth. “We need to translate from the tester world into packets that go over the network, then turn that back into scan inside the device,” Vick said. “That provides two benefits. First, it solves the problem of having a wide-enough pipe to move the data back and forth between the two. Second, it breaks the requirement or the coupling of tests with a physical tester, because if the signals can be electrically sent over PCIe or USB, that can be done in the field. I don’t have to physically be on the tester, and I can’t do environmental things, but I can at least run the same tests on the tester as I do in the field, and that is a significant piece of SLM.”
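
The translation Vick describes can be pictured as a simple framing layer. The sketch below shows one hypothetical way scan data might be chunked into sequence-numbered packets for a serial link and reassembled on the device side; the header layout and chunk size are assumptions, not any standard's wire format.

```python
import struct

# Illustrative framing of scan data into packets for a high-speed serial link
# (USB/PCIe in the article). The 8-byte header (test ID, sequence number,
# payload length) and the chunk size are invented for this sketch.

CHUNK_BYTES = 256  # payload per packet

def pack_scan_vector(vector: bytes, test_id: int) -> list[bytes]:
    """Split a scan vector into sequence-numbered packets that device-side
    logic can reassemble and shift into the scan chains."""
    packets = []
    for seq, offset in enumerate(range(0, len(vector), CHUNK_BYTES)):
        payload = vector[offset:offset + CHUNK_BYTES]
        header = struct.pack(">HHI", test_id, seq, len(payload))
        packets.append(header + payload)
    return packets

def unpack_scan_vector(packets: list[bytes]) -> bytes:
    """Device-side reassembly back into a contiguous scan stream."""
    ordered = sorted(packets, key=lambda p: struct.unpack(">HHI", p[:8])[1])
    return b"".join(p[8:] for p in ordered)

vec = bytes(range(256)) * 3          # a dummy 768-byte scan vector
assert unpack_scan_vector(pack_scan_vector(vec, test_id=7)) == vec
```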

Engineers are now talking about doing the same thing with UCIe. “With UCIe environments and stacked die for HBM and other things, you physically can’t get to all the places you need to, so you have to have some kind of test conduit,” Vick said. “There are a bunch of very smart people in the test world who are solving that. The concept of a high-speed interface for test that’s decoupled from the physical tester has been accepted universally. I didn’t see anyone who wasn’t on board.”

One of the big challenges with in-circuit monitors is how and where to integrate them. “It’s always that horrible dichotomy that the place you’re most interested in is the hardest to get things inserted,” Synopsys’ Fish said. “You’re interested in the timing margin, temperature, the data path. It’s running at the highest speeds, or in your NoC that’s on 24/7, and you’re trying to measure things. The most important thing is to be able to integrate that part in the design flow to achieve timing closure without blowing out your critical path. We spend a lot of time working with the implementation teams — our own developers, as well as customers’ — to make sure there are flows where the monitors don’t affect the ability to close the design.”

Architecture impacts
That’s part of the challenge. The other is what you’re trying to achieve with these monitors.

“What are the right things to measure and to enable that would enable a system feature or ensure the system reliability in whatever market it’s in?” said Siemens’ Hutner. “In my old architecture role it was always, ‘Is it belts and suspenders?’ meaning, ‘Are you putting in too much stuff?’ How do we put in the right stuff where they can reach their system goals, which also links back to the mission profile statement, i.e., what do you need to prove? In automotive, for example, the purpose of a mission profile is to have that contract that says, ‘I know I can measure these things to ensure that safety is reached.’”

All these considerations affect the architecture in a big way. “It’s becoming a big part of chip architecture,” said Vick. “In the old days, we didn’t have DFT. We just designed a bunch of gates and hoped it worked, then handed it over to somebody to go figure out how to test it. We quickly figured out that wasn’t going to work, that you must design with test and manufacturing in mind, and that’s an important part of design today. Similarly, SLM is going to follow the same trajectory, where, as you’re designing your part, you’re making architectural choices based upon how you’re going to capture data and what you’re going to do with that data.”

For example, some of the technologies that exist in the industry allow the architects and engineers to get insight into what’s happening in the power delivery network.

“People use this in a few different ways,” Vick noted. “One thing they’ll do is put sensors into their device, either for in-field silicon or in the lab. Then they will beat their networks to the point of failure so they understand exactly how far they can go. Their architectural approach is, ‘We’re going to make sure that our power grid can survive and will handle what would typically be a droop excursion.’ You turn a bunch of things on at once. You get inrush currents. Your power supply droops. They say, ‘We’re going to solve that problem by raising Vmin.’ The challenge there is that if you raise Vmin too high, then you cause a problem. You’re burning more energy than you need to. You’re causing more thermal stress on your parts. You’re reducing reliability, and you’re not as competitive from a PPA perspective. So you want to over-design it by just the right amount. In the absence of sensing technology, they can’t do that. With sensing technology, they can do that. And so architecturally, it’s impacting their designs. Other folks say, ‘I don’t like that approach. I want to be even closer to my Vmin threshold, so what I’ll do is architect my device so that in the field, 80% of the time, things work fine, maybe even 90% of the time. But occasionally, based upon workload, environment, other conditions, I’m going to get an excursion. I could design around that by raising Vmin, but I don’t want to. I want to be able to resolve that excursion. So architecturally, I will put in both a sensor and a way to mitigate.’”

If the sensor data says that something bad is going on, the system can slow down the clock or implement some other mechanism to allow it to solve a droop event. “That buys the performance back,” Vick said. “And so architecturally people were saying, ‘I can go this way, or I can go that way.’ That capability is designed in at architecture time so they’re architecting their devices based upon how they’re going to handle instrumenting and analyzing and resolving the issues that might come up. And that’s just going to happen more and more. A part of the fabric of architecture is understanding what your design methodology is and how you’re going to mitigate issues in the field.”
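
A toy version of that sensor-plus-mitigation loop might look like the following, where a detected supply droop temporarily slows the clock rather than raising Vmin, then restores full frequency once the excursion passes. The thresholds, frequencies, and function name are invented for illustration.

```python
# Toy droop-mitigation decision for the pattern described above. All values
# are illustrative assumptions, not any product's behavior.

NOMINAL_FREQ_MHZ = 2000
DROOP_FREQ_MHZ = 1500        # reduced rate that tolerates the lower voltage
DROOP_THRESHOLD_MV = 680     # below this supply level, treat it as a droop

def select_frequency(supply_mv: int, current_freq_mhz: int) -> int:
    if supply_mv < DROOP_THRESHOLD_MV:
        return DROOP_FREQ_MHZ      # mitigate: buy timing margin with a slower clock
    if current_freq_mhz < NOMINAL_FREQ_MHZ:
        return NOMINAL_FREQ_MHZ    # excursion over: buy the performance back
    return current_freq_mhz

# Inrush event drops the rail, then it recovers
print(select_frequency(650, 2000))   # -> 1500 during the droop
print(select_frequency(720, 1500))   # -> 2000 once the rail recovers
```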

Managing SLM data
SLM generates a lot of data, and that data needs to be managed. Keysight Technologies takes a complementary view on SLM, which it calls engineering lifecycle management (ELM). “ELM is very similar to silicon lifecycle management,” said Simon Rance, director of product management and strategy at Keysight. “The only difference is that we’re addressing aspects of SLM where PLM systems don’t really play in that lifecycle. PLMs are more for the bill of materials, for the manufacturing side of things — primarily parts management. We looked at our hub, which originally was focused on IP management and IP-based designs, and as we started engaging with more customers, we saw this gap. The gap was already viewed and seen, and that’s where SLM came into play. The difference is almost a boundary of where we provide solutions to, and where we complement. We manage all of the process and design data management across the SLM. We don’t have natural hooks into existing on-chip-based monitors, debug- and trace-type systems yet, but because the data is the same data that’s coming in as part of that lifecycle, we just haven’t got there yet within our existing customer base. We treat that silicon lifecycle management as agnostic to the chip monitoring sensors and data. We just receive the data, wherever that data is, and it is stored as part of that lifecycle management, with all of the other aspects of the data. That is for the entire management lifecycle, from the ideation, the requirements, all the way through to in-field, in-test, getting that data back and having that full traceability, especially for automotive.”

To further integrate SLM with existing EDA tools and enterprise software, what likely comes next is a focus on integrations that allow a seamless flow of data and traceability across the entire lifecycle, including:

  • EDA tool integration. Tight integrations with major EDA vendors would ensure that design, verification, and test data can be automatically captured and fed into the SLM platform.
  • Enterprise software integration. Stronger connections between SLM and other enterprise-level tools like requirements management, bug tracking, and configuration management systems are needed to enable a single source of truth for all lifecycle data.
  • Open APIs and industry standards. These can facilitate integration with a wide range of third-party tools and systems for flexibility of use, and allow users to leverage their existing investments.
  • Data normalization and correlation. Data from disparate sources needs to be normalized and correlated within the SLM platform, enabling comprehensive analysis and traceability across the entire lifecycle (a simplified sketch of this step follows the list).
  • Advanced analytics and AI/ML capabilities. These need to be developed further. Leveraging the centralized data within SLM to power advanced analytics and AI/ML models will help drive optimization, predictive maintenance, and other data-driven insights throughout the product lifecycle.
  • Intuitive user experience. By making sure that the SLM platform offers seamless and user-friendly experiences, engineers and cross-functional teams will be able to easily access, visualize, and interact with the lifecycle data.
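
As a concrete, if simplified, picture of the normalization and correlation item above, the sketch below maps records from two hypothetical sources (ATE results and field telemetry) onto one schema keyed by device ID so they can be analyzed together. All field names are assumptions made for this illustration.

```python
from collections import defaultdict

# Hypothetical normalization/correlation step: map records from different
# sources onto one schema keyed by device ID so test-time and in-field data
# for the same die can be compared side by side.

def normalize_ate(row: dict) -> dict:
    return {"device_id": row["DIE_ID"], "source": "ate",
            "metric": row["TEST_NAME"], "value": float(row["RESULT"])}

def normalize_field(row: dict) -> dict:
    return {"device_id": row["serial"], "source": "field",
            "metric": row["sensor"], "value": float(row["reading"])}

def correlate(records: list[dict]) -> dict[str, list[dict]]:
    """Group normalized records per device for cross-lifecycle analysis."""
    by_device = defaultdict(list)
    for r in records:
        by_device[r["device_id"]].append(r)
    return dict(by_device)

records = [normalize_ate({"DIE_ID": "X17", "TEST_NAME": "vmin", "RESULT": "0.71"}),
           normalize_field({"serial": "X17", "sensor": "temp_max", "reading": "93.5"})]
print(correlate(records)["X17"])
```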

Conclusion
While chip architects and designers have a better understanding of SLM today, there are still challenges to implementation. “As a system architect, it’s not only the area, but also the time to implement it, the time to validate it, and will somebody use it?” Siemens’ Hutner said. “There’s always the possibility of unintended consequences, but the hope is that by putting them in, people will use these monitors to find a benefit to then use these as system features. The hope is that the first couple times they implement these things they see value in their use, and that way they’ll do more, and conduct thought experiments around what else they can learn from their system.”

Demand for SLM could come from multiple places. “The idea of implementing SLM might come from different areas within the company,” Hutner continued. “It might come from the system architect who says, ‘Last time we had this problem, maybe we should have put in the embedded analytics solution for bus functional monitors because I want to be able to see this and get it to be a black box.’ In other cases, it could be the operations guys asking for certain features that say, ‘We really need to be able to measure this, because last time it got too hot or we didn’t know where the voltage is.’ So, it can come from many different places. The question is how to get that buy-in to invest the area, the time, the effort for these features. Different companies have different metrics on when to include them, so it’s good to understand what that process is and what the impact is. Are they doing it because they have an end customer that wants to do something, or are they a fully integrated solution, from silicon to end system? There are different goals depending on who’s asking for it.”

What’s clear, though, is that SLM is poised to become an even more integral part of the semiconductor industry, with expanding applications, more sophisticated analysis techniques, and deeper integration throughout the entire lifecycle of chips and systems.


