Enabling Silicon Lifecycle Solutions

Extensive monitoring of silicon devices in the field helps improve reliability, safety, and security.


The concepts of product lifecycle management (PLM) should be familiar, but the semiconductor industry has yet to adopt a system for managing the entire lifecycle of a product, from inception through design, realization, deployment, and field service, right through to end-of-life activities such as final disposal. Now, a combination of business and technical pressures is bringing PLM capabilities to the silicon devices that drive today’s high-performance and safety-critical applications. This approach is known as silicon lifecycle management. It aims to deliver high levels of reliability, safety, and security through extensive monitoring and control of silicon devices, and the pieces needed to enable it are now falling into place.

Why silicon lifecycle management?

Progress in the semiconductor industry relies on three fundamental drivers: process technology scaling, design scaling, and system scaling. Process technology scaling allows ever-smaller circuitry to be built on-chip, which enables the design scaling that has made chips into fully-fledged systems in their own right. System-level scaling means the systems that chips are built into are ever more complex, pervasive, and critical.

These parallel scaling effects create a perfect storm of challenges for the semiconductor industry and its customers. As systems scale, unexpected inefficiencies, anomalies, vulnerabilities, and system-level interactions cause outsized effects in end-system performance.

But it is the consequences of increasing system complexity that are driving the adoption of silicon lifecycle solutions (SLS). Ensuring the performance, safety, or security of a device first requires visibility into the SoC, so you can monitor and assess it throughout its entire service life. Silicon lifecycle solutions will be required for the next generation of safety-critical devices deployed in cars, but they are probably closer to reality in the datacenter sector, where every tiny increase in latency carries a significant cost. A classic, publicly reported example is Google’s disk fleet, which once suffered from a low-probability performance effect that produced long latencies in data availability. Such system-level effects often surface only at scale.

These system-level complexities come in addition to an exponential rise in the traditional challenges faced by the semiconductor industry – from silicon validation through manufacturing test, yield management, silicon bring-up, and hardware-software integration.

Improvements in product performance, reliability, safety, and security, in turn, have direct bottom-line consequences.

What are silicon lifecycle solutions?

The concept behind SLS is to provide an infrastructure that gathers performance and other data at every stage of the development and deployment of a silicon product—and also makes that data available in an actionable form to the various participants in the value chain (figure 1). SLS therefore necessarily encompasses several different products and technologies, including:

  • On-chip hardware and design enhancements that gather the data.
  • A layer of service and management software to pull the data off-chip and deliver it to where it is needed.
  • An application layer that allows people to analyze and utilize the data.


Fig. 1: A silicon lifecycle solutions platform.
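To make the three layers listed above more concrete, here is a minimal Python sketch of how data might flow from on-chip monitors, through a service layer, to an application. Every name in it (`OnChipMonitor`, `TelemetryService`, `flag_over_limit`, the monitor names and limits) is hypothetical. Real on-chip instruments are hardware blocks read over a debug or management interface, which this sketch simply models as Python callables.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class OnChipMonitor:
    """Hypothetical on-chip instrument, modeled as a function returning a reading."""
    name: str
    read: Callable[[], float]

@dataclass
class TelemetryService:
    """Hypothetical service/management layer: polls the monitors and
    delivers their readings off-chip as simple records."""
    monitors: list[OnChipMonitor]

    def collect(self) -> list[dict]:
        return [{"monitor": m.name, "value": m.read()} for m in self.monitors]

def flag_over_limit(records: list[dict], limits: dict[str, float]) -> list[str]:
    """Hypothetical application layer: turns raw records into an actionable list
    of monitors whose readings exceed their limits."""
    return [r["monitor"] for r in records
            if r["monitor"] in limits and r["value"] > limits[r["monitor"]]]

# Two hypothetical monitors with fixed readings, for illustration only.
service = TelemetryService([
    OnChipMonitor("die_temp_c", lambda: 92.0),
    OnChipMonitor("vdd_droop_mv", lambda: 18.0),
])
alerts = flag_over_limit(service.collect(), {"die_temp_c": 85.0, "vdd_droop_mv": 30.0})
print(alerts)  # ['die_temp_c']
```

Keeping the layers separate like this mirrors the point of the platform. The same collected records can feed very different applications without changes to the on-chip side.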

The applications themselves perform a broad variety of functions. For example, say the owner of a datacenter wishes to optimize its operation to reduce capex and opex while keeping its customers satisfied. Access to appropriate information, based on data gathered from chips within the system, can be a powerful tool for achieving that goal. A similar information infrastructure might be used by an automotive manufacturer to enable predictive maintenance and avoid costly recalls.

The fact that these are two quite different use cases highlights the potentially broad applicability of silicon lifecycle management, but also its complexity. In the value chain, the datacenter manager sits several steps away from the semiconductor supplier. For a provider of on-chip instrumentation IP and software, she is the customer’s customer’s customer! And yet growing system complexity makes it essential to build an infrastructure that can serve every link in that chain.

The adoption of SLS is a strategic imperative, but in practical terms, what does it look like? It starts with semiconductor companies that want a holistic approach to improving existing processes. For example, IC designers use advanced design-for-test (DFT) techniques to enhance IC quality, increase test efficiency, and enable diagnosis-driven yield analysis (DDYA). With silicon lifecycle management, designers will also use on-chip embedded analytics IP to capture and analyze test and yield data, which feeds back into the DFT process. All of this reduces costs, increases quality, and speeds time-to-market for the semiconductor maker.

Moving further along the value chain, functional monitoring augmentations directly benefit the IC makers’ OEM customers. Data gleaned during the operation of the chip is made available to the OEM, the manufacturer, or even the chip design house, at the discretion of the device’s end user. For example, fine-grained data about the chip’s real-world functional behavior can be invaluable as a manufacturer goes through systems integration and bring-up for an end product. Going a step further, the same types of data can be used to continuously optimize systems after deployment.

Within SLS, Embedded Analytics can be used to identify potential reliability problems in chips, enabling proactive, predictive maintenance that minimizes the cost of recalls and prevents potential safety issues.

It also becomes possible to address cybersecurity concerns. On-chip functional monitoring modules can be configured to look for events that contravene rules derived from the chip design, for example, “no one is allowed to access the memory controller registers unless we are in a safe reboot mode.” Violations of such a rule are detected and reported so that a predetermined action can be taken. Another example is configuring the on-chip Embedded Analytics module to block traffic from any source that exceeds a set traffic threshold within a given time interval, which helps guard against denial-of-service attacks.
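The two kinds of rule described above can be sketched in software. Below is a minimal Python model, assuming a hypothetical `Transaction` record for bus traffic, a hypothetical target name `memctl_regs` and mode name `safe_reboot`, and a sliding-window rate limit. In a real chip, these checks would run in on-chip monitoring hardware rather than in Python. The "once blocked, stays blocked" policy is one illustrative choice among many.

```python
from collections import deque
from dataclasses import dataclass

@dataclass(frozen=True)
class Transaction:
    """A hypothetical bus transaction observed by an on-chip monitor."""
    time_us: int   # timestamp in microseconds
    source: str    # initiator, e.g. a CPU core or DMA engine
    target: str    # addressed resource

def violates_memctl_rule(txn: Transaction, mode: str) -> bool:
    """Rule from the text: memory-controller registers may only be
    accessed while the chip is in safe-reboot mode."""
    return txn.target == "memctl_regs" and mode != "safe_reboot"

class RateLimiter:
    """Blocks any source that issues more than max_txns transactions
    within a sliding window of window_us microseconds."""

    def __init__(self, max_txns: int, window_us: int):
        self.max_txns = max_txns
        self.window_us = window_us
        self.history: dict[str, deque] = {}
        self.blocked: set[str] = set()

    def allow(self, txn: Transaction) -> bool:
        if txn.source in self.blocked:
            return False
        times = self.history.setdefault(txn.source, deque())
        times.append(txn.time_us)
        # Drop timestamps that have fallen out of the window.
        while times and times[0] <= txn.time_us - self.window_us:
            times.popleft()
        if len(times) > self.max_txns:
            self.blocked.add(txn.source)  # once blocked, stays blocked
            return False
        return True

limiter = RateLimiter(max_txns=3, window_us=10)
results = [limiter.allow(Transaction(t, "dma0", "ddr")) for t in range(5)]
print(results)  # [True, True, True, False, False]
print(violates_memctl_rule(Transaction(5, "cpu1", "memctl_regs"), mode="normal"))  # True
```

The fourth transaction from `dma0` within the 10 µs window trips the limit. After that, every further request from that source is refused until some external action clears the block.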

In addition to catching defined rule violations, the on-chip monitors can gather information about normal chip operation and pass it to an on- or off-chip processing resource. With enough data, we can build a statistical profile of “normal operation.” With that in hand, we can not only flag cyber intrusions but also uncover outlier events that indicate something amiss in either the security or the performance of the silicon.
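One simple way to build a statistical profile of “normal operation” is to track the running mean and standard deviation of a monitored metric and flag samples that fall far outside it. The Python sketch below uses Welford’s online algorithm, so no raw history needs to be stored. The metric (memory-access latency in cycles), the sample values, and the 4-sigma cutoff are all hypothetical. Production systems would likely use richer, multivariate models.

```python
import math

class NormalProfile:
    """Running mean and sample standard deviation of one monitored metric,
    updated with Welford's online algorithm."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stddev(self) -> float:
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_outlier(self, x: float, k: float = 4.0) -> bool:
        """Flag x if it lies more than k standard deviations from the mean."""
        sd = self.stddev()
        return sd > 0 and abs(x - self.mean) > k * sd

# Hypothetical training data: access latency (cycles) during normal operation.
profile = NormalProfile()
for latency in [100, 102, 98, 101, 99, 100, 103, 97]:
    profile.update(latency)

print(profile.is_outlier(103), profile.is_outlier(140))  # False True
```

Here the learned profile has a mean of 100 cycles and a standard deviation of 2, so 140 cycles is flagged while 103 is not. Flagged samples would normally be kept out of the profile so that an attack cannot gradually "teach" the monitor that abnormal behavior is normal.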

With SLS in place, we can understand aging trends and provide early warnings of reliability problems, collect performance data, and gather forensic records of system failures that help determine liability (for example, for insurance purposes). If these sound like system-level concerns, they are. The reality is that next-generation SoCs are truly systems in themselves.
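As an illustration of how an aging trend might be turned into an early warning, the Python sketch below fits a least-squares line to periodic readings of a timing margin and projects when it will cross a minimum threshold. The data, the 20 ps threshold, and the idea of a path-margin monitor reporting in picoseconds are hypothetical. Real silicon degradation is often nonlinear, so a straight-line fit is only a first-order example.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def hours_until_threshold(hours, margin_ps, threshold_ps):
    """Project when a degrading timing margin crosses threshold_ps, assuming
    the linear trend continues. Returns the remaining hours after the last
    sample, or None if the margin is not shrinking."""
    slope, intercept = fit_line(hours, margin_ps)
    if slope >= 0:
        return None
    crossing = (threshold_ps - intercept) / slope
    return max(0.0, crossing - hours[-1])

# Hypothetical readings: operating hours vs. measured timing margin (ps).
hours = [0, 1000, 2000, 3000, 4000]
margin_ps = [50.0, 48.0, 46.0, 44.0, 42.0]
remaining = hours_until_threshold(hours, margin_ps, threshold_ps=20.0)
print(f"{remaining:.0f} hours of margin left")  # 11000 hours of margin left
```

The margin shrinks by 2 ps per 1,000 hours, so the projected crossing is at 15,000 hours, or 11,000 hours after the last reading. That lead time is what makes maintenance predictive rather than reactive.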

SLS encompasses the traditional semiconductor value chain of design, manufacturing, test, and bring-up. It also reaches deep into the deployment phase of the device. It provides information that makes it easier for customers to design in the device and bring up end products, enables continuous in-field monitoring for preventive maintenance, and ensures devices remain performant after field upgrades. It feeds information forward from the device manufacturer to OEM customers and end users, and takes information back from the field to the semiconductor conception and production process. Multiple actors will be involved in building the complete SLS infrastructure, which will require partnerships and technical solutions such as open APIs throughout the value chain. This work has already started and includes:

  • The IEEE P2851 standard for functional safety data format for interoperability within the dependability lifecycle
  • Tools like Siemens’ MindSphere cloud-based platform, currently used in industrial and manufacturing sectors
  • Embedded Analytics hardware modules and applications to monitor, analyze, and communicate from the chip to the world

Of course, silicon lifecycle management is in its infancy. Knowledge in the semiconductor ecosystem is siloed and highly specialized. The EDA folks don’t know about optimizing datacenters, the datacenter expert knows nothing about optimizing silicon yield, and the yield engineer is clueless about RTL design. Creating the right kinds of information, available to the right people at the right time, is a larger challenge than technical enablement. Challenges, though, are what the semiconductor industry thrives on. There are no worthy problems we can’t solve.

Summary

With silicon lifecycle solutions, chipmakers can create holistic, data-driven solutions that enhance semiconductor design, realization, and utilization. The bottom-line benefits of such a holistic approach are substantial. The IC design and production process becomes more responsive, agile, and cost-efficient. Devices become easier to integrate into end products and, after deployment, are more performant, reliable, and secure. The ability to ‘monitor what matters’ throughout the useful life of the device enables preventive maintenance and continuous performance optimization in the field, which is vital for new business models.


