Pushing the margins of silicon performance makes it critical to predict failures before they occur.
By Lorin Kennedy and Dan Alexandrescu
For everyday consumers, no products require reliability more than automobiles. While consumers may be willing to accept laptops and phones that limit performance or abruptly shut down when they reach unacceptable temperatures, that is not the case for Advanced Driver-Assistance Systems (ADAS) or other safety-critical systems in your vehicle.
Today, automobile manufacturers design a large amount of redundancy into safety-critical systems at considerable cost. The same is true of the chips that power many of these systems. A “mission profile” for electronics estimates how long the silicon can reliably operate within certain temperature ranges. The operational or mission profile of most automotive-grade silicon is modeled around 8,000-10,000 hours of use, which roughly accounts for 15 years of reliable function with an average driving time of around 2-3 hours/day. To ensure safety, automotive-grade semiconductors are overdesigned for their intended use, with most standard parts designed to easily exceed twice that life.
When electronics need to stay ‘on’ longer, as in electric vehicles (EVs) that remain active during charging, the number could comfortably reach closer to 40,000 operational hours under normal conditions. Further, new electrical architectures built on advanced semiconductor process nodes, such as HPC-level zonal/domain control clusters for the Autonomous Driving (AD) systems of robo-taxis, could approach 130,000 operational hours (figure 1). This could push silicon reliability into physical environments that have not been fully tested.
Fig. 1: Variability of mission profile in automotive fleet.
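Converting between cumulative operating hours and average daily use is simple arithmetic (years × 365 × hours per day), and it makes the spread in figure 1 easier to picture. The Python sketch below assumes a 15-year service life and a constant average daily duty, both simplifications; the function names are illustrative only.

```python
DAYS_PER_YEAR = 365  # ignores leap days; adequate for a rough estimate


def lifetime_hours(years: float, hours_per_day: float) -> float:
    """Cumulative operating hours for a constant average daily duty."""
    return years * DAYS_PER_YEAR * hours_per_day


def implied_hours_per_day(total_hours: float, years: float) -> float:
    """Average daily operating time implied by a mission-profile total."""
    if years <= 0:
        raise ValueError("years must be positive")
    return total_hours / (years * DAYS_PER_YEAR)


if __name__ == "__main__":
    service_years = 15  # service life quoted in the article
    for total in (8_000, 10_000, 40_000, 130_000):
        duty = implied_hours_per_day(total, service_years)
        print(f"{total:>7,} h over {service_years} years -> {duty:.1f} h/day")
```

Under these assumptions, 8,000-10,000 hours over 15 years averages roughly 1.5-1.8 hours/day, while 130,000 hours averages about 23.7 hours/day, i.e. near-continuous operation. That illustrates how far an always-on robo-taxi compute cluster sits from a conventional passenger-car profile.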
In addition, differences in the physical and environmental conditions that individual vehicles experience can stress silicon beyond its design range, creating more situations where usage and workload threaten reliability projections (figure 2).
Fig. 2: Mission profile variability from summer 2023 heatwave in Phoenix, AZ USA.
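One common way reliability engineers fold temperature variability into a mission profile is the Arrhenius model: each hour at junction temperature T counts as exp[(Ea/k)(1/T_ref − 1/T)] hours at a reference temperature T_ref, with temperatures in kelvin. The article does not say which model is used for figure 2; the sketch below is an illustration under loudly stated assumptions: a single thermally activated wear-out mechanism, an assumed activation energy of 0.7 eV, an assumed 55 °C reference, and hypothetical temperature histograms.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_af(temp_c: float, ref_c: float, ea_ev: float) -> float:
    """Arrhenius acceleration factor of running at temp_c instead of ref_c."""
    t_k = temp_c + 273.15
    ref_k = ref_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / ref_k - 1.0 / t_k))


def equivalent_hours(profile: dict[float, float], ref_c: float, ea_ev: float) -> float:
    """Collapse a {temperature_c: hours} histogram into hours at ref_c."""
    return sum(hours * arrhenius_af(temp, ref_c, ea_ev)
               for temp, hours in profile.items())


if __name__ == "__main__":
    EA_EV = 0.7   # ASSUMED activation energy for one wear-out mechanism
    REF_C = 55.0  # ASSUMED reference junction temperature
    # Hypothetical 10,000 h profiles: same total hours, different climates.
    temperate = {25.0: 6_000, 55.0: 3_000, 85.0: 1_000}
    hot_climate = {25.0: 3_000, 55.0: 4_000, 85.0: 2_000, 105.0: 1_000}
    for name, profile in (("temperate", temperate), ("hot", hot_climate)):
        eq = equivalent_hours(profile, REF_C, EA_EV)
        print(f"{name}: {sum(profile.values()):,} h -> {eq:,.0f} h at {REF_C} C")
```

With these assumed values, one hour at 85 °C counts as roughly eight hours at 55 °C, so two vehicles with identical operating hours can accumulate very different amounts of effective wear.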
There are many reasons why electronics reliability has grown in importance in recent years. Electronics content has not only increased, but also become much more sophisticated. ADAS features require large chips with massive processing power, and true automated driving will need to handle even greater workloads.
This involves complex hardware-software interactions and real-time edge processing of inference models that are hard to predict and test in the lab. This is especially true for silicon that is designed well before the final system and vehicle, and that is fed by a wide range of cameras, sensors, radar, sonar, and lidar while interacting with diverse electronic control units (ECUs). These systems need not only an extended operational life to handle advanced workloads, but also enough headroom and flexibility to handle the final configuration at product rollout as well as the feature updates expected over the vehicle lifetime. This has led to the growth of SoC and ECU virtualization to better understand hardware and software reliability.
In addition to design complexity, the nanometer-scale chips required for advanced functions will be more difficult to manufacture and extremely sensitive to process variability and manufacturing defects. Multi-die designs are susceptible to assembly issues (compound yield), use conditions, and stress, adding significant test and repair challenges. Environmental stress on these chips is also getting worse as many geographies experience record high temperatures. These stresses accelerate chip aging, with transistors slowing down over time. When failures occur, root cause analysis and diagnosis are complex, difficult, and usually late.
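The “compound yield” problem can be illustrated with the simplest model: if every die in a package must be good, failures are independent, and nothing is repaired after assembly, package yield is the assembly yield times the product of the individual die yields. The sketch below uses that textbook model with hypothetical numbers; real flows use known-good-die testing and repair, which raise the result.

```python
from math import prod


def compound_yield(die_yields: list[float], assembly_yield: float = 1.0) -> float:
    """Yield of a multi-die package in which every die and the assembly step must be good.

    Assumes failures are independent and that no die is repaired or
    replaced after assembly.
    """
    for y in (*die_yields, assembly_yield):
        if not 0.0 <= y <= 1.0:
            raise ValueError(f"yield must be within [0, 1], got {y}")
    return assembly_yield * prod(die_yields)


if __name__ == "__main__":
    # Hypothetical numbers: four chiplets at 95% each, 98% assembly yield.
    print(f"{compound_yield([0.95] * 4, assembly_yield=0.98):.1%}")
```

Even with individually good dies, four 95% chiplets and a 98% assembly step leave only about 79.8% of packages fully functional, which is why multi-die parts put so much weight on test and repair.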
The Zero Defects framework (AEC-Q004) championed by the Automotive Electronics Council recommends sustained effort across the full design, manufacturing, and deployment flow. This requires the various partners to collaborate efficiently on reliability and quality topics, which is challenging. Silicon health monitoring is ideally positioned to help automotive providers, integrators, and their ecosystem partners deliver high-quality products (figure 3).
Fig. 3: Example automotive defects framework for silicon.
Pushing the margins of silicon performance to meet the changing requirements of advanced systems makes it critical not just to detect failures in the field but also to predict them before they occur. This form of preventive maintenance is essential to meet the reliability expectations of consumers and satisfy the functional safety requirements of standards such as ISO 26262 for road vehicles.
Detecting early margin degradation, identifying outliers, and predicting failures all require knowing what is happening inside the chips (figure 4). Monitoring silicon status and analyzing the results are part of a silicon lifecycle solution such as Synopsys’ Silicon Lifecycle Management (SLM) solution, a methodology that spans the full chip lifecycle: design, in-lab bring-up, production test, and in-field mission-mode operation through end-of-life.
Fig. 4: Margin analytics for silicon observability and remediation.
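The article does not describe how SLM analytics turn monitor data into a prediction. As a generic illustration of margin-based failure prediction, the sketch below fits a straight line to timing-margin readings from one on-chip monitor and extrapolates when the margin would cross a guard band. The function names, the picosecond units, the threshold, and the readings are all hypothetical, and a linear trend is a simplifying assumption (aging often slows down over time rather than progressing linearly).

```python
from typing import Optional


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all samples taken at the same time")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hours_until_threshold(hours: list[float], margins_ps: list[float],
                          threshold_ps: float) -> Optional[float]:
    """Extrapolate a linear margin trend to a guard-band threshold.

    Returns remaining operating hours after the last sample, 0.0 if the
    trend is already below the threshold, or None if margin is not shrinking.
    """
    slope, intercept = fit_line(hours, margins_ps)
    if slope >= 0:
        return None
    crossing = (threshold_ps - intercept) / slope
    return max(0.0, crossing - hours[-1])


if __name__ == "__main__":
    # Hypothetical readings from one path-margin monitor.
    hours = [0, 1_000, 2_000, 3_000]
    margins_ps = [100.0, 98.0, 96.0, 94.0]
    print(hours_until_threshold(hours, margins_ps, threshold_ps=50.0))
```

For these made-up readings, margin shrinks by 2 ps per 1,000 hours, so the sketch projects roughly 22,000 more hours before the 50 ps guard band is reached. In practice the projection would be refreshed as new readings arrive.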
Now, with the ability to collect silicon data from final in-vehicle systems, a more accurate picture of silicon health and reliability becomes possible. Large car “populations” provide unprecedented visibility into challenging, elusive reliability and quality threats that are otherwise difficult to observe or understand during qualification or NPI (New Product Introduction). Predicting impending failures (prognostics) is much easier when anomalies can be compared against data from across the “fleet”. For example, history might show that vehicles with similar chips experienced failures not long after specific anomalies were observed. These fleet-level insights enable proactive preventive maintenance, such as recommending repairs or OTA updates for working chips that show the same anomalies, before they fail.
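Comparing a chip against the fleet needs a notion of “unusual” that is not itself distorted by the anomalies being hunted. The article does not specify the analytics; as a stand-in, the sketch below uses the modified z-score (median and median absolute deviation, with the commonly cited 3.5 cutoff), which is robust to a few extreme readings in a way that mean and standard deviation are not. The vehicle IDs, readings, and function name are hypothetical, and a real comparison would group like chips running under similar conditions.

```python
from statistics import median


def fleet_outliers(readings: dict[str, float], cutoff: float = 3.5) -> list[str]:
    """Flag vehicles whose monitor reading deviates strongly from the fleet.

    Uses the modified z-score 0.6745 * (x - median) / MAD, where MAD is the
    median absolute deviation of the readings.
    """
    if not readings:
        return []
    values = list(readings.values())
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # At least half the fleet reads exactly the median, so any deviation stands out.
        return sorted(vid for vid, v in readings.items() if v != med)
    return sorted(vid for vid, v in readings.items()
                  if abs(0.6745 * (v - med) / mad) > cutoff)


if __name__ == "__main__":
    # Hypothetical margin readings (ps) from the same monitor on five vehicles.
    fleet = {"VIN-A": 42.0, "VIN-B": 41.5, "VIN-C": 42.3, "VIN-D": 41.8, "VIN-E": 35.0}
    print(fleet_outliers(fleet))
```

Here only VIN-E is flagged; it would be the candidate for the kind of proactive repair or OTA action described above, pending further diagnosis.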
Data gathered from in-chip monitors can be used to improve both the operation and the reliability of chips. Tuning operating parameters can optimize chip functionality, especially for high-demand processors. Field data can also be fed back into manufacturing test to improve yield and reliability, or all the way back to the digital twins and models used in the design stage to improve derivative or future chips. A small investment in silicon to add monitors and SLM hardware, with embedded analytics in the vehicle and in the cloud, pays dividends throughout the lifecycle. Not only can silicon manufacturers accelerate root cause diagnosis, but OEMs can also manage silicon workloads and stress to extend useful life and increase long-term customer value. Together, the entire ecosystem benefits, both financially and through better products that meet the changing demands for computational power and performance in new automobiles.
In summary, automotive electronics have become a dominant cost component and a key factor in reliability and safety. ADAS and AD are forging a new era with dramatically tougher requirements. Legacy quality and reliability approaches are no longer sufficient. Active silicon monitoring and extensive analytics are imperative for safety and reliability. The Synopsys SLM solution is ideal for automotive applications. Chip developers can optimize their designs, achieve the best possible manufacturing yields, and provide OEMs with the insights and tools needed to operate safely in the field for the entire vehicle lifetime.
Dan Alexandrescu is an R&D principal engineer at Synopsys.