Making Autonomous Driver Chips Safe From The Top Down

Seeing the effect of safety mechanisms on FMEDA early in the design process.

It’s easy to think of electronics applications in which the chips must be ultra-safe: nuclear power plants, aircraft, weapons systems, and implanted medical devices. Autonomous vehicles, capable of self-driving with only the electronics in control, are rapidly emerging to join this list. These vehicles must be “safe” in all the usual colloquial ways, but they also must meet a very specific definition of “functional safety” as codified in industry standards such as ISO 26262. Effective and compliant safety-critical design requires a methodical, well-documented process that culminates with complex but critical calculations of protection against random and systematic faults and considers the remaining risk after safety measures have been applied. One key analysis to determine functional safety compliance with respect to random hardware faults is Failure Modes, Effects, and Diagnostic Analysis (FMEDA). It is an extension of Failure Modes and Effects Analysis (FMEA), which has been in use for safety-critical designs since the 1940s.

Random faults include permanent faults, such as chip damage due to aging, and transient faults, such as a memory bit flip caused by an alpha particle. Both types of faults can and do occur in the challenging automotive environment. Designing safety mechanisms into chips minimizes the impact that such faults will have on the overall safety. These mechanisms may be able to correct some faults, such as an error-correcting code compensating for a flipped bit. Other faults must be detected whenever possible, with appropriate action taken to avoid catastrophic system failure. For example, an autonomous vehicle might coast to a safe stop on the shoulder rather than proceeding at full speed after a fault has been detected.
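As a rough illustration of the error-correcting idea mentioned above, the Python sketch below implements a textbook Hamming(7,4) code, which can locate and repair any single flipped bit in a 7-bit codeword. It is not taken from any particular automotive design, the function names are invented for this example, and production memories typically use wider codes that also detect double-bit errors.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7)."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(code):
    """Return (corrected codeword, syndrome); syndrome 0 means no error seen."""
    syndrome = 0
    for position, bit in enumerate(code, start=1):
        if bit:
            syndrome ^= position   # XOR of the positions of all set bits
    fixed = list(code)
    if syndrome:
        fixed[syndrome - 1] ^= 1   # the syndrome is the position of the flipped bit
    return fixed, syndrome


def hamming74_data(code):
    """Extract the 4 data bits from a codeword."""
    return [code[2], code[4], code[5], code[6]]


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
corrupted = list(stored)
corrupted[4] ^= 1                          # simulate a transient bit flip
fixed, syndrome = hamming74_correct(corrupted)
print(syndrome, hamming74_data(fixed) == word)
```

A nonzero syndrome also serves as a detection signal that a system could log or escalate, even when the fault itself has been corrected.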

Calculation of FMEDA is not simple and involves several metrics:

  • BFR: Base Failure Rate
    • The expected rate of design failures due to random hardware faults (permanent or transient), measured in FIT (Failures In Time)
    • 1 FIT = 1 failure per 10^9 (one billion) hours
  • SPFM: Single Point Fault Metric
    • The ratio of safe and detected single-point faults to all safety-related faults
  • LFM: Latent Fault Metric
    • The ratio of safe and detected multi-point faults to all faults that are not single-point or residual faults
  • PMHF: Probabilistic Metric for random HW Failures
    • The residual failure rate of the undetected single point faults
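The metrics above can be sketched in a few lines of Python. The formulas follow the commonly cited ISO 26262 Part 5 definitions (SPFM as one minus the share of single-point and residual fault rate, LFM as one minus the share of latent multi-point fault rate among the remaining faults). The `ElementRates` class, its field names, and all numbers are assumptions made up for illustration, and the PMHF line uses the simplified description given in the list above rather than the full calculation in the standard.

```python
from dataclasses import dataclass

FIT_TO_PER_HOUR = 1e-9  # 1 FIT = 1 failure per 10^9 hours


@dataclass
class ElementRates:
    """Failure rates of one safety-related hardware element, in FIT."""
    total: float          # total failure rate of the element
    single_point: float   # faults with no safety mechanism that violate a safety goal
    residual: float       # faults that escape the safety mechanism
    latent: float         # multi-point faults that stay undetected


def spfm(elements):
    total = sum(e.total for e in elements)
    uncovered = sum(e.single_point + e.residual for e in elements)
    return 1.0 - uncovered / total


def lfm(elements):
    remaining = sum(e.total - e.single_point - e.residual for e in elements)
    latent = sum(e.latent for e in elements)
    return 1.0 - latent / remaining


def pmhf_fit(elements):
    # Simplified: only single-point and residual contributions are counted.
    return sum(e.single_point + e.residual for e in elements)


elements = [
    ElementRates(total=100.0, single_point=0.0, residual=1.0, latent=5.0),
    ElementRates(total=50.0, single_point=0.5, residual=0.0, latent=2.0),
]
print(f"SPFM = {spfm(elements):.2%}")
print(f"LFM  = {lfm(elements):.2%}")
print(f"PMHF = {pmhf_fit(elements)} FIT = {pmhf_fit(elements) * FIT_TO_PER_HOUR:.2e} per hour")
```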

FMEDA provides the evidence needed to demonstrate that a design meets the desired Automotive Safety Integrity Level (ASIL) as defined in the ISO 26262 standard. Each ASIL has specific requirements for key FMEDA metrics for faults that are safe (no risk to functional safety) or detected by the safety mechanisms:

  • ASIL B requires SPFM >= 90% and LFM >= 60%
  • ASIL C requires SPFM >= 97% and LFM >= 80%
  • ASIL D requires SPFM >= 99% and LFM >= 90%
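The thresholds listed above lend themselves to a simple lookup. The following Python sketch, with a hypothetical function name, returns the highest ASIL whose SPFM and LFM targets are both met. It checks only these two metrics; a real compliance argument involves considerably more evidence.

```python
# Minimum (SPFM, LFM) targets per ASIL, from the list above, strictest first.
ASIL_TARGETS = [
    ("D", 0.99, 0.90),
    ("C", 0.97, 0.80),
    ("B", 0.90, 0.60),
]


def highest_asil_met(spfm, lfm):
    """Return the strictest ASIL whose SPFM and LFM targets are both satisfied."""
    for level, min_spfm, min_lfm in ASIL_TARGETS:
        if spfm >= min_spfm and lfm >= min_lfm:
            return level
    return None  # neither B, C, nor D targets are met


print(highest_asil_met(0.995, 0.85))
```

Note that both metrics must clear the bar: an SPFM of 99.5% with an LFM of 85% satisfies only the ASIL C targets.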

Historically, the FMEDA process has involved reams of spreadsheets with much manual analysis and calculation. Because of the amount of effort and the dependence on the details of the chip hardware, FMEDA typically happened late in the design process. If late-stage project requirements, such as responses to competitive announcements, forced design changes, a lot of hard manual work had to be repeated. Fortunately, the EDA industry now provides solutions in which metrics are provided by IP suppliers or calculated separately, the user defines the chip hierarchy (tree), and the results for all blocks are rolled up at the full-chip level.

This automated approach is a huge step up from paper spreadsheets, but it is still performed late in the design process. There is a strong trend in EDA to “shift left” development by starting with abstract models whenever possible and performing top-down analysis, followed by refinement as the design is completed. Virtual platforms, high-level floorplanners, and power intent files are just a few examples of this trend in action. Synopsys VC Functional Safety Manager has recently made early, high-level, top-down FMEDA available as well. The user can define the design hierarchy tree even before any register transfer level (RTL) code is available and provide high-level estimates of the design size, diagnostic coverage, and safety of each IP. The tool rolls up these estimates into an early calculation of FMEDA results.
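To make the roll-up idea concrete, here is a minimal Python sketch of a hierarchy tree whose per-block estimates are combined into a chip-level SPFM estimate. It does not reflect the tool's actual data model or algorithm: the `Block` class, its fields, the simplified residual formula (failure rate times the unsafe fraction times the undetected fraction), and all numbers are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    name: str
    fit: float = 0.0             # estimated failure rate of the block's own logic, in FIT
    safe_fraction: float = 0.0   # estimated share of faults that cannot violate a safety goal
    diag_coverage: float = 0.0   # estimated share of dangerous faults caught by safety mechanisms
    safety_related: bool = True  # applies to the block's own logic, not to its children
    children: list = field(default_factory=list)


def roll_up(block):
    """Return (total FIT, residual FIT) for the safety-related logic under block."""
    total = residual = 0.0
    if block.safety_related:
        total = block.fit
        residual = block.fit * (1 - block.safe_fraction) * (1 - block.diag_coverage)
    for child in block.children:
        child_total, child_residual = roll_up(child)
        total += child_total
        residual += child_residual
    return total, residual


def estimated_spfm(root):
    total, residual = roll_up(root)
    return 1.0 - residual / total if total else 1.0


chip = Block("soc", children=[
    Block("cpu", fit=200.0, safe_fraction=0.5, diag_coverage=0.99),
    Block("sram", fit=300.0, safe_fraction=0.2, diag_coverage=0.99),
    Block("debug", fit=50.0, safety_related=False),
])
print(f"Estimated SPFM: {estimated_spfm(chip):.2%}")
```

Because each block carries only a handful of estimates, a tree like this can be filled in long before RTL exists and updated as better numbers become available.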

This high-level approach enables easy exploration over a wide range of possible safety architectures and extensive “what if” analysis. The user can easily modify data for design blocks, change which IPs are safety-related, and add or remove safety mechanisms. Seeing the immediate effect of each variation on FMEDA enables informed early decisions on how to add appropriate safety mechanisms to meet the target ASIL without consuming unnecessary chip area or power. The user can also copy the chip/IP hierarchy to a new tree, make architectural changes, and see the metrics of the two configurations side by side.
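The “what if” workflow described above can be imitated with a short Python sketch: copy a baseline configuration, change one block's diagnostic coverage, and print the two estimates side by side. The dictionary layout, the simplified SPFM estimate, and the numbers are all hypothetical and are not the tool's interface.

```python
import copy


def spfm_estimate(blocks):
    """Simplified SPFM estimate over the safety-related blocks of a configuration."""
    related = [b for b in blocks.values() if b["safety_related"]]
    total = sum(b["fit"] for b in related)
    residual = sum(b["fit"] * (1 - b["safe"]) * (1 - b["dc"]) for b in related)
    return 1.0 - residual / total


baseline = {
    "cpu":  {"fit": 200.0, "safe": 0.5, "dc": 0.90, "safety_related": True},
    "sram": {"fit": 300.0, "safe": 0.2, "dc": 0.99, "safety_related": True},
}

# Copy the configuration and model a stronger safety mechanism on the CPU.
variant = copy.deepcopy(baseline)
variant["cpu"]["dc"] = 0.99

for name, config in (("baseline", baseline), ("variant", variant)):
    print(f"{name:10s} SPFM = {spfm_estimate(config):.2%}")
```

With these made-up numbers, the baseline lands at about 97.5% and the variant at about 99.3%, so only the variant would clear a 99% SPFM target.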

As RTL blocks are completed, the user can import the RTL designs and bind them to the right points in the tree. This hybrid mode continually refines the FMEDA results, with all high-level estimates eventually replaced by detailed calculations.

As electronic devices permeate more aspects of life, it is not surprising that the consequences of failure become more serious. Therefore, the design of functionally safe chips has become essential for many diverse applications. While far from the only example, autonomous vehicles have become the best-known application. Fortunately, the ISO 26262 standard provides extensive and detailed guidance for the design of safe systems. Synopsys VC Functional Safety Manager automates FMEDA and offers a unique shift-left solution for early estimation of key metrics that are refined as the design is implemented. More information is available here.


