Functional Safety Verification For AV SoC Designs Accelerated With Advanced Tools

Earning the trust of a wary public remains one of the biggest hurdles for autonomous vehicles.


Autonomous vehicles (AVs) will be the culmination of dozens of highly complex systems, incorporating state-of-the-art technologies in electronics hardware, sensors, software, and more. Conceiving and designing these systems is certain to be one of the greatest challenges for today’s engineers. The only greater challenge will be convincing a wary public that these automated systems are safer drivers than humans are. According to a recent survey by the American Automobile Association (AAA), 71% of Americans say they would be afraid to ride in a driverless vehicle. Public perception of automated driving remains the greatest obstacle to its success (figure 1).


Figure 1: Earning the trust of a wary public is the biggest hurdle autonomous vehicle technology must overcome.

AV manufacturers will need to demonstrate the safety and reliability of all aspects of the self-driving systems they develop to establish trust with the public and calm its fears. In addition to software, the advanced integrated circuits (ICs) and systems-on-chip (SoCs) powering these systems will be of critical concern. To that end, the automotive industry has established a set of procedures and standards focused on the safety of electrical and electronic systems, known as functional safety.

The goal of functional safety is to reduce the risk of electrical and electronic components malfunctioning due to failures. In the automotive industry, these procedures and requirements have been formalized in the ISO 26262 standard. ISO 26262 requires that electronics be tested for random hardware failures and systematic faults.

Systematic faults are those that prevent an IC or SoC from operating correctly according to the product specifications. These could be design bugs, hardware/software interface problems, misinterpreted or incomplete specifications and more. Over time, the IC industry has accumulated a great deal of knowledge, tools, and processes for dealing with systematic faults. In contrast, the industry is less experienced and not as well-equipped for finding and resolving random hardware faults. Random hardware faults are unpredictable and occur over time as the IC operates.

ISO 26262 requires that chips continue to operate or fail safely in the event of a random hardware fault. Ensuring that an IC fails safely when a random fault occurs requires four key processes (figure 2):

  • Lifecycle management covers the functional safety lifecycle from planning to compliance. This includes change/configuration, project, requirements, quality assurance, and audit/compliance management. Lifecycle management processes occur continuously throughout development.
  • Safety analysis helps designers understand how the design could fail from random hardware faults. Failure modes, effects, and diagnostic analysis (FMEDA) identifies the potential failure modes of a design, the failure rate of each mode, how each mode will affect the design's functionality, and the probability that automated diagnostics will catch each failure mode. Then, engineers perform safety gap analysis to determine the safety enhancements needed to reach their safety targets.
  • Design for safety enhances the design to mitigate potential failures from random hardware faults. This is achieved by inserting safety mechanisms into the design that detect and correct faulty behavior, ensuring the design behaves or fails safely.
  • Safety verification proves that the design is safe by validating a set of fault metrics through the process of fault injection. The set of fault metrics includes the single-point and latent fault metrics (SPFM/LFM) and diagnostic coverage (DC).


Figure 2: Four key processes for the creation of safe IC designs.

These processes operate in a closed-loop flow where the results from each process inform the next step. This is critical to addressing random hardware faults and building a safe IC design on the first pass. Let’s briefly discuss each of these processes and the advanced verification technologies that can increase the effectiveness of verification engineers at each stage.

Lifecycle management
ISO 26262 includes guidelines for the tracking and management of design changes, test results, and safety metrics. Many companies still rely on their engineers to manually track and gather this information. Manual methods are slow, tend to introduce errors in the data as it is recorded, and do not link important information together, making traceability difficult. As a result, engineers spend time cobbling information together before they can create the work products needed for audits and assessments. With the increasing complexity of automotive ICs, manual requirements and compliance management is no longer sufficient.

A requirements-driven verification process is fundamental for companies competing in the automotive IC business. Application lifecycle management (ALM) solutions enable a requirements-driven flow by providing a digital backbone for the entire functional safety process. ALM supplies engineers with the information they need to prove the functional safety of critical automotive electronics, eliminating the time-consuming process of manually gathering this data.

Safety analysis
With a lifecycle management solution in place, the first step to proving functional safety is safety analysis. Safety architects commonly start by identifying the high-level failure modes of the design through the creation of a failure modes, effects, and diagnostic analysis (FMEDA). They then compute the base failure-in-time (FIT) rate of the design and estimate the single-point and latent fault metrics (SPFM/LFM). The safety architect can then explore the areas of the design that need to be made safer and identify the appropriate safety mechanisms to meet target safety levels (figure 3).


Figure 3: Safety analysis identifies failure modes of the design and appropriate safety mechanisms.

The metrics generated during safety analysis will serve as a baseline for comparison during safety verification, after the design is improved. Such analysis should be performed at the structural level of the design to produce the most accurate numbers, thus increasing the likelihood of creating a safe IC on the first pass, saving costly and time-consuming iterations.
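
As a rough illustration of how these baseline metrics fall out of an FMEDA table, the sketch below computes SPFM and LFM from a list of failure modes. It follows the general shape of the ISO 26262 hardware architectural metrics, under a deliberately simplified model: every fault is assumed to be safety-related, and no fault is assumed to be inherently safe. The `FailureMode` class, its field names, and the FIT and coverage numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    fit: float          # base failure rate in FIT (failures per 1e9 device-hours)
    dc_residual: float  # share of faults a safety mechanism stops from violating the safety goal
    dc_latent: float    # share of the covered (multiple-point) faults that are also detected


def spfm_lfm(modes):
    """Return (SPFM, LFM) for a list of FailureMode rows.

    Simplification: every fault is safety-related and none is inherently safe.
    """
    total = sum(m.fit for m in modes)
    # Single-point and residual faults: not covered by any safety mechanism.
    residual = sum(m.fit * (1.0 - m.dc_residual) for m in modes)
    # Covered faults become multiple-point faults; the undetected share is latent.
    latent = sum(m.fit * m.dc_residual * (1.0 - m.dc_latent) for m in modes)
    spfm = 1.0 - residual / total
    lfm = 1.0 - latent / (total - residual)
    return spfm, lfm


# Hypothetical FMEDA rows, for illustration only.
fmeda = [
    FailureMode("alu_datapath", fit=100.0, dc_residual=0.90, dc_latent=0.80),
    FailureMode("sram_array", fit=100.0, dc_residual=0.99, dc_latent=0.90),
]
spfm, lfm = spfm_lfm(fmeda)
print(f"SPFM = {spfm:.1%}, LFM = {lfm:.1%}")
```

In a real flow, numbers like these would be compared against the targets for the chosen automotive safety integrity level, and the failure modes with the lowest coverage would be the first candidates for additional safety mechanisms.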

Design for safety
Now that the engineers have a plan to make their design safer, based on the safety analysis, the next step is to insert safety mechanisms into the design. Advanced solutions enable automatic insertion of safety mechanisms into the register-transfer level (RTL) description to implement run-time design hardening techniques (e.g., ECC, CRC, parity, duplication, replication). These mechanisms are hardware-based and directly address both permanent and transient single-point faults. Then, engineers can insert logic and memory built-in self-test (LBIST/MBIST) structures, and a controller for run-time operation of these engines. These on-chip testing facilities can identify latent faults that occur in the field, improving the long-term safety and reliability of automotive chips.
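
To make the idea of a correcting safety mechanism concrete, here is a small software model of one classic error-correcting code, Hamming(7,4), which corrects any single-bit error in a 7-bit codeword. It is only a behavioral sketch: the function names are hypothetical, and the ECC schemes that tools actually insert into RTL are typically wider and often add double-error detection.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p4 = d2 ^ d3 ^ d4
    # Codeword positions 1..7: p1 p2 d1 p4 d2 d3 d4
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Return (data_bits, error_position); error_position is 0 when no error is seen."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s4  # syndrome = 1-based position of the flipped bit
    if pos:
        c[pos - 1] ^= 1         # correct the single-bit error
    return [c[2], c[4], c[5], c[6]], pos


word = [1, 0, 1, 1]
code = hamming74_encode(word)
code[4] ^= 1                    # model a transient single-bit fault at position 5
data, pos = hamming74_decode(code)
assert data == word and pos == 5
```

The same structure maps onto hardware: the encoder sits on the write path, the syndrome logic sits on the read path, and a non-zero syndrome can also drive an error flag to the safety controller.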

Safety verification
Finally, the improved chip design must be verified as safe by observing how it behaves in the presence of faults. Safety verification starts by using the fault list generated from the safety analysis phase. Then, fault simulation is used to inject these faults into the design, producing a new set of fault metrics to indicate the effectiveness of the safety mechanisms that were inserted during design for safety.
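
The flow described above can be sketched with a toy fault campaign. The model below is purely illustrative: a 4-bit word is stored with a parity bit (the safety mechanism) and then passes through an unprotected output stage. Stuck-at faults are injected one at a time, each run is compared against the fault-free result, and diagnostic coverage is computed as the detected share of the faults that had an effect. The design, the fault sites, and the classification names are assumptions made for this example, not the behavior of any particular fault simulator.

```python
from collections import Counter
from itertools import product

WIDTH = 4  # data bits in the toy design


def parity(bits):
    return sum(bits) % 2


def run_design(word, fault=None):
    """Toy design: parity-protected storage followed by an unprotected output stage.

    fault is None or (site, index, stuck_value) with site in {"mem", "out"}.
    Returns (output_bits, alarm).
    """
    stored = list(word) + [parity(word)]
    if fault and fault[0] == "mem":
        stored[fault[1]] = fault[2]      # stuck-at fault in a storage cell
    data, p = stored[:WIDTH], stored[WIDTH]
    alarm = parity(data) != p            # safety mechanism: parity checker
    out = list(data)
    if fault and fault[0] == "out":
        out[fault[1]] = fault[2]         # stuck-at fault after the checker
    return out, alarm


def classify(fault, stimuli):
    """Label one fault as detected, undetected (output corrupted, no alarm), or safe."""
    corrupted = False
    for word in stimuli:
        golden, _ = run_design(word)     # fault-free reference run
        out, alarm = run_design(word, fault)
        if alarm:
            return "detected"
        corrupted = corrupted or out != golden
    return "undetected" if corrupted else "safe"


stimuli = list(product([0, 1], repeat=WIDTH))
fault_list = [(site, i, v)
              for site, n in (("mem", WIDTH + 1), ("out", WIDTH))
              for i in range(n) for v in (0, 1)]
counts = Counter(classify(f, stimuli) for f in fault_list)
dc = counts["detected"] / (counts["detected"] + counts["undetected"])
print(dict(counts), f"DC = {dc:.1%}")
```

Faults in the output stage escape the parity checker, so the campaign exposes a coverage gap that would send the design back to the safety analysis and design-for-safety steps.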

Fault simulation is used to verify a majority of the faults identified in the design. At the register-transfer level (RTL), an IC design can have faults in all of the nets, registers, and ports. Another level down, the gate-level netlist can have many times more faults, reaching into the millions. Accounting for safety metrics increases the number of potential faults even further. Furthermore, automotive SoCs contain a mixture of digital and analog/mixed-signal circuitry, adding to the potential number of faults and necessitating solutions that can perform fault injection across digital and analog/mixed-signal blocks.

To keep simulation times manageable, engineers use an array of techniques to reduce the scope of a fault campaign or fault list. This is known as fault optimization. One example is fault sampling, in which a random sample of the faults, numbering into the thousands, is selected. This reduces the number of faults that need to be activated during safety verification.
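
One way to reason about how large such a sample needs to be is the standard sample-size formula for estimating a proportion, with a finite population correction. The sketch below applies it to a hypothetical fault list; the function names, the 95% confidence level, the one-point margin of error, and the fault count are all assumptions chosen for illustration, and a real campaign would follow whatever statistical justification its safety case requires.

```python
import math
import random


def sample_size(population, margin=0.01, z=1.96, p=0.5):
    """Sample size for estimating a proportion, with finite population correction.

    margin: acceptable error on the estimated proportion (0.01 = +/-1 point)
    z:      normal quantile for the confidence level (1.96 is about 95%)
    p:      expected proportion; 0.5 is the most conservative choice
    """
    n0 = (z * z * p * (1.0 - p)) / (margin * margin)
    return min(population, math.ceil(n0 / (1.0 + (n0 - 1.0) / population)))


def sample_faults(num_faults, seed=0, **kwargs):
    """Pick a reproducible random subset of fault indices to simulate."""
    n = sample_size(num_faults, **kwargs)
    return sorted(random.Random(seed).sample(range(num_faults), n))


total_faults = 2_000_000  # hypothetical gate-level fault count
campaign = sample_faults(total_faults)
print(f"simulate {len(campaign)} of {total_faults} faults")
```

Fixing the seed keeps the sampled fault list reproducible, which helps when results have to be traced back to specific faults for an audit.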

For most functions, it is not necessary to verify the safety of the design against all possible faults. Safety-critical components, however, require comprehensive verification to ensure they are entirely free of bugs. Achieving this level of verification on even relatively mundane operations quickly outstrips the capabilities of simulation.

As a result, the use of formal verification for safety-critical designs has gained popularity because it achieves the needed level of verification from a drastically reduced set of input conditions. Formal verification analyzes a design “breadth-first,” automatically considering all possible input conditions. From here, formal can analyze the entire set of states that are reachable given the starting conditions. The result is a set of worst-case safety metrics that accounts for all possible faults in the design.
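
The contrast with simulation can be shown on a toy scale. The sketch below exhaustively explores every reachable state of a 2-bit counter with a shadow copy and a latched alarm, under every combination of inputs and transient bit flips, and checks the property "the two copies never disagree unless the alarm is raised." Production formal tools typically rely on symbolic engines rather than explicit enumeration, so this is only an analogy for the breadth-first idea; the design, the `compare_mask` parameter, and the function names are hypothetical.

```python
from collections import deque
from itertools import product


def step(state, enable, flip, compare_mask):
    """One clock cycle of a 2-bit counter with a shadow copy and a latched alarm."""
    main, shadow, alarm = state
    if enable:
        main = (main + 1) % 4
        shadow = (shadow + 1) % 4
    if flip is not None:
        main ^= 1 << flip                # transient bit flip in the main copy
    mismatch = ((main ^ shadow) & compare_mask) != 0
    return (main, shadow, alarm or mismatch)


def prove(compare_mask):
    """Explore every reachable state breadth-first and check the safety property.

    Property: the two copies never disagree unless the alarm is raised.
    Returns (holds, counterexample_state, states_explored).
    """
    initial = (0, 0, False)
    seen, queue = {initial}, deque([initial])
    while queue:
        state = queue.popleft()
        main, shadow, alarm = state
        if main != shadow and not alarm:
            return False, state, len(seen)
        for enable, flip in product((0, 1), (None, 0, 1)):
            nxt = step(state, enable, flip, compare_mask)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return True, None, len(seen)


print(prove(compare_mask=0b11))  # comparator checks both bits
print(prove(compare_mask=0b01))  # comparator checks bit 0 only: a safety gap
```

With the full comparator the property holds in every reachable state, while the weakened comparator yields a concrete counterexample state, which is the kind of result that is hard to guarantee from a sampled simulation campaign.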

In addition to optimizing fault lists through formal techniques, a higher-capacity verification engine can further improve verification times. Hardware emulation executes tests on the hardware design at megahertz (MHz) speeds, several orders of magnitude faster than simulation. This enables system verification to begin before the chip design is implemented in silicon and provides full visibility into the hardware design for efficient debug. Furthermore, emulation supports fault injection, monitoring, and results analysis for safety-critical automotive applications.

Functional safety in the autonomous age
Under great pressure to be first to market, automotive startups, established OEMs, and systems companies will need a set of advanced verification tools to meet these rigorous safety requirements on time. As they contend with impressively complex chips, verification teams will rely on robust lifecycle management processes, automation, and a combination of simulation, emulation, and formal techniques to ensure the safety of ICs for autonomous vehicles.

For more information, download our whitepaper, Achieving Functional Safety for Autonomous Vehicle SoC Designs.


