
Functional Safety Implementation Goes Mainstream

A pivot to the rapidly growing automotive market means new requirements for software tools used in design.

Electronics engineers are being thrust into the automotive market like never before. The move to electrify automobiles, along with the advent of self-driving cars, means that silicon designers will be designing ever more sophisticated automotive ICs. But cars aren’t like most other electronic systems; it’s imperative that they cause no harm should they fail.

This brings us to the realm of functional safety, a layer of responsibility that used to affect only the small number of engineers working on military and aerospace designs. It has now gone mainstream, driven by the ISO 26262 standard that governs how functional safety is to be assured in electronic systems and designs for automobiles. As a result, a much broader set of engineers is affected by functional safety as their companies pivot towards the rapidly growing market for automotive applications.

The challenge is that functional safety adds non-trivial tasks to the design flow. Even before the current automotive push, it required considerable expertise in implementing safety mechanisms such as redundancy, and it went against the grain of typical design goals such as minimal area. That fits poorly in a competitive commercial market where power, performance, and area are the key driving goals. Engineers need design tools that allow them to meet their functional-safety requirements while maintaining competitive productivity. Four areas can dramatically simplify meeting ISO 26262 requirements in IC design:

  • ISO 26262 certification of design tools or tool chains (as covered in ISO 26262 Part 8, Clause 11);
  • Assistance with implementation of safety mechanisms like redundancy, error correction, and fault campaigns;
  • The ability to specify safety intent in a manner that can be utilized by tools (in much the same way that the Unified Power Format, UPF, works for power intent); and
  • Ensuring that there’s a backup strategy for each tool.

What automotive functional safety requires
The ISO 26262 standard identifies four levels of safety, expressed as Automotive Safety Integrity Levels (ASIL) A through D, with A applying to systems with the lowest safety risk and D to systems with the highest risk that a failure could harm lives. Many of the newly electrified automotive components are classified as ASIL D, and designs are migrating from ASIL B and C to ASIL D as autonomous driving moves towards reality. This means that many new systems-on-chip (SoCs) are being developed for ASIL D applications, and those SoCs have to be reliable for a minimum of 10 years.


Figure 1: Automotive design migration to ASIL D with increased autonomous driving capabilities

In order to be compliant with ISO 26262 requirements, companies need to perform a software tool qualification assessment of the EDA tools they use to establish that the design tool will not introduce or fail to detect a functional safety issue in the design. EDA vendors have simplified this process by getting third-party ANSI-accredited assessors to perform ISO 26262 individual tool or tool-chain certification.

Such certification isn’t trivial, however. As an example, Synopsys involved around 100 people over almost a year to achieve certification for its digital and custom/AMS tools. Certification doesn’t imply that a tool is guaranteed never to fail, but rather that, for each possible failure scenario, a way to mitigate the failure has been identified. These mitigations are captured as conditions of use (CoU) and assumptions of use (AoU) in a functional safety manual, which gives guidance on each tool’s use cases in safety-critical designs. That information can then be used in the company’s own software tool qualification.

Random hardware faults arise from uncontrollable events, such as a solar flare, that can trigger single-event upsets (SEUs) and cause glitches or loss of state. They can’t be eliminated outright, so their risks must be mitigated through redundancy and other safety mechanisms. The higher the ASIL, the more safety mechanisms are required to reach the target value of metrics such as the single-point fault metric (SPFM). For example, ISO 26262 sets the following SPFM targets for the different ASILs:


Table 1: Increasing requirements for SPFM moving from ASIL B to ASIL D
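
As a rough illustration of how an SPFM value is computed and compared against an ASIL target, here is a minimal Python sketch. The targets used (90%, 97%, and 99% for ASIL B, C, and D) are the commonly cited ISO 26262 Part 5 values and should be checked against the standard itself; the function names and the failure rates (in FIT) are hypothetical, chosen only for illustration.

```python
# Commonly cited ISO 26262-5 SPFM targets; verify against the standard itself.
SPFM_TARGET = {"B": 0.90, "C": 0.97, "D": 0.99}


def spfm(total_fit, single_point_fit, residual_fit):
    """Single-point fault metric.

    Fraction of the safety-related failure rate that is NOT left as a
    single-point or residual fault after safety mechanisms are applied.
    """
    if total_fit <= 0:
        raise ValueError("total failure rate must be positive")
    return 1.0 - (single_point_fit + residual_fit) / total_fit


def meets_target(value, asil):
    """True if the SPFM value reaches the target for the given ASIL."""
    return value >= SPFM_TARGET[asil]


# Hypothetical block: 1000 FIT in total, 15 FIT still uncovered.
value = spfm(total_fit=1000.0, single_point_fit=5.0, residual_fit=10.0)
for asil in ("B", "C", "D"):
    print(f"ASIL {asil}: SPFM {value:.3f} meets target: {meets_target(value, asil)}")
```

With these made-up numbers the block would satisfy the ASIL B and C targets but not ASIL D, which is the kind of gap that additional safety mechanisms are meant to close.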

Mitigation for random hardware faults is responsible for the bulk of the additional design implementation effort. The first step is to identify those logic paths that are vulnerable to such faults. All of the registers along those paths must then be made either redundant or error-tolerant. There are three ways of protecting those registers, and the one chosen depends on the ASIL. The options are:

  • The most stringent designs rely on triple modular redundancy (TMR), which can correct a fault. Each affected register is replaced by three registers and voting logic. If one of the registers experiences a failure that the other two don’t, the two correct values prevail, masking the fault. It’s important to place the three registers far away from each other and isolate them electrically from other parts of the circuit to minimize any interactions that might compromise their mutual independence. Of the three strategies, this consumes the most silicon area.
  • The second alternative is dual modular redundancy (DMR). It is useful only for error detection, not correction, so additional logic must be implemented to determine what to do when an error occurs.
  • For lower ASILs, critical paths can simply be hardened with fault-tolerant registers to reduce the chance of failure. In this case, a fault is neither corrected nor detected, but is simply made much less likely. This consumes the least silicon area of the three alternatives, but it also results in a lower SPFM, which would be fine for ASIL B designs.


Figure 2: Safety Register types that can be implemented as safety mechanisms
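
The difference between correcting and merely detecting a fault can be shown with a small behavioral model. The Python sketch below is only an illustration of the majority-vote and compare logic, not how any particular tool inserts it; the function names and the 8-bit register value are hypothetical.

```python
def tmr_vote(a, b, c):
    """Bitwise majority vote over three register copies (TMR)."""
    return (a & b) | (a & c) | (b & c)


def dmr_check(a, b):
    """Compare two register copies (DMR).

    Returns (value, error_flag). A mismatch is detected, but the logic
    cannot tell which copy is wrong, so nothing is corrected here.
    """
    return a, a != b


stored = 0b1011_0010           # value held in the protected register
upset = stored ^ 0b0000_1000   # one copy suffers a single bit flip

# TMR: the two good copies outvote the upset copy.
print(bin(tmr_vote(stored, upset, stored)))

# DMR: the mismatch raises a flag that other logic must act on.
print(dmr_check(stored, upset))
```

The voter masks the flipped bit without any further action, while the DMR pair only raises an error flag, which is why DMR needs extra logic to decide how the system should respond.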

Another safety mechanism, particularly for designs that have processor cores, is dual-core lock-step (DCLS). While it can’t correct a fault, it can detect that a fault has occurred in either one of the cores. Two cores with identical input logic run in parallel, and their outputs are fed to a comparator. If the output values differ, a fault has occurred. The comparator can then raise a signal so that the system can take remedial action, such as moving into a known-safe state until the effects of the fault can be neutralized. Here again, careful layout is important for ensuring that the two cores are truly independent of each other: cells or buffers from one core must not be placed in the other, and the two cores must be physically separated. In addition, routes should not be shared between the cores or traverse from one core to the other.
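
The lock-step idea can be sketched in a few lines of Python. This is a toy behavioral model under stated assumptions: the "core" is a hypothetical 8-bit accumulator, and the `fault_at` parameter is an illustrative fault-injection hook rather than part of any real DCLS implementation.

```python
def run_lockstep(core, inputs, fault_at=None):
    """Run two copies of `core` on identical inputs and compare every step.

    fault_at: optional step index at which the second copy's state is
              corrupted by one bit (fault injection, for illustration).
    Returns the step at which the comparator fires, or None if the two
    copies agree on every step.
    """
    state_a = state_b = 0
    for step, x in enumerate(inputs):
        state_a = core(state_a, x)
        state_b = core(state_b, x)
        if step == fault_at:
            state_b ^= 1          # injected single-bit upset
        if state_a != state_b:
            return step           # mismatch: signal the system
    return None


def accumulate(state, x):
    """Hypothetical stand-in for a processor core: an 8-bit accumulator."""
    return (state + x) & 0xFF


print(run_lockstep(accumulate, [1, 2, 3, 4]))               # fault-free run
print(run_lockstep(accumulate, [1, 2, 3, 4], fault_at=2))   # faulty run
```

Note that the comparator only reports that the cores disagree. It cannot say which core is wrong, so the response (for example, entering a known-safe state) is left to the rest of the system.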

Keeping productivity high through automation
While these steps are important for automotive designs, they can also be extremely time-consuming if done manually, especially for new automotive designers. Automation is the key to meeting the requirements of both a competitive market and ISO 26262. But, in order to automate this, we first need a way of expressing our functional-safety intent for consumption by the tools. This makes it possible for the tools to implement the various safety mechanisms.

The first step is to analyze the safety-critical paths to identify those that must be enhanced with safety circuits. Once identified, those paths can have the redundant or fault-tolerant registers inserted automatically, placed carefully with proper separation and, if needed, with taps on either side of the register. This makes them less susceptible to disturbances from an SEU. The rest of the tools must then preserve those specific elements, since the natural inclination of tools is to optimize redundancy away. Finally, a verification step is needed to confirm that all of the desired circuits have been created, placed, and routed correctly.
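
The final verification step can be pictured as a simple check over the placed design. The Python sketch below assumes a hypothetical naming scheme (`<register>_tmr0` to `<register>_tmr2`), a hypothetical placement table of cell coordinates, and an arbitrary minimum separation; real tools work from their own design databases and rules.

```python
from itertools import combinations
from math import hypot


def check_tmr(protected, placement, min_sep):
    """Check that every protected register has three placed copies that
    are at least `min_sep` apart. Returns a list of problems (empty if OK).

    protected: register base names that were meant to be triplicated.
    placement: dict mapping cell name to (x, y) coordinates.
    """
    problems = []
    for reg in protected:
        copies = [f"{reg}_tmr{i}" for i in range(3)]
        missing = [c for c in copies if c not in placement]
        if missing:
            # A copy was never created, or was optimized away.
            problems.append(f"{reg}: missing {', '.join(missing)}")
            continue
        for a, b in combinations(copies, 2):
            (ax, ay), (bx, by) = placement[a], placement[b]
            if hypot(ax - bx, ay - by) < min_sep:
                problems.append(f"{reg}: {a} and {b} closer than {min_sep}")
    return problems


# Hypothetical placement: 'state' has lost one of its three copies.
placement = {
    "ctrl_tmr0": (0, 0), "ctrl_tmr1": (30, 0), "ctrl_tmr2": (0, 40),
    "state_tmr0": (5, 5), "state_tmr1": (8, 5),
}
problems = check_tmr(["ctrl", "state"], placement, min_sep=20.0)
print(problems)
```

A missing copy is exactly what would be seen if a downstream optimization step had removed logic it considered redundant, which is why the check is run after placement and routing rather than only after insertion.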

One important consideration, however, is having a backup tool, which is always a good practice for verifying that safety elements are implemented. If you’re relying on a single tool for a task such as synthesis, you really want another tool that can confirm that the task was performed correctly. For synthesis, numerous verification tools can act as that backup, including the new functionality needed to confirm the redundancy of safety circuits.

A brand new world
There’s no question that safety-critical design involves tasks and considerations that are not required for other types of design. And no amount of tool automation would make it possible to proceed in a manner oblivious to the safety requirements of the design. But with certified tools, you have the ability to specify safety intent, you have automation to simplify the analysis and insertion of redundant circuits, and you have plenty of documentation. Given those tools and a modicum of training, SoCs for ASIL D applications can be created more efficiently, removing much of the friction previously associated with implementing a safety-critical design.


