Dependent Failure Analysis For Safety-Critical IP And SoCs

Stopping faults from propagating through coupling factors or cascading from one element to another.

By Shivakumar Chonnad, Radu Iacob, and Vladimir Litovtchenko

Due to the increased complexity in safety-critical system hardware, software, and mechatronics, the functional safety development process must address systematic and random hardware failures. Numerous safety-related activities are performed during safety-critical IP and SoC developments, as part of the safety lifecycle, from product concept through decommissioning. A safety plan, including dates, milestones, tasks, deliverables, responsibilities, and resources, is a key work product required to plan, manage, and guide the execution of these safety-related activities during product development. The safety plan must define the Dependent Failure Analysis (DFA)[1], which is a key activity.

The main purpose of a DFA is to reveal potential dependencies caused by Dependent Failure Initiators (DFI)[1].

This article explains the importance of implementing DFA in the automotive IP and SoC development cycle and how DFA helps meet the technical independence essentials according to the design’s safety requirements.

Dependent failure

A Cascading Failure (CF)[1] occurs between two or more elements, such as IP components or IP modules in an SoC. A failure in one element, resulting from a root cause either inside (intrinsic fault) or outside of that element, causes a failure in another element of the same component or of a different module, leading to safety requirements violations. Alternatively, a Common Cause Failure (CCF)[1] is the failure of two or more IP components, or of elements within an SoC, resulting directly from a single specific event or root cause that is internal or external to all of these elements, as illustrated in figure 1[1]. Element B is free of interference from element A if no failure of element A can cause element B to fail. Element A and element B are independent if no interference and no common root cause for failures exist. The intent is to analyze the DFIs to stop faults from propagating through the coupling factors or cascading from one element to another.


Fig. 1: Illustration of a common cause failure and a cascading failure.

Dependent failures follow a chain of dependencies that propagates through a common characteristic or relationship between the affected blocks. The propagation chain is enabled by coupling factors, such as coupling between signals or common elements like power and clock networks. A DFI is a single root cause that leads to multiple elements failing, which can lead to further propagation through the coupling factors. The affected blocks’ hierarchy structure and the temporal functional behavior between them form the coupling factors. It is important to note that SW can also be a source of systematic dependent failures, which are caused by the interaction between the given elements, whether SW elements, HW elements, or both, through their coupling factors, potentially impacting the safety requirements.
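The definitions of freedom from interference and independence given above can be sketched in a few lines of Python. This is only an illustrative model, not a method prescribed by ISO 26262: the `Element` class, the element names, and the shared resource names (`clk_main`, `vdd_core`, and so on) are all hypothetical, and shared resources stand in for coupling factors.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    name: str
    # Hypothetical coupling factors: shared resources the element depends on.
    shared_resources: set = field(default_factory=set)
    # Names of elements that a failure of this element can directly cause to fail.
    can_fail: set = field(default_factory=set)


def cascades_to(elements, src, dst):
    """True if a failure of src can reach dst through a chain of cascading failures."""
    seen, stack = set(), [src]
    while stack:
        current = stack.pop()
        for nxt in elements[current].can_fail:
            if nxt == dst:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


def free_of_interference(elements, victim, source):
    """victim is free of interference from source if no failure of source reaches it."""
    return not cascades_to(elements, source, victim)


def common_causes(elements, a, b):
    """Shared resources whose failure would be a common root cause for a and b."""
    return elements[a].shared_resources & elements[b].shared_resources


def independent(elements, a, b):
    """Independent = no interference in either direction and no common root cause."""
    return (
        free_of_interference(elements, a, b)
        and free_of_interference(elements, b, a)
        and not common_causes(elements, a, b)
    )


# Hypothetical SoC fragment used only for illustration.
elements = {
    "cpu_main": Element("cpu_main", {"clk_main", "vdd_core"}),
    "cpu_shadow": Element("cpu_shadow", {"clk_main", "vdd_core"}),
    "dma": Element("dma", {"vdd_io"}, {"cpu_main"}),
    "watchdog": Element("watchdog", {"clk_aux"}),
}

print(free_of_interference(elements, "cpu_main", "dma"))  # dma can cascade into cpu_main
print(common_causes(elements, "cpu_main", "cpu_shadow"))  # shared clock and supply
print(independent(elements, "cpu_main", "watchdog"))
```

In this toy model the two lockstep cores are not independent because they share a clock and a supply, which is exactly the kind of coupling a DFA is meant to surface.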

Why Dependent Failure Analysis?

SoC or IP designs can have various sub-blocks with or without safety relevance. Each sub-block is developed in accordance with the measures needed to achieve its highest applicable Automotive Safety Integrity Level (ASIL)[1]. When determining the element’s ASIL, the cascading failures analysis can provide a rationale for Freedom From Interference (FFI)[1], as described in figure 2. FFI is the absence of cascading failures that could lead to the violation of safety requirements and is also used to consider and justify coexistence of elements.

Fig. 2: Technical independence and freedom from interference.

FFI can be achieved by suitable block partitioning such that any fault within one block can be detected and/or mitigated before it cascades into another block. Independence between the blocks can be claimed only after confirming the absence of dependent failures that can lead to safety requirements violations. If the DFA reveals a cause for a dependent failure, then an adequate safety measure can be put in place in accordance with the ASIL of the initial safety requirement.

To ascertain technical independence, a CCF analysis must be performed in addition to the cascading failures analysis. By identifying the potential causes of CCFs or cascading failures, DFA can support safety measures planning. If necessary, these additional safety measures are implemented at the SW and/or HW level or at the system level to achieve sufficient independence. The analysis can also be an opportunity to allocate homogeneous functions with different safety criticality during the design partitioning step.

IP components require a DFA so that the potential sources of dependent failures can be shared with the SoC integrator or safety assessor. Examples include a single fault[1] affecting both the mission logic and its corresponding safety mechanism[1], or a fault in shared resources or shared infrastructure elements such as the clock tree. In schemes where redundant logic like the Dual Core Lock Step (DCLS) for CPUs is present, a systematic error can affect both primary and redundant cores, causing a dependent failure. Another important part of the safety analysis and reporting is the disclosure of possible susceptible points at the interfaces, or of potential failures arising from the interaction between SW and HW through the Hardware Software Interface (HSI). By analyzing such potential causes or dependent failure initiators, suitable safety measures can be defined for detection or mitigation. Inductive or deductive HW design safety analysis can identify fault causes and effects.

Deductive (top-down) versus inductive (bottom-up) Dependent Failure Analysis

In the case of dependent failures, a deductive analysis features a top-down approach, starting from a top-level failure or a safety goal violation, as illustrated in figure 3[2]. The safety goal violation is decomposed into specific failure modes, which are subsequently analyzed to identify potential risks of dependent failures. For example, the decomposition can cover dependent failures within the mission function, between the mission function and its safety mechanism, and other common cause failures arising from shared resources, shared inputs, etc.


Fig. 3: Deductive method used in dependent faults analysis.

This top-down analysis is typically performed at the architectural level, providing input for design implementation decisions and techniques. The deductive analysis[1] should continue along the design development as more design data becomes available.
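As a rough illustration of the deductive method, the sketch below models a safety goal violation as a small fault tree and computes its minimal cut sets. The tree structure and all event names (`mission_logic_fault`, `checker_fault`, `clock_loss`) are hypothetical, and the assumption that the top event requires both the mission function and its safety mechanism to fail is a simplification chosen for the example. A cut set containing a single basic event under an AND gate of supposedly redundant branches is a candidate dependent failure initiator.

```python
from itertools import product

# Hypothetical fault tree: a node is either a basic event (str)
# or a tuple (gate, children) with gate in {"AND", "OR"}.
MISSION_FAILS = ("OR", ["mission_logic_fault", "clock_loss"])
MECHANISM_FAILS = ("OR", ["checker_fault", "clock_loss"])
SAFETY_GOAL_VIOLATION = ("AND", [MISSION_FAILS, MECHANISM_FAILS])


def minimize(sets):
    """Drop every cut set that is a strict superset of another cut set."""
    unique = set(sets)
    return [s for s in unique if not any(other < s for other in unique)]


def cut_sets(node):
    """Return the minimal cut sets of the (sub)tree rooted at node."""
    if isinstance(node, str):
        return [frozenset([node])]
    gate, children = node
    child_sets = [cut_sets(child) for child in children]
    if gate == "OR":
        # Any cut set of any child triggers the gate.
        sets = [s for sets_of_child in child_sets for s in sets_of_child]
    elif gate == "AND":
        # One cut set from every child is needed; merge each combination.
        sets = [frozenset().union(*combo) for combo in product(*child_sets)]
    else:
        raise ValueError(f"unknown gate: {gate}")
    return minimize(sets)


def single_event_candidates(node):
    """Basic events that alone cause the top event: dependent failure candidates."""
    return sorted(next(iter(s)) for s in cut_sets(node) if len(s) == 1)


for cs in cut_sets(SAFETY_GOAL_VIOLATION):
    print(sorted(cs))
print(single_event_candidates(SAFETY_GOAL_VIOLATION))
```

Here the shared clock appears in both branches, so it collapses into a single-event cut set, while the independent faults still require two simultaneous events.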

As the design architecture is further refined, it is useful to perform an inductive analysis[1] using a bottom-up approach. Inductive analysis is appropriate when a given set of initiating causes has been identified and the goal is to determine the resulting consequences.

With more design implementation details, an inductive, data-driven analysis can start from actual common cause initiators, coupling factors, and fault propagation paths. This analysis may identify additional failures that were not considered during the deductive analysis. In general, both deductive and inductive approaches must be employed to obtain a complete set of dependent failures. The deductive approach has the benefit of focusing the analysis on the undesired events, while the inductive approach ensures the analysis is broad enough to include all possible scenarios.
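The bottom-up direction can be sketched in the same spirit: start from each initiator, follow the coupling factors and propagation paths, and check whether any pair of elements that is required to be independent ends up affected together. The tables below (`COUPLING`, `PROPAGATION`, `INDEPENDENT_PAIRS`) and every name in them are hypothetical placeholders for data that would come from the actual design.

```python
from collections import deque

# Hypothetical design data, for illustration only.
# Initiator -> elements it reaches directly through a coupling factor.
COUPLING = {
    "vdd_core_droop": ["cpu_main", "cpu_shadow"],
    "dma_fault": ["sram_ctrl"],
}
# Element -> elements its failure can cascade into.
PROPAGATION = {
    "sram_ctrl": ["cpu_main"],
    "cpu_main": [],
    "cpu_shadow": [],
}
# Pairs of elements for which independence is required.
INDEPENDENT_PAIRS = [("cpu_main", "cpu_shadow")]


def affected_elements(initiator):
    """Breadth-first walk from an initiator along the propagation paths."""
    queue = deque(COUPLING.get(initiator, []))
    reached = set()
    while queue:
        element = queue.popleft()
        if element in reached:
            continue
        reached.add(element)
        queue.extend(PROPAGATION.get(element, []))
    return reached


def violated_pairs(initiator):
    """Independence pairs in which both elements are reached by the initiator."""
    reached = affected_elements(initiator)
    return [pair for pair in INDEPENDENT_PAIRS
            if pair[0] in reached and pair[1] in reached]


for dfi in COUPLING:
    print(dfi, sorted(affected_elements(dfi)), violated_pairs(dfi))
```

In this example the supply droop defeats the redundancy between the two cores, whereas the DMA fault cascades into only one of them and would be analyzed as a cascading failure rather than a common cause failure.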

Performing a Dependent Failure Analysis

DFA begins with identifying and analyzing all the blocks or sub-blocks that require independence. The analysis is applied at the level for which the FFI or technical independence requirements are to be achieved, for example at the system, HW, or SW level. The top-level safety requirement is translated into detailed HW or SW safety requirements for implementation by independent elements. A functional redundancy approach can be used such that two independent architectural elements allow faults to be monitored and detected. These elements are sufficiently independent to ensure:

  • No dependent failure can lead to a safety requirements violation
  • Each identified dependent failure is detected and controlled by an adequate safety measure

Next, the elements’ DFIs and CCFs are identified, allowing further analysis to find the blocks where a dependent failure initiator’s root cause propagates into a cascading failure. The coupling factors that propagate the dependent failures among such blocks are then identified. Using the resulting data, analysis is performed to narrow down all possible events, faults, or failures that may propagate between elements and induce causal failure chains.

The resulting data is used to identify possible measures to mitigate the effects of the DFIs and coupling factors and so prevent dependent failures. This ensures that the design or hierarchy separates functions for independence, and that the safety measures required to monitor the shared resources for dependent failures are in place.

As a result, adequate safety measures can be selected to prevent, or to detect and control, failures with the potential to violate safety requirements, as illustrated in figure 4.


Fig. 4: Typical steps in a Dependent Failure Analysis flow.
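One way to keep track of the outcome of these steps is a simple table of DFA entries, each linking an initiator, its coupling factor, the affected elements, and the selected safety measures. The sketch below is a hypothetical bookkeeping structure, not a format defined by the standard or used by any particular tool; the `DfaEntry` fields and the sample entries are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DfaEntry:
    initiator: str            # dependent failure initiator (root cause)
    coupling_factor: str      # how the failure propagates
    affected: tuple           # elements that can fail together or in a chain
    measures: list = field(default_factory=list)  # selected safety measures


def open_items(entries):
    """Initiators that have no safety measure assigned yet."""
    return [entry.initiator for entry in entries if not entry.measures]


# Hypothetical entries for illustration only.
entries = [
    DfaEntry("clock_loss", "shared clock tree", ("cpu_main", "cpu_shadow"),
             ["independent clock monitor"]),
    DfaEntry("vdd_core_droop", "shared power domain", ("cpu_main", "cpu_shadow"),
             ["voltage monitor"]),
    DfaEntry("dma_fault", "shared bus access", ("sram_ctrl", "cpu_main")),
]

for entry in entries:
    status = ", ".join(entry.measures) if entry.measures else "NO MEASURE ASSIGNED"
    print(f"{entry.initiator}: {entry.coupling_factor} -> {status}")
print("open items:", open_items(entries))
```

Any initiator that remains in the open list would need a prevention, detection, or control measure before independence can be argued.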

When to perform a Dependent Failure Analysis?

Safety analyses for dependent failures are applied during the development phase of the lifecycle, and these analyses can be related to each other. DFA can begin as early as the architectural phase, once the models and behaviors of the various elements have been defined. As soon as the micro-architecture specification or the detailed hardware specification is ready, the interface behaviors and the coupling factors, for example the paths of propagation, are known. The findings from the DFA at the various phases are then fed back into the implementation and verification phases to mitigate the failures due to the DFIs.


Fig. 5: DFA related activities in a safety related project cycle.

What happens after a Dependent Failure Analysis?

DFA can confirm freedom from interference or sufficient independence between components. The design DFIs are identified based on the DFA results. Verification activities, such as simulation of various scenarios through directed or random stimulus generation, confirm whether common cause and/or cascading failures propagate. Validation schemes can confirm whether the safety mechanisms are functional and effective in monitoring or detecting the dependent failures. Data reviews during pre- and post-implementation stages confirm that the relevant identified faults and their safety measures are sufficiently addressed. The reviews also confirm the achieved level of independence or freedom from interference between the relevant HW and/or SW elements.

Consider complementing the DFA with additional safety analyses, such as Fault Tree Analysis (FTA), for a structured approach to the deductive method, and Failure Modes Effects and Diagnostics Analysis (FMEDA), which provides a list of safety mechanisms that may be subject to dependent faults with regard to the mission functions they monitor. For a Safety Element out of Context (SEooC), external measures for dependent failure control are back annotated as Assumptions of Use (AoUs) within the Safety Manual.

The results of the DFA are necessary inputs for the design verification report per the verification plan, aiming to ascertain the achievement of technical independence or freedom from interference as required by the Architectural Specification and Technical Safety Concept (TSC). The independence between the blocks is verified when the DFA-related CCFs are addressed and FFI is achieved.

The data from all of the above analyses can be consolidated into a DFA Report. The report contains the DFA rationale, based on FFI, which is required for coexistence of elements and ASIL decomposition, and on technical independence, which additionally takes the CCFs into consideration. The report should also include a section dedicated to DFIs. As part of the analysis, potential coupling factors are also examined. Similar to FMEDA, the DFA analyzes all available product configurations. The DFA Report should further include the Common Cause Failures analysis summary, with mitigation measures for avoidance or control, and a Cascading Failures analysis summary, with recommended measures for avoidance and control.

Sufficiency of these measures is demonstrated by verification procedures included in the product Verification Plan and referenced in the DFA document. Evidence of the verification results is presented in the Verification Report and summarized in the DFA Report. The report concludes with an evaluation of the remaining risk for dependent failures. If the mitigation measures are insufficient and residual risk exists, the safety measures are improved, and the effectiveness evaluation is repeated. A list of follow-up actions is presented in this final section of the report.

Conclusions

Modern automotive product development requires dedicated functional safety features to avoid or to control failure risks. The functional safety development flow involves various safety analyses, including DFA, which is initiated in the project’s early stages to ensure FFI or technical independence, as illustrated in figure 6.


Fig. 6: Summary of the DFA activities.

DFIs, such as shared internal resources and common external resources, must be analyzed as applicable. The DFA demonstrates that the requirements for FFI and technical independence are met according to the design’s safety requirements. The safety measures for dependent failures are verified in a similar manner to other safety mechanisms. If residual risk exists, the measures are improved and the effectiveness assessment is repeated. DFA is an important prerequisite for FMEDA, given the impact of dependent faults on the safety mechanisms’ diagnostic coverage. In the case of an SEooC, external mitigation measures are back annotated as Assumptions of Use (AoUs) in the Safety Manual.

The DFA document and the DFA report are standard deliverables for Synopsys automotive DesignWare IP products featuring ISO 26262 compliance. DFA ensures that the independence assumptions made in the Technical Safety Concept (TSC) and the FMEDA are properly achieved. Synopsys’ automotive-grade DesignWare IP portfolio includes safety packages, which consist of Failure Modes Effects and Diagnostics Analysis (FMEDA) reports, Safety Manuals, and certification reports to accelerate safety assessments and help designers reach their target ASILs.

References:

  1. ISO 26262 International Standard, Second edition 2018-12, CP401 Ch. De Blandonnet 8 CH-1214 Vernier, Geneva
  2. Fault Tree Handbook With Aerospace Applications, Michael Stamatelatos, William Vesely, et al., NASA Headquarters, Washington, DC 20546, 2002

Radu Iacob is a quality and functional safety engineering manager at Synopsys.

Vladimir Litovtchenko is a senior manager for functional safety and quality engineering at Synopsys.


