Ensuring that safety-critical designs can detect and recover from errors.
Functional safety is a major challenge for field programmable gate arrays (FPGAs) and other semiconductor designs. Safety requirements go beyond traditional verification, which focuses on design bugs. Chips in safety-critical applications must be able to handle a variety of faults from sources such as temperature and power extremes, device aging, radiation, ionization and component failures. Applications include autonomous vehicles, medical equipment, implanted medical devices, military/aerospace systems, nuclear power plants and industrial controllers. Many of these applications have industry safety standards that manufacturers must satisfy.
Safety-critical design entails detecting faults and either correcting them or taking appropriate measures such as safe system shutdown or reset. For example, a self-driving car that can no longer operate correctly might pull to the shoulder as quickly and safely as possible. Some faults can physically damage a chip by breaking connections, changing the values of components or causing a signal to be stuck at one value. These are classified as permanent faults. Other types of faults such as signal glitches or bit flips are transient and may be easier to correct. It is also possible for multiple faults to occur at the same time, complicating recovery.
There are several types of functional safety mechanisms that can be included in chips to survive faults. Error correcting code (ECC) bits in memories or registers provide partial data redundancy to detect and even correct flipped bits. A fault might propagate to a finite state machine (FSM) and cause a deadlock, so safety-critical designs detect and recover from incorrect transitions. In extreme cases, triple modular redundancy (TMR) may be used. Critical logic is triplicated and the corresponding outputs of the three modules are fed into a majority voter. If a fault produces an incorrect value from one module, the voter will select the correct value from the other two modules.
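The voting step in TMR can be illustrated with a small behavioral model. The Python sketch below is only an illustration of 2-of-3 majority voting on multi-bit values; the function names are hypothetical, and it does not represent the logic that any synthesis tool actually inserts.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three module outputs."""
    return (a & b) | (a & c) | (b & c)


def vote_with_status(a: int, b: int, c: int) -> tuple[int, bool]:
    """Return the voted value and a flag that is True if any module disagreed."""
    voted = majority_vote(a, b, c)
    mismatch = not (a == b == c)
    return voted, mismatch


# Illustrative values: one of the three copies suffers a single bit flip.
golden = 0b1011_0010
faulty = golden ^ 0b0000_1000
voted, mismatch = vote_with_status(golden, faulty, golden)
print(f"voted={voted:#010b} mismatch={mismatch}")
```

The voter masks a fault in any single module, and the mismatch flag gives the system a chance to log or repair the fault. If two modules are corrupted in the same bit position, the voter selects the wrong value, which is one reason simultaneous faults complicate recovery.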
These techniques are all equally applicable to FPGAs, ASICs, and full custom chips. Many safety-critical applications are low volume, so FPGAs are a logical and popular choice. They offer high performance and capacity without long delays and huge costs for chip fabrication. FPGA designers can manually add safety mechanisms in their register transfer level (RTL) code, but this is a significant effort. The additional logic must be instantiated, connected and verified. Changes made to the RTL may have ripple effects that require manually updating the safety mechanisms. Design for safety is much more efficient and robust when it is automated.
Logic synthesis is the best stage in the development flow to automatically insert safety mechanisms, and Synopsys provides the ideal solution. Synopsys Synplify Premier has been the industry’s most advanced FPGA design and debug environment for more than 20 years. It provides the best timing and area results with the fewest iterations. Synplify Premier supports a wide range of programmable devices, with deep knowledge of the unique aspects of their architectures. This ability enables it to automatically insert safety mechanisms in the most efficient way possible, leveraging the structures of the chosen FPGA target device.
Synplify Premier supports all the techniques discussed previously, and the resulting FPGA is guaranteed to match the specified intent. But the design team may miss portions of the chip that need safety protection or select a technique that is inadequate for the desired safety level. Thus, proper safety operation must be verified by inserting faults into the design and observing their effect on the operation. This is challenging to do in a physical FPGA, so fault simulation is the established method. Fault simulators are optimized to run on the post-synthesis netlist, at which point all the safety mechanisms to be verified have been inserted by Synplify Premier.
The Synopsys Z01X fault simulation solution is the industry’s leading platform for verification of safety-critical designs. It uses fault models to mimic the types of faults that can occur in the actual chip. It automatically inserts and simulates both stuck-at faults (input, output and net) and transient faults. Z01X has the performance and capacity to handle massive fault lists for large FPGA devices. It uses a concurrent fault simulation algorithm to evaluate many faults in the same simulation run. The fault list can be partitioned across multiple processor cores and machines in a server farm or cloud environment.
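To make the stuck-at fault model concrete, here is a minimal Python sketch of net-level stuck-at injection on a toy three-gate netlist, plus a simple round-robin split of the fault list across workers. Everything in it (the netlist, the `simulate`, `fault_list` and `partition` names, the tuple format for faults) is an assumption made for illustration; it is not the Z01X fault list format or algorithm, and it models only net faults, not separate input and output pin faults.

```python
# Toy netlist for y = (a AND b) OR (NOT c), with gates in topological order.
# Each entry is (output_net, gate_type, input_nets). Purely illustrative.
NETLIST = [
    ("n1", "and", ("a", "b")),
    ("n2", "not", ("c",)),
    ("y", "or", ("n1", "n2")),
]
INPUTS = ("a", "b", "c")


def simulate(inputs, fault=None):
    """Evaluate every net; fault is (net_name, stuck_value) or None."""
    values = {}

    def drive(net, value):
        # A stuck-at fault overrides whatever the driver computes.
        values[net] = fault[1] if fault and fault[0] == net else value

    for name in INPUTS:
        drive(name, inputs[name])
    for out, op, ins in NETLIST:
        args = [values[i] for i in ins]
        if op == "and":
            result = args[0] & args[1]
        elif op == "or":
            result = args[0] | args[1]
        else:  # "not"
            result = 1 - args[0]
        drive(out, result)
    return values


def fault_list():
    """Enumerate stuck-at-0 and stuck-at-1 for every net."""
    nets = list(INPUTS) + [out for out, _, _ in NETLIST]
    return [(net, value) for net in nets for value in (0, 1)]


def partition(faults, workers):
    """Split the fault list round-robin, one chunk per worker."""
    return [faults[i::workers] for i in range(workers)]


stimulus = {"a": 1, "b": 1, "c": 1}
print(simulate(stimulus)["y"], simulate(stimulus, fault=("n1", 0))["y"])
print(partition(fault_list(), 3))
```

Because each fault is simulated independently of the others in this model, the chunks can be handed to separate processes or machines without coordination, which is what makes partitioning a fault list straightforward.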
The user supplies a fault list in a standard format plus the strobe points where the effects of injected faults will be observed. At these points, Z01X compares the results of the design with the injected fault against the design with no faults. A disagreement means that the fault has been detected. If every test produces identical results at all strobe points, the fault is not detected, which may indicate the need for additional tests or safety mechanisms. Many functional safety standards have strict requirements for fault coverage; Z01X generates fault coverage statistics for either the entire design or a specified functional block, providing details for all individual faults.
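The comparison against a fault-free run and the resulting coverage figure can be sketched in a few lines of Python. This is a toy model under stated assumptions: the circuit y = (a AND b) OR c, the single strobe point at y, and the `run`, `classify` and `coverage` helpers are all hypothetical, and real tools report more fault classes than the two used here.

```python
def run(vector, fault=None):
    """Return the value strobed at output y for y = (a AND b) OR c."""

    def net(name, value):
        # Apply the stuck-at value if this net carries the injected fault.
        return fault[1] if fault and fault[0] == name else value

    a = net("a", vector[0])
    b = net("b", vector[1])
    c = net("c", vector[2])
    n1 = net("n1", a & b)
    return net("y", n1 | c)


def classify(faults, vectors):
    """Mark each fault detected if any strobe differs from the fault-free run."""
    golden = [run(v) for v in vectors]
    report = {}
    for fault in faults:
        observed = [run(v, fault) for v in vectors]
        report[fault] = "detected" if observed != golden else "undetected"
    return report


def coverage(report):
    """Fraction of faults in the report that were detected."""
    detected = sum(1 for status in report.values() if status == "detected")
    return detected / len(report)


FAULTS = [(n, v) for n in ("a", "b", "c", "n1", "y") for v in (0, 1)]
VECTORS = [(1, 1, 0), (0, 0, 1)]
report = classify(FAULTS, VECTORS)

# Adding vectors that drive y to 0 exposes the stuck-at-1 faults.
MORE_VECTORS = VECTORS + [(0, 1, 0), (1, 0, 0), (0, 0, 0)]
full_report = classify(FAULTS, MORE_VECTORS)
print(coverage(report), coverage(full_report))
```

With the first two vectors only the stuck-at-0 faults are observed, giving 50% coverage in this toy model; the three extra vectors bring it to 100%. An undetected fault therefore points either to a gap in the tests or to logic that needs a safety mechanism.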
Safety-critical FPGAs are required for many applications, from tiny implanted medical devices to massive nuclear power plants. Synplify Premier automatically adds the necessary safety mechanisms to protect against faults from internal and external sources. Z01X provides industry-leading fault injection and fault simulation capabilities to verify the safety mechanisms and calculate the fault coverage statistics required by safety standards. Problems that affect functional safety are found and fixed early, saving months of FPGA debug in the bring-up lab. The Synopsys functional safety flow is unparalleled in performance, capacity and ease of use.