Lock-step, redundant execution and split-lock: which is best, and why?
As the automotive industry accelerates innovation toward fully autonomous vehicles, one of the underlying values of this effort will be safer roadways. Nine in 10 vehicle accidents are caused by human error. Work underway, and future innovation in autonomous and semi-autonomous vehicles, should shrink driving fatalities by tens of thousands in much the same way that the introduction of seat belts greatly reduced accident fatalities.
But fundamental challenges need to be solved to enable pervasive deployment of such vehicles. Autonomous systems require a great deal of compute performance and, because they are capable of controlling the vehicle’s direction and speed, they also require the highest levels of safety integrity.
Fortunately, functional safety standards such as ISO 26262 help guide automotive design to reduce unreasonable risk due to hazards caused by malfunctioning behavior of electrical and electronic systems.
The various safety standards also define different levels of safety integrity, i.e., how ‘safe’ a particular system needs to be. For example, the system controlling the brakes in a vehicle would be expected to have the highest levels of safety as failure of such a system could be catastrophic. In contrast, a system controlling the motors in the driver’s seat, whilst still having a safety requirement, would be expected to have a lower rating.
In ISO 26262, this is defined as the “Automotive Safety Integrity Level” or “ASIL”. ASIL currently is defined as four different levels ranging from “A” (the lowest) to “D” (the highest). These levels have a direct correlation to the diagnostic coverage a system must attain or, in other words, how many faults a given system is expected to detect.
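To make the link between ASIL and diagnostic coverage concrete, here is a minimal sketch in Python. The names `Asil`, `SPFM_TARGET` and `meets_target` are invented for this illustration. The percentages are the commonly quoted single-point fault metric targets associated with ISO 26262 (ASIL B, C and D); treat them as an assumption and check them against the standard itself before relying on them.

```python
from enum import IntEnum


class Asil(IntEnum):
    """The four ASILs, ordered from lowest (A) to highest (D)."""
    A = 1
    B = 2
    C = 3
    D = 4


# Assumption: commonly quoted single-point fault metric targets.
# ASIL A is left out because no numeric target is usually quoted for it.
SPFM_TARGET = {Asil.B: 0.90, Asil.C: 0.97, Asil.D: 0.99}


def meets_target(asil: Asil, detected_fraction: float) -> bool:
    """True if the fraction of detected faults reaches the assumed target."""
    target = SPFM_TARGET.get(asil)
    return target is None or detected_fraction >= target


if __name__ == "__main__":
    for level in Asil:
        print(level.name, meets_target(level, 0.95))
```

In this sketch a design that detects 95% of faults would pass the assumed ASIL B target but fall short of ASIL C and ASIL D.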
While standards such as ISO 26262 have clear objectives and are widely accepted, implementing designs with them can be complex and nuanced. In other words, the devil’s in the details. So what are the technical options to achieve this?
1. Lock-Step
Configuring two CPU cores in ‘Lock-Step’ is a traditional way of achieving high levels of diagnostic coverage, that is, the ability to detect the occurrence of an error condition. The principle is straightforward: both cores execute exactly the same code and each feeds its outputs into a block of comparator logic. The comparator checks the outputs on a cycle-by-cycle basis, and as long as the results are equal, all is well. A discrepancy between the results could indicate a fault condition that should be investigated or acted upon. The resulting action is defined by the system developer and depends on the system in question. It could be as simple as rebooting, or rechecking whether the error condition still exists after a given period of time. This lock-stepping is fixed in the silicon by design and therefore has no flexibility, so the application is effectively using two cores but only achieving the performance of a single core. This approach is ‘proven’ and has worked well for microcontrollers and less complex, deterministic microprocessors for many years.
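The comparator principle can be sketched in a few lines of Python. This is a software toy model, not how lock-step hardware is built: each "core" is modelled as a function that turns one input word into one output word per cycle, and all names (`run_lockstep`, `healthy_core`, `make_faulty_core`) are hypothetical.

```python
from typing import Callable, Iterable, Optional

# Hypothetical model: a core maps one input word to one output word per cycle.
Core = Callable[[int], int]


def run_lockstep(primary: Core, shadow: Core,
                 inputs: Iterable[int]) -> Optional[int]:
    """Feed both cores the same inputs and compare outputs every cycle.

    Returns the index of the first cycle whose outputs differ,
    or None if the two cores agreed on every cycle.
    """
    for cycle, word in enumerate(inputs):
        if primary(word) != shadow(word):
            return cycle
    return None


def healthy_core(word: int) -> int:
    """Stand-in for the real computation (arbitrary 16-bit arithmetic)."""
    return (word * 3 + 1) & 0xFFFF


def make_faulty_core(faulty_input: int) -> Core:
    """Build a core that flips one output bit for a single input word."""
    def core(word: int) -> int:
        out = healthy_core(word)
        if word == faulty_input:
            out ^= 0x0004  # simulated single-bit fault
        return out
    return core


if __name__ == "__main__":
    print(run_lockstep(healthy_core, healthy_core, range(10)))
    print(run_lockstep(healthy_core, make_faulty_core(7), range(10)))
```

What happens after a mismatch (reboot, recheck, enter a safe state) is deliberately left out, since the text notes that this is a decision for the system developer.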
2. Redundant Execution
CPUs that offer higher performance capabilities are often a lot more complex and less deterministic and therefore much more challenging to Lock-Step. This has led to more ‘exotic’ approaches to solve the aforementioned challenge. Software redundancy or redundant execution is certainly one alternative.
This approach assumes that two independent applications are being executed, potentially on different CPU cores, or even within different virtual machines if virtualization is being implemented. As the outputs of the applications become available, they are compared for correctness by one or more additional high-safety-integrity cores, commonly referred to as a “safety island” because they have independent clock and power supplies. This safety island would be responsible for the final “decide and actuate” phase. This approach can reduce the diagnostic coverage requirements on the high-compute cluster, and it can also bring greater flexibility and improved efficiency to the implementation. However, it dramatically increases the complexity of the system, and the cross-checking is coarser: results are compared only when outputs become available rather than on every cycle. Because of the benefit of software flexibility, this approach may become more widely deployed in the coming years for certain applications requiring both safety and high compute performance.
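A minimal sketch of the idea, again in Python and under stated assumptions: two deliberately different implementations of the same calculation stand in for the two independent applications, and a small checker function stands in for the safety island. The names `channel_a`, `channel_b`, `safety_island_decide` and `SAFE_STATE`, as well as the tolerance value, are invented for this illustration.

```python
from typing import Sequence, Union

SAFE_STATE = "SAFE_STATE"  # hypothetical marker for "do not actuate"


def channel_a(samples: Sequence[float]) -> float:
    """First application: plain arithmetic mean."""
    return sum(samples) / len(samples)


def channel_b(samples: Sequence[float]) -> float:
    """Second application: the same quantity via an incremental mean."""
    mean = 0.0
    for count, sample in enumerate(samples, start=1):
        mean += (sample - mean) / count
    return mean


def safety_island_decide(result_a: float, result_b: float,
                         tolerance: float = 1e-6) -> Union[float, str]:
    """Compare the two results once both are available.

    If they agree within the tolerance, pass one of them on for
    actuation; otherwise fall back to the safe state.
    """
    if abs(result_a - result_b) <= tolerance:
        return result_a
    return SAFE_STATE


if __name__ == "__main__":
    readings = [1.0, 2.0, 3.0, 4.0]
    print(safety_island_decide(channel_a(readings), channel_b(readings)))
    # Simulate a corrupted result from the second channel.
    print(safety_island_decide(channel_a(readings), channel_b(readings) + 1.0))
```

Note that the comparison happens once per result rather than once per cycle, which is the coarser cross-checking granularity described above. The comparison uses a tolerance because two diverse implementations need not produce bit-identical floating-point results.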
3. Split-Lock: The best of both worlds
The ultimate solution must be the one that brings together the benefits of both approaches – flexibility, performance, simplicity and proven effectiveness. With the introduction of the ‘Split-Lock’ capability on the Cortex-A76AE, Arm has done exactly that – high compute performance coupled with high safety integrity support. How does Split-Lock differ from Lock-Step? In essence, it adds the flexibility that wasn’t available in lock-stepped CPU implementations. It allows the system to be configured at boot-up in either ‘split mode’ (two independent CPUs that can be used for diverse tasks and applications) or ‘lock mode’ (the CPUs are lock-stepped for high safety integrity applications). This flexibility could even be extended to support potential fail-operational modes – the ability to continue to operate in a degraded mode rather than completely shutting the system down. For example, when running in lock mode, if one core starts to exhibit a failure condition, the system could be quiesced and the faulty core taken off-line (split), allowing operation to continue in a degraded mode. This ‘split available’ capability is critical for any autonomous system.
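The mode choice and the fail-operational idea can be modelled as a small state machine. This Python sketch is purely illustrative: it does not reflect any real Cortex-A76AE register, firmware interface or boot flow, and the names `Mode`, `CorePair`, `usable_cores` and `report_core_fault` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    SPLIT = "split"        # two independent cores
    LOCK = "lock"          # two cores lock-stepped, seen as one
    DEGRADED = "degraded"  # one core taken off-line after a fault


class CorePair:
    """Toy model of a pair of cores whose mode is chosen at boot."""

    def __init__(self, boot_mode: Mode) -> None:
        if boot_mode not in (Mode.SPLIT, Mode.LOCK):
            raise ValueError("boot mode must be SPLIT or LOCK")
        self.mode = boot_mode
        self.online = {0, 1}

    def usable_cores(self) -> int:
        """Number of logical cores available to software."""
        if self.mode is Mode.LOCK:
            return 1  # two physical cores act as one checked core
        return len(self.online)

    def report_core_fault(self, core_id: int) -> None:
        """In lock mode, take the faulty core off-line and keep running."""
        if self.mode is not Mode.LOCK:
            raise RuntimeError("fault handling is only modelled for lock mode")
        self.online.discard(core_id)
        self.mode = Mode.DEGRADED


if __name__ == "__main__":
    pair = CorePair(Mode.LOCK)
    print(pair.mode.value, pair.usable_cores())
    pair.report_core_fault(1)
    print(pair.mode.value, pair.usable_cores(), sorted(pair.online))
```

In the degraded state the remaining core keeps running, but without a lock-stepped partner checking it, which is why the text describes this as a degraded mode of operation.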
The Split-Lock capability implemented within the Cortex-A76AE autonomous-class processor also allows the same base design to be used across multiple applications, with or without safety requirements, such as in-vehicle infotainment systems as well as autonomous vehicle systems. This enables significant design efficiencies throughout the supply chain.
Summary
The Cortex-A76AE is the latest addition to Arm’s new Safety Ready program and augments a rich heritage of functionally safe IP. It’s the first autonomous-class processor with integrated safety – its compute performance, coupled with the Split-Lock functionality, enables new levels of innovation and scalability in the automotive domain. This product is complemented by the industry’s broadest portfolio of safety IP, which also encompasses software elements, tools and comprehensive documentation.
Learn more about the Split-Lock capability of the Arm Cortex-A76AE: The first autonomous-class processor and the Arm Safety Ready program.