Collaboration between supplier and customer is key to achieving functional safety goals.
The ISO 26262 standard is a weighty series of documents that many believe has all the force of law or regulation; however, it is not a dictate. It is an agreement on best practices for participants in the vehicle value chain to follow to ensure safety as far as the industry understands it today. There is no monetary fine if the standard is not followed, though it will be difficult to sell automotive products without compliance. The intent is to define a framework and set guidelines that will encourage collaboration, best efforts, and continuous improvement towards safety – while still allowing some flexibility to address the incredible complexity of current and future automotive systems. Sometimes, though, this distinction between dictate and framework is forgotten.
Here’s an excerpt from the ISO 26262:2018 abstract:
“This document describes a framework for functional safety to assist the development of safety-related E/E systems. This framework is intended to be used to integrate functional safety activities into a company-specific development framework.”
Safety in complex automotive products built through an extended value chain requires cooperation between participants. This collaboration has become even more important as systems have become more sophisticated, moving towards “supercomputers on wheels” and an “internet of cars.” Change is coming from two directions. First, established automotive microcontroller unit (MCU) builders are adapting to almost unfathomable increases in semiconductor complexity as they transition from single-function MCUs to huge systems-on-chip (SoCs). Second, hungry new entrants are just learning about automotive functional safety.
These changes make for some interesting learning experiences. One thing that trips designers up is assumptions of use (AoUs). No matter how well written, a natural-language specification will always carry some level of ambiguity. AoUs are one way to tighten up the specification by stating the supplier’s expectations and requirements for using an IP or system. A key point is that engineers must check the AoUs very carefully – before the building begins. Sometimes, though, designers read an AoU and assume it will be covered, check again when the design is almost complete, and realize an important interpretation was missed. In rare instances, the functional safety team members are pulled into an SoC design too late in the process, after the register-transfer level (RTL) has been frozen or the chip has taped out.
At this point, the customer (“integrator” in ISO 26262-speak) may invite the supplier to discuss possible remedies. Of course, the supplier will help solve this problem because it values strong relationships. Suggestions may include reconfiguring the network-on-chip, other IP, or software, checking some trial configurations, reanalyzing diagnostic coverage, and/or recommending a dreaded post-tapeout engineering change order (ECO). No matter what, the supplier and integrator are expected to collaborate to fix the problem.
Still, these are Hail Mary passes to solve a complex issue, and this is no way to run a business. Usually, there is a fix to achieve the desired system-wide automotive safety integrity level (ASIL), but there is no guarantee that this will always be the case. Depending on late-in-the-schedule measures is not only risky but also much less efficient than if the issues were found earlier in the development schedule. Now, if the integrator makes a mistake, certainly, that company is responsible for fixing the problem. And if the integrator makes the error and the supplier helps fix the problem, that is a good thing. However, as with all quality and functional safety standards, process improvement is imperative and repeat instances of failing to follow AoUs are evidence of systematic failures in a company’s product development process.
Bottom line: Always Read The Fine (Safety) Manual, i.e., RTFM, at the beginning of the design process and ask when you see any ambiguities!
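One way to make that early read-through concrete is to track every AoU from the safety manual as an explicit item that must be closed before the design is frozen. The following is a minimal sketch, not a prescribed ISO 26262 work product: the `AoU` record, the `Status` values, and the example AoU texts are all hypothetical and would be replaced by whatever the supplier’s safety manual and the integrator’s own process define.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    OPEN = "open"          # not yet reviewed by the integrator
    QUESTION = "question"  # ambiguity raised with the supplier
    COVERED = "covered"    # design measure identified and agreed


@dataclass
class AoU:
    """One assumption of use copied from a supplier safety manual (hypothetical schema)."""
    aou_id: str
    text: str
    status: Status = Status.OPEN
    evidence: str = ""  # pointer to the design measure that satisfies the AoU


def blocking_aous(aous):
    """Return the AoUs that are not yet covered with recorded evidence."""
    return [a for a in aous
            if a.status is not Status.COVERED or not a.evidence]


# Illustrative entries only; the wording is invented for this sketch.
aous = [
    AoU("AOU-001", "Integrator shall route the fault output to a safety controller.",
        Status.COVERED, "safety-concept section on fault aggregation"),
    AoU("AOU-002", "Integrator shall run the built-in self-test at every power-on.",
        Status.QUESTION),
    AoU("AOU-003", "Integrator shall protect configuration registers against corruption."),
]

for aou in blocking_aous(aous):
    print(f"{aou.aou_id} is still {aou.status.value}: resolve before RTL freeze")
```

A list like this is only as good as the review behind it, but it makes “we thought that one was covered” visible while there is still time to ask the supplier.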
Equally, if the customer does not know the desired outcome or is unwilling to share information with its suppliers, a partner cannot be expected to have infinite patience on this voyage of discovery. It is optimal for the integrator to have well-documented safety goals and functional requirements at the beginning of a chip project so that IP and software providers can assess suitability of their products for the intended applications and provide additional information that may impact these goals. Collaboration works best when goals are clearly defined with shared context and understanding by all.
Going back to the regulation versus guideline discussion, what is more important – meeting a safety goal or formatting a table exactly the same way someone learned in a training course? Sometimes engineers take away from training more than intended, for example, expecting that all failure modes, effects, and diagnostic analysis tables have the same columns in the same order as learned in class. Instead, safety engineers should follow the intent to relay the details of safety coverage. Each company has reasons for varying the format here and there. If the tables convey the necessary information, that should be sufficient. [As a side note, the Accellera Functional Safety Working Group and the IEEE P2851 Working Group are working together to create standards for exchange and interoperability of safety analysis and verification data. If you have an interest in addressing these issues, then please join us in these working groups!]
Quite often, the integrator must collaborate with the Tier 1 or original equipment manufacturer further “up” the value chain. Sometimes a problem in the larger system ripples back to chip design. Maybe the supplier found another issue, perhaps a place where safety mitigation was not included, and the RTL code is already frozen. What options are left? If the loss of coverage is relatively small, maybe there is a debate between the chip maker and the Tier 1. Whose fault was it anyway, and is it really so important in the big safety picture? Should we add a late fix to address a small issue in diagnostic coverage, even though the system will still meet its intended ASIL? Or will late fixes actually decrease system-level safety by introducing systematic errors?
Of course, if the impact is critically important, corrective action is essential. This could be a minor software fix (though software engineers never think of these as “minor”!), an engineering change order (ECO) to the hardware that requires work on the netlist, or in the worst case, a total re-spin including resynthesis of some parts of the chip logic. Open communication and collaboration between IP and software suppliers and chip integrators throughout the design process is critical to reducing frequency and severity of issues when they do occur.
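The earlier question of whether a small loss of diagnostic coverage still leaves the intended ASIL within reach can be explored with simple arithmetic. The sketch below uses the single-point fault metric (SPFM) targets from ISO 26262-5 (at least 90% for ASIL B, 97% for ASIL C, and 99% for ASIL D) with a deliberately simplified model: every fault in a block is assumed to be able to violate the safety goal, safe faults and latent faults are ignored, and the block names, failure rates, and coverage values are hypothetical numbers chosen for illustration, not data from any real chip.

```python
from dataclasses import dataclass

# SPFM targets from ISO 26262-5 (ASIL A has no SPFM target).
SPFM_TARGET = {"B": 0.90, "C": 0.97, "D": 0.99}


@dataclass
class Block:
    name: str
    fit: float  # safety-related failure rate in FIT (hypothetical)
    dc: float   # diagnostic coverage of the safety mechanism, 0.0 to 1.0


def spfm(blocks):
    """Simplified SPFM: 1 - (sum of residual failure rates / total failure rate)."""
    total = sum(b.fit for b in blocks)
    residual = sum(b.fit * (1.0 - b.dc) for b in blocks)
    return 1.0 - residual / total


def meets_target(blocks, asil):
    return spfm(blocks) >= SPFM_TARGET[asil]


# Planned design, then the same design after a late finding that one
# safety mechanism covers less than assumed (all values invented).
planned = [Block("cpu", 100, 0.99), Block("noc", 50, 0.99), Block("mem", 50, 0.90)]
late_finding = [Block("cpu", 100, 0.99), Block("noc", 50, 0.90), Block("mem", 50, 0.90)]

for label, design in (("planned", planned), ("after late finding", late_finding)):
    print(f"{label}: SPFM = {spfm(design):.4f}, "
          f"meets ASIL B target: {meets_target(design, 'B')}")
```

In this invented case the metric drops from roughly 96.8% to 94.5%, which still clears the ASIL B target. A real decision would also weigh latent faults, the probabilistic metric for random hardware failures, and the systematic risk that a late fix itself introduces, which is exactly the conversation the integrator and the Tier 1 need to have.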
It is essential to understand that the ISO 26262 safety standard is a framework. Practical partners are willing to adjust to each other to meet the spirit of the framework by having a common understanding, expectations, and guidelines. In value chain interfaces between companies that provide different types of deliverables (IP vs. software vs. chips vs. systems), there will always be some ambiguity, interpretation, discussion, and eventually common understanding. What makes ISO 26262 work is good preparation, clear communication, and a culture of working together to achieve a shared goal.
Learn more about Arteris IP’s investment in functional safety here.