Verification Of Functional Safety

Part 1 of 2: How do you trade off cost and safety within an automobile? Plus, a look at some of the challenges the chip industry is facing.

Functional safety is becoming a key part of chip design, and an increasingly problematic one for many engineering teams.

Functional safety for electrical and electronic systems is nothing new. It has been an important element of the military, aerospace and medical industries for many years. But the growing importance of functional safety within the automobile industry presents a number of new twists. For one thing, cost is much more important than in other industries where functional safety has been a concern. In addition, while massive duplication was an acceptable approach in the past, growing complexity and design constraints within the automobile make that an untenable solution.

So how exactly do you trade off cost and safety? The semiconductor industry will have to come to terms with this very quickly, because it is moving at a pace that more closely matches consumer electronics than the traditional automotive industry. To make matters worse, the verification of functional safety is putting a strain on an already stretched set of tools, which have to change to meet new requirements.

“The problem of verification, which was extremely complex to begin with, has probably increased by an order of magnitude,” says Apurva Kalia, vice president of R&D in the System & Verification Group at Cadence. “Now, not only do you have to ensure that the chip is working, but you also have to ensure that it will work if there is something unplanned that may happen – a manufacturing effect, an alpha ray impingement, or even a malicious attack, which could render the chip inoperable. How do you make the entire system safe in that context?”

How similar are functional verification and functional safety verification? “They are very different,” says Marc Serughetti, director of business development for automotive solutions at Synopsys. “The challenge that you have within functional safety and the type of tools required is a new area of verification. This problem has to take into account the system perspective when you go beyond the IP and SoC. Another challenge for automotive companies is how to do things earlier, and that includes doing things before the physical hardware is available. So they have a lot of interest in how to test software, how to test the system well before the silicon or hardware is available.”

Safety and ISO 26262
Today, ISO 26262 is the standard that defines what must be done for the verification of functional safety. Bryan Ramirez, strategic marketing manager at Mentor, a Siemens Business, provides a condensed summary of what we need to be concerned about. “Fundamentally, ISO 26262 sets out to define state-of-the-art practices to address two types of failures that could lead to malfunctions: systematic failures and random hardware failures. Security failures are the third type that should be considered, but they are not directly covered by ISO 26262.”

Systematic failures are defined by ISO 26262 to be “failures related in a deterministic way to a certain cause, which can only be eliminated by a change of the design or of the manufacturing process, operational procedures, documentation or other relevant factors.”

“Another way of looking at systematic failures is that they are mistakes or oversights in the design that are the result of human error somewhere along the development process,” explains Ramirez. “By contrast, random hardware failures are unpredictable in how the hardware may fail. ISO 26262 defines random hardware failures as ‘failures that can occur unpredictably during the lifetime of a hardware element, and that follow a probability distribution.’ The idea behind random failures is that a piece of hardware can be designed and built perfectly without any systematic failures in the process. But that hardware is not 100% reliable forever and has some probability of breaking down and failing over time.”
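To make the idea of a failure probability concrete, here is a minimal sketch in Python. It assumes a constant failure rate (an exponential lifetime model), which is a common simplification in reliability engineering and is not something the quoted definition specifies. It also assumes the rate is expressed in FIT, where 1 FIT is one failure per billion device-hours. The function name and parameters are hypothetical, chosen only for illustration.

```python
import math

# Assumption: failure rates are given in FIT (failures per 10^9 device-hours).
HOURS_PER_FIT_UNIT = 1e9


def failure_probability(fit: float, hours: float) -> float:
    """Probability of at least one random hardware failure within `hours`.

    Assumes a constant failure rate (exponential distribution), so
    P(failure by t) = 1 - exp(-rate * t).
    """
    rate_per_hour = fit / HOURS_PER_FIT_UNIT
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-rate_per_hour * hours)
```

Under these assumptions, a part rated at 100 FIT that operates for 10,000 hours has roughly a 0.1% chance of failing, which illustrates why hardware that is built perfectly still needs safety mechanisms once it is deployed across millions of vehicles.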

ISO 26262 puts more emphasis on random hardware failures than the safety standards of any other industry. “In the past this problem could be managed just by doing manual safety analysis with experts to identify how a design could fail and thus how to mitigate those failures,” Ramirez says. “This is still an important part of ISO 26262, but the size and complexity of today’s automotive SoCs make it impossible to think through every possible fault condition and failure scenario.”

On top of that, when different industries are involved, terms can become confusing. “The term functional safety was coined by system companies,” says Kalia. “The term fault tolerance came up from the semiconductor and chip designers. Since the automobile is seen as a system, the functional safety term has prevailed. But at the heart of it, the way it is detected and measured, is all about fault detection.”

Dual approach
Different challenges exist, depending upon the type of company approaching the problem. “There are two types of companies going after the next-generation SoCs for automotive,” says Serughetti. “There are the traditional consumer guys that know how to build big SoCs, but they have never considered functional safety. They have the challenge, ‘How do you apply functional safety from concept and architecture all the way through the development?’ On the other side, you have the more traditional automotive companies that were doing microcontrollers. They understand functional safety but they have no experience with chips that are getting complex and require more computing power, security and networking. The tooling that was in place was working for small designs, but is not scaling for the current designs.”

This change is happening rapidly. “Automotive chips have gone from being industry laggards a few years ago in terms of size and complexity to now being leaders,” asserts Ramirez. “Not only are these devices much more challenging than previous automotive chips but they are significantly more complex than any other ICs being developed for other safety critical markets today. In turn, this is creating opportunities for more automation to make the functional safety process more effective.”

Semiconductor companies have some big surprises in store. “The part that changes is that in addition to quality you have to be able to trace everything that you have done,” says Alexis Boutillier, functional safety manager at ArterisIP. “Tier 1 companies that are buying the SoC want to be able to see how it was built. They want to see the procedures that were used to define the requirements. There is an entire chain of custody that is built upon traceability requirements.”

The question many semiconductor companies ask is how to deploy an existing chip within the automotive industry. “A lot of semiconductor and IP companies tend to focus on functionality and, as a result, safety becomes an add-on,” says Rajesh Ramanujam, product marketing manager at NetSpeed Systems. “It is a step-child. Apart from the performance and area degradation associated with this approach, it can have a negative impact on the functional safety integrity levels that you can achieve. Safety has to be considered a first-class issue, both from a project planning perspective and throughout the development process.”

So the short answer is that it does not work. “There is a quick evolution as people realize this,” Serughetti says. “You need to start at the architecture level and think about the functional safety mechanisms. There is a lot that happens before you get to the verification phase. The key is that functional safety is not the end-of-the-line activity. There are two parallel tracks that need to talk to each other, and they bring together business decisions, cost decisions, safety decisions and, in the future, security decisions. There is a change of mentality that needs to happen for companies coming into this market. For people who are already in this market, it becomes a question of scalability of the environment used for functional verification.”

Boutillier points to three Ps – people, process and product. “When you look at a new product, you have to look at what you can do with your people and what training they need. You have to look at the process to see how to enhance your capability, and then you look at the product to see how you can perform this analysis and build safety into the product. When I look at traditional automotive people, they have an understanding of the safety, but they usually come from simple microcontrollers. When they try to build a very complex design, they are blocked by their way of doing things. They would duplicate everything, and that was a good methodology. But when you have 20 million gates, it is no longer possible. Their issue is one of scaling. They need to be able to partition what they are building.”

So both aspects of the problem run into scaling issues. On the traditional automotive side, the challenge is building in redundancy, such that you can be protected from random failures. On the semiconductor side, it is scaling the verification tools and methodologies so that tools can assert that all failures can be detected and handled without unnecessary duplication.

The path forward
The industry is in flux today, and there is even a divide between what is happening in the real world versus where the industry standards are going. “My prediction is that ISO 26262 will have to change,” says Kalia. “ISO standards tend to be slow to change. And looking at it from the point of view of a gate-level ADAS chip, which is 200 million gates, it is not going to scale. I was hoping that the upcoming version that is scheduled to be released in April would cover this, but it will not. It still focuses on the gate level. Subsequent versions will have to take this to a higher level, at least to RTL and maybe even to the architectural level. Otherwise the problem does not scale.”

This is drawing increasing attention to the system level, even for IP suppliers. “A lot of customers at the IP and SoC level may not have or understand the system-level context, so they are, from an ISO perspective, a safety element out of context,” explains David Hsu, director of product marketing for Synopsys. “They may be trying to make their product suitable for ASIL D applications, and if so, they need to equip their IP or SoC with the right collection of safety mechanisms. Some of those may be hardware and may use redundancy, and that is a way to avoid doing some of the functional safety tasks because it is by definition going to be safe. But that is expensive, so there are other ways that are hardware-based or potentially software-based. You have to have a system-level perspective.”

But you also have to prove that those safety mechanisms are up to the task. “Hardware safety mechanisms are a core ingredient to enable functional safety and they need to be verified from both qualitative and quantitative perspectives,” says Jörg Grosse, product manager for functional safety at OneSpin Solutions. “Since the purpose of hardware safety mechanisms is to protect against random faults, some type of fault injection is recommended by the safety standards to verify them. On a basic level, this means that a tool must inject a fault and verify that the safety mechanism can detect it.”
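The basic loop Grosse describes, inject a fault and check whether the safety mechanism flags it, can be sketched in a few lines of Python. This is a toy model and not how any commercial tool works. It assumes a hypothetical 8-bit stored word protected by a single even-parity bit, and it models faults as bit flips in the stored word. All function names here are invented for illustration.

```python
from itertools import combinations

WIDTH = 8  # data bits; the stored word carries one extra parity bit


def parity(value: int) -> int:
    """Return 1 if `value` has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


def encode(data: int) -> int:
    """Store the data with an even-parity bit placed above the data bits."""
    return data | (parity(data) << WIDTH)


def check(word: int) -> bool:
    """Toy safety mechanism: True when a parity error is flagged."""
    return parity(word) == 1


def inject(word: int, bits) -> int:
    """Model a fault by flipping the given bit positions."""
    for bit in bits:
        word ^= 1 << bit
    return word


def campaign(data: int, n_flips: int):
    """Inject every combination of `n_flips` bit flips; count detections."""
    golden = encode(data)
    assert not check(golden)  # the fault-free word must not raise an alarm
    fault_list = list(combinations(range(WIDTH + 1), n_flips))
    detected = sum(check(inject(golden, bits)) for bits in fault_list)
    return detected, len(fault_list)
```

Calling `campaign(0xA5, 1)` reports that all 9 single-bit faults are detected, while `campaign(0xA5, 2)` shows that none of the 36 double-bit faults are. That gap is the kind of quantitative result a fault injection campaign is meant to expose: the mechanism works, but only for the fault model it was designed against.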

Automobiles are also deploying emerging techniques, such as neural networks, which add other factors into the notion of functional safety. “Neural networks and artificial intelligence are bringing in a new concept in terms of how decisions are made and how you verify it before the system is in place,” says Serughetti. “If you find a problem, how quickly can you update the system? In aerospace, when you find a problem, the industry grounds the entire fleet of airplanes. The automotive industry is different here. You cannot ground all of the cars because there is a defect. There is an evolution where people are trying to figure out how to do verification. There will continue to be all of the current verification techniques used on the hardware, on the software, and on the system. But neural networks bring another dimension, and there are uncertainties.”

Part two of this article will examine both the design for safety approaches being adopted by the industry and the verification tools and methodologies being developed to show that a design is safe.

3 comments

Syed Hussain says:

Nice informative article, when we gonna see part 2 of this article?

Brian Bailey says:

Part 2 should be out the last week of February (Next SLD release).

Rohilla Arbind says:

Very nicely written ! will wait for part 2
