Rethinking Verification For Cars

Second of two parts: Why economies of scale don’t work in safety-critical markets.


New tools, approaches, and methodologies are in various stages of development and deployment under the umbrella of functional safety, as more electronics find their way into cars, medical devices and industrial applications.

As shown in part one, verification needs to be rethought for these applications. Underneath the umbrella will be ways of doing negative testing, as well as ways of categorizing and measuring it. Effectively, this will look like fault-injection technology, in which the failures that can occur are described in electronic form, said John Brennan, product management director at Cadence.

But functional safety is not a layer that can be added on while the details are forgotten. It needs to be architected.

“You must understand how to ensure I have the architectural detail enabled in every component in my design,” said Anush Mohandass, vice president of marketing and business development at NetSpeed Systems. “How do I ensure I go from fail safe to fail operational? The methodologies are being formulated right now.”

Different economies for IoT, automotive
On the surface, there are many similarities in the economic models of IoT and automotive. But Mark Olen, product marketing group manager at Mentor Graphics, says there is a big difference in the amount of time spent on verification for ISO 26262 certification. And that time could rise by a factor of two or three if design teams meet the spec and add failure analysis.

This is where the economics begin looking far less attractive. The last generation of chips that many of these companies designed for the automotive market was not rated as safety-critical. So while verification will take two or three times as long, they might get only a 10% pricing premium.

“Economically, how do you produce a product that only generates 10% more revenue but could cost you two to three times as much in a major part of the design and verification process?” asked Olen. “And where that has a similarity in IoT, it’s got a similar economic value.”

Given these considerations, Olen views IoT and automotive/safety-critical as having very similar economic models. From a technology or market point of view, Mentor sees the safety-critical aspects as similar to what happened with low power 5 to 10 years ago. Originally, it was the wireless industry that cared about it, and low-power techniques were adopted in cell phones early on. The EDA industry tried to figure out if it was a coming trend or just a niche in that particular market. Of course, now there are very few industries that design without regard for power consumption.

Similarly, the EDA industry is looking at safety-critical design methodology and trying to determine if it’s going to go the way of low power, where it starts out in automotive, and spreads to any application where a human is involved.

“We’ve come to the conclusion that it is going to follow the path that low power followed, where a particular industry drove its initial need but then it branched and grew and blossomed from there. If I’m going to get on my smartphone, get on the IoT to turn on my microwave five minutes before I get home to start defrosting my dinner, that’s just as safety-critical because it could burn down my house,” he said.

The impact on verification is significant. “The philosophy is to verify everything as it is supposed to work, so all tests are written to confirm proper operation or that things work properly. We’ve spent very little time analyzing the impact if something actually doesn’t work properly. In manufacturing test, the focus has always been on what happens when something goes wrong—fault simulation. Let’s test every chip and let’s apply this ‘stuck at fault high’ and see what happens, then ‘stuck at fault low.’ Then you move into the design verification realm and it switches to everything that works well and properly. We’ll have to spend more time looking at that impact,” Olen said.
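To make the stuck-at idea from manufacturing test concrete, here is a minimal Python sketch. The two-gate circuit, the net names, and the test vectors are hypothetical and chosen only for illustration; this is not how any vendor's fault simulator is implemented. It forces each net to 0 and then to 1, and counts how many of those faults a given test set can observe at the output.

```python
from itertools import product

NETS = ["a", "b", "c", "n1", "out"]  # hypothetical net names

def simulate(a, b, c, fault=None):
    """Evaluate out = (a AND b) OR c, optionally with one net stuck.

    fault is None (good circuit) or a (net_name, stuck_value) pair.
    """
    def drive(net, value):
        # A stuck-at fault overrides whatever the logic would drive on that net.
        if fault is not None and fault[0] == net:
            return fault[1]
        return value

    a = drive("a", a)
    b = drive("b", b)
    c = drive("c", c)
    n1 = drive("n1", a & b)
    return drive("out", n1 | c)

def fault_coverage(tests):
    """Return (detected, total) stuck-at faults for a list of (a, b, c) tests."""
    faults = [(net, value) for net in NETS for value in (0, 1)]
    detected = [
        f for f in faults
        if any(simulate(*t) != simulate(*t, fault=f) for t in tests)
    ]
    return len(detected), len(faults)

# Two tests that only confirm the output goes high when it should.
positive_only = [(1, 1, 0), (0, 0, 1)]
# Every input pattern, for comparison.
exhaustive = list(product((0, 1), repeat=3))

print(fault_coverage(positive_only))  # (5, 10)
print(fault_coverage(exhaustive))     # (10, 10)
```

In this toy case the two "it works" tests observe only half of the possible stuck-at faults, and every stuck-at-1 fault slips through because the good output is already high in both tests.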

To this end, he spelled out three main areas that need understanding and development for functional safety. The first is understanding what’s critical, what’s not, and whether this can be automated. Second is random-failure analysis.

“A random failure is a failure that occurs to a design in the field while it is operating, and something actually caused a change to the design, either permanently or temporarily. You might have spilled coffee on something, or a gamma ray causes temporary failures because our geometries are sub-22nm. In those cases, it is caused by a change in the design,” Olen explained.

Interestingly, the knee-jerk reaction to this was to resurrect fault simulation, but that’s just not going to happen, he said. “If you run a good circuit simulation on a 10 million-gate design, which is not out of the ordinary today, let’s suppose that it probably takes you a day. But suppose it took you an hour. Now if you’ve got 10 million gates, that would equate to 20 million raw faults. Let’s suppose you’re really smart at collapsing those faults and equivalencing them, and a good compression algorithm could probably get you down by a factor of 10, so those 20 million faults could now be compressed down to 2 million or even 1 million faults. If you were to run serial simulations of each one of those faults times an hour, you’d have a million hours of simulation. We all know that we’ve got parallel fault simulators and concurrent fault simulators. Coming out of the test world, what people forget about is that fault simulators were not invented to analyze designs. They were invented to analyze the tests. They analyze the test and tell you if the tests are good enough.”
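Olen's back-of-the-envelope arithmetic can be reproduced in a few lines of Python. The function name and parameters below are illustrative assumptions. The two-faults-per-gate ratio is inferred from the quoted figures (10 million gates, 20 million raw faults), and a collapse factor of 20 stands in for the "or even 1 million" case.

```python
HOURS_PER_YEAR = 24 * 365

def serial_fault_sim_hours(gates, faults_per_gate=2, collapse_factor=10,
                           hours_per_run=1.0):
    """Return (raw_faults, collapsed_faults, total_hours) for serial simulation.

    Assumes one full simulation run per remaining fault, as in the quote.
    """
    raw_faults = gates * faults_per_gate
    collapsed_faults = raw_faults // collapse_factor
    return raw_faults, collapsed_faults, collapsed_faults * hours_per_run

for factor in (10, 20):
    raw, collapsed, hours = serial_fault_sim_hours(10_000_000,
                                                   collapse_factor=factor)
    print(f"collapse x{factor}: {raw:,} raw -> {collapsed:,} faults, "
          f"{hours:,.0f} hours (~{hours / HOURS_PER_YEAR:,.0f} years)")
```

Even with the optimistic one-hour run time, serial simulation of 1 million to 2 million faults works out to roughly 114 to 228 years of compute, which is why the approach is dismissed for full designs.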

The third area that needs development for functional safety is systematic-failure analysis, Olen said. “A systematic failure is one that occurs in the field, but there is no physical change in the design. The design just fails. That one is interesting because, in some respects, that just means something wasn’t verified. You could call this type of analysis just doing better verification, and it is starting to drive more engineering teams to do coverage-driven verification. What ties in here is the Portable Stimulus standard efforts because it is like a next-gen constrained random testing.”

He believes the work in Portable Stimulus is going to be one of many parts in the safety-critical automotive space because it is meant to help teams do a better job of broader verification. Cadence, Mentor Graphics, Breker, and others are investing here.

With fault simulation effectively off the table for Mentor Graphics, the company is researching derivatives of formal verification. That includes using formal technologies to find single event upset issues, rather than relying on traditional assertion solvers. The goal is to analyze a design for its behavior when a failure is injected.

David Kelf, vice president of marketing at OneSpin Solutions, also believes formal verification is ideal for these systems. “Engineers must build capabilities into the device to allow them to trap and repair random faults that might occur in the device during operation due to electromagnetic interference, overheating or other causes. The verification of these fault-handling circuits requires faults to be inserted in the device during verification, and their outcome on device operation recorded. Interestingly, this verification approach is similar to that of Design for Test from many years back, and similar techniques are often used.”
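As a rough illustration of inserting faults and recording their outcome on a fault-handling circuit, the Python sketch below runs a small injection campaign against a triple modular redundancy (TMR) voter. TMR is an assumption chosen for the example; the article does not name a specific protection mechanism, and the register width and function names are hypothetical.

```python
from itertools import combinations

WIDTH = 8  # hypothetical register width in bits

def majority_vote(r0, r1, r2):
    """Bitwise 2-of-3 majority across three replicas of a register."""
    return (r0 & r1) | (r1 & r2) | (r0 & r2)

def run_campaign(value, flips_per_run):
    """Inject every combination of `flips_per_run` bit flips across the
    three replicas and record whether the voter still returns `value`."""
    sites = [(rep, bit) for rep in range(3) for bit in range(WIDTH)]
    outcomes = {"masked": 0, "corrupted": 0}
    for combo in combinations(sites, flips_per_run):
        replicas = [value, value, value]
        for rep, bit in combo:
            replicas[rep] ^= 1 << bit  # transient upset in one replica bit
        result = majority_vote(*replicas)
        outcomes["masked" if result == value else "corrupted"] += 1
    return outcomes

print(run_campaign(0xA5, flips_per_run=1))  # {'masked': 24, 'corrupted': 0}
print(run_campaign(0xA5, flips_per_run=2))  # {'masked': 252, 'corrupted': 24}
```

Every single flip is repaired by the voter, while the 24 double flips that hit the same bit position in two replicas get through. A campaign like this records exactly that kind of boundary for a protection scheme.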

But it is early days for the functional safety space, and many paths are being explored.

Simon Davidmann, CEO of Imperas, pointed out that historically, fault simulation was about the gate level, and then started to move to the RTL. The problem is that the devices going into cars may have 32-bit ARM or MIPS cores in them, operating at 100MHz and running tens of millions of instructions a second.

“You can’t simulate this with a fault simulator, even at the RTL level,” said Davidmann. “So if you need to demonstrate that it’s tolerant of these memory bit flips, how do you do that? The methodology of fault simulation is a good idea, but the old fault simulators are only good for small blocks of logic. They will not do your systems that include processors. They are only good for RTL. They are good for block-level design. But they are no good to validate your whole system. As the systems get more complex and more dependent on software, the requirements for safety, especially in automotive, are getting more and more stringent, as well. We are finding that with the old tools, the concepts are applicable but the actual tools don’t work. With traditional gate-level/RTL simulation you can’t boot software on it. This is why the big EDA players are getting into hardware assist (emulators) again because they can’t see any other way to do it.”

Other companies, like Imperas, and universities, like UFRGS/LIRMM, are working with fault simulation at the instruction-accurate level. This work processes similar fault sets and gets similar coverage results, but it uses instruction-accurate rather than cycle-accurate (gate-level) simulation, and it runs thousands of times faster than gate-level fault simulation approaches. It is these new approaches that will be most useful for certification and compliance with automotive standards such as ISO 26262.
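The general idea of instruction-level fault injection can be sketched with a toy interpreter in Python. Everything here is hypothetical: the four-opcode instruction set, the program, and the outcome labels are invented for illustration and do not represent the Imperas or UFRGS/LIRMM tools. The simulator steps one instruction at a time with no gate or cycle detail, flips one register bit at a chosen step, and classifies the result against a fault-free run.

```python
# Toy program: r0 = 5 + 4 + 3 + 2 + 1. Entries are (opcode, operand_a, operand_b).
PROGRAM = [
    ("li", 0, 0),     # r0 = 0 (accumulator)
    ("li", 1, 5),     # r1 = 5 (loop counter)
    ("add", 0, 1),    # r0 += r1
    ("addi", 1, -1),  # r1 -= 1
    ("bnez", 1, 2),   # if r1 != 0: jump to instruction 2
    ("halt", 0, 0),
]

def run(flip=None, max_steps=1000):
    """Execute PROGRAM one instruction per step.

    flip is None or (step, register, bit): that register bit is inverted
    just before the instruction at that step executes.
    Returns (r0, steps_executed), or (None, max_steps) if it never halts.
    """
    regs = [0, 0]
    pc = 0
    for step in range(max_steps):
        if flip is not None and flip[0] == step:
            regs[flip[1]] ^= 1 << flip[2]
        op, a, b = PROGRAM[pc]
        if op == "halt":
            return regs[0], step + 1
        if op == "li":
            regs[a] = b
        elif op == "add":
            regs[a] += regs[b]
        elif op == "addi":
            regs[a] += b
        if op == "bnez" and regs[a] != 0:
            pc = b
        else:
            pc += 1
    return None, max_steps

GOLDEN, GOLDEN_STEPS = run()  # fault-free reference result and run length

def classify(flip):
    """Label one injected fault by comparing against the fault-free result."""
    result, _ = run(flip)
    if result is None:
        return "hang"
    return "masked" if result == GOLDEN else "sdc"  # sdc: silent data corruption

def campaign(bits=8):
    """Flip each of the low `bits` bits of each register at every step."""
    counts = {"masked": 0, "sdc": 0, "hang": 0}
    for step in range(GOLDEN_STEPS):
        for reg in range(2):
            for bit in range(bits):
                counts[classify((step, reg, bit))] += 1
    return counts

print(GOLDEN, GOLDEN_STEPS)
print(campaign())
```

Because each run costs only a handful of interpreted instructions, the whole campaign of several hundred injections finishes almost instantly, which hints at why working above the gate level makes software-scale fault campaigns tractable.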

Fundamental differences
When it comes down to it, verification for IoT and verification for automotive are conceptually different. Automotive is forcing people to have standards that validate things because of the safety issues. IoT, regardless of whether it is industrial, automotive, medical, or consumer, is all about interconnected communication between complex processing subsystems. Both areas are concerned about security and the need to protect themselves from intrusion and harm, but the verification required in each is different.

Still, as the challenges evolve, the tools must also evolve.

“The old simulators can only run 10 million instructions per second, but they now have to run 100 million just because the performance of the hardware in the electronic product has increased,” Davidmann said. “So you have to do much more verification in the same time. If you’ve got a million lines of code, you can’t run it over a month. You’ve got to do it in a day or an hour because everybody is checking code changes in, and you need to use modern software development practices like continuous integration. Do we need new tools? Yes, we do, whether they are the same and just faster. But absolutely the tools need to be evolving and improving their performance as fast as the performance increases in the products.”
