Variables Complicate Safety-Critical Device Verification

Experts at the Table: What’s the best way to approach designs like AI chips for automotive that can stand the test of time?


The inclusion of AI chips in automotive and increasingly in avionics has put a spotlight on advanced-node designs that can meet all of the ASIL-D requirements for temperature and stress. How should designers approach this task, particularly when these devices need to last longer than the applications? Semiconductor Engineering sat down to discuss these issues with Kurt Shuler, vice president of marketing at Arteris IP; Frank Schirrmeister, senior group director, solutions marketing at Cadence; Ted Miracco, CEO of Cylynt; Dean Drako, CEO of Drako Motors; Michael Haight, director of business management, Micros, Security & Software Business Unit at Maxim Integrated; Neil Hand, director of marketing for digital verification technologies at Mentor, a Siemens Business; Sergio Marchese, technical marketing manager at OneSpin Solutions; Marc Serughetti, senior director, verification group at Synopsys; and Hagai Arbel, CEO of Vtool.

SE: Where does the industry stand with the task of verifying safety-critical devices today?

Haight: It doesn’t seem universally adopted in all systems of cars shipping today, though robust security solutions exist in the form of hardware security modules (HSMs) from a number of semiconductor suppliers. At Maxim, we are beginning to see an uptick in interest in our automotive secure authenticator for authenticating safety-critical automotive components where an HSM might be overkill in physical size, complexity, power consumption, and cost. We also see standards such as ISO 21434 nearing publication that call for security to be considered at all phases of a car’s life cycle. While the standard is not expected to call out specific algorithms or implementations, we believe this and government regulation like the EU Cybersecurity Act will drive increasing requirements from OEMs for automotive authentication of safety-critical devices.

Miracco: When it comes to safety-critical devices, there is a tremendous number of vulnerabilities. Software is especially vulnerable, and it needs better code hardening, a better process around software security, and better management of that software. We also see a lot of espionage, and the semiconductor industry is a highly targeted industry. It’s a very vulnerable industry, and there has been a lot of theft of not only source code, but design code and IP. The semiconductor industry needs better accountability. There’s a lot to be done, and I don’t think we’re in a very good place. A lot of industrial espionage is going on, and a lot of IP is not being properly protected, and not being paid for from a royalty perspective when it gets used.

Arbel: First, safety-critical verification means that verification needs to be done by the book. Basically, if verification is done by the book, it meets, by definition, most of the requirements of ISO 26262. Of course, that’s not happening yet, so it forces companies to do it in a better, more constructive way. Second, there is really no ‘done’ in verification. You go to tape-out when you feel comfortable enough, or safe enough, that your device is going to function. But now you have to prove it, and when lives depend on that, you need to somehow converge the cycle and measure your verification maturity much better. In that sense, there should be better tooling for verification convergence, verification debug, and verification maturity. The whole definition of ‘done’ in verification is going to take on a more serious role in the chip design cycle.

Hand: The big challenge when it comes to safety-critical designs is that for decades we’ve been building designs for functionality. We design for that functionality, and we verify by asking whether we can break that functionality. When we look at safety-critical designs, we’re looking at designing for resilience, whether that be for functional safety or security. In many ways, it’s trying to verify how things will function incorrectly — ‘What’s going to happen when things go bad?’ It’s a different mindset, and it’s a mindset that we have to provide tools for. It’s a way of saying when, not if, something goes wrong in ways it was never designed for, because you’re assuming an underlying hardware fault or an inconceivable way of disrupting the design, either for safety or for security. You’ve got to give people that insight, and that’s a very different way for both designers and architects to think. It’s fundamentally different. We’ve got decades of figuring out whether a design works, and how it fails when we intend it to fail — constrained random, complex verification environments. Once you get to functional safety, it’s a completely different mindset. Safety criticality, whether through security threats or through random failures, is a whole new way of looking at the problem. It’s a whole new set of skills that designers need to learn, and as an industry we have to enable them with the tools to do that.

Serughetti: When we look at what’s happening in the market, functional-safety and safety-critical aspects are really important. First, functional verification is the starting point for good functional safety and security. What we’re seeing, and what we’re looking at, are multiple dimensions that are changing. Safety is not standalone. It goes with security. Everything is linked together. Second, you don’t address safety in one place. There’s the design aspect, and there’s the verification aspect. The third dimension is that safety is not hardware, and it is not software. It’s both of them — and the system, as well. Complexity in the activities related to safety-critical design is increasing, and as a result we need to think about verification, but also about more intelligent ways to look at those aspects, because the systems are becoming more complex.

Schirrmeister: I agree there’s a need for mirroring the verification flow and the question of when we are done with a safety verification flow. There is no security without safety, and no safety without security. They are intertwined. From a holistic perspective, a couple of additional elements need to be added. One is the holistic portion across the analog and the digital domains. It’s not just the digital piece, or the digital piece with the software piece. As Marc already pointed out, it includes analog components. Holistic also means going across the design, verification, and implementation flow. That’s why some of the items currently being worked on in the standardization committees are very important to get to a holistic view of the data formats. The last element of a holistic view requires some individual specificity across application domains. We see safety today in automotive and quite a bit in aero/defense, but it goes across domains. When you talk to health-care people, suddenly new standard numbers come up. When you talk to industrial people, new standard numbers come up. They’re always somewhat related, but they address the specificity of the different application domains.

Marchese: As far as the state of safety, we have seen over the past 10 years that the importance of safety has increased a lot, and now knowledge and awareness are a lot more widespread and expertise is available, particularly in the pre-silicon hardware development flow — and that’s great. A lot more is being done today. I see a lot of my old colleagues from 20 years ago — back then we didn’t know what safety was. Now everyone is exposed to safety at all stages of the flow. As everyone here said, security is now coming up strongly, and there is no safety without security, for sure. Again, to achieve security by design, to integrate security into the whole hardware development flow, there’s going to be a lot of learning to do. There are standards coming up, like the automotive standards, which for sure are going to be very important. The industry is going to learn how to really integrate security, particularly for safety applications, because it’s very important. More awareness and knowledge are going to spread across the engineering community. That’s the next big thing.

Shuler: At the chip level we still have a situation where the verification people and methodologies are separate from the functional safety people and methodologies. This results in some overlap and rework. As tools and data interchange standards (like IEEE P2851, being led by both IEEE and Accellera) mature, we’ll be able to have more automation, where functional safety validation through fault injection can be executed as part of regular verification processes. This will help everyone in the industry have more confidence that products don’t regress in diagnostic coverage as new versions are developed, and will enable integrators and users of safety-critical systems to more easily perform fault injection validation of safety mechanisms if they desire.
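The fault-injection validation Shuler describes can be illustrated with a minimal sketch. This is a toy model, not any vendor’s flow: it assumes a triple-modular-redundancy (TMR) voter as the hypothetical safety mechanism, flips one bit at a time in each redundant copy, and reports the fraction of injected faults the mechanism masks — a stand-in for a diagnostic-coverage metric.

```python
# Toy fault-injection sketch (illustrative only): measure how many
# single-bit faults a triple-modular-redundancy (TMR) voter masks.

def voter(a: int, b: int, c: int) -> int:
    """Bitwise majority vote across three redundant copies."""
    return (a & b) | (b & c) | (a & c)

def inject_faults(value: int, width: int = 8):
    """Yield a faulty triple for every single-bit flip in each copy."""
    for copy in range(3):
        for bit in range(width):
            copies = [value, value, value]
            copies[copy] ^= 1 << bit  # single-event upset in one copy
            yield copies

def diagnostic_coverage(value: int, width: int = 8) -> float:
    """Fraction of injected single-bit faults masked by the voter."""
    total = masked = 0
    for a, b, c in inject_faults(value, width):
        total += 1
        if voter(a, b, c) == value:  # fault masked -> safe outcome
            masked += 1
    return masked / total

print(diagnostic_coverage(0b10110010))  # TMR masks all single faults: 1.0
```

Real flows inject faults into gate-level or RTL models under a simulator rather than into Python integers, but the bookkeeping — enumerate faults, observe outputs, classify as detected, masked, or dangerous — has the same shape.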

SE: To address safety and security with the multiple dimensions and specific needs in automotive as far as safety critical aspects, do developers need an application-specific operating system for automotive?

Drako: Yes, we do. What are the major categories of electronics? Desktop and laptop PCs, phones, gaming machines, data centers, and cars. Those are the high-volume applications. For the phone, there is iOS and Android. People tried to do it with Linux, but it didn’t work. For the desktop, there is Windows and Mac. For the data center, there is Linux and Windows Server. For game machines, there is the Sony OS and whatever the Xbox comes with, which is a dedicated OS for the Xbox. What’s there for the car? Nothing. Every automotive maker is trying to make its own thing, so there are 20 or 30 different OSes. The only one that’s halfway decent is Tesla’s, because it’s based on Linux, but it’s evolving way beyond Linux. It’s like the iOS of cars. The other OEMs have nothing. Tesla is like the Apple of cars, and what Tesla has done is used an x86 PC, put in graphics from Nvidia, and put one computer in the car. They worked on software to make it do what it needs to do. On the other hand, if you look at a Ford or a Chrysler or a Ferrari or a Mercedes or a BMW, they have not one computer in the car, but 100, and they try to make them talk via a CAN bus. The Tier 2s make their own computers, and the OEM tries to hook it all up, and it is basically a mess that costs a lot more money. What they need is one OS.

Haight: Interesting question. Who would develop this? While it certainly might be helpful for developers to have a common OS for development, absent government mandates, I don’t expect this would ever be driven by automotive OEMs. This is not likely their area of expertise, and it may be hard for them to monetize it if it is focused more on “under the hood” architecture than on user experience.

If we take PCs and smartphones as case studies, these are markets that started small, and a couple of players developed their OSes as their industries grew, steering more people to their products and services until they eventually dominated the space. In the PC space this was Microsoft and Apple. Then a similar situation played out with Apple and Google in smartphones. Even giant Microsoft failed in the smartphone space. But the automotive industry is very mature already, and will the car’s under-the-hood OS really be a selling point for the consumer? It seems difficult to envision any one OEM developing an OS and then dominating with it in either an open-source or closed-source model. On the other hand, as cars become increasingly interconnected with V2X, I envision that players like Microsoft, Apple, Google, and perhaps even Amazon will investigate business models centered around an OS that facilitates connection to their cloud service offerings. How deeply would such an OS penetrate into the car’s architecture, and would consumers have a choice to switch? The deeper it penetrates into the architecture, the more it might help safety-critical system development, but the harder it would be to offer consumers multiple OS choices. This would likely tie an OEM down more than it wants to a given OS vendor. My short answer is that I don’t see this happening any time soon for a deep under-the-hood OS.

Hand: It’s an interesting idea, but the reality is that we’ve always had multiple solutions to a single problem. Anything that is a homogeneous solution never seems to last very long. The most important thing is to have an understanding of the challenge you’re trying to solve, and to have the tools and the capabilities to address it. We may have a very specific operating system, but there are already operating systems designed for automotive. There are operating systems designed for mil/aero purposes, and so on. But there’s never one. There’s always going to be a new design with new requirements that the OS may not support. The question is, should you rely on that OS provider to do everything you need? The answer is that’s just not how the industry works. The industry finds an opportunity. The industry then grows and addresses that opportunity. There are layers below the OS. You have to make sure that hardware and software work reliably. Can you inject errors into software, and see those errors be propagated through the software and the hardware and get to the endpoint? The idea of an application-specific operating system is a good one. Do I think it’s the solution? No. There will always be two or three different versions, and there always has to be an underlying methodology that allows us to ask, ‘How safe is this? Can we bombard it with security threats? Can we inject random faults, not just in hardware but in software, and see how they propagate through the system?’ That’s the challenge. How do we give people the tools to say, ‘If you’ve got a perfectly working system, go use it, it’s all good’? To rip off Jurassic Park, ‘Nature finds a way.’ There will be another operating system. There will be another way to do it.

Miracco: I agree with Neil. I don’t think one operating system is sufficient. That’s going to be way too rigid. The only way it could work is if it were open source, because you need a means of addressing zero-day threats, and you need to be able to fix things quickly, because nature will find a way. You cannot leave those vulnerabilities open for years while committees get together trying to figure out how to secure the wide-open barn door.

Schirrmeister: Don’t we already have it? There are already layers of these OSes. If you look at automotive-specific OSes, they already have a vertical emphasis in Autosar, Genivi, and others. Some have open-source flavors and collaborative flavors. They handle the automotive-specific abstraction. Autosar and its various incarnations help the specifics of the design chain — semiconductor, Tier 1, and OEM — to interact efficiently. There is a layering. There is no one-size-fits-all. There are also trends with containerization, where people create hypervisors, and on top of them you layer the various other components, like, for example, the multimedia-related aspects. If you do it more like Genivi, or even Android, you can containerize the secure elements. Some of it is there, but nature will find its way.

Serughetti: You cannot lock people into a technology, because if you’re just driven by technology you’re basically going to stifle innovation. Great innovation always comes from doing something different. When it comes to safety-critical design, there are multiple aspects. There’s the technology, there is the mentality of safety-critical development when you do develop your product, and then there are the tools that enable you to verify for compliance, to drive compliance. You can’t lock yourself into a specific technology. It’s a complete solution you need to have. It’s a state of mind that you have to have for safety-critical aspects.

Shuler: Not sure what you mean by “application-specific.” They definitely need an RTOS that has been analyzed for functional safety compliance, covering both diagnostics and development process. The common commercial ISO 26262-compliant automotive RTOSes I know of are QNX, ThreadX, Integrity, and eMCOS. I’m sure there are more. Some of the commercial providers offer application-specific software “platforms” that bundle the RTOS with software middleware and applications for ADAS, security, etc. In addition, there are some open-source RTOSes that have been analyzed for ISO 26262 compliance, including SAFERTOS and Zephyr RTOS. A lot (most?) of software is developed to the AUTOSAR architecture. This is a good thing, because there is more and more virtualization of software tasks and applications within these embedded systems, and an agreed-upon system-level software architecture is required to allow software from different teams and companies to interoperate, especially when we’re mixing real-time deadlines with virtualization.

Hand: I wanted to follow on something Frank said, because he raised an interesting point. If you look at some of the modern safety-critical systems, it’s no longer one OS. There is containerization. It’s true. If you look at some of the systems we’re dealing with, there are multiple OSes on a hardware platform that spans multiple boards and multiple systems, and there’s not a single answer. So you have to figure out how to deal with these complex systems. It’s a level of complexity we haven’t dealt with even in normal cloud compute or home compute or any of those kinds of things. It’s really complex, and when you look at it, it’s not a single vulnerability. It’s a cascade of vulnerabilities, which is always where any systemic failure lies. In the traditional sense you had a single OS, a single hardware platform, a single software platform — you secure that, you’re all good. But as Frank pointed out, that’s not most modern systems. Most modern systems have multiprocessor containerized environments and multiple OSes, and that is part of what Ted was alluding to earlier. It’s not a single solution. You’ve got to have a systemic approach to the problem.

Marchese: In automotive, the difficulty with the architecture is that it’s changing a lot anyway, so we will see changes on the OS and the software side. We are going from independent ECUs to more centralized systems, with a lot more sensors connected. Then there are all the security requirements coming up, as well as completely new functions, like AI for vision and autonomous driving. A lot of changes are coming that will affect the software at all layers, from the OS to the upper layers.

