Verification And Test Of Safety And Security

Functional verification is being stretched beyond its capabilities to ensure safe and secure systems. New support is coming from hardware and software.

Functional verification can cost as much as design, but new capabilities are piling onto an already stressed verification methodology, leaving solutions fragmented and incomplete.

In a perfect world, a semiconductor device would be verified to operate according to its complete specification, and continue to operate correctly over the course of its useful life. The reality, however, is that this is becoming much more difficult. Chips are being deployed in more mission-critical and safety-critical applications. Shrinking device geometries make random faults more likely. Aging can cause chips to degrade over time. And it has become very lucrative to force devices to do things outside their intended functionality.

The list of new problems is extensive. Quite often, additional capabilities are inserted into a design to deal with these issues. This places additional burden on the verification and testing of the device, both during the design phase and throughout the entire lifecycle of the product.

The entire subject area is huge, but it can be visualized in a very simplified form as a three-dimensional matrix (see Fig. 1), where the axes are pre- and post-silicon, safety and security, and hardware and software. Many issues span large areas of the total matrix, and while some of them may be viewed separately today, they are all tightly intertwined. For example, many companies have separate safety and security experts, but you cannot have a safe system that is not secure. There also is a separation between hardware and software teams, even though designs rely on tighter integration to improve power and performance. And the emerging continuum between pre- and post-silicon is creating new opportunities.

Fig. 1: Matrix of issues associated with safety and security. Source: Semiconductor Engineering

“Safe, secure, and reliable products require sophisticated methods,” says Roland Jancke, design methodology head in Fraunhofer IIS’ Engineering of Adaptive Systems Division. “These need to be established within companies as well as across the supply chain, implemented in respective design tools, and followed during the complete development and validation process.”

But the design industry has avoided adding the necessary methods and circuitry because they are expensive. “Safety is not free,” says Jake Wiltgen, functional safety and autonomous solutions manager at Siemens Digital Industries Software. “There are going to be power, performance, area impacts. What is your safety target, what’s the impact to your silicon footprint, and what’s your power budget? It is a bit of a balancing act.”

Others agree. “Safety requires a lot of investment,” says Robert Serphillips, product manager for Veloce, DFT, and functional safety at Siemens. “It’s a lot of man hours, a lot of chugging along by simulation and emulation to prove these things out. Safety is not a point tool that you go out and buy. It’s an entire change of the way you think about designing your chip and building your chip, validating your chip.”

In the past, verification was done before the device went to tape-out. Now it needs to happen throughout the design flow, and in some cases even in the field. “Safety and reliability requirements need to be understood and managed over a wide set of operating conditions and increasingly longer product lifetimes,” says Dan Alexandrescu, R&D engineer, senior staff at Synopsys. “The unprecedented use of cutting-edge technological fabrication processes, advanced IPs, and complex designs exposes automotive IC and solutions providers to risks caused by process variability, aging, and degradation.”

Standards only add to the cost and the potential disruption. “You need a whole bunch of new processes because there are regulations that you need to adhere to,” says Johannes Stahl, senior director of product line management for the Systems Design Group at Synopsys. “That causes a certain amount of new effort, or you could call it overhead. This is overhead that verification teams were not previously aware of. They had to reconsider their flows, and basically do more verification runs to make sure that not only the functional bugs are being caught, but also that safety relevant scenarios are being investigated. And there are many of them that involve both software and hardware.”

There is always a fine line when self-certification is allowed. “Sometimes you will need to have somebody independently look at it,” says Frank Schirrmeister, vice president of marketing at Arteris IP. “There are businesses that concentrate on certifications and compliance — organizations that take an independent look at what you have been doing. That increases the comfort level of the consumer, or the purchaser of your product. Some of them are executable standards, like system readiness, and that extends into the hardware/software perspective with efforts like the Scalable Open Architecture for Embedded Edge (SOAFEE) collaboration for automotive.”

But simulation alone is not enough. “Simulation can help with safety, qualifying how resilient things are to faults, and with security,” says Simon Davidmann, founder and CEO of Imperas Software. “Fault simulation is not a new technology, but many people are now using it for the first time. The future is better technologies in the hardware, and that requires new verification technologies and thinking differently about how things are done. As we move to better hardware architectures, we will need to build better verification and analysis technologies.”
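The mechanics behind fault simulation are straightforward to illustrate. The toy Python sketch below is purely illustrative, not any vendor’s tool; the circuit, node names, and test set are all invented. It enumerates single stuck-at faults on a tiny netlist and counts how many of them a set of test vectors exposes, which is essentially what a fault simulator reports as fault coverage:

```python
# Toy single-stuck-at fault simulation. For each fault site, the circuit is
# re-evaluated with that node forced to 0 or 1, and a fault counts as
# detected if any test vector produces an output that differs from the
# fault-free ("golden") response.

from itertools import product

def circuit(a, b, c, fault=None):
    """Tiny example netlist: out = (a AND b) OR (NOT c).
    `fault` is an optional (node_name, stuck_value) pair."""
    def val(name, computed):
        # Force the node to its stuck value if it is the fault site.
        if fault and fault[0] == name:
            return fault[1]
        return computed
    a, b, c = val("a", a), val("b", b), val("c", c)
    n1 = val("n1", a & b)         # AND gate
    n2 = val("n2", 1 - c)         # inverter
    return val("out", n1 | n2)    # OR gate

FAULT_SITES = ["a", "b", "c", "n1", "n2", "out"]
tests = list(product([0, 1], repeat=3))  # exhaustive here; real test sets are not

all_faults = [(site, v) for site in FAULT_SITES for v in (0, 1)]
golden = [circuit(*t) for t in tests]
detected = sum(1 for f in all_faults
               if [circuit(*t, fault=f) for t in tests] != golden)
print(f"fault coverage: {detected}/{len(all_faults)}")
```

Real fault simulators work on gate-level netlists with millions of fault sites and use heavily optimized algorithms, but the coverage metric they produce is the same idea.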

Safety and security
Safety and security have a lot of similarities and overlaps. “There are differences that need to be considered,” says Siemens’ Wiltgen. “When we think about safety, there is a threat space that is static. It’s known and is often quite expansive. There’s a lot of methodology and tooling required to ensure — at least from a safety standpoint, during runtime operation of an IC or a larger system — that it will operate correctly. And that’s not just functioning correctly according to requirements, but also behaving correctly in the presence of random hardware failures. This is a known threat space, and even that is often hard to close during development. Where it differs from security is that you now have to deal with new attack vectors. The problem becomes dynamic.”

You can’t have one without the other. “If you have a safety issue, you can have a fault in a design. And suddenly, because of that fault, your system is now insecure and is spilling out all kinds of secure content,” says Manish Pandey, vice president of engineering for the EDA Group at Synopsys. “There is a strong relationship between the two. There are various methodologies that address both parts, but I don’t think I have seen any good, unified approach where the two concerns are considered together. Faults can have a fairly significant impact on security.”

“And vice versa,” adds Synopsys’ Stahl. “Somebody can tamper with a system and get into your system, and then your safety is really affected. It’s a dependency in both directions from a technical perspective, but the disciplines are specialized. Companies have people who are concerned with functional safety, and people concerned with security, very specialized, and it’s unclear if they even talk.”

The dynamic nature of threats means that future designs may have to take into account things found in the field. “This is where feedback loops during the operational life of an IC are important,” says Wiltgen. “It’s going to continue to grow in importance — being able to look at data, predict data, watch what is happening at runtime, feed that data back into development life cycles so that you can understand failures when they occur, even predict failures prior to occurrence.”

Other aspects of the matrix besides safety and security also are separated today. “Safety and security practices really have to be extended,” says Synopsys’ Pandey. “Today it is also compartmentalized on the hardware side and the software side, and further compartmentalized through the complete integrated stack. The systems of the future, when you reason about them, should really consider the whole hardware, firmware, and software stack. That’s where more and more of these security and safety practices will be heading.”

Pre- and post-silicon
While functional verification and safety analysis used to be done exclusively pre-silicon, it is becoming increasingly difficult to maintain this separation. Not only does aging impact safety, sometimes in as-yet-unknown ways, but new security threats often require device updates through firmware or software patches.

“Safety is a multi-layered approach that starts at design, through verification, through manufacturing test, through into production,” says Siemens’ Serphillips. “It’s not a product. Safety is not something you can think about after the fact. You can’t build a chip and then add some safety software to it. Is that being done in the industry? Yes, but there’s a lot of challenges associated with that.”

Manufacturing defects used to be detected at test time, and assuming a burn-in process was used to weed out the infant mortalities, devices were expected to continue working for their specified lifetime. “Manufacturing defects or circuit sensitivities involve process, voltage, or temperature, and are detected during test,” adds Serphillips. “Those are your defined operating conditions, but once you get into the field, you can’t always guarantee that all parts are going to behave the same way. You could have test escapes or circuit sensitivities that worked fine in test, but when you put it out into the real world, into a vehicle or a tower or something like that, it starts behaving slightly differently than what you anticipated in the controlled environment.”

Standards are catching up. “To address these new challenges, the working group behind ISO/TR 9839, ‘Road vehicles — Application of predictive maintenance to hardware with ISO 26262-5,’ established a list of recommendations for the third edition of ISO 26262,” says Synopsys’ Alexandrescu. “In this proposal, degrading intermittent faults caused by aging must be managed with similar consideration as permanent and transient faults, from early modeling and design stages to in-field.”

New process geometries have demonstrated shorter lifetimes due to a number of physical effects. “Functional verification can be used to qualify the part in the first place, but how do I monitor this over the lifetime to meet the higher ASIL levels like C, and especially D?” asks Pete Hardee, group director for product management at Cadence. “You need to incorporate failure detection circuitry. That could be memory BiST, logic BiST, those kinds of things. To meet those higher levels, the effect on the actual design process means you have a lot more built-in test. This is not manufacturing test. You’re not detecting faults that were introduced at the manufacturing stage. You are detecting faults that happen due to aging, changes in thresholds, and failure due to end of life of components.”
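The memory BiST engines Hardee mentions typically run a march algorithm over the array. As a simplified sketch, the Python below models the classic March C- sequence in software. The memory model and names are invented for illustration; in silicon this is a hardware state machine exercising the physical array:

```python
# Software model of March C-, a classic algorithm behind many memory BIST
# engines. Each element walks the address space in a fixed direction,
# reading an expected value and writing its complement. The sequence can
# detect stuck-at, transition, and many coupling faults in the array.

def march_c_minus(mem):
    """Run March C- over `mem` (a list-like array of 0/1 cells).
    Returns True if the memory passes, False on the first mismatch."""
    n = len(mem)
    up, down = range(n), range(n - 1, -1, -1)
    elements = [
        (up,   None, 0),    # (w0): initialize every cell to 0
        (up,   0,    1),    # ascending (r0, w1)
        (up,   1,    0),    # ascending (r1, w0)
        (down, 0,    1),    # descending (r0, w1)
        (down, 1,    0),    # descending (r1, w0)
        (down, 0,    None), # (r0): final read-back
    ]
    for order, expect, write in elements:
        for addr in order:
            if expect is not None and mem[addr] != expect:
                return False          # fault detected at this address
            if write is not None:
                mem[addr] = write
    return True

class StuckCell(list):
    """Memory model in which one cell is stuck at a fixed value."""
    def __init__(self, size, stuck_addr, stuck_val):
        super().__init__([0] * size)
        self._addr = stuck_addr
        super().__setitem__(stuck_addr, stuck_val)
    def __setitem__(self, addr, value):
        if addr != self._addr:        # writes to the stuck cell are lost
            super().__setitem__(addr, value)

print(march_c_minus([0] * 16))             # fault-free memory: True
print(march_c_minus(StuckCell(16, 5, 1)))  # cell 5 stuck at 1: False
```

An in-field BiST controller runs sequences like this at power-on or during idle windows, which is how aging-induced failures can be caught long after manufacturing test.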

Hardware and software
It has been said many times that modern systems are defined by their software, and yet software and hardware are still developed in separate silos. Rarely do they come together for a unified verification phase, either pre- or post-silicon, and they certainly are not certified together.

Standards like ISO 26262 remain heavily hardware-centric. “In the automotive industry, we take a too hardware-centric view of things,” says Synopsys’ Pandey. “If you look at the infrastructure built by some of the hyperscalers, they have tens of thousands of machines and hardware constantly fails. The system almost never goes down because they have built a sophisticated software layer. They have distributed systems on top of this to make sure the systems are highly available, protected against multiple faults. Some of those techniques will increasingly be adopted. Despite hardware failures, they automatically isolate the bad hardware. The software stack in automotive and other critical systems has to evolve.”

That, in turn, means software needs to be tested while faults are injected into the hardware. “Some companies are doing this,” says Imperas’ Davidmann. “They are trying to see if the software is resilient to things like bit flips. That can be modeled accurately by bit flipping in the virtual platform, because the software can’t tell it’s not running on the real hardware anyway.”
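At its simplest, such a campaign re-runs the same workload many times, each time flipping one bit of state, and classifies the outcome. The Python sketch below is a hedged illustration of that idea; the workload and all names are invented, and a real campaign flips bits in the registers or memory of an instruction-set simulator while the production software runs:

```python
# Toy bit-flip fault-injection campaign. Each run corrupts one bit of the
# workload's input state and checks whether the fault is logically masked
# or silently corrupts the result. Resilient software would catch the
# corruption cases itself, e.g. via redundant execution and comparison.

import random

def workload(data):
    # Stand-in for the software under test.
    return max(data)

random.seed(1)
data = [random.randrange(256) for _ in range(64)]
golden = workload(data)

outcomes = {"masked": 0, "silent_corruption": 0}
for _ in range(1000):
    corrupted = list(data)
    word = random.randrange(len(corrupted))
    bit = random.randrange(8)
    corrupted[word] ^= 1 << bit     # the injected single-bit upset
    result = workload(corrupted)
    outcomes["masked" if result == golden else "silent_corruption"] += 1

print(outcomes)
```

The interesting output of such a campaign is the ratio between the two buckets: masked faults need no handling, while the silent corruptions are exactly what safety mechanisms and resilient software must detect.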

Understanding the interactions of both hardware and software is essential. “In the case of a combined hardware-software system, you really have to be taking the analysis down to the hardware level during the initial architecture phases, introducing clear boundaries and clear descriptions of what is going to be handled in hardware, what’s going to be handled in software, and how those things will interact with each other,” says Mike Borza, scientist in Synopsys’ Solutions Group. “Some of the automation that is provided by the software integrity group doesn’t work well at the deeply embedded level. That’s not a criticism of any of the work that they’re doing. It’s more of a reflection of the different markets, because a lot of their market is really concerned with end user applications. It’s appropriate for those to be looking at the world as a mostly software world running on fairly large classes of compute devices.”

But should software be part of the certification process? “This is perhaps more of an issue for cybersecurity than it is for functional safety,” says Cadence’s Hardee. “For functional safety, while we know software can be buggy and software can react differently to different inputs, software doesn’t have a failure mechanism over time in the same way that hardware does. From the functional safety point of view, introducing software upgrades could introduce issues if you start to use the hardware in ways that it wasn’t tested for. But there’s no real way to test for that. The certification, at least at the component level, is purely on the hardware.”

Conclusion
This article has barely defined the space encapsulated by the problem statement. Addressing these problems requires paying attention to the architectures of both the hardware and the software, the development processes, and the tools used. Increasingly, this is not being left to current verification techniques alone, but is being extended with additional hardware support in the devices that can monitor what is happening and both identify and deal with failures or intrusions.

The industry is still in a learning phase. Many of the standards and practices in use are far from complete or ideal. Even within companies, segregated teams and responsibilities make it difficult to create unified approaches to deal with the issues. Perhaps that is necessary while the industry learns, but it needs to realize that it is not an ideal situation and look for ways to bring the necessary disciplines together as quickly as possible.


