Experts at the Table: Keeping systems running for decades can cause issues ranging from compatibility and completeness of updates to unexpected security holes.
Semiconductor Engineering sat down to discuss the myriad challenges associated with chips used in complex systems over longer periods of time with Jean-Marie Brunet, senior director for the Emulation Division at Siemens EDA; Frank Schirrmeister, senior group director for solution marketing at Cadence; Maurizio Griva, R&D Manager at Reply; and Laurent Maillet-Contoz, system and architect specialist at STMicroelectronics. This discussion was held at the recent Design, Automation and Test in Europe (DATE) conference. To view part one, see “Big Challenges In Verifying Cyber-Physical Systems.”
SE: Chips in complex systems are expected to last a lot longer than in the past. In cars, that’s more than a decade. In industrial robotics, it’s more than 20 years. How does this change design and verification?
Schirrmeister: We’re seeing a shift left of early development, and then there’s a feedback loop, which leads to domain-specific architectures. So you build in the ability to maintain and upgrade, to a certain extent. This is part of the idea of long-lifecycle markets, which include aerospace and defense, where you have various upgrades throughout the lifecycle of a product. Divide-and-conquer is one aspect of this. You need to make sure the individual components have interfaces that stay constant enough over time that you can change something inside. But you also need to build in the flexibility to upgrade. That’s where the whole notion of configurable processing comes in. With the software, how do you make sure you leave enough room for upgrades? It adds a new challenge to this whole notion of design for performance and cost. If you design your component to run at a 95% to 99% load level, and then you need to add something later, that poses a challenge because you have to replace it with a bigger component. Those are new design considerations. And that leads to the whole notion of interfacing EDA tools with tools from the systems world, such as product lifecycle management (PLM).
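The headroom problem Schirrmeister describes comes down to simple arithmetic. The Python sketch below is only an illustration: the `ComputeBudget` class, its fields, and the abstract "load units" are hypothetical, not taken from any tool or from the panelists. It shows how a part sized at 97% load leaves almost nothing for a later feature, and roughly how large a replacement part would need to be.

```python
from dataclasses import dataclass


@dataclass
class ComputeBudget:
    """Hypothetical capacity model: all quantities in the same abstract load units."""
    capacity: float       # what the chosen component can sustain
    current_load: float   # what the shipped software already uses

    def utilization(self) -> float:
        return self.current_load / self.capacity

    def headroom(self) -> float:
        return self.capacity - self.current_load

    def fits(self, added_load: float, max_utilization: float = 1.0) -> bool:
        """True if a later upgrade adding `added_load` stays under the utilization cap."""
        return self.current_load + added_load <= self.capacity * max_utilization

    def required_capacity(self, added_load: float, target_utilization: float) -> float:
        """Capacity a replacement part would need to carry the upgrade at the target level."""
        return (self.current_load + added_load) / target_utilization


tight = ComputeBudget(capacity=100.0, current_load=97.0)  # sized at 97% load
print(tight.headroom())                                   # 3.0 units left for future features
print(tight.fits(10.0))                                   # False: a 10-unit feature forces a bigger part
print(round(tight.required_capacity(10.0, 0.97), 1))      # about 110.3
```

Reserving headroom up front (a lower `max_utilization` at design time) costs more silicon today, which is exactly the performance-versus-cost tension raised above.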
Brunet: Until a couple of years ago, when you had a problem with a car, you would take it to the dealer, and most of the time it required a hardware replacement. Now companies update the software in your car over the air. The software is completely changed. What’s really happening is they’re doing preventive maintenance. But to be able to connect to certain high-tech components and do preventive maintenance locally on the system, you need to design that in from the get-go. The hardware and the software have to be flexible enough to do that, and the technology has to be able to connect. It’s a complete integration between the software and the hardware, and you need that flexibility from the very beginning of the design.
Maillet-Contoz: That calls for new design methods. We have design for test. Now we have, in a sense, design for upgradeability. This should be considered at the device level or the component level, but that alone is not enough. You also need to consider and validate in advance that the design you are planning to deploy will indeed reach those targets. You need to anticipate the issues you might face with communication capabilities. In the case of a car, you need to be in a place where you are certain the update will be downloaded and installed correctly. And if you do it over the air, you need to make sure you have the appropriate security mechanisms to ensure the integrity of the upgrade. You also need to make sure that, functionally, the system will continue to execute correctly with your upgrade. So you need to have the mechanisms in place before deployment to make sure the upgrade will be okay.
Griva: Secure over-the-air upgrades are now a must-have feature. In the past, they were a nice-to-have. When Apple entered the cell phone market, it disrupted the entire market. The leaders at that time were Nokia and Motorola, but they had not developed a system where you could download an application or even upgrade the software. Apple did this at the ground level. It was not an add-on feature for some particular applications. It was something you could do, and must do, on a regular basis. So having OTA ready on devices is a ‘must-have.’ Older devices that have wireless connectivity, or at least some kind of connectivity with proximity nodes or with central systems, now are required to be upgradeable. But changing the software on such systems creates a lot of challenges. First, you must be 100% sure you don’t break the system. Imagine if your Tesla reboots during the night with the wrong version of software, and the car simply doesn’t switch on again. That means you have to get somebody out to fix it. If it’s in your garage, it’s simple. If it’s wherever you happened to spend the night, it’s much more complex. Having to ‘unbreak’ the system in the field is not acceptable. Another problem is that we not only must be able to change the applications, but we also must be able to patch the operating system and the DSP. All the libraries that are needed are now part of the group of things that you must change over the air. How can you really be 100% sure that your signed software is secure, and that nobody can break into it and provide a fake version of your PLC software, which may run a turbine on a plane or the engine on your car? A second part of the reliability of a system involves the speed and the quantity of the data you must change. You need to be sure the customer’s data (like security keys, or neural networks that have been trained to recognize speech on the device) is not changed, because it is specific to that device.
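Griva’s three requirements (reject fake images, never leave the device unbootable, leave per-device data alone) map onto a common OTA pattern: verify the image, write it to an inactive “A/B” slot, switch, and roll back if it fails a functional check. The Python sketch below illustrates that pattern under stated assumptions. `Device`, `sign`, and `apply_update` are hypothetical names, and HMAC-SHA256 with a shared key stands in for what production systems typically use instead, a public-key signature verified against a key anchored in secure hardware.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Device:
    """Hypothetical device with two firmware slots (A/B) and per-device data."""
    key: bytes                                # stand-in for a key held in secure hardware
    active: str = "A"                         # slot the device currently boots from
    slots: Dict[str, bytes] = field(default_factory=lambda: {"A": b"", "B": b""})
    device_data: Dict[str, bytes] = field(default_factory=dict)  # keys, trained models, etc.


def sign(key: bytes, image: bytes) -> bytes:
    """HMAC-SHA256 tag; a stand-in for a real public-key signature."""
    return hmac.new(key, image, hashlib.sha256).digest()


def apply_update(dev: Device, image: bytes, tag: bytes,
                 boots_ok: Callable[[bytes], bool]) -> bool:
    """Install `image` only if authentic; fall back to the old slot if it fails to boot."""
    # 1. Integrity/authenticity: reject anything whose tag doesn't verify (constant-time compare).
    if not hmac.compare_digest(sign(dev.key, image), tag):
        return False
    # 2. Write to the inactive slot; the running image is never overwritten.
    previous = dev.active
    target = "B" if previous == "A" else "A"
    dev.slots[target] = image
    # 3. Switch, then run a functional check. On failure, roll back to the previous slot.
    dev.active = target
    if not boots_ok(image):
        dev.active = previous
        return False
    # device_data is never written by this function, so per-device data survives.
    return True
```

Because the known-good image stays in its slot until the new one passes its check, a bad update costs a reboot rather than a service call.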
Schirrmeister: Application specificity drives some of those requirements. There’s a huge difference between an iPad not working on a family trip and something failing in an industrial environment, where robots in a fab are covering thousands of miles. If you upgrade a robot at the wrong time, it can cost you a lot of wafers. And if my car is in the middle of an update and suddenly I have an emergency and need to leave my garage, I can’t be stuck waiting for it to finish. So it becomes very application-specific, and in the case of health and medical electronics, it might even be life-threatening.
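One way to encode “don’t update at the wrong time” is a small policy that separates staging an update from activating it. The Python sketch below is hypothetical throughout: the `Action` and `SystemState` names, the state fields, and the 50% battery threshold are illustrative choices, not drawn from any standard or from the panelists’ products.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Action(Enum):
    DEFER = auto()        # do nothing now
    STAGE_ONLY = auto()   # download/verify into the inactive slot, keep running current version
    ACTIVATE = auto()     # switch to the staged version now


@dataclass
class SystemState:
    in_use: bool              # vehicle being driven, robot mid-lot, etc.
    safety_critical: bool     # e.g., a medical device actively treating a patient
    operator_deferred: bool   # user asked to postpone
    battery_pct: float


def decide(state: SystemState, min_battery_pct: float = 50.0) -> Action:
    """Hypothetical update policy; thresholds are illustrative, not from any standard."""
    if state.safety_critical:
        return Action.DEFER
    if state.in_use or state.operator_deferred or state.battery_pct < min_battery_pct:
        return Action.STAGE_ONLY
    return Action.ACTIVATE
```

Staging while the current version stays bootable is what lets the driver in the garage leave immediately: the switchover simply waits for a safer moment.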
SE: Security really has not been addressed in a lot of designs in the past. But with cyber-physical systems, security potentially can affect human life. In the past, security typically focused on software. Increasingly, it needs to involve the hardware. There are firmware updates, much tighter hardware-software integration, and there are implications for both AI training and inferencing. What needs to change to be able to add this in from the start?
Griva: It used to be that we would start with security by design in software development, system development, and lifecycle design. Then we said, ‘Okay, let’s start with no security at all, and we will add it later.’ That completely undermines the approach of doing things right from the beginning. The IoT has been designed with no security at all, which means security was never built into the patterns and processes of the system development lifecycle from the start.
I’m not talking about high-level, secure-by-design sectors like aerospace and avionics. But cars, which we expect from an electronic point of view to be pretty secure, can be breached through the CAN bus by an attacker impersonating a legitimate node on the network. And a lot of things in the industrial world have no security, or very little security. This very clearly is not just a software problem anymore. It is becoming a hardware issue, because software developers expect to have a secure area for secure data (PKI keys and certificates), as well as on-chip authentication for mutual recognition between the node and the cloud. All these kinds of things are much more secure if they stay in hardware.
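The “mutual recognition between the node and the cloud” Griva mentions is typically a challenge-response exchange in which each side proves it holds a secret without revealing it. The Python sketch below assumes a pre-shared symmetric key purely for illustration; real deployments more often rely on certificates and a key kept inside secure hardware, and `respond`, `verify`, and `mutual_auth` are hypothetical names.

```python
import hashlib
import hmac
import secrets


def respond(key: bytes, role: bytes, challenge: bytes) -> bytes:
    # Binding the role means a node's answer can't be reflected back as the cloud's.
    return hmac.new(key, role + b"|" + challenge, hashlib.sha256).digest()


def verify(key: bytes, role: bytes, challenge: bytes, response: bytes) -> bool:
    return hmac.compare_digest(respond(key, role, challenge), response)


def mutual_auth(node_key: bytes, cloud_key: bytes) -> bool:
    """node_key: secret held on the chip; cloud_key: the cloud's record for that node."""
    # Cloud challenges the node with a fresh random nonce.
    c1 = secrets.token_bytes(16)
    if not verify(cloud_key, b"node", c1, respond(node_key, b"node", c1)):
        return False
    # Node challenges the cloud in turn.
    c2 = secrets.token_bytes(16)
    return verify(node_key, b"cloud", c2, respond(cloud_key, b"cloud", c2))


shared = secrets.token_bytes(32)
print(mutual_auth(shared, shared))                    # True
print(mutual_auth(shared, secrets.token_bytes(32)))   # False (mismatched key)
```

Fresh nonces stop simple replay of an old response, which is the weakness an impersonated CAN node or a cloned device would otherwise exploit.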
Schirrmeister: And you have many more ‘design for’ aspects. You have design for test and design for upgradeability. One of the effects is that you essentially carry lessons from the previous project into the next one. If I find out something in my current project, especially in the security environment, I can design for the threats known at that time. And you can be creative in imagining what hackers might come up with. But then it’s a back-and-forth, because the hackers always will try something new. So for the next project, meaning over time, you need to take these items into consideration. Some of it may be upgradeable, but in a hardware/software context, some of it needs to be designed into the hardware. There is a lot of discussion right now about how hardware security mechanisms are combined with secure software running on top of them when it comes to key storage and all those items.
Brunet: There is design for test and design for security. But no matter what technology people are deploying, we see two things. First, the verification challenge is increasing tremendously, because you need to add considerations for your device under test that are different from what we used to deal with in the past. So more cycles of verification are key. Some runs can be deeper in terms of vector depth, but globally it’s more cycles. And second, it’s consuming a tremendous amount of additional capacity for customers. They have to put their devices into test configurations that never shrink the design being verified, and that always grow with the size of the design. So the number of cycles and capacity are important challenges for the verification provider.
On the security side, I don’t think there’s a clear winner yet on what methodology works for design for security. We see a lot of proprietary and confidential solutions tied to a specific hardware and software implementation. Providing something generic is a challenge. As a provider of hardware-assisted verification solutions, we see more cycles, more complexity, and far more consideration of how the design will be put under a specific test, which requires more capacity.
Schirrmeister: You need more virtual and physical prototyping, and emulation, for that. There are new methods evolving. People do these Red Team-Blue Team attacks, where you create an early scenario in which you prototype as much hardware and software as possible, and then you basically have people shoot darts at it. But you need to model it all and run it all as early as possible.
SE: If you have to fit into a security architecture with other products, how much of that actually falls on the chip design side? And how do you keep pace with security changes over time?
Maillet-Contoz: We certainly have to consider this situation in the design and the design methodology. Using device models could help with that, and for the EDA industry, it allows them to sell more capacity. But we also need to consider validation from a novel perspective. Instead of only running random scenarios that may expose defects, or directed scenarios, we also need to imagine new ways to describe scenarios, so we can test the system from another perspective, such as how hackers target a system. We need to figure out which scenarios make sense from the functionality perspective. What are the scenarios from the architecture perspective? What are the scenarios from the micro-architecture perspective? And we need to have all of these implemented in such a way that you can learn as early as possible which tests to run on the device, and also test the models of the device. That way, hopefully, we can identify potential glitches and fix them as early as possible.
SE: Going back to the upgrade cycle, how do you partition it and say it’s now complete? Do you add a whole level of redundancy, or do you say, ‘It’s complete up to this point?’
Schirrmeister: It’s logical and functional partitioning. But there’s also the issue of obsolescence. If you have a 20-year market or a 10-year market, you have to put away enough spare parts to cover replacements for that long. But at some point those parts become obsolete. We have seen integration requests like, ‘I have several subsystems, which at the time of development were in a reasonable complexity range, but now they are obsolete.’ So you want to redesign and replace them. That’s really an in-place replacement process, where the hardware and software have to be re-verified, and potentially you need to figure out whether your software will even still run. Do I need to re-certify it? So it becomes a logical partitioning with the ability to upgrade within it, by replacing several components and perhaps integrating them with the latest technology capabilities. But then you have the hardware/software aspects on top of it, which have to be considered. ‘Does my software now break? Do I need to upgrade?’ That’s a hot topic these days. When you look at aerospace and defense, planes developed two decades ago still may be state-of-the-art today, but you have to upgrade subsystems in place. So you have stable interfaces you need to design to, but then you can do more integration underneath.
SE: What happens in a manufacturing plant where some equipment has been upgraded and some has not? Does everything have to be upgraded at exactly the same time? And what happens when equipment is running different versions?
Griva: The lifespan of industrial equipment is several decades. Within manufacturing plants we find a constellation of devices of different ages. So you have closed systems, semi-open systems, electromechanical systems with no electronics, and some that have some kind of intelligence and software on top of them. At the beginning, in a new manufacturing plant, you can put in anything. But you cannot ask the owner of the plant to change all the systems once they are in place, or even to upgrade the software to the latest version. We still find Windows 95 out there. We must upgrade what is upgradeable, and we can bring in the latest technologies to revamp and retrofit machinery, add some new functions, and add new communication systems and new sensors to bring those systems into the Industry 4.0 network. But we also must make sure we do not bring in an unsecure system. Just connecting a system because it’s open is not a good idea. It may look like a good idea in the short term because it’s easier to do, but it’s not a good idea in the medium and long term because it becomes a weak point, and therefore it can be attacked by an intruder on the industrial network. So we apply a schema in which we have a consistent network, where all the nodes are able to speak the same language. Typically that means TCP/IP on that machinery. That’s no longer like it was 10 years ago, when TCP/IP was an elite communications system.
Related
Part 1: Big Challenges In Verifying Cyber-Physical Systems
Experts at the Table: Models and standards are rare and insufficient, making it difficult to account for hardware-software and system-level interactions and physical effects.