Security At The Edge

Experts at the Table: How to keep devices that last longer secure, particularly when AI/ML are added in.


Semiconductor Engineering sat down to discuss security at the edge with Steven Woo, vice president of enterprise solutions technology and distinguished inventor at Rambus; Kris Ardis, executive director at Maxim Integrated; and Steve Roddy, vice president of Arm's Machine Learning Group. What follows are excerpts of that conversation. To view part one of this discussion, click here. Part two is here.

SE: End markets are splintering, so a device developed for one use case may not be the same as what's going into the company next door. It's not like you're buying a cell phone that's going to be used everywhere. How do we achieve economies of scale?

Ardis: The problem is even worse than you might expect. For example, an agriculture company had a smart mowing system that learned how to do something on a particular crop in Nebraska. The company took it to Europe, but that exact same plant species grows completely differently there. That same piece of equipment won’t work. This is a retraining problem. It’s not so much that the hardware can’t be made appropriate for various applications. It’s that it requires investments in retraining for various applications, or even different shades of the same application.

Roddy: Transfer learning, retraining, and the democratization of the toolkits around deploying ML models are critical to get to the volume you're talking about. If there's a camera for recognizing objects, there are millions of manufacturers around the world who might buy 100 cameras for a factory. That's hundreds of millions of units. But there's no way each individual manufacturer in Texas or Germany is going to spin up a data scientist to learn how to program it. It has to be pre-canned, with functionality that can be fine-tuned on the particular object you want to recognize, or the crops that you want to analyze, or whatever it happens to be. You need the ability for the user or the manufacturer to quickly adapt these things to the particular use case, so two buildings right next to each other can use the same equipment with effectively a different application that took only a few days of data labeling, training, and retraining to prove it is working up to speed. That's the key. It's about pulling in automation, and the connectivity to be able to personalize them.

Woo: It is interesting how Amazon is handling all of this in AWS. They have a capability called SageMaker Autopilot that helps automate the process of even deciding what neural network models will be appropriate for you. It takes care of trying a bunch of permutations of different types of models and algorithms, fires them off, and then decides which ones do the best. And then it can fire off permutations of those, as well. That type of automation leverages the same underlying hardware, but uses a layer of software to create the differentiation. That's another way industry has tried to get economies of scale at greater volumes of hardware, by some adaptation in the software. I agree that there's this automation aspect, and there's going to be greater dependence on the software layers. Upper-level software developers will need to figure out how to build the right infrastructure to automate all of that.

Roddy: You see that in, say, the ecosystem of AI specialist software players. If you go back four or five years, there was a small and rapidly growing collection of folks who hung a shingle and said, 'Hey, I've got 10 or 12 data scientists, AI developers. Come to us and we'll do a custom implementation of a voice processing and image processing system for you.' So you had to engage in a proof-of-concept deployment. It would take weeks or months to get an answer back from your contractor, and there's dollars involved. Many of them have shifted now to having pre-canned things. The transfer learning, where the retraining is done, happens at the user site. That requires installation of software and of a data set. But they're selling more of a product as opposed to a pure consulting-type service, and that's part of that transformation. It has to be something where, if I'm adopting it for my store or my manufacturing operation, within days I can be up and running. It can't be months and a six-figure check to a consulting company.
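The "pre-canned model, retrained on site" pattern the panelists describe can be sketched in miniature. The code below is a hypothetical toy, not any vendor's product: a frozen random projection stands in for a pre-trained feature extractor, and only the final classification layer is refit on a small, locally labeled dataset, mimicking a few days of site-specific labeling and retraining.

```python
import math, random

random.seed(0)

# Frozen "feature extractor": a fixed random projection standing in for
# the pre-trained backbone of a shipped model (hypothetical stand-in).
D_IN, D_FEAT = 4, 8
W_frozen = [[random.gauss(0, 1) for _ in range(D_IN)] for _ in range(D_FEAT)]

def features(x):
    # ReLU of the frozen projection; the backbone is never retrained.
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W_frozen]

def retrain_head(data, epochs=200, lr=0.5):
    # Transfer learning: fit only the last linear layer (logistic
    # regression) on the small site-specific dataset.
    w, b = [0.0] * D_FEAT, 0.0
    for _ in range(epochs):
        for x, y in data:
            f = features(x)
            z = sum(wi * fi for wi, fi in zip(w, f)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y                      # gradient of the log loss
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b

def predict(w, b, x):
    f = features(x)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) + b > 0 else 0

# A few days of local labeling: a tiny toy dataset for one site.
site_data = [([1, 0, 0, 0], 1), ([0, 1, 0, 0], 1),
             ([0, 0, 1, 0], 0), ([0, 0, 0, 1], 0)]
w, b = retrain_head(site_data)
print([predict(w, b, x) for x, _ in site_data])
```

The design point is the split: the expensive part (the backbone) ships once, and only the cheap head is retrained per deployment, which is what lets two neighboring buildings run "the same equipment with effectively a different application."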

SE: Is 5G having an impact on machine learning — particularly sub-6GHz rather than millimeter wave?

Woo: The biggest thing for 5G is the opportunity to create data in more locations and have higher speed connections available to lots of end devices. The distances that you’ll want to go over those high speed connections will be shorter than with predecessor technologies, which means that now you’ve got these opportunities to build more regional or aggregation data centers at the edge where you can do processing and more localized training. Some of the things you might see are maybe region-specific for dialects of languages, or maybe the properties of certain types of plants in certain regions of the world. Those can become more localized, as well. It also has this other interesting benefit, where by keeping the data a little bit closer, there’s a natural improvement in security. You’re not sending it halfway around the world to be processed in a data center. So if you have an explosion in the amount of data and the need to support many more end device sensors, 5G provides an opportunity to move much more of the compute out of the data center and closer to where those devices are.

Roddy: It does two additional things. The unit cost of a connection for 5G is supposed to be significantly lower than 4G, which empowers more devices to connect. And that pushes the world forward. Also, if you think about 5G and AI, the deployment of the network is going to consume a significant amount of machine learning, because the nodes themselves require much more sophisticated tuning. With the high-frequency millimeter-wave stuff you're going to roll out a base station on every other light pole, and they're affected by weather patterns, seasonal patterns, whether the trees have leaves or don't have leaves. You can't roll a truck out to tune these things regularly, so there's going to be a tremendous amount of AI deployed even in the base stations so they can tune themselves to the usage requirements. From a silicon standpoint and an algorithm standpoint, that's expected to be a high consumption point for machine learning capabilities, just in the networks themselves.

SE: As we start utilizing more machine learning, AI and complex edge devices — what does aging look like for these devices? These are supposed to be out in the market for a long time. What happens when they don’t behave the right way? With AI and machine learning you expect the system to adapt. Do we have to reset everything? What’s the process for making this work on a long-term basis and making it affordable simultaneously?

Ardis: AI can take a lot of the network bandwidth and handle things at the edge rather than shooting the data upstream, but in many cases a link back will still be required to say, ‘This one’s a little funny. So to help my confidence level and my answer here, I’m going to send this back for somebody to look into later.’ There also are various other approaches to look at suspect data, which then can be retrained, probably back in the cloud or somewhere else, and new models deployed.

Roddy: There’s a whole burgeoning opportunity for deployment, authentication, monitoring, updating, and securing all these devices. If there are a trillion devices out there, someone’s got to keep an eye on them. Arm happens to have spun up a separate division tackling just that, but many other companies have done so, as well. Degradation, aging, and drift of sensors start to play a factor. If you have a secure means of updating your cameras or your environmental sensors to accommodate that you can tweak the model, recalibrate, and re-download a new model to accommodate the fact that maybe you’ve got a range of products out there where some are 10 years old or one year old. They may need to have different variations of the same AI model to account for the sensor differences. Being able to monitor all of these devices, and to help them to roll out updates, is probably a necessity if you’re going to have a large number of devices under your domain. That’s true for a business or a consumer-facing operation.

Woo: We have an infrastructure type product for monitoring lots of devices that are out in the field. We pair that with a secure core that goes directly into the silicon. We have a system-level view of how you might want to authenticate devices, and maybe even take some out of service if they've been compromised or they're malfunctioning. It's one thing if all the devices in your system were deployed at the same time. But you'll always see this window of new devices coming in and older devices going out. There's a tug of war that goes on where people look at the TCO (total cost of ownership) and try to figure out how to plan for something being in the field for 10 years, while technology is advancing at such a tremendous rate that it starts to become disadvantageous to keep it in the field for that long. You can get better power, better economies of scale with the newer technologies. When we talk to people about this type of infrastructure, there is definitely a desire to look at how you rapidly switch in and switch out technology, and how to accommodate fast technology life cycles. All of that has to be built in at a high level. In addition to the management capability that you have at the software level, you also need support at the hardware level to make it as secure and as flexible as possible.

SE: With AI, one user’s optimized system may be different than another’s. How does that impact security?

Ardis: The simple answer is you throw more bits at it. There are all kinds of guidelines from NIST around how long algorithms will last. With 256 bits, don't even try to hack it. Things like AES-256 will be retired before it's ever an issue. Having any algorithm use enough bits in the key size is critical for this. If it's a two-year lifetime, you're probably okay with something smaller. If it's a 10-year lifetime, go with 256-bit security.
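The "throw more bits at it" argument comes down to simple arithmetic: each added key bit doubles the brute-force search space. The back-of-the-envelope sketch below assumes a deliberately generous attacker rate of 10^18 guesses per second (an assumption for illustration, far beyond today's hardware) and compares the average-case exhaustive-search time against product lifetimes.

```python
# Average-case exhaustive key search: half the keyspace must be tried.
SECONDS_PER_YEAR = 3.15e7

def brute_force_years(bits, guesses_per_sec=1e18):
    """Years to brute-force a key of the given size, average case."""
    return (2 ** bits / 2) / guesses_per_sec / SECONDS_PER_YEAR

for bits in (56, 128, 256):   # DES-era, AES-128, AES-256 key sizes
    print(f"{bits:3d}-bit key: ~{brute_force_years(bits):.2e} years")
```

At this (overstated) rate a 56-bit key falls in a fraction of a second, while a 256-bit key takes on the order of 10^51 years, which is why, against brute force, key size comfortably outlasts any 2-year or 10-year product lifetime and the real design questions shift to algorithm retirement and implementation attacks.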

Woo: In the past, security was a retrofit in some older systems. Now we’re seeing this nice transition in the industry where they’re understanding that security is a first-class design parameter. You really have to think about it upfront as part of the development of your whole architecture, both on the hardware side and the software side. So what does the attack surface look like, and what’s the time window over which you need to be secure? You have to think about those things in the architecture, and always have to look to the future for things people are going to implement, like quantum cryptography or quantum solutions to break cryptography. If you need to be immune to those types of things, you have to start thinking about what your technique is going to be to increase the window of security of your data.

Roddy: Arm has been at the forefront of publishing open specifications for that around our PSA (platform security architecture), which touches on all those things. It's the use case, it's the software elements, it's the hardware security elements. We're talking about something connected, like an IoT device. It's also that whole connectivity back and forth to your data center. You have to look at that as a first-class design criterion, so that when you start out you've considered all those elements and you bring in the building blocks that you need for your particular system. To Kris's point, what's the duration of the expected lifespan of your product? You may want to over-design for today's reality if you expect to need an 8- to 10-year lifespan versus a 1- to 3-year lifespan. So all those things come into play if you do it systematically and consider all the elements. Then you stand a much better chance of heading off attacks, whether they be brute force, subterfuge, software, or hardware.

Ardis: The other thing you have to think about, especially in IoT-ish systems, is physical access to the device, and not just the crypto. It goes more into what threats you are worried about. Am I worried that somebody could just snag one of these environmental sensors that I stuck on a tree and have an infinitely long period of time to try to break it? Is that a concern? If yes, then you start to worry about physical security approaches to things like that. If you're just reading temperature, and you've deployed a million sensors, do you care if somebody gets their hands on one? Maybe you just throw that data out. So there's some threat analysis. What's your level of paranoia? From there you can address each threat.

SE: There is a power/performance hit on security, depending upon whether you’re going passive or active and how many layers you’re trying to protect. Does machine learning increase the stakes, and does it increase the power and performance hit to really secure a device?

Ardis: My initial take is no. Things like authentication operations are pretty sparse in how often you do them. You do them to renegotiate a session key or to make sure the download you just got is valid. That's not something that's going to happen at a high duty cycle. Maybe you've got marginally bigger downloads, because in addition to a software application you have a new neural network to load. You're still talking about pretty sparse activity, though.

Roddy: We tend to agree. If you start looking at the total number of compute cycles running in a given unit of time, if the machine-learning-equipped module is running a lot more operations, a lot more code, and if you've got security operations running in real time every clock cycle, then perhaps there's an overhead. But if it's literally just the plumbing getting in and out, and validating session keys, authentication, and downloads in the normal operation, and it's not affected by the level of security, then it is probably inconsequential.

Woo: It's a bit chicken-and-egg. Part of why you don't do highly secure math on your calculations is because it's such a performance hit and such an energy hit. So people are looking at things like homomorphic encryption, where you're trying to do math on encrypted pieces of data. It's a lot more challenging. It provides a level of security, but it's one that most people wouldn't be willing to pay for right now, or tolerate, in terms of the performance and the energy. If a technique came along that really dropped the amount of performance you lost and the amount of energy it took, you'd start to see people do a little bit more. But for right now, the tradeoff between security and performance that people are willing to accept really influences the kind of security they're willing to have in that type of infrastructure. So where people are now is authentication of the user and those types of things, where largely you leave the computation alone.
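The idea of "doing math on encrypted pieces of data" can be illustrated with a deliberately simplified toy. The sketch below is NOT a real homomorphic scheme like Paillier or CKKS and offers no practical security; it only shows the shape of the property: an untrusted aggregator sums ciphertexts without ever seeing the plaintexts, and only the key holder can decrypt the result.

```python
import random

# Toy additively homomorphic masking: c = (m + k) mod M. Sums of
# ciphertexts decrypt, with the sum of the keys, to the sum of the
# plaintexts. Illustration only; not cryptographically secure.
M = 1 << 32

def encrypt(m, rng):
    k = rng.randrange(M)          # fresh one-time mask per value
    return (m + k) % M, k

def decrypt(c, k):
    return (c - k) % M

rng = random.Random(42)
readings = [17, 25, 8]            # e.g. raw sensor values
pairs = [encrypt(m, rng) for m in readings]

# Untrusted aggregator: adds ciphertexts, never sees the readings.
c_sum = sum(c for c, _ in pairs) % M

# Key holder combines the masks and decrypts only the aggregate.
k_sum = sum(k for _, k in pairs) % M
print(decrypt(c_sum, k_sum))      # 50 == 17 + 25 + 8
```

Even in this trivial form, the structure hints at the cost problem Woo describes: every operation on protected data carries extra arithmetic and key management, and real schemes multiply that overhead by orders of magnitude.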

Ardis: If I can do more of my computation at the edge and I don't have to send as much data, I don't have to encrypt as much data. The crypto is minuscule compared with the transmission power in that scenario, so you could make the argument that you're saving power by pushing stuff like this to the edge.

Roddy: Most people aren't going to be examining every frame that comes off a video sensor before it goes into a machine-learning recognition, or making sure that something happened between frames 34 and 35. You're going to run some sort of periodic background check, maybe at boot-up time or hourly or daily, but not on every single operation.
