Machine Learning’s Limits

Experts at the Table, part 2: When errors occur, how and when are they identified and by whom?


Semiconductor Engineering sat down with Rob Aitken, an Arm fellow; Raik Brinkmann, CEO of OneSpin Solutions; Patrick Soheili, vice president of business and corporate development at eSilicon; and Chris Rowen, CEO of Babblelabs. What follows are excerpts of that conversation.

SE: How much of what goes wrong in machine learning depends on the algorithm being wrong, or one piece of hardware that’s different from another running the same algorithm? What’s the source of errors?

Soheili: At the end of the day, we’re dealing with statistics about what’s right. So we may be looking at 87% versus 90% accuracy.

Aitken: There are two areas where errors occur. The first is where the data sets themselves have a problem. In medical applications, this is one of the classic challenges. Labeled data is very hard to come by. There are lots of images of tumors, for example, but very few of them are labeled correctly. And different experts will label the same thing differently. There are challenges in data integrity. The other challenge is extrapolation. If you train something on one data set and then apply it to a broader data set, at some level you have fitted a function to your first set of data and are now extrapolating to your new data. If your data is comprehensive enough that you’re only doing interpolation, then everything is wonderful. Once you start extrapolating, bad things happen. And because these spaces are so complicated, and we don’t really know how to visualize or think about them, the distinction between interpolation and extrapolation is hard to make. The problems exist in the data set and in what you do with the data set. The tools and the hardware are actually fairly well understood.

Brinkmann: But you also can control the effects of the optimizations that you make to the network. You can test it, you can probe it, and it’s something you can control pretty well. It may be time-consuming, but it is controllable.

Rowen: Neural networks are unusually benign, given their portability and ease of comparing results from one implementation versus another. But there also is a third category for sources of error. No system consists solely of a neural network. It is a piece of a larger system. We’re still in the early days for system architects to understand what is the right role for the neural network inside some larger piece of logic, whether that’s software logic or hardware or software plus hardware. The interaction between the neural networks and all of these other things is really quite important.

Soheili: The architecture and topology are missing a lot of base pieces.

Rowen: Yes. What pre-processing happened on the data? What is the objective function? If the classifier told me A, B or C, what do I do with that information? Many neural networks will give you a confidence score in one form or another. It may be very useful information, but do we know how to judge those confidence factors when figuring out how the rest of the system should behave? I don’t think, by and large, that people have much familiarity yet with the larger system design where neural networks are involved.
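One simple way a surrounding system can act on a confidence score is to accept the classifier’s answer only above a threshold and otherwise defer to more conservative logic. The Python sketch below assumes a classifier that emits raw scores (logits) for labels A, B and C; the function names and the 0.8 threshold are hypothetical, chosen only for illustration, and are not drawn from any system discussed here.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def decide(logits, labels, threshold=0.8):
    """Return (label, confidence) when the top class clears the threshold,
    otherwise ("defer", confidence) so the rest of the system can fall back."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=lambda i: probs[i])
    if probs[best] >= threshold:
        return labels[best], probs[best]
    return "defer", probs[best]
```

For example, decide([4.0, 0.5, 0.2], ["A", "B", "C"]) returns "A" with a confidence of about 0.95, while decide([1.0, 0.9, 0.1], ["A", "B", "C"]) returns "defer". A softmax output is not automatically a calibrated probability, which is one reason judging these scores at the system level is hard.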

Aitken: In some cases you can use them to assist in their own training. Say you have this space, you’ve mapped it into something, and now you’ve built a neural network that is going to classify all your data into these categories. When you run that data, you will get various results that are ambiguous. It may be half this, half that. I trained a camera to distinguish between people and non-people, and I trained it using pictures. Then I got a picture of a bird walking down the sidewalk, and it wasn’t sure whether that was a person or not. So I had to train it on animals, too, so it could identify birds and cats. The tools can give you insights into properties of your data that you may not have articulated well.

Brinkmann: When you deploy something, you need to get the data back to the factory. If you deployed something and it made a decision, that decision might have been based on treating the bird as a person.

Rowen: That goes back to the overall system development methodology. There is no guarantee that a human programmer doing this by hand would have anticipated the walking bird.

Aitken: And in the days before we had classifiers, where would you even begin? You might need to do color balancing and pixel differentiation. It’s easier now.

Soheili: Yes, those would be the baby steps.

Brinkmann: How do you deal with unknown situations, meaning things the system designer didn’t foresee? For example, someone built a car that follows another car in a traffic jam. In the testing phase, one of the drivers left the car to take a break. The car assumed the driver was still there and kept moving with the traffic, and he had to run after the car to catch it. The system designer couldn’t foresee this. So you can train your network as well as you want, but if some basic facts are missing that put bias into one set of data, and then the situation suddenly goes the other way, you’ve got problems.

Rowen: There is a broad category of anomaly detection that potentially can help. You can have systems, maybe deep learning systems, that are trained just to look for unexpected things and fall back to some very conservative behavior when they sense they have gone beyond the conditions they were trained for. You’d hope that by setting broad criteria for anomalies, the system recognizes that something is not right: ‘In the past, this broad cross-section of data looked like this.’ Even if you had a driver monitor camera, maybe there wasn’t logic to specifically check whether a driver was present. But you might have some higher-order, separate network that addresses that.
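Rowen describes anomaly detection only in general terms. A very simple form of the idea is sketched below in Python, assuming each input can be summarized as a vector of numeric features; an input is flagged when any feature lies more than k standard deviations from its training mean, and the system then switches to a conservative mode. The class name, the k=4 threshold, and the mode names are hypothetical. Real systems would likely use richer detectors, including learned ones as Rowen suggests.

```python
import statistics

class ZScoreMonitor:
    """Flags inputs whose features fall far outside the training distribution."""

    def __init__(self, training_rows, k=4.0):
        # Per-feature mean and population standard deviation, column by column.
        columns = list(zip(*training_rows))
        self.means = [statistics.fmean(c) for c in columns]
        # A constant feature has zero spread; use a tiny value so any change
        # in that feature is treated as anomalous instead of dividing by zero.
        self.stdevs = [statistics.pstdev(c) or 1e-9 for c in columns]
        self.k = k

    def is_anomalous(self, row):
        return any(
            abs(x - m) / s > self.k
            for x, m, s in zip(row, self.means, self.stdevs)
        )

def choose_mode(monitor, row):
    # Fall back to conservative behavior whenever the input looks unfamiliar.
    return "conservative" if monitor.is_anomalous(row) else "normal"
```

A per-feature check like this is deliberately crude: it can miss inputs that look ordinary feature by feature but are unusual in combination, which echoes Aitken’s point that interpolation and extrapolation are hard to tell apart in high-dimensional spaces.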

Brinkmann: You’d have a lot more parameters and features, but that wouldn’t factor into your neural network. In this case, you would need to look at more data sources than they did.

Aitken: There’s an interesting corollary to that, which involves limiting the set of possible anomalies. The idea that a fully autonomous car can be dropped in the middle of Silicon Valley and function correctly, and that the same car will be able to cruise across the Australian outback and navigate its way through downtown Bangalore, is misguided. It’s not going to be able to do all of those things well. Somewhere something is going to have to give, and you’re going to have to constrain the system. In Cambridge, England, as far as I can tell, the rule for cyclists is that you can ride anywhere, in any direction, at any time. When you’re in a car, bicycles can come from any angle at any moment, and you have to be used to that. That doesn’t happen here. You need to be able to constrain the system so you can eliminate certain classes of anomalies, because you can’t possibly specify every weird thing that will happen.

SE: One of the things you’re pointing to here is having a large enough data set to make it more specific. So what you’re doing is customizing the data based upon whatever circumstances you’re in. But do we have enough data to make that possible?

Rowen: We’re working on this question right now, specializing speech neural networks. There really are two broad categories. With one, you build data sets from the ground up for specific cases that you think you’re facing. So ‘this’ is different, and then you go and collect data to represent that. The other is that you take the broadest set of data you possibly can, though it may implicitly design in bias. You may say your customers care about crowd noise, so you’re going to emphasize that in the distribution and make something that is more general purpose than what you think your customers care about, with the hope that in the rest of the data there is some dimension of the data that helps with crowd noise. But at least you haven’t overly narrowed it. If you have data, you use it, and you use as much as you possibly can. In many problems, you simply do not know, a priori, what the distribution is that you’re going to face. Until the product is out there and in widespread use, you can’t gather enough data to tell you anything about the corner cases.

Soheili: Even if you stop to infer, you never stop learning. You keep feeding your network new information and data sets. You could say, ‘I don’t want to make these kinds of decisions, but I would like to make those kinds of decisions,’ keep feeding it information, and then maybe open another layer. You may not want to go beyond $100,000 as a spending limit or kill a bird, but you never want to kill a human being. These all become interrelated with each other as you make the system more sophisticated.

Rowen: Part of the approach is that you have to expect gradual rollouts. You say, this is a Level 2 system. You’re gathering more data, and then you can go to Level 3, 4 and 5.

Aitken: Initially you think you need this huge amount of data. But after a working system is built with that huge amount of data, people will determine you don’t really need all of it. You can do it with less. We did something with microcontrollers, where with a fairly small amount of data you can make it recognize words.

SE: If you add that kind of flexibility into how these devices learn, you don’t know exactly how they’re going to interact because they’re now unique systems. Does that change how these things work?

Soheili: Yes. Someone asked the question, ‘If there are two autonomous cars and one goes wacky, will the other understand that it has gone wacky? Or do they assume everything around them is autonomous and working?’ It’s a great question.

Brinkmann: If you have two machine learning systems working together, and you’re feeding back the data, do we learn what’s going on? At some point, you won’t know what they’re talking about.

SE: They have their own language after that, right?

Brinkmann: Yes. At some point you have a feedback cycle that you haven’t designed in.

Aitken: Some aspects of this are very much an open search problem. The idea is that it goes to a higher level of your system, as when you have two versions of AlphaGo playing each other. They play 10 million games and optimize their strategy. If you try to figure out what they’re up to, you’ll have no idea. But if you have a global idea of what the system is supposed to do, you can ask whether what they’re doing is reasonable. If yes, then terrific. If no, it’s probably time to pull the plug.

SE: When do you determine that, and who determines it? Is it after an accident?

Aitken: Safety-critical systems are a very difficult application for machine learning. You don’t go there right away.

Rowen: There are lots of interesting methods, especially simulation, that are going to be essential elements. One of the nice things about neural network algorithms is that you can be fairly explicit about the inputs and the outputs. You can build independent models that generate data of interest and scenarios of interest. You can build independent models that can monitor these. So you’re not just looking at a single model and trying to figure out if it’s going to work. You can build a whole environment of things that test for reasonableness and behavior in lots of different ways before you ever put lives at stake. People need to be trying lots of different things. Some of the models that you build for a simulation environment probably won’t teach you anything. But if even one of them finds a corner case that was somehow lost, it’s going to be an important step forward. The irony is that it seems reasonable that it won’t take us long to have systems driving that are better than humans, because humans are lousy drivers in certain circumstances, particularly when you factor in alcohol and fatigue and distractions. It’s pretty easy to have systems that are always alert.

Aitken: But there are some interesting corner cases there, too. When you’re walking across the street, you typically make eye contact with the driver of the car coming to a stop. I see you, you see me. And if you see that the driver isn’t paying attention, you stop so the car doesn’t run you over. But with a self-driving car, there’s no one to signal that it doesn’t see you.

Soheili: We think that when we sit behind a wheel, we can ignore everyone else’s actions and reactions. We assume another driver is going to make a left turn, or we think we can multitask. Autonomous systems can take away problems that are characteristic of human beings, which is why they are safer.

Aitken: We also tend to overrate our own abilities, and arrogance comes with some interesting side effects. We all believe that we are much safer driving our own cars than we are riding in a bus. Statistically we’re wrong. We believe we’re safer riding a bus than flying in a plane, and that’s also wrong. And we have all of these things built up so that we know about every single self-driving car that has ever crashed. Meanwhile, regular cars are crashing around us all day long.

Rowen: That’s the key point. It won’t be hard to be better than the average driver, but it clearly will not be enough for us to accept that. There’s a debate about whether it has to be 10 times or 100 times better for people to accept it.
