IBM Takes AI In Different Directions

What AI and deep learning are good for, what they’re not good for, and why accuracy sometimes works against these systems.


Jeff Welser, vice president and lab director at IBM Research Almaden, sat down with Semiconductor Engineering to discuss what’s changing in artificial intelligence and what challenges still remain. What follows are excerpts of that conversation.

SE: What’s changing in AI and why?

Welser: The most interesting thing in AI right now is that we’ve moved from narrow AI, where we’ve proven you can use deep learning neural nets to do really good image recognition or natural language processing (basically point tasks) to rival what humans can do in many cases. In image recognition, in particular, neural nets now can do better than humans. That’s great, but it’s really narrow. We’re moving into what we would call broader AI, where now we’re going to take those interesting point solutions and figure out how you integrate them into something that will help somebody do their job, or an actual task beyond, ‘I want to recognize cats on the Internet.’ Recognizing cats is an interesting demonstration, but you don’t get any business value out of it.

SE: What are the next steps to make that happen?

Welser: We’re focused on the problems in industry or in enterprises where AI could help a person in their role. In the health care area, there are ways that AI can help a radiologist read through the images that they’re seeing. But is there a way that it could help them understand, for a set of symptoms, what the potential diagnosis would be? That requires not just the individual being able to recognize blips on a radiology graph. It means being able to understand the context of an electronic medical record that goes along with this person. ‘Here are the test results that have come out from that person. Here are what the symptoms are showing. And here’s the MRI scan.’ You put that together and say, ‘Okay, this blip here is something you need to look at.’

SE: So what you’re looking for is deeper context.

Welser: Exactly. We’re a long way from having the system do the kind of advanced reasoning and context that the radiologist has, for example. But we’re getting to the point where we can take components of those things and pull them together to present to the radiologist a very simplified set of information. That can be used to accelerate how quickly the radiologists come to their decisions.

SE: This is probabilistic, right?

Welser: Yes, and as with most things you come up with an answer that is like 90% confident. The goal is to get to very high confidence levels. That’s the challenge with AI: making sure that what you’re doing is presenting things with very high confidence. But you also need to present it in a way that is very clear to the expert, who has to come up with a decision in the end and determine, ‘Here’s what we’re going to do and here’s the next step.’ We don’t really want the AI system going that far. We want it to be augmenting what the expert can do. But in the end, we still want the human to be the one who makes the final decision because they still have the ability to have a much broader context than what AI can do at this point.

SE: Is that based on not wanting to displace jobs, or because this is a valuable tool?

Welser: It’s the latter. I know you hear people talking about how AI is going to take over the world, but the reality is that there are things that AI does very well and things that it doesn’t do very well. So it might be really accurate on something, but it’s still going to make mistakes. And unlike a human, its ability to understand it made a mistake is very limited. You have to go back and retrain the network again. So you need humans there to take what it puts out and figure out what the next step is. Humans are really good at things like having a goal or knowing what the goal really is (the true goal, not just recognizing the blip on the image). They understand the context around that, the ethics, what it means to do certain treatments for a patient. Some treatments are more invasive, some have less impact on a person’s lifestyle. Humans are much better at all these sorts of values and goals and next steps. So what we want to do is, just like we did with automation in the past, look at how many things we can get done for that person or that expert so this becomes more automatic. Then they can do the part they do really well. AI improves their productivity, improves the number of patients they can see, and hopefully reduces error rates.

SE: This also allows for regional differences and biases, right?

Welser: That’s an interesting point. One of the challenges in AI right now is that it’s only as good as the data it was trained on. Even if it was trained on a very large set, that won’t cover every possible case out there. And to your point, you could try to locally train AI for every single place it gets used, or you can say, ‘It’s got a really good base. I’m going to have a human expert there who will take what it gets from the AI system and make those next steps for where I am locally. Or, here are some of the data sets I want to augment this AI training with for my local region.’

SE: In the past, for machine learning and AI to recognize a cat you had to take a picture of a cat from every angle. Has that changed? Is there a better understanding of how to identify a cat?

Welser: We’re still very much in the mode of these deep neural nets just finding features in the image and showing lots of different images. From that you derive this object you want to find, which we’ve told it is a cat. It doesn’t have an understanding that a cat needs to have legs and ears and a tail the way a human does. It still doesn’t do much reasoning about what it’s seeing. So therefore it is still subject to potential errors. It might be learning features too strongly. If you look at some of the algorithms for a cat versus a dog, cats tend to have pointy ears. But dogs sometimes have pointy ears. So then it has to learn enough examples to realize, ‘Here’s an exception to that rule. Just because it has pointy ears doesn’t always mean it’s a cat, therefore I need to learn that exception pretty well.’ This is a very different style of learning than we think humans do.

SE: How so?

Welser: One of the things we are doing research on right now is how do we get better at this on the AI side. There are two reasons for that. One is this question of AI being a black box. So if the neural net is really well trained to recognize cats, but you don’t know why it recognizes cats, that can be a problem. Even when it makes mistakes, how do you understand how it made a mistake, particularly if it’s a subtle mistake? You need to better understand how the neural nets work, what features the neural nets are identifying to allow you to do more research, and how you interrogate to understand what different layers in that neural net are really picking up on. So that’s one reason for it. The second reason is that we have to have a huge amount of not only data, but labeled data, that says, ‘This is a cat. That’s a cat. There’s another cat.’ That is very different than the way a human learns. But one of the things I’d argue humans also do is learn by getting lots and lots of images and lots and lots of angles. The difference is that the human only needs a couple of those labeled for them. As a little kid, you probably saw cars around you, and at some point you asked your mom what that is. She said it’s a car, and then you immediately were able to take that information and start to extrapolate. ‘That’s probably a car and that’s probably a car.’ You get to see thousands of cars from thousands of different angles, and your brain constantly says, ‘Okay, that’s still a car.’ But at some point you might see something much larger and you asked your mom what that is, and she told you it’s a truck. It’s not just a car even though it has a lot of attributes of a car. And the amazing thing is that your brain is able to extrapolate so much from just a few labeled examples, but also seems to know at some point that maybe that’s not right. AI systems can’t do that today.

SE: One of the problems we’ve been hearing about is the inefficiency of some of the algorithms. Are we making progress on that front?

Welser: On the hardware side, for deep learning we basically use large amounts of matrix multiplication for lack of anything better. The reason we’ve made so many advances on it is our ability to have enough hardware now to run that at the scale you need to do a really large neural net. So the neural nets we worked with in the ’80s and ’90s were actually similar to what we use today, but on a much, much smaller scale because you just couldn’t run enough processing on it. But it requires a huge amount of power. The only reason we could even do it today very efficiently is because we use GPUs for a lot of the training. The GPUs are very good at that, but they still are fairly power hungry, and there’s work going on now to make them more efficient. Advances for making them more efficient are focused on building better hardware rather than finding new ways of improving the algorithms. All of that certainly will go on in the background, but the real focus is, ‘Let’s assume neural nets are a good way to go because they appear to be. We know it requires a lot of math and that we aren’t going to get rid of that math, so how best to run it efficiently?’ If I look at our own roadmap on this for hardware, we use GPUs for training and we’re partners with Nvidia. Our power systems are very efficiently set up for training. But it looks like you actually go to lower precision over time.

SE: How much lower?

Welser: Right now, 32-bit floating point is used just because that’s what we’ve always used. But the reality is you probably don’t need 32-bit. You could do 16-bit, maybe even 8-bit floating point for doing training. Although that means your individual multiplications are slightly less accurate in the end, you don’t care whether every number is accurate. What you care about is that as those numbers are coming through, is it giving you the right answer on the other side? That’s the measure of accuracy now.
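
To make the reduced-precision point concrete, here is a minimal sketch (not IBM's method) that runs the same tiny layer in 32-bit and 16-bit floating point and checks whether the answer on the other side changes. The weights, inputs, and the helper names `round_to`, `dense_layer` and `argmax` are all made up for illustration; the only library feature relied on is the standard `struct` module's `"f"` and `"e"` format codes.

```python
import struct


def round_to(fmt):
    """Return a function that rounds a Python float to the given struct format.

    "f" is IEEE 754 single precision (32-bit), "e" is half precision (16-bit).
    """
    def cast(x):
        return struct.unpack(fmt, struct.pack(fmt, x))[0]
    return cast


to_single = round_to("f")
to_half = round_to("e")


def dense_layer(weights, inputs, cast):
    """One multiply-accumulate per weight; every intermediate value is rounded by cast()."""
    outputs = []
    for row in weights:
        acc = cast(0.0)
        for w, x in zip(row, inputs):
            acc = cast(acc + cast(cast(w) * cast(x)))
        outputs.append(acc)
    return outputs


def argmax(values):
    """Index of the largest value, i.e. the 'predicted class'."""
    return max(range(len(values)), key=lambda i: values[i])


# Hypothetical 3-class layer with 4 inputs (illustrative numbers only).
WEIGHTS = [
    [0.12, -0.53, 0.31, 0.07],
    [0.85, 0.40, -0.22, 0.64],
    [-0.33, 0.18, 0.09, -0.71],
]
INPUTS = [0.9, 0.3, 0.5, 0.8]

out32 = dense_layer(WEIGHTS, INPUTS, to_single)
out16 = dense_layer(WEIGHTS, INPUTS, to_half)

print("32-bit outputs:", out32, "-> class", argmax(out32))
print("16-bit outputs:", out16, "-> class", argmax(out16))
```

In this toy case the individual sums differ slightly between the two precisions, but the winning class is the same. Whether that holds for a real network during training is exactly the empirical question being discussed here.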

SE: That’s more about weighting, right?

Welser: Yes. So do you really need 32-bit floating point for all those weights? Turns out, probably not.

SE: So this is like using your eyes for focusing on one part of a picture, while missing the rest of the picture, right?

Welser: That’s exactly right. If you could go down to lower precision, you get huge savings on power, huge savings in area, and potentially speed, as well. There’s definitely a move in that direction for reduced precision. I’m going to talk about training here for a moment. I’ll go to inferencing afterwards. If you’re going to less precision, why even think about going back to something like analog, because analog is extremely efficient for certain operations? The problem with analog is that it’s error-prone. We have noise levels involved, so it never scaled the way digital did. Moving to digital was the right decision, in particular for the sort of multiply and accumulate step that goes into a neural net that basically is just Kirchhoff’s Law. So if I think about having the two layers of a neural net, and I take the output from this node with a certain weight (which is basically just the multiplication) and I send that input into the next node, then it takes those inputs from all the nodes here, sums up that answer, and then decides whether or not it’s going to do something with that based on the total strength of what came in. Well, if you think about that, if I could have variable resistances between each of these nodes, then the amount of current that it’s going to get from each node coming into it is going to vary, based on what that resistance is, and the total current it gets over here will be the sum of those other currents. So I’ve done the multiply accumulate by just running current through a resistor. What if you made those all variable resistors so your training weights literally just changed the value of that resistor? That will give you a rough multiply and accumulate for the next node over, and it’s incredibly efficient. There’s no math involved in that knowledge. It’s literally, ‘Let’s run a current through this.’
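
The resistor picture can be written down numerically. The sketch below is an idealized, noise-free model with made-up values: each input node applies a voltage, each connection has a conductance G (the reciprocal of its resistance), Ohm's law gives the current through each connection, and Kirchhoff's current law sums the currents arriving at each output node. The function name `crossbar_currents` and all numbers are assumptions for illustration.

```python
def crossbar_currents(voltages, conductances):
    """Idealized resistive multiply-accumulate.

    voltages[i]        -- voltage applied by input node i (volts)
    conductances[i][j] -- conductance between input i and output j (siemens, G = 1/R)
    Returns the total current flowing into each output node (amperes).
    """
    n_outputs = len(conductances[0])
    currents = [0.0] * n_outputs
    for v, row in zip(voltages, conductances):
        for j, g in enumerate(row):
            # Ohm's law does the multiply (I = V * G);
            # Kirchhoff's current law does the accumulate (currents on a shared wire add).
            currents[j] += v * g
    return currents


# Hypothetical example: 3 input nodes feeding 2 output nodes.
VOLTAGES = [0.2, 0.5, 0.1]                     # volts
CONDUCTANCES = [                               # siemens
    [1e-6, 2e-6],
    [3e-6, 1e-6],
    [5e-6, 4e-6],
]

currents = crossbar_currents(VOLTAGES, CONDUCTANCES)
print(currents)  # roughly 2.2 microamps and 1.3 microamps
```

In software this loop is ordinary arithmetic. The point of the analog approach is that the physical circuit produces the same sums without executing any of these operations digitally.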

SE: How far along is this?

Welser: We’ve already demonstrated it. In the last couple of years we’ve been doing this with phase change memory, which is non-volatile. It can have varying levels, so you can do multiple levels, multiple bits in one resistance. You’re limited there in the digital space because you want to make sure they’re spaced out enough so that noise isn’t a problem. You can distinctly read this one, two, three, four, whatever the number is. In this case, we don’t care if it’s not exact. We’re just going to increase or decrease resistance little by little based on what the training algorithm is telling us to do. What we really care about is having a material that can have lots of different states. We think a thousand is probably plenty. It doesn’t matter if they aren’t super well-defined, as long as they’re fairly linear. So every time I give it an extra pulse of voltage, it moves up by roughly the same amount every time. And more importantly, when I go in the other direction, it moves down by roughly the same amount. That’s actually the challenge on the materials side: getting a material that has a nice uniform property, because in phase change memory, in one direction it tends to be fairly linear while the other direction is just a reset, so it tends to jump back.
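
A toy model of pulse-based weight updates is sketched below. It assumes an idealized device with about a thousand evenly spaced states that can only be stepped upward, and stores each weight as the difference of two such devices, in the spirit of the G+ and G- conductances named in the Fig. 1 caption. The class name, the normalized conductance range, and the update rule are illustrative assumptions; real devices are noisy, not perfectly linear, and eventually need a reset that this sketch does not model.

```python
N_STATES = 1000           # "a thousand is probably plenty" (from the interview)
G_MIN, G_MAX = 0.0, 1.0   # normalized conductance range (assumption)
STEP = (G_MAX - G_MIN) / N_STATES


class DifferentialWeight:
    """Weight stored as g_plus - g_minus.

    Each device only ever receives 'increase' pulses, so the weight is
    lowered by pulsing g_minus rather than by decreasing g_plus.
    """

    def __init__(self):
        self.g_plus = G_MIN
        self.g_minus = G_MIN

    @property
    def value(self):
        return self.g_plus - self.g_minus

    def pulse(self, n):
        """Apply n pulses: positive n raises the weight, negative n lowers it."""
        if n >= 0:
            self.g_plus = min(G_MAX, self.g_plus + n * STEP)
        else:
            self.g_minus = min(G_MAX, self.g_minus + (-n) * STEP)


w = DifferentialWeight()
w.pulse(30)    # training asks for a small increase
w.pulse(-50)   # then a somewhat larger decrease
print(w.value)  # close to -0.02
```

Because both devices only move one way, they drift toward saturation over many updates, which is where the reset behavior described above becomes the practical difficulty.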

SE: You’ve basically used accuracy as another knob to turn?

Welser: Yes, because if you think about the way GPUs work today, one of the reasons they’re so power-hungry is that the training data is coming into the CPU from the system memory. That’s every image that you’re going to try and train with. But literally all the weights on those neurons are being stored in the GPU memory. They get pulled in so you can go through and make your changes, make any updates to them, and then you have to go back and store them in memory. You’re spending a huge amount of energy just moving these models, which are hundreds of megabytes, back and forth off and on the GPU. If instead you just have resistors sitting there, you don’t have to do anything else. All you’re doing is continuing to adjust those numbers every time it goes through, pumping this one up, this one down, but nothing is being stored and moved off and on memory. It’s all just happening right there and you’re just streaming through the image data and making changes as they go. That’s much more efficient. Now you’ve given up the accuracy, because you don’t know exactly what those will be. But in the end, once they get set, they will hold their resistance accurately at that point.

SE: Now you don’t have to worry about tuning the algorithm the same way you did before, right? Instead, you’re leveraging your hardware architecture.

Welser: Yes, and that is going to be very important in the foreseeable future because the neural nets do appear to be really good algorithms. We’re learning more and more what that means, but finding good ways to actually run them efficiently is going to be important for when you get to really large scales.

SE: What triggered this change of direction?

Welser: It was this idea of XPoint memory. It’s very efficient for storage of high density data. But one of the challenges was that you had to make sure all your levels were all distinct for everything because they had to be digital. And then someone realized, ‘Wait, there’s another application I can use this for that doesn’t have anything to do with memory, and it doesn’t care if it’s not exact.’ That was probably five or six years ago. Now we’ve done some simulations to show that you will be able to get to the same accuracies for your neural net output as you can with state-of-the-art neural nets. That was always part of the question—will you get to the same level in the end because you know you have some inaccuracy here?

Fig. 1: Training and test accuracy using new approach closely matches TensorFlow test data (gray). The TensorFlow curves correspond to 10 different initial network conditions and sequencing of training images, illustrating the modest run-to-run deviations inherent to neural network training. The distribution of weights (b) for initial state and epochs 1, 5, 10 and 20, and the cumulative distributions of all array conductances (c) G (both G+ and G- together). Note that half of G+ are almost zero, corresponding to negative weights, and similarly for G- devices for positive weights. Finally, (d) shows the initial g distribution and successive distributions just before weight-transfer to the hardware-based PCM devices. Source: IBM/Nature

SE: In effect, you’ve taken a more human-like approach to this?

Welser: If you look at the brain, you have all these really great connections. And there is variability in these connections. But if you look at how the brain operates, it has the same issues we do. If you try to see what the actual equivalent bits of connectivity are between them, it’s about 4-bit accuracy. So yes, it’s analog, but if you actually look at the number of states that takes on, in terms of what it robustly holds, you could roughly emulate this with a 4-bit integer. There was an interesting paper somebody just sent me that showed a correlation between connectivity in the brain and intelligence. I would have said intuitively that more connectivity equals more intelligence. They were actually finding a reverse correlation. It appeared that people who were very intelligent actually seem to have more carefully pruned or curated connections. So they still have a lot of connections, though not necessarily the largest number. But they do seem to have patterns associated with them. So maybe the brain is pruning more carefully what connections are most important for actually getting the problems done, versus having a fairly inefficient neuron that has lots of connections everywhere.

SE: Which can result in lots of noise, right?

Welser: Yes. With big data and analytics you can find lots of correlations, but which are the correlations that you actually care about?

SE: What happens on the inferencing side?

Welser: In a sense, this is an easier problem in that you just have to run the neural net. You don’t have to worry about adjusting the weights anymore, because they’re already set for you, which means you actually can probably reduce precision even more. When you’re doing the training and you’re doing the back-propagation, you want to be able to make small movements up and down on the weights to make sure that you are not going to miss the key levels as you go along. So you do need to have some level of either analog or floating point accuracy. Once you get them set, though, the actual amount of quantization can be reduced. The ultimate version of that was the TrueNorth chip that we put out a few years ago here, which is literally one bit of accuracy, right? You either were connected or you weren’t connected to a neuron. The weighting was zero or one. So that’s obviously the limit of what you could do. And we showed that you really could pretty much map any neural net onto that. It was a ton of work and it was fairly inefficient in terms of area, because you really only had one-bit accuracy for everything. It was very efficient, though. So now the question is what is the right level of accuracy where you can be both efficient from a programming point of view but yet still get the power advantage.
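
As a rough illustration of reducing precision after training, the sketch below snaps already-trained weights onto a fixed number of evenly spaced levels. It is a generic uniform quantizer, not a description of TrueNorth or any IBM tool; the function name `quantize`, the assumption that weights lie in the range 0 to 1, and the sample values are all hypothetical. With 4 bits there are 16 levels, and with 1 bit each weight collapses to zero or one.

```python
import math


def quantize(weights, bits):
    """Snap each weight in [0, 1] to the nearest of 2**bits evenly spaced levels."""
    top = 2 ** bits - 1  # index of the highest level
    # floor(x + 0.5) rounds halves upward, avoiding Python's round-half-to-even.
    return [math.floor(w * top + 0.5) / top for w in weights]


# Hypothetical trained weights, already normalized to the range [0, 1].
TRAINED = [0.05, 0.40, 0.62, 0.97]

four_bit = quantize(TRAINED, 4)
one_bit = quantize(TRAINED, 1)

print("4-bit:", four_bit)
print("1-bit:", one_bit)  # [0.0, 0.0, 1.0, 1.0]
```

The worst-case error per weight is half a level spacing, so it shrinks as bits are added. The open question raised above is how few bits a given network can tolerate before its outputs change.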


SOI expert says:

Very interesting for me.
Accuracy and efficiency, connection and intelligence, analog and digital, bit and qubit, memory and processor, so on. Make me think more about data and processing.

Gary P says:

“…how you integrate them (AI solutions) into something that will help somebody… ”
This is critical question. AI may be used for suppressing people, which may have no clue how it works or why they are targeted,
This tech revolution can be compared to Atomic-bomb creation time when some physicist realized how it can hurt humanity in wrong hands.

Bill.t says:

With today’s constantly increasing profusion of data diverse and massive groups of people are “faked-out”.
Truths of Nature are easily masked-out by simple abusive harangue about worthlessness of other side. Aristocracy is hardly a pedestal for “freedom of humanity”.
Gary, in what manner do you think AI might assist in enlightening us and such groups. Your AI insights are of greater depth than mine. Perhaps I misinterpret the intent of your remarks. I did associate them to present governance, the absolutely major problem besetting the detailed purposes of a democracy. But, thus far, Religion and War seem (appear to be) the primary language, tools, and forces for aristocratic change within a democracy.
