Big Changes In AI Design

Experts at the Table: Why it’s becoming easier to develop AI for edge applications, and how AI will be partitioned between the edge and the cloud.


Semiconductor Engineering sat down to discuss AI and its move to the edge with Steven Woo, vice president of enterprise solutions technology and distinguished inventor at Rambus; Kris Ardis, executive director at Maxim Integrated; Steve Roddy, vice president of products for Arm’s Machine Learning Group; and Vinay Mehta, inference technical marketing manager at Flex Logix. What follows are excerpts of that conversation.

SE: AI, machine learning, deep learning — whatever slice of this you want to call it — is showing up in almost everything these days. What impact is this going to have on power and performance?

Woo: We’re seeing an expansion of the pie, so the market is getting much bigger. While a lot of work was done in data centers in the past, we’re seeing some of that move out of the data center. Obviously, a lot of the inference is going more toward the endpoints, but the data center is still very relevant. We’re seeing increased use across all the different areas — the data center, the edge, and the endpoints. Probably the most interesting part of it is the different ranking of needs in terms of what matters in each of these places. There’s still a desire to do heavy training in the data center, but some of that is moving to the edge and the endpoints, as well. The endpoints are very heavy on inference, and obviously some inference happens in the data center, too. The use cases show a mix and match, and heavier use of training or inference, depending on whether you’re in the data center or at the endpoints. The ranking of what matters most depends on whether you’re in the data center, at the edge, or at the endpoints.

Roddy: We see that, too. The data scientist lives in the data center. The exploration around network architectures, the exploration around custom operators — all of that will live in the data center, probably forever, just because you need the capacity to train and retrain. But as deployment increases on the edge, two things are happening. One is a growing sophistication of the tooling, both in the training tool sets themselves — such as the model optimization toolkit inside of TensorFlow that helps prepare models for deployment in constrained environments — and in tooling from silicon vendors, and from IP vendors like Arm, which helps compress and convert models so you don’t run the same exact version of the model in an edge environment as you would in a data center. You also need a lot of the clustering and pruning that reduces complexity, maybe by an order of magnitude, with barely any loss of accuracy. This is something you might not need to do if it runs in a data center at about five seconds for an inference that you only run a few hundred times a day. But by the time you decide to put it in an appliance, obviously you need a lot more than that, coupled with much more efficient silicon and more dedicated processors on the edge. At Arm, we’re in the business of licensing processors. We have a whole line of NPUs and embedded NPUs, so you’re seeing a twofold attack on making this work on the edge — 10X and greater reductions in the complexity of the networks, and significant increases in the efficiency of the silicon. Then the data scientists, of course, make it more complicated and come up with new algorithms and new shapes of network complexity. It is just a never-ending race of more power, more accuracy, and more capability in the silicon.
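As a rough illustration of the compression tooling Roddy is referring to, here is a minimal sketch of magnitude pruning with TensorFlow’s Model Optimization Toolkit. The toy model, the 80% sparsity target, and the random training data are placeholders for illustration, not details from the discussion.

```python
import numpy as np
import tensorflow as tf
import tensorflow_model_optimization as tfmot

# Toy stand-in for a trained network; a real model would come out of the
# normal training flow in the data center.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Wrap the model so 80% of the smallest-magnitude weights are zeroed out
# while the network is fine-tuned.
pruned = tfmot.sparsity.keras.prune_low_magnitude(
    model,
    pruning_schedule=tfmot.sparsity.keras.ConstantSparsity(0.8, begin_step=0),
)
pruned.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

x = np.random.rand(64, 32, 32, 3).astype("float32")  # placeholder data
y = np.random.randint(0, 10, size=64)
pruned.fit(x, y, epochs=1, verbose=0,
           callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# strip_pruning removes the training-time wrappers before export to the edge.
deployable = tfmot.sparsity.keras.strip_pruning(pruned)
```

The sparse model produced this way is what downstream compilers and converters can then compress and map onto an edge target.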

Ardis: As an actual silicon vendor, I see a Wild West of approaches in what folks are coming out with to accelerate inference at the edge and address the power problem. Some people are trying to do more with what they have — their TinyML, their TensorFlow Lite running a simple program, simple inferences on existing Cortex-M4Fs, for example. Then you’ve got a whole host of startups that have their own unique hardware architectures. Sometimes it’s a DSP on steroids, sometimes it’s something built from the ground up. It’s going to be a battlefield of all these different approaches trying to get more out of what’s already there, or to bring new solutions to market that can make inferencing at the edge low power and fast enough to be useful.
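For a concrete picture of the "TensorFlow Lite on an existing Cortex-M4F" path Ardis mentions, the sketch below converts a toy keyword-spotting model to a fully int8 TensorFlow Lite flatbuffer of the kind TensorFlow Lite for Microcontrollers can run. The model shape and random calibration data are stand-ins.

```python
import numpy as np
import tensorflow as tf

# Toy stand-in for a keyword-spotting model; a real network would be trained first.
keyword_model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(49, 10, 1)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

def representative_data():
    # A few calibration samples so the converter can choose activation ranges.
    for _ in range(100):
        yield [np.random.rand(1, 49, 10, 1).astype(np.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(keyword_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

with open("keyword_model_int8.tflite", "wb") as f:
    f.write(converter.convert())
```

The resulting .tflite file is typically embedded in firmware as a C array and executed by the TFLite Micro interpreter on the microcontroller.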

Roddy: Yes, I like to joke that machine learning is the Full Employment Act for people in the silicon and IP business, because it empowers so many more developers to create so many more unique solutions. And you don’t need to be an embedded DSP programmer. You just train with the data set in the cloud, and then tools help reduce that down. It generally takes more compute cycles to do something with a neural net than it does with a finely crafted embedded C program, but neural nets take a lot less time to develop. You’re seeing this explosion of opportunities — different applications, different functionality — and an explosion of silicon with many types of idiosyncratic, market-specific solutions that make a lot of sense for certain applications.

Mehta: As you’re doing this training, there isn’t just one version or one set of weights for a single model that gives you a useful application. Consider the lottery ticket hypothesis. Researchers found that when you train a model and then prune it, you can retrain using the same initial set of weights — those exact same weights with that pruned model — and ultimately that gives you a 10X reduction. Those methods allow us to move right from the data center to the edge. But as nice as techniques like that are, it’s even nicer when TensorFlow, or a toolkit from Arm or Nvidia, just takes care of it for you. There are people who operate in their own environment and they’re contained. In the data center you may be developing a model that will only run on such-and-such hardware, and only run in one kind of environment, for these kinds of niche applications.
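A minimal sketch of the lottery-ticket procedure Mehta alludes to is shown below: train, prune by magnitude, rewind the surviving weights to their original initialization, then retrain the sparse "winning ticket." The two-layer model, toy data, and 80% pruning rate are placeholders chosen only to keep the example short.

```python
import copy
import numpy as np
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

initial_weights = copy.deepcopy(model.get_weights())  # step 1: remember the init

x = np.random.rand(512, 784).astype("float32")        # toy placeholder data
y = np.random.randint(0, 10, size=512)
model.fit(x, y, epochs=1, verbose=0)                  # step 2: train

# step 3: per tensor, keep only the largest 20% of trained weights
masks = []
for w_trained in model.get_weights():
    threshold = np.quantile(np.abs(w_trained), 0.8)
    masks.append((np.abs(w_trained) >= threshold).astype("float32"))

# step 4: rewind the survivors to their original initialization
model.set_weights([w0 * m for w0, m in zip(initial_weights, masks)])

# step 5: retrain; a full implementation would re-apply the masks after each
# optimizer step so the pruned weights stay at zero.
model.fit(x, y, epochs=1, verbose=0)
```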

SE: What are the most interesting efforts you’re seeing that can break us away from using brute force for acceleration?

Woo: For improving training, some of the interesting breakthroughs — although these are older — are things like transfer learning. You take a trained model that does something like one language, and if it’s a Romance language you can use it to train on another Romance language, because it has a similar kind of structure. There are lots of different examples where transfer learning is being used. That was a big breakthrough. There’s a lot of work on pruning, as well. If you take a model, train it with its full connectivity, prune it, and then take that pruned model and use it in a form of transfer learning, that’s another way to get a double benefit, because a partially trained network already knows how to do something and you don’t have to do as many of the mathematical operations. A lot of the precision work that has been done is really interesting, too — the move to bfloat16, and then the use of higher-precision numbers for accumulation as opposed to the actual MAC operations. Those are all really big things that have helped move things forward. In some ways the conversation is moving toward this idea of democratization of AI. There are great toolkits, and there are nice techniques now that the toolkits are adopting, so it makes it a little bit easier for smaller companies that don’t own as much infrastructure to make significant progress on things like training and inference. Those are some of the bigger and more interesting things I’ve seen.
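As one example of the precision work Woo mentions, the sketch below enables Keras mixed precision with bfloat16 compute while the variables, and the higher-precision bookkeeping behind them, stay in float32. The tiny model is a placeholder; a real design would pick where to keep float32 based on accuracy testing.

```python
import tensorflow as tf

# Compute in bfloat16, keep variables in float32.
tf.keras.mixed_precision.set_global_policy("mixed_bfloat16")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),   # runs its math in bfloat16
    tf.keras.layers.Dense(10),
    # Keep the final softmax in float32 for numerical stability.
    tf.keras.layers.Activation("softmax", dtype="float32"),
])

# Shows 'bfloat16 float32': low-precision compute, higher-precision storage.
print(model.layers[0].compute_dtype, model.layers[0].variable_dtype)
```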

Roddy: Democratization is a good key word there, because there’s more professionalization and commercialization of tools and techniques to allow more application developers to access models and personalize them for a particular application. This is the so-called transfer learning. If you want to deploy a camera in a factory automation system, you want to recognize things in your factory flow or your production line. You don’t have to start at the very beginning with data scientists and massive data sets and train everything and do network architecture research. You realize, ‘Oh, I just need to take a model that is being commercialized by someone for manufacturing systems. I just have to recognize my particular object, my particular widget, my particular assembly line, and that makes it easier to deploy without necessarily even hiring a bunch of data scientists.’
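A minimal sketch of the factory-camera scenario Roddy describes might look like the following: reuse a pretrained backbone and train only a small classification head on images of your own widgets. The backbone choice, class count, and dataset names are assumptions for illustration.

```python
import tensorflow as tf

NUM_WIDGET_CLASSES = 5  # hypothetical number of parts to recognize

# Pretrained backbone provides generic image features; freeze it.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False

# Only this small head is trained on the factory's own images.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(NUM_WIDGET_CLASSES, activation="softmax"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(widget_images, widget_labels, epochs=5) would then adapt the head,
# where widget_images/widget_labels are a hypothetical labeled dataset.
```

Because only the head is trained, a comparatively modest labeled dataset is often enough, which is what makes this practical without an in-house data science team.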

Woo: In the olden days when people were really working on this stuff, it was almost a requirement to understand the math behind what was going on. But these days, with these really nice toolkits and TensorFlow — and there are lots of other ones — you don’t really need to understand the math. That’s leading to this greater democratization of the technology.

Ardis: We’re keeping an eye on the research going on in universities and companies into what we consider the holy grail, which is to throw your data set at a program that will search the program space and give you the best optimized neural network. All the silicon vendors have a challenge when we describe the capabilities of our parts. We kind of don’t know, because depending on how optimized your network is, you could be running your neural network accelerator full out trying to recognize 10 keywords, or you could run a really optimized model that’s looking at VGA images two times a second or doing something far more complex. For that architecture search tool, we’ve seen a couple of interesting things in the market. Unfortunately, one of them was gobbled up after making some nice progress. Some of the university research is interesting, too. But we’re all waiting for that moment when the programs can figure out the program, and then developers are no longer required. Then AI really takes over.

Roddy: Yes, you really see that in published papers and at conferences like NeurIPS, where in previous years — four or five years ago — it would be a new operator, a new topology of the network, and several percentage points of improvement in accuracy. Now it’s more about techniques for automated neural architecture search. It literally is the über model that is searched across all possible combinations, with a smart technique to avoid an exhaustive search and still find something appropriate to the data set. That’s a whole exploding set of capabilities that will help drive efficiency for things on the edge, because if you can find a topology that’s inherently twice as efficient as what was thought to have been the best, you can get that 2X or more increase in battery life and performance.
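To make the idea concrete, here is a minimal sketch of a small automated architecture search using the KerasTuner library’s Hyperband tuner, which discards poor candidates early rather than exhaustively training every combination. The search space, data, and directory names are placeholders.

```python
import keras_tuner as kt
import tensorflow as tf

def build_model(hp):
    # The tuner explores depth and layer widths instead of hand-picking them.
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=(784,)))
    for i in range(hp.Int("num_layers", 1, 3)):
        model.add(tf.keras.layers.Dense(
            units=hp.Int(f"units_{i}", 32, 256, step=32),
            activation="relu"))
    model.add(tf.keras.layers.Dense(10, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

tuner = kt.Hyperband(build_model, objective="val_accuracy",
                     max_epochs=10, directory="nas_demo",
                     project_name="edge_search")

# tuner.search(x_train, y_train, validation_data=(x_val, y_val)) would then
# run the search on a hypothetical dataset, promoting only promising candidates.
```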

Stay tuned for part 2
