Intel’s Next Move

Company’s push into deep learning opens door to a variety of new architectures, including tiles, advanced packaging and more customized solutions.

Gadi Singer, vice president and general manager of Intel’s Artificial Intelligence Products Group, sat down with Semiconductor Engineering to talk about Intel’s vision for deep learning and why the company is looking well beyond the x86 architecture and one-chip solutions.

SE: What’s changing on the processor side?

Singer: The biggest change is the addition of deep learning and neural networks. Over the past several years, the changes have been so fast and profound that we’re trying to assess the potential and what we do with it. But at the same time, you also need to step back and think about how that fits in with other complementary capabilities. That’s part of the overall transition.

SE: What really got this going was a recognition that you could develop algorithms with machines rather than by hand, right?

Singer: The original approach was from the 1960s, and it went dormant until [computer scientist Geoffrey] Hinton and others found a better way to deal with multiple layers effectively in the early 2000s. The big breakthrough, when deep learning was recognized as a major computational force, occurred a couple of years ago. That was when ImageNet showed you can reach near-human accuracy with image recognition. We started to see great results on speech recognition. Around 2015 and into 2016, results began to look promising enough to be a major change factor. At that time, the world was basically flat, at least in terms of images. It was relatively simple images and simple, direct speech. Most of the effort was proving things were possible with deep learning so you could reach some level of accuracy or some set of results. In terms of the way to create and prove models, the main architectures were CPUs and GPUs. The way to do the problem before that was C++, like some of the predecessors to Caffe, and with proprietary environments such as CUDA. It required a lot of expertise and effort in building the compute architecture, as well as in the deployment. In terms of who was involved, if you look at the technology in the field today, those were the early adopters.

SE: What’s changed since then?

Singer: Over the last few years, we’ve seen the coming of age of deep learning. The data itself has become much more complex. We’ve moved from 2D to 3D images. We’re working with Novartis, which is looking at 3D microscopic images of cells, trying to identify potentially malignant cells. The images themselves are 25 times more complex in terms of data, but what you’re identifying is a more refined model.

SE: Where does Intel fit in with these architectures? One of the big problems with AI and deep learning is that they’re changing quickly, so you need a very flexible architecture. What does Intel plan here?

Singer: In the past, the problem statement was clear. You knew what you needed for a graphics chip or a CPU chip two or three years out, and companies competed on having the best solution for a known problem. Deep learning is a space where companies compete based on who best understands the problem as it evolves. You need an architecture that is able to understand and foresee trends, and be ready for what is coming when it’s out there in the market in full production and deployment—not when it’s being designed and tested.

SE: Does that change by market, or is it still the same architecture?

Singer: This affects all aspects. We don’t think one architecture addresses all needs. We believe the winning solution is a portfolio of products that are clearly distinctive from each other. So you have more than one, but you don’t have too many. You look at the full range from sub-1 watt to 300 to 400 watts, from inference to training and emergent ML, from a focus on throughput to a focus on latency. There’s also a different set of sensitivities to performance per watt. What is the value of power efficiency in the solution? And are you willing to trade off something else? It’s not just a difference of a small degree from one instantiation to another. There is a range of needs, and there has to be a set of architectures that are complementary.

SE: What are those architectures?

Singer: There are three elements. One is that we need a portfolio, because our customers are asking for it. You need solutions that go from the end device, whether that’s a security camera or a drone or a car, to a gateway, which is the aggregation point, and up to the cloud or on-premise servers. You need a set of solutions that are very efficient at each of those points. One element of our hardware strategy is to provide a portfolio with complementary architectures and solutions. Another element is to further make Xeon a strong foundation for AI.

SE: For the training or the inferencing?

Singer: Let’s start with inferencing. Xeon is a great solution for inferencing. It’s a great solution by itself. Xeon does very well as an inferencing solution compared to any of the other products that are out there, and it has additional advantages in total cost of ownership and flexibility. If you look at Facebook, they showed how they do training and inference for the top seven services.

SE: But is it the same for a phone or a camera in a car as it is for a company like Facebook?

Singer: This is why you need different architectures. You want to have inferencing in the big data center. You can use the same compute to do inferencing and any other tasks you have. At the lower end, this is where we have the Movidius architecture, at one watt to a few watts (Intel bought Movidius, which makes low-power processors for computer vision, in September 2016). So you can create music in real-time. And you can detect early skin cancer by connecting the [Intel/Movidius compute stick] to a phone and doing pretty significant analysis at the end point.

SE: So you’ve got the data center and the edge. What’s the third piece of your strategy?

Singer: System integration. When you look at system integration, a lot of the value in having the right solution has to do with data movement. A good solution needs to minimize the data movement because that’s 10 times more expensive than doing multiply/accumulate on that data. Optimizing the system and the software stack on how to have the data at the right place at the right time is key to any solution.

SE: This sounds like a top-to-bottom change for Intel.

Singer: Absolutely, because when you look at improving the basic Xeon, we have a good foundation, and now we have the DL (deep learning) Boost with VNNI (Vector Neural Network Instructions) and bfloat16. In the past Intel identified floating point, SIMD and vectors. We’re saying AI requires a set of capabilities. We’re bringing significant new capabilities under the x86. We want to provide both architectures with an optimized solution. This is where we come in with Movidius, and we’re going to introduce Nervana. It’s also where FPGA comes into play. It’s bringing the best of x86, and enhancing it with the best architectures to accelerate it. And then, looking at the system, that’s not just host and acceleration. It’s also memory and networking, and it’s integration. What do you put on die and what do you put on tiles in package? And what do you integrate in the same rack?
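
[Editor’s sketch: bfloat16, mentioned above, is a 16-bit format that keeps the sign bit, the 8 exponent bits and the top 7 mantissa bits of a 32-bit float. The Python below is only an illustration of that layout, not Intel’s implementation. The function names are hypothetical, and round-to-nearest-even is an assumption about the rounding mode.]

```python
import struct


def float32_to_bfloat16_bits(x: float) -> int:
    """Return a 16-bit bfloat16 pattern for x (hypothetical helper).

    Assumes x fits in float32 range; struct.pack raises OverflowError otherwise.
    """
    if x != x:  # NaN: return a quiet-NaN pattern instead of rounding it
        return 0x7FC0
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest even on the 16 bits that are about to be dropped
    # (the rounding mode is an assumption of this sketch).
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bfloat16_bits_to_float(b: int) -> float:
    """Widen a bfloat16 pattern back to a float by zero-filling the low 16 bits."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def round_trip(x: float) -> float:
    """Show the value that survives storage in bfloat16."""
    return bfloat16_bits_to_float(float32_to_bfloat16_bits(x))
```

[Because the exponent field is as wide as float32’s, the range is preserved while precision drops to roughly two to three decimal digits. For example, round_trip(3.14159) gives 3.140625.]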

SE: So you’re looking at a platform strategy that includes advanced packaging. This is something Intel has never done seriously before. How do you see this playing out?

Singer: In-package integration has a big opportunity to take different sorts of things and tightly integrate them. We are definitely working on it. We see very high value in this.

SE: One of the big shifts underway in new hardware architectures is to basically increase the density of data, where you’re doing more per cycle, right?

Singer: It’s data compression and increased parallelism of the computation. When you look at architectures such as Nervana NNP (neural network processors), that’s built from the ground up. You’re dealing with neural networks, which have tensors. You’re managing constructs. This is the foundation of thinking for architecture. With VNNI, you’re providing instructions for structures to be able to do computing on arrays.
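
[Editor’s sketch: as an illustration of “computing on arrays,” the AVX-512 VNNI instruction VPDPBUSD multiplies four unsigned 8-bit values by four signed 8-bit values and adds the sum of those products to a 32-bit accumulator, in each 32-bit lane of a vector register. The Python below emulates a single lane of the non-saturating form. The function names are hypothetical, and this is a behavioral sketch, not how the hardware or Intel’s libraries implement it.]

```python
def dot4_u8s8_accumulate(acc: int, a_u8, b_s8) -> int:
    """Emulate one 32-bit lane: acc += sum of four (unsigned 8-bit * signed 8-bit) products."""
    if len(a_u8) != 4 or len(b_s8) != 4:
        raise ValueError("each step consumes exactly four byte pairs")
    if any(not 0 <= a <= 255 for a in a_u8):
        raise ValueError("first operand must hold unsigned 8-bit values")
    if any(not -128 <= b <= 127 for b in b_s8):
        raise ValueError("second operand must hold signed 8-bit values")
    total = acc + sum(a * b for a, b in zip(a_u8, b_s8))
    # Wrap to a signed 32-bit result, matching the non-saturating variant.
    total &= 0xFFFFFFFF
    return total - (1 << 32) if total >= (1 << 31) else total


def int8_dot(a_u8, b_s8) -> int:
    """Dot product of two quantized vectors, four elements per accumulate step."""
    if len(a_u8) != len(b_s8) or len(a_u8) % 4 != 0:
        raise ValueError("inputs must have equal length, a multiple of four")
    acc = 0
    for i in range(0, len(a_u8), 4):
        acc = dot4_u8s8_accumulate(acc, a_u8[i:i + 4], b_s8[i:i + 4])
    return acc
```

[Packing four 8-bit multiply-accumulates into each 32-bit lane is one way an instruction set does more work per cycle on quantized inference data.]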

SE: There’s more bang for your buck than just shrinking features, right?

Singer: We need to get what we can from the process. We have always pushed the design and the architecture as a vector. ‘Here’s what the process gives us, and we’ll take advantage of it. But our charter is to drive design and architecture to be more efficient, with more instructions per cycle.’ It was always a vector that needed to run as fast as it can.

SE: But now you have all of these pieces that have to fit together. So things potentially get stored and read differently in memory.

Singer: Yes, and you have to be able to pull those structures of data from memory. The other thing that we need to see is how to inter-mix between pure neural network operations and regular potentially looped-in code. If you look at a lot of the work that has happened, it assumes a lot of the new computing needs to be deep learning. Actually, what it needs to do is a more general task, which has neural networking and deep learning in it. You need a very effective neural network structure. But if there are parts of an equation that are more sequential or more conditional, it needs to do those effectively, as well. And you need to be able to move from one to another. We are solving the question about how to do neural networking in an optimum way in the context of a real solution that has other elements, as well. If you look at NNP machine translation or other similar topologies, they have within them things that are not pure neural network activity. They are complementary as part of the solution.
