NetSpeed’s CEO talks about machine learning, AI, and memory-centric design.
Sundari Mitra, co-founder and CEO of NetSpeed Systems, sat down with Semiconductor Engineering to discuss machine learning, shifting from a processor-centric to a memory-centric design, and what needs to change to make that all happen. What follows are excerpts of that conversation.
SE: What is the biggest change you’re seeing?
Mitra: We go through a cyclical loop where there’s a burst of innovation on the hardware side and then it stops for a while. Then the software side says, ‘I’m going to provide all the acceleration that is required.’ So the software industry scales up, but it doesn’t work because the hardware underneath has to provide the bench strength and backbone required to run the software. That’s a problem as well as an opportunity. The world has become shy about investing in innovation for semiconductors because it is very expensive, and there is a delayed return on investment—sometimes as much as 10 years. NetSpeed has been around for five years, and if I had a band of investors like the social-media-happy investors, they would have forced me to exit.
SE: Is part of the problem that it’s much more technologically complex to invest in hardware versus software?
Mitra: Yes. Most investors don’t appreciate why we need to hire people with skill sets that are a lot more expensive. It’s a different pedigree. You can’t hire from the street and build a successful hardware company. You have to select from a small group of people who are experts.
SE: On the flipside, kids coming out of school want to work for Facebook, Google, and software companies doing analytics. If there is a preponderance of them versus hardware engineers, what happens?
Mitra: It tips the balance more to the software side. The majority of the kids want to do that because that is where they see the growth. Having said that, when you go to schools like Georgia Tech, kids come out with an architecture background, but no company will hire them into an architecture position because they believe that architecture is something that comes with experience. These kids end up becoming verification engineers in the semiconductor industry. At NetSpeed, we haven’t done that. We have broadened their roles and given them architecture responsibility, because we have a software machine learning tool that we are building to configure the NoCs (networks on chip). It may be the same experience they would get at Facebook, but there they would get it much later in their career because those positions are the gems of the field. You have to pay your dues and do a lot more work on the fringes before you get to work on the guts.
SE: What do you mean by machine learning? That term has been defined in many ways.
Mitra: From my perspective, it is making your design adapt. If you compare machine learning to AI, there’s very little difference, because AI is adding new synaptic connections so that you can do more. You are designing an optimal processing agent size and you are adapting the connections to it to ensure that it can do more.
SE: How do you see on-chip networks evolving with that?
Mitra: You can always have a manual method of doing it. But with the heterogeneity of our SoC design flows, if you don’t have something that allows a customer to modulate the design and configure it based on heterogeneous traffic flows, they are going to be overdesigning or under-designing.
SE: You’re talking about the self-configuration side, right? This is really one step beyond where most people are today.
Mitra: Correct. In a world where power doesn’t matter, you don’t need self-configuration. You can over-design a system, hook up processor agents and modulate the connections between them to make it do task A or task B or a combination of both. You can connect it up completely, and then you don’t need any adaptability because it is going to perform the best for every function. But that means you have just exploded your power envelope and increased the complexity significantly, and you have built in a ton of redundancy. With compute for IoT, including the automotive and cloud markets, power is a big deal. Folks don’t want to build out and use general-purpose processors connected in full meshes because they can’t afford the power.
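The cost of the full-mesh over-design Mitra describes grows quadratically with the number of agents. A toy sketch (this is an illustrative calculation, not NetSpeed’s tooling) using link count as a rough stand-in for interconnect power and area:

```python
# Illustrative sketch: wiring cost of a fully connected interconnect versus a
# simple ring topology. Link count is a crude proxy for power/area; real NoC
# cost models are far more detailed.

def full_mesh_links(n: int) -> int:
    """Every agent wired to every other agent: n*(n-1)/2 links."""
    return n * (n - 1) // 2

def ring_links(n: int) -> int:
    """Each agent wired only to its two neighbors: n links."""
    return n

# The gap explodes as agent count grows, which is why full meshes are
# unaffordable for power-constrained SoCs.
for agents in (8, 32, 128):
    print(agents, full_mesh_links(agents), ring_links(agents))
```

At 8 agents a full mesh needs 28 links to a ring’s 8; at 128 agents it needs 8,128 links to a ring’s 128, which is the redundancy and power explosion being avoided.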
SE: When you are moving down from 14nm to 7nm, any extra margin is going to hurt. At 40/65, if you are doing a system in package, that is probably less of an issue. So how do you really streamline this whole thing as a lot of those chips have already been designed and will be just connected somehow?
Mitra: I haven’t seen a case where you have a common substrate with a 40nm part and a 7nm part riding on the same thing. The power characteristics are totally different. That will be a challenge for a system in package.
SE: What are you seeing companies do with increased heterogeneity?
Mitra: There are a few forms of heterogeneity. If you look at cloud computing and what Microsoft has been touting in their cloud, they want standard processing to happen on a Xeon-type chipset. They augment all the heterogeneity—the variation, the hardware accelerators that are required for their traffic class—in an FPGA. That is one way in which to manage heterogeneity: let the traditional thing be traditional, and instead of being on one chip at a system level, modulate with an FPGA.
SE: We are seeing ARM moving in alongside Intel chips for similar reasons, right?
Mitra: Correct. Some companies are going in that direction, using many ARM cores and providing that offload to the main processor. As the customers do that and realize the value of it, they typically want it all to be on one chip. FPGAs are great launch pads for something, but if you need high volumes they are just too expensive in terms of power, speed and cost. So eventually these companies want an SoC. They have a solution that works and they are trying to take that and put it in a chip that is optimized, and do it fast. They are afraid of an Amazon or a Microsoft beating them to it, and their customer base moving toward the cloud because the performance is what people are dying for. As more and more units are connected—handheld devices, cars, the home—the variation of load and compute that is required, as well as the quantity of compute that is required, are exploding.
SE: So we’re dealing with the need to speed up processors and to make the cycles more efficient, and to do more very quickly across the SoC or whatever the system is, and all using less power. Aren’t these orthogonal requirements?
Mitra: It’s absolutely orthogonal. If you think about automotive—which I will call IoT because whatever is happening in automotive, in some shape, may happen in the IoT segment, with maybe a lesser degree of paranoia—the highest priority is that if a sensor detects a life-threatening problem it needs the highest priority interrupt. It needs to be able to press the brake and move the steering. It doesn’t matter what else is going on in the car.
SE: It also needs to learn from when it is out on the road something else has happened across the network, right?
Mitra: Yes, and you can say that is the AI of the whole system. It needs to learn and adapt. Eventually, you need to use machine learning algorithms and implement them on an SoC. To me, AI is about gathering and distilling the data and pointing to where it can actually be programmed on a chip, or so the chip can adapt to it. When something like this is going on, the one resource that everyone wants to access, and at different priority levels, is memory. The stored learnings are sitting in the memory, but they all need access to it to be able to figure out what it means when they get a signal. I see a big truck, so I need to brake. But they need to know and sense something and figure out the meaning of it, and they are going to do it by matching and determining, ‘This is it. Move!’
SE: You have a lot of traffic and contention for memory. How do you solve that?
Mitra: If you can do fast pattern matching and set up a system that knows how to prioritize the right kind of input to memory, that is when you have a workable solution. There are a couple of ways to do this. We try to bring the memory closer to the agents that require it, so we take a small part of that memory and put it inside the processor or as close to the processor as possible—DRAM on the outside and on-chip SRAMs. None of us envisioned that there would be agents outside that CPU cluster that actually require access to it. What we’ve done for generations is optimize our cache coherency design to be very homogeneous, because we expect low-latency, reasonable-bandwidth traffic. That is what CPUs are all about. That is how everything has been designed. But all of a sudden with automotive there is sensing and graphics that require this, as well. The GPU also needs access to that same memory, but the GPU behaves radically differently from the CPU. The GPU is all about bandwidth and doesn’t care about latency as much. There is so much data to distill. It goes back to our brain using 60% for vision processing. It is the same thing.
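The prioritization Mitra describes can be sketched as a priority-based memory arbiter. This is a hypothetical toy model, not NetSpeed’s design: the agent classes and priority ordering (safety-critical sensor above latency-sensitive CPU above bandwidth-heavy GPU) are assumptions for illustration.

```python
# Toy sketch of a memory-access arbiter for heterogeneous agents.
# Priority classes are illustrative: sensor traffic preempts CPU traffic,
# which preempts bulk GPU traffic, so a burst of GPU data cannot starve
# an urgent safety signal.
import heapq

PRIORITY = {"sensor": 0, "cpu": 1, "gpu": 2}  # lower number = served first

class MemoryArbiter:
    def __init__(self):
        self._queue = []
        self._seq = 0  # tie-breaker preserves FIFO order within a class

    def request(self, agent: str, payload: str) -> None:
        heapq.heappush(self._queue, (PRIORITY[agent], self._seq, agent, payload))
        self._seq += 1

    def grant(self):
        """Return the next (agent, payload) to serve, or None if idle."""
        if not self._queue:
            return None
        _, _, agent, payload = heapq.heappop(self._queue)
        return agent, payload

arb = MemoryArbiter()
arb.request("gpu", "frame block 0")
arb.request("cpu", "cache line fill")
arb.request("sensor", "obstacle detected")
print(arb.grant())  # sensor request is served first despite arriving last
```

A real memory controller would also weight bandwidth guarantees and aging to avoid starving the GPU entirely; the sketch only shows the strict-priority idea.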
SE: Machine vision is much more interesting in some ways than human vision, which is all visual acuity and lower acuity on the periphery. Machine vision can use infrared and radar. It’s a whole different way of looking at the world. How does this apply here?
Mitra: The problem that I’m solving is that we now have a GPU that wants access to memory with a CPU. This came about a couple of years ago when people started creating coherent clusters with a CPU and GPU on one die, sharing the cache. The GPU alone is not enough. We need hardware accelerators, for example—the vision processor also needs access to this memory because it needs to know how to act very fast. If it has to go through the CPU, then into memory and resolve it, it doesn’t work. It’s getting its own processor or hardware accelerator next to it and it wants to participate in the same coherent system.
SE: So you are trying to architect a multi-feed type of processing, where it is all using the same brain, but the brain itself has to distribute it around different centers with enough speed?
Mitra: Correct. That is where the industry is going. It is moving away from a processor-centric design to a memory-centric design, because that is the resource that everyone needs access to. Yes, you need very fast processing. We’ll have that in single-thread, multi-thread performance that you need from the CPU. And you have two schools of thought on that. Then you need that GPU because there is so much data that is going to flood your network. But you can’t let it hog the whole bandwidth, when there is an engine that wants access to that. You have to ensure there is no head-of-line blocking in your network, or that it is not stuck behind some big data coming from the GPU. The heterogeneity of the traffic and the heterogeneity of the behavior of the different agents, all wanting to access and respond to what is stored in the memory, is what is defining how these platforms work and how they need to adapt more and more.
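The head-of-line blocking problem Mitra raises is commonly handled with per-class virtual channels. A minimal sketch, assuming just two illustrative traffic classes (the class names and packet labels are hypothetical):

```python
# Sketch: one shared FIFO versus per-class virtual channels on the same link.
# With a single queue, a small CPU request is stuck behind a large GPU burst
# (head-of-line blocking); with per-class queues drained round-robin, the
# CPU request gets through almost immediately.
from collections import deque

def shared_fifo_order(packets):
    """One queue for everything: strict arrival order."""
    return list(packets)

def virtual_channel_order(packets):
    """One queue per traffic class, drained round-robin."""
    channels = {}
    class_order = []  # classes in order of first arrival
    for cls, name in packets:
        if cls not in channels:
            channels[cls] = deque()
            class_order.append(cls)
        channels[cls].append((cls, name))
    out = []
    while any(channels[c] for c in class_order):
        for c in class_order:
            if channels[c]:
                out.append(channels[c].popleft())
    return out

traffic = [("gpu", "burst-0"), ("gpu", "burst-1"), ("gpu", "burst-2"),
           ("cpu", "load-A")]
# Shared FIFO: the CPU load is served last, behind every GPU burst.
# Virtual channels: the CPU load is served second.
```

Real NoCs add per-channel buffering, credits and QoS weights on top of this, but the core idea is the same: keep heterogeneous traffic classes from blocking each other on a shared resource.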
SE: How close are we to seeing systems with this behavior?
Mitra: In our experience in working with customers, the automotive and IoT segments are gaining traction with this. Customers are looking for fast reaction times.
SE: IoT is a very vague concept. There is IoT today, a vision of what it will look like in a couple of years, and a long-term vision. They are very different.
Mitra: Agreed, but I’m equating NetSpeed-related IoT not to wearables but to the compute engines that need to process that data. Whether they are at the edge or in the cloud doesn’t matter.
SE: Ultimately, where do you see this heading? Are we getting to the point where the technology sets itself up for us, such as with robotics?
Mitra: Yes, and robotics is an interesting field. It was AI that got the robotics movement going. It is impacting us in a big way. You can do surgeries with robots. You can go into the human body and do it with precision because you can control it. It is really powerful. That has become very real. A significant percentage of surgeries in the U.S. are now being done by robots. Do I see the technology driving us? There is always going to be an element where we are superior to the technology we build. An example of technology driving human behavior is Uber. The computer that is controlling the human to say, ‘Go drive, go pick up,’ is telling the human brain what to do. It is quite a paradigm shift.
SE: It’s redefining the boundaries of what machines do and what people do. How do you run a company in this type of environment, where you have all these changes and the future is still very uncertain?
Mitra: For a small company to succeed, if you don’t have focus you will fail. I’m completely focused on solving smaller problems for our customers. We provide a strong foundation on which to build a company’s technology. We are adaptable and customizable.
SE: So how does this technology get used?
Mitra: We have given architects a method by which they can do bigger and better things, rather than number crunching in spreadsheets. We are giving them a mathematical model that lets them innovate beyond that. When you are not constrained by those clusters, horizons expand by taking it outside the chip. Why do I have to have a processor and an FPGA that is providing acceleration, but they still have different memory domains? Why not unify that, if I can get them thinking alike?
SE: What are the pieces missing to make all this work?
Mitra: A lot of it is resistance to change, and there are good reasons for it. If this were cheaper to do, people would be running experiments, gambling, and you would get more innovation as a result. But they don’t take risks because of a fear of failure. Failure in the semiconductor industry has become too expensive. If you look at the differences in how you verify hardware before tape-out versus how a company like Facebook does releases of its software, there’s no comparison. For software companies, it’s okay if there is a mistake because they can do a new release that night.