FPGA Graduates To First-Tier Status

Achronix’s CEO looks at the value of customized acceleration, and why FPGAs are better for certain types of computation than CPUs or GPUs.

Robert Blake, president and CEO of Achronix, sat down with Semiconductor Engineering to talk about fundamental shifts in compute architectures and why AI, machine learning and various vertical applications are driving demand for discrete and embedded FPGAs.

SE: What’s changing in the FPGA market?

Blake: Our big focus is developing the next-generation architecture. We started this project two years ago, looking at what improvements we can make. We started talking to the end customers in each of the key application centers to understand how they would get better performance and efficiency.

SE: What’s the verdict?

Blake: The one thing we heard loud and clear was, ‘Pay attention to memory.’ If the data is not available and the computation engines are stalling, it’s a worthless exercise. Look at the holistic problem—don’t just look at a portion of the problem. You have to solve the whole piece. Memory bandwidth came out as being critical. Focus on memory bandwidth.

SE: Does the type of memory matter? There’s a whole bunch of memory types out there now, with new ones on the way.

Blake: There’s a lot of interest in the new memory types, but ultimately these companies are interested in cost. So they’re interested in new memory types if the cost is right. For example, HBM (high-bandwidth memory) has some very interesting capabilities, but it’s still quite expensive.

SE: But that varies, depending on who you’re dealing with, right? If it’s a chip company, the concern over cost would be a lot different than for a system company, which can amortize the cost across a system.

Blake: That is the case. If someone looks at it from a system level, then compared to all the other components it’s not that significant. But at a chip level, it’s a high cost and you have to look at whether you do HBM or DDR or GDDR, and what the tradeoffs and alternatives are.

SE: Are systems companies focused on rightsizing processor and memory resources for a specific application, maybe building in some margin and headroom so they can grow capabilities as the algorithms change?

Blake: Yes, and there’s a lot of uncertainty as to how things are going to change. You look at the time to build and deploy hardware in this space, and you want to look over the horizon as much as you can. You have to think about future-proofing and the changes that may exist, and therefore you have to look at the flexibility of what kind of data you can use. It’s not that everything is going to be FPGA. It’s going to be a mixture of these technologies, whether it’s CPU, GPU, FPGA or custom ASICs. They’re all good tools that are very good for certain things. The question is what’s the mix of workloads that you’re going to have to implement, and especially what’s the mix of workloads that you’re going to have to implement 18 months down the road.

SE: Does FPGA include eFPGA?

Blake: It does. The core technologies are the same.

SE: So an eFPGA can end up in an ASIC, which acts like a bridge across various application spaces, right?

Blake: Yes. If it’s a technology that moves you along that compute efficiency and it gives you flexibility, you don’t just have to buy it as a standalone product. Just add it to a custom ASIC to give you that extra flexibility.

SE: With data-driven designs, one of the things that changes is there are many types of data. Vibration data is radically different from voice or vision data. How flexible is the FPGA for dealing with these different types?

Blake: It’s probably going to drive the resource mix of what would be in it. So the fundamentals won’t change, but the ratio of your computational block or your NLP (natural language processing) block to memories might change. It’s much more likely that the mix is going to change. What’s the throughput? Do you need a small or large core? There are those kinds of tradeoffs. But the ability to change that mixture of resources is what is most powerful, because it will be different across different types of workloads.

SE: FPGAs used to be laggards versus CPUs. They added programmability, but with high overhead. Now it appears that a CPU or GPU isn’t the most efficient way of doing floating-point or fixed-point math, and that the FPGA is better at it. Suddenly you have to look at various processing elements in a whole different light.

Blake: The compute model has been flipped over. In the past you’d think of the FPGA as a prototyping vehicle or a connectivity solution. You didn’t really think of it as being a computational engine. On a CPU, you have a program that is going to fetch data and instructions, compute those, and get intermediate results. The difference in the FPGA is that you’re going to generate a customized data path that is going to blast the data set through at full speed. There’s no ‘fetch this instruction, decode.’ There’s none of that anymore. It’s just a customized data path that does that task. Once you’re done with that task, you can tear it down and build something else.
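The contrast Blake draws can be sketched in software terms (Python here, purely as an illustration—function names and the tiny instruction set are hypothetical): an interpreter-style loop pays a fetch/decode cost per step, while a "customized data path" fuses the same computation into one fixed pipeline the data streams through.

```python
# Interpreter style: fetch an instruction, decode it, then execute it --
# the per-step overhead Blake describes for a CPU.
def run_program(program, data):
    ops = {"inc": lambda x: x + 1, "dbl": lambda x: x * 2}
    for instr in program:              # fetch
        op = ops[instr]                # decode
        data = [op(x) for x in data]   # execute
    return data

# "Customized data path": the same computation fused into one fixed
# pipeline, with no per-step fetch or decode.
def custom_datapath(data):
    return [(x + 1) * 2 for x in data]
```

Both produce the same result; the difference is that the second form has the schedule baked in, which is what an FPGA data path does in hardware.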

SE: You have chips now with hundreds of compute elements on them and hundreds of small memories that are next to them, as well. Each one is doing a much smaller job than it did in the past, but at least in theory, all working together. Where does the FPGA fit into that? Is it one of many elements?

Blake: If you have a multi-core processor architecture, you have many, many elements all doing things in parallel, but they’re still typically running the 32-bit or 64-bit engine. The difference fundamentally in the FPGA is now I’m going to build 3-bit arithmetic where that’s important. I’m going to build 32-bit where that’s important. And then I’m going to scale that and build thousands of copies. I’m going to build a data flow that simply does that computation.

SE: So you’re actually turning the FPGA into a customized approach—which is basically what you’d build an ASIC for in the past.

Blake: In a CPU architecture, you might build a certain number of ALUs (arithmetic logic units). So you very carefully build that ALU to give you maximum flexibility to do all of these things. In an FPGA architecture, now you can build a custom ALU that just happens to be for 3-bit arithmetic. Then you can build a thousand of those. Then you can add 16-bit floating point and build a data path with some of those, which can be stitched in with how that data is computed.
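A minimal sketch of the idea, simulated in Python (the names are hypothetical): a 3-bit adder keeps only three result bits, the way a narrow hardware adder wraps, and the fabric can instantiate many copies that all operate in parallel.

```python
def alu_3bit_add(a, b):
    """Add two 3-bit values, wrapping like a 3-bit hardware adder
    (mask the result to the low 3 bits)."""
    return (a + b) & 0b111

# "Build a thousand of those": apply the narrow adder lane by lane,
# the way an FPGA would instantiate many parallel copies.
lanes_a = [5, 7, 2, 6]
lanes_b = [4, 1, 3, 3]
results = [alu_3bit_add(a, b) for a, b in zip(lanes_a, lanes_b)]
# 5+4=9 wraps to 1, 7+1=8 wraps to 0, 2+3=5, 6+3=9 wraps to 1
```

The point is that each lane costs only 3 bits of logic, not a full 32- or 64-bit ALU.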

SE: So you’re modeling based on the data type as well as how that data moves?

Blake: Yes, you have a much more data-centric view of things.

SE: That becomes your starting point as opposed to, ‘Here’s our hardware, how do we run the data through it?’ You’re now looking at it from the standpoint of, ‘Here’s the data, now what do we do with it?’

Blake: Yes, and this is similar to the datacom space where people have been doing routing and switching over Ethernet. All the feeds and speeds from a networking standpoint are always data-centric. As those data rates went up, going from 10Gbps to 25Gbps, they were very used to dealing with all those things and building custom data flow engines that manage those data sets. A similar thing is happening in the compute environment.

SE: With a difference that you don’t necessarily move all the data, right?

Blake: You’re looking at what is important. Using an analogy in packet processing, I’d be looking at header information and source and destination. You’re not always processing all the rest of the packet. In the same way, you look at which computations actually make sense in order to move the problem to the next stage and only do that.
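The packet-processing analogy can be made concrete with a small Python sketch (the 8-byte header layout here is invented for illustration): the decision is made entirely from header fields, and the payload is never touched.

```python
import struct

def matches_filter(packet: bytes, want_src: int, want_dst: int) -> bool:
    """Decide on header fields alone; the payload is never inspected.
    Hypothetical header: 4-byte source ID, then 4-byte destination ID."""
    src, dst = struct.unpack_from(">II", packet, 0)
    return src == want_src and dst == want_dst

pkt = struct.pack(">II", 10, 20) + b"payload-we-never-parse"
```

Only the computation that "moves the problem to the next stage"—the header compare—is ever performed.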

SE: You add an element of inherent security by doing that, because you avoid CPU-related issues such as branch prediction and speculative execution. You don’t have to deal with either of those, do you?

Blake: No. You do the execution that you actually need.

SE: Is there an added element of security that is needed on FPGAs, as well?

Blake: Yes, now you’re seeing the requirements that all data in motion must be protected. We’re seeing that people don’t always just want the vanilla flavors of that ability to protect data. In some cases, they want the ability to customize what that encryption/decryption capability is or how to protect that data.

SE: So in addition to the ability to use 3-bit instructions, can you now make it 4 bits and add 1 bit of encryption?

Blake: You can choose what you do. But you don’t need that overhead of saying it’s either 32 bits or 64 bits.

SE: Does that also give you the ability to prioritize for applications such as autonomous driving, where you need an instantaneous response?

Blake: Yes, for anything that will be mission-critical, you can prioritize that in hardware.

SE: When did you first spot this opportunity for FPGAs?

Blake: I saw it quite a long time ago, but the first time around we weren’t able to do anything about it. I talked with some of the folks who were doing Google search very early on. If you were doing basic search and looking for a text string in a very long list, you want to see what the matches or partial matches are. I always thought it would be very inefficient for the CPU to fetch a piece of data and then another piece of data, which is the list, and then you make some kind of comparisons and you move on. It seemed much faster to stream that data into an FPGA, stream in the keys that you’re searching for, and do a logical expression match in hardware, cycle by cycle. That data is just being streamed in real time—and not just once, but many copies of it. We had some ideas that it would be an interesting way of trying to accelerate a problem. At the time, we couldn’t cross the chasm. We were talking hardware, the ASIC or FPGA language of hardware, and they were really looking at why you wouldn’t just add more CPUs for more performance. There was this chasm where we couldn’t talk the same language enough to link it up. At the time, I thought there was going to be a very good fit for parallelizing these things. That was a very long time ago.
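The cycle-by-cycle match Blake describes can be simulated in a few lines of Python (a software stand-in, not hardware): each "cycle" shifts one new symbol into a window the length of the key, and the window is compared against the key every cycle.

```python
def stream_match(stream, key):
    """Simulate a hardware matcher: one new symbol per 'cycle', with a
    shift-register window compared against the key every cycle."""
    window = ""
    hits = []
    for cycle, ch in enumerate(stream):
        window = (window + ch)[-len(key):]   # shift register of key length
        if window == key:
            hits.append(cycle - len(key) + 1)  # start index of the match
    return hits
```

In fabric, the per-cycle compare is a fixed block of logic, and many copies can run against the same stream with different keys at once.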

SE: They took the brute force approach because there was enough headroom to continue cranking up the clock frequency?

Blake: Yes, and the idea was that CPUs are going faster, so use more of them and that will fix the problem.

SE: Is it surprising how fast this has changed?

Blake: The rate of change and innovation is much, much higher than in the past. We’re building these new architectures to add in flexibility. But the rate at which software and algorithms and new problems are being solved on it is shocking.

SE: Most of that change in the past happened on the software levels. It’s now happening at the hardware level. Why?

Blake: You’ve run out of performance gains. This goes back to Moore’s Law slowing down and not providing enough computational efficiency. If you’re going to enable many of these things, you need a new level of compute and power consumption. There wasn’t any VC investment in semiconductors for a period of time. All of that has changed because now there are going to be substantial gains.

SE: We have so many different ideas coming into the industry right now, but obviously not all of those can work. Are you seeing any trends yet or still too early?

Blake: It’s still too early, but it’s going to go on a very fast ramp up and there’s going to be a lot of churn. That will continue for some time. Eventually, you’re going to start to see trends emerging out of this, where these types of things are best done one way or another. But for the next five years, it’s going to be the Wild West—rapid growth with new ideas that we haven’t even thought about.

SE: How does this compare to previous inflections?

Blake: The Internet growth in the late ’90s was a very interesting phase. So was the PC era. But this is going to be much bigger and much more disruptive.

SE: There was a big crash at the start of the Internet boom.

Blake: That was partly because there were companies that didn’t really exist with ridiculous valuations. Now there are real products and services. That makes it very different.

SE: So what are some of the interesting applications?

Blake: There is some real intelligence in things like voice processing. For the longest time, I hated when you dialed a call center and got the voice-activated system and it didn’t work. Then, you’d have to say, ‘Operator.’ Now, all of a sudden, that speech recognition percentage is in the mid- to high-90% range. It was like someone flipped a switch.

SE: How does this affect Achronix? Do you hire the same people as in the past?

Blake: We’re hiring on two fronts. We’re hiring more people on the engineering side who can do fundamental 7nm design and verification of those pieces. We’re also hiring a lot of software people.

SE: What kind of software?

Blake: We’re looking at how to build some of these end applications—what kind of library building blocks we have to create to facilitate that. Underneath a lot of these algorithms, there’s a pre-defined set of math. We’re trying to insulate the end users from having to be completely expert in the details of the architecture. You want to abstract that upward. That’s what we’re focusing on right now.

SE: Your software is a lot closer to the metal than a standard OS would be, right?

Blake: We’re building the first level of building blocks on top of software that is very connected to the metal. You’ve got to move that upward to the next level of building blocks, and then stitch that within some of the frameworks, whether that is Caffe or something else from an AI standpoint.

SE: Does that require a regular software engineering background or something different?

Blake: You hire people who have a software background but who also understand hardware. You want people who have a foot in both camps. They can’t be just software engineers, where they’re only writing routines and abstracting on top of a standard process. You want them to understand some of the nuances of what the hardware provides so they can take advantage of that.

SE: Do you foresee a day when you need to look beyond that, to where the FPGA becomes a hub for controlling a system or a system of systems?

Blake: It could go that way, but you would have to think about this very differently.

SE: What’s driving your growth now?

Blake: We’re doing some things that inherently were quite difficult to do. We built architectures, we built software tools, and we made our customers’ businesses grow. But then we just got fortunate, because there was a confluence around this FPGA technology, which used to be used for prototyping and quirky interface technology, and we discovered, ‘Wow, this stuff can accelerate compute by 30X to 50X.’ That was the game changer. We had been successfully doing some of these things, and then this came along to supercharge it.

SE: Is any of this tied to a specific region?

Blake: There’s a massive amount in China. When you look at the 2025 play by China to become a leader in this space, they recognize how important the next-generation phase of growth will be. The level of investment is going up significantly.


