The Race To Accelerate

What’s behind the buzz surrounding eFPGAs, and why now?


Geoff Tate, CEO of Flex Logix, sat down with Semiconductor Engineering to discuss how the chip industry is changing, why that bodes well for embedded FPGAs, and what you need to be aware of when using programmable logic on the same die as other devices. What follows are excerpts of that conversation.

SE: What are the biggest challenges facing the chip industry?

Tate: A huge one is the growing cost and complexity of new designs. That’s what’s driving a lot of industry consolidation. Just as people had to band together because they couldn’t afford their own fabs, now they have to band together because designing a chip is getting so expensive. To design a chip you need a big team of people, so even a company that’s pretty good size has to centralize these kinds of things. The economies of scale are working toward concentration and consolidation, where everyone knows how to build switches and that the next switch chip has to be twice or four times as fast. On the flip side, we’re also seeing a bunch of chip and system companies springing up to address new stuff, such as AI and LiDAR.

SE: What’s different about these startups?

Tate: A lot of them are not necessarily just chip companies. People are looking for chips plus software, or systems with chips. Just building and selling chips as a business model is less favored these days. If you’re selling to system companies, the power is with the systems company. If you can sell more of a total solution with software, or if you can actually be the systems company and build your own chips, that’s much more attractive to the venture folks. Most of the big established companies, just like always in the past, aren’t likely to be the ones that are going to be good in the new areas. So the industry is consolidating on one side, and creating on the other.

SE: Where does Flex Logix fit on that wave?

Tate: When we started, the appetite for semiconductor companies among VCs was non-existent. Nobody wanted to talk to you if you said ‘semiconductor.’ Fortunately, we were able to get some funding from Lux Capital and didn’t need a lot to get started. Now the interest in semiconductors is way higher. There’s a resurgence there, but the VCs are looking for a different kind of semiconductor company than before. Half of our value proposition is software. In the past, most semiconductor companies were terrible at software. Investors are looking for the companies to have a different set of skills so the customer has more of a total solution. They want these companies to be able to retain value and market share, and not be knocked off so easily.

SE: The synergy between the software and hardware is certainly much tighter than it used to be. We’re seeing more software-defined hardware, with a lot of iterations back and forth.

Tate: In general, the new startups have a specific mission in mind and they work the hardware and software together to address some specific problem. It’s not like a traditional chip company where people build chips and the system folks have to figure out how to handle the software. In our case, we’re developing embedded FPGAs, so what we need to do is look like FPGAs and to use existing synthesis tools so we’re not totally starting from a blank slate. We have to provide a set of software, and a lot of the value in what we deliver is our suite of tools to deliver high performance. Without the software component, an FPGA is not useful.

SE: There is a growing push toward heterogeneity in designs, with lots of specialized processors scattered around a chip. How do you see embedded FPGAs fitting into this mix?

Tate: It seems very likely that embedded FPGAs eventually will be as widely used as embedded processors, so there would be multiple embedded FPGAs on chips just like there are multiple processors on most chips. Customers today tend to think about having one eFPGA, because that’s the paradigm they’re used to with FPGA chips. But some of these chips have an amazing number of processors on them. Even Intel processors have little processors on them.

SE: So how do you envision this playing out in the future compared with discrete FPGAs?

Tate: The way FPGA chips are used for systems is not going to be the way FPGAs will be used inside chips. Inside of chips everything has to be very cost effective. You only want to use an FPGA if you need to reconfigure it. In the system, a lot of the logic in an FPGA isn’t changing. Some of it is changing, but not all of it. It just happens to be more convenient to dump it all in there. If you want to integrate that FPGA into an ASIC, a smart company will figure out what doesn’t need to change and hardwire that, and take islands in the logic that need to remain reconfigurable and have blocks of embedded FPGA distributed throughout the hardwired logic. They’ll achieve the most flexible, lowest-cost solution that way. It doesn’t really cause any issues with respect to noise or anything like that. Our embedded FPGA is a digital logic block that doesn’t have any special characteristics. It’s all digital, with all-digital DRC design rules and a robust power grid. It’s solid as a rock. Whether you have one or a bunch inside the chip is not an issue for characteristics like noise. Once a customer starts thinking about multiple blocks, it’s an easy transition for them and it’s not hard to do the integration.

SE: Where do engineers run into problems with embedded FPGAs?

Tate: Many chip engineers don’t know anything about FPGAs, so they don’t get hung up on preconceived notions. But where they do tend to get stuck is when they try to make too big of an embedded FPGA. Rather than partitioning the logic into hardwired and reconfigurable parts, it’s just easier to say, ‘Why don’t I just take the whole block because it would be nice to be able to change it.’ Inevitably they decide it’s too big, and then they have to think about partitioning. Partitioning requires architectural work. Sometimes they have to think about partitioning stuff that they’re not necessarily experts in. Some don’t have the skill set or the bandwidth.

SE: Is it better to partition or run separate embedded FPGAs?

Tate: You want to hardwire everything that doesn’t need to be reconfigurable. Whatever is left is what should be embedded FPGA. And if it turns out there are multiple blocks, then you’ll have multiple embedded FPGAs. It depends on what you are trying to do.
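
As a rough illustration of the partitioning argument, the Python sketch below compares putting a whole design into eFPGA fabric against hardwiring everything that does not need to change. The block names, the areas, and the 20x area penalty are all hypothetical values chosen for the example; they are not figures from Flex Logix or from any real design.

```python
# Hypothetical blocks: (name, hardwired area in mm^2, needs reconfiguration?)
BLOCKS = [
    ("packet_parser", 0.10, True),
    ("crc_engine", 0.05, False),
    ("dma_controller", 0.20, False),
]

# Assumed area penalty for eFPGA versus hardwired logic; real values vary by design.
EFPGA_AREA_FACTOR = 20.0


def total_area(blocks, everything_in_efpga):
    """Sum block areas, applying the eFPGA penalty to blocks placed in fabric."""
    area = 0.0
    for _name, hardwired_mm2, reconfigurable in blocks:
        in_fabric = everything_in_efpga or reconfigurable
        area += hardwired_mm2 * (EFPGA_AREA_FACTOR if in_fabric else 1.0)
    return area


print(round(total_area(BLOCKS, everything_in_efpga=True), 2))   # whole design in eFPGA
print(round(total_area(BLOCKS, everything_in_efpga=False), 2))  # only reconfigurable part
```

Under these made-up numbers the partitioned design is considerably smaller, which is the point of hardwiring the fixed logic. The actual ratio depends entirely on the design and on the fabric.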

SE: What’s the overhead on using an embedded FPGA versus hardwired?

Tate: The best way to answer that is to get the evaluation software, take your RTL and run it through. That will tell you how many lookup tables it has, how many square millimeters it is, and then you can compare it to what it would be for your design if you hardwired it using your existing tools. What’s interesting is that different customers will come up with very different answers. The underlying architecture may work very well sometimes, and at others it may not. Suppose you have a processor and it has a 32-bit multiplier, but you have an application that needs 33-bit multiplies. You have to do a 64 x 64 multiply to achieve your objective. So you can make it work, but it won’t be nearly as efficient as the hardware where the multiplier is 1 bit wider. That’s just an arbitrary example, but if the customer’s multiplies fit well with the underlying architecture, it’s a lot more efficient than if they don’t. So we deal with a wide range of customers with a wide range of numbers. In general, it’s an order or two of magnitude less area-efficient than an ASIC. On the other hand, if you need reconfigurability, you need to consider what other reconfigurable solutions you could use, other than embedded FPGA, to achieve your objective.
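
The multiplier example can be made concrete with a small sketch. The Python below is illustrative only: the function name `multiply_with_native_width` and the schoolbook decomposition into 32-bit pieces are assumptions made for the example, not a description of any particular processor or eFPGA architecture. It counts how many native 32 x 32 multiplies are needed once an operand grows by a single bit.

```python
def multiply_with_native_width(a, b, native_bits=32):
    """Multiply two non-negative integers using only native_bits-wide multiplies.

    Returns (product, number_of_native_multiplies).
    """
    mask = (1 << native_bits) - 1

    def limbs(x):
        # Split x into native_bits-wide pieces, least significant first.
        out = []
        while True:
            out.append(x & mask)
            x >>= native_bits
            if x == 0:
                return out

    a_limbs, b_limbs = limbs(a), limbs(b)
    product = 0
    multiplies = 0
    for i, ai in enumerate(a_limbs):
        for j, bj in enumerate(b_limbs):
            # Each partial product is one native multiply, shifted into place.
            product += (ai * bj) << (native_bits * (i + j))
            multiplies += 1
    return product, multiplies


fits = (1 << 32) - 1   # largest operand that fits in 32 bits
too_wide = 1 << 32     # smallest operand that needs 33 bits

print(multiply_with_native_width(fits, fits)[1])          # 1
print(multiply_with_native_width(too_wide, too_wide)[1])  # 4
```

In this simple scheme, one extra bit of operand width turns a single native multiply into four, plus the shifts and adds to combine them. That is the kind of mismatch that makes the same fabric efficient for one design and inefficient for another.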

SE: A lot of that reconfigurability has been done in software in the past.

Tate: Right, and the tradeoff there is that FPGA is likely to be faster and more efficient. But you also can have different sizes of processors. If you are processing I/O, many processors aren’t very good at that. FPGAs can directly talk to all the I/Os, so you have to compare it. Now you have to look at, ‘This block needs to be reconfigurable, I can do it with a processor, I can do it with an embedded FPGA.’ In a lot of cases, what people look at is adding lots of registers and logic and extra modes and commands and stuff into their hardwired logic, trying to anticipate the range of change they may need. That’s another approach. You can have state machines, which are kind of a subset of FPGAs. You can have microcode. You can do a DSP, which is like a big microcoded thing. You can do a lot of things that are reconfigurable. Then you have to compare them all and decide which one of those is better for your application. If you are trying to write a C program and you have a fixed-function C program, you can hardcode that in. If you know exactly what it is doing and put in a bunch of gates, it would be way faster and way smaller than a processor. An embedded FPGA is bigger, but it’s reconfigurable. If you don’t need reconfigurability, you can stick with hardwired.

SE: Are there any differences with timing closure when you are dealing with an eFPGA versus an ASIC?

Tate: With an embedded FPGA, you’re dealing with an ASIC because it’s surrounded by an ocean of hardwired logic. There’s no issue with timing closure. We recommend flopping the inputs and outputs and then using all the timing closure tools at the ASIC level to make sure signals get in and out of the embedded FPGA. And you should use the FPGA timing tools to do all the stuff that happens inside the embedded FPGA.

SE: How does it work when you flop the input and outputs?

Tate: If you want to close timing, you need to have fixed and known timing paths. If your timing path goes from the ASIC into an embedded FPGA without clocking at the eFPGA boundary using the eFPGA I/O flip-flop, that means the path terminates within the eFPGA. But if the logic is reconfigured, the terminal flip-flop within the eFPGA is likely placed in a different location, and so the timing changes, too. We recommend everyone using eFPGA flop inputs at the I/O boundary of the eFPGA when entering and exiting to close the ASIC timing. For timing within the eFPGA, use the eFPGA timing tools to determine the critical path for the various process corners and voltages.
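
The reason for registering the boundary can be illustrated with a toy model. Everything in this Python sketch is hypothetical: the delay numbers, the two configurations, and the function name are invented for illustration and do not come from any real timing tool. It shows that an unregistered path ends wherever the eFPGA tools place the terminal flip-flop, while a boundary-registered path ends at a fixed point.

```python
# Toy model with made-up delays in nanoseconds; not output from a real timing tool.
ASIC_TO_BOUNDARY_NS = 1.2  # hypothetical hardwired delay up to the eFPGA boundary

# Hypothetical routing delay from the boundary to the first internal flip-flop,
# for two different configurations loaded into the same eFPGA.
INTERNAL_DELAY_NS = {"config_a": 0.8, "config_b": 2.1}


def asic_path_delay(config, boundary_flop):
    """Delay of the path that the ASIC-level timing tools must close."""
    if boundary_flop:
        # Path ends at the eFPGA I/O flip-flop, so reconfiguration cannot change it.
        return ASIC_TO_BOUNDARY_NS
    # Path continues into the fabric and ends wherever the flop was placed.
    return ASIC_TO_BOUNDARY_NS + INTERNAL_DELAY_NS[config]


for flopped in (False, True):
    delays = {c: asic_path_delay(c, flopped) for c in INTERNAL_DELAY_NS}
    print("boundary flop" if flopped else "no boundary flop", delays)
```

With the boundary flop in place, the internal delay becomes a separate register-to-register path inside the fabric, which is the part handled by the eFPGA timing tools for each configuration.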

SE: Once you get familiar with this, what else can you do with it? Is it just a way of building flexibility into existing architectures, or can it be used to expand out into new architectures?

Tate: Once people get comfortable with it, they’ll think of all sorts of things they can do that they couldn’t do before. That’s why it will become pervasive. The applications are shifting toward where they need that performance. And what customers find is that once they actually start using this, it’s much easier than they thought. We did some work with Harvard, and from the time we shook hands to the time they taped out was two months. To be fair, they’re building academic chips so they don’t have to do some of the steps that a production chip company has to do, but it was still very quick. They worked it into their architecture at the last minute because they wanted to be able to update their AI algorithms in real time. The most important thing the customer wants to see is someone else going first. In aerospace, Sandia announced that they are going first. They’ve got silicon, fabs, they’re building chips, and they’re fully committed and everything works. In the aerospace community, we have a lot more activity going on as a result of that.

SE: How about in the commercial chip market?

Tate: Last year and this year should be the turning point where early adopters actually adopt embedded FPGAs, but it still takes time before they come to market. It will be next year or the year after before they’re in production and shipping. The biggest sectors initially will be networking, base stations, and microcontrollers. There’s a wide range of applications, but that’s the biggest concentration.

SE: Does it matter what node you are running at, 16/14nm or 10/7nm?

Tate: The customer specifies it. We’re not the determining factor for choosing the process. In our case, we support a fair number of processes already, and we design using standard cell-based design, so that gives us an advantage. Standard cells often cover two or three different flavors of a process.

SE: Is it all TSMC or other foundries as well?

Tate: We’ve done four different ports to TSMC, because that is what most customers want, and one to Sandia. We are customer-driven.

SE: Can you move into any of the advanced packaging options, including chiplets?

Tate: We have a target product spec on the chiplet. We know how to build and offer a chiplet, but the chiplet market is going to be some time in developing. Any multi-chip packaging today involves a lot of custom design and maybe standard memories. The logic is all custom. If there are two logic chips, both sides have been custom designed. And all the interfaces are totally custom. I don’t see it being a really big standard market any time soon. Integrating into the same piece of silicon will be faster and cheaper than having a chiplet.

SE: What’s happening with the AI, ML, and DL markets?

Tate: We’re working with Harvard, and you can look at their 28nm paper. They developed a DNN engine for edge AI. They’re using us to be able to have some part of their chip where they can change algorithms quickly and immediately. What I see is that people are trying a whole lot of stuff, which is what they usually do in a new market. There’s a whole range of tasks (vision, inference, training), and my guess is that in 10 years they will have figured out what the optimal architecture is. They’re experimenting right now and they keep evolving. I don’t think AI will be all-FPGA or all-GPU, or all-anything. In 5 to 10 years, the optimized silicon for AI will be some hybrid of things. It’s likely that embedded FPGA is part of that.


Karl Stevens says:

-What about metastability/clock domain crossing?
Quote: “But if the logic is reconfigured, the terminal flip-flop within the eFPGA is likely placed in a different location, and so the timing changes, too. We recommend everyone using eFPGA flop inputs at the I/O boundary of the eFPGA when entering and exiting to close the ASIC timing.”

-ASIC uses gates for logic while eFPGA uses LUTs; it seems that it would be easier to start with an FPGA (with all the DSP, memory blocks, etc.), hardwire the part that will not change and only use the fabric for the changeable part.

-An FPGA-based design would also allow many little heterogeneous C-programmable processors to be used as accelerators. (Their functions would simply be whatever was loaded into their memories.)
