Getting better density and performance for complex, frequently used blocks.
We work with a lot of customers designing eFPGA into their SoCs. Most of them have “random logic” RTL, but some customers have large numbers of complex, frequently used blocks.
We have found in many cases that we can help the customer achieve higher throughput AND use less silicon area with Soft Macros.
Let’s look at an example: 64×64 Multiply-Accumulate (MAC), below:
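In behavioral Verilog, such a MAC can be written in just a few lines; the module below is a minimal sketch of our own (the module name, port names, reset style, and accumulator width are assumptions, not the customer's actual RTL):

```verilog
// Minimal behavioral sketch of a 64x64 multiply-accumulate.
// Illustrative only; names and coding style are assumptions.
module mac64 (
    input  wire         clk,
    input  wire         rst,
    input  wire [63:0]  a,
    input  wire [63:0]  b,
    output reg  [127:0] acc    // the 64x64 product is 128 bits wide
);
    always @(posedge clk) begin
        if (rst)
            acc <= 128'd0;
        else
            acc <= acc + (a * b);  // multiply and accumulate every cycle
    end
endmodule
```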
If you describe this in Verilog and run it through the synthesis tool, the tool will do a reasonable job: for the 16nm EFLX eFPGA, the 64×64 MAC is implemented using 20 DSP blocks (22×22 MACs) and 110 RBBs (blocks of four LUT4s each) with a worst-case period of 18ns.
Instead, we can work with a customer to define a Soft Macro for their particular frequently used complex logic block, one that makes better use of the available resources. Our Solutions Architects developed such an optimized implementation of the 64×64 MAC, based on our understanding of the eFPGA architecture and how best to map the MAC onto it:
This Soft Macro 64×64 MAC achieves a 16ns period using 9 DSPs and 32 RBBs, or 15ns using 12 DSPs and no RBBs. The Soft Macro delivers a 10-20% speedup with about half the resources!
The Soft Macro is easily instantiated in your Verilog code; the EFLX Compiler recognizes it and maps it directly rather than handing it to the synthesis tool.
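As a sketch, dropping the macro into a design could look something like this (the macro name, ports, and wrapper module below are hypothetical placeholders, not the actual EFLX deliverable):

```verilog
// Hypothetical instantiation of a 64x64 MAC Soft Macro in user RTL.
// mac64_soft_macro and its port list are placeholders, not the real EFLX macro.
module top (
    input  wire         clk,
    input  wire         rst,
    input  wire [63:0]  operand_a,
    input  wire [63:0]  operand_b,
    output wire [127:0] mac_result
);
    mac64_soft_macro u_mac64 (
        .clk (clk),          // system clock
        .rst (rst),          // synchronous reset
        .a   (operand_a),    // 64-bit operand
        .b   (operand_b),    // 64-bit operand
        .acc (mac_result)    // accumulated 128-bit result
    );
endmodule
```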
Other algorithms have frequently used complex blocks: encryption/decryption, communications algorithms, blockchain and more. Soft Macros can improve density and performance for all of them.