For AI inference, INT8 provides better performance than floating point with comparable precision. When INT8 cannot meet the desired performance with limited resources, however, INT4 optimization is the answer: on real hardware, it achieves up to a 77% performance boost over the current INT8 solution.
Xilinx provides an INT8 AI inference accelerator for its hardware platforms, the Deep Learning Processor Unit (XDPU). However, some resource-limited, high-performance, low-latency scenarios (such as resource- and power-sensitive edge devices and low-latency ADAS applications) require low-bit quantization of neural networks to achieve lower power consumption and higher performance than INT8 provides. At the same time, extremely low-bit quantization (such as binary or ternary) degrades accuracy.
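The accuracy cost of shrinking the bit width can be seen in a simple experiment, using reconstruction error as a rough proxy. The sketch below assumes plain symmetric, per-tensor, max-abs uniform quantization of Gaussian-distributed weights; it is a generic illustration, not the XDPU quantizer, and the function names are hypothetical. At 2 bits the integer levels are just {-1, 0, 1}, which is ternary quantization.

```python
import random

def quantize_symmetric(values, bits):
    """Map floats to signed integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1]
    using a single max-abs scale for the whole tensor (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / qmax
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map the integers back to floats with the same scale."""
    return [x * scale for x in q]

def mean_abs_error(values, bits):
    """Average absolute difference between the originals and their reconstruction."""
    q, scale = quantize_symmetric(values, bits)
    recon = dequantize(q, scale)
    return sum(abs(a - b) for a, b in zip(values, recon)) / len(values)

if __name__ == "__main__":
    random.seed(0)
    weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]
    for bits in (8, 4, 2):
        print(f"INT{bits}: mean |error| = {mean_abs_error(weights, bits):.4f}")
```

Running it prints a mean reconstruction error that grows as the bit width drops from 8 to 4 to 2, with the largest jump at the ternary end.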
A full-process, hardware-friendly quantization solution with 4-bit activations and 4-bit weights (4A4W) sits between these extremes and achieves a better accuracy/resource trade-off. This white paper describes the implementation of a 4-bit XDPU, a low-precision CNN accelerator, on the Zynq UltraScale+ MPSoC and Zynq-7000 SoC families (16nm and 28nm). It takes full advantage of the devices' DSP capabilities by efficiently mapping convolutional computations onto them. This solution achieves 2X solution-level performance over the 8-bit XDPU. On a 2D detection task in an ADAS system, the implementation achieves an inference speed of 230fps on a Zynq UltraScale+ MPSoC ZCU102 board, a 1.52X performance gain over the 8-bit XDPU. In addition, it achieves results comparable to full-precision models on different tasks of the ADAS system.
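One common way to exploit a wide DSP multiplier with narrow operands is to pack several low-bit products into a single multiply. The paper's exact mapping is not described here; the sketch below is a generic illustration under assumed parameters (a hypothetical 16-bit lane shift, signed 4-bit values, and hypothetical function names). It computes two dot products that share the same activations using one multiply-accumulate per activation, then separates the two lanes.

```python
PACK_SHIFT = 16  # hypothetical lane width; must hold the accumulated low-lane sum

def to_signed(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value

def packed_dot(activations, weights_a, weights_b, shift=PACK_SHIFT):
    """Return (sum(x*wa), sum(x*wb)) using one wide multiply per activation:
    x * (wa + wb * 2**shift) = x*wa + (x*wb) * 2**shift."""
    acc = 0
    for x, wa, wb in zip(activations, weights_a, weights_b):
        packed_weight = wa + (wb << shift)  # both 4-bit weights in one operand
        acc += x * packed_weight            # a single multiply-accumulate
    low = to_signed(acc, shift)             # sign-corrected low lane = sum(x*wa)
    high = (acc - low) >> shift             # exact, since acc - low is a multiple of 2**shift
    return low, high

if __name__ == "__main__":
    xs = [-8, 7, 3, -5]
    print(packed_dot(xs, [7, -8, -1, 4], [-8, -8, 5, 7]))
```

The low lane is sign-corrected before the high lane is extracted, so negative products do not corrupt the upper result. The shift must leave headroom for the accumulated low-lane sum: with signed 4-bit operands each product is at most 64 in magnitude, so a 16-bit lane holds up to 511 accumulated products without overflow. Real DSP slices have fixed multiplier widths (27 x 18 bits for the DSP48E2 in UltraScale+ devices), which bounds the lane width in practice.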