Data Formats For Inference On The Edge


AI/ML training has traditionally been performed using floating-point data formats, primarily because that is what was available. But this usually isn't a viable option for inference on the edge, where more compact data formats are needed to reduce area and power. Compact data formats take up less space, which matters in edge devices, but the bigger concern is the power needed to move around... » read more
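
To make the trade-off concrete, here is a minimal sketch (not tied to any particular tool flow) of symmetric per-tensor quantization from FP32 to INT8; the matrix size and scale choice are arbitrary illustrations, and the 4x reduction in bytes is what cuts the energy spent moving data on an edge device.

```python
import numpy as np

# Minimal sketch: symmetric per-tensor quantization of FP32 weights to INT8.
weights_fp32 = np.random.randn(1024, 1024).astype(np.float32)

scale = np.abs(weights_fp32).max() / 127.0          # map the observed range onto [-127, 127]
weights_int8 = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)

print(weights_fp32.nbytes, "bytes as FP32")         # 4 MiB
print(weights_int8.nbytes, "bytes as INT8")         # 1 MiB -- 4x less data to move
print("max abs error:", np.abs(weights_fp32 - weights_int8 * scale).max())
```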

A PIM Architecture That Supports Floating Point-Precision Computations Within The Memory Chip


A technical paper titled “FlutPIM: A Look-up Table-based Processing in Memory Architecture with Floating-point Computation Support for Deep Learning Applications” was published by researchers at Rochester Institute of Technology and George Mason University. Abstract: "Processing-in-Memory (PIM) has shown great potential for a wide range of data-driven applications, especially Deep Learnin... » read more
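
As a rough illustration of the general look-up-table idea behind LUT-based PIM (this is not FlutPIM's actual architecture, and it ignores the paper's floating-point handling), a multiply can be answered from a small precomputed table instead of arithmetic logic:

```python
# Precompute all 4-bit x 4-bit products once, then answer multiplies with
# table reads instead of multiplier logic -- the basic LUT-in-memory idea.
LUT = [[a * b for b in range(16)] for a in range(16)]

def mul8_via_lut(x: int, y: int) -> int:
    """Multiply two 8-bit values by composing four 4-bit LUT reads."""
    x_hi, x_lo = x >> 4, x & 0xF
    y_hi, y_lo = y >> 4, y & 0xF
    return (LUT[x_hi][y_hi] << 8) + ((LUT[x_hi][y_lo] + LUT[x_lo][y_hi]) << 4) + LUT[x_lo][y_lo]

assert mul8_via_lut(173, 94) == 173 * 94
```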

Low-Power Heterogeneous Compute Cluster For TinyML DNN Inference And On-Chip Training


A new technical paper titled "DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training" was published by researchers at University of Bologna and ETH Zurich. Abstract: "On-chip deep neural network (DNN) inference and training at the Extreme-Edge (TinyML) impose strict latency, throughput, accuracy, and flexibility requirements. Heterogeneous clus... » read more

Will Floating Point 8 Solve AI/ML Overhead?


While the media buzzes about the Turing Test-busting results of ChatGPT, engineers are focused on the hardware challenges of running large language models and other deep learning networks. High on the ML punch list is how to run models more efficiently using less power, especially in critical applications like self-driving vehicles where latency becomes a matter of life or death. AI already ... » read more
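
For context, the two 8-bit floating-point encodings most often discussed are E4M3 and E5M2; the sketch below is a simplified decoder for normal values only (special values and subnormals are ignored), just to show the range-versus-precision trade-off between them.

```python
# Simplified FP8 decoder for normal values; NaN/Inf conventions are ignored here.
def fp8_decode(bits: int, exp_bits: int, man_bits: int) -> float:
    """Decode a normal FP8 value: sign * 1.mantissa * 2^(exp - bias)."""
    bias = (1 << (exp_bits - 1)) - 1
    sign = -1.0 if bits >> (exp_bits + man_bits) else 1.0
    exp = (bits >> man_bits) & ((1 << exp_bits) - 1)
    man = bits & ((1 << man_bits) - 1)
    return sign * (1 + man / (1 << man_bits)) * 2.0 ** (exp - bias)

# Largest normal magnitudes of each format:
print(fp8_decode(0b0_1111_110, 4, 3))   # E4M3: 448.0   -- more mantissa, smaller range
print(fp8_decode(0b0_11110_11, 5, 2))   # E5M2: 57344.0 -- more exponent, wider range
```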

Transforming AI Models For Accelerator Chips


AI is all about speeding up the movement and processing of data. Ali Cheraghi, solution architect at Flex Logix, talks about why floating-point data needs to be converted into integer data, how that impacts power and performance, and how different approaches to quantization play into this formula. » read more
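
As a hedged sketch of one such approach (asymmetric, zero-point quantization; the ranges and shapes here are invented for illustration), converting floating-point activations to unsigned 8-bit integers looks roughly like this:

```python
import numpy as np

# Asymmetric quantization: map activations onto uint8 with a scale and an offset,
# so an accelerator can run the layer in integer arithmetic and dequantize at the end.
def quantize_asymmetric(x: np.ndarray, num_bits: int = 8):
    qmax = (1 << num_bits) - 1
    scale = (x.max() - x.min()) / qmax
    zero_point = int(round(-x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, 0, qmax).astype(np.uint8)
    return q, scale, zero_point

activations = np.random.rand(256).astype(np.float32) * 6.0   # e.g. a ReLU6-style range
q, scale, zp = quantize_asymmetric(activations)
reconstructed = (q.astype(np.float32) - zp) * scale
print("mean abs error:", np.abs(activations - reconstructed).mean())
```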

Challenges For New AI Processor Architectures


Investment money is flooding into the development of new AI processors for the data center, but the problems here are unique, the results are unpredictable, and the competition has deep pockets and very sticky products. The biggest issue may be insufficient data about the end market. When designing a new AI processor, every design team has to answer one fundamental question — how much flex... » read more

Convolutional Neural Network With INT4 Optimization


Xilinx provides an INT8 AI inference accelerator, the Deep Learning Processor Unit (XDPU), on Xilinx hardware platforms. However, in some resource-limited, high-performance, and low-latency scenarios (such as resource- and power-sensitive edge devices and low-latency ADAS scenarios), low-bit quantization of neural networks is required to achieve lower power consumption and higher performance than provi... » read more
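
The sketch below is illustrative only and not the XDPU flow: it quantizes weights to 4-bit integers and packs two values per byte, which is where INT4 gets its memory and bandwidth advantage over INT8.

```python
import numpy as np

# Quantize weights to INT4 and pack two nibbles per byte.
w = np.random.randn(64, 64).astype(np.float32)
scale = np.abs(w).max() / 7.0                                  # INT4 range is [-8, 7]
w_int4 = np.clip(np.round(w / scale), -8, 7).astype(np.int8)

flat = (w_int4.flatten() & 0xF).astype(np.uint8)               # two's-complement nibbles
packed = (flat[0::2] << 4) | flat[1::2]                        # two weights per byte
print("FP32 bytes:", w.nbytes, "| packed INT4 bytes:", packed.nbytes)
```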

Formal Verification Of Floating-Point Hardware With Assertion-Based VIP


Hardware for integer or fixed-point arithmetic is relatively simple to design, at least at the register-transfer level. If the range of values and precision that can be represented with these formats is not sufficient for the target application, floating-point hardware might be required. Unfortunately, floating-point units are complex to design, and notoriously challenging to verify. Since the ... » read more
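
A back-of-the-envelope illustration of that range/precision argument (the Q16.16 format and the sample values are arbitrary choices, not from the paper):

```python
# Q16.16 fixed point (16 integer bits, 16 fraction bits) versus 32-bit float.
def to_q16_16(x: float):
    q = round(x * (1 << 16))
    return q if -(1 << 31) <= q < (1 << 31) else None   # overflow outside roughly +/-32768

print(to_q16_16(1.0e-7))    # 0    -- underflows to zero; float32 keeps ~7 significant digits
print(to_q16_16(1.0e6))     # None -- out of range; float32 represents it exactly
```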

Advantages Of BFloat16 For AI Inference


Essentially all AI training is done with 32-bit floating point. But doing AI inference with 32-bit floating point is expensive, power-hungry, and slow. And quantizing models to 8-bit integer, which is very fast and uses the least power, is a major investment of money, scarce resources, and time. Now BFloat16 (BF16) offers an attractive balance for many users. BFloat16 offers essentially t... » read more
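
A minimal sketch of why BF16 conversion is cheap, assuming simple truncation rather than the rounding real hardware typically applies: BF16 is just the top 16 bits of an IEEE float32, so it keeps the FP32 exponent range while giving up mantissa precision.

```python
import numpy as np

# BF16 = upper 16 bits of a float32: same 8-bit exponent, 7-bit mantissa.
def fp32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    bits = x.astype(np.float32).view(np.uint32)
    return (bits >> 16).astype(np.uint16)              # truncate (hardware usually rounds)

def bf16_bits_to_fp32(b: np.ndarray) -> np.ndarray:
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.array([3.14159265, 1e30, 1e-30], dtype=np.float32)
print(bf16_bits_to_fp32(fp32_to_bf16_bits(x)))          # same range as FP32, ~3 decimal digits
```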

Reducing Latency, Power, and Gate Count with Floating-Point FMA


Today’s digital signal processing applications, such as radar, echo cancellation, and image processing, are demanding more dynamic range and computation accuracy. Floating-point arithmetic units offer better precision, higher dynamic range, and shorter development cycles when compared with fixed-point arithmetic units. Minimizing the design’s time to market is more important than ever. Algori... » read more
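
To illustrate the accuracy argument for a fused multiply-add (a generic sketch, with float64 standing in for the wide intermediate an FMA unit keeps internally; the operand values are contrived):

```python
import numpy as np

# An FMA computes round(a*b + c) with one rounding; separate multiply and add round twice.
a = np.float32(1.0 + 2**-13)
b = np.float32(1.0 - 2**-13)
c = np.float32(-1.0)

two_roundings = np.float32(a * b) + c                                       # product rounds to 1.0 first
one_rounding  = np.float32(np.float64(a) * np.float64(b) + np.float64(c))   # FMA-like single rounding

print(two_roundings)   # 0.0             -- the small residual was rounded away
print(one_rounding)    # -1.4901161e-08  -- keeps the exact residual, -2**-26
```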
