A PIM Architecture That Supports Floating Point-Precision Computations Within The Memory Chip

A technical paper titled “FlutPIM: A Look-up Table-based Processing in Memory Architecture with Floating-point Computation Support for Deep Learning Applications” was published by researchers at Rochester Institute of Technology and George Mason University. Abstract: "Processing-in-Memory (PIM) has shown great potential for a wide range of data-driven applications, especially Deep Learnin...

Low-Power Heterogeneous Compute Cluster For TinyML DNN Inference And On-Chip Training

A new technical paper titled "DARKSIDE: A Heterogeneous RISC-V Compute Cluster for Extreme-Edge On-Chip DNN Inference and Training" was published by researchers at University of Bologna and ETH Zurich. Abstract: "On-chip deep neural network (DNN) inference and training at the Extreme-Edge (TinyML) impose strict latency, throughput, accuracy, and flexibility requirements. Heterogeneous clus...

Will Floating Point 8 Solve AI/ML Overhead?

While the media buzzes about the Turing Test-busting results of ChatGPT, engineers are focused on the hardware challenges of running large language models and other deep learning networks. High on the ML punch list is how to run models more efficiently using less power, especially in critical applications like self-driving vehicles where latency becomes a matter of life or death. AI already ...

Transforming AI Models For Accelerator Chips

AI is all about speeding up the movement and processing of data. Ali Cheraghi, solution architect at Flex Logix, talks about why floating-point data needs to be converted into integer data, how that impacts power and performance, and how different approaches to quantization play into this formula.

Challenges For New AI Processor Architectures

Investment money is flooding into the development of new AI processors for the data center, but the problems here are unique, the results are unpredictable, and the competition has deep pockets and very sticky products. The biggest issue may be insufficient data about the end market. When designing a new AI processor, every design team has to answer one fundamental question: how much flex...

Convolutional Neural Network With INT4 Optimization

Xilinx provides an INT8 AI inference accelerator on Xilinx hardware platforms, the Deep Learning Processor Unit (XDPU). However, in some resource-limited, high-performance and low-latency scenarios (such as the resource- and power-sensitive edge side and low-latency ADAS scenarios), low-bit quantization of neural networks is required to achieve lower power consumption and higher performance than provi...

Formal Verification Of Floating-Point Hardware With Assertion-Based VIP

Hardware for integer or fixed-point arithmetic is relatively simple to design, at least at the register-transfer level. If the range of values and precision that can be represented with these formats is not sufficient for the target application, floating-point hardware might be required. Unfortunately, floating-point units are complex to design, and notoriously challenging to verify. Since the ...

Advantages Of BFloat16 For AI Inference

Essentially all AI training is done with 32-bit floating point. But doing AI inference with 32-bit floating point is expensive, power-hungry and slow. And quantizing models for 8-bit integer, which is very fast and uses the least power, is a major investment of money, scarce resources and time. Now BFloat16 (BF16) offers an attractive balance for many users. BFloat16 offers essentially t...

Reducing Latency, Power, and Gate Count with Floating-Point FMA

Today’s digital signal processing applications such as radar, echo cancellation, and image processing are demanding more dynamic range and computation accuracy. Floating-point arithmetic units offer better precision, higher dynamic range, and shorter development cycles when compared with fixed-point arithmetic units. Minimizing the design’s time to market is more important than ever. Algori...

Week In Review: Design, Low Power

Tools & IP: UltraSoC debuted functional safety-focused Lockstep Monitor, a set of configurable IP blocks that are protocol aware and can be used to cross-check outputs, bus transactions, code execution, and register states between two or more redundant systems. It supports all common lockstep / redundancy architectures, including full dual-redundant lockstep, split/lock, master/checker, and...
