The Murky World Of AI Benchmarks


AI startup companies have been emerging at breakneck speed for the past few years, all the while touting TOPS benchmark data. But what does it really mean and does a TOPS number apply across every application? Answer: It depends on a variety of factors. Historically, every class of design has used some kind of standard benchmark for both product development and positioning. For example, SPEC... » read more

Evaluating NVMe SSD Multi-Gigabit Performance


The multi-channel parallelism and low-latency access of NAND flash technology have made Non-Volatile Memory express (NVMe) based SSDs very popular within the main segments of the data storage market, including not only the consumer electronics sector but also data center processing and acceleration services, where the key role is played by specialized FPGA-based hardware for application-specifi... » read more

Power/Performance Bits: Jan. 28


Accelerator-on-chip Researchers at Stanford University and SLAC National Accelerator Laboratory created an electron-accelerator-on-chip. While the technique is much less powerful than standard particle accelerators, it can be much smaller. It relied upon an infrared laser to deliver, in less than a hair’s width, the sort of energy boost that takes microwaves many feet. The team carved ... » read more

Defining And Improving AI Performance


Many companies are developing AI chips, both for training and for inference. Although getting the required functionality is important, many solutions will be judged by their performance characteristics. Performance can be measured in different ways, such as number of inferences per second or per watt. These figures are dependent on a lot of factors, not just the hardware architecture. The optim... » read more

MLPerf Benchmarks


Geoff Tate, CEO of Flex Logix, talks about the new MLPerf benchmark, what’s missing from the benchmark, and which ones are relevant to edge inferencing. » read more

Revving Up For Edge Computing


The edge is beginning to take shape as a way of limiting the amount of data that needs to be pushed up to the cloud for processing, setting the stage for a massive shift in compute architectures and a race among chipmakers for a stake in a new and highly lucrative market. So far, it's not clear which architectures will win, or how and where data will be partitioned between what needs to be p... » read more

Better Benchmarks Through Compiler Optimizations: Codasip Jump Threading


The architectural efficiency of embedded processor IP is measured by a small set of industry standard benchmarks, that even though often bear little correlation to real workloads, continue to persist. The most popular benchmarks are Dhrystone and CoreMark. An interesting observation regarding these test suites is that the performance numbers continue to improve for a given architecture, even... » read more

Building Your First Chip For Artificial Intelligence? Read This First


As artificial intelligence (AI) capabilities enter new markets, the IP selected for integration provides the critical components of the AI SoC. But beyond the IP, designers are finding a clear advantage in leveraging AI expertise, services, and tools to ensure the design is delivered on time, with a high level of quality and value to the end customer for new and innovative applications. Over... » read more

Isambard Analysis of HPC-Optimized Arm Processors


Written by Simon McIntosh‐Smith, James Price, Tom Deakin, Andrei Poenaru (all from the High Performance Computing Research Group, Department of Computer Science, University of Bristol, Bristol, UK) In this paper, we present performance results from Isambard, the first production supercomputer to be based on Arm CPUs that have been optimized specifically for HPC. Isambard is the first Cray ... » read more

Inference Acceleration: Follow The Memory


Much has been written about the computational complexity of inference acceleration: very large matrix multiplies for fully-connected layers and huge numbers of 3x3 convolutions across megapixel images, both of which require many thousands of MACs (multiplier-accumulators) to achieve high throughput for models like ResNet-50 and YOLOv3. The other side of the coin is managing the movement of d... » read more

← Older posts