Isambard Analysis of HPC-Optimized Arm Processors


Written by Simon McIntosh‐Smith, James Price, Tom Deakin, and Andrei Poenaru (all from the High Performance Computing Research Group, Department of Computer Science, University of Bristol, Bristol, UK). In this paper, we present performance results from Isambard, the first production supercomputer to be based on Arm CPUs that have been optimized specifically for HPC. Isambard is the first Cray ...

Inference Acceleration: Follow The Memory


Much has been written about the computational complexity of inference acceleration: very large matrix multiplies for fully-connected layers and huge numbers of 3x3 convolutions across megapixel images, both of which require many thousands of MACs (multiplier-accumulators) to achieve high throughput for models like ResNet-50 and YOLOv3. The other side of the coin is managing the movement of d...

Use Inference Benchmarks Similar To Your Application


If an inference IP supplier or inference accelerator chip supplier offers a benchmark, it is probably ResNet-50. As a result, it might seem logical to use ResNet-50 to compare inference offerings. If you plan to run ResNet-50, it would be; but if your target application model is significantly different from ResNet-50, it could lead you to pick an inference offering that is not the best for you. ...

Real-Time Object Recognition At Low Cost/Power/Latency


Most neural network chips and IP talk about ResNet-50 benchmarks (image classification at 224x224 pixels). But we find that the number one neural network of interest for most customers is real-time object recognition, such as YOLOv3. It's not possible to do comparisons here because nobody shows a YOLOv3 benchmark for their inferencing. But it's very possible to improve on the inferencing per...

Performance Benchmarking Embedded FPGAs


When evaluating the performance of an embedded FPGA, one needs to evaluate the performance of each of the individual modules that make up an FPGA. The basic modules are: reconfigurable logic building blocks (RBB-Logic), which are fine-granularity logic containing LUTs, a carry-forwarding adder chain, and flip-flops; reconfigurable DSP building blocks (RBB-DSP), which are medium-granularity arith...

System-Level Verification Tackles New Role


Wally Rhines, chairman and CEO of Mentor Graphics, gave the keynote at DVCon this year. He said that if you pull together a bunch of pre-verified IP blocks, it does not change the verification problem at the system level. That sounds like a problem. There are assumptions made that the IP blocks work to a reasonable degree, and that when performing system-level verification the focus is not a...

Metrics For Measuring Performance And Power In IoT SoC Designs


The problem confronting chip designers developing IoT SoCs is the need for both high compute performance and low power consumption. This is especially true for SoCs being developed for devices required to operate for years on a battery. One example is the new generation of electronic shelf labels (ESLs), which have a battery-life requirement of five years. The ESL receives central server pricing updates along with a f...