Author's Latest Posts


Arm Neoverse N1 Core: Performance Analysis Methodology


The Arm Neoverse ecosystem is growing substantially with many Arm hardware and software partners developing applications and porting their workloads onto Arm-based cloud instances. With Neoverse N1 based systems becoming widely available, many real-world workloads are showing very competitive performance and significant cost savings when compared to legacy systems. Some recent examples include:... » read more

Bandwidth Utilization Side-Channel On ML Inference Accelerators


Abstract—Accelerators used for machine learning (ML) inference provide great performance benefits over CPUs. Securing confidential model in inference against off-chip side-channel attacks is critical in harnessing the performance advantage in practice. Data and memory address encryption has been recently proposed to defend against off-chip attacks. In this paper, we demonstrate that bandwidth... » read more

Post-Quantum Cryptography


Quantum computing is increasingly seen as a threat to communications security: rapid progress towards realizing practical quantum computers has drawn attention to the long understood potential of such machines to break fundamentals of contemporary cryptographic infrastructure. While this potential is so far firmly theoretical, the cryptography community is preparing for this possibility by deve... » read more

Understanding Write Combining On Arm


Write Combining (WC) is a specialized memory type defined by the x86-64 architecture that is used for gathering multiple stores into burst transactions over the system bus. WC is commonly used on x86-64 platforms for interaction with I/O and other peripheral devices. In this whitepaper we provide an overview of the Arm architecture memory types that provide WC-like capabilities. In addition, t... » read more

A Layered Approach To High Performance Device Virtualization


The complexity and performance requirements of computing systems have been growing and demands are further driven by applications, such as ML and the everything-connected world of IoT with many billions of connected devices. Arm has developed a virtualization and accelerator strategy to address this, which we discuss in this white paper from our Architecture and Technology Group A layered... » read more

Powering The Edge: Driving Optimal Performance With Ethos-N77 Processor


Repurposing a CPU, GPU, or DSP is an easy way to add ML capabilities to an edge device. However, where responsiveness or power efficiency is critical, a dedicated Neural Processing Unit (NPU) may be the best solution. In this paper, we describe how the Arm Ethos-N77 NPU delivers optimal performance. Click here to read more. » read more

The Power Of Virtual Prototyping: From SoC Design To Software Development


Virtual prototypes and hardware design: More powerful and complex integrated circuits and System-on-Chip (SoC) designers have a daunting task at both the hardware and software level. SoC architects need a method for early evaluation of hardware components, known as Intellectual Property (IP) blocks, that will have direct impact on the commercial success of the SoC. There are a range of complex ... » read more

Powering The Edge: Driving Optimal Performance With Ethos-N77 Processor


Repurposing a CPU, GPU, or DSP is an easy way to add ML capabilities to an edge device. However, where responsiveness or power efficiency is critical, a dedicated Neural Processing Unit (NPU) may be the best solution. In this paper, we describe how the Arm Ethos-N77 NPU delivers optimal performance. Click here to immediately download the paper. » read more

Deploying Accurate Always-On Face Unlock


Accurate face verification has long been considered a challenge due to the number of variables, ranging from lighting to pose and facial expression. This white paper looks at a new approach — combining classic and modern machine learning (deep learning) techniques — that achieves 98.36% accuracy, running efficiently on Arm ML-optimized platforms, and addressing key security issues such a... » read more

A Performance Analysis Of The First Generation Of HPC‐Optimized Arm Processors


In this paper, the authors present performance results from Isambard, the first production supercomputer to be based on Arm CPUs that have been optimized specifically for HPC. Isambard is the first Cray XC50 “Scout” system, combining Cavium ThunderX2 Arm‐based CPUs with Cray's Aries interconnect. The full Isambard system contained over 10,000 Arm cores. In this work, we present node‐lev... » read more

← Older posts Newer posts →