Author's Latest Posts

Synchronization Overview And Case Study on Arm Architecture

The objective of this white paper is to share knowledge on Arm architecture. The target readers are those who work on synchronization on the Arm architecture. [Warning] When we are dealing with locking optimizations, we must be extremely careful about correctness. Bugs caused by synchronization are usually hard to root-cause and the optimized code may crash on other CPUs wit...

Introduction To The Arm Cortex-M55 Processor

This white paper covers the technical details of the Arm Cortex-M55 processor, including its pipeline, floating-point support, and features. The Arm Cortex-M55 processor is Arm’s most AI-capable Cortex-M processor and the first to feature Arm Helium vector processing technology, bringing enhanced, energy-efficient signal processing and machine learning (ML) performance.

Every Walk’s A Hit: Making Page Walks Single-Access Cache Hits

As memory capacity has outstripped TLB coverage, large data applications suffer from frequent page table walks. We investigate two complementary techniques for addressing this cost: reducing the number of accesses required and reducing the latency of each access. The first approach is accomplished by opportunistically "flattening" the page table: merging two levels of traditional 4 KB p...

Components And Tools for Functional Safety Applications

Functional safety is important across a variety of markets, including the automotive, industrial, medical, and railway sectors, and is often prevalent in consumer electronics. However, the complexity of the embedded software required for functional safety is growing and security issues are rising due to connectivity requirements. This can result in the failure of a safety-critical system and lead to ...

Arm Neoverse N1 Core: Performance Analysis Methodology

The Arm Neoverse ecosystem is growing substantially, with many Arm hardware and software partners developing applications and porting their workloads onto Arm-based cloud instances. With Neoverse N1-based systems becoming widely available, many real-world workloads are showing very competitive performance and significant cost savings when compared to legacy systems. Some recent examples include:...

Bandwidth Utilization Side-Channel On ML Inference Accelerators

Accelerators used for machine learning (ML) inference provide great performance benefits over CPUs. Securing confidential models during inference against off-chip side-channel attacks is critical in harnessing the performance advantage in practice. Data and memory address encryption has been recently proposed to defend against off-chip attacks. In this paper, we demonstrate that bandwidth...

Post-Quantum Cryptography

Quantum computing is increasingly seen as a threat to communications security: rapid progress towards realizing practical quantum computers has drawn attention to the long-understood potential of such machines to break fundamentals of contemporary cryptographic infrastructure. While this potential is so far firmly theoretical, the cryptography community is preparing for this possibility by deve...

Understanding Write Combining On Arm

Write Combining (WC) is a specialized memory type defined by the x86-64 architecture that is used for gathering multiple stores into burst transactions over the system bus. WC is commonly used on x86-64 platforms for interaction with I/O and other peripheral devices. In this whitepaper we provide an overview of the Arm architecture memory types that provide WC-like capabilities. In addition, t...

A Layered Approach To High Performance Device Virtualization

The complexity and performance requirements of computing systems have been growing, and demands are further driven by applications such as ML and the everything-connected world of IoT, with many billions of connected devices. Arm has developed a virtualization and accelerator strategy to address this, which we discuss in this white paper from our Architecture and Technology Group. A layered...

Powering The Edge: Driving Optimal Performance With Ethos-N77 Processor

Repurposing a CPU, GPU, or DSP is an easy way to add ML capabilities to an edge device. However, where responsiveness or power efficiency is critical, a dedicated Neural Processing Unit (NPU) may be the best solution. In this paper, we describe how the Arm Ethos-N77 NPU delivers optimal performance.
