Author's Latest Posts


Every Walk’s A Hit: Making Page Walks Single-Access Cache Hits


As memory capacity has outstripped TLB coverage, large data applications suffer from frequent page table walks. We investigate two complementary techniques for addressing this cost: reducing the number of accesses required and reducing the latency of each access. The first approach is accomplished by opportunistically "flattening" the page table: merging two levels of traditional 4 KB p... » read more

Components And Tools for Functional Safety Applications


Functional safety is important across a variety of markets, including the automotive, industrial, medical, and railway sectors, and is often prevalent in consumer electronics. However, the complexity of the embedded software required for functional safety is growing and security issues are rising due to connectivity requirements. This can result in the failure of a safety-critical system and lead to ... » read more

Arm Neoverse N1 Core: Performance Analysis Methodology


The Arm Neoverse ecosystem is growing substantially with many Arm hardware and software partners developing applications and porting their workloads onto Arm-based cloud instances. With Neoverse N1 based systems becoming widely available, many real-world workloads are showing very competitive performance and significant cost savings when compared to legacy systems. Some recent examples include:... » read more

Bandwidth Utilization Side-Channel On ML Inference Accelerators


Abstract—Accelerators used for machine learning (ML) inference provide great performance benefits over CPUs. Securing a confidential model during inference against off-chip side-channel attacks is critical to harnessing the performance advantage in practice. Data and memory address encryption have recently been proposed to defend against off-chip attacks. In this paper, we demonstrate that bandwidth... » read more

Post-Quantum Cryptography


Quantum computing is increasingly seen as a threat to communications security: rapid progress towards realizing practical quantum computers has drawn attention to the long-understood potential of such machines to break fundamentals of contemporary cryptographic infrastructure. While this potential is so far firmly theoretical, the cryptography community is preparing for this possibility by deve... » read more

Understanding Write Combining On Arm


Write Combining (WC) is a specialized memory type defined by the x86-64 architecture that is used for gathering multiple stores into burst transactions over the system bus. WC is commonly used on x86-64 platforms for interaction with I/O and other peripheral devices. In this whitepaper we provide an overview of the Arm architecture memory types that provide WC-like capabilities. In addition, t... » read more

A Layered Approach To High Performance Device Virtualization


The complexity and performance requirements of computing systems have been growing, and demands are further driven by applications such as ML and the everything-connected world of IoT, with many billions of connected devices. Arm has developed a virtualization and accelerator strategy to address this, which we discuss in this white paper from our Architecture and Technology Group. A layered... » read more

Powering The Edge: Driving Optimal Performance With Ethos-N77 Processor


Repurposing a CPU, GPU, or DSP is an easy way to add ML capabilities to an edge device. However, where responsiveness or power efficiency is critical, a dedicated Neural Processing Unit (NPU) may be the best solution. In this paper, we describe how the Arm Ethos-N77 NPU delivers optimal performance. » read more

The Power Of Virtual Prototyping: From SoC Design To Software Development


Virtual prototypes and hardware design: As integrated circuits grow more powerful and complex, System-on-Chip (SoC) designers face a daunting task at both the hardware and software level. SoC architects need a method for early evaluation of hardware components, known as Intellectual Property (IP) blocks, that will have direct impact on the commercial success of the SoC. There is a range of complex ... » read more
