Arm Neoverse N1 Core: Performance Analysis Methodology

Performance analysis on the Neoverse N1 core using hardware PMU events.


The Arm Neoverse ecosystem is growing substantially, with many Arm hardware and software partners developing applications and porting their workloads onto Arm-based cloud instances. With Neoverse N1 based systems becoming widely available, many real-world workloads are showing very competitive performance and significant cost savings when compared to legacy systems. Recent examples include H.264 video encoding, memcached, Elasticsearch, and NGINX.

To maximize execution performance, developers use performance analysis and workload characterization techniques to study the performance characteristics of their applications. Server-class systems support a wide range of performance monitoring techniques to measure workload efficiency, evaluate resource requirements, and track resource utilization. Such measurements are useful for tuning both software and hardware, and they also help guide future system design requirements.

The Arm Neoverse micro-architecture has been developed with both high performance and power efficiency in mind. As such, our approach to performance monitoring might differ slightly from what software developers have used to analyze systems based on other architectures. This paper outlines a methodology for workload characterization using the Performance Monitoring Unit (PMU) capabilities of the Neoverse N1 CPU to identify and eliminate performance bottlenecks. The intended audience is software developers and performance analysts working on software optimization, tuning, and development.

The content of this paper is divided into four chapters. The first chapter introduces the hardware PMU on Neoverse N1, with a list of the most relevant PMU events for workload characterization. The second chapter presents a workload characterization methodology using Neoverse N1 core PMU events. The third chapter illustrates how the Linux perf tool can be used to collect Neoverse N1 PMU events. The final chapter demonstrates workload characterization and hot spot analysis with an example workload case study.
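
As a rough illustration of what PMU-based workload characterization involves, the sketch below turns raw event counts into a few ratio metrics. It is a minimal example and is not taken from the paper: the event names are standard Arm PMU event names, but the choice of metrics, the helper name derive_metrics, and the sample counts are assumptions made here for illustration only.

```python
def derive_metrics(counts):
    """Turn raw PMU event counts into ratio metrics (illustrative only)."""
    cycles = counts["CPU_CYCLES"]
    insts = counts["INST_RETIRED"]
    return {
        # Instructions retired per cycle
        "ipc": insts / cycles,
        # Fraction of cycles counted as front-end / back-end stalls
        "frontend_stall_rate": counts["STALL_FRONTEND"] / cycles,
        "backend_stall_rate": counts["STALL_BACKEND"] / cycles,
        # L1 data cache refills per thousand retired instructions
        "l1d_mpki": 1000 * counts["L1D_CACHE_REFILL"] / insts,
    }


# Made-up counts, not measurements from any real workload.
sample = {
    "CPU_CYCLES": 2_000_000,
    "INST_RETIRED": 3_000_000,
    "STALL_FRONTEND": 200_000,
    "STALL_BACKEND": 500_000,
    "L1D_CACHE_REFILL": 15_000,
}

metrics = derive_metrics(sample)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In practice the counts would come from a collection tool such as Linux perf rather than a hand-written dictionary; ratios like these make runs of different lengths comparable.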

By Jumana Mundichipparakkal, Krishnendra Nathella, and Tanvir Ahmed Khan
