Improved DSP And AI Performance On An MCU Core

Benefits of an architecture extension that lets on-chip processing replace lower to mid-tier DSP cores.

In the world of embedded devices, demand is growing for advanced machine learning and signal processing capabilities. Arm Cortex-M85, the latest general-purpose Cortex-M core, aims to meet this demand with the 32-bit Armv8.1-M architecture, offering high performance and power efficiency. The core’s Helium technology, the M-profile Vector Extension (MVE), provides a significant uplift for ML and DSP applications: 4x the ML performance and 3x the DSP performance of Cortex-M7.

Helium-MVE

Helium technology is an extension to the Armv8.1-M architecture, implemented in the Cortex-M55 and Cortex-M85 cores, that lets on-chip processing replace lower to mid-tier DSP cores. Helium provides eight 128-bit vector registers and supports a wide range of vector data types, from 8-, 16-, and 32-bit integer and fixed-point data to half- and single-precision floating point. Features such as overlapping execution of vector instructions in the pipeline, improved branch prediction, and low-overhead loop optimization contribute to its performance. Enhanced memory access instructions and support for complex-number processing further make Helium a powerful addition to Cortex-M85.

Cortex-M85 features

Cortex-M85 outshines other Cortex-M cores with a wealth of features, including enhanced security options such as pointer authentication and branch target identification (PACBTI) and the unprivileged debug extension (UDE). The core also offers a 7-stage scalar pipeline and a 9- to 10-stage vector and floating-point pipeline, with support for various data types. Table 1 compares the features of the three highest-end Arm Cortex-M cores: Cortex-M7, Cortex-M55, and Cortex-M85.

Table 1: Feature comparison of Cortex-M7, Cortex-M55, and Cortex-M85

Benchmarks

Cortex-M85 achieves significant performance uplifts over other Cortex-M cores: its AI/ML performance is 4 times that of Cortex-M7 and 20% higher than that of Cortex-M55.

Fig. 1: Performance uplift of CM85 vs CM7 and CM55 [Data Source: Arm]

Empirical data indicates that Helium improves the performance of some ML kernels by up to 787%, and improves floating-point fast Fourier transform (FFT) and finite impulse response (FIR) filter performance by up to 57% and 64%, respectively. Note that the FFT and FIR figures are for floating-point data only; because Helium natively supports multiple data types, including 8- and 16-bit formats that pack more elements into each 128-bit register, the uplift for those data types can be significantly higher.

Fig. 2: Benchmark performance for MVE vs non-MVE aware devices.
(a) CMSIS-NN, Arm Compiler 6.15 (AC6.15): average performance over a fully connected layer
(b) CMSIS-DSP FFT and FIR, floating point, Arm Compiler 6.16 (AC6.16): normalized performance
[Data source: Arm]

In conclusion, Cortex-M85 with Helium delivers a significant uplift in AI/ML and DSP performance while outshining the rest of the Cortex-M cores in scalar performance, making it an ideal choice for more complex processing tasks.
