System Design Considerations For Embedded Heterogeneous Multiprocessing (HMP)

Integrating functionally asymmetric compute elements requires unique system design choices.


Heterogeneous multiprocessor (HMP) systems, which integrate functionally asymmetric compute elements such as application processors and microcontrollers within the same SoC, are now used extensively across a wide range of applications. These SoCs power smart, connected devices that transform the way we live – at home, in the car, and in our cities – with more intelligence, speed and efficiency.

A fundamental requirement for modern embedded systems is using the right compute element for a given task. Doing so lets these systems meet the conflicting requirements of delivering high performance while improving overall system efficiency. However, architecting heterogeneous systems requires unique system and software considerations.

This blog focuses on two questions:

  1. Why is heterogeneous computing such a fundamental requirement of modern compute systems?
  2. What are the system design choices to be made when architecting such heterogeneous multiprocessing (HMP) systems to integrate functionally asymmetric compute elements (e.g. Cortex-A53 and Cortex-M4) in the same system?

The key challenge of modern compute systems
The most significant challenge for modern compute systems is the requirement to handle a diversity of workloads without compromising on system efficiency.

To meet these diverse compute requirements and improve system efficiency, SoC architects rely on integrating functionally asymmetric processors within the same SoC. However, integrating microcontrollers and application processors, which differ significantly in ISA, performance and software, requires some key considerations at the system level.

So, what are some fundamental considerations when architecting an energy-efficient, heterogeneous compute system?

System design considerations for architecting embedded HMP systems
There are several types of HMP systems. In a generic sense, an HMP system is a complex system that combines several different compute elements, such as a general-purpose processor, a graphics processor, an image processor, a video processor, a display processor and possibly several accelerators. Figure 1 shows a typical HMP compute system that includes several compute elements.

This blog discusses the system design considerations for integrating Arm’s application processors (e.g. Cortex-A53 or Cortex-A35) with microcontrollers (e.g. Cortex-M4 or Cortex-M33) in the same SoC. Consider the generic compute subsystem shown in Figure 2, using the Cortex-A and Cortex-M processors.

Figure 1: A generic heterogeneous multiprocessor (HMP) compute system

The system designer needs to consider the following fundamental questions when designing heterogeneous compute systems using Cortex-A and Cortex-M processors:

  1. How do you address the memory map differences?
  2. How do you distribute interrupts across the application processor and the microcontroller subsystems?
  3. How do you handle inter-processor communication?
  4. How do you handle Secure/Non-secure state communication?

1. How do you address the memory map differences?
There are two approaches to handling the different memory maps: one optimizes for low area cost, the other for flexibility.

Low area cost:

  • Advantages: sharing a common address space, peripherals grouped together in a system
  • Disadvantages: restrictive; requires a design-time decision

More flexibility with a System Memory Management Unit (SMMU):

  • Provides a moveable window, allowing the Cortex-M processor subsystem to access addresses beyond its 32-bit range
  • Adds a security attribute to transactions, allowing access to both Secure and Non-secure resources (if needed)
  • Run-time configurable by software

Figure 2: Using an SMMU allows more flexibility for a processor to access a wider memory addressing space
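
To make the moveable-window idea concrete, here is a minimal sketch in C of how software might model one window. It assumes a hypothetical descriptor (hmp_window_t) with a fixed 32-bit aperture on the Cortex-M side and a 64-bit target base that can be re-pointed at run time. A real SMMU uses translation tables and stream identifiers rather than a single base register, so treat this purely as an illustration of the concept, not as a register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one moveable window; not a real SMMU register layout. */
typedef struct {
    uint32_t aperture_base; /* start of the window in the Cortex-M 32-bit map */
    uint32_t aperture_size; /* window size in bytes */
    uint64_t target_base;   /* system address the window currently points at */
    bool     non_secure;    /* security attribute attached to transactions */
} hmp_window_t;

/* Re-point the window at run time (the "run-time configurable" property). */
void hmp_window_move(hmp_window_t *w, uint64_t new_target, bool non_secure)
{
    assert(w != NULL);
    w->target_base = new_target;
    w->non_secure  = non_secure;
}

/* Translate a 32-bit local address into a system address.
 * Returns false if the address falls outside the aperture. */
bool hmp_window_translate(const hmp_window_t *w, uint32_t local, uint64_t *out)
{
    if (local < w->aperture_base)
        return false;

    uint32_t offset = local - w->aperture_base;
    if (offset >= w->aperture_size)
        return false;

    *out = w->target_base + offset;
    return true;
}
```

In this model, a 256MB aperture at local address 0x60000000 can be pointed at a system address above the 4GB boundary, and the Cortex-M software keeps using the same local addresses after each move.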

2. How do you distribute interrupts?
It can be necessary to share interrupt sources between processors of different classes. For example, a sensor might be serviced by an always-on Cortex-M core while the Cortex-A processors are asleep. The GIC architecture is intended for use with Cortex-A and Cortex-R class processors; there is no support for connecting Cortex-M processors to GICv3/v4 interrupt controllers. Cortex-M processors have their own interrupt controller, the Nested Vectored Interrupt Controller (NVIC), which has a similar programmer’s model and functionality to the GIC. A shared interrupt source may therefore need to be connected to both interrupt controllers. This is relatively simple for wired interrupts, whereas the NVIC would need wrapper logic to handle message-based interrupts.

Option 1: Wired interrupt

  • Advantage: Easy system design
  • Disadvantage: Higher software overhead when switching interrupt allocation
    • NVIC configuration is accessible from the Cortex-M processor only
    • GIC configuration might not be accessible from Cortex-M processor
    • Use software mailbox to synchronize configuration changes (requires IPC)

Option 2: Message-based interrupt

  • Advantages: Small hardware cost; significant reduction in software overhead for interrupt allocation
  • Flexible design options, for example:
    • Message from Cortex-A to Cortex-M
    • Message from Cortex-M to Cortex-A
    • Interrupt distribution unit (shared)

See Figure 3 below for the system design options.

Figure 3: Two system design options for interrupts, for example with Arm Cortex-A and Cortex-M processors
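
The software overhead of the wired option comes from the fact that each side can only reconfigure its own interrupt controller, so moving a shared source from one processor to the other is a two-step handshake. The sketch below models that handshake in plain C. The types and function names are hypothetical, and the enable flags stand in for the real GIC and NVIC enable bits (on a Cortex-M, the corresponding steps would typically use the CMSIS NVIC_DisableIRQ and NVIC_EnableIRQ calls).

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { SIDE_CORTEX_A, SIDE_CORTEX_M } hmp_side_t;

/* Model of one interrupt source wired to both controllers. */
typedef struct {
    bool enabled_in_gic;  /* enable state on the Cortex-A side */
    bool enabled_in_nvic; /* enable state on the Cortex-M side */
} hmp_shared_irq_t;

/* Step 1, run by the current owner: mask the source in its own controller. */
void hmp_irq_release(hmp_shared_irq_t *irq, hmp_side_t side)
{
    if (side == SIDE_CORTEX_A)
        irq->enabled_in_gic = false;
    else
        irq->enabled_in_nvic = false;
}

/* Step 2, run by the new owner once it has been told about the handover. */
void hmp_irq_claim(hmp_shared_irq_t *irq, hmp_side_t side)
{
    /* The source must be masked everywhere before it is claimed. */
    assert(!irq->enabled_in_gic && !irq->enabled_in_nvic);

    if (side == SIDE_CORTEX_A)
        irq->enabled_in_gic = true;
    else
        irq->enabled_in_nvic = true;
}

/* Full handover. On real hardware the two steps run on different
 * processors, with a mailbox message (IPC) in between. */
void hmp_irq_handover(hmp_shared_irq_t *irq, hmp_side_t from, hmp_side_t to)
{
    hmp_irq_release(irq, from);
    /* A mailbox message to the new owner would be sent here. */
    hmp_irq_claim(irq, to);
}
```

The invariant worth noticing is that the source is never enabled in both controllers at once, which is exactly what the mailbox synchronization has to guarantee.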

3. How do you handle inter-processor communication?
Software running on two different processors needs to be able to communicate with each other. There are two elements to this:

  • Sending an interrupt across to the other processor(s)
  • Shared memory for mailbox and semaphore data

Such communication would typically be via mailboxes in shared memory. This memory needs to be part of the main system’s address space so that, for example, both the Cortex-A processors and the Cortex-M subsystem can see it.

Such mailboxes might be complemented by doorbell interrupts, which signal the presence of new messages or the completion of previous commands. This requires a mechanism for each processor to generate interrupts in the other’s interrupt controller. See an example system diagram in Figure 4 below.

Here are a few use cases of when this communication is required, using Cortex processors as an example:

  • Cortex-A system requesting system control activities from Cortex-M system controller
  • Cortex-M sensor hub reporting data to Cortex-A processors
  • Initiating handover of a shared peripheral from one system to another

Figure 4: Handling inter-processor communication between Cortex-A, Cortex-R and Cortex-M processor subsystems
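
As a rough illustration of the mailbox-plus-doorbell pattern, the sketch below implements a single-sender, single-receiver message ring in C11. Everything here is an assumption made for the example: the hmp_mailbox_t layout, the slot count, and the doorbell callback, which on real hardware would write to a register that raises an interrupt in the peer's interrupt controller. The sketch also assumes that both processors see the shared memory with compatible memory attributes; cache maintenance and platform-specific barriers are not shown.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MBOX_SLOTS 8u /* power of two, so indices wrap with a simple mask */

/* Hypothetical mailbox layout placed in memory visible to both sides. */
typedef struct {
    _Atomic uint32_t head; /* advanced by the sender only */
    _Atomic uint32_t tail; /* advanced by the receiver only */
    uint32_t slots[MBOX_SLOTS];
} hmp_mailbox_t;

/* Doorbell hook: raises an interrupt on the receiving processor. */
typedef void (*hmp_doorbell_fn)(void);

/* Stand-in doorbell for host-side experiments: it only counts rings. */
int hmp_fake_doorbell_rings;
void hmp_fake_doorbell(void) { hmp_fake_doorbell_rings++; }

/* Sender side: returns false if the mailbox is full. */
bool hmp_mbox_send(hmp_mailbox_t *mb, uint32_t msg, hmp_doorbell_fn ring)
{
    uint32_t head = atomic_load_explicit(&mb->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&mb->tail, memory_order_acquire);

    if (head - tail == MBOX_SLOTS)
        return false;

    mb->slots[head & (MBOX_SLOTS - 1u)] = msg;
    /* Publish the slot contents before the new head becomes visible. */
    atomic_store_explicit(&mb->head, head + 1u, memory_order_release);

    if (ring != NULL)
        ring(); /* tell the peer that a message is waiting */
    return true;
}

/* Receiver side, typically called from the doorbell interrupt handler.
 * Returns false if the mailbox is empty. */
bool hmp_mbox_receive(hmp_mailbox_t *mb, uint32_t *msg)
{
    uint32_t tail = atomic_load_explicit(&mb->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&mb->head, memory_order_acquire);

    if (head == tail)
        return false;

    *msg = mb->slots[tail & (MBOX_SLOTS - 1u)];
    /* Hand the slot back to the sender. */
    atomic_store_explicit(&mb->tail, tail + 1u, memory_order_release);
    return true;
}
```

Because each index is written by only one side, no lock is needed for a single sender and a single receiver. Two-way traffic would use one such mailbox per direction, each with its own doorbell.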

4. How do you handle Secure/Non-secure state communication?
Architecting security into modern compute systems is necessary to enable devices to counter the specific threats they might face. Typical use cases include the protection of authentication mechanisms, cryptography, key material and digital rights management (DRM).

Key considerations when implementing security in an HMP system (see Figure 5 for an example system diagram):

  • If you are combining processors that do not use the TrustZone security extension, the compute subsystem using such a processor must be defined as always Secure (e.g. system control processor subsystem) or always Non-secure (e.g. audio subsystem)
  • Ensure that the debug system matches the security domains for each processor
  • Plan system memory partitioning and interrupt distribution between the Secure and Non-secure worlds across the two processor subsystems
  • Secure and Non-secure memory partitioning must match between the different processor subsystems

Figure 5: An example of an HMP system with hardware-enforced security, using TrustZone security extension and Cortex processors
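
The requirement that Secure and Non-secure memory partitioning must match across subsystems lends itself to a simple consistency check. The sketch below is one possible way to express it in C, assuming each subsystem's view is available as a table of hypothetical hmp_region_t entries (base, size, Secure flag) and that no region wraps around the end of the address space. How those tables are obtained from the actual hardware configuration is platform specific and not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One entry in a subsystem's view of the memory partitioning (hypothetical). */
typedef struct {
    uint64_t base;   /* first byte of the region */
    uint64_t size;   /* length in bytes */
    bool     secure; /* true = Secure, false = Non-secure */
} hmp_region_t;

/* True if the two half-open ranges [base, base + size) share any byte. */
static bool hmp_regions_overlap(const hmp_region_t *x, const hmp_region_t *y)
{
    return x->base < y->base + y->size && y->base < x->base + x->size;
}

/* Returns true if no address is Secure in one view and Non-secure in the other. */
bool hmp_partitions_match(const hmp_region_t *a, size_t na,
                          const hmp_region_t *b, size_t nb)
{
    assert(a != NULL && b != NULL);

    for (size_t i = 0; i < na; i++) {
        for (size_t j = 0; j < nb; j++) {
            if (hmp_regions_overlap(&a[i], &b[j]) &&
                a[i].secure != b[j].secure)
                return false; /* same memory, different security state */
        }
    }
    return true;
}
```

A check like this could run at build time or early in boot, over the tables that describe the Cortex-A and Cortex-M views of memory, to catch a region that one subsystem treats as Secure while the other exposes it as Non-secure.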

Needless to say, there are a number of other design considerations to bear in mind when designing heterogeneous multiprocessing (HMP) systems. This blog only scratches the surface of the system design choices – download the full whitepaper to see more hardware system diagrams and the software considerations, as well.
