Locking When Emulating Xtensa LX Multi-Core On A Xilinx FPGA

Enabling atomic access to shared memory data structures.

Today’s high-performance computing systems often require the designer to instantiate multiple CPU or DSP cores in a subsystem. However, the performance gained by using multiple CPUs comes with additional programming complexity, especially when accessing shared memory data structures and hardware peripherals. In a multi-core environment, cores must access shared data atomically, and locking is the most basic requirement for data sharing. A core takes the lock, accesses the shared data structure, and releases the lock. While one core holds the lock, other cores are not allowed to access the same data structure. Typically, locking is implemented using an atomic read-modify-write bus transaction on a variable allocated in uncached memory.

This blog describes an AXI4 locking mechanism for an Xtensa LX-based multi-core system implemented on a Xilinx FPGA platform. It uses a dual-core design mapped to a KC705 platform as an example.

Exclusive access to accomplish locking

The Xtensa AXI4 manager provides atomic access using the AXI4 exclusive access mechanism. When the Xtensa AXI manager interface generates an exclusive transaction, the subordinate interface must also support exclusive access, i.e., AXI monitoring. The Xilinx BRAM controller’s AXI subordinate interface does not support exclusive access (see “AXI Feature Adoption in Xilinx FPGAs”).
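To make "AXI monitoring" concrete, here is a small behavioural model of an exclusive monitor in C. It is a simplified sketch, not RTL and not the Xtensa or Xilinx implementation: it keeps one reservation per manager on a word index, and the names `excl_monitor_t`, `excl_read`, `excl_write`, and `normal_write` are hypothetical. In AXI terms, a `true` return from `excl_write` stands for an EXOKAY response (the store happened) and `false` stands for OKAY (the store was dropped).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_EXCL_MANAGERS 2u  /* assumption: one reservation slot per core */
#define MEM_WORDS         4u  /* tiny memory, for illustration only */

typedef struct {
    uint32_t mem[MEM_WORDS];
    bool     reserved[NUM_EXCL_MANAGERS];        /* reservation valid? */
    uint32_t reserved_index[NUM_EXCL_MANAGERS];  /* word being monitored */
} excl_monitor_t;

/* Any write that updates a word cancels every reservation on that word. */
static void clear_reservations(excl_monitor_t *m, uint32_t index)
{
    for (uint32_t i = 0; i < NUM_EXCL_MANAGERS; i++) {
        if (m->reserved[i] && m->reserved_index[i] == index) {
            m->reserved[i] = false;
        }
    }
}

/* Exclusive read: return the data and start monitoring the word. */
static uint32_t excl_read(excl_monitor_t *m, uint32_t manager, uint32_t index)
{
    assert(manager < NUM_EXCL_MANAGERS && index < MEM_WORDS);
    m->reserved[manager] = true;
    m->reserved_index[manager] = index;
    return m->mem[index];
}

/* Exclusive write: succeeds only if this manager's reservation on the word
 * is still intact; otherwise memory is left unchanged. */
static bool excl_write(excl_monitor_t *m, uint32_t manager, uint32_t index,
                       uint32_t value)
{
    assert(manager < NUM_EXCL_MANAGERS && index < MEM_WORDS);
    if (!m->reserved[manager] || m->reserved_index[manager] != index) {
        return false;
    }
    m->mem[index] = value;
    clear_reservations(m, index);
    return true;
}

/* Ordinary write: always updates memory and cancels reservations. */
static void normal_write(excl_monitor_t *m, uint32_t index, uint32_t value)
{
    assert(index < MEM_WORDS);
    m->mem[index] = value;
    clear_reservations(m, index);
}
```

If two managers both perform an exclusive read of the same lock word, only the first exclusive write succeeds. The second manager sees a failure and must retry, which is the property a lock depends on.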

Leveraging Xtensa AXI4 subordinate exclusive access

The Xtensa LX AXI subordinate interface supports exclusive access. One approach is to utilize this support and allocate locks in one of the cores’ local data memories. Ensure that the number of external exclusive managers is configured, typically to the number of cores (figure 1).

Fig. 1: Xtensa LX AXI subordinate interface.

Note that the Xtensa NX AXI subordinate interface does not support exclusive access. For an Xtensa NX design, shared memory with AXI monitoring is required.

In figure 2, AXI_crossbar#2 (block in green) routes core#0’s manager AXI access (blue connection) to both cores’ local memories. Core#1’s manager AXI (yellow connection) can also access both cores’ local memories. Locks can be allocated in either core’s local data memory.

Inbound access on the subordinate interface

On inbound access, the Xtensa AXI subordinate interface expects a local memory address, i.e., an external entity needs to present the same address as the core would use to access local memory in its 4GB address space. The AXI address remap IP (block in pink) translates the AXI system address to each core’s local address. For example, assuming locks are allocated in core#0’s local memory, core#1 generates an AXI exclusive transaction to access a lock allocated in core#0’s local memory (yellow connection). AXI_crossbar#2 forwards the transaction to the M03_AXI port (green connection). AXI_address_remap#1 translates the AXI system address to the local memory address before presenting it to core#0’s AXI subordinate interface (pink connection).

It is possible to configure cores with disjoint local data memory addresses and avoid the need for an address remap IP block. However, the result is a heterogeneous multi-core design with a multi-image build. Using an address remap IP keeps things simple: the design remains a homogeneous multi-core with a single-image build. A single image uses a single memory map. Therefore, both cores must have the same view of a lock, i.e., the lock’s AXI bus address must be the same for both.

Fig. 2: AXI access.

AXI ID width

Note that the Xtensa AXI manager interface ID width is 4 bits, while the Xtensa AXI subordinate interface ID width is 12 bits. The AXI ID width of AXI crossbar#2 and the AXI address remap IP must therefore be configured to be greater than 4 bits. AXI IDs on a manager port are not globally defined; thus, an AXI crossbar with multiple manager ports internally prefixes the manager port index to the ID and provides this concatenated ID to the subordinate device. When the response returns, the ID prefix is used to locate the manager port of origin, and the prefix is then stripped. This is why the ID on the subordinate side is wider than the ID on the manager side. Figure 3 shows the Xilinx crossbar IP AXI ID width configuration.
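The ID prefixing described above can be worked through in a few lines of C. This is an arithmetic sketch under the assumption that the prefix is exactly the manager port index encoded in ceil(log2(number of ports)) bits; the real crossbar IP may reserve more bits than this minimum. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bits needed to encode a port index: ceil(log2(num_manager_ports)). */
static unsigned port_index_bits(unsigned num_manager_ports)
{
    assert(num_manager_ports >= 1u);
    unsigned bits = 0;
    while ((1u << bits) < num_manager_ports) {
        bits++;
    }
    return bits;
}

/* Minimum ID width on the subordinate side of the crossbar. */
static unsigned subordinate_id_width(unsigned manager_id_width,
                                     unsigned num_manager_ports)
{
    return manager_id_width + port_index_bits(num_manager_ports);
}

/* Request path: place the port index above the manager's own ID bits. */
static uint32_t id_add_prefix(uint32_t id, unsigned port,
                              unsigned manager_id_width)
{
    uint32_t mask = (1u << manager_id_width) - 1u;
    return ((uint32_t)port << manager_id_width) | (id & mask);
}

/* Response path: recover the port of origin from the prefix. */
static unsigned id_port(uint32_t tagged_id, unsigned manager_id_width)
{
    return (unsigned)(tagged_id >> manager_id_width);
}

/* Response path: strip the prefix to restore the manager's original ID. */
static uint32_t id_strip_prefix(uint32_t tagged_id, unsigned manager_id_width)
{
    return tagged_id & ((1u << manager_id_width) - 1u);
}
```

For the dual-core example, two manager ports with 4-bit IDs need at least 5 ID bits on the subordinate side under this assumption, which fits within the 12-bit ID of the Xtensa subordinate interface.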

Fig. 3: Xilinx crossbar IP AXI ID width configuration.

Software tools support

Cadence tools provide a way to place locks at a specific location. For more details, please refer to Cadence’s “Linker Support Packages (LSP) Reference Manual” for the Xtensa SDK. In figure 4, the .xtos.lock section (green) resides in core#0’s local memory and holds user-defined and C library locks. The lock segment’s memory attribute is defined as shared inner (cyan) so that the L32EX and S32EX instructions generate exclusive transactions on the AXI bus. The stack and the per-core XTOS and C library contexts are allocated in local data memory (yellow).

…………..LSP memory map………….
BEGIN dram0
0x40000000: dataRam : dram0 : 0x8000 : writable ;
dram0_0 : C : 0x40000400 - 0x40007fff : STACK : .dram0.rodata .clib.percpu.data .rtos.percpu.data .dram0.data .clib.percpu.bss .rtos.percpu.bss .dram0.bss;
END dram0
…………………
BEGIN sysViewDataRam0
0xA0100000: system : sysViewDataRam0 : 0x8000 : writable, uncached, shared_inner;
lockRam_0 : C : 0xA0100000 - 0xA01003ff : .xtos.lock;
END sysViewDataRam0
…………..

Fig. 4: Place locks at a specific location.


