Neon Intrinsics In Rust

Key tips for porting Neon code to Rust and differences from C.

At the end of 2021, the Neon intrinsics in Rust were completed and the community proposed stabilizing them, so that they no longer require a nightly compiler. Implementing the Neon intrinsics was a large effort, mostly undertaken by the Rust community, and Arm would like to thank everyone involved.

At the time of writing, all the Neon intrinsics that are part of Armv8.0-A are implemented and stabilized, and the intrinsics in FEAT_RDM are also stable. The AES, SHA1 and SHA2 intrinsics have been proposed and agreed for stabilization, so in a few months' time they will also be available in the stable compiler. The exceptions are any intrinsics that work with bfloat or f16, because Rust lacks support for those types. Some intrinsics that are part of other target features are implemented but are not yet stable.

Unsafe

Each of the intrinsics is an unsafe function, so writing code with the intrinsics requires unsafe. You, as the programmer, therefore have to ensure that certain invariants are met. The intrinsics are marked as unsafe because they require a target feature to be set, and the compiler cannot verify that the hardware the binary runs on has that feature. It is up to you to put any code using the intrinsics behind a runtime feature detection check.

Having said that, some targets do require that certain features exist. For example, aarch64-unknown-linux-gnu requires that Neon is present, so a runtime feature detection check is not required if you are only using Neon on that target. The Neon feature is also set at a global level there, so you do not have to annotate your function with it.
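As a minimal sketch of this point, the following helper uses Neon directly with no runtime check and no target feature annotation. The name add4 is hypothetical and not from the original example. It assumes a target such as aarch64-unknown-linux-gnu where Neon is mandatory, and the scalar branch exists only so that the snippet also builds on other architectures.

```rust
/// Adds two 4-lane f32 vectors. `add4` is a hypothetical helper for illustration.
#[cfg(target_arch = "aarch64")]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    use std::arch::aarch64::{vaddq_f32, vld1q_f32, vst1q_f32};

    let mut out = [0.0f32; 4];
    // SAFETY: this assumes a target where Neon is mandatory and enabled at a
    // global level (for example aarch64-unknown-linux-gnu), so no runtime
    // detection is performed. The pointers come from arrays of exactly four
    // f32 values, which is what the load and store intrinsics access.
    unsafe {
        let sum = vaddq_f32(vld1q_f32(a.as_ptr()), vld1q_f32(b.as_ptr()));
        vst1q_f32(out.as_mut_ptr(), sum);
    }
    out
}

/// Scalar version, used only when not building for AArch64.
#[cfg(not(target_arch = "aarch64"))]
pub fn add4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]]
}
```

Calling add4([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0]) returns [11.0, 22.0, 33.0, 44.0] on either path. On an AArch64 target that does not guarantee Neon, the same code would need a runtime check.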

In C, calling an intrinsic requires you to explicitly enable the feature somewhere. In Rust this is not the case. Because the target feature is set on the intrinsic function itself, codegen emits the instruction regardless, which leads to an illegal instruction exception at runtime if the hardware lacks the feature. Another side effect concerns inlining: when the target features of the caller do not match those of the callee, the callee is not inlined. Anything that uses a feature that is not enabled at a global level should therefore be in a function that has the feature enabled.

For example:

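use std::arch::aarch64::*;
use std::arch::is_aarch64_feature_detected;

unsafe fn test(a: int32x4_t, b: int32x4_t, c: int32x4_t) -> int32x4_t {
    if is_aarch64_feature_detected!("rdm") {
        // This function does not have "rdm" enabled, so neither call is inlined.
        let res = vqrdmlah_s32(a, b, c);
        vqrdmlah_s32(a, res, c)
    } else {
        todo!("Fallback implementation");
    }
}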

should be avoided, and be written as:

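// The feature is enabled on this function, so both intrinsics can be inlined into it.
#[target_feature(enable = "rdm")]
unsafe fn impl_using_rdm(a: int32x4_t, b: int32x4_t, c: int32x4_t) -> int32x4_t {
    let res = vqrdmlah_s32(a, b, c);
    vqrdmlah_s32(a, res, c)
}

unsafe fn test(a: int32x4_t, b: int32x4_t, c: int32x4_t) -> int32x4_t {
    if is_aarch64_feature_detected!("rdm") {
        // Only reached when the hardware reports support for "rdm".
        impl_using_rdm(a, b, c)
    } else {
        todo!("Fallback implementation");
    }
}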

The first version ends up with two bl instructions, each branching to a vqrdmlah_s32 function that contains the sqrdmlah instruction. The second version ends up with a single bl to impl_using_rdm, which contains both sqrdmlah instructions.

Porting C code to Rust

The Neon intrinsics follow the ACLE naming conventions, so porting existing Neon code to Rust should be simple, and most existing documentation should still apply. As part of their testing, the Rust intrinsics are compared against the C implementation (using clang). As a result, any code ported from C to Rust should give the same outputs.

Generics for immediate

One difference between the Rust and C implementations of the ACLE for Neon is that, in Rust, any value used as an immediate is represented as a const generic. Some intrinsic signatures therefore differ between C and Rust, but the restrictions and the checking remain the same. For example, a call to the vrshr_n_u8 intrinsic has the same constraints in both languages, and they are still checked, but the call syntax differs. In C it would be:

uint8x8_t res = vrshr_n_u8(a, 4);

Whereas in Rust it would be:

let res = vrshr_n_u8::<4>(a);
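To make the const generic form concrete, here is a small self-contained sketch. The wrapper name rounding_shift_right_4 is hypothetical, and it assumes a target where Neon is always present. The scalar branch is included only so that the snippet builds on other architectures. It models the rounding shift as (x + 8) >> 4.

```rust
/// Rounding shift right by 4 on eight u8 lanes.
/// `rounding_shift_right_4` is a hypothetical wrapper for illustration.
#[cfg(target_arch = "aarch64")]
pub fn rounding_shift_right_4(input: [u8; 8]) -> [u8; 8] {
    use std::arch::aarch64::{vld1_u8, vrshr_n_u8, vst1_u8};

    let mut out = [0u8; 8];
    // SAFETY: assumes Neon is mandatory on the target; both pointers refer
    // to arrays of exactly eight u8 values.
    unsafe {
        let v = vld1_u8(input.as_ptr());
        // The shift amount is passed as a const generic, not as an argument.
        let shifted = vrshr_n_u8::<4>(v);
        vst1_u8(out.as_mut_ptr(), shifted);
    }
    out
}

/// Scalar model of the same operation, used only when not building for AArch64.
#[cfg(not(target_arch = "aarch64"))]
pub fn rounding_shift_right_4(input: [u8; 8]) -> [u8; 8] {
    // Widen to u16 so that adding the rounding constant cannot overflow.
    input.map(|x| ((x as u16 + 8) >> 4) as u8)
}
```

For the input [0, 7, 8, 15, 16, 24, 100, 255] this returns [0, 0, 1, 1, 1, 2, 6, 16].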

SVE and beyond

Now that Neon is in a fairly complete state, we would like to see scalable vectors supported in Rust. We have raised an RFC that describes the parts needed to enable this, and are awaiting community acceptance of the proposal. The RFC would enable us to support SVE and would be the beginning of support for additional scalable architecture features such as SME.

Although it is still early in the process for this, the hope with the RFC is that it would allow us to follow the same conventions as we did for Neon. This should also allow for existing documentation on SVE to be correct for Rust.

