Beyond The Demo: Deploying And Evaluating Open-Source AI Workloads


As more open-source AI models move closer to real-world adoption, developers are changing how they evaluate edge deployment. The question is no longer simply whether a model can run. It is whether the model can be deployed reproducibly on a concrete platform and observed in practice, so that deployment decisions rest on actual technical evidence. For developers, the CIX Armv9 platfor...

Introducing “The Architecture Speaks”


What are specifications used for? How do you use them? Are they intelligible? These questions are at the heart of the project behind a new tool called "The Architecture Speaks", an experimental generative AI chatbot that aims to provide quick answers to complex questions about the Arm architecture. It also provides links to the Arm Architecture Reference Manual. Th...

Rethinking Robotics Reinforcement Learning: A Practical Humanoid Training Workflow


Reinforcement learning (RL) for robotics is often associated with large GPU clusters, distributed infrastructure, and x86-based development environments. Training a humanoid robot with high-fidelity simulation is typically a resource-intensive workflow that runs in the data center. What if that workflow could run on a single workstation? In this blog post, we explore a complete robotics pipeline bu...

Rethinking Voice AI At The Edge: A Practical Offline Pipeline


Cloud-based AI dominates the headlines, but responsive and private interaction lies at the edge. This blog post shows how to build a fully offline, real-time voice assistant using the Arm-based NVIDIA DGX Spark platform. The system integrates open-source components such as faster-whisper and vLLM. It delivers low-latency, human-like dialogue without sending data outside the local environment. ...

Scaling llama.cpp On Neoverse N2: Solving Cross-NUMA Performance Issues


This blog post explains the cross-NUMA memory access issue that occurs when you run llama.cpp on Neoverse. It also introduces a proof-of-concept patch that addresses this issue and can provide up to a 55% performance increase for text generation when you run the llama3_Q4_0 model on the ZhuFeng Neoverse system. Cross-NUMA memory access problem: in llama.cpp, performance drops when the number o...

Smarter Write Barriers For Arm64 In .NET CoreCLR


Last year, I explored how you can use the Arm Scalable Vector Extension (SVE) in .NET to unlock SIMD performance at scale. This year, my focus has shifted to something less visible but just as fundamental to runtime performance: write barriers in the CoreCLR garbage collector (GC). Write barriers are not a feature most .NET developers ever think about. They do not change how you write C# cod...

Rethinking The Role Of CPUs In AI: A Practical RAG Implementation


In many enterprise environments, engineers and technical staff need to find information quickly. They search internal documents such as hardware specifications, project manuals, and technical notes. These materials are often scattered, which makes traditional search inefficient, and they are frequently confidential or proprietary. This constraint prevents these documents from being processed by...

Future Architecture Technologies: POE2 And vMTE


Future Architecture Technologies are features being developed for currently unreleased versions of the Arm architecture. Arm provides the ecosystem with relevant information and specifications in advance, so that software support is ready when new technologies are realized in hardware. This blog introduces two future technologies: Permission Overlay Extension version 2 (POE2) and Virtual T...

Integrated Modular Firmware Solutions: A Vital Component Of Custom Silicon Chiplet Architecture Designs


By Marc Meunier and Srini Narayana

The shift from monolithic SoC designs to chiplet-based architecture isn’t just a packaging innovation. It’s a fundamental rethinking of how custom silicon is designed, manufactured, and deployed. This transition is driven by the growing impracticality of scaling large monolithic dies at advanced nodes. As die sizes increase, so do the costs, yield ri...

How Neural Super Sampling Works: Architecture, Training, And Inference


This blog post is the second in our Neural Super Sampling (NSS) series. The post explores why we introduced NSS and explains its architecture, training, and inference components. In August 2025, we announced Arm neural technology that will ship in Arm GPUs in 2026. The first use case of the technology is Neural Super Sampling (NSS). NSS is a next-generation, AI-powered upscaling solution. ...