Sharing data efficiently between CPU cores, accelerators, and other components.
By Shivi Arora and Sue Hung Fung
As computing demands for HPC, AI/ML, and cloud infrastructure grow, modular architectures are replacing traditional monolithic System-on-Chip (SoC) designs. These legacy designs are increasingly expensive and difficult to scale due to ever-increasing silicon complexity.
In response, the industry is embracing chiplet-based System-in-Package (SiP) solutions, which reduce production costs, enhance yield, and enable flexible system integration. But multi-chiplet architectures need a standard coherent interconnect from the processor (the compute element) to the accelerator, and full cache coherency is required between CPU cores, accelerators, and other components to share data efficiently without redundant memory copies.
Data centers and cloud computing need multi-core and multi-socket scalability. And while the basic building blocks — chiplets, accelerators, and CPUs — are available, systems also require a high-bandwidth link between host and accelerator to enable seamless data transfer.
While historically many of these connections were proprietary, there is a big push underway to create a common platform for how chiplets communicate with each other and with other components such as memories and I/Os to connect to other systems and the outside world. These efforts include everything from common fabrics to mini-consortia targeting specific domains.
The goal in all cases is faster time to market with predictable outcomes, while still maximizing performance and reducing power. That requires a standardized infrastructure. Arm’s Chiplet System Architecture (CSA), for example, delineates chiplet types and definitions, allowing two or more chiplets to be integrated into a package using the company’s Neoverse Compute Subsystems (CSS) platform.
This is the kind of approach that will be needed to create a chiplet marketplace, and Arm has a long history of extensively characterizing its IP cores and other components. By leveraging that infrastructure, multiple Arm compute chiplets can be integrated in a package, allowing modular expansion of core counts on a larger compute platform.
The final piece of the puzzle is Arm’s AMBA CHI C2C, a cache coherency standard that facilitates coherent communication between an Arm Neoverse compute die and an accelerator, ensuring high bandwidth and optimized data movement. AMBA CHI C2C supports efficient link aggregation over PCIe or UCIe transport layers to achieve this high performance, while integration with PCIe or CXL infrastructure can be used for control-path communication. The protocol enables optimized data flow with load balancing across multiple links, relaxes ordering rules for improved efficiency, and extends system capabilities to accelerators, including MPAM (Memory System Resource Partitioning and Monitoring).
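The load-balancing idea behind link aggregation can be illustrated with a minimal sketch. This is not code from any Arm or AMBA deliverable; the `Link` and `AggregatedLink` names and the least-occupancy policy are hypothetical, chosen only to show how traffic can be spread across several physical links so no single link becomes a bottleneck.

```python
# Illustrative sketch only: models the concept of load balancing packets
# across aggregated links. Names and policy are hypothetical, not from the
# AMBA CHI C2C specification.
from dataclasses import dataclass


@dataclass
class Link:
    name: str
    queued_flits: int = 0  # flits currently awaiting transmission


class AggregatedLink:
    """Distributes packets across several links, picking the least-loaded one."""

    def __init__(self, links):
        self.links = links

    def send(self, flits: int) -> str:
        # Least-occupancy selection keeps per-link queues balanced,
        # which is the goal of load balancing across aggregated links.
        target = min(self.links, key=lambda link: link.queued_flits)
        target.queued_flits += flits
        return target.name


links = [Link("link0"), Link("link1")]
agg = AggregatedLink(links)
order = [agg.send(4), agg.send(2), agg.send(1)]  # → ['link0', 'link1', 'link1']
```

A real implementation would balance at the link layer in hardware and respect the protocol's ordering rules; the sketch simply shows why aggregating links with occupancy-aware steering raises usable bandwidth.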
Alphawave Semi leverages all of these standards for its compute-based chiplet architecture to achieve low-latency, high-bandwidth communication to the accelerator. By implementing CSA, Alphawave Semi can map different system components based on interface types, ensuring that all chiplets communicate efficiently.
Alphawave Semi supports both standardized and proprietary interfaces on its SiP architectures, including the widely used UCIe and PCIe standards as well as SerDes. Chiplet expansion with AMBA CHI C2C allows efficient interconnects between the compute and accelerator components. Because AMBA CHI C2C rides on these transport mechanisms, its protocol packetization and data link layers remain efficient. The Alphawave Semi compute chiplet can then operate cohesively with the accelerator while achieving high-performance scalability.
Using an Arm-approved system architecture ensures reliable communication from host to accelerator. System developers can continue to use familiar existing software frameworks such as Arm’s system software, reducing the need for extensive rewrites and ensuring minimal disruption when adapting software to a new hardware architecture. Software teams will also be able to use operating systems and software frameworks optimized for the Arm architecture, which can run with minimal modification. This creates an ideal foundation for next-generation computing architectures in AI, cloud, and HPC environments.
Sue Hung Fung is a principal product marketing manager at Alphawave Semiconductor.