Workload Trace Generation

The intended audience for this document is a performance engineer preparing traces for performance prediction on new Arm hardware. It is expected that the reader understands the basic concepts of performance engineering such as: • Workload characterization • Identifying key aspects of a workload • Understanding how Performance Monitor Unit (PMU) events correlate to a workload Click her... » read more

ETH Zurich: PIM (Processing In Memory) Architecture, UPMEM & PrIM Benchmarks

New paper technical titled "Benchmarking a New Paradigm: An Experimental Analysis of a Real Processing-in-Memory Architecture" led by researchers at ETH Zurich. Researchers provide a comprehensive analysis of the first publicly-available real-world PIM architecture, UPMEM, and introduce PrIM (Processing-In-Memory benchmarks), a benchmark suite of 16 workloads from different application domai... » read more

Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-in-Memory Hardware

Abstract "Many modern workloads such as neural network inference and graph processing are fundamentally memory-bound. For such workloads, data movement between memory and CPU cores imposes a significant overhead in terms of both latency and energy. A major reason is that this communication happens through a narrow bus with high latency and limited bandwidth, and the low data reuse in memory-bo... » read more

SparseP: Towards Efficient Sparse Matrix Vector Multiplication on Real Processing-In-Memory Systems

Abstract "Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures. Near-bank PIM architectures place simple cores close to DRAM banks and can yield significant performance and energy improvements in parallel applications by alleviating data access costs. Real PIM systems can provide high levels of parallelism, large aggregate memory bandwi... » read more