A programmable near-memory accelerator that rearranges sparse data into a dense representation, significantly reducing data movement and dynamic energy.
Many applications perform irregular, sparse memory accesses that cannot take advantage of the cache hierarchies in high-performance processors. To address this problem, Data Layout Transformation (DLT) techniques rearrange sparse data into a dense representation, improving locality and cache utilization. However, prior proposals in this space fail to provide a design that (i) scales with multi-core systems, (ii) hides rearrangement latency, and (iii) provides the necessary interfaces to ease programmability.
In this work, we present PLANAR, a programmable near-memory accelerator that rearranges sparse data into a dense representation. By placing PLANAR devices at the memory controller level, we enable a design that scales well with multi-core systems, hides operation latency by performing non-blocking, fine-grain data rearrangements, and eases programmability by supporting virtual memory and conventional memory allocation mechanisms. Our evaluation shows that PLANAR leads to significant reductions in data movement and dynamic energy, providing an average 4.58× speedup.
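To make the idea of a data layout transformation concrete, the following is a minimal software sketch of gathering sparsely accessed elements into a dense buffer. It is not PLANAR's hardware design or its programming interface, which the abstract does not describe. The function names, the 64-byte cache line, the 8-byte element size, and the assumption that arrays start on a cache-line boundary are all illustrative.

```python
from typing import List, Sequence

# Illustrative assumptions, not values taken from the paper.
CACHE_LINE_BYTES = 64
ELEM_BYTES = 8


def gather_dense(data: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Copy the elements selected by `indices` into a contiguous (dense) list."""
    return [data[i] for i in indices]


def cache_lines_touched(indices: Sequence[int],
                        elem_bytes: int = ELEM_BYTES,
                        line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Count distinct cache lines covered by the given element indices,
    assuming the array starts at a cache-line-aligned address."""
    return len({(i * elem_bytes) // line_bytes for i in indices})


# A strided access pattern over a larger array.
source = [float(i) for i in range(1024)]
idx = [0, 100, 200, 300, 400, 500, 600, 700]

dense = gather_dense(source, idx)

# Lines touched when reading the original layout vs. the dense copy.
lines_sparse = cache_lines_touched(idx)
lines_dense = cache_lines_touched(range(len(dense)))
```

Under these assumptions, each of the eight strided elements falls on a different cache line in the original layout, while the dense copy fits in a single line. In this software version the gather itself still pays for the sparse accesses on the CPU; the abstract's argument is that doing the rearrangement near memory avoids moving the unused portion of each line through the cache hierarchy.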
From: ICS ’21: Proceedings of the ACM International Conference on Supercomputing, June 2021, 506 pages
ISBN: 9781450383356
DOI: 10.1145/3447818
Contributors
Adrián Barredo, Barcelona Supercomputing Center & Universitat Politècnica de Catalunya
Adrià Armejach, Barcelona Supercomputing Center & Universitat Politècnica de Catalunya
Jonathan Beard, Arm Research
Miquel Moretó, Barcelona Supercomputing Center & Universitat Politècnica de Catalunya