The ongoing trend of hardware specialization has led to a growing use of custom data formats when processing sparse workloads, which are typically memory-bound. These formats facilitate optimized software/hardware implementations by utilizing sparsity pattern- or target-aware data structures and layouts to enhance memory access latency and bandwidth utilization. However, existing sparse tensor programming models and compilers offer little or no support for productively customizing the sparse formats. Additionally, because these frameworks represent formats using a limited set of per-dimension attributes, they lack the flexibility to accommodate numerous new variations of custom sparse data structures and layouts. To overcome this deficiency, we propose UniSparse, an intermediate language that provides a unified abstraction for representing and customizing sparse formats. Unlike the existing attribute-based frameworks, UniSparse decouples the logical representation of the sparse tensor (i.e., the data structure) from its low-level memory layout, enabling the customization of both. As a result, a rich set of format customizations can be succinctly expressed in a small set of well-defined query, mutation, and layout primitives. We also develop a compiler leveraging the MLIR infrastructure, which supports adaptive customization of formats, and automatic code generation of format conversion and compute operations for heterogeneous architectures. We demonstrate the efficacy of our approach through experiments running commonly-used sparse linear algebra operations with specialized formats on multiple different hardware targets, including an Intel CPU, an NVIDIA GPU, an AMD Xilinx FPGA, and a simulated processing-in-memory (PIM) device.
more »
« less
Optimizing Layout of Recursive Datatypes with Marmoset: Or, Algorithms {+} Data Layouts {=} Efficient Programs
While programmers know that memory representation of data structures can have significant effects on performance, compiler support to optimize the layout of those structures is an under-explored field. Prior work has optimized the layout of individual, non-recursive structures without considering how collections of those objects in linked or recursive data structures are laid out. This work introduces Marmoset, a compiler that optimizes the layouts of algebraic datatypes, with a special focus on producing highly optimized, packed data layouts where recursive structures can be traversed with minimal pointer chasing. Marmoset performs an analysis of how a recursive ADT is used across functions to choose a global layout that promotes simple, strided access for that ADT in memory. It does so by building and solving a constraint system to minimize an abstract cost model, yielding a predicted efficient layout for the ADT. Marmoset then builds on top of Gibbon, a prior compiler for packed, mostly-serial representations, to synthesize optimized ADTs. We show experimentally that Marmoset is able to choose optimal layouts across a series of microbenchmarks and case studies, outperforming both Gibbon’s baseline approach, as well as MLton, a Standard ML compiler that uses traditional pointer-heavy representations.
more »
« less
- PAR ID:
- 10577817
- Editor(s):
- Aldrich, Jonathan; Salvaneschi, Guido
- Publisher / Repository:
- Schloss Dagstuhl – Leibniz-Zentrum für Informatik
- Date Published:
- Volume:
- 313
- ISSN:
- 1868-8969
- ISBN:
- 978-3-95977-341-6
- Page Range / eLocation ID:
- 313-313
- Subject(s) / Keyword(s):
- Tree traversals Compilers Data layout optimization Dense data layout Software and its engineering → Compilers Software and its engineering → Software performance Information systems → Data layout
- Format(s):
- Medium: X Size: 28 pages; 2405695 bytes Other: application/pdf
- Size(s):
- 28 pages 2405695 bytes
- Right(s):
- Creative Commons Attribution 4.0 International license; info:eu-repo/semantics/openAccess
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
null (Ed.)In modern runtime systems, memory layout calculations are hand-coded in systems languages. Primitives in these languages are not powerful enough to describe a rich set of layouts, leading to reliance on ad-hoc macros, numerous interrelated static constants, and other boilerplate code. Memory management policies must also carefully orchestrate their application of address calculations in order to modify memory cooperatively, a task ill-suited to low-level systems languages at hand which lack proper safety mechanisms. In this paper we introduce Floorplan, a declarative language for specifying high level memory layouts. Constraints formerly implemented by describing how to compute locations are, in Floorplan, defined declaratively using explicit layout constructs. The challenge here was to discover constructs capable of sufficiently enabling the automatic generation of address calculations. Floorplan is implemented as a compiler for generating a Rust library. In a case study of an existing implementation of the immix garbage collection algorithm, Floorplan eliminates 55 out of the 63 unsafe lines of code: 100% of unsafe lines pertaining to memory safety.more » « less
-
The inspector/executor paradigm permits using runtime information in concert with compiler optimization. An inspector collects information that is only available at runtime; this information is used by an optimized executor that was created at compile time. Inspectors are widely used in optimizing irregular computations, where information about data dependences, loop bounds, data structures, and memory access pa erns are collected at runtime and used to guide code transformation, parallelization, and data layout. Most research that uses inspectors relies on instantiating inspector templates, invoking inspector library code, or manually writing inspectors. is paper describes abstractions for generating inspectors for loop and data transformations for sparse matrix computations using the Sparse Polyhedral Framework (SPF). SPF is an extension of the polyhedral framework for transformation and code generation. SPF extends the polyhedral framework to represent runtime information with uninterpreted functions and inspector computations that explicitly realize such functions at runtime. It has previously been used to derive inspectors for data and iteration space reordering. is paper introduces data transformations into SPF, such as conversions between sparse matrix formats, and show how prior work can be supported by SPF. We also discuss possible extensions to support inspector composition and incorporate other optimizations. is work represents a step towards creating composable inspectors in keeping with the composability of a ne transformations on the executors.more » « less
-
Data-intensive analytical applications need to support both efficient reads and writes. However, what is usually a good data layout for an update-heavy workload, is not well-suited for a read-mostly one and vice versa. Modern analytical data systems rely on columnar layouts and employ delta stores to inject new data and updates. We show that for hybrid workloads we can achieve close to one order of magnitude better performance by tailoring the column layout design to the data and query workload. Our approach navigates the possible design space of the physical layout: it organizes each column’s data by determining the number of partitions, their corresponding sizes and ranges, and the amount of buffer space and how it is allocated. We frame these design decisions as an optimization problem that, given workload knowledge and performance requirements, provides an optimal physical layout for the workload at hand. To evaluate this work, we build an in-memory storage engine, Casper, and we show that it outperforms state-of-the-art data layouts of analytical systems for hybrid workloads. Casper delivers up to 2.32x higher throughput for update-intensive workloads and up to 2.14x higher throughput for hybrid workloads. We further show how to make data layout decisions robust to workload variation by carefully selecting the input of the optimization.more » « less
-
Young children tend to prioritize objects over layouts in their drawings, often juxtaposing “floating” objects in the picture plane instead of grounding those objects in drawn representations of the extended layout. In the present study, we explore whether implicitly directing children’s attention to elements of the extended layout through a drawing’s communicative goal—to indicate the location of a hidden target to someone else—might lead children to draw more layout information. By comparing children’s drawings to a different group of children’s verbal descriptions, moreover, we explore how communicative medium affects children’s inclusion of layout and object information. If attention modulates children’s symbolic communication about layouts and objects, then children should both draw and talk about layouts and objects when they are relevant to the communicative task. If there are challenges or advantages specific to either medium, then children might treat layouts and objects differently when drawing versus describing them. We find evidence for both of these possibilities: Attention affects what children include in symbolic communication, like drawings and language, but children are more concise in their inclusion of relevant layout or object information in language versus drawings.more » « less