In recent times, geospatial datasets are growing in terms of size, complexity and heterogeneity. High performance systems are needed to analyze such data to produce actionable insights in an efficient manner. For polygonal a.k.a vector datasets, operations such as I/O, data partitioning, communication, and load balancing becomes challenging in a cluster environment. In this work, we present MPI-Vector-IO, a parallel I/O library that we have designed using MPI-IO specifically for partitioning and reading irregular vector data formats such as Well Known Text. It makes MPI aware of spatial data, spatial primitives and provides support for spatial data types embedded within collective computation and communication using MPI message-passing library. These abstractions along with parallel I/O support are useful for parallel Geographic Information System (GIS) application development on HPC platforms. Performance evaluation is done on Lustre and GPFS filesystems. MPI-Vector-IO scales well with MPI processes and file size and achieves bandwidth up to 22 GB/s for common spatial data access patterns. We observed that independent file read functions performed better than collective functions in MPI-IO for contiguous access pattern on Lustre. In general, the I/O is improved by one to two orders of magnitude over real-world datasets using up to 1152 CPU cores. Spatial Join query is used as an exemplar to demonstrate an end-to-end application using MPI-Vector-IO.
more »
« less
Optimizing I/O Performance of Parallel Iterated Relational Algebra Applications
Fixed-point iteration over parallel relational algebra (RA) operators underpins a broad class of symbolic AI and recursive analytic workloads, including graph reachability, points-to analysis, and rule-based inference. At scale, each iteration materializes substantial intermediate and output relations across distributed memory, causing persistence I/O to become a dominant contributor to end-to-end cost. This I/O is inherently write-dominated: every iteration emits a delta of newly derived tuples, while the runtime must periodically checkpoint accumulated relation state to stable storage for fault tolerance and out-of-core execution. This paper presents a case study in optimizing checkpoints for parallel iterated relational algebra, comparing an existing relation-dumping pipeline against a structured, performance-tuned alternative. The baseline stores relations as opaque binary dumps via POSIX positioned writes or collective MPIIO with minimal file-system hinting.We extend this pipeline with a parallel HDF5 backend that stores relations as schema-aware datasets and writes rank partitions using HDF5 hyperslabs with collective transfers. To avoid the performance cliffs associated with default parallel HDF5 settings, we apply a systematic size-aware tuning policy that configures both HDF5 property lists and MPI-IO hints. The optimized pathway uses a two-phase workflow that creates the file and dataset first and then performs the parallel write. It also applies output-regime-dependent alignment and buffering, configures metadata caching and sieve buffering, selects an appropriate dataset layout, and enforces collective dataset transfers. Across final write sizes from 15 GB to 30 GB, the tuned HDF5 path consistently outperforms the MPI-IO baselines. At higher MPI process counts, it is often 35%-58% faster while producing self-describing outputs.
more »
« less
- Award ID(s):
- 2316158
- PAR ID:
- 10680836
- Publisher / Repository:
- ACM
- Date Published:
- ISBN:
- 9798400726040
- Page Range / eLocation ID:
- 50 to 58
- Format(s):
- Medium: X
- Location:
- McEwan Hall/The University of Edinburgh Edinburgh Scotland UK
- Sponsoring Org:
- National Science Foundation
More Like this
-
-
Array management libraries, such as HDF5, Zarr, etc., depend on a complex software stack that consists of parallel I/O middleware (MPI-IO), POSIX-IO, and file systems. Components in the stack are interdependent, such that effort in tuning the parameters in these software libraries for optimal performance is non-trivial. On the other hand, it is challenging to choose an array management library based on the array configuration and access patterns. In this poster, we investigate the performance aspect of two array management libraries, i.e., HDF5 and Zarr, in the context of a neuroscience use case. We highlight the performance variability of HDF5 and Zarr in our preliminary results and discuss potential optimization strategies.more » « less
-
Many scientific applications operate on data sets that span hundreds of Gigabytes or even Terabytes in size. Large data sets often use compression to reduce the size of the files. Yet as of today, parallel I/O libraries do not support reading and writing compressed files, necessitating either expensive sequential compression/decompression operations before/after the simulation, or omitting advanced features of parallel I/O libraries, such as collective I/O operations. This paper introduces parallel I/O on compressed data files, discusses the key challenges, requirements, and solutions for supporting compressed data files in MPI I/O, as well as limitations on some MPI I/O operations when using compressed data files. The paper details handling of individual read and write operations of compressed data files, and presents an extension to the two-phase collective I/O algorithm to support data compression. The paper further presents and evaluates an implementation based on the Snappy compression library and the OMPIO parallel I/O framework. The performance evaluation using multiple data sets demonstrate significant performance benefits when using data compression on a parallel BeeGFS file system.more » « less
-
This dataset contains a compressed folder of the data and MATLAB scripts used produce relevant figures and candidates for GRITCLEAN: A glitch veto scheme for Gravitational wave data as presented in https://arxiv.org/abs/2401.15237 The codes in this dataset include: A PSO-based matched filtering search pipeline which can be run on either the positive or the negative chirp time space. A standalone MATLAB script called GRITCLEAN.m which can run the GRITCLEAN hierarchical vetoes on a set of positive and negative chirp time space estimated parameters. A plotting script to generate relevant figures. The files in this dataset include: GVSsegPSDtrainidxs.mat, a binary MATLAB file containing training indices for all segments from which the Power Spectral Densities (PSDs) are estimated, this is done via the scripts provided, namely, getsegPSD.m and createPSD.m. A sample HDF5 file used (H-H1_GWOSC_O3a_4KHZ_R1-1243394048-4096.hdf5) JSON files containing information about the data segments and the strain data files from which they originate from. Text files containing the parameters estimated by the PSO-based pipeline across the positive and negative chirp time space runs. Detailed instructions on dependencies, downloading the dataset and running the codes are given in a README.txt file included with this dataset. The user is recommended to go through this file first. The scripts enclosed have dependencies on JSONLAB , the Parallel Computing Toolbox and Signal Processing Toolbox for MATLAB, along with additional scripts provided in GitHub repositories Accelerated-Network-Analysis and SDMBIGDAT19 . Instructions on installing these dependencies are provided in README.txt. All codes have been developed and tested on MATLAB R2022 and R2023.more » « less
-
Motivated by the significantly higher cost of writing than reading in emerging memory technologies, we consider parallel algorithm design under such asymmetric read-write costs, with the goal of reducing the number of writes while preserving work-efficiency and low span. We present a nested-parallel model of computation that combines (i) small per-task stack-allocated memories with symmetric read-write costs and (ii) an unbounded heap-allocated shared memory with asymmetric read-write costs, and show how the costs in the model map efficiently onto a more concrete machine model under a work-stealing scheduler. We use the new model to design reduced write, work-efficient, low span parallel algorithms for a number of fundamental problems such as reduce, list contraction, tree contraction, breadth-first search, ordered filter, and planar convex hull. For the latter two problems, our algorithms are output-sensitive in that the work and number of writes decrease with the output size. We also present a reduced write, low span minimum spanning tree algorithm that is nearly work-efficient (off by the inverse Ackermann function). Our algorithms reveal several interesting techniques for significantly reducing shared memory writes in parallel algorithms without asymptotically increasing the number of shared memory reads.more » « less
An official website of the United States government

