RDPro: Distributed Processing of Big Raster Data: [Scalable Data Science]

Shang, Zhuocheng; Singla, Samriddhi; Eldawy, Ahmed; Scudiero, Elia

doi:10.14778/3712221.3712229

Advances in remote sensing technology have made it possible to collect vast amounts of satellite and aerial imagery with pixel resolutions as fine as 1 cm. This imagery is stored in raster format, which is crucial for many research fields. However, processing this data poses challenges, including resolving data dependencies when the location, resolution, and coordinate systems of datasets do not align, and managing large datasets within memory constraints. This paper introduces RDPro, a novel Spark-based system that efficiently processes and analyzes large raster datasets. RDPro features a new data model tailored for data dependencies in a distributed, shared-nothing environment, complete with tools for loading and writing raster data. It also optimizes core raster operations within Spark, allowing users to integrate them into complex data science workflows. Comparative analysis shows that RDPro outperforms existing systems by up to two orders of magnitude.
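The abstract names two recurring problems: rasters whose grids do not align, and rasters too large for one machine's memory. The sketch below illustrates the generic ideas behind both in plain Python; it is not RDPro's data model or API. It assumes north-up rasters described by an affine geotransform (no rotation) that already share one coordinate reference system, uses nearest-neighbor resampling, and splits rasters into fixed-size tiles. Reprojection between different coordinate systems is not covered. The names `GeoTransform`, `regrid_nearest`, and `split_into_tiles` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTransform:
    """Hypothetical affine pixel-to-world mapping for a north-up raster.

    x = x0 + col * px_w, y = y0 + row * px_h (px_h < 0 for north-up rasters).
    Assumes no rotation and a coordinate system shared by all rasters.
    """
    x0: float
    y0: float
    px_w: float
    px_h: float

    def pixel_center_to_world(self, col, row):
        # World coordinate of the center of pixel (col, row)
        return (self.x0 + (col + 0.5) * self.px_w,
                self.y0 + (row + 0.5) * self.px_h)

    def world_to_pixel(self, x, y):
        # Index of the pixel that contains world point (x, y)
        return (math.floor((x - self.x0) / self.px_w),
                math.floor((y - self.y0) / self.px_h))


def regrid_nearest(src, src_gt, dst_gt, dst_width, dst_height, nodata=None):
    """Resample `src` (a list of rows) onto the grid of `dst_gt`.

    Each target pixel takes the value of the source pixel containing its
    center (nearest neighbor); pixels outside the source get `nodata`.
    """
    src_h = len(src)
    src_w = len(src[0]) if src_h else 0
    out = []
    for r in range(dst_height):
        row = []
        for c in range(dst_width):
            x, y = dst_gt.pixel_center_to_world(c, r)
            sc, sr = src_gt.world_to_pixel(x, y)
            inside = 0 <= sc < src_w and 0 <= sr < src_h
            row.append(src[sr][sc] if inside else nodata)
        out.append(row)
    return out


def split_into_tiles(raster, tile_w, tile_h):
    """Split a raster into tiles keyed by (tile_col, tile_row).

    Edge tiles may be smaller than tile_w x tile_h. Each tile can be
    processed independently, so no single task needs the whole raster.
    """
    height = len(raster)
    width = len(raster[0]) if height else 0
    tiles = {}
    for tr, r0 in enumerate(range(0, height, tile_h)):
        for tc, c0 in enumerate(range(0, width, tile_w)):
            tiles[(tc, tr)] = [row[c0:c0 + tile_w]
                               for row in raster[r0:r0 + tile_h]]
    return tiles


if __name__ == "__main__":
    # A 4x4 raster at 1-unit resolution, regridded to a 2x2 grid at 2 units
    src = [[r * 4 + c for c in range(4)] for r in range(4)]
    fine = GeoTransform(0.0, 4.0, 1.0, -1.0)
    coarse = GeoTransform(0.0, 4.0, 2.0, -2.0)
    print(regrid_nearest(src, fine, coarse, 2, 2))  # [[5, 7], [13, 15]]
    print(sorted(split_into_tiles(src, 3, 3)))
```

In a Spark setting, one natural choice is to make each tile a record keyed by its tile ID, so an operation reads only the tiles that overlap its target region; this is an assumption for illustration, not a description of how RDPro partitions data.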

Award ID(s): 2046236

PAR ID: 10611919

Author(s) / Creator(s): Shang, Zhuocheng; Singla, Samriddhi; Eldawy, Ahmed; Scudiero, Elia

Publisher / Repository: ACM Digital Library

Date Published: 2024-11-01

Journal Name: Proceedings of the VLDB Endowment

Volume: 18

Issue: 3

ISSN: 2150-8097

Page Range / eLocation ID: 613 to 622


Sponsoring Org: National Science Foundation

Full text (accepted manuscript, journal article): https://doi.org/10.14778/3712221.3712229
