skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Scalable Overlay Operations over DCEL Polygon Layers
ABSTRACT The Doubly Connected Edge List (DCEL) is an edge-list structure that has been widely utilized in spatial applications for planar topological computations. An important operation is the overlay which combines the DCELs of two input layers and can easily support spatial queries like the intersection, union and difference between these layers. However, existing sequential implementations for computing the overlay do not scale and fail to complete for large datasets (for example the US census tracks). In this paper we propose a distributed and scalable way to compute the overlay operation and its related supported queries. We address the issues involved in efficiently distributing the overlay operator and over various optimizations that improve performance. Our scalable solution can compute the overlay of very large real datasets (32M edges) in few minutes.  more » « less
Award ID(s):
1831615 1838222 2237348
PAR ID:
10476568
Author(s) / Creator(s):
; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
International Symposium on Spatial and Temporal Data
ISBN:
9798400708992
Page Range / eLocation ID:
85 to 95
Subject(s) / Keyword(s):
Spatial data structures overlay operator DCEL
Format(s):
Medium: X
Location:
Calgary AB Canada
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract The Doubly Connected Edge List (DCEL) is an edge-list structure widely used in spatial applications, primarily for planar topological and geometric computations. However, it is also applicable to various types of data, including 3D models and geographic data. An essential operation is theoverlay operation, which combines the DCELs of two input polygon layers and can easily support spatial queries on polygons like the intersection, union, and difference between these layers. However, existing techniques for spatial overlay operations suffer from two main limitations. First, they fail to handle many large datasets practically used in real applications. Second, they cannot handle arbitrary spatial lines that practically form polygons, e.g., city blocks, but they are given as a set of scattered lines. This work proposes a distributed and scalable way to compute the overlay operation and its related supported queries. Our operations also support arbitrary spatial lines through a scalable polygonization process. We address the issues of efficiently distributing the lines and overlay operators and offer various optimizations that improve performance. Our experiments demonstrate that the proposed scalable solution can efficiently compute the overlay of large real datasets. 
    more » « less
  2. This paper demonstrates Pynapple-G, an open-source library for scalable spatial grouping queries based on Apache Sedona (formerly known as GeoSpark). We demonstrate two modules, namely, SGPAC and DDCEL, that support grouping points, grouping lines, and polygon overlays. The SGPAC module provides a large-scale grouping of spatial points by highly complex polygon boundaries. The grouping results aggregate the number of spatial points within the boundaries of each polygon. The DDCEL module provides the first parallelized algorithm to group spatial lines into a DCEL data structure and discovers planar polygons from scattered line segments. Exploiting the scalable DCEL, we support scalable overlay operations over multiple polygon layers to compute the layers’ intersection, union, or difference. To showcase Pyneapple-G, we have developed a frontend web application that enables users to interact with these modules, select their data layers or data points, and view results on an interactive map. We also provide interactive notebooks demonstrating the superiority and simplicity of Pyneapple-G to help social scientists and developers explore its full potential. 
    more » « less
  3. This paper demonstratesPynapple-G, an open-source library for scalable spatial grouping queries based on Apache Sedona (formerly known as GeoSpark). We demonstrate two modules, namely,SGPACandDDCEL, that support grouping points, grouping lines, and polygon overlays. TheSGPACmodule provides a large-scale grouping of spatial points by highly complex polygon boundaries. The grouping results aggregate the number of spatial points within the boundaries of each polygon. TheDDCELmodule provides the first parallelized algorithm to group spatial lines into a DCEL data structure and discovers planar polygons from scattered line segments. Exploiting the scalable DCEL, we support scalable overlay operations over multiple polygon layers to compute the layers' intersection, union, or difference. To showcasePyneapple-G, we have developed a frontend web application that enables users to interact with these modules, select their data layers or data points, and view results on an interactive map. We also provide interactive notebooks demonstrating the superiority and simplicity ofPyneapple-Gto help social scientists and developers explore its full potential. 
    more » « less
  4. null (Ed.)
    Many Internet of Things (IoT) applications are time-critical and dynamically changing. However, traditional data processing systems (e.g., stream processing systems, cloud-based IoT data processing systems, wide-area data analytics systems) are not well-suited for these IoT applications. These systems often do not scale well with a large number of concurrently running IoT applications, do not support low-latency processing under limited computing resources, and do not adapt to the level of heterogeneity and dynamicity commonly present at edge environments. This suggests a need for a new edge stream processing system that advances the stream processing paradigm to achieve efficiency and flexibility under the constraints presented by edge computing architectures. We present \textsc{Dart}, a scalable and adaptive edge stream processing engine that enables fast processing of a large number of concurrent running IoT applications’ queries in dynamic edge environments. The novelty of our work is the introduction of a dynamic dataflow abstraction by leveraging distributed hash table (DHT) based peer-to-peer (P2P) overlay networks, which can automatically place, chain, and scale stream operators to reduce query latency, adapt to edge dynamics, and recover from failures. We show analytically and empirically that DART outperforms Storm and EdgeWise on query latency and significantly improves scalability and adaptability when processing a large number of real-world IoT stream applications' queries. DART significantly reduces application deployment setup times, becoming the first streaming engine to support DevOps for IoT applications on edge platforms. 
    more » « less
  5. Due to the developments of topographic techniques, clear satellite imagery, and various means for collecting information, geospatial datasets are growing in volume, complexity, and heterogeneity. For efficient execution of spatial computations and analytics on large spatial data sets, parallel processing is required. To exploit fine-grained parallel processing in large scale compute clusters, partitioning in a load-balanced way is necessary for skewed datasets. In this work, we focus on spatial join operation where the inputs are two layers of geospatial data. Our partitioning method for spatial join uses Adaptive Partitioning (ADP) technique, which is based on Quadtree partitioning. Unlike existing partitioning techniques, ADP partitions the spatial join workload instead of partitioning the individual datasets separately to provide better load-balancing. Based on our experimental evaluation, ADP partitions spatial data in a more balanced way than Quadtree partitioning and Uniform grid partitioning. ADP uses an output-sensitive duplication avoidance technique which minimizes duplication of geometries that are not part of spatial join output. In a distributed memory environment, this technique can reduce data communication and storage requirements compared to traditional methods. To improve the performance of ADP, an MPI+Threads based parallelization is presented. With ParADP, a pair of real world datasets, one with 717 million polylines and another with 10 million polygons, is partitioned into 65,536 grid cells within 7 seconds. ParADP performs well with both good weak scaling up to 4,032 CPU cores and good strong scaling up to 4,032 CPU cores. 
    more » « less