skip to main content


Search for: All records

Award ID contains: 2237348

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. The widespread of geotagged data combined with modern map services allows for the accurate attachment of data to spatial networks. Applying statistical analysis, such as hotspot detection, over spatial networks is very important for precise quantification and patterns analysis, which empowers effective decision-making in various important applications. Existing hotspot detection algorithms on spatial networks either lack statistical evidence on detected hotspots, such as clustering, or they provide statistical evidence at a prohibitive computational overhead. In this paper, we propose efficient algorithms for detecting hotspots based on the network local K-function for predefined and unknown hotspot radii. The network local K-function is a widely adopted statistical approach for network pattern analysis that enables the understanding of the density and distribution of activities and events in the spatial network. However, its practical application has been limited due to the inefficiency of existing algorithms, particularly for large-sized networks. Extensive experimental evaluation using real and synthetic datasets shows that our algorithms are up to 28 times faster than the state-of-the-art algorithms in computing hotspots with a predefined radius and up to more than four orders of magnitude faster in identifying hotspots without a predefined radius. 
    more » « less
    Free, publicly-accessible full text available November 13, 2024
  2. The process of regionalization involves clustering a set of spatial areas into spatially contiguous regions. Given the NP-hard nature of regionalization problems, all existing algorithms yield approximate solutions. To ascertain the quality of these approximations, it is crucial for domain experts to obtain statistically significant evidence on optimizing the objective function, in comparison to a random reference distribution derived from all potential sample solutions. In this paper, we propose a novel spatial regionalization problem, denoted as SISR (Statistical Inference for Spatial Regionalization), which generates random sample solutions with a predetermined region cardinality. The driving motivation behind SISR is to conduct statistical inference on any given regionalization scheme. To address SISR, we present a parallel technique named PRRP (P-Regionalization through Recursive Partitioning). PRRP operates over three phases: the region-growing phase constructs initial regions with a predetermined region cardinality, while the region merging and region-splitting phases ensure the spatial contiguity of unassigned areas, allowing for the growth of subsequent regions with predetermined cardinalities. An extensive evaluation shows the effectiveness of PRRP using various real datasets. 
    more » « less
    Free, publicly-accessible full text available November 13, 2024
  3. This paper studies the spatial group-by query over complex polygons. Given a set of spatial points and a set of polygons, the spatial group-by query returns the number of points that lie within the boundaries of each polygon. Groups are selected from a set of non-overlapping complex polygons, typically in the order of thousands, while the input is a large-scale dataset that contains hundreds of millions or even billions of spatial points. This problem is challenging because real polygons (like counties, cities, postal codes, voting regions, etc.) are described by very complex boundaries. We propose a highly-parallelized query processing framework to efficiently compute the spatial group-by query on highly skewed spatial data. We also propose an effective query optimizer that adaptively assigns the appropriate processing scheme based on the query polygons. Our experimental evaluation with real data and queries has shown significant superiority over all existing techniques. 
    more » « less
    Free, publicly-accessible full text available October 1, 2024
  4. Regionalization techniques group spatial areas into a set of homogeneous regions to analyze and draw conclusions about spatial phenomena. A recent regionalization problem, called MP-regions, groups spatial areas to produce a maximum number of regions by enforcing a user-defined constraint at the regional level. The MP-regions problem is NP-hard. Existing approximate algorithms for MP-regions do not scale for large datasets due to their high computational cost and inherently centralized approaches to process data. This article introduces a parallel scalable regionalization framework (PAGE) to support MP-regions on large datasets. The proposed framework works in two stages. The first stage finds an initial solution through randomized search, and the second stage improves this solution through efficient heuristic search. To build an initial solution efficiently, we extend traditional spatial partitioning techniques to enable parallelized region building without violating the spatial constraints. Furthermore, we optimize the region building efficiency and quality by tuning the randomized area selection to trade off runtime with region homogeneity. The experimental evaluation shows the superiority of our framework to support an order of magnitude larger datasets efficiently compared to the state-of-the-art techniques while producing high-quality solutions.

     
    more » « less
    Free, publicly-accessible full text available September 30, 2024
  5. ABSTRACT The Doubly Connected Edge List (DCEL) is an edge-list structure that has been widely utilized in spatial applications for planar topological computations. An important operation is the overlay which combines the DCELs of two input layers and can easily support spatial queries like the intersection, union and difference between these layers. However, existing sequential implementations for computing the overlay do not scale and fail to complete for large datasets (for example the US census tracks). In this paper we propose a distributed and scalable way to compute the overlay operation and its related supported queries. We address the issues involved in efficiently distributing the overlay operator and over various optimizations that improve performance. Our scalable solution can compute the overlay of very large real datasets (32M edges) in few minutes. 
    more » « less
    Free, publicly-accessible full text available August 23, 2024
  6. Spatial regionalization is the process of combining a collection of spatial polygons into contiguous regions that satisfy user-defined criteria and objectives. Numerous techniques for spatial regionalization have been proposed in the literature, which employs varying methods for region growing, seeding, optimization, and enforce different user-defined constraints and objectives. This paper introduces a scalable unified system for addressing seeding spatial regionalization queries efficiently. The proposed system provides a usable and scalable framework that employs a wide-range of existing spatial regionalization techniques and allows users to submit novel combinations of queries that have not been previously explored. This represents a significant step forward in the field of spatial regionalization as it provides a robust platform for addressing different regionalization queries. The system is mainly composed of three components: query parser, query planner, and query executor. Preliminary evaluations of the system demonstrate its efficacy in efficiently addressing various regionalization queries. 
    more » « less
    Free, publicly-accessible full text available August 23, 2024
  7. Abstract—The Doubly Connected Edge List (DCEL) is a popular data structure for representing planar subdivisions and is used to accelerate spatial applications like map overlay, graph simplification, and subdivision traversal. Current DCEL imple- mentations assume a standalone machine environment, which does not scale when processing the large dataset sizes that abound in today’s spatial applications. This paper proposes a Distributed Doubly Connected Edge List (DDCEL) data structure extending the DCEL to a distributed environment. The DDCEL constructor undergoes a two-phase paradigm to generate the subdivision’s vertices, half-edges, and faces. After spatially partitioning the input data, the first phase runs the sequential DCEL construction algorithm on each data partition in parallel. The second phase then iteratively merges information from multiple data parti- tions to generate the shared data structure. Our experimental evaluation with real data of road networks of up to 563 million line segments shows significant performance advantages of the proposed approach over the existing techniques. 
    more » « less
  8. This paper introduces a generalized spatial regionalization problem, namely, PRUC ( P -Regions with User-defined Constraint) that partitions spatial areas into homogeneous regions. PRUC accounts for user-defined constraints imposed over aggregate region properties. We show that PRUC is an NP-Hard problem. To solve PRUC, we introduce GSLO (Global Search with Local Optimization), a parallel stochastic regionalization algorithm. GSLO is composed of two phases: (1) Global Search that initially partitions areas into regions that satisfy a user-defined constraint, and (2) Local Optimization that further improves the quality of the partitioning with respect to intra-region similarity. We conduct an extensive experimental study using real datasets to evaluate the performance of GSLO. Experimental results show that GSLO is up to 100× faster than the state-of-the-art algorithms. GSLO provides partitioning that is up to 6× better with respect to intra-region similarity. Furthermore, GSLO is able to handle 4× larger datasets than the state-of-the-art algorithms. 
    more » « less