skip to main content
US FlagAn official website of the United States government
dot gov icon
Official websites use .gov
A .gov website belongs to an official government organization in the United States.
https lock icon
Secure .gov websites use HTTPS
A lock ( lock ) or https:// means you've safely connected to the .gov website. Share sensitive information only on official, secure websites.


Title: Statistical Inference for Spatial Regionalization
The process of regionalization involves clustering a set of spatial areas into spatially contiguous regions. Given the NP-hard nature of regionalization problems, all existing algorithms yield approximate solutions. To ascertain the quality of these approximations, it is crucial for domain experts to obtain statistically significant evidence on optimizing the objective function, in comparison to a random reference distribution derived from all potential sample solutions. In this paper, we propose a novel spatial regionalization problem, denoted as SISR (Statistical Inference for Spatial Regionalization), which generates random sample solutions with a predetermined region cardinality. The driving motivation behind SISR is to conduct statistical inference on any given regionalization scheme. To address SISR, we present a parallel technique named PRRP (P-Regionalization through Recursive Partitioning). PRRP operates over three phases: the region-growing phase constructs initial regions with a predetermined region cardinality, while the region merging and region-splitting phases ensure the spatial contiguity of unassigned areas, allowing for the growth of subsequent regions with predetermined cardinalities. An extensive evaluation shows the effectiveness of PRRP using various real datasets.  more » « less
Award ID(s):
2237348
PAR ID:
10494683
Author(s) / Creator(s):
; ;
Publisher / Repository:
ACM
Date Published:
Journal Name:
The 31st ACM International Conference on Advances in Geographic Information Systems (ACM SIGSPATIAL '23)
ISBN:
9798400701689
Page Range / eLocation ID:
1 to 12
Subject(s) / Keyword(s):
Statistical Inference, Regionalization, Spatial Clustering
Format(s):
Medium: X
Location:
Hamburg, Germany
Sponsoring Org:
National Science Foundation
More Like this
  1. Regionalization techniques group spatial areas into a set of homogeneous regions to analyze and draw conclusions about spatial phenomena. A recent regionalization problem, called MP-regions, groups spatial areas to produce a maximum number of regions by enforcing a user-defined constraint at the regional level. The MP-regions problem is NP-hard. Existing approximate algorithms for MP-regions do not scale for large datasets due to their high computational cost and inherently centralized approaches to process data. This article introduces a parallel scalable regionalization framework (PAGE) to support MP-regions on large datasets. The proposed framework works in two stages. The first stage finds an initial solution through randomized search, and the second stage improves this solution through efficient heuristic search. To build an initial solution efficiently, we extend traditional spatial partitioning techniques to enable parallelized region building without violating the spatial constraints. Furthermore, we optimize the region building efficiency and quality by tuning the randomized area selection to trade off runtime with region homogeneity. The experimental evaluation shows the superiority of our framework to support an order of magnitude larger datasets efficiently compared to the state-of-the-art techniques while producing high-quality solutions. 
    more » « less
  2. This paper introduces a generalized spatial regionalization problem, namely, PRUC ( P -Regions with User-defined Constraint) that partitions spatial areas into homogeneous regions. PRUC accounts for user-defined constraints imposed over aggregate region properties. We show that PRUC is an NP-Hard problem. To solve PRUC, we introduce GSLO (Global Search with Local Optimization), a parallel stochastic regionalization algorithm. GSLO is composed of two phases: (1) Global Search that initially partitions areas into regions that satisfy a user-defined constraint, and (2) Local Optimization that further improves the quality of the partitioning with respect to intra-region similarity. We conduct an extensive experimental study using real datasets to evaluate the performance of GSLO. Experimental results show that GSLO is up to 100× faster than the state-of-the-art algorithms. GSLO provides partitioning that is up to 6× better with respect to intra-region similarity. Furthermore, GSLO is able to handle 4× larger datasets than the state-of-the-art algorithms. 
    more » « less
  3. Spatial data regularly suffer from error and uncertainty, ranging from poorly georeferenced coordinate pairs to sampling error associated with American Community Survey data. Geographic information systems can amplify and propagate error and uncertainty through the abstraction and representation of spatial data, as can the manipulation, processing, and analysis of spatial data using exploratory and confirmatory statistical techniques. The purpose of this article is to explore and address uncertainty in regionalization, a fundamental spatial analytical method that aggregates spatial units (e.g., tracts) into a set of contiguous regions for strategic purposes, including school districting, habitat areas, and the like. Specifically, we develop a new regionalization method, theuncertain‐max‐p‐regionsproblem that explicitly incorporates attribute uncertainty and allows its impacts to be evaluated with a degree of statistical certainty. We also detail an efficient solution approach for dealing the problem. The results suggest that the developed problem can out‐perform existing regionalization approaches and that the addition of a measure of statistical confidence can help to facilitate more clarity in planning and policy decisions. 
    more » « less
  4. Spatial regionalization is the process of combining a collection of spatial polygons into contiguous regions that satisfy user-defined criteria and objectives. Numerous techniques for spatial regionalization have been proposed in the literature, which employs varying methods for region growing, seeding, optimization, and enforce different user-defined constraints and objectives. This paper introduces a scalable unified system for addressing seeding spatial regionalization queries efficiently. The proposed system provides a usable and scalable framework that employs a wide-range of existing spatial regionalization techniques and allows users to submit novel combinations of queries that have not been previously explored. This represents a significant step forward in the field of spatial regionalization as it provides a robust platform for addressing different regionalization queries. The system is mainly composed of three components: query parser, query planner, and query executor. Preliminary evaluations of the system demonstrate its efficacy in efficiently addressing various regionalization queries. 
    more » « less
  5. Abstract The max‐p‐compact‐regions problem involves the aggregation of a set of small areas into an unknown maximum number (p) of compact, homogeneous, and spatially contiguous regions such that a regional attribute value is higher than a predefined threshold. The max‐p‐compact‐regions problem is an extension of the max‐p‐regions problem accounting for compactness. The max‐p‐regions model has been widely used to define study regions in many application cases since it allows users to specify criteria and then to identify a regionalization scheme. However, the max‐p‐regions model does not consider compactness even though compactness is usually a desirable goal in regionalization, implying ideal accessibility and apparent homogeneity. This article discusses how to integrate a compactness measure into the max‐pregionalization process by constructing a multiobjective optimization model that maximizes the number of regions while optimizing the compactness of identified regions. An efficient heuristic algorithm is developed to address the computational intensity of the max‐p‐compact‐regions problem so that it can be applied to large‐scale practical regionalization problems. This new algorithm will be implemented in the open‐source Python Spatial Analysis Library. One hypothetical and one practical application of the max‐p‐compact‐regions problem are introduced to demonstrate the effectiveness and efficiency of the proposed algorithm. 
    more » « less