Title: A Probabilistic Approach to Address Data Uncertainty in Regionalization

Spatial data regularly suffer from error and uncertainty, ranging from poorly georeferenced coordinate pairs to sampling error associated with American Community Survey data. Geographic information systems can amplify and propagate error and uncertainty through the abstraction and representation of spatial data, as can the manipulation, processing, and analysis of spatial data using exploratory and confirmatory statistical techniques. The purpose of this article is to explore and address uncertainty in regionalization, a fundamental spatial analytical method that aggregates spatial units (e.g., tracts) into a set of contiguous regions for strategic purposes, including school districting, habitat areas, and the like. Specifically, we develop a new regionalization method, the uncertain‐max‐p‐regions problem, that explicitly incorporates attribute uncertainty and allows its impacts to be evaluated with a degree of statistical certainty. We also detail an efficient solution approach for solving the problem. The results suggest that the developed problem can out‐perform existing regionalization approaches and that the addition of a measure of statistical confidence can help to facilitate more clarity in planning and policy decisions.
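To make the idea of evaluating a region "with a degree of statistical certainty" concrete, the following is a minimal sketch and not the article's formulation. It assumes each tract has an American Community Survey estimate with a published 90% margin of error, that tract errors are independent and approximately normal, and that a candidate region is acceptable when it is contiguous and its aggregated total exceeds a threshold with at least a chosen confidence. All names (`is_contiguous`, `region_total`, `meets_threshold`) and the toy tract values are hypothetical.

```python
import math
from collections import deque

Z90 = 1.645  # ACS margins of error are published at the 90% confidence level


def is_contiguous(units, neighbors):
    """Return True if `units` form a single connected component."""
    units = set(units)
    if not units:
        return False
    start = next(iter(units))
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nb in neighbors.get(current, ()):
            if nb in units and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen == units


def region_total(units, estimate, moe):
    """Sum the estimates; combine standard errors by root sum of squares.

    Assumption: tract errors are independent, so variances add.
    """
    total = sum(estimate[u] for u in units)
    se = math.sqrt(sum((moe[u] / Z90) ** 2 for u in units))
    return total, se


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def meets_threshold(units, estimate, moe, threshold, confidence=0.90):
    """True if P(region total >= threshold) >= confidence (normal approximation)."""
    total, se = region_total(units, estimate, moe)
    if se == 0:
        return total >= threshold
    prob = 1.0 - normal_cdf((threshold - total) / se)
    return prob >= confidence


# Hypothetical toy data: four tracts arranged in a line A - B - C - D.
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
estimate = {"A": 1000, "B": 1200, "C": 900, "D": 1100}
moe = {"A": 200, "B": 250, "C": 150, "D": 300}

candidate = {"A", "B"}
print(is_contiguous(candidate, neighbors))
print(meets_threshold(candidate, estimate, moe, threshold=1900))
print(meets_threshold(candidate, estimate, moe, threshold=2000))
```

In this toy case the region {A, B} has a point estimate of 2200, which clears a threshold of 2000 if uncertainty is ignored, yet it fails the 90% confidence check at that threshold. That gap between a deterministic check and a probabilistic one is the kind of effect an uncertainty-aware regionalization is meant to expose.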

 
Award ID(s):
1831615
NSF-PAR ID:
10366423
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Geographical Analysis
Volume:
54
Issue:
2
ISSN:
0016-7363
Page Range / eLocation ID:
p. 405-426
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. The process of regionalization involves clustering a set of spatial areas into spatially contiguous regions. Given the NP-hard nature of regionalization problems, all existing algorithms yield approximate solutions. To ascertain the quality of these approximations, it is crucial for domain experts to obtain statistically significant evidence on optimizing the objective function, in comparison to a random reference distribution derived from all potential sample solutions. In this paper, we propose a novel spatial regionalization problem, denoted as SISR (Statistical Inference for Spatial Regionalization), which generates random sample solutions with a predetermined region cardinality. The driving motivation behind SISR is to conduct statistical inference on any given regionalization scheme. To address SISR, we present a parallel technique named PRRP (P-Regionalization through Recursive Partitioning). PRRP operates over three phases: the region-growing phase constructs initial regions with a predetermined region cardinality, while the region merging and region-splitting phases ensure the spatial contiguity of unassigned areas, allowing for the growth of subsequent regions with predetermined cardinalities. An extensive evaluation shows the effectiveness of PRRP using various real datasets. 
  2. Abstract

    The max‐p‐compact‐regions problem involves the aggregation of a set of small areas into an unknown maximum number (p) of compact, homogeneous, and spatially contiguous regions such that a regional attribute value is higher than a predefined threshold. The max‐p‐compact‐regions problem is an extension of the max‐p‐regions problem accounting for compactness. The max‐p‐regions model has been widely used to define study regions in many application cases since it allows users to specify criteria and then to identify a regionalization scheme. However, the max‐p‐regions model does not consider compactness even though compactness is usually a desirable goal in regionalization, implying ideal accessibility and apparent homogeneity. This article discusses how to integrate a compactness measure into the max‐p regionalization process by constructing a multiobjective optimization model that maximizes the number of regions while optimizing the compactness of identified regions. An efficient heuristic algorithm is developed to address the computational intensity of the max‐p‐compact‐regions problem so that it can be applied to large‐scale practical regionalization problems. This new algorithm will be implemented in the open‐source Python Spatial Analysis Library. One hypothetical and one practical application of the max‐p‐compact‐regions problem are introduced to demonstrate the effectiveness and efficiency of the proposed algorithm.

     
  3. Regionalization techniques group spatial areas into a set of homogeneous regions to analyze and draw conclusions about spatial phenomena. A recent regionalization problem, called MP-regions, groups spatial areas to produce a maximum number of regions by enforcing a user-defined constraint at the regional level. The MP-regions problem is NP-hard. Existing approximate algorithms for MP-regions do not scale for large datasets due to their high computational cost and inherently centralized approaches to process data. This article introduces a parallel scalable regionalization framework (PAGE) to support MP-regions on large datasets. The proposed framework works in two stages. The first stage finds an initial solution through randomized search, and the second stage improves this solution through efficient heuristic search. To build an initial solution efficiently, we extend traditional spatial partitioning techniques to enable parallelized region building without violating the spatial constraints. Furthermore, we optimize the region building efficiency and quality by tuning the randomized area selection to trade off runtime with region homogeneity. The experimental evaluation shows the superiority of our framework to support an order of magnitude larger datasets efficiently compared to the state-of-the-art techniques while producing high-quality solutions.

     
  4. Abstract

    Patterns of δ18O and δ2H in Earth's precipitation provide essential scientific data for use in hydrological, climatological, ecological and forensic research. Insufficient global spatial data coverage promulgated the use of gridded datasets employing geostatistical techniques (isoscapes) for spatiotemporally coherent isotope predictions. Cluster‐based isoscape regionalization combines the advantages of local or regional prediction calibrations into a global framework. Here we present a revision of a Regionalized Cluster‐Based Water Isotope Prediction model (RCWIP2) incorporating new isotope data having extensive spatial coverage and a wider array of predictor variables combined with high‐resolution gridded climatic data. We introduced coupling of δ18O and δ2H (e.g., d‐excess constrained) in the model predictions to prevent runaway isoscapes when each isotope is modelled separately and cross‐checked observed versus modelled d‐excess values. We improved model error quantification by adopting full uncertainty propagation in all calculations. RCWIP2 improved the RMSE over previous isoscape models by ca. 0.3 ‰ for δ18O and 2.5 ‰ for δ2H with an uncertainty < 1.0 ‰ for δ18O and < 8 ‰ for δ2H for most regions of the world. The determination of the relative importance of each predictor variable in each ecoclimatic zone is a new approach to identify previously unrecognized climatic drivers on mean annual precipitation δ18O and δ2H. The improved RCWIP2 isoscape grids and maps (season, monthly, annual, regional) are available for download at https://isotopehydrologynetwork.iaea.org.

     
  5. Abstract

    Models of bathymetry derived from satellite radar altimetry are essential for modeling many marine processes. They are affected by uncertainties which require quantification. We propose an uncertainty model that assumes errors are caused by the lack of high‐wavenumber content within the altimetry data. The model is then applied to a tsunami hazard assessment. We build a bathymetry uncertainty model for northern Chile. Statistical properties of the altimetry‐predicted bathymetry error are obtained using multibeam data. We find that a Von Karman correlation function and a Laplacian marginal distribution can be used to define an uncertainty model based on a random field. We also propose a method for generating synthetic bathymetry samples conditional to shipboard measurements. The method is further extended to account for interpolation uncertainties, when bathymetry data resolution is finer than 10 km. We illustrate the usefulness of the method by quantifying the bathymetry‐induced uncertainty of a tsunami hazard estimate. We demonstrate that tsunami leading wave predictions at middle/near field tide gauges and buoys are insensitive to bathymetry uncertainties in Chile. This result implies that tsunami early warning approaches can take full advantage of altimetry‐predicted bathymetry in numerical simulations. Finally, we evaluate the feasibility of modeling uncertainties in regions without multibeam data by assessing the bathymetry error statistics of 15 globally distributed regions. We find that a general Von Karman correlation and a Laplacian marginal distribution can serve as a first‐order approximation. The standard deviation of the uncertainty random field model varies regionally and is estimated from a proposed scaling law.

     