skip to main content


Title: The max‐ p ‐compact‐regions problem
Abstract

The max‐p‐compact‐regions problem involves the aggregation of a set of small areas into an unknown maximum number (p) of compact, homogeneous, and spatially contiguous regions such that a regional attribute value is higher than a predefined threshold. The max‐p‐compact‐regions problem is an extension of the max‐p‐regions problem accounting for compactness. The max‐p‐regions model has been widely used to define study regions in many application cases since it allows users to specify criteria and then to identify a regionalization scheme. However, the max‐p‐regions model does not consider compactness even though compactness is usually a desirable goal in regionalization, implying ideal accessibility and apparent homogeneity. This article discusses how to integrate a compactness measure into the max‐pregionalization process by constructing a multiobjective optimization model that maximizes the number of regions while optimizing the compactness of identified regions. An efficient heuristic algorithm is developed to address the computational intensity of the max‐p‐compact‐regions problem so that it can be applied to large‐scale practical regionalization problems. This new algorithm will be implemented in the open‐source Python Spatial Analysis Library. One hypothetical and one practical application of the max‐p‐compact‐regions problem are introduced to demonstrate the effectiveness and efficiency of the proposed algorithm.

 
more » « less
Award ID(s):
1831615
NSF-PAR ID:
10364743
Author(s) / Creator(s):
 ;  ;  
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Transactions in GIS
Volume:
26
Issue:
2
ISSN:
1361-1682
Page Range / eLocation ID:
p. 717-734
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Abstract

    We introduce and study a class of optimization problems we call replenishment problems with fixed turnover times: a very natural model that has received little attention in the literature. Clients with capacity for storing a certain commodity are located at various places; at each client the commodity depletes within a certain time, the turnover time, which is constant but can vary between locations. Clients should never run empty. The natural feature that makes this problem interesting is that we may schedule a replenishment (well) before a client becomes empty, but then the next replenishment will be due earlier also. This added workload needs to be balanced against the cost of routing vehicles to do the replenishments. In this paper, we focus on the aspect of minimizing routing costs. However, the framework of recurring tasks, in which the next job of a task must be done within a fixed amount of time after the previous one is much more general and gives an adequate model for many practical situations. Note that our problem has an infinite time horizon. However, it can be fully characterized by a compact input, containing only the location of each client and a turnover time. This makes determining its computational complexity highly challenging and indeed it remains essentially unresolved. We study the problem for two objectives:minavg  minimizes the average tour cost andminmax  minimizes the maximum tour cost over all days. Forminmax  we derive a logarithmic factor approximation for the problem on general metrics and a 6-approximation for the problem on trees, for which we have a proof of NP-hardness. Forminavg  we present a logarithmic factor approximation on general metrics, a 2-approximation for trees, and a pseudopolynomial time algorithm for the line. Many intriguing problems remain open.

     
    more » « less
  2. Spatial data regularly suffer from error and uncertainty, ranging from poorly georeferenced coordinate pairs to sampling error associated with American Community Survey data. Geographic information systems can amplify and propagate error and uncertainty through the abstraction and representation of spatial data, as can the manipulation, processing, and analysis of spatial data using exploratory and confirmatory statistical techniques. The purpose of this article is to explore and address uncertainty in regionalization, a fundamental spatial analytical method that aggregates spatial units (e.g., tracts) into a set of contiguous regions for strategic purposes, including school districting, habitat areas, and the like. Specifically, we develop a new regionalization method, theuncertain‐max‐p‐regionsproblem that explicitly incorporates attribute uncertainty and allows its impacts to be evaluated with a degree of statistical certainty. We also detail an efficient solution approach for dealing the problem. The results suggest that the developed problem can out‐perform existing regionalization approaches and that the addition of a measure of statistical confidence can help to facilitate more clarity in planning and policy decisions.

     
    more » « less
  3. Regionalization techniques group spatial areas into a set of homogeneous regions to analyze and draw conclusions about spatial phenomena. A recent regionalization problem, called MP-regions, groups spatial areas to produce a maximum number of regions by enforcing a user-defined constraint at the regional level. The MP-regions problem is NP-hard. Existing approximate algorithms for MP-regions do not scale for large datasets due to their high computational cost and inherently centralized approaches to process data. This article introduces a parallel scalable regionalization framework (PAGE) to support MP-regions on large datasets. The proposed framework works in two stages. The first stage finds an initial solution through randomized search, and the second stage improves this solution through efficient heuristic search. To build an initial solution efficiently, we extend traditional spatial partitioning techniques to enable parallelized region building without violating the spatial constraints. Furthermore, we optimize the region building efficiency and quality by tuning the randomized area selection to trade off runtime with region homogeneity. The experimental evaluation shows the superiority of our framework to support an order of magnitude larger datasets efficiently compared to the state-of-the-art techniques while producing high-quality solutions.

     
    more » « less
  4. The sum-product network (SPN) has been extended to model sequence data with the recurrent SPN (RSPN), and to decision-making problems with sum-product-max networks (SPMN). In this paper, we build on the concepts introduced by these extensions and present state-based recurrent SPMNs (S-RSPMNs) as a generalization of SPMNs to sequential decision-making problems where the state may not be perfectly observed. As with recurrent SPNs, S-RSPMNs utilize a repeatable template network to model sequences of arbitrary lengths. We present an algorithm for learning compact template structures by identifying unique belief states and the transitions between them through a state matching process that utilizes augmented data. In our knowledge, this is the first data-driven approach that learns graphical models for planning under partial observability, which can be solved efficiently. S-RSPMNs retain the linear solution complexity of SPMNs, and we demonstrate significant improvements in compactness of representation and the run time of structure learning and inference in sequential domains.

     
    more » « less
  5. Abstract

    We map crustal regions in Southern California that have similar depth variations in seismic velocities by applying cluster analysis to 1.5 millionPandSvelocity profiles from the three‐dimensional tomographic model CVM‐S4.26. We use aK‐means algorithm to partition the profiles intoKsets that minimize the inter‐cluster variance. The regionalizations forK≤ 10 generate a coherent sequence of structural refinements: each increment ofKintroduces a new region typically by partitioning a larger region into two smaller regions or by occupying a transition zone between two regions. The results forK≤ 7 are insensitive to initialization and trimming of the model periphery; nearly identical results are obtained if thePandSvelocity profiles are treated separately or jointly. The regions forK= 7 can be associated with major physiographic provinces and geologic areas with recognized tectonic affinities, including the Continental Borderland, Great Valley, Salton Trough, and Mojave Desert. The regionalization splits the Sierra Nevada and Peninsular Range batholiths into the western and eastern zones consistent with geological, geochemical, and potential‐field mapping. Three of the regions define a geographic domain comprising almost all of the upper crust derived from continental lithosphere. Well‐resolved regional boundaries coincide with major faults, topographic fronts, and/or geochemical transitions mapped at the surface. The consistent alignment of these surface features with deeper transitions in the crustal velocity profiles indicates that regional boundaries are typically narrow, high‐angle structures separating regions with characteristic crustal columns that reflect different compositions and tectonic histories.

     
    more » « less