Title: Linear feature conflation: An optimization‐based matching model with connectivity constraints
Abstract: Geospatial data conflation is the process of combining multiple datasets about a geographic phenomenon to produce a single, richer dataset. It has received increased research attention due to its many applications in map making, transportation, planning, and temporal geospatial analyses, among others. One approach to conflation, attempted from the outset in the literature, is the use of optimization-based conflation methods. Conflation is treated as a natural optimization problem of minimizing the total number of discrepancies while finding corresponding features from two datasets. Optimization-based conflation has several advantages over traditional methods, including conciseness, the ability to find an optimal solution, and ease of implementation. However, current optimization-based conflation methods are also limited. A main shortcoming of current optimized conflation models (and of other traditional methods) is that they are often too weak to utilize the spatial context in each dataset while matching corresponding features. In particular, current optimal conflation models match a feature to targets independently of other features and therefore treat each GIS dataset as a collection of unrelated elements, reminiscent of the spaghetti GIS data model. Important contextual information, such as the connectivity between adjacent elements (e.g., roads), is neglected during matching. Consequently, such models may produce topologically inconsistent results. In this article, we address this issue by introducing new optimization-based conflation models with structural constraints that preserve the connectivity and contiguity relations among features. The model is implemented using integer linear programming and compared with traditional spaghetti-style models on multiple test datasets. Experimental results show that the new element connectivity (ec-bimatching) model reduces false matches and consistently outperforms traditional models.
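The core idea of the abstract — matching features between two datasets to minimize total discrepancy while requiring that adjacent features in one dataset map to adjacent features in the other — can be illustrated with a tiny brute-force sketch. The paper itself formulates this as an integer linear program; everything below (the distances, the adjacency sets, the three-feature size) is a made-up toy instance, not data from the paper:

```python
from itertools import permutations

# Toy instance: match 3 features of dataset A to 3 features of dataset B,
# minimizing total distance, subject to a connectivity constraint:
# features adjacent in A must be matched to features adjacent in B.
dist = [
    [1, 9, 8],
    [9, 1, 2],
    [8, 3, 1],
]
adj_a = {(0, 1), (1, 2)}  # hypothetical adjacency (connectivity) in A
adj_b = {(0, 1), (1, 2)}  # hypothetical adjacency in B

def connected(match):
    """Every pair adjacent in A must map to a pair adjacent in B."""
    for i, j in adj_a:
        mi, mj = match[i], match[j]
        if (mi, mj) not in adj_b and (mj, mi) not in adj_b:
            return False
    return True

# Enumerate all one-to-one matchings; keep only connectivity-preserving ones.
best = min(
    (p for p in permutations(range(3)) if connected(p)),
    key=lambda p: sum(dist[i][p[i]] for i in range(3)),
)
print(best)  # (0, 1, 2)
```

A spaghetti-style model would minimize distance alone; the `connected` filter is what rules out topologically inconsistent matchings, at the cost of a harder combinatorial problem (hence the ILP formulation in the paper).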
Award ID(s):
2215155
PAR ID:
10418655
Author(s) / Creator(s):
Publisher / Repository:
Wiley-Blackwell
Date Published:
Journal Name:
Transactions in GIS
Volume:
27
Issue:
4
ISSN:
1361-1682
Format(s):
Medium: X
Size(s):
p. 1205-1227
Sponsoring Org:
National Science Foundation
More Like this
  1. Geospatial data conflation is the process of identifying and merging the corresponding features in two datasets that represent the same objects in reality. Conflation is needed in a wide range of geospatial analyses, yet it is a difficult task, often considered too unreliable and costly due to various discrepancies between GIS data sources. This study addresses the reliability issue of computerized conflation by developing stronger optimization-based conflation models for matching two network datasets with minimum discrepancy. Conventional models match roads on a feature-by-feature basis. By comparison, we propose a new node-arc conflation model that simultaneously matches road centerlines and junctions in a topologically consistent manner. Enforcing this topological consistency increases the reliability of conflation and reduces false matches. Similar to the well-known rubber-sheeting method, our model allows for the use of network junctions as "control" points for matching network edges. Unlike rubber sheeting, the new model is automatic and matches all junctions (and edges) in one pass. To the best of our knowledge, this is the first optimized conflation model that can match nodes and edges in one model. Computational experiments using six road networks in Santa Barbara, CA, showed that the new model is selective and reduces false matches more than existing optimized conflation models. On average, it achieves a precision of 94.7% with over 81% recall, and a 99.4% precision when enhanced with string distances.
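The node-arc consistency requirement described above — an arc (edge) match is admissible only if its endpoint junctions are matched compatibly — can be sketched in a few lines. This is a toy illustration of the constraint only, not the paper's optimization model; all node labels, edge names, and the pre-computed junction matching are hypothetical:

```python
# Hypothetical junction (node) matching from network A to network B.
node_match = {0: 0, 1: 1, 2: 2}
# Arcs in each network, given as (endpoint, endpoint) junction pairs.
edges_a = {"a1": (0, 1), "a2": (1, 2)}
edges_b = {"b1": (0, 1), "b2": (1, 2)}

def consistent(ea, eb):
    """An arc match is admissible only if the matched endpoints agree."""
    u, v = edges_a[ea]
    x, y = edges_b[eb]
    return {node_match[u], node_match[v]} == {x, y}

# Keep only edge matches whose endpoints are consistent with node_match.
edge_match = {ea: eb for ea in edges_a for eb in edges_b if consistent(ea, eb)}
print(edge_match)  # {'a1': 'b1', 'a2': 'b2'}
```

In the paper's model the node and edge matches are decided jointly in one optimization rather than sequentially as here; the sketch only shows why tying edge matches to junction matches prunes topologically impossible candidates.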
  2. Geospatial data conflation involves matching and combining two maps to create a new map. It has received increased research attention in recent years due to its wide range of applications in GIS (Geographic Information System) data production and analysis. The map assignment problem (conceptualized in the 1980s) is one of the earliest conflation methods, in which GIS features from two maps are matched by minimizing their total discrepancy or distance. Recently, more flexible optimization models have been proposed. These include conflation models based on the network flow problem and new models based on Mixed Integer Linear Programming (MILP). A natural question is: how are these models related or different, and how do they compare? In this study, an analytic review of major optimized conflation models in the literature is conducted and the structural linkages between them are identified. Moreover, a MILP model (the base-matching problem) and its bi-matching version are presented as a common basis. Our analysis shows that the assignment problem and all other optimized conflation models in the literature can be viewed or reformulated as variants of the base models. For network-flow-based models, proof is presented that the base-matching problem is equivalent to the network-flow-based fixed-charge-matching model. The equivalence of the MILP reformulation is also verified experimentally. For the existing MILP-based models, common notation is established and used to demonstrate that they are extensions of the base models in straightforward ways. The contributions of this study are threefold. Firstly, it helps the analyst to understand the structural commonalities and differences of current conflation models and to choose among them. Secondly, by reformulating the network-flow models (and therefore, all current models) using MILP, the presented work eases the practical application of conflation by leveraging the many off-the-shelf MILP solvers. Thirdly, the base models can serve as a common ground for studying and writing new conflation models by allowing a modular and incremental way of model development.
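A base-matching problem of the kind described — binary variables x[i][j] choosing matches, at-most-once constraints on each side, and an objective that trades a per-match benefit against distance — can be demonstrated by brute force on a toy instance (a real application would hand this MILP to an off-the-shelf solver). All numbers below, including the fixed per-match reward, are invented for illustration:

```python
from itertools import product

# Toy base-matching instance: x[i][j] = 1 if feature i of A matches
# feature j of B. Each feature may match at most once on each side.
# Objective: minimize sum((dist[i][j] - reward) * x[i][j]); without the
# fixed per-match reward, the empty matching would be trivially optimal.
dist = [[2, 7], [6, 3]]
reward = 10  # hypothetical fixed benefit per match made
n, m = 2, 2

best_obj, best_x = float("inf"), None
for bits in product((0, 1), repeat=n * m):  # enumerate all 0/1 assignments
    x = [list(bits[i * m:(i + 1) * m]) for i in range(n)]
    if any(sum(row) > 1 for row in x):                              # rows
        continue
    if any(sum(x[i][j] for i in range(n)) > 1 for j in range(m)):   # cols
        continue
    obj = sum((dist[i][j] - reward) * x[i][j]
              for i in range(n) for j in range(m))
    if obj < best_obj:
        best_obj, best_x = obj, x
print(best_x)  # [[1, 0], [0, 1]]
```

The enumeration stands in for the MILP solve: the two `continue` checks are the assignment constraints, and the objective mirrors the fixed-charge structure the review relates to network-flow models.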
  3. Spatial data conflation is aimed at matching and merging objects in two datasets into a more comprehensive one. Starting from the "map assignment problem" in the 1980s, optimized conflation models treat feature matching as a natural optimization problem of minimizing certain metrics, such as the total discrepancy. One complication in optimized conflation is that heterogeneous datasets can represent geographic features differently. Features can correspond to target features in the other dataset either on a one-to-one basis (forming full matches) or on a many-to-one basis (forming partial matches). Traditional models consider either full matches or partial matches, but not both. This dichotomy has several issues. Firstly, full matching models are limited and cannot capture any partial match. Secondly, partial matching models treat full matches just as partial matches, and they are more prone to admit false matches. Thirdly, existing conflation models may introduce conflicting directional matches. This paper presents a new model that captures both full and partial matches simultaneously. This allows us to impose structural constraints differently on full/partial matches and enforce the consistency between directional matches. Experimental results show that the new model outperforms conventional optimized conflation models in terms of precision (89.2%), while achieving a similar recall (93.2%).
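The full-versus-partial distinction is easy to state concretely: a full match pairs one A feature with one B feature, while a partial match splits one A feature across several B features (many-to-one). A minimal sketch of classifying a given match set this way — the feature names and matches are hypothetical, and this is only the bookkeeping, not the paper's optimization model:

```python
from collections import Counter

# Hypothetical match list: A2 is represented by two features in B
# (e.g., one long road split into two segments), so its matches are partial.
matches = [("A1", "B1"), ("A2", "B2"), ("A2", "B3")]

count = Counter(a for a, _ in matches)  # how many B targets each A feature has
full = [(a, b) for a, b in matches if count[a] == 1]     # one-to-one
partial = [(a, b) for a, b in matches if count[a] > 1]   # many-to-one

print(full)     # [('A1', 'B1')]
print(partial)  # [('A2', 'B2'), ('A2', 'B3')]
```

The point of the paper's model is that, once the two kinds are distinguished, different structural constraints can be applied to each rather than treating every match as partial.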
  4. Outdoor ambient acoustical environments may be predicted through supervised machine learning using geospatial features as inputs. However, collecting sufficient training data is an expensive process, particularly when attempting to improve the accuracy of models based on supervised learning methods over large, geospatially diverse regions. Unsupervised machine learning methods, such as K-Means clustering analysis, enable a statistical comparison between the geospatial diversity represented in the current training dataset versus the predictor locations. In this case, the geospatial features that represent the regions of western North Carolina and Utah have been simultaneously clustered to examine the common clusters between the two locations. Initial results show that most geospatial clusters group themselves according to a relatively small number of prominent geospatial features, and that Utah requires appreciably more clusters to represent its geospace. Additionally, the training dataset has a relatively low geospatial diversity because most of the current training data sites reside in a small number of clusters. This analysis informs a choice of new site locations for data acquisition that maximize the statistical similarity of the training and input datasets. 
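The clustering comparison described above — pooling feature vectors from two regions, clustering them jointly, and counting each region's membership per cluster — can be sketched with a minimal Lloyd's K-Means on synthetic data. The study presumably uses real geospatial feature vectors and more clusters; everything here (the two Gaussian "regions", k = 2) is synthetic:

```python
import random

random.seed(0)
# Synthetic stand-ins for two regions' geospatial feature vectors.
region_nc = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20)]
region_ut = [(random.gauss(5, 1), random.gauss(5, 1)) for _ in range(20)]
points = region_nc + region_ut  # cluster both regions jointly

def kmeans(pts, k, iters=50):
    """Plain Lloyd's algorithm: assign to nearest center, recompute means."""
    centers = random.sample(pts, k)
    for _ in range(k and iters):
        clusters = [[] for _ in range(k)]
        for p in pts:
            i = min(range(k),
                    key=lambda c: (p[0] - centers[c][0]) ** 2
                                  + (p[1] - centers[c][1]) ** 2)
            clusters[i].append(p)
        centers = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
    return centers, clusters

centers, clusters = kmeans(points, 2)
sizes = sorted(len(c) for c in clusters)
print(sizes)  # cluster sizes; every point lands in exactly one cluster
```

Comparing how training-site vectors and predictor-site vectors distribute over such shared clusters is what lets the study quantify the geospatial diversity gap between the two sets.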
  5. In this paper, a Segment Anything Model (SAM)-based pedestrian infrastructure segmentation workflow is designed and optimized, which is capable of efficiently processing multi-sourced geospatial data, including LiDAR data and satellite imagery data. We used an expanded definition of pedestrian infrastructure inventory, which goes beyond the traditional transportation elements to include street furniture objects that are important for accessibility but are often omitted from the traditional definition. Our contributions lie in producing the necessary knowledge to answer the following three questions. First, how can mobile LiDAR technology be leveraged to produce a comprehensive pedestrian-accessible infrastructure inventory? Second, which data representation can facilitate zero-shot segmentation of infrastructure objects with SAM? Third, how well does the SAM-based method perform on segmenting pedestrian infrastructure objects? Our proposed method is designed to efficiently create pedestrian-accessible infrastructure inventory through the zero-shot segmentation of multi-sourced geospatial datasets. Through addressing the three research questions, we show how the multi-mode data should be prepared, what data representation works best for what asset features, and how SAM performs on these data representations. Our findings indicate that street-view images generated from mobile LiDAR point-cloud data, when paired with satellite imagery data, can work efficiently with SAM to create a scalable pedestrian infrastructure inventory approach with immediate benefits to GIS professionals, city managers, transportation owners, and walkers, especially those with travel-limiting disabilities, such as individuals who are blind, have low vision, or experience mobility disabilities.