Abstract: Geospatial data conflation is the process of combining multiple datasets about a geographic phenomenon to produce a single, richer dataset. It has received increased research attention due to its many applications in map making, transportation, planning, and temporal geospatial analyses, among others. One approach to conflation, attempted from the outset in the literature, is the use of optimization-based conflation methods. Conflation is treated as a natural optimization problem of minimizing the total number of discrepancies while finding corresponding features from two datasets. Optimization-based conflation has several advantages over traditional methods, including conciseness, the ability to find an optimal solution, and ease of implementation. However, current optimization-based conflation methods are also limited. A main shortcoming of current optimized conflation models (and of other traditional methods as well) is that they are often too weak and cannot utilize the spatial context in each dataset while matching corresponding features. In particular, current optimal conflation models match a feature to targets independently of other features and therefore treat each GIS dataset as a collection of unrelated elements, reminiscent of the spaghetti GIS data model. Important contextual information, such as the connectivity between adjacent elements (e.g., roads), is neglected during matching. Consequently, such models may produce topologically inconsistent results. In this article, we address this issue by introducing new optimization-based conflation models with structural constraints to preserve the connectivity and contiguity relations among features. The model is implemented using integer linear programming and compared with traditional spaghetti-style models on multiple test datasets. Experimental results show that the new element connectivity (ec-bimatching) model reduces false matches and consistently outperforms traditional models.
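The article gives the exact formulation of the ec-bimatching model; the sketch below is only a minimal illustration of the general idea of adding a structural constraint to an optimized matching. It builds a toy distance-minimizing matching as an integer program in Python with PuLP and forbids two roads that share a junction in the source dataset from being matched to two roads that are disconnected in the target dataset. The distance matrix, the connectivity sets, and the specific constraint form are assumptions made for illustration, not the model from the article.

```python
# Hedged sketch: a toy distance-minimizing matching ILP with one illustrative
# connectivity-style constraint. Requires: pip install pulp. All data is made up.
import pulp

# Hypothetical discrepancy between source roads A0..A2 and target roads B0..B2.
dist = {(0, 0): 1.0, (0, 1): 5.0, (0, 2): 9.0,
        (1, 0): 6.0, (1, 1): 2.0, (1, 2): 7.0,
        (2, 0): 8.0, (2, 1): 6.0, (2, 2): 3.0}
src, tgt = list(range(3)), list(range(3))

# Hypothetical connectivity: pairs of roads that share a junction in each dataset.
src_connected = {(0, 1), (1, 2)}
tgt_connected = {(0, 1), (1, 2)}

prob = pulp.LpProblem("ec_matching_sketch", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (src, tgt), cat="Binary")  # x[i][j] = 1 if A_i matches B_j

# Objective: minimize the total discrepancy of the selected matches.
prob += pulp.lpSum(dist[i, j] * x[i][j] for i in src for j in tgt)

# Each source road is matched to exactly one target road; each target is used at most once.
for i in src:
    prob += pulp.lpSum(x[i][j] for j in tgt) == 1
for j in tgt:
    prob += pulp.lpSum(x[i][j] for i in src) <= 1

# Illustrative structural constraint: two roads connected in the source dataset
# may not both be matched to target roads that are NOT connected.
for (i, k) in src_connected:
    for j in tgt:
        for l in tgt:
            if j != l and (j, l) not in tgt_connected and (l, j) not in tgt_connected:
                prob += x[i][j] + x[k][l] <= 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([(i, j) for i in src for j in tgt if x[i][j].value() == 1])
```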
On the Theoretical Link between Optimized Geospatial Conflation Models for Linear Features
Geospatial data conflation involves matching and combining two maps to create a new map. It has received increased research attention in recent years due to its wide range of applications in GIS (Geographic Information System) data production and analysis. The map assignment problem (conceptualized in the 1980s) is one of the earliest conflation methods, in which GIS features from two maps are matched by minimizing their total discrepancy or distance. Recently, more flexible optimization models have been proposed, including conflation models based on the network flow problem and new models based on Mixed Integer Linear Programming (MILP). A natural question is: how are these models related or different, and how do they compare? In this study, an analytic review of major optimized conflation models in the literature is conducted and the structural linkages between them are identified. Moreover, a MILP model (the base-matching problem) and its bi-matching version are presented as a common basis. Our analysis shows that the assignment problem and all other optimized conflation models in the literature can be viewed or reformulated as variants of these base models. For network-flow-based models, a proof is presented that the base-matching problem is equivalent to the network-flow-based fixed-charge-matching model. The equivalence of the MILP reformulation is also verified experimentally. For the existing MILP-based models, common notation is established and used to demonstrate that they are extensions of the base models in straightforward ways. The contributions of this study are threefold. Firstly, it helps the analyst understand the structural commonalities and differences of current conflation models and choose among them. Secondly, by reformulating the network-flow models (and therefore all current models) using MILP, the presented work eases the practical application of conflation by leveraging the many off-the-shelf MILP solvers. Thirdly, the base models can serve as a common ground for studying and developing new conflation models by allowing a modular and incremental way of model development.
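As a minimal, hedged illustration of the assignment idea underlying the base-matching problem (matching features across two datasets so that total discrepancy is minimized), the snippet below solves a toy instance with SciPy's linear-sum-assignment solver. The distance matrix is fabricated, and this is the textbook assignment problem rather than the paper's exact MILP formulation.

```python
# Hedged sketch: the classic assignment idea behind the base-matching problem --
# match features across two datasets so that total discrepancy is minimized.
# The 3x3 distance matrix is fabricated for illustration only.
import numpy as np
from scipy.optimize import linear_sum_assignment

# dist[i, j] = discrepancy between feature i of dataset A and feature j of dataset B.
dist = np.array([[1.0, 5.0, 9.0],
                 [6.0, 2.0, 7.0],
                 [8.0, 6.0, 3.0]])

rows, cols = linear_sum_assignment(dist)               # optimal one-to-one matching
print(list(zip(rows, cols)), dist[rows, cols].sum())   # matched pairs and total discrepancy
```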
- Award ID(s): 2215155
- PAR ID: 10621229
- Publisher / Repository: MDPI
- Date Published:
- Journal Name: ISPRS International Journal of Geo-Information
- Volume: 13
- Issue: 9
- ISSN: 2220-9964
- Page Range / eLocation ID: 310
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Spatial data conflation is aimed at matching and merging objects in two datasets into a more comprehensive one. Starting from the “map assignment problem” in the 1980s, optimized conflation models treat feature matching as a natural optimization problem of minimizing certain metrics, such as the total discrepancy. One complication in optimized conflation is that heterogeneous datasets can represent geographic features differently. Features can correspond to target features in the other dataset either on a one-to-one basis (forming full matches) or on a many-to-one basis (forming partial matches). Traditional models consider either full matches or partial matches exclusively. This dichotomy has several issues. Firstly, full matching models are limited and cannot capture any partial match. Secondly, partial matching models treat full matches just as partial matches, and they are more prone to admit false matches. Thirdly, existing conflation models may introduce conflicting directional matches. This paper presents a new model that captures both full and partial matches simultaneously. This allows us to impose structural constraints differently on full/partial matches and enforce the consistency between directional matches. Experimental results show that the new model outperforms conventional optimized conflation models in terms of precision (89.2%), while achieving a similar recall (93.2%).
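One ingredient named in this abstract is consistency between directional matches. The toy integer program below is a sketch of one way such a requirement can be written, not the paper's formulation: separate forward and backward match variables are linked by an equality constraint so the two directions cannot contradict each other. The costs and variable names are invented for illustration.

```python
# Hedged sketch: linking directional match variables so the A->B and B->A
# directions agree. Requires: pip install pulp. Toy data only.
import pulp

A, B = [0, 1], [0, 1]
cost = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 5.0, (1, 1): 2.0}

prob = pulp.LpProblem("directional_consistency_sketch", pulp.LpMinimize)
f = pulp.LpVariable.dicts("f", (A, B), cat="Binary")   # forward matches A -> B
b = pulp.LpVariable.dicts("b", (B, A), cat="Binary")   # backward matches B -> A

# Objective: total discrepancy counted in both directions.
prob += pulp.lpSum(cost[i, j] * (f[i][j] + b[j][i]) for i in A for j in B)

for i in A:
    prob += pulp.lpSum(f[i][j] for j in B) == 1        # each A feature matched once
for j in B:
    prob += pulp.lpSum(b[j][i] for i in A) <= 1        # each B feature used at most once

# Consistency: the two directions must agree on every candidate pair.
for i in A:
    for j in B:
        prob += f[i][j] == b[j][i]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([(i, j) for i in A for j in B if f[i][j].value() == 1])
```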
Geospatial data conflation is the process of identifying and merging the corresponding features in two datasets that represent the same objects in reality. Conflation is needed in a wide range of geospatial analyses, yet it is a difficult task, often considered too unreliable and costly due to various discrepancies between GIS data sources. This study addresses the reliability issue of computerized conflation by developing stronger optimization-based conflation models for matching two network datasets with minimum discrepancy. Conventional models match roads on a feature-by-feature basis. By comparison, we propose a new node-arc conflation model that simultaneously matches road centerlines and junctions in a topologically consistent manner. Enforcing this topological consistency increases the reliability of conflation and reduces false matches. Similar to the well-known rubber-sheeting method, our model allows for the use of network junctions as “control” points for matching network edges. Unlike rubber sheeting, the new model is automatic and matches all junctions (and edges) in one pass. To the best of our knowledge, this is the first optimized conflation model that can match nodes and edges in one model. Computational experiments using six road networks in Santa Barbara, CA, showed that the new model is selective and reduces false matches more than existing optimized conflation models. On average, it achieves a precision of 94.7% with over 81% recall and achieves a 99.4% precision when enhanced with string distances.
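As a hedged sketch of the kind of topological linking described here (not the node-arc model itself), the snippet below couples edge-match variables to node-match variables so that an edge match is only allowed when its endpoint junctions are matched as well. All identifiers and data are invented for illustration.

```python
# Hedged sketch: an edge match implies matches between its endpoint nodes,
# which is one simple way to keep node and edge matching topologically consistent.
import pulp

nodes_a, nodes_b = ["u", "v"], ["u2", "v2"]
edges_a, edges_b = [("u", "v")], [("u2", "v2")]

prob = pulp.LpProblem("node_arc_sketch", pulp.LpMaximize)
xn = pulp.LpVariable.dicts("xn", (nodes_a, nodes_b), cat="Binary")  # node (junction) matches
xe = {}                                                             # edge (centerline) matches
for ea in edges_a:
    for eb in edges_b:
        xe[ea, eb] = pulp.LpVariable(f"xe_{ea[0]}{ea[1]}_{eb[0]}{eb[1]}", cat="Binary")
        # Consistency: matching the edges requires matching their endpoints.
        prob += xe[ea, eb] <= xn[ea[0]][eb[0]]
        prob += xe[ea, eb] <= xn[ea[1]][eb[1]]

# One-to-one node matching in both directions.
for a in nodes_a:
    prob += pulp.lpSum(xn[a][b] for b in nodes_b) <= 1
for b in nodes_b:
    prob += pulp.lpSum(xn[a][b] for a in nodes_a) <= 1

# Toy objective: maximize the number of consistent node and edge matches.
prob += pulp.lpSum(xe.values()) + pulp.lpSum(xn[a][b] for a in nodes_a for b in nodes_b)

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({k: v.value() for k, v in xe.items()})
```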
Abstract: In this paper we present a reconstruction technique for the reduction of unsteady flow data based on neural representations of time-varying vector fields. Our approach is motivated by the large amount of data typically generated in numerical simulations, and in turn the types of data that domain scientists can generate in situ that are compact, yet useful, for post hoc analysis. One type of data commonly acquired during simulation are samples of the flow map, where a single sample is the result of integrating the underlying vector field for a specified time duration. In our work, we treat a collection of flow map samples for a single dataset as a meaningful, compact, and yet incomplete, representation of unsteady flow, and our central objective is to find a representation that enables us to best recover arbitrary flow map samples. To this end, we introduce a technique for learning implicit neural representations of time-varying vector fields that are specifically optimized to reproduce flow map samples sparsely covering the spatiotemporal domain of the data. We show that, despite aggressive data reduction, our optimization problem — learning a function-space neural network to reproduce flow map samples under a fixed integration scheme — leads to representations that demonstrate strong generalization, both in the field itself, and in using the field to approximate the flow map. Through quantitative and qualitative analysis across different datasets we show that our approach is an improvement over a variety of data reduction methods, across a variety of measures including improved vector fields, flow maps, and features derived from the flow map.
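As a rough sketch of the general approach described above (not the authors' architecture or training setup), the snippet below fits a small coordinate network v(x, t) in PyTorch so that integrating it with a fixed Euler scheme reproduces synthetic flow map samples. The network size, integrator, data, and hyperparameters are illustrative assumptions.

```python
# Hedged sketch: fit an implicit neural representation of a 2D unsteady vector
# field so that integrating it reproduces flow map samples (start point, start
# time, duration, end point). Synthetic data and a fixed Euler integrator are used.
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, hidden), nn.Tanh(),
                                 nn.Linear(hidden, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2))
    def forward(self, x, t):                     # x: (N, 2), t: (N, 1)
        return self.net(torch.cat([x, t], dim=-1))

def integrate(model, x0, t0, duration, steps=16):
    """Fixed-step Euler integration of the learned field (the 'fixed scheme')."""
    x, dt = x0, duration / steps
    for k in range(steps):
        x = x + dt * model(x, t0 + k * dt)
    return x

# Synthetic flow map samples from a known rotating field, for illustration only.
torch.manual_seed(0)
x0 = torch.rand(256, 2) * 2 - 1
t0 = torch.zeros(256, 1)
dur = torch.full((256, 1), 0.5)
true_v = lambda x: torch.stack([-x[:, 1], x[:, 0]], dim=-1)
x_end = x0.clone()
for k in range(64):                              # "ground truth" end points of the flow map
    x_end = x_end + (0.5 / 64) * true_v(x_end)

model = VelocityNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for epoch in range(200):
    opt.zero_grad()
    loss = ((integrate(model, x0, t0, dur) - x_end) ** 2).mean()
    loss.backward()
    opt.step()
print(float(loss))                               # flow map reconstruction error
```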
We consider several variants of the map-matching problem, which seeks to find a path Q in graph G that has the smallest distance to a given trajectory P (which is likely not to be exactly on the graph). In a typical application setting, P models a noisy GPS trajectory from a person traveling on a road network, and the desired path Q should ideally correspond to the actual path in G that the person has traveled. Existing map-matching algorithms in the literature consider all possible paths in G as potential candidates for Q. We find solutions to the map-matching problem under different settings. In particular, we restrict the set of paths to shortest paths, or concatenations of shortest paths, in G. As a distance measure, we use the Fréchet distance, which is a suitable distance measure for curves since it takes the continuity of the curves into account.
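For intuition about the distance measure used here, the snippet below computes the discrete Fréchet distance between two polylines with the classic dynamic program. The paper works with the (continuous) Fréchet distance, so this is only an illustrative stand-in, and the coordinates are made up.

```python
# Hedged sketch: discrete Frechet distance between two polylines via the classic
# dynamic program; it conveys the idea of a curve distance that respects ordering.
from math import dist  # Python 3.8+

def discrete_frechet(P, Q):
    n, m = len(P), len(Q)
    ca = [[-1.0] * m for _ in range(n)]
    def c(i, j):
        if ca[i][j] >= 0:
            return ca[i][j]
        d = dist(P[i], Q[j])
        if i == 0 and j == 0:
            ca[i][j] = d
        elif i == 0:
            ca[i][j] = max(c(0, j - 1), d)
        elif j == 0:
            ca[i][j] = max(c(i - 1, 0), d)
        else:
            ca[i][j] = max(min(c(i - 1, j), c(i - 1, j - 1), c(i, j - 1)), d)
        return ca[i][j]
    return c(n - 1, m - 1)

# Toy GPS trajectory vs. a candidate path in the road graph (made-up coordinates).
traj = [(0.0, 0.0), (1.0, 0.1), (2.0, -0.1), (3.0, 0.0)]
path = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(discrete_frechet(traj, path))
```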