Spatial data conflation is aimed at matching and merging objects in two datasets into a more comprehensive one. Starting from the “map assignment problem” in the 1980s, optimized conflation models treat feature matching as a natural optimization problem of minimizing certain metrics, such as the total discrepancy. One complication in optimized conflation is that heterogeneous datasets can represent geographic features differently. Features can correspond to target features in the other dataset either on a one-to-one basis (forming full matches) or on a many-to-one basis (forming partial matches). Traditional models consider either full matching or partial matches exclusively. This dichotomy has several issues. Firstly, full matching models are limited and cannot capture any partial match. Secondly, partial matching models treat full matches just as partial matches, and they are more prone to admit false matches. Thirdly, existing conflation models may introduce conflicting directional matches. This paper presents a new model that captures both full and partial matches simultaneously. This allows us to impose structural constraints differently on full/partial matches and enforce the consistency between directional matches. Experimental results show that the new model outperforms conventional optimized conflation models in terms of precision (89.2%), while achieving a similar recall (93.2%).
Towards Topological Geospatial Conflation: An Optimized Node-Arc Conflation Model for Road Networks
Geospatial data conflation is the process of identifying and merging the corresponding features in two datasets that represent the same objects in reality. Conflation is needed in a wide range of geospatial analyses, yet it is a difficult task, often considered too unreliable and costly due to various discrepancies between GIS data sources. This study addresses the reliability issue of computerized conflation by developing stronger optimization-based conflation models for matching two network datasets with minimum discrepancy. Conventional models match roads on a feature-by-feature basis. By comparison, we propose a new node-arc conflation model that simultaneously matches road-center lines and junctions in a topologically consistent manner. Enforcing this topological consistency increases the reliability of conflation and reduces false matches. Similar to the well-known rubber-sheeting method, our model allows for the use of network junctions as “control” points for matching network edges. Unlike rubber sheeting, the new model is automatic and matches all junctions (and edges) in one pass. To the best of our knowledge, this is the first optimized conflation model that can match nodes and edges in one model. Computational experiments using six road networks in Santa Barbara, CA, showed that the new model is selective and reduces false matches more than existing optimized conflation models. On average, it achieves a precision of 94.7% with over 81% recall and achieves a 99.4% precision when enhanced with string distances.
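The precision and recall figures quoted above are ratios over sets of matched feature pairs. The following is a minimal sketch of that arithmetic, assuming matches are stored as (source ID, target ID) pairs and that a manually verified reference set is available. The function name, identifiers, and sample data are hypothetical and are not taken from the paper.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for two collections of (source_id, target_id) pairs."""
    predicted, reference = set(predicted), set(reference)
    # A predicted match counts as correct only if the same pair is in the reference set.
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical toy data: three predicted matches, four reference matches.
predicted_matches = {("a1", "b1"), ("a2", "b2"), ("a3", "b4")}
reference_matches = {("a1", "b1"), ("a2", "b2"), ("a3", "b3"), ("a4", "b5")}

precision, recall = precision_recall(predicted_matches, reference_matches)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

A model that is more selective tends to predict fewer pairs, which can raise precision while lowering recall; this is the trade-off the reported numbers describe.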
- Award ID(s): 2215155
- PAR ID: 10525066
- Publisher / Repository: MDPI
- Date Published:
- Journal Name: ISPRS International Journal of Geo-Information
- Volume: 13
- Issue: 1
- ISSN: 2220-9964
- Page Range / eLocation ID: 15
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
-
Geospatial data conflation is the process of combining multiple datasets about a geographic phenomenon to produce a single, richer dataset. It has received increased research attention due to its many applications in map making, transportation, planning, and temporal geospatial analyses, among many others. One approach to conflation, attempted from the outset in the literature, is the use of optimization‐based conflation methods. Conflation is treated as a natural optimization problem of minimizing the total number of discrepancies while finding corresponding features from two datasets. Optimization‐based conflation has several advantages over traditional methods including conciseness, being able to find an optimal solution, and ease of implementation. However, current optimization‐based conflation methods are also limited. A main shortcoming with current optimized conflation models (and other traditional methods as well) is that they are often too weak and cannot utilize the spatial context in each dataset while matching corresponding features. In particular, current optimal conflation models match a feature to targets independently from other features and therefore treat each GIS dataset as a collection of unrelated elements, reminiscent of the spaghetti GIS data model. Important contextual information such as the connectivity between adjacent elements (such as roads) is neglected during the matching. Consequently, such models may produce topologically inconsistent results. In this article, we address this issue by introducing new optimization‐based conflation models with structural constraints to preserve the connectivity and contiguity relation among features. The model is implemented using integer linear programming and compared with traditional spaghetti‐style models on multiple test datasets. Experimental results show that the new element connectivity (ec‐bimatching) model reduces false matches and consistently outperforms traditional models.
-
Geospatial data conflation involves matching and combining two maps to create a new map. It has received increased research attention in recent years due to its wide range of applications in GIS (Geographic Information System) data production and analysis. The map assignment problem (conceptualized in the 1980s) is one of the earliest conflation methods, in which GIS features from two maps are matched by minimizing their total discrepancy or distance. Recently, more flexible optimization models have been proposed. These include conflation models based on the network flow problem and new models based on Mixed Integer Linear Programming (MILP). A natural question is: how are these models related or different, and how do they compare? In this study, an analytic review of major optimized conflation models in the literature is conducted and the structural linkages between them are identified. Moreover, a MILP model (the base-matching problem) and its bi-matching version are presented as a common basis. Our analysis shows that the assignment problem and all other optimized conflation models in the literature can be viewed or reformulated as variants of the base models. For network-flow based models, a proof is presented that the base-matching problem is equivalent to the network-flow based fixed-charge-matching model. The equivalence of the MILP reformulation is also verified experimentally. For the existing MILP-based models, common notation is established and used to demonstrate that they are straightforward extensions of the base models. The contributions of this study are threefold. Firstly, it helps the analyst to understand the structural commonalities and differences of current conflation models and to choose among them. Secondly, by reformulating the network-flow models (and therefore, all current models) using MILP, the presented work eases the practical application of conflation by leveraging the many off-the-shelf MILP solvers. Thirdly, the base models can serve as a common ground for studying and writing new conflation models by allowing a modular and incremental way of model development.
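To make the map assignment problem mentioned above concrete, here is a minimal sketch that matches point features one-to-one so that the total distance is smallest. It assumes Euclidean distance as the discrepancy measure and uses exhaustive enumeration in place of the MILP or network-flow solvers discussed in the abstract, so it is only practical for a handful of features. The function name and sample coordinates are hypothetical.

```python
from itertools import permutations
from math import hypot


def assign_min_discrepancy(source_pts, target_pts):
    """Brute-force one-to-one assignment minimizing total Euclidean distance.

    Every source point receives exactly one distinct target point, so
    len(source_pts) must not exceed len(target_pts).
    """
    if len(source_pts) > len(target_pts):
        raise ValueError("need at least as many targets as sources")
    best_cost, best_perm = float("inf"), ()
    # Each permutation assigns source i to target perm[i].
    for perm in permutations(range(len(target_pts)), len(source_pts)):
        cost = sum(
            hypot(sx - target_pts[j][0], sy - target_pts[j][1])
            for (sx, sy), j in zip(source_pts, perm)
        )
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return list(enumerate(best_perm)), best_cost


# Hypothetical toy data: three features per map, slightly displaced.
source = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
target = [(10.0, 1.0), (1.0, 0.0), (0.0, 9.0)]

matches, total = assign_min_discrepancy(source, target)
print(matches, total)
```

The enumeration grows factorially with the number of features, which is one reason the models surveyed here are formulated for general-purpose optimization solvers instead.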
-
Robust feature matching forms the backbone for most Visual Simultaneous Localization and Mapping (vSLAM), visual odometry, 3D reconstruction, and Structure from Motion (SfM) algorithms. However, recovering feature matches from texture-poor scenes is a major challenge and remains an open area of research. In this paper, we present a Stereo Visual Odometry (StereoVO) technique based on point and line features which uses a novel feature-matching mechanism based on an Attention Graph Neural Network that is designed to perform well even under adverse weather conditions such as fog, haze, rain, and snow, and dynamic lighting conditions such as nighttime illumination and glare scenarios. We perform experiments on multiple real and synthetic datasets to validate our method's ability to perform StereoVO under low-visibility weather and lighting conditions through robust point and line matches. The results demonstrate that our method achieves more line feature matches than state-of-the-art line-matching algorithms, which when complemented with point feature matches perform consistently well in adverse weather and dynamic lighting conditions.
-
We present a new scientific document similarity model based on matching fine-grained aspects of texts. To train our model, we exploit a naturally-occurring source of supervision: sentences in the full-text of papers that cite multiple papers together (co-citations). Such co-citations not only reflect close paper relatedness, but also provide textual descriptions of how the co-cited papers are related. This novel form of textual supervision is used for learning to match aspects across papers. We develop multi-vector representations where vectors correspond to sentence-level aspects of documents, and present two methods for aspect matching: (1) A fast method that only matches single aspects, and (2) a method that makes sparse multiple matches with an Optimal Transport mechanism that computes an Earth Mover’s Distance between aspects. Our approach improves performance on document similarity tasks in four datasets. Further, our fast single-match method achieves competitive results, paving the way for applying fine-grained similarity to large scientific corpora.

