Cross-view geo-localization aims to estimate the location of a query ground image by matching it against a database of geo-tagged aerial reference images. The task is extremely challenging: its difficulty stems from the drastic viewpoint change and the different capture times of the two views. Despite these difficulties, recent works have achieved outstanding progress on cross-view geo-localization benchmarks. However, existing methods still perform poorly on cross-area benchmarks, in which the training and testing data are captured in two different regions. We attribute this deficiency to models' inability to extract the spatial configuration of visual feature layouts and to their overfitting to low-level details of the training set. In this paper, we propose GeoDTR, which explicitly disentangles geometric information from raw features and learns the spatial correlations among visual features from aerial and ground pairs with a novel geometric layout extractor module. This module generates a set of geometric layout descriptors that modulate the raw features and produce high-quality latent representations. In addition, we elaborate two categories of data augmentation: (i) layout simulation, which varies the spatial configuration while keeping the low-level details intact, and (ii) semantic augmentation, which alters the low-level details and encourages the model to capture spatial configurations. These augmentations improve the performance of cross-view geo-localization models, especially on cross-area benchmarks. Moreover, we propose a counterfactual-based learning process that helps the geometric layout extractor explore spatial information. Extensive experiments show that GeoDTR not only achieves state-of-the-art results but also significantly boosts performance on both same-area and cross-area benchmarks. Our code can be found at https://gitlab.com/vail-uvm/geodtr.
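As a hedged illustration of the layout-simulation idea, the sketch below applies one consistent geometric transform to an aerial/ground pair, assuming a north-aligned aerial image and an equirectangular ground panorama; the function name and the choice of 90-degree rotations are illustrative, not the paper's exact procedure.

```python
import numpy as np

def layout_simulation(aerial: np.ndarray, ground: np.ndarray, k: int):
    """Hypothetical layout-simulation augmentation: rotate the aerial view
    by k * 90 degrees and apply the matching circular shift to the
    equirectangular ground panorama, so the spatial configuration changes
    while the low-level details stay intact."""
    aerial_rot = np.rot90(aerial, k=k, axes=(0, 1))    # rotate aerial image
    shift = (k * ground.shape[1]) // 4                 # 90 deg = 1/4 panorama width
    ground_rot = np.roll(ground, shift=shift, axis=1)  # circular shift panorama
    return aerial_rot, ground_rot
```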
Toward Systematic Design Considerations of Organizing Multiple Views
Multiple-view visualization (MV) has been used for visual analytics in various fields (e.g., bioinformatics, cybersecurity, and intelligence analysis). Because each view encodes data from a particular perspective, analysts often use a set of views laid out in 2D space to link and synthesize information. The difficulty of this process is affected by the spatial organization of these views; for instance, connecting information across views that are far apart can be more challenging than across neighboring ones. However, most visual analysis tools currently either fix the positions of the views or completely delegate the organization of views to users, who must manually drag and move views. The former limits user involvement in managing the MV layout, while the latter is overly flexible and offers little guidance. A key design challenge in MV layout is therefore determining the factors in a spatial organization that impact understanding. To address this, we review a set of MV-based systems and identify considerations for MV layout rooted in two key concerns: perception, which considers how users perceive view relationships, and content, which considers the relationships in the data. We show how these allow us to study and analyze the design of MV layouts systematically.
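As a toy illustration (not from the paper) of how the perception and content concerns might be combined, the hypothetical cost function below penalizes layouts that place strongly related views far apart; all names and the scoring scheme are assumptions.

```python
from itertools import combinations
import math

def layout_cost(views, positions, relatedness):
    """Toy MV-layout cost: strongly related views should be near each other.
    `positions` maps a view id to its (x, y) center (perception concern);
    `relatedness` maps a frozenset pair of view ids to a [0, 1] data
    similarity (content concern)."""
    cost = 0.0
    for a, b in combinations(views, 2):
        (xa, ya), (xb, yb) = positions[a], positions[b]
        dist = math.hypot(xa - xb, ya - yb)        # perceptual separation
        cost += relatedness[frozenset((a, b))] * dist
    return cost
```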
- Award ID(s): 2022443
- PAR ID: 10351829
- Publisher / Repository: IEEE
- Date Published:
- Journal Name: 2022 IEEE Visualization and Visual Analytics (VIS)
- ISBN: 978-1-6654-8812-9
- Format(s): Medium: X
- Location: Oklahoma City, OK, USA
- Sponsoring Org: National Science Foundation
More Like this
Visualization grammars are gaining popularity as they allow visualization specialists and experienced users to quickly create static and interactive views. Existing grammars, however, mostly focus on abstract views, ignoring three-dimensional (3D) views, which are very important in fields such as the natural sciences. We propose a generalized interaction grammar for the problem of coordinating heterogeneous view types, such as standard charts (e.g., based on Vega-Lite) and 3D anatomical views. An important aspect of our web-based framework is that user interactions with data items at various levels of detail can be systematically integrated and used to control the overall layout of the application workspace. With the help of a concise JSON-based specification of the intended workflow, we can handle complex interactive visual analysis scenarios. This enables rapid prototyping and iterative refinement of the visual analysis tool in collaboration with domain experts. We illustrate the usefulness of our framework in two real-world case studies from the field of neuroscience. Since the logic of the presented grammar-based approach for handling interactions between heterogeneous web-based views is free of any application specifics, it can also serve as a template for applications beyond biological research.
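A hypothetical workflow specification in this spirit might look like the following Python dict (mirroring the JSON it would be serialized from); the view types, event names, and actions are invented for illustration and are not the framework's actual vocabulary.

```python
# Hypothetical JSON-style workflow spec linking a Vega-Lite chart
# to a 3D anatomical view (all field names are assumptions).
spec = {
    "views": [
        {"id": "expression_chart", "type": "vega-lite", "data": "genes"},
        {"id": "brain_view", "type": "anatomy-3d", "data": "regions"},
    ],
    "interactions": [
        {   # selecting bars in the chart highlights matching 3D regions
            "on": {"view": "expression_chart", "event": "select"},
            "do": {"view": "brain_view", "action": "highlight",
                   "map": "gene -> region"},
        }
    ],
    "layout": {"arrange": "side-by-side", "focus": "brain_view"},
}
```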
Geovisualizations are powerful tools for exploratory spatial analysis, enabling sighted users to discern patterns, trends, and relationships within geographic data. However, these visual tools have remained largely inaccessible to screen-reader users. We introduce AltGeoViz, a new interactive geovisualization approach that dynamically generates alt-text descriptions based on the user's current map view, providing voiceover summaries of spatial patterns and descriptive statistics. In a remote user study with five screen-reader users, we found that participants were able to interact with spatial data in previously infeasible ways, demonstrated a clear understanding of data summaries and their location context, and could synthesize spatial understandings of their explorations. Moreover, we identified key areas for improvement, such as the addition of spatial navigation controls and comparative-analysis features.
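A minimal sketch of the general idea, with invented function and field names rather than AltGeoViz's actual implementation: summarize the current viewport and where values concentrate, in a form a screen reader can voice.

```python
def map_view_summary(viewport: dict, stats: dict) -> str:
    """Hypothetical sketch: turn the current map view and per-subregion
    statistics into a short alt-text description."""
    hi = max(stats, key=stats.get)   # subregion with the largest value
    lo = min(stats, key=stats.get)   # subregion with the smallest value
    return (f"Showing {viewport['place']} at zoom level {viewport['zoom']}. "
            f"Values are highest in the {hi} ({stats[hi]:,}) "
            f"and lowest in the {lo} ({stats[lo]:,}).")

# Example with made-up data:
print(map_view_summary(
    {"place": "King County, WA", "zoom": 9},
    {"northwest": 412_000, "southeast": 98_000, "center": 250_000},
))
```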
Learning global features by aggregating information over multiple views has been shown to be effective for 3D shape analysis. For view aggregation in deep learning models, pooling has been applied extensively. However, pooling leads to a loss of the content within views and of the spatial relationship among views, which limits the discriminability of learned features. We propose 3DViewGraph to resolve this issue, which learns 3D global features by more effectively aggregating unordered views with attention. Specifically, unordered views taken around a shape are regarded as view nodes on a view graph. 3DViewGraph first learns a novel latent semantic mapping to project low-level view features into meaningful latent semantic embeddings in a lower-dimensional space, which is spanned by latent semantic patterns. Then, the content and spatial information of each pair of view nodes are encoded by a novel spatial pattern correlation, where the correlation is computed among latent semantic patterns. Finally, all spatial pattern correlations are integrated with attention weights learned by a novel attention mechanism. This further increases the discriminability of learned features by highlighting the unordered view nodes with distinctive characteristics and depressing the ones with appearance ambiguity. We show that 3DViewGraph outperforms state-of-the-art methods under three large-scale benchmarks.
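A simplified sketch of attention-based aggregation of unordered view features in PyTorch, not the exact 3DViewGraph formulation: each view receives a learned score, and the global feature is the softmax-weighted sum, so distinctive views are highlighted and ambiguous ones are depressed.

```python
import torch
import torch.nn as nn

class AttentionViewAggregation(nn.Module):
    """Simplified attention pooling over unordered view features."""
    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # per-view attention score

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (batch, n_views, dim)
        weights = torch.softmax(self.score(views), dim=1)  # (batch, n_views, 1)
        return (weights * views).sum(dim=1)                # (batch, dim)

# Example: 12 views taken around each of 2 shapes.
feats = torch.randn(2, 12, 256)
global_feat = AttentionViewAggregation(256)(feats)   # -> shape (2, 256)
```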
The quantitative estimation of precipitation from orbiting passive microwave imagers has been performed for more than 30 years. The development of retrieval methods consists of establishing physical or statistical relationships between the brightness temperatures (TBs) measured at frequencies between 5 and 200 GHz and precipitation. Until now, these relationships have essentially been established at the "pixel" level, associating the average precipitation rate inside a predefined area (the pixel) with the collocated multispectral radiometric measurement. This approach considers each pixel as an independent realization of a process and ignores the fact that precipitation is a dynamic variable with rich multiscale spatial and temporal organization. Here we propose to look beyond the pixel values of the TBs and show that useful information for precipitation retrieval can be derived from the variations of the observed TBs in a spatial neighborhood around the pixel of interest. We also show that considering neighboring information allows us to better handle the complex observation geometry of conical-scanning microwave imagers, involving frequency-dependent beamwidths, overlapping fields of view, and large Earth incidence angles. Using spatial convolution filters, we compute "nonlocal" radiometric parameters sensitive to spatial patterns and scale-dependent structures of the TB fields, which are the "geometric signatures" of specific precipitation structures such as convective cells. We demonstrate that using nonlocal radiometric parameters to enrich the spectral information associated with each pixel reduces retrieval uncertainty (a 6%–11% reduction in the mean absolute retrieval error) in a simple k-nearest-neighbors retrieval scheme.
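A minimal sketch of the neighborhood idea using one plausible nonlocal parameter, the local standard deviation of the TB field computed with a convolution filter, feeding a k-nearest-neighbors retrieval; the filter choice, window size, and synthetic data are assumptions, not the paper's exact parameters.

```python
import numpy as np
from scipy.ndimage import uniform_filter
from sklearn.neighbors import KNeighborsRegressor

def nonlocal_texture(tb: np.ndarray, size: int = 5) -> np.ndarray:
    """Local standard deviation of a brightness-temperature field,
    a simple proxy for spatial texture around each pixel."""
    mean = uniform_filter(tb, size)
    mean_sq = uniform_filter(tb ** 2, size)
    return np.sqrt(np.maximum(mean_sq - mean ** 2, 0.0))

# Illustrative retrieval on synthetic data: pixel TBs enriched with a
# texture channel, regressed onto rain rate with k-nearest neighbors.
tb = np.random.rand(64, 64) * 50 + 200        # synthetic TB field (K)
rain = np.random.rand(64, 64)                 # synthetic rain-rate target
X = np.stack([tb.ravel(), nonlocal_texture(tb).ravel()], axis=1)
model = KNeighborsRegressor(n_neighbors=5).fit(X, rain.ravel())
```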