Image-based localization is widely used in autonomous driving, robotics, and augmented reality. It is typically carried out by matching a query image, taken from a cell phone or vehicle dashcam, against a large-scale collection of geo-tagged reference images such as satellite/aerial imagery or Google Street View. The problem remains challenging, however, because query images and the large-scale reference datasets are often inconsistent under varying lighting and weather conditions. To tackle this issue, this work proposes a novel view synthesis framework built on deep generative models that merges distinctive features from the outdated reference dataset with features from images that capture seasonal changes. Our design includes a dedicated scheme to ensure that the synthesized images retain the important features of both the reference and patch images, covering seasonal appearance while narrowing the domain gap for image-based localization. The performance evaluation shows that the proposed framework can synthesize views under various weather and lighting conditions.
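The abstract gives no implementation details, but the core idea of fusing features from an outdated reference view with features from a seasonal patch inside a generative model can be sketched as follows. This is a minimal illustrative PyTorch sketch; the two-encoder architecture, layer sizes, and names (`FusionGenerator`, `enc_ref`, `enc_patch`) are assumptions for illustration, not the authors' actual model.

```python
# Minimal sketch (PyTorch), assuming a two-encoder/one-decoder generator:
# one encoder for the outdated reference view, one for a patch with seasonal
# appearance; their features are fused and decoded into the synthesized view.
import torch
import torch.nn as nn

def conv_block(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                         nn.InstanceNorm2d(cout), nn.ReLU(inplace=True))

class FusionGenerator(nn.Module):
    def __init__(self, feat=64):
        super().__init__()
        self.enc_ref = nn.Sequential(conv_block(3, feat), conv_block(feat, 2 * feat))
        self.enc_patch = nn.Sequential(conv_block(3, feat), conv_block(feat, 2 * feat))
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(4 * feat, feat, 4, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(feat, 3, 4, stride=2, padding=1), nn.Tanh())

    def forward(self, reference, patch):
        # Concatenate features from the reference view with appearance
        # features from the seasonal patch, then decode a synthesized view.
        fused = torch.cat([self.enc_ref(reference), self.enc_patch(patch)], dim=1)
        return self.dec(fused)

gen = FusionGenerator()
ref = torch.randn(1, 3, 128, 128)    # outdated geo-tagged reference image
patch = torch.randn(1, 3, 128, 128)  # recent patch with seasonal appearance
synthesized = gen(ref, patch)        # view used for localization matching
```

In a full system such a generator would be trained adversarially so that the synthesized views both look realistic and preserve the landmarks needed for matching; that training loop is omitted here.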
How can one visually characterize photographs of people over time? In this work, we describe the
- NSF-PAR ID: 10420800
- Publisher / Repository: Wiley-Blackwell
- Date Published:
- Journal Name: Computer Graphics Forum
- Volume: 42
- Issue: 2
- ISSN: 0167-7055
- Page Range / eLocation ID: p. 281-291
- Format(s): Medium: X
- Sponsoring Org: National Science Foundation
More Like this
Jonathan R. Whitlock (Ed.)
Introduction: Understanding the neural code has been one of the central aims of neuroscience research for decades. Spikes are commonly referred to as the units of information transfer, but multi-unit activity (MUA) recordings are routinely analyzed in aggregate forms such as binned spike counts, peri-stimulus time histograms, firing rates, or population codes. Various forms of averaging also occur in the brain, from the spatial averaging of spikes within dendritic trees to their temporal averaging through synaptic dynamics. However, how these forms of averaging relate to each other, or to the spatial and temporal units of information representation within the neural code, has remained poorly understood.
Materials and methods: In this work we developed NeuroPixelHD, a symbolic hyperdimensional model of MUA, and used it to decode the spatial location and identity of static images shown to n = 9 mice in the Allen Institute Visual Coding—Neuropixels dataset from large-scale MUA recordings. We parametrically varied the spatial and temporal resolutions of the MUA data provided to the model and compared its resulting decoding accuracy.
Results: For almost all subjects, we found a 125 ms temporal resolution to maximize decoding accuracy across the whole brain, both for the spatial location of Gabor patches (81 classes for patches presented over a 9×9 grid) and for the identity of natural images (118 classes corresponding to 118 images). This optimal temporal resolution nevertheless varied greatly between regions, followed a sensory-associative hierarchy, and was significantly modulated by the central frequency of theta-band oscillations across regions. Spatially, the optimal resolution was at one of two mesoscale levels for almost all mice: the area level, where the spiking activity of all neurons within each brain area is combined, and the population level, where neuronal spikes within each area are combined separately across fast-spiking (putatively inhibitory) and regular-spiking (putatively excitatory) neurons. We also observed an expected interplay between optimal spatial and temporal resolutions, whereby increasing the amount of averaging across one dimension (space or time) decreases the amount of averaging that is optimal across the other, and vice versa.
Discussion: Our findings corroborate existing empirical practices of spatiotemporal binning and averaging in MUA data analysis and provide a rigorous computational framework for optimizing the level of such aggregations. They also synthesize these empirical practices with existing knowledge of the various sources of biological averaging in the brain into a new theory of neural information processing, in which the unit of information varies dynamically based on neuronal signal and noise correlations across space and time.
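The preprocessing step that the study varies, binning MUA at different temporal resolutions, and the general flavor of a symbolic hyperdimensional encoding can be illustrated with a short sketch. This is a toy example under stated assumptions (random bipolar hypervectors, a bind-and-bundle scheme, hypothetical function names); it is not the NeuroPixelHD codebase.

```python
# Illustrative sketch (NumPy), assuming spike times in seconds for each unit.
# Shows 125 ms temporal binning and a toy hyperdimensional trial encoding.
import numpy as np

def bin_spikes(spike_times, t_start, t_stop, bin_ms=125):
    """Bin one unit's spike times into counts at the given temporal resolution."""
    edges = np.arange(t_start, t_stop + 1e-9, bin_ms / 1000.0)
    counts, _ = np.histogram(spike_times, bins=edges)
    return counts

def encode_trial(binned_counts, dim=10_000, seed=0):
    """Bundle (sum and sign) random unit/time hypervectors weighted by spike counts."""
    rng = np.random.default_rng(seed)
    n_units, n_bins = binned_counts.shape
    unit_hv = rng.choice([-1, 1], size=(n_units, dim))  # one hypervector per unit
    time_hv = rng.choice([-1, 1], size=(n_bins, dim))   # one hypervector per time bin
    bundle = np.zeros(dim)
    for u in range(n_units):
        for b in range(n_bins):
            if binned_counts[u, b]:
                # bind unit and time by elementwise product, weight by spike count
                bundle += binned_counts[u, b] * unit_hv[u] * time_hv[b]
    return np.sign(bundle)  # bipolar trial vector for a nearest-centroid decoder

# Example: two units, a 0-1 s trial, 125 ms bins -> an 8-bin count matrix.
counts = np.vstack([bin_spikes(np.array([0.05, 0.30, 0.31, 0.90]), 0.0, 1.0),
                    bin_spikes(np.array([0.10, 0.55]), 0.0, 1.0)])
trial_hv = encode_trial(counts)
```

Coarser spatial resolutions (area or population level, as in the abstract) would correspond to summing the rows of the count matrix before encoding.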
Background: Live imaging is the gold standard for determining how cells give rise to organs. However, tracking many cells across whole organs over long developmental time windows is extremely challenging. In this work, we provide a comparatively simple method for confocal live imaging of entire Arabidopsis thaliana first leaves across early development. Our imaging method works for both wild-type leaves and the complex curved leaves of the jaw-1D mutant.
Results: We find that dissecting the cotyledons, affixing a coverslip above the samples, and mounting the samples with perfluorodecalin yields optimal imaging series for robust cellular- and organ-level analysis. We provide details of our complementary image processing steps in the MorphoGraphX software for segmenting cells, tracking lineages, and measuring a suite of cellular properties. We also provide the MorphoGraphX image processing scripts we developed to automate analysis of segmented images and data presentation.
Conclusions: Our imaging techniques and processing steps combine into a robust imaging pipeline. With this pipeline we are able to examine important nuances in the cellular growth and differentiation of jaw-D versus wild-type leaves that have not been demonstrated before. Our pipeline is approachable and easy to use for live imaging of leaf development.
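The downstream analysis described above, measuring cellular properties from segmented, lineage-tracked images, might look roughly like the following once measurements are exported to tables. The CSV file names and columns ("lineage", "cell_id", "area_um2") are purely hypothetical placeholders, not the authors' MorphoGraphX scripts.

```python
# Illustrative sketch (pandas), assuming per-timepoint cell measurements
# exported from segmentation with hypothetical columns and file names.
import pandas as pd

t0 = pd.read_csv("leaf_T0_cells.csv")  # segmented cells at the first time point
t1 = pd.read_csv("leaf_T1_cells.csv")  # same leaf 24 h later, lineages tracked

# Total area of each tracked lineage at each time point.
a0 = t0.groupby("lineage")["area_um2"].sum()
a1 = t1.groupby("lineage")["area_um2"].sum()

growth = pd.DataFrame({"area_t0": a0, "area_t1": a1}).dropna()
growth["relative_growth"] = (growth["area_t1"] - growth["area_t0"]) / growth["area_t0"]
growth["n_divisions"] = (t1.groupby("lineage")["cell_id"].nunique()
                         - t0.groupby("lineage")["cell_id"].nunique()).loc[growth.index]
print(growth.sort_values("relative_growth", ascending=False).head())
```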
Optical coherence tomography (OCT) has enabled a wide range of medical image-based diagnosis and treatment in fields such as cardiology and ophthalmology. Such applications can be further facilitated by deep learning-based super-resolution, which improves the ability to resolve morphological structures. However, existing deep learning-based methods focus only on the spatial distribution and disregard frequency fidelity in image reconstruction, leading to a frequency bias. To overcome this limitation, we propose a frequency-aware super-resolution framework that integrates three critical frequency-based modules (i.e., frequency transformation, frequency skip connection, and frequency alignment) and a frequency-based loss function into a conditional generative adversarial network (cGAN). We conducted a large-scale quantitative study on an existing coronary OCT dataset to demonstrate the superiority of our proposed framework over existing deep learning frameworks. In addition, we confirmed the generalizability of our framework by applying it to fish corneal images and rat retinal images, demonstrating its capability to super-resolve morphological details in eye imaging.
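To make the "frequency fidelity" idea concrete, a frequency-aware loss can be sketched as below. This is an illustrative PyTorch example under stated assumptions (the weighting factor and the name `frequency_loss` are made up); it does not reproduce the paper's actual modules or loss.

```python
# Minimal sketch (PyTorch): an L1 penalty on 2-D FFT magnitudes added to the
# usual spatial L1 term, so the reconstruction is also supervised in frequency space.
import torch
import torch.nn.functional as F

def frequency_loss(sr, hr):
    """L1 distance between FFT magnitudes of super-resolved and ground-truth images."""
    sr_mag = torch.abs(torch.fft.fft2(sr, norm="ortho"))
    hr_mag = torch.abs(torch.fft.fft2(hr, norm="ortho"))
    return F.l1_loss(sr_mag, hr_mag)

def reconstruction_loss(sr, hr, lambda_freq=0.1):
    """Spatial L1 plus a frequency-fidelity term to counteract frequency bias."""
    return F.l1_loss(sr, hr) + lambda_freq * frequency_loss(sr, hr)

# Example with random tensors standing in for generator output and ground truth.
sr = torch.rand(4, 1, 64, 64)  # super-resolved OCT patches
hr = torch.rand(4, 1, 64, 64)  # high-resolution ground truth
loss = reconstruction_loss(sr, hr)
```

In a cGAN setting, a term like this would be added to the adversarial objective rather than replace it.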
How do drawings—ranging from detailed illustrations to schematic diagrams—reliably convey meaning? Do viewers understand drawings based on how strongly they resemble an entity (i.e., as images) or based on socially mediated conventions (i.e., as symbols)? Here we evaluate a cognitive account of pictorial meaning in which visual and social information jointly support visual communication. Pairs of participants used drawings to repeatedly communicate the identity of a target object among multiple distractor objects. We manipulated social cues across three experiments and a full replication, finding that participants developed object-specific and interaction-specific strategies for communicating more efficiently over time, beyond what task practice or a resemblance-based account alone could explain. Leveraging model-based image analyses and crowdsourced annotations, we further determined that drawings did not drift toward "arbitrariness," as predicted by a pure convention-based account, but preserved visually diagnostic features. Taken together, these findings advance psychological theories of how successful graphical conventions emerge.