Title: Multi-scale representation learning for spatial feature distributions using grid cells
Unsupervised text encoding models have recently fueled substantial progress in NLP. The key idea is to use neural networks to convert words in texts into vector-space representations based on their positions in a sentence and their contexts, which are suitable for end-to-end training of downstream tasks. We see a strikingly similar situation in spatial analysis, which focuses on incorporating both the absolute positions and the spatial contexts of geographic objects such as POIs into models. A general-purpose representation model for space would be valuable for a multitude of tasks. However, no such general model exists to date beyond simply applying discretization or feed-forward nets to coordinates, and little effort has been put into jointly modeling distributions with vastly different characteristics, which commonly emerge from GIS data. Meanwhile, Nobel Prize-winning neuroscience research shows that grid cells in mammals provide a multi-scale periodic representation that functions as a metric for location encoding and is critical for recognizing places and for path integration. We therefore propose a representation learning model called Space2Vec to encode the absolute positions and spatial relationships of places. We conduct experiments on two real-world geographic datasets for two different tasks: 1) predicting the types of POIs given their positions and contexts, and 2) image classification leveraging geo-locations. Results show that, because of its multi-scale representations, Space2Vec outperforms well-established ML approaches such as RBF kernels, multi-layer feed-forward nets, and tile embedding approaches on both location modeling and image classification. Detailed analysis shows that each baseline handles the distribution well at only one scale and performs poorly at others, whereas Space2Vec's multi-scale representation can handle distributions at different scales.
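The grid-cell-inspired multi-scale encoding at the heart of Space2Vec can be illustrated with a minimal sketch, assuming sine/cosine features at geometrically spaced wavelengths; the published model additionally projects coordinates onto three unit direction vectors at 120-degree angles and applies a learned feed-forward mixing layer, both of which this sketch omits:

```python
import numpy as np

def multi_scale_encode(coords, num_scales=16, min_lambda=1.0, max_lambda=10000.0):
    """Encode 2D coordinates with sin/cos features at geometrically spaced scales.

    A hedged sketch of a grid-cell-inspired positional encoder; function and
    parameter names are illustrative, not the paper's API.
    """
    coords = np.asarray(coords, dtype=np.float64)          # shape (N, 2)
    # Geometric progression of wavelengths from min_lambda to max_lambda.
    scales = min_lambda * (max_lambda / min_lambda) ** (
        np.arange(num_scales) / max(num_scales - 1, 1))
    feats = []
    for lam in scales:
        angle = 2 * np.pi * coords / lam                   # periodic at scale lam
        feats.append(np.sin(angle))
        feats.append(np.cos(angle))
    return np.concatenate(feats, axis=1)                   # (N, 4 * num_scales)

emb = multi_scale_encode([[34.05, -118.24], [40.71, -74.00]])
print(emb.shape)  # (2, 64)
```

Because each wavelength responds to variation at a different spatial resolution, a downstream network can pick out whichever scales match the local point density, which is the property the abstract credits for the gains over single-scale baselines.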
Award ID(s):
2033521
PAR ID:
10462499
Author(s) / Creator(s):
Date Published:
Journal Name:
ICLR 2020
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. Satellite imagery is being leveraged for many societally critical tasks across climate, economics, and public health. Yet, because of heterogeneity in landscapes (e.g., how a road looks in different places), models can show disparate performance across geographic areas. Given the potential for harm from disparities in algorithmic systems used in societal contexts, here we consider the risk of urban-rural disparities in the identification of land-cover features. We study this via semantic segmentation (a common computer vision task in which image regions are labelled according to what is being shown) using pre-trained image representations generated via contrastive self-supervised learning. We propose fair dense representation with contrastive learning (FairDCL) as a method for de-biasing the multi-level latent space of a convolutional neural network. The method improves feature identification by removing spurious latent representations that are disparately distributed across urban and rural areas, and does so in an unsupervised way during contrastive pre-training. The pre-trained image representation mitigates downstream urban-rural prediction disparities and outperforms state-of-the-art baselines on real-world satellite images. Embedding-space evaluation and ablation studies further demonstrate FairDCL's robustness. As generalizability and robustness in geographic imagery are a nascent topic, our work motivates researchers to consider metrics beyond average accuracy in such applications.
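The contrastive pre-training this abstract builds on can be sketched with a generic InfoNCE objective between two augmented views of the same images; this is only the standard contrastive backbone, and FairDCL's fairness regularizer on multi-level latent features is not reproduced here:

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE loss between two augmented views (row i of z_a pairs with row i of z_b).

    A minimal sketch of the contrastive objective underlying self-supervised
    pre-training; names and the temperature value are illustrative.
    """
    # L2-normalize embeddings so similarity is cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature            # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # positives on the diagonal

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 32))
loss = info_nce_loss(z, z + 0.01 * rng.normal(size=(8, 32)))
print(float(loss))
```

The loss is low when matched views embed close together and mismatched views embed far apart; FairDCL's contribution is to additionally penalize latent features whose statistics differ between urban and rural regions during this pre-training stage.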
  2. Extensive studies in rodents show that place cells in the hippocampus have firing patterns that are highly correlated with the animal's location in the environment and are organized in layers of increasing field sizes, or scales, along the dorsoventral axis. In this study, we use a spatial cognition model to show that different field sizes can be exploited to adapt the place cell representation to different environments according to their size and complexity. Specifically, we provide an in-depth analysis of how to distribute place cell fields according to the obstacles in cluttered environments to optimize learning time and path optimality during goal-oriented spatial navigation tasks. The analysis uses a reinforcement learning (RL) model in which place cell activations encode the agent's state. While previous studies have suggested exploiting different field sizes to represent areas requiring different spatial resolutions, our work analyzes specific distributions that adapt the representation to the environment, activating larger fields in open areas and smaller fields near goals and subgoals (e.g., obstacle corners). In addition to assessing how the multi-scale representation may be exploited in spatial navigation tasks, our analysis and results suggest place cell representations that can benefit robotics by reducing the total number of cells needed for path planning without compromising the quality of the learned paths.
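A common way to turn place cells into an RL state encoding, consistent with the setup described above, is a bank of Gaussian fields with per-cell widths, so fields can be large in open areas and small near goals or obstacle corners; this is a hedged sketch with illustrative names and values, not the study's exact model:

```python
import numpy as np

def place_cell_activations(pos, centers, widths):
    """Gaussian place-field activations usable as an RL state encoding.

    Each cell fires according to a Gaussian of the distance from its field
    center, with a per-cell width controlling the field size.
    """
    pos = np.asarray(pos, dtype=np.float64)          # (2,) agent position
    centers = np.asarray(centers, dtype=np.float64)  # (K, 2) field centers
    widths = np.asarray(widths, dtype=np.float64)    # (K,) field sizes
    d2 = ((centers - pos) ** 2).sum(axis=1)          # squared distances
    return np.exp(-d2 / (2 * widths ** 2))           # (K,) activations in (0, 1]

centers = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
widths = np.array([0.5, 0.1, 1.0])   # mixed field sizes, as in the abstract
act = place_cell_activations([0.0, 0.0], centers, widths)
print(act.round(3))
```

Fewer, wider fields suffice in open areas while narrow fields add resolution only where it matters, which is how such a representation can cut the total cell count without degrading learned paths.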
  3. Investigations into how individual neurons encode behavioral variables of interest have revealed specific representations in single neurons, such as place and object cells, as well as a wide range of cells with conjunctive encodings or mixed selectivity. However, as most experiments examine neural activity within individual tasks, it is currently unclear if and how neural representations change across different task contexts. Within this discussion, the medial temporal lobe is particularly salient, as it is known to be important for multiple behaviors, including spatial navigation and memory; however, the relationship between these functions is currently unclear. Here, to investigate how representations in single neurons vary across task contexts in the medial temporal lobe, we collected and analyzed single-neuron activity from human participants as they completed a paired-task session consisting of a passive-viewing visual working memory task and a spatial navigation and memory task. Five patients contributed 22 paired-task sessions, which were spike-sorted together to allow the same putative single neurons to be compared between the different tasks. Within each task, we replicated concept-related activations in the working memory task, as well as target-location and serial-position responsive cells in the navigation task. When comparing neuronal activity between tasks, we first established that a significant number of neurons maintained the same kind of representation, responding to stimulus presentations across tasks. Further, we found cells that changed the nature of their representation across tasks, including a significant number of cells that were stimulus-responsive in the working memory task and responded to serial position in the spatial task.
Overall, our results support a flexible encoding of multiple, distinct aspects of different tasks by single neurons in the human medial temporal lobe, whereby some individual neurons change the nature of their feature coding between task contexts. 
  4. Geospatial analysis lacks methods like the word vector representations and pre-trained networks that significantly boost performance across a wide range of natural language and computer vision tasks. To fill this gap, we introduce Tile2Vec, an unsupervised representation learning algorithm that extends the distributional hypothesis from natural language — words appearing in similar contexts tend to have similar meanings — to spatially distributed data. We demonstrate empirically that Tile2Vec learns semantically meaningful representations for both image and non-image datasets. Our learned representations significantly improve performance in downstream classification tasks and, similarly to word vectors, allow visual analogies to be obtained via simple arithmetic in the latent space. 
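Tile2Vec's spatial distributional hypothesis is typically operationalized as a triplet objective: an anchor tile should embed closer to a spatially neighboring tile than to a distant one, up to a margin. A minimal sketch, with an illustrative margin value and names that are not the paper's API:

```python
import numpy as np

def tile2vec_triplet_loss(z_anchor, z_neighbor, z_distant, margin=0.1):
    """Hinge triplet loss over tile embeddings.

    Penalizes anchors that are not at least `margin` closer to their spatial
    neighbor than to a distant tile.
    """
    d_pos = np.linalg.norm(z_anchor - z_neighbor, axis=1)  # anchor-neighbor
    d_neg = np.linalg.norm(z_anchor - z_distant, axis=1)   # anchor-distant
    return np.mean(np.maximum(0.0, d_pos - d_neg + margin))

rng = np.random.default_rng(1)
a = rng.normal(size=(4, 16))
# Neighbors nearly identical, distant tiles far away: the hinge is inactive.
print(float(tile2vec_triplet_loss(a, a + 0.01, a + 1.0)))  # 0.0
```

Minimizing this loss over many sampled triplets yields embeddings where latent-space arithmetic reflects spatial and semantic similarity, which is what enables the visual analogies mentioned in the abstract.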
  5. Recovering multi-person 3D poses and shapes with absolute scales from a single RGB image is a challenging task due to the inherent depth and scale ambiguity of a single view. Current work on 3D pose and shape estimation tends to focus on estimating 3D joint locations relative to the root joint, usually defined as the joint closest to the shape centroid; for humans, this is the pelvis joint. In this paper, we build upon an existing multi-person 3D mesh predictor network, ROMP, to create Absolute-ROMP. By adding absolute root joint localization in the camera coordinate frame, we are able to estimate multi-person 3D poses and shapes with absolute scales from a single RGB image. Such a single-shot approach allows the system to better learn and reason about inter-person depth relationships, thus improving multi-person 3D estimation. In addition to this end-to-end network, we also train a CNN and transformer hybrid network, called TransFocal, to predict the focal length of the image's camera. Absolute-ROMP estimates the 3D mesh coordinates of all persons in the image and their root joint locations normalized by the focal length. We then use TransFocal to obtain the focal length and recover the absolute depth of all joints in the camera coordinate frame. We evaluate Absolute-ROMP on root joint localization and root-relative 3D pose estimation on publicly available multi-person 3D pose datasets, and TransFocal on a dataset created from the Pano360 dataset; both are applicable to in-the-wild images and videos due to their real-time performance.
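The de-normalization step the abstract describes reduces to a pinhole-camera relation: if the network predicts root depth normalized by focal length, multiplying by the estimated focal length recovers metric depth. A hedged sketch of that relation, with hypothetical function and variable names rather than the paper's API:

```python
def absolute_root_depth(normalized_depth, focal_length):
    """Recover absolute root-joint depth from a focal-normalized prediction.

    Assumes the pinhole relation implied by the abstract: the network
    outputs d = Z / f, so Z = d * f once TransFocal supplies f (in pixels).
    """
    return normalized_depth * focal_length

# Illustrative numbers: a normalized depth of 0.004 with f = 1500 px
# corresponds to an absolute root depth of about 6 (metric units).
print(absolute_root_depth(0.004, 1500.0))
```

Factoring the focal length out of the network's output is what lets a single predictor generalize across cameras: the depth head stays camera-agnostic, and TransFocal supplies the camera-specific scale at inference time.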