Title: Learning V1 simple cells with vector representations of local contents and matrix representations of local motions.
This paper proposes a representational model for image pairs, such as consecutive video frames, that are related by local pixel displacements, in the hope that the model may shed light on motion perception in primary visual cortex (V1). The model couples two components: (1) vector representations of the local contents of images and (2) matrix representations of the local pixel displacements caused by relative motion between the agent and the objects in the 3D scene. When the image frame changes due to local pixel displacements, the vectors are multiplied by the matrices that represent those displacements. The vector representation is thus equivariant: it varies according to the local displacements. Our experiments show that the model learns Gabor-like filter pairs of quadrature phases, and the profiles of the learned filters match those of simple cells in macaque V1. Moreover, we demonstrate that the model can learn to infer local motions in either a supervised or unsupervised manner. With such a simple model, we achieve competitive results on optical flow estimation.
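As an illustrative sketch of the vector–matrix coupling described above (not the paper's actual learned model): a content vector can be split into 2D blocks, with a local displacement acting on it through a block-diagonal rotation matrix whose angles depend linearly on the displacement. The names and the linear parameterization `freqs` below are hypothetical stand-ins for quantities such a model would learn from data.

```python
import numpy as np

def motion_matrix(delta, freqs):
    """Block-diagonal rotation matrix M(delta) representing a local displacement.

    Each 2D sub-vector of the content vector is rotated by an angle that
    depends linearly on the displacement `delta`. The linear map `freqs`
    stands in for what the model would learn (hypothetical here).
    """
    angles = freqs @ delta                     # one rotation angle per 2D block
    n = 2 * len(angles)
    M = np.zeros((n, n))
    for k, a in enumerate(angles):
        M[2 * k:2 * k + 2, 2 * k:2 * k + 2] = [[np.cos(a), -np.sin(a)],
                                               [np.sin(a),  np.cos(a)]]
    return M

rng = np.random.default_rng(0)
freqs = rng.normal(size=(4, 2))    # 4 two-dimensional blocks, 2D displacement
v = rng.normal(size=8)             # vector representation of local content
delta = np.array([1.0, -0.5])      # local pixel displacement
v_next = motion_matrix(delta, freqs) @ v   # equivariant update of the vector
```

Because the angles are linear in the displacement, composing two displacements multiplies the corresponding matrices, which is the equivariance property the abstract refers to.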
Award ID(s):
2015577
PAR ID:
10351292
Author(s) / Creator(s):
Date Published:
Journal Name:
The Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI) 2022.
Format(s):
Medium: X
Sponsoring Org:
National Science Foundation
More Like this
  1. How to represent camera pose effectively is an essential problem in 3D computer vision, especially in tasks such as camera pose regression and novel view synthesis. Traditionally, the 3D position of the camera is represented by Cartesian coordinates and its orientation by Euler angles or quaternions. These representations are manually designed and may not be the most effective for downstream tasks. In this work, we propose an approach to learning neural representations of camera poses and 3D scenes, coupled with neural representations of local camera movements. Specifically, the camera pose and the 3D scene are represented as vectors, and a local camera movement is represented as a matrix operating on the vector of the camera pose. We demonstrate that the camera movement can further be parametrized by a matrix Lie algebra that underlies a rotation system in the neural space. The vector representations are then concatenated and decoded into the posed 2D image through a decoder network. The model is learned from posed 2D images and the corresponding camera poses alone, without access to depths or shapes. We conduct extensive experiments on synthetic and real datasets. The results show that, compared with other camera pose representations, our learned representation is more robust to noise in novel view synthesis and more effective in camera pose regression. 
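A minimal sketch of the "matrix Lie algebra underlying a rotation system" idea: exponentiating a linear combination of skew-symmetric generators yields an orthogonal operator, so pose vectors rotate in the neural space. All names, dimensions, and the truncated-series exponential are our own illustrative choices, not the paper's implementation.

```python
import numpy as np

def expm(A, terms=30):
    """Matrix exponential via truncated Taylor series (adequate for small A)."""
    M = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        M = M + term
    return M

def movement_operator(theta, basis):
    """Neural-space operator for a local camera movement.

    `basis` is a set of skew-symmetric generators (a matrix Lie algebra);
    the exponential of a skew-symmetric matrix is a rotation, so the pose
    vector rotates in the learned space.
    """
    A = sum(t * B for t, B in zip(theta, basis))
    return expm(A)

rng = np.random.default_rng(1)
d = 6
basis = []
for _ in range(3):
    X = rng.normal(size=(d, d))
    basis.append(X - X.T)                  # skew-symmetric generator
pose = rng.normal(size=d)                  # vector representation of camera pose
theta = np.array([0.10, -0.20, 0.05])      # coordinates of the local movement
M = movement_operator(theta, basis)
new_pose = M @ pose                        # pose after the local camera movement
```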
  2. Deep neural networks can learn powerful prior probability models for images, as evidenced by the high-quality generations obtained with recent score-based diffusion methods. But the means by which these networks capture complex global statistical structure, apparently without suffering from the curse of dimensionality, remain a mystery. To study this, we incorporate diffusion methods into a multi-scale decomposition, reducing dimensionality by assuming a stationary local Markov model for wavelet coefficients conditioned on coarser-scale coefficients. We instantiate this model using convolutional neural networks (CNNs) with local receptive fields, which enforce both the stationarity and Markov properties. Global structures are captured using a CNN with receptive fields covering the entire (but small) low-pass image. We test this model on a dataset of face images, which are highly non-stationary and contain large-scale geometric structures. Remarkably, denoising, super-resolution, and image synthesis results all demonstrate that these structures can be captured with significantly smaller conditioning neighborhoods than required by a Markov model implemented in the pixel domain. Our results show that score estimation for large complex images can be reduced to low-dimensional Markov conditional models across scales, alleviating the curse of dimensionality. 
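The multi-scale decomposition at the heart of this model can be illustrated with a single Haar wavelet step (a minimal stand-in for the paper's wavelet transform; the conditional CNN score models themselves are not shown). One step halves each spatial dimension, producing a low-pass image plus three detail bands, and is perfectly invertible.

```python
import numpy as np

def haar_step(x):
    """One 2D Haar split: low-pass plus three detail bands, each half-size."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]
    c = x[1::2, 0::2]; d = x[1::2, 1::2]
    low = (a + b + c + d) / 2          # coarse-scale image
    lh  = (a - b + c - d) / 2          # horizontal detail
    hl  = (a + b - c - d) / 2          # vertical detail
    hh  = (a - b - c + d) / 2          # diagonal detail
    return low, (lh, hl, hh)

def haar_inverse(low, details):
    """Exact inverse of haar_step (perfect reconstruction)."""
    lh, hl, hh = details
    n = low.shape[0] * 2
    x = np.empty((n, n))
    x[0::2, 0::2] = (low + lh + hl + hh) / 2
    x[0::2, 1::2] = (low - lh + hl - hh) / 2
    x[1::2, 0::2] = (low + lh - hl - hh) / 2
    x[1::2, 1::2] = (low - lh - hl + hh) / 2
    return x
```

In the paper's setting, a local CNN would model the detail bands conditioned on `low`, and a global CNN would model the small low-pass image itself.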
  3. Abstract Following large earthquakes, viscoelastic stress relaxation may contribute to postseismic deformation observed at Earth's surface. Mechanical representations of viscoelastic deformation require a constitutive relationship for the lower crust/upper mantle material where stresses are diffused and, for non‐linear rheologies, knowledge of absolute stress level. Here, we describe a kinematic approach to representing geodetically observed postseismic motions that does not require an assumed viscoelastic rheology. The core idea is to use observed surface motions to constrain time‐dependent displacement boundary conditions applied at the base of the elastic upper crust by viscoelastic motions in the lower crust/upper mantle, approximating these displacements as slip on a set of dislocation elements. Using three‐dimensional forward models of viscoelastically modulated postseismic deformation in a thrust fault setting, we show how this approach can accurately represent surface motions and recover predicted displacements at the base of the elastic layer. Applied to the 1999 Chi‐Chi (Taiwan) earthquake, this kinematic approach can reproduce geodetically observed displacements and estimates of the partitioning between correlated postseismic deformation mechanisms. Specifically, we simultaneously estimate afterslip on the earthquake source fault that is similar to previous estimates, along with slip on dislocations at the base of the elastic layer that mimic predictions from viscous stress dissipation models in which viscosity is inferred to vary three‐dimensionally. A use case for the dislocation approach to modeling viscoelastic deformation is the estimation of spatiotemporally variable fault slip processes, including across sequential interseismic phases of the earthquake cycle, without assuming a lower crust/upper mantle rheology. 
  4. Implementing local contextual guidance principles in a single-layer CNN architecture, we propose an efficient algorithm for developing broad-purpose representations (i.e., representations transferable to new tasks without additional training) in shallow CNNs trained on limited-size datasets. A contextually guided CNN (CG-CNN) is trained on groups of neighboring image patches picked at random image locations in the dataset. Such neighboring patches are likely to share a common context and are therefore treated, for the purposes of training, as belonging to the same class. Across multiple iterations of such training on different context-sharing groups of image patches, CNN features optimized in one iteration are transferred to the next iteration for further optimization, and so on. In this process, the CNN features acquire higher pluripotency, i.e., inferential utility across arbitrary classification tasks. In our applications to natural images and hyperspectral images, we find that CG-CNN can learn transferable features similar to those learned by the first layers of well-known deep networks and produce favorable classification accuracies. 
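The context-sharing sampling step can be sketched as follows: sample an anchor location, extract several nearby patches, and give them all the same pseudo-label. Parameter names and the exact sampling scheme here are illustrative assumptions, not the CG-CNN paper's precise procedure.

```python
import numpy as np

def context_groups(image, patch=8, group=4, n_groups=16, radius=4, seed=0):
    """Sample groups of neighboring patches; patches within a group share
    a pseudo-label because they likely share a common visual context.
    (Hypothetical parameterization for illustration.)"""
    rng = np.random.default_rng(seed)
    H, W = image.shape[:2]
    X, y = [], []
    for g in range(n_groups):
        # anchor chosen so every jittered patch stays inside the image
        cy = rng.integers(radius, H - patch - radius)
        cx = rng.integers(radius, W - patch - radius)
        for _ in range(group):
            dy = rng.integers(-radius, radius + 1)
            dx = rng.integers(-radius, radius + 1)
            X.append(image[cy + dy:cy + dy + patch, cx + dx:cx + dx + patch])
            y.append(g)                # same class for all patches in the group
    return np.stack(X), np.array(y)

image = np.random.default_rng(3).normal(size=(32, 32))
X, y = context_groups(image)           # pseudo-labeled training batch
```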
  5. Implicit neural representations (INR) have recently been proposed as deep learning (DL) based solutions for image compression. An image can be compressed by training an INR model, with fewer weights than the number of image pixels, to map the coordinates of the image to the corresponding pixel values. While traditional training approaches for INRs enforce pixel-wise image consistency, we propose to further improve image quality by using a new structural regularizer. We present structural regularization for INR compression (SINCO), a novel INR method for image compression. SINCO imposes structural consistency of the compressed images with the ground truth by using a segmentation network to penalize the discrepancy between segmentation masks predicted from the compressed and ground-truth images. We validate SINCO on brain MRI images, showing that it can achieve better performance than some recent INR methods. 
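The shape of such an objective can be sketched as a pixel term plus a weighted structural term. The `segment` callable below is a toy stand-in for SINCO's pretrained segmentation network, and the weight and mask function are illustrative assumptions, not values from the paper.

```python
import numpy as np

def sinco_loss(pred, target, segment, weight=0.1):
    """Pixel-wise consistency plus a structural regularizer that penalizes
    discrepancy between segmentation masks predicted from the compressed
    and ground-truth images. `segment` stands in for a pretrained
    segmentation network (hypothetical here)."""
    pixel = np.mean((pred - target) ** 2)
    structural = np.mean((segment(pred) - segment(target)) ** 2)
    return pixel + weight * structural

def soft_mask(img):
    """Toy 'segmenter': a sigmoid of intensity (illustration only)."""
    return 1.0 / (1.0 + np.exp(-4.0 * (img - img.mean())))

rng = np.random.default_rng(4)
target = rng.random((16, 16))
pred = target + 0.05 * rng.normal(size=(16, 16))   # imperfect INR output
loss = sinco_loss(pred, target, soft_mask)
```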